Multimodal sentiment analysis: Difference between revisions

Citation bot (talk | contribs)
m Alter: journal. Add: chapter, pages, isbn. You can use this bot yourself. Report bugs here.
not the journal
Line 11:
=== Audio Features ===
 
[[Feeling|Sentiment]] and [[emotion]] are reflected in various [[phonetic]] and [[prosodic]] properties captured by audio features.<ref>{{cite journal |last1=Chung-Hsien Wu |last2=Wei-Bin Liang |title=Emotion Recognition of Affective Speech Based on Multiple Classifiers Using Acoustic-Prosodic Information and Semantic Labels |journal=IEEE Transactions on Affective Computing |date=January 2011 |volume=2 |issue=1 |pages=10–21 |doi=10.1109/T-AFFC.2010.16}}</ref> Some of the most important audio features employed in multimodal sentiment analysis are [[mel-frequency cepstrum|mel-frequency cepstral coefficients (MFCC)]], [[spectral centroid]], [[spectral flux]], [[beat]]{{disambiguation needed|date=June 2018}} histogram, beat sum, strongest beat, pause duration, and [[pitch accent|pitch]].<ref name="s1" /> [[OpenSMILE]]<ref>{{cite journal |last1=Eyben |first1=Florian |last2=Wöllmer |first2=Martin |last3=Schuller |first3=Björn |title=OpenEAR — Introducing the Munich open-source emotion and affect recognition toolkit |journal=Ieeexplore.ieee.org |pages=1 |date=2009 |doi=10.1109/ACII.2009.5349350 |url=http://ieeexplore.ieee.org/document/5349350|isbn=978-1-4244-4800-5 }}</ref> and [[Praat]] are popular open-source toolkits for extracting such audio features.<ref>{{cite journal |last1=Morency |first1=Louis-Philippe |last2=Mihalcea |first2=Rada |last3=Doshi |first3=Payal |title=Towards multimodal sentiment analysis: harvesting opinions from the web |date=14 November 2011 |pages=169–176 |doi=10.1145/2070481.2070509 |url=https://dl.acm.org/citation.cfm?id=2070509 |publisher=ACM|chapter=Towards multimodal sentiment analysis |isbn=9781450306416 }}</ref>
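
As a rough illustration of two of the features listed above, the following sketch computes a per-frame spectral centroid and spectral flux using only NumPy. It is not the procedure used by OpenSMILE or Praat. The function names, the Hann window, the frame length of 1024 samples, the hop of 512 samples, and the synthetic 1000 Hz test tone are all assumptions chosen for the example.

```python
import numpy as np

def frame_signal(signal, frame_len, hop):
    """Split a 1-D signal into overlapping frames (one frame per row)."""
    n_frames = 1 + (len(signal) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return signal[idx]

def magnitude_spectra(signal, frame_len=1024, hop=512):
    """Magnitude spectrum of each Hann-windowed frame."""
    frames = frame_signal(signal, frame_len, hop) * np.hanning(frame_len)
    return np.abs(np.fft.rfft(frames, axis=1))

def spectral_centroid(mags, sample_rate, frame_len):
    """Magnitude-weighted mean frequency (in Hz) of each frame."""
    freqs = np.fft.rfftfreq(frame_len, d=1.0 / sample_rate)
    total = mags.sum(axis=1)
    total = np.where(total == 0, 1.0, total)  # avoid dividing by zero on silence
    return (mags * freqs).sum(axis=1) / total

def spectral_flux(mags):
    """Euclidean distance between the spectra of consecutive frames."""
    diff = np.diff(mags, axis=0)
    return np.sqrt((diff ** 2).sum(axis=1))

# Assumed test input: one second of a pure 1000 Hz tone sampled at 16 kHz.
sample_rate = 16000
t = np.arange(sample_rate) / sample_rate
tone = np.sin(2 * np.pi * 1000 * t)

mags = magnitude_spectra(tone)
centroid = spectral_centroid(mags, sample_rate, 1024)
flux = spectral_flux(mags)
```

For a steady tone, the centroid stays close to the tone's frequency and the flux stays near zero. Speech with changing pitch and energy would produce varying values in both features.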
 
=== Visual Features ===
 
One of the main advantages of analyzing videos rather than text alone is the presence of rich sentiment cues in visual data.<ref>{{cite journal |last1=Poria |first1=Soujanya |last2=Cambria |first2=Erik |last3=Hazarika |first3=Devamanyu |last4=Majumder |first4=Navonil |last5=Zadeh |first5=Amir |last6=Morency |first6=Louis-Philippe |title=Context-Dependent Sentiment Analysis in User-Generated Videos |journal=Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) |pages=873 |date=2017 |doi=10.18653/v1/p17-1081 |url=https://doi.org/10.18653/v1/P17-1081 |publisher=Association for Computational Linguistics}}</ref> Visual features include [[facial expression]]s, which are of paramount importance in capturing sentiments and [[emotion]]s, as they are a main channel for conveying a person's present state of mind.<ref name="s1" /> Specifically, the [[smile]] is considered to be one of the most predictive visual cues in multimodal sentiment analysis.<ref name="s2" /> OpenFace is an open-source facial analysis toolkit available for extracting and understanding such visual features.<ref>{{cite journal |title=OpenFace: An open source facial behavior analysis toolkit |journal=Ieeexplore.ieee.org |url=https://ieeexplore.ieee.org/document/7477553/}}</ref>
 
== Fusion Techniques ==
Line 31:
=== Hybrid Fusion ===
 
Hybrid fusion combines feature-level and decision-level fusion techniques, exploiting complementary information from both methods during the classification process.<ref name="s4" /> It usually involves a two-step procedure: feature-level fusion is first performed between two modalities, and decision-level fusion is then applied to combine the result of that feature-level fusion with the remaining [[Modality (human–computer interaction)|modality]].<ref>{{cite journal |last1=Shahla |first1=Shahla |last2=Naghsh-Nilchi |first2=Ahmad Reza |title=Exploiting evidential theory in the fusion of textual, audio, and visual modalities for affective music video retrieval |journal=Ieeexplore.ieee.org |date=2017 |url=https://ieeexplore.ieee.org/abstract/document/7983051/}}</ref><ref>{{cite journal |last1=Poria |first1=Soujanya |last2=Peng |first2=Haiyun |last3=Hussain |first3=Amir |last4=Howard |first4=Newton |last5=Cambria |first5=Erik |title=Ensemble application of convolutional neural networks and multiple kernel learning for multimodal sentiment analysis |journal=Neurocomputing |date=October 2017 |volume=261 |pages=217–230 |doi=10.1016/j.neucom.2016.09.117}}</ref>
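
The two-step procedure described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the method of the cited works: the choice of audio and visual as the two modalities fused at the feature level, the logistic-regression classifiers, the weighted average used for decision-level fusion, the <code>weight_av</code> parameter, and the synthetic data are all hypothetical.

```python
import numpy as np

def train_logreg(X, y, lr=0.1, epochs=500):
    """Fit a logistic-regression classifier by batch gradient descent."""
    Xb = np.hstack([X, np.ones((len(X), 1))])  # append a bias column
    w = np.zeros(Xb.shape[1])
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-Xb @ w))
        w -= lr * Xb.T @ (p - y) / len(y)
    return w

def predict_proba(w, X):
    """Probability of the positive class for each row of X."""
    Xb = np.hstack([X, np.ones((len(X), 1))])
    return 1.0 / (1.0 + np.exp(-Xb @ w))

def hybrid_fusion(w_av, w_text, audio, visual, text, weight_av=0.5):
    # Step 1 (feature-level): concatenate audio and visual features
    # and score them with a single classifier.
    p_av = predict_proba(w_av, np.hstack([audio, visual]))
    # Step 2 (decision-level): combine that score with the score of the
    # remaining modality (text) by a weighted average (an assumed rule).
    p_text = predict_proba(w_text, text)
    return weight_av * p_av + (1.0 - weight_av) * p_text

# Synthetic data (assumption): each modality is a noisy view of the label.
rng = np.random.default_rng(0)
n = 200
y = rng.integers(0, 2, size=n).astype(float)
signal = (2 * y - 1)[:, None]
audio = signal + rng.normal(size=(n, 4))
visual = signal + rng.normal(size=(n, 3))
text = signal + rng.normal(size=(n, 5))

w_av = train_logreg(np.hstack([audio, visual]), y)
w_text = train_logreg(text, y)
fused = hybrid_fusion(w_av, w_text, audio, visual, text)
accuracy = ((fused >= 0.5) == (y == 1)).mean()
```

In practice the decision-level step could use other combination rules, such as voting or a learned meta-classifier, and the accuracy would be measured on held-out data rather than on the training samples used here.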
 
== Applications ==