=== Audio Features ===
[[Feeling|Sentiment]] and [[emotion]] characteristics are prominent in different [[phonetic]] and [[prosodic]] properties contained in audio features.<ref>{{cite journal |last1=Chung-Hsien Wu |last2=Wei-Bin Liang |title=Emotion Recognition of Affective Speech Based on Multiple Classifiers Using Acoustic-Prosodic Information and Semantic Labels |journal=IEEE Transactions on Affective Computing |date=January 2011 |volume=2 |issue=1 |pages=10–21 |doi=10.1109/T-AFFC.2010.16}}</ref> Some of the most important audio features employed in multimodal sentiment analysis are [[mel-frequency cepstrum|mel-frequency cepstral coefficients (MFCC)]], [[spectral centroid]], [[spectral flux]], [[beat]]{{disambiguation needed|date=June 2018}} histogram, beat sum, strongest beat, pause duration, and [[pitch accent|pitch]].<ref name="s1" /> [[OpenSMILE]]<ref>{{cite journal |last1=Eyben |first1=Florian |last2=Wöllmer |first2=Martin |last3=Schuller |first3=Björn |title=OpenEAR — Introducing the munich open-source emotion and affect recognition toolkit}}</ref> is an open-source toolkit that can automatically extract such audio features.
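To make one of the listed features concrete: the spectral centroid is the magnitude-weighted mean frequency of an analysis frame's spectrum, so "brighter" (higher-frequency) speech yields a higher value. A minimal pure-Python sketch (the function name and the half-spectrum bin convention are illustrative assumptions, not the API of any toolkit mentioned here):

```python
def spectral_centroid(magnitudes, sample_rate):
    """Spectral centroid of one analysis frame.

    `magnitudes` is the magnitude spectrum of the frame, with bins
    assumed to span 0 .. sample_rate/2 (the Nyquist frequency) evenly.
    Returns the magnitude-weighted mean of the bin frequencies in Hz.
    """
    n = len(magnitudes)
    # Frequency of each bin under the 0..Nyquist half-spectrum assumption.
    freqs = [k * (sample_rate / 2) / (n - 1) for k in range(n)]
    total = sum(magnitudes)
    if total == 0:
        return 0.0  # silent frame: no meaningful centroid
    return sum(f * m for f, m in zip(freqs, magnitudes)) / total
```

For example, a frame whose energy sits entirely in the middle bin of a 5-bin spectrum at an 8 kHz sample rate has its centroid at 2000 Hz, the frequency of that bin.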
=== Visual Features ===
One of the main advantages of analyzing videos rather than text alone is the presence of rich sentiment cues in visual data.<ref>{{cite journal |last1=Poria |first1=Soujanya |last2=Cambria |first2=Erik |last3=Hazarika |first3=Devamanyu |last4=Majumder |first4=Navonil |last5=Zadeh |first5=Amir |last6=Morency |first6=Louis-Philippe |title=Context-Dependent Sentiment Analysis in User-Generated Videos |journal=Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) |pages=873 |date=2017 |doi=10.18653/v1/p17-1081 |url=https://doi.org/10.18653/v1/P17-1081 |publisher=Association for Computational Linguistics}}</ref> Visual features include [[facial expression]]s, which are of paramount importance in capturing sentiments and [[emotion]]s, as they are a main channel for conveying a person's present state of mind.<ref name="s1" /> In particular, the [[smile]] is considered one of the most predictive visual cues in multimodal sentiment analysis.<ref name="s2" /> OpenFace is an open-source facial analysis toolkit available for extracting and understanding such visual features.<ref>{{cite journal |title=OpenFace: An open source facial behavior analysis toolkit}}</ref>
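As an illustration of how a visual cue such as a smile can be turned into a numeric feature, the sketch below scores mouth width against mouth opening from 2D facial landmarks. This is a crude geometric proxy, not OpenFace's actual method, and the landmark key names are hypothetical:

```python
import math

def smile_score(landmarks):
    """Crude smile proxy from 2D mouth landmarks.

    `landmarks` maps hypothetical keys ("mouth_left", "mouth_right",
    "lip_top", "lip_bottom") to (x, y) points. A wider mouth relative
    to its opening height scores higher; this is only an illustrative
    stand-in for a learned smile detector.
    """
    width = math.dist(landmarks["mouth_left"], landmarks["mouth_right"])
    height = math.dist(landmarks["lip_top"], landmarks["lip_bottom"])
    return width / max(height, 1e-6)  # avoid division by zero
```

Per-frame scores like this can then be aggregated over an utterance (e.g. averaged) before being fed to a sentiment classifier.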
== Fusion Techniques ==
=== Hybrid Fusion ===
Hybrid fusion is a combination of feature-level and decision-level fusion techniques, which exploits complementary information from both methods during the classification process.<ref name="s4" /> It usually involves a two-step procedure: feature-level fusion is first performed between two modalities, and decision-level fusion is then applied to combine the result with the remaining [[Modality (human–computer interaction)|modality]].<ref>{{cite journal |last1=Shahla |first1=Shahla |last2=Naghsh-Nilchi |first2=Ahmad Reza |title=Exploiting evidential theory in the fusion of textual, audio, and visual modalities for affective music video retrieval}}</ref>
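The two-step procedure can be sketched as follows. Here concatenation stands in for feature-level fusion and a weighted average of scores stands in for decision-level fusion; the clipped-mean "classifier", function name, and weighting are illustrative assumptions, not a method from the cited work:

```python
def hybrid_fusion(text_feats, audio_feats, visual_decision, weight=0.5):
    """Illustrative hybrid fusion of three modalities.

    Step 1 fuses text and audio at the feature level; step 2 fuses the
    resulting decision with the remaining (visual) modality's decision.
    All decisions are sentiment scores in [0, 1].
    """
    # Step 1: feature-level fusion -- concatenate two modalities' feature vectors.
    fused = list(text_feats) + list(audio_feats)
    # Stand-in classifier on the fused vector: clipped mean as a sentiment score.
    fused_decision = max(0.0, min(1.0, sum(fused) / len(fused)))
    # Step 2: decision-level fusion -- weighted average with the visual decision.
    return weight * fused_decision + (1.0 - weight) * visual_decision
```

In practice the stand-in classifier would be a trained model, and the decision-level step could use any combination rule (voting, weighting, or evidential combination as in the cited retrieval work).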
== Applications ==