Revision as of 23:27, 20 June 2018 by Citation bot (minor): Alter: journal. Add: chapter, pages, isbn.
Revision as of 23:28, 20 June 2018 by Headbomb: not the journal
Line 11:

=== Audio Features ===

[[Feeling|Sentiment]] and [[emotion]] are reflected in the [[phonetic]] and [[prosodic]] properties captured by audio features.<ref>{{cite journal |last1=Chung-Hsien Wu |last2=Wei-Bin Liang |title=Emotion Recognition of Affective Speech Based on Multiple Classifiers Using Acoustic-Prosodic Information and Semantic Labels |journal=IEEE Transactions on Affective Computing |date=January 2011 |volume=2 |issue=1 |pages=10–21 |doi=10.1109/T-AFFC.2010.16}}</ref> Some of the most important audio features employed in multimodal sentiment analysis are [[mel-frequency cepstrum|mel-frequency cepstral coefficients (MFCC)]], [[spectral centroid]], [[spectral flux]], [[beat]]{{disambiguation needed|date=June 2018}} histogram, beat sum, strongest beat, pause duration, and [[pitch accent|pitch]].<ref name="s1" /> [[OpenSMILE]]<ref>{{cite journal |last1=Eyben |first1=Florian |last2=Wöllmer |first2=Martin |last3=Schuller |first3=Björn |title=OpenEAR — Introducing the munich open-source emotion and affect recognition toolkit - IEEE Conference Publication |journal= |pages=1 |date=2009 |doi=10.1109/ACII.2009.5349350 |url=http://ieeexplore.ieee.org/document/5349350 |isbn=978-1-4244-4800-5 }}</ref> and [[Praat]] are popular open-source toolkits for extracting such audio features.<ref>{{cite journal |last1=Morency |first1=Louis-Philippe |last2=Mihalcea |first2=Rada |last3=Doshi |first3=Payal |title=Towards multimodal sentiment analysis: harvesting opinions from the web |date=14 November 2011 |pages=169–176 |doi=10.1145/2070481.2070509 |url=https://dl.acm.org/citation.cfm?id=2070509 |publisher=ACM |chapter=Towards multimodal sentiment analysis |isbn=9781450306416 }}</ref>

=== Visual Features ===

One of the main advantages of analyzing videos rather than text alone is the presence of rich sentiment cues in visual data.<ref>{{cite journal |last1=Poria |first1=Soujanya |last2=Cambria |first2=Erik |last3=Hazarika |first3=Devamanyu |last4=Majumder |first4=Navonil |last5=Zadeh |first5=Amir |last6=Morency |first6=Louis-Philippe |title=Context-Dependent Sentiment Analysis in User-Generated Videos |journal=Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) |pages=873 |date=2017 |doi=10.18653/v1/p17-1081 |url=https://doi.org/10.18653/v1/P17-1081 |publisher=Association for Computational Linguistics}}</ref> Visual features include [[facial expression]]s, which are of paramount importance in capturing sentiments and [[emotion]]s, as they are a main channel for expressing a person's present state of mind.<ref name="s1" /> Specifically, the [[smile]] is considered to be one of the most predictive visual cues in multimodal sentiment analysis.<ref name="s2" /> OpenFace is an open-source facial analysis toolkit available for extracting and understanding such visual features.<ref>{{cite journal |title=OpenFace: An open source facial behavior analysis toolkit - IEEE Conference Publication |journal= |url=https://ieeexplore.ieee.org/document/7477553/}}</ref>

== Fusion Techniques ==

Line 31:

=== Hybrid Fusion ===

Hybrid fusion is a combination of feature-level and decision-level fusion techniques that exploits complementary information from both methods during the classification process.<ref name="s4" /> It usually involves a two-step procedure: feature-level fusion is first performed between two modalities, and decision-level fusion is then applied to fuse the results of the feature-level fusion with the remaining [[Modality (human–computer interaction)|modality]].<ref>{{cite journal |last1=Shahla |first1=Shahla |last2=Naghsh-Nilchi |first2=Ahmad Reza |title=Exploiting evidential theory in the fusion of textual, audio, and visual modalities for affective music video retrieval - IEEE Conference Publication |journal= |date=2017 |url=https://ieeexplore.ieee.org/abstract/document/7983051/}}</ref><ref>{{cite journal |last1=Poria |first1=Soujanya |last2=Peng |first2=Haiyun |last3=Hussain |first3=Amir |last4=Howard |first4=Newton |last5=Cambria |first5=Erik |title=Ensemble application of convolutional neural networks and multiple kernel learning for multimodal sentiment analysis |journal=Neurocomputing |date=October 2017 |volume=261 |pages=217–230 |doi=10.1016/j.neucom.2016.09.117}}</ref>

== Applications ==

Multimodal sentiment analysis: Difference between revisions