'''Multimodal sentiment analysis''' extends traditional text-based sentiment analysis beyond the analysis of text to include other [[Modality (human–computer interaction)|modalities]] such as audio and visual data.<ref>{{cite journal |last1=Soleymani |first1=Mohammad |last2=Garcia |first2=David |last3=Jou |first3=Brendan |last4=Schuller |first4=Björn |last5=Chang |first5=Shih-Fu |last6=Pantic |first6=Maja |title=A survey of multimodal sentiment analysis |journal=Image and Vision Computing |date=September 2017 |volume=65 |pages=3–14 |doi=10.1016/j.imavis.2017.08.003}}</ref> It can be bimodal, using different combinations of two modalities, or trimodal, incorporating all three.<ref>{{cite journal |last1=Karray |first1=Fakhreddine |last2=Milad |first2=Alemzadeh |last3=Saleh |first3=Jamil Abou |last4=Mo Nours |first4=Arab |title=Human-Computer Interaction: Overview on State of the Art |journal=International Journal on Smart Sensing and Intelligent Systems |date=2008 |url=http://s2is.org/Issues/v1/n1/papers/paper9.pdf}}</ref> With the extensive amount of social media data available online in forms such as videos and images, conventional text-based sentiment analysis has evolved into more complex models of multimodal sentiment analysis,<ref name="s1">{{cite journal |last1=Poria |first1=Soujanya |last2=Cambria |first2=Erik |last3=Bajpai |first3=Rajiv |last4=Hussain |first4=Amir |title=A review of affective computing: From unimodal analysis to multimodal fusion |journal=Information Fusion |date=September 2017 |volume=37 |pages=98–125 |doi=10.1016/j.inffus.2017.02.003}}</ref> which can be applied to the development of [[virtual assistant]]s,<ref name="s5">{{cite web |title=Google AI to make phone calls for you |url=https://www.bbc.com/news/technology-44045424 |website=BBC News |accessdate=12 June 2018 |date=8 May 2018}}</ref> the analysis of YouTube movie reviews,<ref name="s4">{{cite journal |last1=Wollmer |first1=Martin |last2=Weninger |first2=Felix |last3=Knaup |first3=Tobias |last4=Schuller |first4=Bjorn |last5=Sun |first5=Congkai |last6=Sagae |first6=Kenji |last7=Morency |first7=Louis-Philippe |title=YouTube Movie Reviews: Sentiment Analysis in an Audio-Visual Context |journal=IEEE Intelligent Systems |date=May 2013 |volume=28 |issue=3 |pages=46–53 |doi=10.1109/MIS.2013.34}}</ref> the analysis of news videos,<ref>{{cite journal |last1=Pereira |first1=Moisés H. R. |last2=Pádua |first2=Flávio L. C. |last3=Pereira |first3=Adriano C. M. |last4=Benevenuto |first4=Fabrício |last5=Dalip |first5=Daniel H. |title=Fusing Audio, Textual and Visual Features for Sentiment Analysis of News Videos |journal=arXiv:1604.02612 [cs] |date=9 April 2016 |url=http://arxiv.org/abs/1604.02612}}</ref> and [[emotion recognition]] (sometimes known as [[emotion]] detection), including [[depression]] monitoring,<ref name="s6">{{cite journal |last1=Zucco |first1=Chiara |last2=Calabrese |first2=Barbara |last3=Cannataro |first3=Mario |title=Sentiment analysis and affective computing for depression monitoring |journal=2017 IEEE International Conference on Bioinformatics and Biomedicine (BIBM) |date=November 2017 |pages=1988–1995 |doi=10.1109/bibm.2017.8217966 |url=http://doi.ieeecomputersociety.org/10.1109/BIBM.2017.8217966 |publisher=IEEE |language=English}}</ref> among others.
 
As in traditional sentiment analysis, one of the most basic tasks in multimodal sentiment analysis is [[sentiment]] classification, which classifies sentiments as positive, negative, or neutral.<ref>{{cite book |last1=Pang |first1=Bo |last2=Lee |first2=Lillian |title=Opinion mining and sentiment analysis |date=2008 |publisher=Now Publishers |___location=Hanover, MA |isbn=1601981503}}</ref> The complexity of analyzing text, audio, and visual features to perform such a task requires the application of different fusion techniques, such as feature-level, decision-level, and hybrid fusion.<ref name="s1" /> The performance of these fusion techniques and of the classification algorithms applied is influenced by the type of textual, audio, and visual features employed in the analysis.<ref name="s7" />
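
As a minimal illustration of this three-way classification, the sketch below maps a continuous polarity score onto a positive, negative, or neutral label; the score range and the width of the neutral band are illustrative assumptions rather than values taken from the cited works.

<syntaxhighlight lang="python">
def label_from_polarity(score: float, neutral_band: float = 0.1) -> str:
    """Map a polarity score in [-1, 1] to a three-class sentiment label.

    The width of the neutral band is an illustrative assumption, not a
    value prescribed by the literature cited in this article.
    """
    if score > neutral_band:
        return "positive"
    if score < -neutral_band:
        return "negative"
    return "neutral"


print(label_from_polarity(0.6))    # positive
print(label_from_polarity(-0.4))   # negative
print(label_from_polarity(0.05))   # neutral
</syntaxhighlight>
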
== Fusion techniques ==
 
Unlike traditional text-based sentiment analysis, multimodal sentiment analysis involves a fusion process in which data from the different modalities (text, audio, or visual) are combined and analyzed together.<ref name="s1" /> Existing approaches to [[data fusion]] in multimodal sentiment analysis can be grouped into three main categories: feature-level, decision-level, and hybrid fusion. The performance of the sentiment classification depends on which type of fusion technique is employed.<ref name="s1" />
 
=== Feature-level fusion ===
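
In feature-level (early) fusion, the features extracted from each modality are concatenated into a single feature vector that is fed to one classification algorithm. The following is a minimal sketch of this idea; the feature dimensions, the synthetic data, and the choice of logistic regression are illustrative assumptions rather than details of any cited system.

<syntaxhighlight lang="python">
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 200  # number of utterances (synthetic data for illustration)

# Hypothetical per-modality feature matrices; the dimensions are arbitrary.
text_feats = rng.normal(size=(n, 300))   # e.g. averaged word embeddings
audio_feats = rng.normal(size=(n, 74))   # e.g. prosodic/acoustic descriptors
video_feats = rng.normal(size=(n, 35))   # e.g. facial-expression descriptors
labels = rng.integers(0, 3, size=n)      # 0 = negative, 1 = neutral, 2 = positive

# Feature-level (early) fusion: concatenate the modalities into one vector per sample.
fused = np.hstack([text_feats, audio_feats, video_feats])

# A single classifier operates on the fused representation.
clf = LogisticRegression(max_iter=1000).fit(fused, labels)
print(clf.predict(fused[:5]))
</syntaxhighlight>
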
=== Decision-level fusion ===
 
Decision-level fusion (sometimes known as late fusion) feeds data from each modality (text, audio, or visual) independently into its own classification algorithm, and obtains the final sentiment classification by fusing the per-modality results into a single decision vector.<ref name="s3" /> One of the advantages of this fusion technique is that it eliminates the need to fuse heterogeneous data, and each modality can use its most appropriate classification algorithm.<ref name="s1" />
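
A minimal sketch of this late-fusion scheme is shown below: each modality is classified separately, and the per-modality class probabilities are averaged into a single decision vector. The synthetic features, the particular classifiers, and probability averaging as the fusion rule are illustrative assumptions.

<syntaxhighlight lang="python">
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC

rng = np.random.default_rng(0)
n = 200  # number of utterances (synthetic data for illustration)
labels = rng.integers(0, 3, size=n)  # 0 = negative, 1 = neutral, 2 = positive

# Hypothetical per-modality feature matrices; the dimensions are arbitrary.
modalities = {
    "text": rng.normal(size=(n, 300)),
    "audio": rng.normal(size=(n, 74)),
    "video": rng.normal(size=(n, 35)),
}

# Each modality uses its own classifier (the specific choices are illustrative).
classifiers = {
    "text": LogisticRegression(max_iter=1000),
    "audio": SVC(probability=True),
    "video": LogisticRegression(max_iter=1000),
}

# Decision-level (late) fusion: average the per-modality class probabilities
# into a single decision vector and pick the most probable class.
probabilities = []
for name, feats in modalities.items():
    classifiers[name].fit(feats, labels)
    probabilities.append(classifiers[name].predict_proba(feats))

decision_vector = np.mean(probabilities, axis=0)  # shape (n, 3)
final_labels = np.argmax(decision_vector, axis=1)
print(final_labels[:5])
</syntaxhighlight>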
 
=== Hybrid fusion ===
 
Hybrid fusion combines feature-level and decision-level fusion, exploiting complementary information from both methods during the classification process.<ref name="s4" /> It usually involves a two-step procedure: feature-level fusion is first performed between two modalities, and decision-level fusion is then applied to fuse the result with the remaining modality.<ref>{{cite conference |last1=Shahla |first1=Shahla |last2=Naghsh-Nilchi |first2=Ahmad Reza |title=Exploiting evidential theory in the fusion of textual, audio, and visual modalities for affective music video retrieval |date=2017 |publisher=IEEE |url=https://ieeexplore.ieee.org/abstract/document/7983051/}}</ref><ref>{{cite journal |last1=Poria |first1=Soujanya |last2=Peng |first2=Haiyun |last3=Hussain |first3=Amir |last4=Howard |first4=Newton |last5=Cambria |first5=Erik |title=Ensemble application of convolutional neural networks and multiple kernel learning for multimodal sentiment analysis |journal=Neurocomputing |date=October 2017 |volume=261 |pages=217–230 |doi=10.1016/j.neucom.2016.09.117}}</ref>
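
A minimal sketch of this two-step procedure is given below: feature-level fusion is first applied to two modalities (here text and audio), and decision-level fusion then combines that result with the remaining modality (video). The synthetic features, the classifiers, and probability averaging as the second-step fusion rule are illustrative assumptions.

<syntaxhighlight lang="python">
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 200  # number of utterances (synthetic data for illustration)
labels = rng.integers(0, 3, size=n)  # 0 = negative, 1 = neutral, 2 = positive

# Hypothetical per-modality feature matrices; the dimensions are arbitrary.
text_feats = rng.normal(size=(n, 300))
audio_feats = rng.normal(size=(n, 74))
video_feats = rng.normal(size=(n, 35))

# Step 1: feature-level fusion of two modalities (text + audio).
text_audio = np.hstack([text_feats, audio_feats])
clf_text_audio = LogisticRegression(max_iter=1000).fit(text_audio, labels)

# A separate classifier handles the remaining modality (video).
clf_video = LogisticRegression(max_iter=1000).fit(video_feats, labels)

# Step 2: decision-level fusion of the two sets of class probabilities.
decision_vector = (clf_text_audio.predict_proba(text_audio)
                   + clf_video.predict_proba(video_feats)) / 2
print(np.argmax(decision_vector, axis=1)[:5])
</syntaxhighlight>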
 
== Applications ==