'''Multimodal sentiment analysis''' is an extension of traditional text-based [[sentiment analysis]] that goes beyond the analysis of text and includes other [[Modality (human–computer interaction)\|modalities]] such as audio and visual data.<ref>{{cite journal \|last1=Soleymani \|first1=Mohammad \|last2=Garcia \|first2=David \|last3=Jou \|first3=Brendan \|last4=Schuller \|first4=Björn \|last5=Chang \|first5=Shih-Fu \|last6=Pantic \|first6=Maja \|title=A survey of multimodal sentiment analysis \|journal=Image and Vision Computing \|date=September 2017 \|volume=65 \|pages=3–14 \|doi=10.1016/j.imavis.2017.08.003\|url=https://zenodo.org/record/3449163 }}</ref> It can be bimodal, combining two modalities, or trimodal, combining three.<ref>{{cite journal \|last1=Karray \|first1=Fakhreddine \|last2=Milad \|first2=Alemzadeh \|last3=Saleh \|first3=Jamil Abou \|last4=Mo Nours \|first4=Arab \|title=Human-Computer Interaction: Overview on State of the Art \|journal=International Journal on Smart Sensing and Intelligent Systems \|volume=1 \|pages=137–159 \|date=2008 \|url=http://s2is.org/Issues/v1/n1/papers/paper9.pdf\|doi=10.21307/ijssis-2017-283 }}</ref> With the extensive amount of [[social media]] data available online in different forms such as videos and images, conventional text-based [[sentiment analysis]] has evolved into more complex models of multimodal sentiment analysis,<ref name="s1">{{cite journal \|last1=Poria \|first1=Soujanya \|last2=Cambria \|first2=Erik \|last3=Bajpai \|first3=Rajiv \|last4=Hussain \|first4=Amir \|title=A review of affective computing: From unimodal analysis to multimodal fusion \|journal=Information Fusion \|date=September 2017 \|volume=37 \|pages=98–125 \|doi=10.1016/j.inffus.2017.02.003\|hdl=1893/25490 \|hdl-access=free }}</ref> which can be applied in the development of [[virtual assistant]]s,<ref name ="s5">{{cite web 
\|title=Google AI to make phone calls for you \|url=https://www.bbc.com/news/technology-44045424 \|website=BBC News \|access-date=12 June 2018 \|date=8 May 2018}}</ref> [[Social media analytics\|analysis]] of YouTube movie reviews,<ref name="s4">{{cite journal \|last1=Wollmer \|first1=Martin \|last2=Weninger \|first2=Felix \|last3=Knaup \|first3=Tobias \|last4=Schuller \|first4=Bjorn \|last5=Sun \|first5=Congkai \|last6=Sagae \|first6=Kenji \|last7=Morency \|first7=Louis-Philippe \|title=YouTube Movie Reviews: Sentiment Analysis in an Audio-Visual Context \|journal=IEEE Intelligent Systems \|date=May 2013 \|volume=28 \|issue=3 \|pages=46–53 \|doi=10.1109/MIS.2013.34\|s2cid=12789201 }}</ref> [[Social media analytics\|analysis]] of news videos,<ref>{{cite arxiv\|last1=Pereira \|first1=Moisés H. R. \|last2=Pádua \|first2=Flávio L. C. \|last3=Pereira \|first3=Adriano C. M. \|last4=Benevenuto \|first4=Fabrício \|last5=Dalip \|first5=Daniel H. \|title=Fusing Audio, Textual and Visual Features for Sentiment Analysis of News Videos\|date=9 April 2016 \|eprint=1604.02612\|class=cs.CL }}</ref> and [[emotion recognition]] (sometimes known as [[emotion]] detection) such as [[depression (mood)\|depression]] monitoring,<ref name = "s6">{{cite book \|last1=Zucco \|first1=Chiara \|last2=Calabrese \|first2=Barbara \|last3=Cannataro \|first3=Mario \|title=Sentiment analysis and affective computing for depression monitoring \|journal=2017 IEEE International Conference on Bioinformatics and Biomedicine (BIBM) \|date=November 2017 \|pages=1988–1995 \|doi=10.1109/bibm.2017.8217966 \|publisher=IEEE \|language=en\|isbn=978-1-5090-3050-7 \|s2cid=24408937 }}</ref> among others. 
As in traditional [[sentiment analysis]], one of the most basic tasks in multimodal sentiment analysis is [[Feeling\|sentiment]] classification, which classifies different sentiments into categories such as positive, negative, or neutral.<ref>{{cite book \|last1=Pang \|first1=Bo \|last2=Lee \|first2=Lillian \|title=Opinion mining and sentiment analysis \|date=2008 \|publisher=Now Publishers \|location=Hanover, MA \|isbn=978-1601981509}}</ref> The complexity of [[Social media analytics\|analyzing]] text, audio, and visual features to perform such a task requires the application of different fusion techniques, such as feature-level, decision-level, and hybrid fusion.<ref name="s1" /> The performance of these fusion techniques and of the [[classification]] [[algorithm]]s applied is influenced by the type of textual, audio, and visual features employed in the analysis.<ref name = "s7" />

=== Visual features ===
One of the main advantages of analyzing videos rather than text alone is the presence of rich sentiment cues in visual data.<ref>{{cite journal \|last1=Poria \|first1=Soujanya \|last2=Cambria \|first2=Erik \|last3=Hazarika \|first3=Devamanyu \|last4=Majumder \|first4=Navonil \|last5=Zadeh \|first5=Amir \|last6=Morency \|first6=Louis-Philippe \|title=Context-Dependent Sentiment Analysis in User-Generated Videos \|journal=Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) \|pages=873–883 \|date=2017 \|doi=10.18653/v1/p17-1081 \|doi-access=free }}</ref> Visual features include [[facial expression]]s, which are of paramount importance in capturing sentiments and [[emotion]]s, as they are a main channel for conveying a person's present state of mind.<ref name="s1" /> In particular, the [[smile]] is considered to be one of the most predictive visual cues in multimodal sentiment analysis.<ref name="s2" /> OpenFace is an open-source facial analysis toolkit available for extracting and 
understanding such visual features.<ref>{{cite journal \|title=OpenFace: An open source facial behavior analysis toolkit - IEEE Conference Publication \|doi= 10.1109/WACV.2016.7477553\|s2cid= 1919851\|url= https://www.repository.cam.ac.uk/handle/1810/280724}}</ref>

== Fusion techniques ==

=== Hybrid fusion ===
Hybrid fusion is a combination of feature-level and decision-level fusion techniques, which exploits complementary information from both methods during the classification process.<ref name="s4" /> It usually involves a two-step procedure: feature-level fusion is first performed between two modalities, and decision-level fusion is then applied to fuse the results of the feature-level fusion with the remaining [[Modality (human–computer interaction)\|modality]].<ref>{{cite journal \|last1=Shahla \|first1=Shahla \|last2=Naghsh-Nilchi \|first2=Ahmad Reza \|title=Exploiting evidential theory in the fusion of textual, audio, and visual modalities for affective music video retrieval - IEEE Conference Publication \|date=2017 \|doi=10.1109/PRIA.2017.7983051 \|s2cid=24466718 }}</ref><ref>{{cite journal \|last1=Poria \|first1=Soujanya \|last2=Peng \|first2=Haiyun \|last3=Hussain \|first3=Amir \|last4=Howard \|first4=Newton \|last5=Cambria \|first5=Erik \|title=Ensemble application of convolutional neural networks and multiple kernel learning for multimodal sentiment analysis \|journal=Neurocomputing \|date=October 2017 \|volume=261 \|pages=217–230 \|doi=10.1016/j.neucom.2016.09.117}}</ref>

== Applications ==
Similar to text-based sentiment analysis, multimodal sentiment analysis can be applied in the development of different forms of [[recommender system]]s, such as in the analysis of user-generated videos of movie reviews<ref name="s4" /> and general product reviews,<ref>{{cite journal \|last1=Pérez-Rosas \|first1=Verónica \|last2=Mihalcea \|first2=Rada \|last3=Morency \|first3=Louis Philippe 
\|title=Utterance-level multimodal sentiment analysis \|journal=Long Papers \|date=1 January 2013 \|url=https://experts.umich.edu/en/publications/utterance-level-multimodal-sentiment-analysis \|publisher=Association for Computational Linguistics (ACL)}}</ref> to predict the sentiments of customers and subsequently create product or service recommendations.<ref>{{cite web \|last1=Chui \|first1=Michael \|last2=Manyika \|first2=James \|last3=Miremadi \|first3=Mehdi \|last4=Henke \|first4=Nicolaus \|last5=Chung \|first5=Rita \|last6=Nel \|first6=Pieter \|last7=Malhotra \|first7=Sankalp \|title=Notes from the AI frontier. Insights from hundreds of use cases \|url=https://www.mckinsey.com/mgi/ \|website=McKinsey & Company \|publisher=McKinsey & Company \|access-date=13 June 2018 \|language=en}}</ref> Multimodal sentiment analysis also plays an important role in the advancement of [[virtual assistant]]s through the application of [[natural language processing]] (NLP) and [[machine learning]] techniques.<ref name ="s5" /> In the healthcare domain, multimodal sentiment analysis can be utilized to detect certain medical conditions such as [[Psychological stress\|stress]], [[anxiety]], or [[Depression (mood)\|depression]].<ref name = "s6" /> Multimodal sentiment analysis can also be applied in understanding the sentiments contained in video news programs, which is considered a complicated and challenging domain, as sentiments expressed by reporters tend to be less obvious or neutral.<ref>{{cite book\|last1=Ellis \|first1=Joseph G. \|last2=Jou \|first2=Brendan \|last3=Chang \|first3=Shih-Fu \|title=Why We Watch the News: A Dataset for Exploring Sentiment in Broadcast Video News \|date=12 November 2014 \|pages=104–111 \|doi=10.1145/2663204.2663237 \|publisher=ACM\|chapter=Why We Watch the News \|isbn=9781450328852 \|s2cid=14112246 }}</ref>

==References==
