Multimodal sentiment analysis
{{Short description|Technology for sentiment analysis}}
'''Multimodal [[sentiment analysis]]''' is a technology for traditional text-based [[sentiment analysis]], which goes beyond the analysis of texts, and includes other [[Modality (human–computer interaction)|modalities]] such as audio and visual data.<ref>{{cite journal |last1=Soleymani |first1=Mohammad |last2=Garcia |first2=David |last3=Jou |first3=Brendan |last4=Schuller |first4=Björn |last5=Chang |first5=Shih-Fu |last6=Pantic |first6=Maja |title=A survey of multimodal sentiment analysis |journal=Image and Vision Computing |date=September 2017 |volume=65 |pages=3–14 |doi=10.1016/j.imavis.2017.08.003 |s2cid=19491070 |url=https://zenodo.org/record/3449163}}</ref> It can be bimodal, which includes different combinations of two [[Modality (human–computer interaction)|modalities]], or trimodal, which incorporates three [[Modality (human–computer interaction)|modalities]].<ref>{{cite journal |last1=Karray |first1=Fakhreddine |last2=Milad |first2=Alemzadeh |last3=Saleh |first3=Jamil Abou |last4=Mo Nours |first4=Arab |title=Human-Computer Interaction: Overview on State of the Art |journal=International Journal on Smart Sensing and Intelligent Systems |volume=1 |pages=137–159 |date=2008 |url=http://s2is.org/Issues/v1/n1/papers/paper9.pdf |doi=10.21307/ijssis-2017-283 |doi-access=free}}</ref> With the extensive amount of [[social media]] data available online in different forms such as videos and images, the conventional text-based [[sentiment analysis]] has evolved into more complex models of multimodal [[sentiment analysis]],<ref name="s1">{{cite journal |last1=Poria |first1=Soujanya |last2=Cambria |first2=Erik |last3=Bajpai |first3=Rajiv |last4=Hussain |first4=Amir |title=A review of affective computing: From unimodal analysis to multimodal fusion |journal=Information Fusion |date=September 2017 |volume=37 |pages=98–125 |doi=10.1016/j.inffus.2017.02.003 |hdl=1893/25490 |s2cid=205433041 |url=http://researchrepository.napier.ac.uk/Output/1792429 |hdl-access=free}}</ref><ref>{{cite arXiv |last1=Nguyen |first1=Quy Hoang |title=New Benchmark Dataset and Fine-Grained Cross-Modal Fusion Framework for Vietnamese Multimodal Aspect-Category Sentiment Analysis |date=2024-05-01 |eprint=2405.00543 |last2=Nguyen |first2=Minh-Van Truong |last3=Van Nguyen |first3=Kiet |class=cs.CL}}</ref> which can be applied in the development of [[virtual assistant]]s,<ref name="s5">{{cite web |title=Google AI to make phone calls for you |url=https://www.bbc.com/news/technology-44045424 |website=BBC News |access-date=12 June 2018 |date=8 May 2018}}</ref> [[Social media analytics|analysis]] of YouTube movie reviews,<ref name="s4">{{cite journal |last1=Wollmer |first1=Martin |last2=Weninger |first2=Felix |last3=Knaup |first3=Tobias |last4=Schuller |first4=Bjorn |last5=Sun |first5=Congkai |last6=Sagae |first6=Kenji |last7=Morency |first7=Louis-Philippe |title=YouTube Movie Reviews: Sentiment Analysis in an Audio-Visual Context |journal=IEEE Intelligent Systems |date=May 2013 |volume=28 |issue=3 |pages=46–53 |doi=10.1109/MIS.2013.34 |s2cid=12789201 |url=https://opus.bibliothek.uni-augsburg.de/opus4/files/72633/72633.pdf}}</ref> [[Social media analytics|analysis]] of news videos,<ref>{{cite arXiv |last1=Pereira |first1=Moisés H. R. |last2=Pádua |first2=Flávio L. C. |last3=Pereira |first3=Adriano C. M. |last4=Benevenuto |first4=Fabrício |last5=Dalip |first5=Daniel H. |title=Fusing Audio, Textual and Visual Features for Sentiment Analysis of News Videos |date=9 April 2016 |eprint=1604.02612 |class=cs.CL}}</ref> and [[emotion recognition]] (sometimes known as [[emotion]] detection) such as [[depression (mood)|depression]] monitoring,<ref name="s6">{{cite book |last1=Zucco |first1=Chiara |last2=Calabrese |first2=Barbara |last3=Cannataro |first3=Mario |title=2017 IEEE International Conference on Bioinformatics and Biomedicine (BIBM) |chapter=Sentiment analysis and affective computing for depression monitoring |date=November 2017 |pages=1988–1995 |doi=10.1109/bibm.2017.8217966 |url=http://doi.ieeecomputersociety.org/10.1109/BIBM.2017.8217966 |publisher=IEEE |language=en |isbn=978-1-5090-3050-7 |s2cid=24408937}}</ref> among others.
 
Similar to the traditional [[sentiment analysis]], one of the most basic tasks in multimodal [[sentiment analysis]] is [[Feeling|sentiment]] classification, which classifies different sentiments into categories such as positive, negative, or neutral.<ref>{{cite book |last1=Pang |first1=Bo |last2=Lee |first2=Lillian |title=Opinion mining and sentiment analysis |date=2008 |publisher=Now Publishers |___location=Hanover, MA |isbn=978-1601981509}}</ref> The complexity of [[Social media analytics|analyzing]] text, audio, and visual features to perform such a task requires the application of different fusion techniques, such as feature-level, decision-level, and hybrid fusion.<ref name="s1"/> The performance of these fusion techniques and the [[classification]] [[algorithm]]s applied are influenced by the type of textual, audio, and visual features employed in the analysis.<ref name="s7"/>
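As a toy illustration of the text-only baseline that multimodal systems extend, the three-way positive/negative/neutral classification can be sketched with an invented polarity word list (the lexicon entries and example sentence here are made up for the example, not taken from the cited work):

```python
# Minimal text-only sentiment classifier over an invented polarity lexicon.
POSITIVE = {"good", "great", "love", "excellent"}
NEGATIVE = {"bad", "boring", "hate", "awful"}

def classify_text(review: str) -> str:
    """Classify a review as positive, negative, or neutral by word counts."""
    words = review.lower().replace(",", " ").split()
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

print(classify_text("A great cast and an excellent script"))  # positive
```

Real classifiers are trained on labeled corpora rather than hand-written word lists, but the output space (positive/negative/neutral) is the same.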
 
== Features ==
[[Feature engineering]], which involves the selection of features that are fed into [[machine learning]] algorithms, plays a key role in the [[sentiment]] classification performance.<ref name="s7">{{cite journal |last1=Sun |first1=Shiliang |last2=Luo |first2=Chen |last3=Chen |first3=Junyu |title=A review of natural language processing techniques for opinion mining systems |journal=Information Fusion |date=July 2017 |volume=36 |pages=10–25 |doi=10.1016/j.inffus.2016.10.004}}</ref> In multimodal [[sentiment analysis]], a combination of different textual, audio, and visual features is employed.<ref name="s1"/>
 
=== Textual features ===
Similar to the conventional text-based [[sentiment analysis]], some of the most commonly used textual features in multimodal [[sentiment analysis]] are [[n-grams|unigrams]] and [[n-gram]]s, which are sequences of words in a given textual document.<ref>{{cite journal |last1=Yadollahi |first1=Ali |last2=Shahraki |first2=Ameneh Gholipour |last3=Zaiane |first3=Osmar R. |title=Current State of Text Sentiment Analysis from Opinion to Emotion Mining |journal=ACM Computing Surveys |date=25 May 2017 |volume=50 |issue=2 |pages=1–33 |doi=10.1145/3057270 |s2cid=5275807}}</ref> These features are applied using [[bag-of-words]] or bag-of-concepts feature representations, in which words or concepts are represented as vectors in a suitable space.<ref name="s2">{{cite journal |last1=Perez Rosas |first1=Veronica |last2=Mihalcea |first2=Rada |last3=Morency |first3=Louis-Philippe |title=Multimodal Sentiment Analysis of Spanish Online Videos |journal=IEEE Intelligent Systems |date=May 2013 |volume=28 |issue=3 |pages=38–45 |doi=10.1109/MIS.2013.9 |s2cid=1132247}}</ref><ref>{{cite journal |last1=Poria |first1=Soujanya |last2=Cambria |first2=Erik |last3=Hussain |first3=Amir |last4=Huang |first4=Guang-Bin |title=Towards an intelligent framework for multimodal affective data analysis |journal=Neural Networks |date=March 2015 |volume=63 |pages=104–116 |doi=10.1016/j.neunet.2014.10.005 |pmid=25523041 |hdl=1893/21310 |s2cid=342649 |hdl-access=free}}</ref>
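A minimal sketch of these two representations, assuming a tiny hand-picked vocabulary (real systems learn the vocabulary from a corpus):

```python
from collections import Counter

def ngrams(tokens, n):
    """All contiguous n-token sequences (n=1 gives unigrams)."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def bag_of_words(text, vocabulary):
    """Count vector over a fixed vocabulary; word order is discarded."""
    counts = Counter(text.lower().replace(",", " ").split())
    return [counts[word] for word in vocabulary]

vocab = ["movie", "great", "boring"]
print(bag_of_words("Great movie, a truly great movie", vocab))  # [2, 2, 0]
print(ngrams(["a", "great", "movie"], 2))  # [('a', 'great'), ('great', 'movie')]
```

The resulting count vectors are what downstream classifiers (or the fusion steps described below) consume.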
 
=== Audio features ===
[[Feeling|Sentiment]] and [[emotion]] characteristics are prominent in different [[phonetic]] and [[prosodic]] properties contained in audio features.<ref>{{cite journal |last1=Chung-Hsien Wu |last2=Wei-Bin Liang |title=Emotion Recognition of Affective Speech Based on Multiple Classifiers Using Acoustic-Prosodic Information and Semantic Labels |journal=IEEE Transactions on Affective Computing |date=January 2011 |volume=2 |issue=1 |pages=10–21 |doi=10.1109/T-AFFC.2010.16 |s2cid=52853112}}</ref> Some of the most important audio features employed in multimodal [[sentiment analysis]] are [[mel-frequency cepstrum|mel-frequency cepstrum (MFCC)]], [[spectral centroid]], [[spectral flux]], [[beat]] histogram, [[beat]] sum, strongest [[beat]], pause duration, and [[pitch accent|pitch]].<ref name="s1"/> [[OpenSMILE]]<ref>{{cite book |last1=Eyben |first1=Florian |last2=Wöllmer |first2=Martin |last3=Schuller |first3=Björn |chapter=OpenEAR — Introducing the munich open-source emotion and affect recognition toolkit |title=2009 3rd International Conference on Affective Computing and Intelligent Interaction and Workshops |pages=1 |date=2009 |doi=10.1109/ACII.2009.5349350 |isbn=978-1-4244-4800-5 |s2cid=2081569 |url=https://nbn-resolving.org/urn:nbn:de:bvb:384-opus4-766112}}</ref> and [[Praat]] are popular open-source toolkits for extracting such audio features.<ref>{{cite book |last1=Morency |first1=Louis-Philippe |last2=Mihalcea |first2=Rada |last3=Doshi |first3=Payal |chapter=Towards multimodal sentiment analysis: harvesting opinions from the web |title=Proceedings of the 13th International Conference on Multimodal Interfaces |date=14 November 2011 |pages=169–176 |doi=10.1145/2070481.2070509 |publisher=ACM |isbn=9781450306416 |s2cid=1257599}}</ref>
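Two of the listed features can be computed directly from a signal's spectrum. The following pure-NumPy sketch (not taken from OpenSMILE or Praat; the 440 Hz test tone is an arbitrary choice) shows the idea for spectral centroid and spectral flux:

```python
import numpy as np

def spectral_centroid(frame, sr):
    """Magnitude-weighted mean frequency of one audio frame, in Hz."""
    mags = np.abs(np.fft.rfft(frame))
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / sr)
    return float(np.sum(freqs * mags) / np.sum(mags))

def spectral_flux(frame_a, frame_b):
    """Euclidean distance between the magnitude spectra of two frames."""
    a = np.abs(np.fft.rfft(frame_a))
    b = np.abs(np.fft.rfft(frame_b))
    return float(np.linalg.norm(b - a))

sr = 16000                          # sample rate in Hz
t = np.arange(sr) / sr              # one second of time stamps
tone = np.sin(2 * np.pi * 440.0 * t)  # a pure 440 Hz tone
print(round(spectral_centroid(tone, sr)))  # 440
```

For a pure tone the centroid sits at the tone's frequency; for speech it tracks where the spectral energy is concentrated, which is one of the prosodic cues the toolkits above extract at scale.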
 
=== Visual features ===
One of the main advantages of analyzing videos with respect to texts alone is the presence of rich [[sentiment]] cues in visual data.<ref>{{cite journal |last1=Poria |first1=Soujanya |last2=Cambria |first2=Erik |last3=Hazarika |first3=Devamanyu |last4=Majumder |first4=Navonil |last5=Zadeh |first5=Amir |last6=Morency |first6=Louis-Philippe |title=Context-Dependent Sentiment Analysis in User-Generated Videos |journal=Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) |pages=873–883 |date=2017 |doi=10.18653/v1/p17-1081 |doi-access=free}}</ref> Visual features include [[facial expression]]s, which are of paramount importance in capturing sentiments and [[emotion]]s, as they are a main channel of forming a person's present state of mind.<ref name="s1"/> Specifically, [[smile]] is considered to be one of the most predictive visual cues in multimodal [[sentiment analysis]].<ref name="s2"/> OpenFace is an open-source facial analysis toolkit available for extracting and understanding such visual features.<ref>{{cite book |title=OpenFace: An open source facial behavior analysis toolkit |date=March 2016 |doi=10.1109/WACV.2016.7477553 |isbn=978-1-5090-0641-0 |s2cid=1919851 |url=https://www.repository.cam.ac.uk/handle/1810/280724}}</ref>
 
== Fusion techniques ==
Unlike the traditional text-based [[sentiment analysis]], multimodal [[sentiment analysis]] undergoes a fusion process in which data from different [[Modality (human–computer interaction)|modalities]] (text, audio, or visual) are fused and analyzed together.<ref name="s1"/> The existing approaches in multimodal [[sentiment analysis]] [[data fusion]] can be grouped into three main categories: feature-level, decision-level, and hybrid fusion, and the performance of the [[sentiment]] classification depends on which type of fusion technique is employed.<ref name="s1"/>
 
=== Feature-level fusion ===
Feature-level fusion (sometimes known as early fusion) gathers all the features from each [[modality (human–computer interaction)|modality]] (text, audio, or visual) and joins them together into a single feature vector, which is eventually fed into a classification algorithm.<ref name="s3">{{cite journal |last1=Poria |first1=Soujanya |last2=Cambria |first2=Erik |last3=Howard |first3=Newton |last4=Huang |first4=Guang-Bin |last5=Hussain |first5=Amir |title=Fusing audio, visual and textual clues for sentiment analysis from multimodal content |journal=Neurocomputing |date=January 2016 |volume=174 |pages=50–59 |doi=10.1016/j.neucom.2015.01.095 |s2cid=15287807}}</ref> One of the difficulties in implementing this technique is the integration of the heterogeneous features.<ref name="s1"/>
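A sketch of the idea with made-up feature values (the feature names and numbers are illustrative, not from the cited work):

```python
import numpy as np

# Illustrative per-modality features for one utterance.
text_feats   = np.array([1.0, 0.0, 2.0])  # e.g. unigram counts
audio_feats  = np.array([210.5, 0.32])    # e.g. pitch in Hz, pause ratio
visual_feats = np.array([0.87])           # e.g. smile intensity

# Early fusion: one joint vector, fed to a single classifier downstream.
fused = np.concatenate([text_feats, audio_feats, visual_feats])

# The components are heterogeneous (counts, Hz, unit-interval scores), which
# is why some normalization is typically required before classification.
fused_scaled = (fused - fused.mean()) / fused.std()
print(fused.shape)  # (6,)
```

The scaling step is one simple answer to the heterogeneity problem the paragraph mentions; in practice each modality's features are often standardized separately before concatenation.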
 
=== Decision-level fusion ===
Decision-level fusion (sometimes known as late fusion) feeds data from each [[modality (human–computer interaction)|modality]] (text, audio, or visual) independently into its own classification algorithm, and obtains the final [[sentiment]] classification results by fusing each result into a single decision vector.<ref name="s3"/> One of the advantages of this fusion technique is that it eliminates the need to fuse heterogeneous data, and each [[modality (human–computer interaction)|modality]] can utilize its most appropriate [[classification]] [[algorithm]].<ref name="s1"/>
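A sketch with assumed classifier outputs (the probability vectors below are invented; in practice each comes from a separately trained per-modality model):

```python
import numpy as np

LABELS = ["positive", "negative", "neutral"]

# Assumed outputs of three independently trained per-modality classifiers,
# each a probability distribution over the three sentiment classes.
p_text   = np.array([0.7, 0.2, 0.1])
p_audio  = np.array([0.5, 0.3, 0.2])
p_visual = np.array([0.6, 0.1, 0.3])

# Late fusion: average the per-modality decisions, then pick the winner.
decision = np.mean([p_text, p_audio, p_visual], axis=0)
print(LABELS[int(np.argmax(decision))])  # positive
```

Averaging is only one possible combiner; majority voting or a learned meta-classifier over the decision vector are common alternatives.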
 
=== Hybrid fusion ===
Hybrid fusion is a combination of feature-level and decision-level fusion techniques, which exploits complementary information from both methods during the classification process.<ref name="s4"/> It usually involves a two-step procedure wherein feature-level fusion is initially performed between two [[Modality (human–computer interaction)|modalities]], and decision-level fusion is then applied as a second step to fuse the initial results from the feature-level fusion with the remaining [[Modality (human–computer interaction)|modality]].<ref>{{cite journal |last1=Shahla |first1=Shahla |last2=Naghsh-Nilchi |first2=Ahmad Reza |title=Exploiting evidential theory in the fusion of textual, audio, and visual modalities for affective music video retrieval |date=2017 |doi=10.1109/PRIA.2017.7983051 |s2cid=24466718}}</ref><ref>{{cite journal |last1=Poria |first1=Soujanya |last2=Peng |first2=Haiyun |last3=Hussain |first3=Amir |last4=Howard |first4=Newton |last5=Cambria |first5=Erik |title=Ensemble application of convolutional neural networks and multiple kernel learning for multimodal sentiment analysis |journal=Neurocomputing |date=October 2017 |volume=261 |pages=217–230 |doi=10.1016/j.neucom.2016.09.117}}</ref>
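The two-step procedure can be sketched as follows; all feature values and classifier outputs are stand-in numbers, since the actual classifiers are not specified here:

```python
import numpy as np

# Step 1 - feature level: concatenate text and audio features and score the
# joint vector with one classifier; its output here is a stand-in vector.
text_feats  = np.array([0.8, 0.1])
audio_feats = np.array([0.4, 0.6])
joint_feats = np.concatenate([text_feats, audio_feats])
p_text_audio = np.array([0.7, 0.3])  # assumed classifier output (pos, neg)

# Step 2 - decision level: fuse that result with the remaining modality's
# own classifier output (visual), here by simple averaging.
p_visual = np.array([0.5, 0.5])      # assumed visual classifier output
final = (p_text_audio + p_visual) / 2
print(final)
```

This mirrors the description above: the tightly coupled modalities are fused early, and the remaining modality contributes only at the decision stage.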
 
== Applications ==
Similar to text-based [[sentiment analysis]], multimodal [[sentiment analysis]] can be applied in the development of different forms of [[recommender system]]s, such as in the analysis of user-generated videos of movie reviews<ref name="s4"/> and general product reviews,<ref>{{cite journal |last1=Pérez-Rosas |first1=Verónica |last2=Mihalcea |first2=Rada |last3=Morency |first3=Louis Philippe |title=Utterance-level multimodal sentiment analysis |journal=Long Papers |date=1 January 2013 |url=https://experts.umich.edu/en/publications/utterance-level-multimodal-sentiment-analysis |publisher=Association for Computational Linguistics (ACL)}}</ref> to predict the sentiments of customers and subsequently create product or service recommendations.<ref>{{cite web |last1=Chui |first1=Michael |last2=Manyika |first2=James |last3=Miremadi |first3=Mehdi |last4=Henke |first4=Nicolaus |last5=Chung |first5=Rita |last6=Nel |first6=Pieter |last7=Malhotra |first7=Sankalp |title=Notes from the AI frontier. Insights from hundreds of use cases |url=https://www.mckinsey.com/mgi/ |website=McKinsey & Company |publisher=McKinsey & Company |access-date=13 June 2018 |language=en}}</ref> Multimodal [[sentiment analysis]] also plays an important role in the advancement of [[virtual assistant]]s through the application of [[natural language processing]] (NLP) and [[machine learning]] techniques.<ref name="s5"/> In the healthcare ___domain, multimodal [[sentiment analysis]] can be utilized to detect certain medical conditions such as [[Psychological stress|stress]], [[anxiety]], or [[Depression (mood)|depression]].<ref name="s6"/> Multimodal [[sentiment analysis]] can also be applied in understanding the sentiments contained in video news programs, which is considered a complicated and challenging ___domain, as sentiments expressed by reporters tend to be less obvious or neutral.<ref>{{cite book |last1=Ellis |first1=Joseph G. |last2=Jou |first2=Brendan |last3=Chang |first3=Shih-Fu |chapter=Why We Watch the News: A Dataset for Exploring Sentiment in Broadcast Video News |title=Proceedings of the 16th International Conference on Multimodal Interaction |date=12 November 2014 |pages=104–111 |doi=10.1145/2663204.2663237 |publisher=ACM |isbn=9781450328852 |s2cid=14112246}}</ref>
 
==References==
{{Reflist}}
 
[[Category:Natural language processing]]
[[Category:Affective computing]]
[[Category:Social media]]
[[Category:Machine learning]]
[[Category:Multimodal interaction]]