Multimodal sentiment analysis

{{Short description|Technology for sentiment analysis}}
'''Multimodal sentiment analysis''' is an extension of traditional text-based [[Sentiment analysis|sentiment analysis]] that goes beyond the analysis of texts and includes other [[Modality (human–computer interaction)|modalities]] such as audio and visual data.<ref>{{cite journal |last1=Soleymani |first1=Mohammad |last2=Garcia |first2=David |last3=Jou |first3=Brendan |last4=Schuller |first4=Björn |last5=Chang |first5=Shih-Fu |title=A survey of multimodal sentiment analysis |journal=Image and Vision Computing |date=September 2017 |volume=65 |pages=3–14 |doi=10.1016/j.imavis.2017.08.003 |s2cid=19491070 |url=https://zenodo.org/record/3449163}}</ref> It can be bimodal, which includes different combinations of two modalities, or trimodal, which incorporates three modalities.<ref>{{cite journal |last1=Karray |first1=Fakhreddine |last2=Milad |first2=Alemzadeh |last3=Saleh |first3=Jamil Abou |last4=Mo Nours |first4=Arab |title=Human-Computer Interaction: Overview on State of the Art |journal=International Journal on Smart Sensing and Intelligent Systems |volume=1 |pages=137–159 |date=2008 |url=http://s2is.org/Issues/v1/n1/papers/paper9.pdf |doi=10.21307/ijssis-2017-283 |doi-access=free}}</ref> With the extensive amount of [[social media]] data available online in different forms such as videos and images, conventional text-based [[Sentiment analysis|sentiment analysis]] has evolved into more complex models of multimodal sentiment analysis,<ref name="s1">{{cite journal |last1=Poria |first1=Soujanya |last2=Cambria |first2=Erik |last3=Bajpai |first3=Rajiv |last4=Hussain |first4=Amir |title=A review of affective computing: From unimodal analysis to multimodal fusion |journal=Information Fusion |date=September 2017 |volume=37 |pages=98–125 |doi=10.1016/j.inffus.2017.02.003 |hdl=1893/25490 |s2cid=205433041 |url=http://researchrepository.napier.ac.uk/Output/1792429 |hdl-access=free}}</ref><ref>{{cite arXiv |last1=Nguyen |first1=Quy Hoang |last2=Nguyen |first2=Minh-Van Truong |last3=Van Nguyen |first3=Kiet |title=New Benchmark Dataset and Fine-Grained Cross-Modal Fusion Framework for Vietnamese Multimodal Aspect-Category Sentiment Analysis |date=2024-05-01 |eprint=2405.00543 |class=cs.CL}}</ref> which can be applied in the development of [[virtual assistant]]s,<ref name="s5">{{cite web |title=Google AI to make phone calls for you |url=https://www.bbc.com/news/technology-44045424 |website=BBC News |access-date=12 June 2018 |date=8 May 2018}}</ref> the [[Social media analytics|analysis]] of YouTube movie reviews,<ref name="s4">{{cite journal |last1=Wollmer |first1=Martin |last2=Weninger |first2=Felix |last3=Knaup |first3=Tobias |last4=Schuller |first4=Bjorn |last5=Sun |first5=Congkai |last6=Sagae |first6=Kenji |last7=Morency |first7=Louis-Philippe |title=YouTube Movie Reviews: Sentiment Analysis in an Audio-Visual Context |journal=IEEE Intelligent Systems |date=May 2013 |volume=28 |issue=3 |pages=46–53 |doi=10.1109/MIS.2013.34 |s2cid=12789201 |url=https://opus.bibliothek.uni-augsburg.de/opus4/files/72633/72633.pdf}}</ref> the analysis of news videos,<ref>{{cite arXiv |last1=Pereira |first1=Moisés H. R. |last2=Pádua |first2=Flávio L. C. |last3=Pereira |first3=Adriano C. M. |last4=Benevenuto |first4=Fabrício |last5=Dalip |first5=Daniel H. |title=Fusing Audio, Textual and Visual Features for Sentiment Analysis of News Videos |date=9 April 2016 |eprint=1604.02612 |class=cs.CL}}</ref> and [[emotion recognition]] (sometimes known as [[emotion]] detection) such as [[depression (mood)|depression]] monitoring,<ref name="s6">{{cite book |last1=Zucco |first1=Chiara |last2=Calabrese |first2=Barbara |last3=Cannataro |first3=Mario |title=2017 IEEE International Conference on Bioinformatics and Biomedicine (BIBM) |chapter=Sentiment analysis and affective computing for depression monitoring |date=November 2017 |pages=1988–1995 |doi=10.1109/bibm.2017.8217966 |publisher=IEEE |language=en |isbn=978-1-5090-3050-7 |s2cid=24408937}}</ref> among others.
 
Similar to traditional [[Sentiment analysis|sentiment analysis]], one of the most basic tasks in multimodal sentiment analysis is [[Feeling|sentiment]] classification, which classifies different sentiments into categories such as positive, negative, or neutral.<ref>{{cite book |last1=Pang |first1=Bo |last2=Lee |first2=Lillian |title=Opinion mining and sentiment analysis |date=2008 |publisher=Now Publishers |___location=Hanover, MA |isbn=978-1601981509}}</ref> The complexity of [[Social media analytics|analyzing]] text, audio, and visual features to perform such a task requires the application of different fusion techniques, such as feature-level, decision-level, and hybrid fusion.<ref name="s1"/> The performance of these fusion techniques and the [[Classification|classification]] [[Algorithm|algorithm]]s applied are influenced by the type of textual, audio, and visual features employed in the analysis.<ref name="s7"/>
 
== Features ==
[[Feature engineering]], which involves the selection of features that are fed into [[machine learning]] algorithms, plays a key role in the sentiment classification performance.<ref name="s7">{{cite journal |last1=Sun |first1=Shiliang |last2=Luo |first2=Chen |last3=Chen |first3=Junyu |title=A review of natural language processing techniques for opinion mining systems |journal=Information Fusion |date=July 2017 |volume=36 |pages=10–25 |doi=10.1016/j.inffus.2016.10.004}}</ref> In multimodal sentiment analysis, a combination of different textual, audio, and visual features is employed.<ref name="s1"/>
 
=== Textual features ===
Similar to conventional text-based [[Sentiment analysis|sentiment analysis]], some of the most commonly used textual features in multimodal sentiment analysis are [[n-grams|unigrams]] and [[n-gram]]s, which are sequences of words in a given textual document.<ref>{{cite journal |last1=Yadollahi |first1=Ali |last2=Shahraki |first2=Ameneh Gholipour |last3=Zaiane |first3=Osmar R. |title=Current State of Text Sentiment Analysis from Opinion to Emotion Mining |journal=ACM Computing Surveys |date=25 May 2017 |volume=50 |issue=2 |pages=1–33 |doi=10.1145/3057270 |s2cid=5275807}}</ref> These features are applied using [[bag-of-words]] or bag-of-concepts feature representations, in which words or concepts are represented as vectors in a suitable space.<ref name="s2">{{cite journal |last1=Perez Rosas |first1=Veronica |last2=Mihalcea |first2=Rada |last3=Morency |first3=Louis-Philippe |title=Multimodal Sentiment Analysis of Spanish Online Videos |journal=IEEE Intelligent Systems |date=May 2013 |volume=28 |issue=3 |pages=38–45 |doi=10.1109/MIS.2013.9 |s2cid=1132247}}</ref><ref>{{cite journal |last1=Poria |first1=Soujanya |last2=Cambria |first2=Erik |last3=Hussain |first3=Amir |last4=Huang |first4=Guang-Bin |title=Towards an intelligent framework for multimodal affective data analysis |journal=Neural Networks |date=March 2015 |volume=63 |pages=104–116 |doi=10.1016/j.neunet.2014.10.005 |pmid=25523041 |hdl=1893/21310 |s2cid=342649 |hdl-access=free}}</ref>
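As an illustrative sketch (not drawn from the cited works), the following shows how unigram and bigram counts can serve as a simple bag-of-words representation; the tokenizer and function names are invented for the example:

```python
from collections import Counter

def ngrams(tokens, n):
    """All contiguous n-token sequences in a document."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def bag_of_words(text, max_n=2):
    """Unigram and bigram counts: a sparse 'vector' keyed by n-gram.

    Whitespace tokenization is a simplification; real systems use
    proper tokenizers and map n-grams to fixed vector dimensions.
    """
    tokens = text.lower().split()
    counts = Counter()
    for n in range(1, max_n + 1):
        counts.update(ngrams(tokens, n))
    return counts

vec = bag_of_words("the movie was great the acting was great")
print(vec[("great",)], vec[("was", "great")])  # -> 2 2
```

In practice the resulting counts are mapped into a fixed vocabulary so that every document yields a vector of the same dimensionality for the classifier.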
 
=== Audio features ===
[[Feeling|Sentiment]] and [[emotion]] characteristics are prominent in different [[phonetic]] and [[prosodic]] properties contained in audio features.<ref>{{cite journal |last1=Chung-Hsien Wu |last2=Wei-Bin Liang |title=Emotion Recognition of Affective Speech Based on Multiple Classifiers Using Acoustic-Prosodic Information and Semantic Labels |journal=IEEE Transactions on Affective Computing |date=January 2011 |volume=2 |issue=1 |pages=10–21 |doi=10.1109/T-AFFC.2010.16 |s2cid=52853112}}</ref> Some of the most important audio features employed in multimodal sentiment analysis are [[mel-frequency cepstrum|mel-frequency cepstral coefficients (MFCC)]], [[spectral centroid]], [[spectral flux]], [[beat]]{{dn|date=June 2018}} histogram, beat sum, strongest beat, pause duration, and [[pitch accent|pitch]].<ref name="s1"/> [[OpenSMILE]]<ref>{{cite book |last1=Eyben |first1=Florian |last2=Wöllmer |first2=Martin |last3=Schuller |first3=Björn |title=OpenEAR — Introducing the munich open-source emotion and affect recognition toolkit |date=2009 |pages=1 |doi=10.1109/ACII.2009.5349350 |isbn=978-1-4244-4800-5 |s2cid=2081569 |url=https://nbn-resolving.org/urn:nbn:de:bvb:384-opus4-766112}}</ref> and [[Praat]] are popular open-source toolkits for extracting such audio features.<ref>{{cite book |last1=Morency |first1=Louis-Philippe |last2=Mihalcea |first2=Rada |last3=Doshi |first3=Payal |title=Towards multimodal sentiment analysis: harvesting opinions from the web |date=14 November 2011 |pages=169–176 |doi=10.1145/2070481.2070509 |publisher=ACM |isbn=9781450306416 |s2cid=1257599}}</ref>
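As a minimal sketch of two of the spectral features named above (not the implementation used by OpenSMILE or Praat; frame length and sampling rate are chosen arbitrarily for the example):

```python
import numpy as np

def spectral_centroid(frame, sr):
    """Magnitude-weighted mean frequency of one audio frame (Hz)."""
    windowed = frame * np.hanning(len(frame))   # taper to limit leakage
    mag = np.abs(np.fft.rfft(windowed))
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / sr)
    return float(np.sum(freqs * mag) / (np.sum(mag) + 1e-12))

def spectral_flux(prev_frame, frame):
    """Euclidean distance between successive magnitude spectra."""
    d = np.abs(np.fft.rfft(frame)) - np.abs(np.fft.rfft(prev_frame))
    return float(np.sqrt(np.sum(d ** 2)))

sr = 16_000
t = np.arange(1024) / sr
tone = np.sin(2 * np.pi * 440 * t)   # a 440 Hz test tone
# The centroid of a pure tone sits near its frequency, and the flux
# of an unchanged spectrum is zero.
print(round(spectral_centroid(tone, sr)), spectral_flux(tone, tone))
```

Such per-frame values are typically aggregated (mean, variance, etc.) over an utterance before being fed to a classifier.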
 
=== Visual features ===
One of the main advantages of analyzing videos with respect to texts alone is the presence of rich sentiment cues in visual data.<ref>{{cite journal |last1=Poria |first1=Soujanya |last2=Cambria |first2=Erik |last3=Hazarika |first3=Devamanyu |last4=Majumder |first4=Navonil |last5=Zadeh |first5=Amir |last6=Morency |first6=Louis-Philippe |title=Context-Dependent Sentiment Analysis in User-Generated Videos |journal=Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) |pages=873–883 |date=2017 |doi=10.18653/v1/p17-1081 |doi-access=free}}</ref> Visual features include [[facial expression]]s, which are of paramount importance in capturing sentiments and [[emotion]]s, as they are a main channel of conveying a person's present state of mind.<ref name="s1"/> Specifically, [[smile]] is considered to be one of the most predictive visual cues in multimodal sentiment analysis.<ref name="s2"/> OpenFace is an open-source facial analysis toolkit available for extracting and understanding such visual features.<ref>{{cite book |title=OpenFace: An open source facial behavior analysis toolkit |date=March 2016 |doi=10.1109/WACV.2016.7477553 |isbn=978-1-5090-0641-0 |s2cid=1919851 |url=https://www.repository.cam.ac.uk/handle/1810/280724}}</ref>
 
== Fusion techniques ==
Unlike traditional text-based [[Sentiment analysis|sentiment analysis]], multimodal sentiment analysis undergoes a fusion process in which data from different modalities (text, audio, or visual) are fused and analyzed together.<ref name="s1"/> The existing approaches in multimodal sentiment analysis [[data fusion]] can be grouped into three main categories: feature-level, decision-level, and hybrid fusion; the performance of the sentiment classification depends on which type of fusion technique is employed.<ref name="s1"/>
 
=== Feature-level fusion ===
Feature-level fusion (sometimes known as early fusion) gathers all the features from each [[modality (human–computer interaction)|modality]] (text, audio, or visual) and joins them together into a single feature vector, which is eventually fed into a classification algorithm.<ref name="s3">{{cite journal |last1=Poria |first1=Soujanya |last2=Cambria |first2=Erik |last3=Howard |first3=Newton |last4=Huang |first4=Guang-Bin |last5=Hussain |first5=Amir |title=Fusing audio, visual and textual clues for sentiment analysis from multimodal content |journal=Neurocomputing |date=January 2016 |volume=174 |pages=50–59 |doi=10.1016/j.neucom.2015.01.095 |s2cid=15287807}}</ref> One of the difficulties in implementing this technique is the integration of the heterogeneous features.<ref name="s1"/>
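The concatenation step can be sketched as follows; the feature dimensions are invented for the example, and a real system would standardize the heterogeneous features before joining them:

```python
import numpy as np

# Hypothetical per-utterance feature matrices for 8 samples; the
# dimensions stand in for typical modality-specific descriptors.
text_feats  = np.random.rand(8, 300)   # e.g. bag-of-words vectors
audio_feats = np.random.rand(8, 40)    # e.g. MFCC statistics
video_feats = np.random.rand(8, 128)   # e.g. facial-expression features

# Feature-level (early) fusion: one joint vector per sample, which
# would then be fed to a single classification algorithm.
fused = np.concatenate([text_feats, audio_feats, video_feats], axis=1)
print(fused.shape)  # -> (8, 468)
```

The heterogeneity problem noted above is visible here: the three blocks live on different scales and dimensionalities, so normalization or dimensionality reduction is usually applied before classification.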
 
=== Decision-level fusion ===
Decision-level fusion (sometimes known as late fusion) feeds data from each modality (text, audio, or visual) independently into its own classification algorithm, and obtains the final sentiment classification results by fusing each result into a single decision vector.<ref name="s3"/> One of the advantages of this fusion technique is that it eliminates the need to fuse heterogeneous data, and each [[modality (human–computer interaction)|modality]] can utilize its most appropriate [[Classification|classification]] [[Algorithm|algorithm]].<ref name="s1"/>
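A common way to fuse the per-modality decisions is a weighted average of class posteriors; the probabilities and weights below are made up for illustration:

```python
import numpy as np

# Hypothetical posteriors over (negative, neutral, positive) for one
# utterance; each row would come from a separate modality classifier.
p_text  = np.array([0.10, 0.20, 0.70])
p_audio = np.array([0.20, 0.30, 0.50])
p_video = np.array([0.05, 0.15, 0.80])

# Late fusion: combine the decision vectors, here with weights that
# would in practice be tuned on held-out data.
weights = np.array([0.5, 0.2, 0.3])
fused = weights @ np.vstack([p_text, p_audio, p_video])

labels = ["negative", "neutral", "positive"]
print(labels[int(np.argmax(fused))])  # -> positive
```

Other fusion rules (majority voting, product of posteriors, a meta-classifier over the decision vectors) slot into the same place as the weighted average.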
 
=== Hybrid fusion ===
Hybrid fusion is a combination of feature-level and decision-level fusion techniques, which exploits complementary information from both methods during the classification process.<ref name="s4"/> It usually involves a two-step procedure wherein feature-level fusion is initially performed between two modalities, and decision-level fusion is then applied as a second step to fuse the initial results from the feature-level fusion with the remaining [[Modality (human–computer interaction)|modality]].<ref>{{cite book |last1=Shahla |first1=Shahla |last2=Naghsh-Nilchi |first2=Ahmad Reza |title=Exploiting evidential theory in the fusion of textual, audio, and visual modalities for affective music video retrieval |date=2017 |doi=10.1109/PRIA.2017.7983051 |s2cid=24466718}}</ref><ref>{{cite journal |last1=Poria |first1=Soujanya |last2=Peng |first2=Haiyun |last3=Hussain |first3=Amir |last4=Howard |first4=Newton |last5=Cambria |first5=Erik |title=Ensemble application of convolutional neural networks and multiple kernel learning for multimodal sentiment analysis |journal=Neurocomputing |date=October 2017 |volume=261 |pages=217–230 |doi=10.1016/j.neucom.2016.09.117}}</ref>
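The two-step procedure can be sketched as below; the posterior for the early-fused text+audio pair is a stand-in for the output of a real classifier trained on the concatenated features:

```python
import numpy as np

def late_fuse(p_a, p_b, w=0.5):
    """Weighted average of two decision vectors (late-fusion step)."""
    return w * p_a + (1 - w) * p_b

# Step 1 (feature level): text and audio features are concatenated and
# scored by one classifier; this vector stands in for its output over
# the classes (negative, neutral, positive).
p_text_audio = np.array([0.15, 0.25, 0.60])

# Step 2 (decision level): fuse that result with the remaining
# modality's own classifier output.
p_video = np.array([0.05, 0.15, 0.80])
p_final = late_fuse(p_text_audio, p_video)

print(np.round(p_final, 2))  # fused posterior favours the positive class
```

The equal weight `w=0.5` is an arbitrary choice for the sketch; in practice the weighting, like the pairing of modalities in step 1, is a design decision tuned per task.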
 
== Applications ==
Similar to text-based sentiment analysis, multimodal sentiment analysis can be applied in the development of different forms of [[recommender system]]s, such as in the analysis of user-generated videos of movie reviews<ref name="s4"/> and general product reviews,<ref>{{cite journal |last1=Pérez-Rosas |first1=Verónica |last2=Mihalcea |first2=Rada |last3=Morency |first3=Louis Philippe |title=Utterance-level multimodal sentiment analysis |journal=Long Papers |date=1 January 2013 |url=https://experts.umich.edu/en/publications/utterance-level-multimodal-sentiment-analysis |publisher=Association for Computational Linguistics (ACL)}}</ref> to predict the sentiments of customers and subsequently create product or service recommendations.<ref>{{cite web |last1=Chui |first1=Michael |last2=Manyika |first2=James |last3=Miremadi |first3=Mehdi |last4=Henke |first4=Nicolaus |last5=Chung |first5=Rita |last6=Nel |first6=Pieter |last7=Malhotra |first7=Sankalp |title=Notes from the AI frontier. Insights from hundreds of use cases |url=https://www.mckinsey.com/mgi/ |website=McKinsey & Company |publisher=McKinsey & Company |access-date=13 June 2018 |language=en}}</ref> Multimodal sentiment analysis also plays an important role in the advancement of [[virtual assistant]]s through the application of [[natural language processing]] (NLP) and [[machine learning]] techniques.<ref name="s5"/> In the healthcare ___domain, multimodal sentiment analysis can be utilized to detect certain medical conditions such as [[Psychological stress|stress]], [[anxiety]], or [[Depression (mood)|depression]].<ref name="s6"/> Multimodal sentiment analysis can also be applied in understanding the sentiments contained in video news programs, which is considered a complicated and challenging ___domain, as sentiments expressed by reporters tend to be less obvious or neutral.<ref>{{cite book |last1=Ellis |first1=Joseph G. |last2=Jou |first2=Brendan |last3=Chang |first3=Shih-Fu |title=Why We Watch the News: A Dataset for Exploring Sentiment in Broadcast Video News |date=12 November 2014 |pages=104–111 |doi=10.1145/2663204.2663237 |publisher=ACM |isbn=9781450328852 |s2cid=14112246}}</ref>
 
==References==
{{Reflist}}
[[Category:Natural language processing]]
[[Category:Affective computing]]
[[Category:Social media]]
[[Category:Machine learning]]
[[Category:Multimodal interaction]]