{{Short description|Technology for sentiment analysis}}
'''Multimodal sentiment analysis''' is a
As in traditional [[sentiment analysis]], one of the most basic tasks in multimodal sentiment analysis is [[Feeling|sentiment]] classification, which classifies different sentiments into categories such as positive, negative, or neutral.<ref>{{cite book |last1=Pang |first1=Bo |last2=Lee |first2=Lillian |title=Opinion mining and sentiment analysis |date=2008 |publisher=Now Publishers |location=Hanover, MA |isbn=978-1601981509}}</ref> The complexity of [[Social media analytics|analyzing]] text, audio, and visual features to perform such a task requires the application of different fusion techniques, such as feature-level, decision-level, and hybrid fusion.<ref name="s1" /> The performance of these fusion techniques and of the [[classification]] [[algorithm]]s applied is influenced by the type of textual, audio, and visual features employed in the analysis.<ref name="s7" />
=== Textual features ===
As in conventional text-based [[sentiment analysis]], some of the most commonly used textual features in multimodal sentiment analysis are [[n-grams|unigrams]] and [[n-gram]]s, which are sequences of words in a given textual document.<ref>{{cite journal |last1=Yadollahi |first1=Ali |last2=Shahraki |first2=Ameneh Gholipour |last3=Zaiane |first3=Osmar R. |title=Current State of Text Sentiment Analysis from Opinion to Emotion Mining |journal=ACM Computing Surveys |date=25 May 2017 |volume=50 |issue=2 |pages=1–33 |doi=10.1145/3057270|s2cid=5275807 }}</ref> These features are applied using [[bag-of-words]] or bag-of-concepts feature representations, in which words or concepts are represented as vectors in a suitable space.<ref name="s2">{{cite journal |last1=Perez Rosas |first1=Veronica |last2=Mihalcea |first2=Rada |last3=Morency |first3=Louis-Philippe |title=Multimodal Sentiment Analysis of Spanish Online Videos |journal=IEEE Intelligent Systems |date=May 2013 |volume=28 |issue=3 |pages=38–45 |doi=10.1109/MIS.2013.9|s2cid=1132247 }}</ref><ref>{{cite journal |last1=Poria |first1=Soujanya |last2=Cambria |first2=Erik |last3=Hussain |first3=Amir |last4=Huang |first4=Guang-Bin |title=Towards an intelligent framework for multimodal affective data analysis |journal=Neural Networks |date=March 2015 |volume=63 |pages=104–116 |doi=10.1016/j.neunet.2014.10.005|pmid=25523041 |hdl=1893/21310 |s2cid=342649 |hdl-access=free }}</ref>
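A bag-of-words representation over unigrams and longer n-grams can be sketched in a few lines. The following is only an illustrative example, not the method of any cited work: the function names (<code>ngrams</code>, <code>bag_of_ngrams</code>, <code>vectorize</code>), the whitespace tokenizer, and the two sample sentences are all assumptions made for the illustration.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return every contiguous n-gram (as a tuple) in a token sequence."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def bag_of_ngrams(text, max_n=2):
    """Count all n-grams of length 1..max_n in a whitespace-tokenized text."""
    tokens = text.lower().split()  # assumption: naive whitespace tokenizer
    counts = Counter()
    for n in range(1, max_n + 1):
        counts.update(ngrams(tokens, n))
    return counts


def vectorize(counts, vocabulary):
    """Map n-gram counts onto a fixed vocabulary order, giving a count vector."""
    return [counts.get(term, 0) for term in vocabulary]


# Hypothetical two-document corpus.
docs = ["the movie was great", "the movie was not great"]
bags = [bag_of_ngrams(d) for d in docs]
vocabulary = sorted(set().union(*bags))
vectors = [vectorize(b, vocabulary) for b in bags]
```

Each document becomes a vector with one dimension per vocabulary entry. The bigram <code>("not", "great")</code> appears only in the second vector, which is the kind of word-order cue that unigram counts alone would miss.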
=== Audio features ===
[[Feeling|Sentiment]] and [[emotion]] characteristics are prominent in different [[phonetic]] and [[prosodic]] properties contained in audio features.<ref>{{cite journal |last1=Chung-Hsien Wu |last2=Wei-Bin Liang |title=Emotion Recognition of Affective Speech Based on Multiple Classifiers Using Acoustic-Prosodic Information and Semantic Labels |journal=IEEE Transactions on Affective Computing |date=January 2011 |volume=2 |issue=1 |pages=10–21 |doi=10.1109/T-AFFC.2010.16|s2cid=52853112 }}</ref> Some of the most important audio features employed in multimodal sentiment analysis are [[mel-frequency cepstrum|mel-frequency cepstral coefficients (MFCC)]], [[spectral centroid]], [[spectral flux]], beat histogram, beat sum, strongest beat, pause duration, and [[pitch accent|pitch]].<ref name="s1" /> [[OpenSMILE]]<ref>{{cite book |last1=Eyben |first1=Florian |last2=Wöllmer |first2=Martin |last3=Schuller |first3=Björn |title=OpenEAR — Introducing the munich open-source emotion and affect recognition toolkit - IEEE Conference Publication |pages=1 |date=2009 |doi=10.1109/ACII.2009.5349350 |isbn=978-1-4244-4800-5 |chapter=OpenEAR — Introducing the munich open-source emotion and affect recognition toolkit |s2cid=2081569 |url=https://nbn-resolving.org/urn:nbn:de:bvb:384-opus4-766112 }}</ref> and [[Praat]] are popular open-source toolkits for extracting such audio features.<ref>{{cite book|last1=Morency |first1=Louis-Philippe |last2=Mihalcea |first2=Rada |last3=Doshi |first3=Payal |title=Towards multimodal sentiment analysis: harvesting opinions from the web |date=14 November 2011 |pages=169–176 |doi=10.1145/2070481.2070509 |publisher=ACM|chapter=Towards multimodal sentiment analysis |isbn=9781450306416 |s2cid=1257599 }}</ref>
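Two of the listed features, spectral centroid and spectral flux, can be illustrated with a small self-contained sketch. It uses common textbook definitions (a magnitude-weighted mean frequency, and the Euclidean distance between consecutive magnitude spectra). Toolkits such as OpenSMILE or Praat may define or normalize these features differently. The function names, the naive DFT, and the synthetic test tone are assumptions made for the illustration.

```python
import cmath
import math


def magnitude_spectrum(frame):
    """Naive DFT magnitudes for bins 0..N/2 (O(N^2), adequate for a short demo frame)."""
    n = len(frame)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(frame)))
        for k in range(n // 2 + 1)
    ]


def spectral_centroid(frame, sample_rate):
    """Magnitude-weighted mean frequency of one frame, in Hz."""
    mags = magnitude_spectrum(frame)
    freqs = [k * sample_rate / len(frame) for k in range(len(mags))]
    total = sum(mags)
    return sum(f * m for f, m in zip(freqs, mags)) / total if total else 0.0


def spectral_flux(prev_frame, frame):
    """Euclidean distance between the magnitude spectra of two consecutive frames."""
    prev, cur = magnitude_spectrum(prev_frame), magnitude_spectrum(frame)
    return math.sqrt(sum((c - p) ** 2 for c, p in zip(cur, prev)))


# Synthetic input: a 1 kHz sine sampled at 8 kHz, 64 samples per frame.
SAMPLE_RATE = 8000
tone = [math.sin(2 * math.pi * 1000 * t / SAMPLE_RATE) for t in range(64)]
silence = [0.0] * 64

centroid_hz = spectral_centroid(tone, SAMPLE_RATE)
flux_onset = spectral_flux(silence, tone)
```

For a pure tone the centroid sits at the tone's frequency, and the flux is large at the transition from silence to sound and zero between identical frames.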
=== Visual features ===
One of the main advantages of analyzing videos rather than text alone is the presence of rich sentiment cues in visual data.<ref>{{cite journal |last1=Poria |first1=Soujanya |last2=Cambria |first2=Erik |last3=Hazarika |first3=Devamanyu |last4=Majumder |first4=Navonil |last5=Zadeh |first5=Amir |last6=Morency |first6=Louis-Philippe |title=Context-Dependent Sentiment Analysis in User-Generated Videos |journal=Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) |pages=873–883 |date=2017 |doi=10.18653/v1/p17-1081 |doi-access=free }}</ref> Visual features include [[facial expression]]s, which are of paramount importance in capturing sentiments and [[emotion]]s, as they are a main channel for expressing a person's present state of mind.<ref name="s1" /> Specifically, the [[smile]] is considered to be one of the most predictive visual cues in multimodal sentiment analysis.<ref name="s2" /> OpenFace is an open-source facial analysis toolkit available for extracting and understanding such visual features.<ref>{{cite
== Fusion techniques ==
=== Feature-level fusion ===
Feature-level fusion (sometimes known as early fusion) gathers all the features from each [[modality (human–computer interaction)|modality]] (text, audio, or visual) and joins them together into a single feature vector, which is eventually fed into a classification algorithm.<ref name="s3">{{cite journal |last1=Poria |first1=Soujanya |last2=Cambria |first2=Erik |last3=Howard |first3=Newton |last4=Huang |first4=Guang-Bin |last5=Hussain |first5=Amir |title=Fusing audio, visual and textual clues for sentiment analysis from multimodal content |journal=Neurocomputing |date=January 2016 |volume=174 |pages=50–59 |doi=10.1016/j.neucom.2015.01.095|s2cid=15287807 }}</ref> One of the difficulties in implementing this technique is the integration of the heterogeneous features.<ref name="s1" />
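The concatenation step can be shown with a toy example. This is a minimal sketch under stated assumptions: the feature values are invented, the function names are hypothetical, and the nearest-centroid classifier is only a stand-in for whatever classification algorithm a real system would use. Because modalities are typically measured on different scales, a real pipeline would usually normalize each modality before concatenation; that step is omitted here.

```python
def fuse_features(text_vec, audio_vec, visual_vec):
    """Early fusion: concatenate per-modality features into one vector."""
    return list(text_vec) + list(audio_vec) + list(visual_vec)


def centroid(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def nearest_centroid(vector, centroids):
    """Return the label whose class centroid is closest to the fused vector."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(centroids, key=lambda label: sq_dist(vector, centroids[label]))


# Invented training data: (text, audio, visual) features per labelled utterance.
train = {
    "positive": [
        fuse_features([1, 0], [0.8], [0.9, 0.1]),
        fuse_features([1, 0], [0.7], [0.8, 0.2]),
    ],
    "negative": [
        fuse_features([0, 1], [0.2], [0.1, 0.9]),
        fuse_features([0, 1], [0.3], [0.2, 0.7]),
    ],
}
centroids = {label: centroid(vectors) for label, vectors in train.items()}

sample = fuse_features([1, 0], [0.6], [0.7, 0.3])
prediction = nearest_centroid(sample, centroids)
```

The classifier sees only the single fused vector, so it can exploit correlations across modalities, but it is also exposed to any mismatch in scale or dimensionality between them.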
=== Decision-level fusion ===
=== Hybrid fusion ===
Hybrid fusion combines feature-level and decision-level fusion techniques, exploiting complementary information from both methods during the classification process.<ref name="s4" /> It usually involves a two-step procedure: feature-level fusion is first performed between two modalities, and decision-level fusion is then applied to fuse the results of that first step with the remaining [[Modality (human–computer interaction)|modality]].<ref>{{cite journal |last1=Shahla |first1=Shahla |last2=Naghsh-Nilchi |first2=Ahmad Reza |title=Exploiting evidential theory in the fusion of textual, audio, and visual modalities for affective music video retrieval - IEEE Conference Publication
== Applications ==
As with text-based sentiment analysis, multimodal sentiment analysis can be applied in the development of different forms of [[recommender system]]s, such as in the analysis of user-generated videos of movie reviews<ref name="s4" /> and general product reviews,<ref>{{cite journal |last1=Pérez-Rosas |first1=Verónica |last2=Mihalcea |first2=Rada |last3=Morency |first3=Louis Philippe |title=Utterance-level multimodal sentiment analysis |journal=Long Papers |date=1 January 2013 |url=https://experts.umich.edu/en/publications/utterance-level-multimodal-sentiment-analysis |publisher=Association for Computational Linguistics (ACL)}}</ref> to predict the sentiments of customers and subsequently create product or service recommendations.<ref>{{cite web |last1=Chui |first1=Michael |last2=Manyika |first2=James |last3=Miremadi |first3=Mehdi |last4=Henke |first4=Nicolaus |last5=Chung |first5=Rita |last6=Nel |first6=Pieter |last7=Malhotra |first7=Sankalp |title=Notes from the AI frontier. Insights from hundreds of use cases |url=https://www.mckinsey.com/mgi/ |website=McKinsey & Company |
==References==
[[Category:Social media]]
[[Category:Machine learning]]
[[Category:Multimodal interaction]]