Revision as of 07:05, 30 January 2022 by Citation bot: Alter: template type. Add: url. \| Suggested by AManWithNoPlan \| #UCB_webform 270/1682
Revision as of 13:14, 11 August 2022 by Citation bot: Add: s2cid. \| Suggested by Abductive \| #UCB_webform 1485/2002
Line 7:
=== Textual features ===
Similar to the conventional text-based [[sentiment analysis]], some of the most commonly used textual features in multimodal sentiment analysis are [[n-grams\|unigrams]] and [[n-gram]]s, which are sequences of words in a given textual document.<ref>{{cite journal \|last1=Yadollahi \|first1=Ali \|last2=Shahraki \|first2=Ameneh Gholipour \|last3=Zaiane \|first3=Osmar R. \|title=Current State of Text Sentiment Analysis from Opinion to Emotion Mining \|journal=ACM Computing Surveys \|date=25 May 2017 \|volume=50 \|issue=2 \|pages=1–33 \|doi=10.1145/3057270\|s2cid=5275807 }}</ref> These features are applied using [[bag-of-words]] or bag-of-concepts feature representations, in which words or concepts are represented as vectors in a suitable space.<ref name="s2">{{cite journal \|last1=Perez Rosas \|first1=Veronica \|last2=Mihalcea \|first2=Rada \|last3=Morency \|first3=Louis-Philippe \|title=Multimodal Sentiment Analysis of Spanish Online Videos \|journal=IEEE Intelligent Systems \|date=May 2013 \|volume=28 \|issue=3 \|pages=38–45 \|doi=10.1109/MIS.2013.9\|s2cid=1132247 }}</ref><ref>{{cite journal \|last1=Poria \|first1=Soujanya \|last2=Cambria \|first2=Erik \|last3=Hussain \|first3=Amir \|last4=Huang \|first4=Guang-Bin \|title=Towards an intelligent framework for multimodal affective data analysis \|journal=Neural Networks \|date=March 2015 \|volume=63 \|pages=104–116 \|doi=10.1016/j.neunet.2014.10.005\|pmid=25523041 \|hdl=1893/21310 \|s2cid=342649 \|hdl-access=free }}</ref>

=== Audio features ===

Line 19:
=== Feature-level fusion ===
Feature-level fusion (sometimes known as early fusion) gathers all the features from each [[modality (human–computer interaction)\|modality]] (text, audio, or visual) and joins them together into a single feature vector, which is then fed into a classification algorithm.<ref name="s3">{{cite journal \|last1=Poria \|first1=Soujanya \|last2=Cambria \|first2=Erik \|last3=Howard \|first3=Newton \|last4=Huang \|first4=Guang-Bin \|last5=Hussain \|first5=Amir \|title=Fusing audio, visual and textual clues for sentiment analysis from multimodal content \|journal=Neurocomputing \|date=January 2016 \|volume=174 \|pages=50–59 \|doi=10.1016/j.neucom.2015.01.095\|s2cid=15287807 }}</ref> One of the difficulties in implementing this technique is the integration of the heterogeneous features.<ref name="s1" />

=== Decision-level fusion ===
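
For comparison, the feature-level (early) fusion described in the preceding section can be sketched in a few lines. This is only a minimal illustration, not the method of any cited work: the function names (<code>ngrams</code>, <code>bag_of_ngrams</code>, <code>early_fusion</code>), the vocabulary, and all audio and visual feature values are assumptions invented for the example. It builds a bag-of-words vector over unigrams and bigrams for the text modality, then joins it with the other modalities into a single feature vector.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return the list of n-grams (as tuples) in a token sequence."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def bag_of_ngrams(text, vocabulary, max_n=2):
    """Count how often each vocabulary n-gram occurs in the text."""
    tokens = text.lower().split()
    counts = Counter()
    for n in range(1, max_n + 1):
        counts.update(ngrams(tokens, n))
    return [float(counts[term]) for term in vocabulary]


def early_fusion(*modality_features):
    """Concatenate per-modality feature lists into one feature vector."""
    fused = []
    for features in modality_features:
        fused.extend(features)
    return fused


# Hypothetical vocabulary and feature values, chosen only for illustration.
vocabulary = [("great",), ("not",), ("not", "great")]
text_features = bag_of_ngrams("This movie is not great", vocabulary)
audio_features = [0.42, 180.0]  # made-up values, e.g. pause ratio and mean pitch in Hz
visual_features = [0.1]         # made-up value, e.g. smile intensity

# The fused vector is what would be passed to a classification algorithm.
fused_vector = early_fusion(text_features, audio_features, visual_features)
```

The fused vector mixes n-gram counts with values on very different scales, which hints at the difficulty of integrating heterogeneous features; in practice each modality would typically be normalized before concatenation.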

Multimodal sentiment analysis: Difference between revisions