<!--- Don't mess with this line! ---><!--- Write your article below this line --->
'''Multimodal learning''', in the context of [[machine learning]], is a type of [[deep learning]] using a combination of various [[Modality (human–computer interaction)|modalities]] of data, oftensuch arisingas text, audio, or images, in order to create a more robust model of the real-world applicationsphenomena in question. AnIn examplecontrast, ofsingular multi-modal datalearning iswould data that combinesanalyze text (typically represented as [[feature vector]]) withor imaging data (consisting of [[pixel]] intensities and annotation tags) independently. AsMultimodal thesemachine modalitieslearning havecombines these fundamentally different statistical properties, combining them is non-trivial, which isanalyses whyusing specialized modellingmodeling strategies and algorithms, areresulting required.in Thea model isthat thencomes trainedcloser to ablerepresenting tothe understandreal and work with multiple forms of dataworld.