Information in the real world usually comes in different [[Modality (human–computer interaction)|modalities]]. For example, images are usually paired with tags and textual captions, and text often contains images that illustrate the main idea of an article. Different modalities have different statistical properties: images are typically represented as [[pixel]] intensities or outputs of [[Feature extraction|feature extractors]], whereas text is represented as discrete word-count vectors. Because of these distinct statistical properties, it is important to model the relationships between modalities. '''Multimodal learning''' learns a joint representation of different modalities, and a '''multimodal learning model''' can also infer a missing modality from the observed ones. One such model combines two [[Deep Boltzmann Machines|deep Boltzmann machines]], one per modality, with an additional hidden layer placed on top of the two machines to produce the joint representation.
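The layered structure described above can be illustrated with a minimal forward-pass sketch (an illustrative NumPy example with arbitrary random weights, not a trained Boltzmann machine; all layer sizes are hypothetical): each modality is mapped to its own hidden representation, and a shared top layer combines the two into a joint representation.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Hypothetical inputs: one image (pixel intensities) and one text
# document (word-count vector); dimensions chosen for illustration.
img = rng.random(64)
txt = rng.random(100)

# Modality-specific hidden layers (random, untrained weights).
W_img = rng.normal(0.0, 0.1, size=(32, 64))
W_txt = rng.normal(0.0, 0.1, size=(32, 100))
h_img = sigmoid(W_img @ img)
h_txt = sigmoid(W_txt @ txt)

# Shared top layer: combines both modality-specific representations
# into a single joint representation.
W_joint = rng.normal(0.0, 0.1, size=(16, 32 + 32))
h_joint = sigmoid(W_joint @ np.concatenate([h_img, h_txt]))

print(h_joint.shape)  # (16,)
```

In the actual deep Boltzmann machine formulation, these layers are trained generatively rather than computed by a single feed-forward pass, which is what allows the model to fill in a missing modality by sampling it given the joint representation.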
==Motivation==