Information in the real world usually comes in different [[Modality (human–computer interaction)|modalities]]. For example, images are usually paired with tags and textual captions, and text often contains images that illustrate the main idea of an article. Different modalities have different statistical properties: images are typically represented as [[pixel]] intensities or outputs of [[Feature extraction|feature extractors]], whereas text is represented as discrete word-count vectors. Because of these distinct statistical properties, it is important to model the relationships between modalities. '''Multimodal learning''' learns a joint representation of different modalities, and a '''multimodal learning model''' can also infer a missing modality from the observed ones. One such model combines two [[Deep Boltzmann Machines|deep Boltzmann machines]], one per modality, with an additional hidden layer placed on top of the two machines to produce the joint representation.
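The layered structure described above can be illustrated with a minimal forward-pass sketch (an illustrative NumPy example with arbitrary random weights, not a trained Boltzmann machine; all layer sizes are hypothetical): each modality is mapped to its own hidden representation, and a shared top layer combines the two into a joint representation.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Hypothetical inputs: one image (pixel intensities) and one text
# document (word-count vector); dimensions chosen for illustration.
img = rng.random(64)
txt = rng.random(100)

# Modality-specific hidden layers (random, untrained weights).
W_img = rng.normal(0.0, 0.1, size=(32, 64))
W_txt = rng.normal(0.0, 0.1, size=(32, 100))
h_img = sigmoid(W_img @ img)
h_txt = sigmoid(W_txt @ txt)

# Shared top layer: combines both modality-specific representations
# into a single joint representation.
W_joint = rng.normal(0.0, 0.1, size=(16, 32 + 32))
h_joint = sigmoid(W_joint @ np.concatenate([h_img, h_txt]))

print(h_joint.shape)  # (16,)
```

In the actual deep Boltzmann machine formulation, these layers are trained generatively rather than computed by a single feed-forward pass, which is what allows the model to fill in a missing modality by sampling it given the joint representation.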
==Motivation==