<!--- Don't mess with this line! ---><!--- Write your article below this line --->
Information in the real world usually comes in different modalities. For example, images are usually associated with tags and text explanations, and text contains images to more clearly express the main idea of the article. Different modalities are characterized by very different statistical properties. For instance, images are usually represented as [[pixel]] intensities or outputs of [[Feature extraction|feature extractors]], while texts are represented as discrete word-count vectors. Because of the distinct statistical properties of different information sources, it is important to discover the relationships between modalities. '''Multimodal learning''' is a good model for representing the joint representations of different modalities. A '''multimodal learning model''' is also capable of supplying a missing modality based on the observed ones. One such model combines two [[Deep Boltzmann Machines|deep Boltzmann machines]], each corresponding to one modality. An additional hidden layer is placed on top of the two Boltzmann machines to produce the joint representation.
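The architecture described above can be illustrated with a minimal sketch. This is not a trained deep Boltzmann machine: it shows only a single feed-forward pass in which an image pathway and a text pathway are each mapped to modality-specific hidden units, whose concatenation feeds a shared joint layer. All weights, layer sizes, and input vectors here are random placeholders chosen for illustration.

```python
import math
import random

random.seed(0)

def sigmoid(x):
    """Logistic activation used for Boltzmann-machine-style hidden units."""
    return 1.0 / (1.0 + math.exp(-x))

def layer(inputs, weights):
    """Activations of one hidden layer: sigmoid of each weighted sum."""
    return [sigmoid(sum(w * v for w, v in zip(row, inputs)))
            for row in weights]

def rand_weights(n_out, n_in):
    """Placeholder random weight matrix (no training is performed)."""
    return [[random.uniform(-0.1, 0.1) for _ in range(n_in)]
            for _ in range(n_out)]

# Hypothetical dimensions: 8 pixel intensities, 6 word-count features.
W_img = rand_weights(4, 8)    # image-specific pathway
W_txt = rand_weights(4, 6)    # text-specific pathway
W_joint = rand_weights(3, 8)  # joint layer over the concatenated codes

image = [random.random() for _ in range(8)]  # e.g. pixel intensities
text = [random.random() for _ in range(6)]   # e.g. word counts

h_img = layer(image, W_img)          # modality-specific representation
h_txt = layer(text, W_txt)           # modality-specific representation
joint = layer(h_img + h_txt, W_joint)  # shared joint representation
```

In an actual multimodal deep Boltzmann machine, the connections are undirected and the joint layer is used both to combine the modalities and, by running inference the other way, to infer a missing modality from the observed one; the sketch above shows only the combining direction.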