==Motivation==
Many models and algorithms have been developed to retrieve and classify a single type of data, e.g. images or text. However, data often comes in several modalities (distinct types of data, such as images and text), each of which carries different information. For example, it is very common to caption an image to convey information not presented by the image itself. Similarly, it is sometimes more straightforward to use an image to describe information that may not be obvious from text. As a result, if different words appear with similar images, these words are very likely used to describe the same thing. Conversely, if the same words are used with different images, these images may represent the same object. Thus, it is important to devise a model that can jointly represent the information so that it captures the correlation structure between different modalities. Moreover, it should also be able to recover missing modalities given observed ones, e.g. predicting a possible image object from a text description. The '''Multimodal Deep Boltzmann Machine model''' satisfies the above purposes.
==Background: Boltzmann machine==