{{excerpt|Large language model|Multimodality}}

== Multimodal deep Boltzmann machines ==
A [[Boltzmann machine]] is a type of [[stochastic neural network]] invented by [[Geoffrey Hinton]] and [[Terry Sejnowski]] in 1985. Boltzmann machines can be seen as the [[stochastic process|stochastic]], [[generative model|generative]] counterpart of [[Hopfield net]]s. They are named after the [[Boltzmann distribution]] in statistical mechanics. The units in a Boltzmann machine are divided into two groups: visible units and hidden units. Each unit acts like a neuron with a binary output that indicates whether it is activated.<ref>{{Cite web |last=Dey |first=Victor |date=2021-09-03 |title=Beginners Guide to Boltzmann Machine |url=https://analyticsindiamag.com/beginners-guide-to-boltzmann-machines/ |access-date=2024-03-02 |website=Analytics India Magazine |language=en-US}}</ref>

General Boltzmann machines allow connections between any pair of units. However, learning in general Boltzmann machines is impractical because the computation time grows exponentially with the size of the machine{{Citation needed|date=November 2022}}. A more efficient architecture is the [[restricted Boltzmann machine]], in which connections are allowed only between hidden units and visible units; it is described in the next section.

Multimodal deep Boltzmann machines can process and learn from different types of information, such as images and text, simultaneously. One way to do this is to use a separate deep Boltzmann machine for each modality, for example one for images and one for text, and to join them with an additional hidden layer on top.<ref>{{cite web |year=2014 |title=Multimodal Learning with Deep Boltzmann Machine |url=http://www.jmlr.org/papers/volume15/srivastava14b/srivastava14b.pdf |url-status=live |archive-url=https://web.archive.org/web/20150621055730/http://jmlr.org/papers/volume15/srivastava14b/srivastava14b.pdf |archive-date=2015-06-21 |access-date=2015-06-14}}</ref>
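The following Python sketch illustrates a restricted Boltzmann machine, assuming binary units throughout. Even for a restricted machine, computing the normalizing constant (partition function) exactly means enumerating all 2<sup>n<sub>v</sub> + n<sub>h</sub></sup> joint states, which the brute-force <code>partition_function</code> makes explicit. The training rule in <code>cd1_update</code> is one-step contrastive divergence, a common approximation that is not described in this article. <code>joint_hidden_probs</code> is a hypothetical, bottom-up-only simplification of the shared top layer joining two modality-specific pathways; full deep Boltzmann machine inference would also pass information top-down. All function and parameter names are illustrative.

```python
import itertools
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def rbm_energy(v, h, W, b, c):
    """E(v, h) = -sum_i b_i v_i - sum_j c_j h_j - sum_ij v_i W_ij h_j.
    No visible-visible or hidden-hidden terms: that is the 'restriction'."""
    e = -sum(bi * vi for bi, vi in zip(b, v))
    e -= sum(cj * hj for cj, hj in zip(c, h))
    e -= sum(v[i] * W[i][j] * h[j] for i in range(len(v)) for j in range(len(h)))
    return e


def partition_function(W, b, c):
    """Brute-force Z over all 2**(n_v + n_h) binary states.
    The work doubles with every added unit, so this only suits tiny models."""
    z = 0.0
    for v in itertools.product((0, 1), repeat=len(b)):
        for h in itertools.product((0, 1), repeat=len(c)):
            z += math.exp(-rbm_energy(v, h, W, b, c))
    return z


def hidden_probs(v, W, c):
    """p(h_j = 1 | v); hidden units are conditionally independent given v."""
    return [sigmoid(c[j] + sum(v[i] * W[i][j] for i in range(len(v))))
            for j in range(len(c))]


def visible_probs(h, W, b):
    """p(v_i = 1 | h); visible units are conditionally independent given h."""
    return [sigmoid(b[i] + sum(W[i][j] * h[j] for j in range(len(h))))
            for i in range(len(b))]


def sample(probs, rng):
    """Draw a binary state from independent Bernoulli probabilities."""
    return [1 if rng.random() < p else 0 for p in probs]


def cd1_update(v0, W, b, c, lr, rng):
    """One contrastive-divergence (CD-1) step; updates W, b, c in place."""
    ph0 = hidden_probs(v0, W, c)
    h0 = sample(ph0, rng)
    v1 = sample(visible_probs(h0, W, b), rng)  # one Gibbs step back down
    ph1 = hidden_probs(v1, W, c)
    for i in range(len(b)):
        for j in range(len(c)):
            W[i][j] += lr * (v0[i] * ph0[j] - v1[i] * ph1[j])
        b[i] += lr * (v0[i] - v1[i])
    for j in range(len(c)):
        c[j] += lr * (ph0[j] - ph1[j])


def joint_hidden_probs(h_img, h_txt, W_img, W_txt, c_joint):
    """Hypothetical shared top layer: each joint unit receives input from the
    top layers of both the image pathway and the text pathway."""
    return [sigmoid(c_joint[k]
                    + sum(h_img[i] * W_img[i][k] for i in range(len(h_img)))
                    + sum(h_txt[i] * W_txt[i][k] for i in range(len(h_txt))))
            for k in range(len(c_joint))]


# With all-zero parameters every state has energy 0, so Z = 2**(3 + 2).
W = [[0.0] * 2 for _ in range(3)]
print(partition_function(W, [0.0] * 3, [0.0] * 2))  # 32.0
```

Enumerating states is used here only to make the exponential cost concrete; practical training avoids computing the partition function, which is why approximations such as contrastive divergence are commonly used.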
