Many models and algorithms have been implemented to retrieve and classify a single type of data, e.g. images or text. However, data usually come in different modalities (distinct types or channels of information, such as image, text, or audio) which carry different information. For example, it is very common to caption an image to convey information not presented in the image itself. Similarly, sometimes it is more straightforward to use an image to describe information that is not obvious from text. As a result, if different words appear in similar images, then these words likely describe the same thing; conversely, if a word is used to describe seemingly dissimilar images, then these images may represent the same object. Thus, when dealing with multimodal data, it is important to use a model that can jointly represent the information, so that the model captures the correlation structure between the different modalities. Moreover, such a model should also be able to recover missing modalities given observed ones (e.g. predicting a possible image object from a text description). The '''multimodal deep Boltzmann machine''' satisfies both purposes.
== Multimodal transformers ==
{{excerpt|Transformer (machine learning model)|Multimodality}}

== Multimodal deep Boltzmann machines ==
=== Background ===
A [[Boltzmann machine]] is a type of stochastic neural network invented by [[Geoffrey Hinton]] and [[Terry Sejnowski]] in 1985. Boltzmann machines can be seen as the [[stochastic process|stochastic]], [[generative model|generative]] counterpart of [[Hopfield net]]s. They are named after the [[Boltzmann distribution]] in statistical mechanics. The units in Boltzmann machines are divided into two groups: visible units and hidden units. General Boltzmann machines allow connections between any units. However, learning is impractical with general Boltzmann machines because the computational time grows exponentially with the size of the machine{{Citation needed|date=November 2022}}. A more efficient architecture, the '''[[restricted Boltzmann machine]]''', allows connections only between a hidden unit and a visible unit; it is described in the next section.
==== Restricted Boltzmann machine ====
A restricted Boltzmann machine<ref>{{cite web|url=https://www.cs.cmu.edu/~bhiksha/courses/deeplearning/Fall.2014/pdfs/Smolensky.1986.pdf|title=Restricted Boltzmann Machine|year=1986|access-date=2019-08-29|archive-date=2016-03-03|archive-url=https://web.archive.org/web/20160303223045/http://www.cs.cmu.edu/~bhiksha/courses/deeplearning/Fall.2014/pdfs/Smolensky.1986.pdf|url-status=live}}</ref> is an undirected [[graphical model]] with stochastic visible variables and stochastic hidden variables. Each visible variable is connected to each hidden variable, with no connections within a layer. The energy function of the model is defined as
:<math> E(\mathbf v,\mathbf h;\theta) = -\sum_{i=1}^D\sum_{j=1}^{F}W_{ij}v_ih_j -\sum_{i=1}^Db_iv_i -\sum_{j=1}^Fa_jh_j</math>
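For illustration, the energy of a joint configuration can be computed directly from this definition. The following NumPy sketch (variable names are illustrative, not taken from the cited sources) evaluates <math>E(\mathbf v,\mathbf h;\theta)</math>:

<syntaxhighlight lang="python">
import numpy as np

def rbm_energy(v, h, W, b, a):
    """Energy E(v, h) = -v^T W h - b^T v - a^T h of a binary RBM.
    v: (D,) visible units, h: (F,) hidden units,
    W: (D, F) weights, b: (D,) visible biases, a: (F,) hidden biases."""
    return -(v @ W @ h) - b @ v - a @ h
</syntaxhighlight>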
The derivative of the log-likelihood with respect to the model parameters can be decomposed as the difference between the ''data-dependent expectation'' and the ''model's expectation''.
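In practice this gradient is usually approximated; one common scheme (not necessarily the procedure used in the sources cited here) is single-step contrastive divergence, sketched below for one training vector with illustrative names:

<syntaxhighlight lang="python">
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cd1_gradient(v_data, W, b, a):
    """Estimate dlog p(v)/dW as the data-dependent expectation
    minus a one-step Gibbs approximation of the model's expectation."""
    p_h = sigmoid(v_data @ W + a)            # data-dependent phase
    positive = np.outer(v_data, p_h)
    h = (rng.random(p_h.shape) < p_h) * 1.0  # sample hidden units
    p_v = sigmoid(W @ h + b)                 # reconstruct visible units
    v_model = (rng.random(p_v.shape) < p_v) * 1.0
    negative = np.outer(v_model, sigmoid(v_model @ W + a))
    return positive - negative               # ascend the log-likelihood
</syntaxhighlight>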
==== Gaussian-Bernoulli RBM ====
:<math> E(\mathbf v,\mathbf h;\theta) = \sum_{i=1}^D\frac{(v_i-b_i)^2}{2\sigma_i^2} -\sum_{i=1}^D\sum_{j=1}^{F}\frac{v_i}{\sigma_i}W_{ij}h_j -\sum_{j=1}^Fa_jh_j</math>
where <math>\theta = \{\mathbf a,\mathbf b,\mathbf W,\boldsymbol\sigma\}</math> are the model parameters. The joint distribution is defined the same way as in the [[#Restricted Boltzmann machine|restricted Boltzmann machine]]. The conditional distributions now become
In the Gaussian-Bernoulli RBM, each visible unit conditioned on the hidden units is modeled by a Gaussian distribution.
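A minimal sketch of sampling the visible units from this Gaussian conditional (names are illustrative):

<syntaxhighlight lang="python">
import numpy as np

rng = np.random.default_rng(0)

def sample_visible(h, W, b, sigma):
    """Sample v_i ~ N(b_i + sigma_i * sum_j W_ij h_j, sigma_i^2)
    given a binary hidden vector h."""
    mean = b + sigma * (W @ h)
    return rng.normal(mean, sigma)
</syntaxhighlight>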
==== Replicated Softmax Model ====
The '''replicated softmax model''' is another variant of the restricted Boltzmann machine, used to model word-count vectors in a document. For a document containing <math>M</math> words drawn from a dictionary of size <math>K</math>, let <math>\hat v_k</math> denote the count of the <math>k</math>-th word in the document. The energy of the state <math>\{\mathbf V,\mathbf h\}</math> is then defined as
:<math>E(\mathbf V,\mathbf h) = -\sum_{j=1}^{F}\sum_{k=1}^{K}W_{jk}\hat v_kh_j - \sum_{k=1}^Kb_k\hat v_k - M\sum_{j=1}^{F}a_jh_j</math>
The conditional distributions are given by
:<math>p(v_{ik} = 1|\mathbf h) = \frac{\mathrm{exp}(b_k + \sum_{j=1}^Fh_jW_{jk})}{\sum_{q=1}^{K}\mathrm{exp}(b_q + \sum_{j=1}^Fh_jW_{jq})}</math>
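This conditional is a softmax over the dictionary, shared by every word position in the document. A short sketch (illustrative names, with <code>W</code> of shape (F, K)):

<syntaxhighlight lang="python">
import numpy as np

def word_distribution(h, W, b):
    """p(word = k | h) for the replicated softmax model."""
    logits = b + h @ W        # b_k + sum_j h_j W_jk for every k
    logits -= logits.max()    # subtract max for numerical stability
    p = np.exp(logits)
    return p / p.sum()
</syntaxhighlight>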
=== Deep Boltzmann machines ===
A '''[[deep Boltzmann machine]]''' has a sequence of layers of hidden units, with connections only between adjacent layers. For a network with three hidden layers, the energy of the joint configuration <math>\{\mathbf v,\mathbf h^{(1)},\mathbf h^{(2)},\mathbf h^{(3)}\}</math> is defined (with bias terms omitted) as
:<math>E(\mathbf v,\mathbf h^{(1)},\mathbf h^{(2)},\mathbf h^{(3)};\theta) = -\sum_{ij}W_{ij}^{(1)}v_ih_j^{(1)} -\sum_{jl}W_{jl}^{(2)}h_j^{(1)}h_l^{(2)} -\sum_{lm}W_{lm}^{(3)}h_l^{(2)}h_m^{(3)}</math>
The probability that the model assigns to a visible vector <math>\mathbf v</math> is
:<math>P(\mathbf{v};\theta) = \frac{1}{\mathcal{Z}(\theta)}\sum_{\mathbf h}\mathrm{exp}(-E(\mathbf v,\mathbf h^{(1)},\mathbf h^{(2)},\mathbf h^{(3)};\theta))</math>
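Because <math>\mathcal{Z}(\theta)</math> sums over all configurations, this probability can be evaluated exactly only for toy-sized models. The brute-force sketch below (illustrative, biases omitted as in the energy above) makes the definition concrete:

<syntaxhighlight lang="python">
import numpy as np
from itertools import product

def binary_vectors(n):
    return [np.array(bits, dtype=float) for bits in product([0, 1], repeat=n)]

def energy(v, h1, h2, h3, W1, W2, W3):
    # Three-layer DBM energy, bias terms omitted.
    return -(v @ W1 @ h1 + h1 @ W2 @ h2 + h2 @ W3 @ h3)

def probability(v, W1, W2, W3):
    """Exact P(v) by enumerating every hidden configuration and
    the partition function Z -- exponential in the model size."""
    def unnormalized(v_):
        return sum(np.exp(-energy(v_, h1, h2, h3, W1, W2, W3))
                   for h1 in binary_vectors(W1.shape[1])
                   for h2 in binary_vectors(W2.shape[1])
                   for h3 in binary_vectors(W3.shape[1]))
    Z = sum(unnormalized(v_) for v_ in binary_vectors(W1.shape[0]))
    return unnormalized(v) / Z
</syntaxhighlight>

The exponential cost of this enumeration is why the approximate inference and learning procedures described below are needed in practice.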
=== Multimodal deep Boltzmann machines ===
A '''multimodal deep Boltzmann machine''' models joint image–text inputs by combining two DBMs, one per modality: the image pathway is built from Gaussian-Bernoulli units, the text pathway from a replicated softmax model, and the two pathways are joined by an additional layer of hidden units on top. The joint distribution over the multimodal input is
:<math>P(\mathbf v^m,\mathbf v^t;\theta) = \sum_{\mathbf h^{(2m)},\mathbf h^{(2t)},\mathbf h^{(3)}}P(\mathbf h^{(2m)},\mathbf h^{(2t)},\mathbf h^{(3)})\left(\sum_{\mathbf h^{(1m)}}P(\mathbf v^m,\mathbf h^{(1m)}|\mathbf h^{(2m)})\right)\left(\sum_{\mathbf h^{(1t)}}P(\mathbf v^t,\mathbf h^{(1t)}|\mathbf h^{(2t)})\right)</math>
The conditional distribution over the image-pathway visible units, for example, is Gaussian:
:<math>p(v_i^m|\mathbf h^{(1m)}) \sim \mathcal{N}(\sigma_i\sum_{j=1}^{F_1^m}W_{ij}^{(1m)}h_j^{(1m)} + b_i^m,\sigma_i^2)</math>
==== Inference and learning ====
Exact maximum likelihood learning in this model is intractable, but approximate learning of DBMs can be carried out by using a variational approach, where mean-field inference is used to estimate data-dependent expectations and an MCMC-based stochastic approximation procedure is used to approximate the model's expected sufficient statistics.<ref>{{cite web|url=http://icml2008.cs.helsinki.fi/papers/638.pdf}}</ref>
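The mean-field step can be written as a fixed-point iteration. The following sketch (illustrative, biases omitted, for the three-layer unimodal DBM above) updates the variational parameters <math>\boldsymbol\mu^{(1)},\boldsymbol\mu^{(2)},\boldsymbol\mu^{(3)}</math>:

<syntaxhighlight lang="python">
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def mean_field(v, W1, W2, W3, n_iters=50):
    """Fixed-point updates for a fully factorized variational
    posterior q(h) over the three hidden layers of a DBM."""
    mu1 = np.full(W1.shape[1], 0.5)
    mu2 = np.full(W2.shape[1], 0.5)
    mu3 = np.full(W3.shape[1], 0.5)
    for _ in range(n_iters):
        # each layer receives input from both of its neighbours
        mu1 = sigmoid(v @ W1 + W2 @ mu2)
        mu2 = sigmoid(mu1 @ W2 + W3 @ mu3)
        mu3 = sigmoid(mu2 @ W3)
    return mu1, mu2, mu3
</syntaxhighlight>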
==See also==
*[[Hopfield network]]