In statistics, the [[Expectation–maximization algorithm|EM (expectation–maximization)]] algorithm handles latent variables, while [[Mixture model#Gaussian mixture model|GMM]] is the Gaussian mixture model.
The picture below shows the [[red blood cell]] hemoglobin concentration and red blood cell volume data of two groups of people: the Anemia group and the Control group (i.e. the group of people without [[anemia]]). As expected, people with anemia have lower red blood cell volume and lower red blood cell [[hemoglobin]] concentration than those without anemia.
[[File:Labeled GMM.png|thumb|GMM model with labels]]
<math>x</math> is a [[random vector]] such as <math>x:=\big(\text{red blood cell volume}, \text{red blood cell hemoglobin concentration}\big)</math>, and from medical studies <math>x</math> is assumed to be normally distributed within each group.
<math>z</math> denotes the group to which <math>x</math> belongs, with <math>z_i = 0</math> when <math>x_i</math> belongs to the Anemia group and <math>z_i=1</math> when <math>x_i</math> belongs to the Control group. Also, <math>z \sim \operatorname{Categorical}(k, \phi)</math>, where <math>k=2</math>, <math>\phi_j \geq 0</math>, and <math>\sum_{j=1}^k\phi_j=1</math>. See [[Categorical distribution]].
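For illustration, the generative model above can be sampled directly. The following is a minimal sketch in which the parameter values for <math>\phi</math>, <math>\mu_j</math>, and <math>\Sigma_j</math> are made up for demonstration and are not taken from the medical data:
<syntaxhighlight lang="python">
import numpy as np

rng = np.random.default_rng(0)

# Made-up parameters for k = 2 groups (0 = Anemia, 1 = Control), for illustration only.
phi = np.array([0.4, 0.6])                      # mixing proportions, phi_j >= 0, summing to 1
mu = np.array([[80.0, 31.0], [95.0, 34.0]])     # per-group means of (volume, hemoglobin concentration)
sigma = np.array([[[25.0, 2.0], [2.0, 1.5]],
                  [[20.0, 1.0], [1.0, 1.2]]])   # per-group covariance matrices

m = 500
z = rng.choice(2, size=m, p=phi)                                      # z_i ~ Categorical(k = 2, phi)
x = np.array([rng.multivariate_normal(mu[j], sigma[j]) for j in z])   # x_i | z_i = j ~ N(mu_j, Sigma_j)
</syntaxhighlight>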
When the group labels <math>z^{(i)}</math> are known, the maximum-likelihood estimates of the group mean and covariance are:
: <math>\mu_j =\frac{\sum_{i=1}^m 1\{z^{(i)}=j\} x^{(i)}}{\sum_{i=1}^{m} 1\left\{z^{(i)}=j\right\}}</math>
: <math>\Sigma_j =\frac{\sum_{i=1}^m 1\{z^{(i)}=j\} (x^{(i)}-\mu_j)(x^{(i)}-\mu_j)^T}{\sum_{i=1}^m 1\{z^{(i)}=j\}}</math><ref name="Stanford CS229 Notes">{{cite web |last1=Ng |first1=Andrew |title=CS229 Lecture notes}}</ref>
Thus, if <math>z_i</math> is known, estimating the parameters by [[maximum likelihood estimation]] is quite simple. But if <math>z_i</math> is unknown, it is much more complicated.<ref name="Machine Learning —Expectation-Maximization Algorithm (EM)">{{cite web |last1=Hui |first1=Jonathan |title=Machine Learning —Expectation-Maximization Algorithm (EM) |url=https://medium.com/@jonathan_hui/machine-learning-expectation-maximization-algorithm-em-2e954cb76959 |website=Medium |language=en |date=13 October 2019}}</ref>
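As a sketch of the known-label case, the estimates above amount to per-group sample averages. The function below is illustrative and could be applied to arrays such as the hypothetical <code>x</code>, <code>z</code> sampled in the example above:
<syntaxhighlight lang="python">
import numpy as np

def mle_known_labels(x, z, k=2):
    """Maximum-likelihood estimates of (phi, mu, Sigma) when the group labels z_i are observed."""
    phi = np.array([(z == j).mean() for j in range(k)])          # fraction of samples in group j
    mu = np.array([x[z == j].mean(axis=0) for j in range(k)])    # per-group sample mean
    sigma = []
    for j in range(k):
        d = x[z == j] - mu[j]                                    # centred samples of group j
        sigma.append(d.T @ d / (z == j).sum())                   # per-group sample covariance
    return phi, mu, np.array(sigma)
</syntaxhighlight>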
<ref name="Multivariate normal distribution">{{cite web |last1=Tong |first1=Y. L. |title=Multivariate normal distribution |url=https://en.wikipedia.org/wiki/Multivariate_normal_distribution |website=Wikipedia |language=en |date=2 July 2020}}</ref>{{Circular reference|date=July 2020}}
In [[machine learning]], the latent variable <math>z</math> is considered a latent pattern lying under the data, which the observer cannot see directly. <math>x_i</math> is the known data, while <math>\phi, \mu, \Sigma</math> are the parameters of the model. With the EM algorithm, some underlying pattern <math>z</math> in the data <math>x_i</math> can be found, along with estimates of the parameters. This wide applicability is what makes the EM algorithm so important in machine learning.
[[File:GMM Training on artificial data.gif|thumb|alt=Animation of updates to a GMM at each update to the distribution in the EM algorithm.|GMM Training on artificial data]]
== EM algorithm in GMM ==
The algorithm alternates between the following two steps, repeating until convergence:
1. (E-step) For each <math>i, j</math>, set
: <math>w_{j}^{(i)}:=p\left(z^{(i)}=j | x^{(i)} ; \phi, \mu, \Sigma\right)</math>
2. (M-step) Update the parameters
: <math>\phi_{j} :=\frac{1}{m} \sum_{i=1}^{m} w_{j}^{(i)}</math>
: <math>\mu_{j} :=\frac{\sum_{i=1}^{m} w_{j}^{(i)} x^{(i)}}{\sum_{i=1}^{m} w_{j}^{(i)}}</math>
: <math>\Sigma_{j} :=\frac{\sum_{i=1}^{m} w_{j}^{(i)}\left(x^{(i)}-\mu_{j}\right)\left(x^{(i)}-\mu_{j}\right)^{T}}{\sum_{i=1}^{m} w_{j}^{(i)}}</math>
<ref name="Stanford CS229 Notes">{{cite web |last1=Ng |first1=Andrew |title=CS229 Lecture notes |url=
With [[Bayes' Rule|Bayes' rule]], the E-step quantity can be written as
: <math>w_{j}^{(i)} = \frac{p\left(x^{(i)} | z^{(i)}=j ; \mu, \Sigma\right) p\left(z^{(i)}=j ; \phi\right)}{\sum_{l=1}^{k} p\left(x^{(i)} | z^{(i)}=l ; \mu, \Sigma\right) p\left(z^{(i)}=l ; \phi\right)}</math>
According to the GMM setting, these two terms are given by:
: <math>p\left(x^{(i)} | z^{(i)}=j ; \mu, \Sigma\right)=\frac{1}{(2 \pi)^{n / 2}\left|\Sigma_{j}\right|^{1 / 2}} \exp \left(-\frac{1}{2}\left(x^{(i)}-\mu_{j}\right)^{T} \Sigma_{j}^{-1}\left(x^{(i)}-\mu_{j}\right)\right)</math>
: <math>p\left(z^{(i)}=j ; \phi\right)=\phi_j</math>
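Combining these two terms with Bayes' rule gives the E-step responsibilities. A minimal sketch using SciPy's multivariate normal density (variable names are illustrative):
<syntaxhighlight lang="python">
import numpy as np
from scipy.stats import multivariate_normal

def e_step(x, phi, mu, sigma):
    """E-step: w[i, j] = p(z_i = j | x_i; phi, mu, Sigma) via Bayes' rule."""
    m, k = len(x), len(phi)
    w = np.empty((m, k))
    for j in range(k):
        # Numerator of Bayes' rule: p(x_i | z_i = j; mu_j, Sigma_j) * p(z_i = j; phi)
        w[:, j] = multivariate_normal.pdf(x, mean=mu[j], cov=sigma[j]) * phi[j]
    w /= w.sum(axis=1, keepdims=True)          # denominator: normalise over all components
    return w
</syntaxhighlight>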