Multimodal deep Boltzmann machines can process and learn from different types of information, such as images and text, simultaneously. One notable approach uses a separate deep Boltzmann machine for each modality, for example one for images and one for text, joined at an additional top hidden layer.<ref>{{cite web |last1=Srivastava |first1=Nitish |last2=Salakhutdinov |first2=Ruslan |year=2014 |title=Multimodal Learning with Deep Boltzmann Machines |url=http://www.jmlr.org/papers/volume15/srivastava14b/srivastava14b.pdf |url-status=live |archive-url=https://web.archive.org/web/20150621055730/http://jmlr.org/papers/volume15/srivastava14b/srivastava14b.pdf |archive-date=2015-06-21 |access-date=2015-06-14}}</ref>
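A minimal sketch of this joint topology, written here as a plain feedforward network in [[PyTorch]] rather than an actual Boltzmann machine, with all layer sizes chosen arbitrarily for illustration:

<syntaxhighlight lang="python">
import torch
import torch.nn as nn

class JointMultimodalNet(nn.Module):
    """Two modality-specific pathways joined at a shared top hidden layer.

    Illustrative feedforward analogue of the multimodal DBM topology;
    it is not a Boltzmann machine, and every dimension is arbitrary.
    """
    def __init__(self, image_dim=4096, text_dim=2000,
                 hidden_dim=1024, joint_dim=2048):
        super().__init__()
        # One pathway per modality, e.g. image features and text features.
        self.image_path = nn.Sequential(
            nn.Linear(image_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, hidden_dim), nn.ReLU(),
        )
        self.text_path = nn.Sequential(
            nn.Linear(text_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, hidden_dim), nn.ReLU(),
        )
        # The additional top hidden layer that joins the two pathways.
        self.joint = nn.Linear(2 * hidden_dim, joint_dim)

    def forward(self, image_feats, text_feats):
        h_img = self.image_path(image_feats)
        h_txt = self.text_path(text_feats)
        return torch.relu(self.joint(torch.cat([h_img, h_txt], dim=-1)))
</syntaxhighlight>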
 
==Applications==
Multimodal machine learning has numerous applications across various domains:
 
=== Cross-modal retrieval ===
Cross-modal retrieval allows users to search for data across different modalities (e.g., retrieving images based on text descriptions), improving multimedia search engines and content recommendation systems. Models like [[Contrastive Language-Image Pre-training|CLIP]] facilitate efficient, accurate retrieval by embedding data in a shared space, demonstrating strong performance even in zero-shot settings.<ref>{{Cite arXiv |last1=Hendriksen |first1=Mariya |last2=Vakulenko |first2=Svitlana |last3=Kuiper |first3=Ernst |last4=de Rijke |first4=Maarten |date=2023 |title=Scene-centric vs. Object-centric Image-Text Cross-modal Retrieval: A Reproducibility Study |class=cs.CV |eprint=2301.05174}}</ref>
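A sketch of text-to-image retrieval with the openly released CLIP weights, using the Hugging Face <code>transformers</code> library; the checkpoint name is real, but the three-image corpus and the query are placeholders:

<syntaxhighlight lang="python">
# Text-to-image retrieval in CLIP's shared embedding space.
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

# Toy candidate corpus; replace with a real image collection.
paths = ["cat.jpg", "dog.jpg", "car.jpg"]
images = [Image.open(p) for p in paths]
query = "a photo of a dog"

inputs = processor(text=[query], images=images,
                   return_tensors="pt", padding=True)
with torch.no_grad():
    outputs = model(**inputs)

# logits_per_text holds scaled cosine similarities between the query
# and each candidate image; the argmax is the best match.
best = outputs.logits_per_text.argmax(dim=-1).item()
print("best match:", paths[best])
</syntaxhighlight>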
 
=== Classification and missing data retrieval ===
Multimodal deep Boltzmann machines have outperformed traditional models such as [[support vector machine]]s and [[latent Dirichlet allocation]] on classification tasks, and they can predict missing data in multimodal datasets of images and text, such as inferring a text representation for an image that lacks one.
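As an illustration of missing-data retrieval, a simple learned mapping can predict the features of an absent modality from the one that is present. The sketch below substitutes a ridge regression on synthetic features for the Boltzmann machine's sampling-based inference:

<syntaxhighlight lang="python">
# Predicting a missing modality's features from the observed one.
# Synthetic data; a stand-in for DBM inference, not the paper's method.
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)
image_feats = rng.normal(size=(500, 64))   # paired training examples
text_feats = (image_feats @ rng.normal(size=(64, 32))
              + 0.1 * rng.normal(size=(500, 32)))

# Learn an image -> text mapping from fully observed pairs.
imputer = Ridge(alpha=1.0).fit(image_feats, text_feats)

new_image = rng.normal(size=(1, 64))       # its text features are missing
predicted_text = imputer.predict(new_image)
print(predicted_text.shape)                # (1, 32)
</syntaxhighlight>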
 
=== Healthcare diagnostics ===
Multimodal models integrate medical imaging, genomic data, and patient records to improve diagnostic accuracy and early disease detection, especially in cancer screening.<ref>{{cite news |last1=Quach |first1=Katyanna |title=Harvard boffins build multimodal AI system to predict cancer |url=https://www.theregister.com/2022/08/09/ai_cancer_multimodal/ |access-date=16 September 2022 |work=The Register |language=en |archive-date=20 September 2022 |archive-url=https://web.archive.org/web/20220920163859/https://www.theregister.com/2022/08/09/ai_cancer_multimodal/ |url-status=live }}</ref><ref>{{cite journal |last1=Chen |first1=Richard J. |last2=Lu |first2=Ming Y. |last3=Williamson |first3=Drew F. K. |last4=Chen |first4=Tiffany Y. |last5=Lipkova |first5=Jana |last6=Noor |first6=Zahra |last7=Shaban |first7=Muhammad |last8=Shady |first8=Maha |last9=Williams |first9=Mane |last10=Joo |first10=Bumjin |last11=Mahmood |first11=Faisal |title=Pan-cancer integrative histology-genomic analysis via multimodal deep learning |journal=Cancer Cell |date=8 August 2022 |volume=40 |issue=8 |pages=865–878.e6 |doi=10.1016/j.ccell.2022.07.004 |pmid=35944502 |pmc=10397370 |s2cid=251456162 |issn=1535-6108 |doi-access=free }} Teaching hospital press release: {{cite news |title=New AI technology integrates multiple data types to predict cancer outcomes |url=https://medicalxpress.com/news/2022-08-ai-technology-multiple-cancer-outcomes.html |access-date=18 September 2022 |work=[[Brigham and Women's Hospital]] via medicalxpress.com |language=en |archive-date=20 September 2022 |archive-url=https://web.archive.org/web/20220920172825/https://medicalxpress.com/news/2022-08-ai-technology-multiple-cancer-outcomes.html |url-status=live }}</ref><ref name="Shi2019">{{Cite arXiv |last1=Shi |first1=Yuge |last2=Siddharth |first2=N. |last3=Paige |first3=Brooks |last4=Torr |first4=Philip HS |year=2019 |title=Variational Mixture-of-Experts Autoencoders for Multi-Modal Deep Generative Models |eprint=1911.03393 |class=cs.LG}}</ref>
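A minimal sketch of the late-fusion pattern such systems commonly use, with synthetic stand-ins for the imaging, genomic, and patient-record features; the dimensions and the scikit-learn classifier are illustrative choices, not the pipelines in the cited work:

<syntaxhighlight lang="python">
# Late fusion of heterogeneous clinical modalities for a diagnostic label.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(1)
n = 400
imaging = rng.normal(size=(n, 128))  # e.g. features from a scan encoder
genomic = rng.normal(size=(n, 64))   # e.g. mutation/expression features
records = rng.normal(size=(n, 16))   # e.g. structured record fields
labels = (imaging[:, 0] + genomic[:, 0] + records[:, 0] > 0).astype(int)

# Fuse by concatenating per-modality features, then classify.
fused = np.concatenate([imaging, genomic, records], axis=1)
X_tr, X_te, y_tr, y_te = train_test_split(fused, labels, random_state=0)
clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
print("held-out accuracy:", clf.score(X_te, y_te))
</syntaxhighlight>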
 
=== Content generation ===
Models like DALL·E generate images from textual descriptions, benefiting creative industries, while cross-modal retrieval enables dynamic multimedia search.<ref name="Shi2019" />
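DALL·E itself is not openly distributed, but a comparable open text-to-image diffusion model can be run in a few lines with the Hugging Face <code>diffusers</code> library; the checkpoint and prompt below are illustrative, and a CUDA-capable GPU is assumed:

<syntaxhighlight lang="python">
# Text-to-image generation with an open diffusion model.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
)
pipe = pipe.to("cuda")  # assumes a CUDA-capable GPU

image = pipe("a watercolor painting of a lighthouse at dawn").images[0]
image.save("lighthouse.png")
</syntaxhighlight>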
 
=== Robotics and HCI ===
Multimodal learning improves interaction in robotics and AI by integrating sensory inputs like speech, vision, and touch, aiding autonomous systems and human-computer interaction.
 
=== Emotion recognition ===
By combining visual, audio, and text data, multimodal systems enhance sentiment analysis and emotion recognition, with applications in customer service, social media, and marketing.
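One simple way to combine the modalities is decision-level fusion, where each unimodal classifier emits a sentiment score and the scores are merged by a weighted average; the scores and weights below are synthetic placeholders for real model outputs:

<syntaxhighlight lang="python">
# Decision-level fusion of per-modality sentiment scores.
# All numbers are placeholders for outputs of real unimodal models.
scores = {"vision": 0.72, "audio": 0.55, "text": 0.81}  # P(positive)
weights = {"vision": 0.3, "audio": 0.2, "text": 0.5}    # e.g. tuned on validation data

fused = sum(weights[m] * scores[m] for m in scores)
print("fused positive probability:", round(fused, 3))
print("prediction:", "positive" if fused >= 0.5 else "negative")
</syntaxhighlight>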
 
==See also==