Revision as of 20:47, 8 October 2024 by Scientia et sapientia (→Application), compared with revision as of 21:31, 8 October 2024 by Scientia et sapientia (→Application)
Multimodal machine learning has numerous applications across various domains:
* '''Cross-Modal Retrieval''': Cross-modal retrieval allows users to search for data across different modalities (e.g., retrieving images based on text descriptions), improving multimedia search engines and content recommendation systems. Models like [[Contrastive Language-Image Pre-training|CLIP]] facilitate efficient, accurate retrieval by embedding data from different modalities in a shared space, and perform strongly even in zero-shot settings.<ref>{{Cite arXiv |last1=Hendriksen |first1=Mariya |last2=Vakulenko |first2=Svitlana |last3=Kuiper |first3=Ernst |last4=de Rijke |first4=Maarten |date=2023 |title=Scene-centric vs. Object-centric Image-Text Cross-modal Retrieval: A Reproducibility Study |class=cs.CV |eprint=2301.05174}}</ref>
* '''Classification and Missing Data Retrieval''': Multimodal Deep Boltzmann Machines outperform traditional models like [[support vector machine]]s and [[latent Dirichlet allocation]] in classification tasks and can predict missing data in multimodal datasets, such as those combining images and text.
* '''Healthcare Diagnostics''': Multimodal models integrate medical imaging, genomic data, and patient records to improve diagnostic accuracy and early disease detection, especially in cancer screening.<ref>{{cite news |last1=Quach |first1=Katyanna |title=Harvard boffins build multimodal AI system to predict cancer |url=https://www.theregister.com/2022/08/09/ai_cancer_multimodal/ |access-date=16 September 2022 |work=The Register |language=en |archive-date=20 September 2022 |archive-url=https://web.archive.org/web/20220920163859/https://www.theregister.com/2022/08/09/ai_cancer_multimodal/ |url-status=live }}</ref><ref>{{cite journal |last1=Chen |first1=Richard J. |last2=Lu |first2=Ming Y. |last3=Williamson |first3=Drew F. K. |last4=Chen |first4=Tiffany Y. |last5=Lipkova |first5=Jana |last6=Noor |first6=Zahra |last7=Shaban |first7=Muhammad |last8=Shady |first8=Maha |last9=Williams |first9=Mane |last10=Joo |first10=Bumjin |last11=Mahmood |first11=Faisal |title=Pan-cancer integrative histology-genomic analysis via multimodal deep learning |journal=Cancer Cell |date=8 August 2022 |volume=40 |issue=8 |pages=865–878.e6 |doi=10.1016/j.ccell.2022.07.004 |pmid=35944502 |s2cid=251456162 |language=English |issn=1535-6108 |doi-access=free |pmc=10397370 }}
* Teaching hospital press release: {{cite news |title=New AI technology integrates multiple data types to predict cancer outcomes |url=https://medicalxpress.com/news/2022-08-ai-technology-multiple-cancer-outcomes.html |access-date=18 September 2022 |work=[[Brigham and Women's Hospital]] via medicalxpress.com |language=en |archive-date=20 September 2022 |archive-url=https://web.archive.org/web/20220920172825/https://medicalxpress.com/news/2022-08-ai-technology-multiple-cancer-outcomes.html |url-status=live }}</ref><ref>{{Cite arXiv |last1=Shi |first1=Yuge |last2=Siddharth |first2=N. |last3=Paige |first3=Brooks |last4=Torr |first4=Philip HS |year=2019 |title=Variational Mixture-of-Experts Autoencoders for Multi-Modal Deep Generative Models |eprint=1911.03393 |class=cs.LG}}</ref>
* '''Content Generation''': Models like DALL·E generate images from textual descriptions, benefiting creative industries, while cross-modal retrieval enables dynamic multimedia searches.<ref>{{Cite arXiv |last1=Shi |first1=Yuge |last2=Siddharth |first2=N. |last3=Paige |first3=Brooks |last4=Torr |first4=Philip HS |date=2019 |title=Variational Mixture-of-Experts Autoencoders for Multi-Modal Deep Generative Models |class=cs.LG |eprint=1911.03393}}</ref>
* '''Robotics and HCI''': Multimodal learning improves interaction in robotics and AI by integrating sensory inputs like speech, vision, and touch, aiding autonomous systems and human-computer interaction.
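
The shared-space retrieval described under cross-modal retrieval can be illustrated with a minimal sketch. It assumes that a text encoder and an image encoder have already mapped their inputs into the same vector space; the three-dimensional vectors, the file names, and the helper functions <code>cosine_similarity</code> and <code>retrieve</code> are hypothetical and written for illustration only. They are not part of CLIP or of any library, and real embeddings have hundreds of dimensions and come from trained encoders.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def retrieve(query_embedding, candidate_embeddings, top_k=1):
    """Rank candidates by cosine similarity to the query, best first."""
    scored = [
        (name, cosine_similarity(query_embedding, embedding))
        for name, embedding in candidate_embeddings.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Hypothetical image embeddings (stand-ins for an image encoder's output).
IMAGE_EMBEDDINGS = {
    "dog_photo.jpg": [0.9, 0.1, 0.0],
    "beach_photo.jpg": [0.1, 0.9, 0.2],
    "city_photo.jpg": [0.0, 0.2, 0.9],
}

# Hypothetical text embedding (stand-in for a text encoder's output
# for a query such as "a dog").
TEXT_QUERY = [0.8, 0.2, 0.1]

results = retrieve(TEXT_QUERY, IMAGE_EMBEDDINGS, top_k=2)
for name, score in results:
    print(f"{name}: {score:.3f}")
```

Because both modalities live in one space, retrieval reduces to a nearest-neighbour search: no image needs a text label, which is why such models can be queried with descriptions they were never explicitly trained to classify.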

Multimodal learning: Difference between revisions