{{Short description|Machine learning methods using multiple input modalities}}
{{machine learning}}
'''Multimodal learning''', in the context of [[machine learning]], is a type of [[deep learning]] that uses multiple [[Modality (human–computer interaction)|modalities]] of data, such as text, audio, or images.
In contrast, unimodal models process only one type of data, such as text (typically represented as [[feature vector|feature vectors]]) or images. Multimodal learning differs from simply combining unimodal models trained independently: it integrates information from different modalities within a single model in order to make better predictions.<ref>{{Cite web |last=Rosidi |first=Nate |date=March 27, 2023 |title=Multimodal Models Explained |url=https://www.kdnuggets.com/multimodal-models-explained |access-date=2024-06-01 |website=KDnuggets |language=en-US}}</ref>
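As a minimal sketch of one common approach (late fusion, in which each modality is encoded separately and the embeddings are concatenated before a shared prediction head), the following PyTorch example assumes hypothetical pre-extracted text and image feature vectors; the class name, dimensions, and layer sizes are illustrative, not taken from any cited source:

<syntaxhighlight lang="python">
# Illustrative sketch of late-fusion multimodal learning (hypothetical
# names and sizes): each modality gets its own encoder, the resulting
# embeddings are concatenated, and a shared head makes the prediction.
import torch
import torch.nn as nn

class LateFusionClassifier(nn.Module):
    def __init__(self, text_dim=300, image_dim=512, hidden=128, n_classes=10):
        super().__init__()
        # Separate encoders for each modality's feature vectors.
        self.text_encoder = nn.Sequential(nn.Linear(text_dim, hidden), nn.ReLU())
        self.image_encoder = nn.Sequential(nn.Linear(image_dim, hidden), nn.ReLU())
        # Shared head over the concatenated (fused) embeddings.
        self.head = nn.Linear(2 * hidden, n_classes)

    def forward(self, text_feats, image_feats):
        fused = torch.cat(
            [self.text_encoder(text_feats), self.image_encoder(image_feats)],
            dim=-1,
        )
        return self.head(fused)

model = LateFusionClassifier()
# A batch of 4 examples, each with a text and an image feature vector.
logits = model(torch.randn(4, 300), torch.randn(4, 512))
print(logits.shape)  # torch.Size([4, 10])
</syntaxhighlight>

Because both encoders are trained jointly against a single objective, the model can learn cross-modal correlations that independently trained unimodal models cannot.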
Large multimodal models, such as [[Google Gemini]] and [[GPT-4o]], have grown in popularity since 2023, offering greater versatility and a more robust understanding of real-world phenomena.<ref>{{Cite web |last=Zia |first=Tehseen |date=January 8, 2024 |title=Unveiling of Large Multimodal Models: Shaping the Landscape of Language Models in 2024 |url=https://www.unite.ai/unveiling-of-large-multimodal-models-shaping-the-landscape-of-language-models-in-2024/ |access-date=2024-06-01 |website=Unite.ai}}</ref>
==Motivation==
==References==
{{reflist}}
[[Category:Artificial neural networks]]
[[Category:Multimodal interaction]]