Originally designed for [[human–robot interaction]], Audio-Visual SLAM is a framework that fuses landmark features obtained from both the acoustic and visual modalities within an environment.<ref>{{Cite book|last1=Chau|first1=Aaron|last2=Sekiguchi|first2=Kouhei|last3=Nugraha|first3=Aditya Arie|last4=Yoshii|first4=Kazuyoshi|last5=Funakoshi|first5=Kotaro|title=2019 28th IEEE International Conference on Robot and Human Interactive Communication (RO-MAN)|chapter=Audio-Visual SLAM towards Human Tracking and Human-Robot Interaction in Indoor Environments|date=October 2019|___location=New Delhi, India|publisher=IEEE|pages=1–8|doi=10.1109/RO-MAN46459.2019.8956321|isbn=978-1-7281-2622-7|s2cid=210697281}}</ref> Human interaction is characterized by features perceived not only in the visual modality, but in the acoustic modality as well; as such, SLAM algorithms for human-centered robots and machines must account for both sets of features. An audio-visual framework estimates and maps the positions of human landmarks using visual features such as human pose and audio features such as human speech, and fuses these beliefs into a more robust map of the environment. For applications in mobile robotics (e.g. drones, service robots), it is valuable to use low-power, lightweight equipment such as monocular cameras or microelectronic microphone arrays. Audio-Visual SLAM also allows the two kinds of sensor to function complementarily: the narrow field of view, feature occlusions, and optical degradations common to lightweight visual sensors are offset by the full field of view and unobstructed feature representations inherent to audio sensors, while the susceptibility of audio sensors to reverberation, sound source inactivity, and noise is likewise compensated by fusing landmark beliefs from the visual modality. Complementary function between the audio and visual modalities in an environment can prove valuable for the creation of robots and machines that fully interact with human speech and human movement.
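As an illustrative sketch only (not the method of the paper cited above), the fusion step can be thought of as combining per-modality Gaussian beliefs about a landmark's position in information (inverse-covariance) form, so that each modality is weighted by its certainty; the function name and the example values below are hypothetical.

<syntaxhighlight lang="python">
import numpy as np

def fuse_gaussian_beliefs(mu_v, cov_v, mu_a, cov_a):
    """Fuse two Gaussian landmark-position beliefs (visual, audio)
    in information form: precisions add, and the fused mean is the
    precision-weighted combination of the two modality means."""
    info_v = np.linalg.inv(cov_v)  # precision of the visual belief
    info_a = np.linalg.inv(cov_a)  # precision of the audio belief
    cov_f = np.linalg.inv(info_v + info_a)           # fused covariance
    mu_f = cov_f @ (info_v @ mu_v + info_a @ mu_a)   # fused mean
    return mu_f, cov_f

# Hypothetical 2-D example: the camera is confident along x but occluded
# along y; the microphone array is coarse along x but unobstructed along y.
mu_vis, cov_vis = np.array([2.0, 1.0]), np.diag([0.05, 1.0])
mu_aud, cov_aud = np.array([2.3, 0.6]), np.diag([0.5, 0.2])
mu, cov = fuse_gaussian_beliefs(mu_vis, cov_vis, mu_aud, cov_aud)
print(mu)            # fused position, drawn toward each modality's confident axis
print(np.diag(cov))  # fused variances are smaller than either modality's alone
</syntaxhighlight>

In this sketch the fused variance along each axis is smaller than either input's, which mirrors the complementarity described above: each modality fills in where the other is weak.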
}}</ref> Other pioneering work in this field was conducted by the research group of [[Hugh F. Durrant-Whyte]] in the early 1990s.<ref name=Leonard1991>{{cite book
|last1=Leonard|first1=J.J.
|last2=Durrant-whyte|first2=H.F.
|title=Proceedings IROS '91: IEEE/RSJ International Workshop on Intelligent Robots and Systems '91
|chapter=Simultaneous map building and localization for an autonomous mobile robot
|year=1991