[[File:RoboCup Rescue arena map generated by robot Hector from Darmstadt at 2010 German open.jpg|thumb|A map generated by a SLAM Robot]]
'''Simultaneous localization and mapping''' ('''SLAM''') is the computational problem of constructing or updating a [[map]] of an unknown environment while simultaneously keeping track of an [[Intelligent agent|agent]]'s ___location within it. While this initially appears to be a [[chicken or the egg]] problem, there are several [[algorithm]]s known to solve it in, at least approximately, tractable time for certain environments. Popular approximate solution methods include the [[particle filter]], extended [[Kalman filter]], covariance intersection, and GraphSLAM. SLAM algorithms are based on concepts in [[computational geometry]] and [[computer vision]], and are used in [[robot navigation]], [[robotic mapping]] and [[odometry]] for [[virtual reality]] or [[augmented reality]].
SLAM algorithms are tailored to the available resources and are not aimed at perfection but at operational compliance. Published approaches are employed in [[self-driving car]]s, [[unmanned aerial vehicle]]s, [[autonomous underwater vehicle]]s, [[Rover (space exploration)|planetary rovers]], newer [[domestic robot]]s and even inside the human body.
== Mathematical description of the problem ==
{{Expand section|Sources and citations as well as mathematical style convention explanation|date=August 2022}}
Given a series of controls <math>u_t</math> and sensor observations <math>o_t</math> over discrete time steps <math>t</math>, the SLAM problem is to compute an estimate of the agent's state <math>x_t</math> and a map of the environment <math>m_t</math>. All quantities are usually probabilistic, so the objective is to compute<ref>{{cite book |last1=Thrun |first1=Sebastian |authorlink = Sebastian Thrun |last2=Burgard |first2=Wolfram |authorlink2 = Wolfram Burgard |last3=Fox |first3=Dieter |authorlink3 = Dieter Fox|date=2005 |title=Probabilistic Robotics |publisher= The MIT Press |page= 309}}</ref>
:<math>P(m_{t+1}, x_{t+1} | o_{1:t+1}, u_{1:t})</math>
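Approximate solution methods such as the particle filter mentioned above represent this posterior with a finite set of weighted samples. As an illustrative sketch only, the following toy particle filter estimates the ___location part of the posterior for a one-dimensional robot; the map (a single landmark) is assumed known, and the motion model, sensor model, and noise values are invented for the example. A full SLAM filter would additionally attach a map hypothesis to each particle.

<syntaxhighlight lang="python">
import numpy as np

# Toy particle-filter approximation of P(x_t | o_{1:t}, u_{1:t}) for a 1-D
# robot. The landmark map is assumed known here; the noise values and the
# range-sensor model are illustrative assumptions, not a published system.
rng = np.random.default_rng(0)
landmark = 5.0                        # known map: one landmark position
particles = rng.uniform(0, 10, 500)   # samples representing the pose belief
weights = np.ones(500) / 500

def update(particles, weights, u, z, motion_noise=0.1, sense_noise=0.5):
    # Predict: propagate each sample through the motion model x' = x + u + noise.
    particles = particles + u + rng.normal(0, motion_noise, particles.size)
    # Correct: weight each sample by the likelihood of the range reading z.
    expected = np.abs(landmark - particles)
    weights = weights * np.exp(-0.5 * ((z - expected) / sense_noise) ** 2)
    weights /= weights.sum()
    # Resample to concentrate particles in high-probability regions.
    idx = rng.choice(particles.size, particles.size, p=weights)
    return particles[idx], np.full(particles.size, 1.0 / particles.size)

particles, weights = update(particles, weights, u=1.0, z=3.2)
print("pose estimate:", particles.mean())
</syntaxhighlight>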
In contrast to such statistical approaches, set-membership techniques are mainly based on [[interval constraint propagation]].<ref>{{cite journal|last=Jaulin|first=L.|year=2011|title=Range-only SLAM with occupancy maps: A set-membership approach|journal=IEEE Transactions on Robotics|url=http://www.ensta-bretagne.fr/jaulin/paper_dig_slam.pdf|doi=10.1109/TRO.2011.2147110|s2cid=52801599}}</ref>
They provide a set which encloses the pose of the robot and a set approximation of the map. [[Bundle adjustment]], and more generally [[maximum a posteriori estimation]] (MAP), is another popular technique for SLAM using image data. It jointly estimates poses and landmark positions, increasing map fidelity, and is used in commercialized SLAM systems such as Google's ARCore, which replaced Google's earlier [[augmented reality]] [[computing platform]] [[Tango (platform)|Tango]] (formerly ''Project Tango''). MAP estimators compute the most likely explanation of the robot poses and the map given the sensor data, rather than trying to estimate the entire posterior probability.
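As an illustration of the MAP formulation, under Gaussian noise and linear measurement models the most likely poses and landmark positions are the solution of a least-squares problem. The following toy example jointly estimates two one-dimensional robot poses and one landmark; all measurement values are invented, and practical systems solve a much larger sparse nonlinear version of this problem with iterative methods such as Gauss–Newton or Levenberg–Marquardt.

<syntaxhighlight lang="python">
import numpy as np

# MAP estimation as least squares on a toy 1-D problem: jointly estimate two
# robot poses x0, x1 and one landmark l. Residuals, all with assumed
# unit-variance Gaussian noise (values invented for illustration):
#   prior:     x0        ~ 0.0
#   odometry:  x1 - x0   ~ 1.1
#   landmark:  l  - x0   ~ 2.0
#   landmark:  l  - x1   ~ 0.95
A = np.array([[ 1.0, 0.0, 0.0],
              [-1.0, 1.0, 0.0],
              [-1.0, 0.0, 1.0],
              [ 0.0,-1.0, 1.0]])
b = np.array([0.0, 1.1, 2.0, 0.95])

# For Gaussian noise and linear models, the MAP estimate is the
# least-squares solution of A * theta = b.
theta, *_ = np.linalg.lstsq(A, b, rcond=None)
x0, x1, l = theta
print(f"x0={x0:.3f}  x1={x1:.3f}  landmark={l:.3f}")
</syntaxhighlight>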
The development of new SLAM algorithms remains an active research area,<ref name=":0">{{Cite journal|last1=Cadena|first1=Cesar|last2=Carlone|first2=Luca|last3=Carrillo|first3=Henry|last4=Latif|first4=Yasir|last5=Scaramuzza|first5=Davide|last6=Neira|first6=Jose|last7=Reid|first7=Ian|last8=Leonard|first8=John J.|date=2016|title=Past, Present, and Future of Simultaneous Localization and Mapping: Toward the Robust-Perception Age|journal=IEEE Transactions on Robotics|language=en-US|volume=32|issue=6|pages=1309–1332|arxiv=1606.05830|bibcode=2016arXiv160605830C|doi=10.1109/tro.2016.2624754|issn=1552-3098|hdl=2440/107554|s2cid=2596787}}</ref> and is often driven by differing requirements and assumptions about the types of maps, sensors, and models, as detailed below. Many SLAM systems can be viewed as combinations of choices from each of these aspects.
=== Mapping ===
[[Topological map]]s are a method of environment representation which captures the connectivity (i.e., [[topology]]) of the environment rather than creating a geometrically accurate map. Topological SLAM approaches have been used to enforce global consistency in metric SLAM algorithms.<ref name=cummins2008>
{{cite journal
|last1=Cummins|first1=Mark
|last2=Newman|first2=Paul
|date=June 2008
|title=FAB-MAP: Probabilistic Localization and Mapping in the Space of Appearance
|journal=The International Journal of Robotics Research
|volume=27|issue=6|pages=647–665}}</ref>
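As a sketch of the representation, a topological map can be stored as a plain graph over recognizable places, supporting route queries without any metric geometry; the place names and connectivity below are invented for illustration.

<syntaxhighlight lang="python">
from collections import deque

# A topological map stores only connectivity between recognizable places,
# not metric geometry. Place names and edges here are invented examples.
topo_map = {
    "hallway": ["kitchen", "office"],
    "kitchen": ["hallway"],
    "office": ["hallway", "lab"],
    "lab": ["office"],
}

def route(start, goal):
    # Breadth-first search returns a shortest sequence of places to traverse.
    frontier, seen = deque([[start]]), {start}
    while frontier:
        path = frontier.popleft()
        if path[-1] == goal:
            return path
        for nxt in topo_map[path[-1]]:
            if nxt not in seen:
                seen.add(nxt)
                frontier.append(path + [nxt])

print(route("kitchen", "lab"))  # ['kitchen', 'hallway', 'office', 'lab']
</syntaxhighlight>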
In contrast, [[grid map]]s use arrays (typically square or hexagonal) of discretized cells to represent the environment, and make inferences about which cells are occupied. Typically the cells are assumed to be statistically independent in order to simplify computation. Under such an assumption, <math>P(m_t | x_t, m_{t-1}, o_t )</math> is set to 1 if the cells of the new map are consistent with the observation <math>o_t</math> at ___location <math>x_t</math> and 0 if inconsistent.
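A minimal sketch of this cell update for a single range beam is shown below; the grid size, the simple ray-casting, and the hard 0/1 updates are simplifying assumptions for the example.

<syntaxhighlight lang="python">
import numpy as np

# Occupancy grid: each cell is updated independently. This toy update marks
# the cell hit by a range measurement as occupied and the cells along the
# beam as free; grid size and the ray-casting are illustrative assumptions.
grid = np.full((10, 10), 0.5)  # 0.5 = unknown, per-cell occupancy belief

def integrate_beam(grid, robot, hit):
    # Cells between the robot and the measured hit are consistent with "free".
    (r0, c0), (r1, c1) = robot, hit
    n = max(abs(r1 - r0), abs(c1 - c0))
    for i in range(n):
        r = r0 + round(i * (r1 - r0) / n)
        c = c0 + round(i * (c1 - c0) / n)
        grid[r, c] = 0.0    # observed free
    grid[r1, c1] = 1.0      # beam endpoint: observed occupied
    return grid

grid = integrate_beam(grid, robot=(0, 0), hit=(0, 4))
print(grid[0, :6])  # [0. 0. 0. 0. 1. 0.5]
</syntaxhighlight>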
Modern [[self-driving car]]s mostly simplify the mapping problem to almost nothing by making extensive use of highly detailed map data collected in advance. This can include map annotations down to the level of marking the locations of individual white-line segments and curbs on the road. Location-tagged visual data such as Google's [[Google Street View|Street View]] may also be used as part of the map. Essentially, such systems reduce the SLAM problem to a simpler [[Robot localization|localization]]-only task, with perhaps only moving objects such as cars and people being updated in the map at runtime.
=== Sensing ===
=== Audiovisual SLAM ===
Originally designed for [[human–robot interaction]], audio-visual SLAM is a framework that fuses landmark features obtained from both the acoustic and visual modalities within an environment.<ref>{{Cite book|last1=Chau|first1=Aaron|last2=Sekiguchi|first2=Kouhei|last3=Nugraha|first3=Aditya Arie|last4=Yoshii|first4=Kazuyoshi|last5=Funakoshi|first5=Kotaro|title=2019 28th IEEE International Conference on Robot and Human Interactive Communication (RO-MAN) |chapter=Audio-Visual SLAM towards Human Tracking and Human-Robot Interaction in Indoor Environments |date=October 2019|___location=New Delhi, India|publisher=IEEE|pages=1–8|doi=10.1109/RO-MAN46459.2019.8956321|isbn=978-1-7281-2622-7|s2cid=210697281}}</ref> Human interaction is characterized by features perceived not only in the visual modality, but in the acoustic modality as well; as such, SLAM algorithms for human-centered robots and machines must account for both sets of features. An audio-visual framework estimates and maps positions of human landmarks through the use of visual features like human pose and audio features like human speech, and fuses the beliefs for a more robust map of the environment. For applications in mobile robotics (e.g., drones, service robots), it is valuable to use low-power, lightweight equipment such as monocular cameras or microelectronic microphone arrays. Audio-visual SLAM can also allow for complementary function of such sensors by compensating for the narrow field of view, feature occlusions, and optical degradations common to lightweight visual sensors with the full field of view and unobstructed feature representations inherent to audio sensors. The susceptibility of audio sensors to reverberation, sound-source inactivity, and noise can likewise be compensated through fusion of landmark beliefs from the visual modality. Complementary function between the audio and visual modalities in an environment can prove valuable for the creation of robotics and machines that fully interact with human speech and human movement.
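As a sketch of the belief fusion described above, two Gaussian estimates of the same landmark, one visual and one acoustic, can be combined by precision (inverse-variance) weighting; the bearing values and variances below are invented for illustration.

<syntaxhighlight lang="python">
# Fusing a visual and an acoustic estimate of the same landmark (e.g., a
# speaker's bearing in degrees). Combining two Gaussian beliefs by precision
# (inverse-variance) weighting yields a fused estimate that is more certain
# than either sensor alone. All numbers are illustrative.
def fuse(mean_a, var_a, mean_b, var_b):
    w_a, w_b = 1.0 / var_a, 1.0 / var_b          # precisions
    var = 1.0 / (w_a + w_b)
    mean = var * (w_a * mean_a + w_b * mean_b)
    return mean, var

# Camera: precise but narrow field of view; microphone array: coarse but omnidirectional.
bearing, variance = fuse(mean_a=31.0, var_a=4.0, mean_b=38.0, var_b=25.0)
print(f"fused bearing: {bearing:.1f} deg, variance {variance:.2f}")
</syntaxhighlight>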
== Implementation methods ==