Simultaneous localization and mapping

[[File:RoboCup Rescue arena map generated by robot Hector from Darmstadt at 2010 German open.jpg|thumb|A map generated by a SLAM Robot]]
 
'''Simultaneous localization and mapping''' ('''SLAM''') is the computational problem of constructing or updating a [[map]] of an unknown environment while simultaneously keeping track of an [[Intelligent agent|agent]]'s ___location within it. While this initially appears to be a [[chicken or the egg]] problem, there are several [[algorithm]]s known to solve it in, at least approximately, tractable time for certain environments. Popular approximate solution methods include the [[particle filter]], extended [[Kalman filter]], covariance intersection, and GraphSLAM. SLAM algorithms are based on concepts in [[computational geometry]] and [[computer vision]], and are used in [[robot navigation]], [[robotic mapping]] and [[odometry]] for [[virtual reality]] or [[augmented reality]].
 
SLAM algorithms are tailored to the available resources and are not aimed at perfection but at operational compliance. Published approaches are employed in [[self-driving car]]s, [[unmanned aerial vehicle]]s, [[autonomous underwater vehicle]]s, [[Rover (space exploration)|planetary rovers]], newer [[domestic robot]]s and even inside the human body.
 
== Mathematical description of the problem ==
{{Expand section|Sources and citations as well as mathematical style convention explanation|date=August 2022}}
Given a series of controls <math>u_t</math> and sensor observations <math>o_t</math> over discrete time steps <math>t</math>, the SLAM problem is to compute an estimate of the agent's state <math>x_t</math> and a map of the environment <math>m_t</math>. All quantities are usually probabilistic, so the objective is to compute<ref>{{cite book |last1=Thrun |first1=Sebastian |authorlink = Sebastian Thrun |last2=Burgard |first2=Wolfram |authorlink2 = Wolfram Burgard |last3=Fox |first3=Dieter |authorlink3 = Dieter Fox|date=2005 |title=Probabilistic Robotics |publisher= The MIT Press |page= 309}}</ref>
 
:<math> P(m_{t+1},x_{t+1}|o_{1:t+1},u_{1:t}) </math>
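
In the full (smoothing) formulation, and assuming for illustration a static map <math>m</math>, this posterior factorizes into the motion model and the observation model; this factorization is the structure exploited by the estimation algorithms described below:

:<math> P(x_{0:t}, m \mid o_{1:t}, u_{1:t}) \;\propto\; P(x_0)\prod_{k=1}^{t} P(x_k \mid x_{k-1}, u_k)\prod_{k=1}^{t} P(o_k \mid x_k, m) </math>

Filtering approaches marginalize out past poses and update this distribution recursively at each time step, whereas smoothing (MAP) approaches optimize over the whole trajectory at once.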
 
== Algorithms ==
Statistical techniques used to approximate the above equations include [[Kalman filter]]s and [[particle filter]]s (the algorithm behind Monte Carlo Localization). They provide an estimation of the [[posterior probability distribution]] for the pose of the robot and for the parameters of the map. Methods which conservatively approximate the above model using [[covariance intersection]] are able to avoid reliance on statistical independence assumptions to reduce algorithmic complexity for large-scale applications.<ref>{{cite conference|last1= Julier|first1=S.|last2=Uhlmann|first2=J.|title=Building a Million-Beacon Map.|conference=Proceedings of ISAM Conference on Intelligent Systems for Manufacturing|year=2001|doi=10.1117/12.444158}}</ref> Other approximation methods achieve improved computational efficiency by using simple bounded-region representations of uncertainty.<ref>{{cite conference|last1= Csorba|first1=M.|last2=Uhlmann|first2=J.|title=A Suboptimal Algorithm for Automatic Map Building.|conference=Proceedings of the 1997 American Control Conference|year=1997|doi=10.1109/ACC.1997.611857}}</ref>
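
As a minimal sketch of the covariance intersection idea (not the implementation of any particular published system; the function name, the fixed weight <code>omega</code>, and the example numbers below are invented for illustration), two estimates of the same quantity with unknown cross-correlation can be fused by mixing their information (inverse-covariance) matrices:

<syntaxhighlight lang="python">
import numpy as np

def covariance_intersection(a, A, b, B, omega=0.5):
    """Fuse estimates (a, A) and (b, B) whose cross-correlation is unknown.

    a, b  : mean vectors of the two estimates
    A, B  : their covariance matrices
    omega : weight in (0, 1); in practice often chosen to minimize trace(C)
    """
    A_inv, B_inv = np.linalg.inv(A), np.linalg.inv(B)
    C_inv = omega * A_inv + (1.0 - omega) * B_inv              # fused information matrix
    C = np.linalg.inv(C_inv)                                   # fused covariance
    c = C @ (omega * A_inv @ a + (1.0 - omega) * B_inv @ b)    # fused mean
    return c, C

# Example: fusing two 2D estimates of the same landmark position.
a, A = np.array([1.0, 2.0]), np.diag([0.5, 0.5])
b, B = np.array([1.2, 1.8]), np.diag([1.0, 0.25])
c, C = covariance_intersection(a, A, b, B, omega=0.6)
</syntaxhighlight>

The fused covariance remains consistent (not overconfident) for any choice of the weight, which is what makes the method attractive when statistical independence between estimates cannot be assumed.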
 
[[Set estimation|Set-membership techniques]] are mainly based on [[interval propagation|interval constraint propagation]].<ref>
{{cite journal |url=http://www.ensta-bretagne.fr/jaulin/paper_dig_slam.pdf|doi=10.1109/TRO.2011.2147110|s2cid=52801599}}
</ref>
They provide a set which encloses the pose of the robot and a set approximation of the map. [[Bundle adjustment]], and more generally [[maximum a posteriori estimation]] (MAP), is another popular technique for SLAM using image data. It jointly estimates poses and landmark positions, increasing map fidelity, and is used in commercialized SLAM systems such as Google's ARCore, which replaced its earlier [[augmented reality]] [[computing platform]] [[Tango (platform)|Tango]], formerly ''Project Tango''. MAP estimators compute the most likely explanation of the robot poses and the map given the sensor data, rather than trying to estimate the entire posterior probability.
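
Under Gaussian noise assumptions, MAP estimation reduces to a sparse nonlinear least-squares problem over the poses and landmark positions. The following toy sketch (one-dimensional poses, with invented odometry and range measurements, and not the method of any specific commercial system) illustrates the idea using a generic least-squares solver:

<syntaxhighlight lang="python">
import numpy as np
from scipy.optimize import least_squares

# Toy 1D MAP-SLAM: unknowns are three robot positions x0..x2 and one landmark l.
odometry = [1.0, 1.0]        # measured displacement between consecutive poses
ranges = [5.0, 4.1, 2.9]     # measured distance from each pose to the landmark

def residuals(theta):
    x, l = theta[:3], theta[3]
    res = [x[0] - 0.0]                                            # prior: first pose at origin
    res += [(x[k + 1] - x[k]) - odometry[k] for k in range(2)]    # motion-model residuals
    res += [(l - x[k]) - ranges[k] for k in range(3)]             # observation-model residuals
    return res

solution = least_squares(residuals, np.zeros(4))   # Gauss-Newton-style optimization
poses_map, landmark_map = solution.x[:3], solution.x[3]
</syntaxhighlight>

In realistic systems the same structure is solved with sparse solvers that exploit the fact that each residual involves only a few variables.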
 
SLAM remains an active research area,<ref name=":0">{{Cite journal|last1=Cadena|first1=Cesar|last2=Carlone|first2=Luca|last3=Carrillo|first3=Henry|last4=Latif|first4=Yasir|last5=Scaramuzza|first5=Davide|last6=Neira|first6=Jose|last7=Reid|first7=Ian|last8=Leonard|first8=John J.|date=2016|title=Past, Present, and Future of Simultaneous Localization and Mapping: Toward the Robust-Perception Age|journal=IEEE Transactions on Robotics|language=en-US|volume=32|issue=6|pages=1309–1332|arxiv=1606.05830|bibcode=2016arXiv160605830C|doi=10.1109/tro.2016.2624754|issn=1552-3098|hdl=2440/107554|s2cid=2596787}}</ref> and new algorithms are often driven by differing requirements and assumptions about the types of maps, sensors, and models, as detailed below. Many SLAM systems can be viewed as combinations of choices from each of these aspects.
 
=== Mapping ===
[[Topological map]]s are a method of environment representation which capture the connectivity (i.e., [[topology]]) of the environment rather than creating a geometrically accurate map. Topological SLAM approaches have been used to enforce global consistency in metric SLAM algorithms.<ref name=cummins2008>
{{cite journal
|last1=Cummins|first1=Mark
|access-date=23 July 2014}}</ref>
 
In contrast, [[grid map]]s use arrays (typically square or hexagonal) of discretized cells to represent a metric world, and make inferences about which cells are occupied. Typically the cells are assumed to be statistically independent to simplify computation. Under such an assumption, <math>P(m_t | x_t, m_{t-1}, o_t )</math> is set to 1 if the new map's cells are consistent with the observation <math>o_t</math> at ___location <math>x_t</math>, and 0 if inconsistent.
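
In practice, implementations usually keep a per-cell occupancy estimate (commonly in log-odds form) rather than a hard 0/1 assignment, still relying on the independence assumption above. A minimal sketch, with increment constants chosen only for this example:

<syntaxhighlight lang="python">
import numpy as np

# Minimal log-odds occupancy grid; cells are treated as independent.
grid = np.zeros((100, 100))      # log-odds of occupancy, 0 corresponds to p = 0.5
L_OCC, L_FREE = 0.85, -0.4       # per-observation increments (tuning constants)

def update_cell(grid, i, j, hit):
    """Update one cell: the sensor either observed it occupied (hit=True)
    or the beam passed through it (hit=False)."""
    grid[i, j] += L_OCC if hit else L_FREE

def occupancy_probability(grid, i, j):
    """Convert the stored log-odds back to a probability of occupancy."""
    return 1.0 / (1.0 + np.exp(-grid[i, j]))

# Example: a range beam ends in cell (10, 12) after passing through (10, 11).
update_cell(grid, 10, 12, hit=True)
update_cell(grid, 10, 11, hit=False)
</syntaxhighlight>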
 
Modern [[self driving cars]] mostly simplify the mapping problem to almost nothing, by making extensive use of highly detailed map data collected in advance. This can include map annotations to the level of marking locations of individual white line segments and curbs on the road. Location-tagged visual data such as Google's [[StreetView]] may also be used as part of maps. Essentially, such systems reduce SLAM to a simpler [[Robot localization|localization]]-only task, with perhaps only moving objects such as cars and people updated in the map at runtime.
 
=== Sensing ===
Sensor models divide broadly into landmark-based and raw-data approaches. Landmarks are uniquely identifiable objects in the world whose ___location can be estimated by a sensor, such as [[Wi-Fi]] access points or radio beacons. Raw-data approaches make no assumption that landmarks can be identified, and instead model <math>P(o_t|x_t)</math> directly as a function of the ___location.
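
For the landmark-based case, one common choice of <math>P(o_t|x_t)</math> is a range-bearing model with Gaussian noise. The sketch below is illustrative only; the noise parameters <code>sigma_r</code> and <code>sigma_phi</code> are invented values:

<syntaxhighlight lang="python">
import numpy as np

def predict_range_bearing(pose, landmark):
    """Predicted observation of a landmark from a planar pose (x, y, theta)."""
    x, y, theta = pose
    dx, dy = landmark[0] - x, landmark[1] - y
    r = np.hypot(dx, dy)                              # predicted range
    phi = np.arctan2(dy, dx) - theta                  # predicted bearing in the robot frame
    phi = np.arctan2(np.sin(phi), np.cos(phi))        # wrap bearing to [-pi, pi]
    return np.array([r, phi])

def observation_likelihood(z, pose, landmark, sigma_r=0.2, sigma_phi=0.05):
    """Gaussian likelihood P(o_t | x_t) of a measurement z = (range, bearing)."""
    z_hat = predict_range_bearing(pose, landmark)
    err_r = z[0] - z_hat[0]
    err_phi = np.arctan2(np.sin(z[1] - z_hat[1]), np.cos(z[1] - z_hat[1]))
    norm = 1.0 / (2 * np.pi * sigma_r * sigma_phi)
    return norm * np.exp(-0.5 * ((err_r / sigma_r) ** 2 + (err_phi / sigma_phi) ** 2))
</syntaxhighlight>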
 
Optical sensors may be one-dimensional (single beam) or 2D (sweeping) [[laser rangefinder]]s, 3D high definition light detection and ranging ([[lidar]]), 3D flash lidar, 2D or 3D [[sonar]] sensors, and one or more 2D [[camera]]s.<ref name="magnabosco13slam"/> Since the invention of local features, such as [[scale-invariant feature transform|SIFT]], there has been intense research into visual SLAM (VSLAM) using primarily visual (camera) sensors, because of the increasing ubiquity of cameras such as those in mobile devices.<ref name=Se2001>
{{cite conference
|last1=Se|first1=Stephen
|collaboration=James J. Little;David Lowe
|year=2001
|title=Vision-based mobile robot localization and mapping using scale-invariant features
|conference=Int. Conf. on Robotics and Automation (ICRA)
|doi=10.1109/ROBOT.2001.932909
}}</ref>
Follow-up research includes work by Karlsson et al.<ref name=KarlssonEtAl2005>{{cite conference
|last1=Karlsson|first1=N.
|collaboration=Di Bernardo, E.; Ostrowski, J; Goncalves, L.; Pirjanian, P.; Munich, M.
|conference=Int. Conf. on Robotics and Automation (ICRA)
|doi=10.1109/ROBOT.2005.1570091
}}</ref> Both visual and [[lidar]] sensors are informative enough to allow for landmark extraction in many cases. Other recent forms of SLAM include tactile SLAM<ref>{{cite conference|last1= Fox|first1=C.|last2=Evans|first2=M.|last3=Pearson|first3=M.|last4=Prescott|first4=T.|title=Tactile SLAM with a biomimetic whiskered robot.|conference=Proc. IEEE Int. Conf. on Robotics and Automation (ICRA)|year=2012|url=http://eprints.uwe.ac.uk/18384/1/fox_icra12_submitted.pdf}}</ref> (sensing by local touch only), radar SLAM,<ref>{{cite conference|last1=Marck|first1=J.W.|last2=Mohamoud|first2=A.|last3=v.d. Houwen|first3=E.|last4=van Heijster|first4=R.|title=Indoor radar SLAM A radar application for vision and GPS denied environments.|conference=Radar Conference (EuRAD), 2013 European|year=2013|url=http://publications.tno.nl/publication/34607287/4nJ48k/marck-2013-indoor.pdf}}</ref> acoustic SLAM,<ref>Evers, Christine, Alastair H. Moore, and Patrick A. Naylor. "[https://spiral.imperial.ac.uk/bitstream/10044/1/38877/2/2016012291332_994036_4133_Final.pdf Acoustic simultaneous localization and mapping (a-SLAM) of a moving microphone array and its surrounding speakers]." 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2016.</ref> and Wi-Fi-SLAM (sensing by strengths of nearby Wi-Fi access points).<ref>Ferris, Brian, Dieter Fox, and Neil D. Lawrence. "[https://www.aaai.org/Papers/IJCAI/2007/IJCAI07-399.pdf Wi-Fi-slam using gaussian process latent variable models] {{Webarchive|url=https://web.archive.org/web/20221224110401/https://www.aaai.org/Papers/IJCAI/2007/IJCAI07-399.pdf |date=2022-12-24 }}." IJCAI. Vol. 7. No. 1. 2007.</ref> Recent approaches apply quasi-optical [[wireless]] ranging for [[Trilateration|multi-lateration]] ([[real-time locating system]] (RTLS)) or [[Triangulation|multi-angulation]] in conjunction with SLAM to compensate for erratic wireless measurements. A kind of SLAM for human pedestrians uses a shoe-mounted [[inertial measurement unit]] as the main sensor and relies on the fact that pedestrians are able to avoid walls to automatically build floor plans of buildings, which can then be used by an [[indoor positioning system]].<ref name=RobertsonEtAl2009>{{cite conference
|last1=Robertson
|first1=P.
 
For 2D robots, the kinematics are usually given by a mixture of rotation and "move forward" commands, which are implemented with additional motor noise. Unfortunately the distribution formed by independent noise in angular and linear directions is non-Gaussian, but is often approximated by a Gaussian. An alternative approach is to ignore the kinematic term and read odometry data from robot wheels after each command—such data may then be treated as one of the sensors rather than as kinematics.
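
A minimal sketch of such a rotate-then-translate model with independent angular and linear noise (the noise magnitudes are illustrative only); propagating many samples shows the characteristic non-Gaussian spread that is then often approximated by a Gaussian:

<syntaxhighlight lang="python">
import numpy as np

rng = np.random.default_rng(0)

def sample_motion(pose, turn, forward, sigma_turn=0.02, sigma_forward=0.05):
    """Apply a 'rotate by turn, then move forward' command with independent noise."""
    x, y, theta = pose
    theta = theta + turn + rng.normal(0.0, sigma_turn)     # noisy rotation
    d = forward + rng.normal(0.0, sigma_forward)           # noisy translation
    return np.array([x + d * np.cos(theta), y + d * np.sin(theta), theta])

# Propagate many samples from the same command to expose the non-Gaussian shape,
# then fit the Gaussian approximation that a Kalman-type filter would use.
samples = np.array([sample_motion((0.0, 0.0, 0.0), np.pi / 4, 1.0) for _ in range(1000)])
mean, cov = samples.mean(axis=0), np.cov(samples.T)
</syntaxhighlight>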
 
=== Moving objects ===
Non-static environments, such as those containing other vehicles or pedestrians, continue to present research challenges.<ref>{{Cite book|last1=Perera|first1=Samunda|last2=Pasqual|first2=Ajith|editor-last=Bebis|editor-first=George|editor2-last=Boyle|editor2-first=Richard|editor3-last=Parvin|editor3-first=Bahram|editor4-last=Koracin|editor4-first=Darko|editor5-last=Wang|editor5-first=Song|editor6-last=Kyungnam|editor6-first=Kim|editor7-last=Benes|editor7-first=Bedrich|editor8-last=Moreland|editor8-first=Kenneth|editor9-last=Borst|editor9-first=Christoph|title=Advances in Visual Computing|chapter=Towards Realtime Handheld MonoSLAM in Dynamic Environments|date=2011|volume=6938|series=Lecture Notes in Computer Science|language=en|publisher=Springer Berlin Heidelberg|pages=313–324|doi=10.1007/978-3-642-24028-7_29|isbn=9783642240287}}</ref><ref name=":1">{{Citation|last1=Perera|first1=Samunda|last2=Barnes|first2=Dr.Nick|last3=Zelinsky|first3=Dr.Alexander|title=Exploration: Simultaneous Localization and Mapping (SLAM)|date=2014|work=Computer Vision: A Reference Guide|pages=268–275|editor-last=Ikeuchi|editor-first=Katsushi|publisher=Springer US|language=en|doi=10.1007/978-0-387-31439-6_280|isbn=9780387314396|s2cid=34686200}}</ref> SLAM with DATMO is a model which tracks moving objects in a similar way to the agent itself.<ref name=Wang2007>{{cite journal
|last1=Wang
|first1=Chieh-Chih
 
=== Collaborative SLAM ===
''Collaborative SLAM'' combines sensor data from multiple robots or users to generate 3D maps.<ref>Zou, Danping, and Ping Tan. "[http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.463.8135&rep=rep1&type=pdf Coslam: Collaborative visual slam in dynamic environments]." IEEE transactions on pattern analysis and machine intelligence 35.2 (2012): 354–366.</ref> This capability was demonstrated by a number of teams in the [[DARPA Grand Challenge|2021 DARPA Subterranean Challenge]].
 
== Specialized SLAM methods ==
 
=== Acoustic SLAM ===
An extension of the common SLAM problem has been applied to the acoustic ___domain, where environments are represented by the three-dimensional (3D) position of sound sources, termed aSLAM ('''A'''coustic '''S'''imultaneous '''L'''ocalization and '''M'''apping).<ref>{{Cite journal|last1=Evers|first1=Christine|last2=Naylor|first2=Patrick A.|date=September 2018|title=Acoustic SLAM|journal=IEEE/ACM Transactions on Audio, Speech, and Language Processing|volume=26|issue=9|pages=1484–1498|doi=10.1109/TASLP.2018.2828321|issn=2329-9290|url=https://eprints.soton.ac.uk/437941/1/08340823.pdf|doi-access=free}}</ref> Early implementations of this technique have used direction-of-arrival (DoA) estimates of the sound source ___location, and rely on principal techniques of [[sound localization]] to determine source locations. An observer, or robot, must be equipped with a [[microphone array]] to enable use of acoustic SLAM, so that DoA features are properly estimated. Acoustic SLAM has laid foundations for further studies in acoustic scene mapping, and can play an important role in human–robot interaction through speech. To map multiple, and occasionally intermittent, sound sources, an acoustic SLAM system uses foundations in random finite set theory to handle the varying presence of acoustic landmarks.<ref>{{Cite journal|last=Mahler|first=R.P.S.|date=October 2003|title=Multitarget bayes filtering via first-order multitarget moments|journal=IEEE Transactions on Aerospace and Electronic Systems|language=en|volume=39|issue=4|pages=1152–1178|doi=10.1109/TAES.2003.1261119|bibcode=2003ITAES..39.1152M|issn=0018-9251}}</ref> However, the nature of acoustically derived features leaves acoustic SLAM susceptible to problems of reverberation, inactivity, and noise within an environment.
 
=== Audiovisual SLAM ===
Originally designed for [[human–robot interaction]], Audio-Visual SLAM is a framework that provides the fusion of landmark features obtained from both the acoustic and visual modalities within an environment.<ref>{{Cite book|last1=Chau|first1=Aaron|last2=Sekiguchi|first2=Kouhei|last3=Nugraha|first3=Aditya Arie|last4=Yoshii|first4=Kazuyoshi|last5=Funakoshi|first5=Kotaro|title=2019 28th IEEE International Conference on Robot and Human Interactive Communication (RO-MAN)|chapter=Audio-Visual SLAM towards Human Tracking and Human-Robot Interaction in Indoor Environments|date=October 2019|___location=New Delhi, India|publisher=IEEE|pages=1–8|doi=10.1109/RO-MAN46459.2019.8956321|isbn=978-1-7281-2622-7|s2cid=210697281}}</ref> Human interaction is characterized by features perceived in not only the visual modality, but the acoustic modality as well; as such, SLAM algorithms for human-centered robots and machines must account for both sets of features. An audio-visual framework estimates and maps positions of human landmarks through use of visual features like human pose and audio features like human speech, and fuses the beliefs for a more robust map of the environment. For applications in mobile robotics (e.g. drones, service robots), it is valuable to use low-power, lightweight equipment such as monocular cameras or microelectronic microphone arrays. Audio-Visual SLAM can also allow for complementary function of such sensors, by compensating for the narrow field of view, feature occlusions, and optical degradations common to lightweight visual sensors with the full field of view and unobstructed feature representations inherent to audio sensors. The susceptibility of audio sensors to reverberation, sound source inactivity, and noise can also be accordingly compensated through fusion of landmark beliefs from the visual modality. Complementary function between the audio and visual modalities in an environment can prove valuable for the creation of robotics and machines that fully interact with human speech and human movement.
 
== Implementation methods ==
 
=== GraphSLAM ===
In [[robotics]], '''GraphSLAM''' is a SLAM algorithm which uses sparse information matrices produced by generating a [[factor graph]] of observation interdependencies (two observations are related if they contain data about the same landmark).<ref name=Trun2005/> It is based on optimization algorithms.
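
A minimal sketch of the sparsity that GraphSLAM exploits: each factor (an odometry constraint or a landmark observation) contributes only to the information-matrix entries of the variables it links, so the matrix remains sparse. The example below uses scalar variables and unit information weights purely for illustration:

<syntaxhighlight lang="python">
import numpy as np

# Variables: three poses (indices 0-2) and two landmarks (indices 3-4).
n = 5
info = np.zeros((n, n))          # information (inverse-covariance) matrix

# Factors: odometry links consecutive poses; observations link a pose to a landmark.
factors = [(0, 1), (1, 2),           # odometry factors
           (0, 3), (1, 3), (2, 4)]   # landmark observation factors

for i, j in factors:
    # Each factor adds information on the diagonal entries of its two variables
    # and on their off-diagonal coupling entries (unit weight for illustration).
    info[i, i] += 1.0
    info[j, j] += 1.0
    info[i, j] -= 1.0
    info[j, i] -= 1.0

# Off-diagonal entries are non-zero only where a factor exists, so the matrix is sparse;
# this structure is what GraphSLAM-style solvers exploit.
print(np.count_nonzero(info), "non-zero entries out of", n * n)
</syntaxhighlight>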
 
== History ==
A seminal work in SLAM is the research of R.C. Smith and P. Cheeseman on the representation and estimation of spatial uncertainty in 1986.<ref name=Smith1986>{{cite journal
|last1=Smith|first1=R.C.
|last2=Cheeseman|first2=P.
|archive-url=https://web.archive.org/web/20100702155505/http://www-robotics.usc.edu/~maja/teaching/cs584/papers/smith90stochastic.pdf
|archive-date=2010-07-02
}}</ref> Other pioneering work in this field was conducted by the research group of [[Hugh F. Durrant-Whyte]] in the early 1990s,<ref name=Leonard1991>{{cite book
|last1=Leonard|first1=J.J.
|last2=Durrant-whyte|first2=H.F.
|date=1991
|title=Proceedings IROS '91: IEEE/RSJ International Workshop on Intelligent Robots and Systems '91
|chapter=Simultaneous map building and localization for an autonomous mobile robot
|pages=1442–1447
|doi=10.1109/IROS.1991.174711
|isbn=978-0-7803-0067-5
|s2cid=206935019
}}</ref> which showed that solutions to SLAM exist in the infinite data limit. This finding motivates the search for algorithms which are computationally tractable and approximate the solution. The acronym SLAM was coined within the paper "Localization of Autonomous Guided Vehicles", which first appeared at the International Symposium on Robotics Research in 1995.<ref>{{Cite journal|last1=Durrant-Whyte|first1=H.|last2=Bailey|first2=T.|date=June 2006|title=Simultaneous localization and mapping: part I|journal=IEEE Robotics & Automation Magazine|volume=13|issue=2|pages=99–110|doi=10.1109/MRA.2006.1638022|s2cid=8061430|issn=1558-223X|doi-access=free}}</ref>
 
The self-driving STANLEY and JUNIOR cars, led by [[Sebastian Thrun]], won the DARPA Grand Challenge and came second in the DARPA Urban Challenge in the 2000s, and included SLAM systems, bringing SLAM to worldwide attention. Mass-market SLAM implementations can now be found in consumer robot vacuum cleaners<ref>{{Cite news|last=Knight|first=Will|url=https://www.technologyreview.com/s/541326/the-roomba-now-sees-and-maps-a-home/|title=With a Roomba Capable of Navigation, iRobot Eyes Advanced Home Robots|work=MIT Technology Review|date=September 16, 2015|access-date=2018-04-25|language=en}}</ref> and [[virtual reality headset]]s such as the [[Meta Quest 2]] and [[PICO 4]] for markerless inside-out tracking.
 
== See also ==
{{Div col|small=yes}}
* [[Computational photography]]
* [[Kalman filter]]