For 2D robots, the kinematics are usually given by a mixture of rotation and "move forward" commands, which are implemented with additional motor noise. Unfortunately, the distribution formed by independent noise in the angular and linear directions is non-Gaussian, but it is often approximated by a Gaussian. An alternative approach is to ignore the kinematic term and read odometry data from the robot's wheels after each command; such data may then be treated as one of the sensors rather than as kinematics.
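For illustration, the following minimal sketch (function and parameter names are illustrative assumptions, not taken from any particular system) samples a new pose from such a rotate-then-translate command, with independent Gaussian noise added to the angular and linear components:

<syntaxhighlight lang="python">
import numpy as np

def sample_motion(pose, rotation, distance, sigma_rot=0.02, sigma_trans=0.05, rng=None):
    """Sample a new 2D pose (x, y, heading) after a 'rotate, then move forward' command.

    Independent Gaussian noise is added to the angular and linear components;
    as noted above, this only approximates the true, non-Gaussian distribution
    of the resulting (x, y) position.
    """
    rng = rng or np.random.default_rng()
    x, y, theta = pose
    noisy_rotation = rotation + rng.normal(0.0, sigma_rot)
    noisy_distance = distance + rng.normal(0.0, sigma_trans)
    theta_new = theta + noisy_rotation
    return (x + noisy_distance * np.cos(theta_new),
            y + noisy_distance * np.sin(theta_new),
            theta_new)
</syntaxhighlight>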
 
=== Acoustic SLAM ===
An extension of the common SLAM problem has been applied to the acoustic ___domain, where environments are represented by the three-dimensional (3D) positions of sound sources, termed ''Acoustic SLAM''.<ref>{{Cite journal|last1=Evers|first1=Christine|last2=Naylor|first2=Patrick A.|date=September 2018|title=Acoustic SLAM|journal=IEEE/ACM Transactions on Audio, Speech, and Language Processing|volume=26|issue=9|pages=1484–1498|doi=10.1109/TASLP.2018.2828321|issn=2329-9290|url=https://eprints.soton.ac.uk/437941/1/08340823.pdf|doi-access=free}}</ref> Early implementations of this technique used direction-of-arrival (DoA) estimates of the sound source ___location and relied on principal techniques of [[sound localization]] to determine source locations. An observer, or robot, must be equipped with a [[microphone array]] to enable use of Acoustic SLAM, so that DoA features are properly estimated. Acoustic SLAM has laid foundations for further studies in acoustic scene mapping, and can play an important role in human–robot interaction through speech. To map multiple, and occasionally intermittent, sound sources, an acoustic SLAM system uses foundations in random finite set theory to handle the varying presence of acoustic landmarks.<ref>{{Cite journal|last=Mahler|first=R.P.S.|date=October 2003|title=Multitarget bayes filtering via first-order multitarget moments|journal=IEEE Transactions on Aerospace and Electronic Systems|language=en|volume=39|issue=4|pages=1152–1178|doi=10.1109/TAES.2003.1261119|bibcode=2003ITAES..39.1152M|issn=0018-9251}}</ref> However, the nature of acoustically derived features leaves Acoustic SLAM susceptible to reverberation, source inactivity, and noise within an environment.
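As an illustration of how a DoA estimate constrains a source position, the following minimal sketch (planar case for brevity; function names are illustrative, not from a specific toolkit) computes the bearing a microphone array would be expected to measure for a hypothesized source ___location, and the wrapped angular innovation a filter could use to refine that ___location:

<syntaxhighlight lang="python">
import numpy as np

def predicted_doa(robot_pose, source_xy):
    """Predicted direction of arrival (azimuth, in radians, relative to the
    robot's heading) for a hypothesized sound-source position.  A DoA reading
    is a bearing-only measurement: it carries no range information."""
    x, y, theta = robot_pose
    sx, sy = source_xy
    bearing_world = np.arctan2(sy - y, sx - x)
    return (bearing_world - theta + np.pi) % (2 * np.pi) - np.pi  # wrap to [-pi, pi)

def doa_innovation(measured_doa, robot_pose, source_xy):
    """Angle difference between a measured and predicted DoA, wrapped so it
    can be used to update the estimate of the source position."""
    diff = measured_doa - predicted_doa(robot_pose, source_xy)
    return (diff + np.pi) % (2 * np.pi) - np.pi
</syntaxhighlight>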
 
=== Audiovisual SLAM ===
Originally designed for [[human–robot interaction]], audiovisual SLAM is a framework that fuses landmark features obtained from both the acoustic and visual modalities within an environment.<ref>{{Cite journal|last1=Chau|first1=Aaron|last2=Sekiguchi|first2=Kouhei|last3=Nugraha|first3=Aditya Arie|last4=Yoshii|first4=Kazuyoshi|last5=Funakoshi|first5=Kotaro|date=October 2019|title=Audio-Visual SLAM towards Human Tracking and Human-Robot Interaction in Indoor Environments|journal=2019 28th IEEE International Conference on Robot and Human Interactive Communication (RO-MAN)|___location=New Delhi, India|publisher=IEEE|pages=1–8|doi=10.1109/RO-MAN46459.2019.8956321|isbn=978-1-7281-2622-7|s2cid=210697281}}</ref> Human interaction is characterized by features perceived not only in the visual modality but in the acoustic modality as well; as such, SLAM algorithms for human-centered robots and machines must account for both sets of features. An audiovisual framework estimates and maps the positions of human landmarks using visual features such as human pose and audio features such as human speech, and fuses these beliefs for a more robust map of the environment. For applications in mobile robotics (e.g., drones, service robots), it is valuable to use low-power, lightweight equipment such as monocular cameras or microelectronic microphone arrays. Audiovisual SLAM can also allow for complementary function of such sensors, by compensating for the narrow field of view, feature occlusions, and optical degradations common to lightweight visual sensors with the full field of view and unobstructed feature representations inherent to audio sensors. The susceptibility of audio sensors to reverberation, sound source inactivity, and noise can likewise be compensated for through fusion of landmark beliefs from the visual modality. Complementary function between the audio and visual modalities in an environment can prove valuable for the creation of robotics and machines that fully interact with human speech and human movement.
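A common way to combine such estimates is to fuse two Gaussian beliefs about the same landmark in information (inverse-covariance) form; the following minimal sketch (with made-up names and numbers, not taken from a specific system) shows this fusion step for a single 2D position:

<syntaxhighlight lang="python">
import numpy as np

def fuse_gaussian_beliefs(mean_a, cov_a, mean_b, cov_b):
    """Fuse two independent Gaussian estimates of the same landmark position
    (e.g. one visual, one acoustic) into a single belief by weighting each
    estimate with its inverse covariance."""
    info_a, info_b = np.linalg.inv(cov_a), np.linalg.inv(cov_b)
    fused_cov = np.linalg.inv(info_a + info_b)
    fused_mean = fused_cov @ (info_a @ mean_a + info_b @ mean_b)
    return fused_mean, fused_cov

# Illustrative example with made-up means and covariances for the two modalities.
visual_mean, visual_cov = np.array([2.0, 0.1]), np.diag([0.4, 0.01])
audio_mean, audio_cov = np.array([1.8, 0.0]), np.diag([0.02, 0.5])
fused_mean, fused_cov = fuse_gaussian_beliefs(visual_mean, visual_cov, audio_mean, audio_cov)
</syntaxhighlight>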
 
=== Collaborative SLAM ===
''Collaborative SLAM'' combines images from multiple robots or users to generate 3D maps.<ref>Zou, Danping, and Ping Tan. "[http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.463.8135&rep=rep1&type=pdf CoSLAM: Collaborative visual SLAM in dynamic environments]." IEEE Transactions on Pattern Analysis and Machine Intelligence 35.2 (2012): 354–366.</ref>
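A central step in such a system is expressing landmarks mapped by different robots in a common frame. The following minimal sketch assumes the relative pose between two robots' 2D map frames has already been estimated (for example, from co-observed landmarks) and simply merges the maps; names and array shapes are illustrative:

<syntaxhighlight lang="python">
import numpy as np

def merge_maps(landmarks_a, landmarks_b, relative_pose):
    """Express robot B's 2D landmarks in robot A's map frame and concatenate.

    landmarks_a, landmarks_b: arrays of shape (N, 2) and (M, 2).
    relative_pose: (x, y, theta), the pose of B's map frame expressed in A's
    frame, assumed to have been estimated beforehand.
    """
    x, y, theta = relative_pose
    c, s = np.cos(theta), np.sin(theta)
    rotation = np.array([[c, -s], [s, c]])
    transformed_b = landmarks_b @ rotation.T + np.array([x, y])
    return np.vstack([landmarks_a, transformed_b])
</syntaxhighlight>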
 
=== Moving objects ===
Non-static environments, such as those containing other vehicles or pedestrians, continue to present research challenges.<ref>{{Cite journal|last1=Perera|first1=Samunda|last2=Pasqual|first2=Ajith|date=2011|editor-last=Bebis|editor-first=George|editor2-last=Boyle|editor2-first=Richard|editor3-last=Parvin|editor3-first=Bahram|editor4-last=Koracin|editor4-first=Darko|editor5-last=Wang|editor5-first=Song|editor6-last=Kyungnam|editor6-first=Kim|editor7-last=Benes|editor7-first=Bedrich|editor8-last=Moreland|editor8-first=Kenneth|editor9-last=Borst|editor9-first=Christoph|title=Towards Realtime Handheld MonoSLAM in Dynamic Environments|journal=Advances in Visual Computing|volume=6938|series=Lecture Notes in Computer Science|language=en|publisher=Springer Berlin Heidelberg|pages=313–324|doi=10.1007/978-3-642-24028-7_29|isbn=9783642240287}}</ref><ref name=":1">{{Citation|last1=Perera|first1=Samunda|title=Exploration: Simultaneous Localization and Mapping (SLAM)|date=2014|work=Computer Vision: A Reference Guide|pages=268–275|editor-last=Ikeuchi|editor-first=Katsushi|publisher=Springer US|language=en|doi=10.1007/978-0-387-31439-6_280|isbn=9780387314396|last2=Barnes|first2=Dr.Nick|last3=Zelinsky|first3=Dr.Alexander|s2cid=34686200}}</ref> SLAM with detection and tracking of moving objects (DATMO) is a model that tracks moving objects in a similar way to the agent itself.<ref name=Wang2007>{{cite journal
=== Biological inspiration ===
In neuroscience, the [[hippocampus]] appears to be involved in SLAM-like computations,<ref name="Howard">{{cite journal|last1=Howard|first1=MW|last2=Fotedar|first2=MS|last3=Datey|first3=AV|last4=Hasselmo|first4=ME|title= The temporal context model in spatial navigation and relational learning: toward a common explanation of medial temporal lobe function across domains|journal=Psychological Review|volume=112|issue=1|pages=75–116|pmc=1421376|year=2005|pmid=15631589|doi=10.1037/0033-295X.112.1.75}}</ref><ref name="Fox & Prescott">{{cite book|last1=Fox|first1=C|title= The 2010 International Joint Conference on Neural Networks (IJCNN)|pages=1–8|last2=Prescott|first2=T|chapter= Hippocampus as unitary coherent particle filter|doi=10.1109/IJCNN.2010.5596681|year=2010|isbn=978-1-4244-6916-1|s2cid=10838879|url=http://eprints.whiterose.ac.uk/108622/1/Fox2010_HippocampusUnitaryCoherentParticleFilter.pdf}}</ref><ref name="RatSLAM">{{cite book|last1=Milford|first1=MJ|last2=Wyeth|first2=GF|last3=Prasser|first3=D|title=IEEE International Conference on Robotics and Automation, 2004. Proceedings. ICRA '04. 2004|chapter=RatSLAM: A hippocampal model for simultaneous localization and mapping|year=2004|pages=403–408 Vol.1|doi=10.1109/ROBOT.2004.1307183|isbn=0-7803-8232-3|s2cid=7139556|url=https://eprints.qut.edu.au/37593/1/c37593.pdf}}</ref> giving rise to [[place cells]], and has formed the basis for bio-inspired SLAM systems such as RatSLAM.
 
== Implementation methods ==