Semantic SLAM for better navigation in outdoor scenes.
Our team at Digital Twin Technology (DTT) is researching semantic SLAM algorithms for mapping large-scale outdoor scenes, an important part of the DIDYMOS-XR project. SLAM, short for simultaneous localization and mapping, is the task of building a map of an area while simultaneously determining the device's location within it. This capability is fundamental to mobile mapping, making it possible to construct maps of extensive areas in significantly reduced timeframes. Mobile robots, drones, or vehicles equipped with SLAM systems can efficiently measure and map both outdoor and indoor environments, streamlining data collection and offering a versatile solution for rapidly generating maps in a wide range of scenarios.
Traditional vision-based SLAM techniques (e.g., feature-based methods such as ORB-SLAM and filter-based methods such as EKF-SLAM) have had reasonable success; however, they may fail to achieve the desired results in challenging settings such as localization and mapping of large-scale outdoor scenes containing dynamic elements. These methods tend to struggle in dynamic environments, where lighting conditions, object appearances, or scene structures change. Moreover, they can be computationally intensive, limiting their real-time performance, especially on resource-constrained platforms (https://arxiv.org/pdf/2210.10491.pdf).
Apart from this, many traditional SLAM approaches require a good initial estimate of the robot's pose or of the environment, and their performance may degrade if this initialization is poor. Failures of traditional SLAM can take many forms, including lost feature tracks, divergence of the optimization algorithm, and the accumulation of drift (https://arxiv.org/pdf/2108.10869.pdf). To address these limitations, semantic information can be integrated into SLAM: as high-level environmental information, it enables robots to better understand their surroundings (https://www.mdpi.com/2072-4292/14/13/3010).
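To give a concrete (and heavily simplified) illustration of one way semantic information can help in dynamic scenes, the sketch below assumes a per-pixel segmentation mask is available for each frame and discards feature points that fall on classes that are typically dynamic (e.g., cars or pedestrians) before they enter tracking and mapping. The class IDs and the helper function are hypothetical placeholders, not part of any particular SLAM system.

```python
import numpy as np

# Hypothetical set of semantic class IDs treated as dynamic (e.g., car, pedestrian, cyclist).
# A real system would take these from the label map of the chosen segmentation model.
DYNAMIC_CLASS_IDS = {11, 12, 13}

def filter_dynamic_keypoints(keypoints, semantic_mask, dynamic_ids=DYNAMIC_CLASS_IDS):
    """Keep only keypoints that do not lie on pixels labelled as dynamic.

    keypoints     : (N, 2) array of (u, v) pixel coordinates.
    semantic_mask : (H, W) array of per-pixel class IDs from a segmentation network.
    """
    u = keypoints[:, 0].astype(int)
    v = keypoints[:, 1].astype(int)
    labels = semantic_mask[v, u]                 # class ID under each keypoint
    keep = ~np.isin(labels, list(dynamic_ids))   # True for static-scene keypoints
    return keypoints[keep]

if __name__ == "__main__":
    # Toy example: a 480x640 mask where one block of pixels is labelled as a car (ID 11).
    mask = np.zeros((480, 640), dtype=np.int32)
    mask[100:200, 300:400] = 11
    pts = np.array([[320.0, 150.0],   # falls on the "car" region -> removed
                    [50.0, 50.0]])    # static background -> kept
    print(filter_dynamic_keypoints(pts, mask))
```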
The benefit of semantic SLAM is that it not only acquires the geometric details of unfamiliar environments and tracks the robot's movement, but also identifies and detects targets within the scene. It goes further by capturing semantic information, including functional attributes of objects and their relationships with the surrounding scene. In other words, semantic SLAM can comprehend the contents of the environment as a whole, offering an understanding that goes beyond geometry and motion alone (https://www.mdpi.com/1424-8220/21/19/6355).
The recent trend of incorporating deep learning into semantic SLAM further elevates its capabilities, enabling more advanced recognition, comprehension, and interpretation of environmental semantics. This makes semantic SLAM a powerful technology with diverse applications in large-scale outdoor scenarios. Integrating deep learning methods also strengthens several core SLAM capabilities, such as loop closure detection and global bundle adjustment, as sketched below.
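As a rough sketch of how a learned component can support loop closure, the snippet below compares a global image descriptor against those of previously visited keyframes and flags high-similarity matches as loop candidates. The descriptor here is a random-projection stand-in for an actual deep place-recognition network and is purely illustrative; the function names and thresholds are our own assumptions, not part of any specific system.

```python
import numpy as np

rng = np.random.default_rng(0)
# Random projection standing in for a learned place-recognition encoder;
# a real system would compute this descriptor with a deep network.
PROJECTION = rng.standard_normal((128, 60 * 80)).astype(np.float32)

def embed(gray_image_60x80):
    """Hypothetical global descriptor for a downsampled 60x80 grayscale frame."""
    x = gray_image_60x80.astype(np.float32).ravel()
    e = PROJECTION @ x
    return e / (np.linalg.norm(e) + 1e-8)            # unit-length descriptor

def find_loop_candidates(query_emb, keyframe_embs, min_gap=30, threshold=0.9):
    """Return indices of older keyframes whose descriptors closely match the query.

    min_gap excludes the most recent keyframes so that temporally adjacent
    frames are not reported as loop closures.
    """
    candidates = []
    for idx, kf_emb in enumerate(keyframe_embs[:-min_gap]):
        if float(query_emb @ kf_emb) > threshold:     # cosine similarity of unit vectors
            candidates.append(idx)
    return candidates
```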
To address this important part of the DIDYMOS-XR project, which focuses on large-scale localization and mapping and investigates SLAM approaches capable of handling large outdoor scenes, we at DTT are currently implementing the Volume-DROID SLAM method (https://arxiv.org/pdf/2306.06850.pdf), which is based on the real-time fusion of DROID-SLAM (https://arxiv.org/pdf/2108.10869.pdf) and Convolutional Bayesian Kernel Inference (ConvBKI) (https://arxiv.org/abs/2209.10663). The implementation involves building the whole system via Docker. The system takes camera images (monocular or stereo) or video frames as input and produces a real-time 3D semantic map of the environment through a combination of DROID-SLAM, point cloud registration, off-the-shelf semantic segmentation, and ConvBKI; the overall data flow is sketched below.
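In heavily simplified form, the frame-by-frame control flow we are targeting looks roughly as follows. Function names such as droid_track, segment, register_points, and convbki_update are placeholders for the corresponding components, not the actual Volume-DROID API.

```python
def run_semantic_mapping(frames, droid_track, segment, register_points, convbki_update):
    """Illustrative frame loop for a Volume-DROID-style pipeline (not the actual API).

    frames          : iterable of (timestamp, image) pairs from a camera or video.
    droid_track     : returns an estimated pose and a depth map for the frame (DROID-SLAM).
    segment         : off-the-shelf semantic segmentation, returns per-pixel class IDs.
    register_points : lifts depth + pose into a point cloud in the world frame.
    convbki_update  : ConvBKI-style recursive update of the volumetric semantic map.
    """
    semantic_map = {}                                  # placeholder for the volumetric map state
    for timestamp, image in frames:
        pose, depth = droid_track(image, timestamp)    # tracking and depth estimation
        labels = segment(image)                        # per-pixel semantic labels
        points = register_points(depth, pose)          # point cloud in world coordinates
        semantic_map = convbki_update(semantic_map, points, labels)
    return semantic_map
```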
We are currently working on the crucial parts of the Volume-DROID implementation, such as the Docker setup, the installation of the various dependencies required for proper execution, and planning and performing initial tests on local and reference datasets (e.g., the TUM datasets); a minimal sketch of how frames from such a sequence could be streamed into the pipeline is shown below. Once the initial tests are successful, we will move towards optimizing the implementation to improve overall system performance, and subsequently test the optimized method on the dataset(s) captured within the scope of the DIDYMOS-XR project. We invite readers to stay tuned for more updates in future blog posts.
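The sketch below assumes a sequence stored in the public TUM RGB-D layout (an rgb.txt index listing "timestamp filename" pairs next to the image folder) and simply yields timestamped frames; the local path is a hypothetical example, and this is only one possible way to feed data into the system.

```python
import os
import cv2  # OpenCV, used here only for image loading

def tum_rgb_frames(sequence_dir):
    """Yield (timestamp, image) pairs from a TUM RGB-D sequence directory.

    Expects the standard TUM layout: <sequence_dir>/rgb.txt listing
    "timestamp filename" pairs, with images stored relative to <sequence_dir>.
    """
    index_file = os.path.join(sequence_dir, "rgb.txt")
    with open(index_file) as f:
        for line in f:
            line = line.strip()
            if not line or line.startswith("#"):       # skip header/comment lines
                continue
            timestamp, rel_path = line.split()[:2]
            image = cv2.imread(os.path.join(sequence_dir, rel_path))
            if image is not None:
                yield float(timestamp), image

if __name__ == "__main__":
    # Hypothetical local path to a downloaded TUM sequence; adjust to your setup.
    for ts, img in tum_rgb_frames("data/rgbd_dataset_freiburg1_desk"):
        print(ts, img.shape)
        break
```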
By Tariqul Islam and Mary Jyothi Pudota