Monocular Navigation in Duckietown Using LEDNet Architecture
Project Resources
- Objective: Autonomous lane following and obstacle avoidance in Duckietown using vision and machine learning.
- Approach: Use monocular vision with the "LEDNet" architecture, benchmarked against Vision Transformer models. Simulated tests compare LEDNet's high-resolution performance with the Vision Transformer's low-resolution capabilities.
- Author: Angelo R. Broere
Project highlights
Here is a visual tour of the author’s work on implementing monocular navigation using the LEDNet architecture in Duckietown*.
*Images from “Monocular Robot Navigation with Self-Supervised Pretrained Vision Transformers,” M. Saavedra-Ruiz, S. Morin, L. Paull. arXiv: https://arxiv.org/pdf/2203.03682
Why monocular navigation?
Image sensors are ubiquitous thanks to their well-known sensory traits (e.g., distance measurement, robustness, accessibility, variety of form factors). Achieving autonomy with monocular vision, i.e., using only one image sensor, is desirable, and much work has gone into approaches to achieve this task. Duckietown’s first Duckiebot, the DB17, was designed with only a camera as sensor suite to highlight the importance of this challenge!
But images, due to the integrative nature of image sensors and the physics of the image generation process, are subject to motion blur, occlusions, and sensitivity to environmental lighting conditions, which challenge the effectiveness of “traditional” computer vision algorithms to extract information.
In this work, the author uses “LEDNet” to mitigate some of the known limitations of image sensors for use in autonomy. LEDNet’s encoder-decoder architecture operates on high-resolution input, enabling lane following and obstacle detection. The model processes images at high frame rates, allowing timely recognition of turns, bends, and obstacles for decision-making. The higher resolution improves both the ability to differentiate road markings from obstacles and the classification accuracy.
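To make the architecture concrete, below is a minimal PyTorch sketch of the encoder-decoder pattern that LEDNet belongs to: a downsampling encoder followed by an upsampling decoder that emits per-pixel class logits. This is an illustration of the model family only, not the actual LEDNet implementation; the `TinySegNet` name, layer sizes, and three-class labeling are hypothetical assumptions.

```python
# Minimal encoder-decoder segmentation sketch (illustrative, NOT LEDNet itself).
import torch
import torch.nn as nn

class TinySegNet(nn.Module):
    def __init__(self, num_classes: int = 3):  # hypothetical: road, lane marking, obstacle
        super().__init__()
        # Encoder: downsample spatially while increasing channel depth.
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(inplace=True),
        )
        # Decoder: upsample back to input resolution, emit per-pixel logits.
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(32, 16, 4, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.ConvTranspose2d(16, num_classes, 4, stride=2, padding=1),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.decoder(self.encoder(x))

model = TinySegNet().eval()
frame = torch.rand(1, 3, 480, 640)      # one RGB camera frame
with torch.no_grad():
    logits = model(frame)               # shape: (1, num_classes, 480, 640)
    mask = logits.argmax(dim=1)         # per-pixel class labels
```

The key design property is that the output mask has the same spatial resolution as the input frame, which is what lets a high-resolution model separate thin road markings from nearby obstacles.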
LEDNet’s obstacle-avoidance algorithm can detect and classify obstacles even at higher driving speeds. Unlike Vision Transformer (ViT) models, LEDNet avoids missing parts of obstacles, preventing robot collisions.
The model handles small obstacles by identifying them earlier and navigating around them. In the simulated Duckietown environment, LEDNet outperforms other models in lane-following and obstacle-detection tasks.
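A hedged sketch of how an obstacle check can be read off a segmentation mask: count obstacle-labeled pixels in the region the robot is about to drive through. The `OBSTACLE_CLASS` id, region-of-interest bounds, and pixel threshold below are illustrative assumptions, not values from the project.

```python
# Illustrative obstacle check on a segmentation mask (assumed class ids).
import numpy as np

OBSTACLE_CLASS = 2  # hypothetical class id for obstacles

def obstacle_ahead(mask: np.ndarray, min_pixels: int = 50) -> bool:
    """Return True if enough obstacle pixels appear in the lower-central
    region of the frame (the area the robot is about to drive through)."""
    h, w = mask.shape
    roi = mask[h // 2 :, w // 4 : 3 * w // 4]   # bottom half, central band
    return int((roi == OBSTACLE_CLASS).sum()) >= min_pixels

# Example: a 480x640 mask with a small obstacle blob in front of the robot.
mask = np.zeros((480, 640), dtype=np.uint8)
mask[400:420, 300:330] = OBSTACLE_CLASS
print(obstacle_ahead(mask))  # True
```

Because the check operates on pixel counts rather than bounding boxes, even a small blob of correctly segmented pixels is enough to trigger avoidance, which is why segmentation resolution matters for small obstacles.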
LEDNet uses “real-time” image segmentation to provide the Duckiebot with information for steering decisions. While the study was conducted in a simulation, the model’s performance indicates it would work in real-world scenarios with consistent lighting and predictable obstacles.
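One simple way segmentation can feed steering is sketched below: steer toward the horizontal centroid of lane-labeled pixels with a proportional controller. The `LANE_CLASS` id and the gain are assumptions for illustration; the project's actual control law may differ.

```python
# Illustrative mask-to-steering mapping (assumed class id and gain).
import numpy as np

LANE_CLASS = 1  # hypothetical class id for the drivable lane

def steering_from_mask(mask: np.ndarray, gain: float = 2.0) -> float:
    """Steer toward the horizontal centroid of lane pixels in the lower
    half of the frame. Returns a normalized command in [-1, 1]
    (negative = steer left, positive = steer right)."""
    h, w = mask.shape
    _, xs = np.nonzero(mask[h // 2 :] == LANE_CLASS)
    if xs.size == 0:
        return 0.0                               # no lane visible: go straight
    offset = (xs.mean() - w / 2) / (w / 2)       # normalized lateral error
    return float(np.clip(gain * offset, -1.0, 1.0))
```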
The next step is to try it out!
Monocular Navigation in Duckietown Using LEDNet Architecture - the challenges
In implementing monocular navigation in this project, the author faced several challenges:
- Computational demands: LEDNet’s high-resolution processing requires significant computational resources, particularly when handling real-time image segmentation and obstacle detection at high frame rates (see the timing sketch after this list).
- Limited handling of complex environments: the lane-following and obstacle-avoidance algorithm used in this study does not handle crossroads or junctions, limiting the model’s ability to navigate complex road structures.
- Simulation vs. real-world application: The study relies on a simulated environment where lighting, obstacle behavior, and road conditions are consistent. Implementing the system in the real world introduces variability in these factors, which affects the model’s performance.
- Small obstacle detection: While LEDNet performs well in detecting small obstacles compared to ViT, the detection of small obstacles is still dependent on the resolution and segmentation quality.
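To quantify the computational-demand point above, a small timing loop can estimate whether a segmentation model keeps up with the camera. This sketch reuses the hypothetical `TinySegNet` from the earlier example; the 20 FPS budget is an illustrative assumption, not a figure from the study.

```python
# Illustrative frame-rate measurement for a segmentation model.
import time
import torch

def measure_fps(model: torch.nn.Module, n_frames: int = 100) -> float:
    """Time repeated forward passes on a camera-sized frame and return FPS."""
    model.eval()
    frame = torch.rand(1, 3, 480, 640)
    with torch.no_grad():
        model(frame)                              # warm-up pass
        start = time.perf_counter()
        for _ in range(n_frames):
            model(frame)
    return n_frames / (time.perf_counter() - start)

# fps = measure_fps(TinySegNet())  # TinySegNet defined in the earlier sketch
# print(f"{fps:.1f} FPS", "OK" if fps >= 20 else "below real-time budget")
```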
Project Report
Project Author
Angelo Broere is currently working as an on-call employee (Oproepkracht) at Compressor Parts Service, Netherlands.
Learn more
Duckietown is a modular, customizable and state-of-the-art platform for creating and disseminating robotics and AI learning experiences.
It is designed to teach, learn, and do research: from exploring the fundamentals of computer science and automation to pushing the boundaries of knowledge.