Intersection Navigation in Duckietown Using 3D Image Features

Project Resources

Objective: To evaluate the effectiveness of 3D-encoded image feature representations for intersection navigation in Duckietown using a Bird's Eye View (BEV) approach.
Approach: Integrate the BEV encoding from MILE into the intersection navigation method proposed by Giles et al. (2019) and assess its performance in the Duckietown environment.
Authors: Jasper Mulder

Project highlights

Here is a visual tour of the authors’ work on implementing intersection navigation using 3D image features in Duckietown.

Figure 1. Camera Input to Bird's Eye View (BEV) Transformation.

Figure 2. Classified Stop Line Clusters in BEV.

Figure 3. Possible Intersection Navigation Trajectories.

Figure 4. Camera Input for BEV Generation in Intersection Navigation.

Figure 5. Comparison of BEV Representations During Intersection Navigation.

Figure 6. Comparison of MILE-Generated BEVs from CARLA and Duckietown Simulators.

Intersection Navigation in Duckietown: Advancing with 3D Image Features

Intersection navigation in Duckietown using 3D image features is an approach intented to improve autonomous intersection navigation, enhancing decision-making and path planning in complex Duckietown environments, i.e., made of several road loops and road intersections.

The traditional approach to intersection navigation in Duckietown is naive: (a) stop at the red line before the intersection, (b) read Apriltag-equipped traffic signs (providing information on the shape and coordination mechanism at intersections); (c) decide which direction to take; (d) coordinate with other vehicles at the intersection to avoid collisions; (e) navigate through the intersection. This last step is performed in an open-loop fashion, leveraging the known appearance specifications of intersections in Duckietown.

By incorporating 3D image features in the perception pipeline, extrapolated from the Duckietown road lines, Duckiebots can achieve a representation of their pose while crossing the intersection, closing, therefore, the loop and improving navigation accuracy, in addition to facilitating the development of new strategies for intersection navigation, such as real-time path optimization.

Combining 3D image features with methods, such as Bird’s Eye View (BEV) transformations allows for comprehensive representations of the intersection. The integration of these techniques improves the accuracy of stop line detection and obstacle avoidance contributes to advancing autonomous navigation algorithms and supports real-world deployment scenarios.

The method and the challenges of intersection navigation using 3D features

The thesis involves implementing the MILE model (Model-based Imitation LEarning for urban driving), trained on the CARLA simulator, into the Duckietown environment to evaluate its performance in navigating unprotected intersections.

Experiments were conducted using the Gym-Duckietown simulator, where Duckiebots navigated a 4-way intersection across multiple trajectories. Metrics such as success rate, drivable area compliance, and ride comfort were used to assess performance.

The findings indicate that while the MILE model achieved state-of-the-art performance in the CARLA simulator, its generalization to the Duckietown environment without additional training was, as probably expected due to the sim2real gap, limited.

The BEVs generated by MILE were not sufficiently representative of the actual road surface in Duckietown, leading to suboptimal navigation performance. In contrast, the homographic BEV method, despite its assumption of a flat world plane, provided more accurate representations for intersection navigation in this context.

As for most approaches in robotics, there are limitation and tradeoffs to analyze.

Here are some technical challenges of the proposed approach:

Generalization across environments: one of the challenges is ensuring that the 3D image feature representation generalizes well across different simulation environments, such as Duckietown and CARLA. The differences in scale, road structures, and dynamics between simulators can impact the performance of the navigation system.
Accuracy of BEV representations: the transformation of camera images into Bird’s Eye View (BEV) representations has reduced accuracy, especially when dealing with low-resolution or distorted input data.
Real-time processing: the integration of 3D image features for navigation requires substantial computational resources with respect to utilizing 2D features instead. Achieving near real-time processing speeds for navigation tasks such as intersection navigation, is challenging.

Intersection Navigation in Duckietown Using 3D Image Feature: Full Report

Intersection Navigation in Duckietown Using 3D Image Feature: Authors

Jasper Mulder is currently working as a Junior Outdoor expert at Bever, Netherlands.

Learn more

Duckietown is a modular, customizable, and state-of-the-art platform for creating and disseminating robotics and AI learning experiences.

Duckietown is designed to teach, learn, and do research: from exploring the fundamentals of computer science and automation to pushing the boundaries of knowledge.

These spotlight projects are shared to exemplify Duckietown’s value for hands-on learning in robotics and AI, enabling students to apply theoretical concepts to practical challenges in autonomous robotics, boosting competence and job prospects.