Duckiebot Localization with Sensor Fusion in Duckietown

Posted on July 8, 2025 | by Duckietown Admin

Project Resources

Localization with Sensor Fusion in Duckietown - the objectives

The advantage of having multiple sensors on a Duckiebot is that their data can be combined to improve precision and reduce uncertainty in derived results. This process is generally referred to as sensor fusion. A typical application is localization, i.e., the problem of estimating the pose of the Duckiebot over time with respect to some reference frame. And if the data is redundant? No problem: redundant measurements are what fusion relies on to average out noise and cross-check one sensor against another.

In this project, the objective is to implement sensor fusion-based localization and lane-following on a DB21 Duckiebot, integrating odometry (using data from wheel encoders) with visual AprilTag detection for improved positional accuracy.

This process addresses the limitations of odometry, i.e., the open-loop reconstruction of the robot’s trajectory from wheel encoder data alone, an approach known as “dead reckoning”. Incorporating AprilTags as global reference landmarks bounds the accumulated drift and enhances spatial awareness in environments where dead reckoning alone is insufficient.
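To make dead reckoning concrete, here is a minimal sketch of differential-drive odometry integration. It is not the project's code: the function names, the encoder resolution, the wheel radius, and the wheel baseline are illustrative assumptions, and a real Duckiebot would take these values from its calibration.

```python
import math

def ticks_to_distance(ticks, ticks_per_rev, wheel_radius):
    """Convert encoder ticks to distance travelled by the wheel rim (meters)."""
    return 2.0 * math.pi * wheel_radius * ticks / ticks_per_rev

def integrate_odometry(pose, d_left, d_right, baseline):
    """Advance a planar pose (x, y, theta) by one dead-reckoning step.

    d_left, d_right: distance travelled by each wheel since the last step.
    baseline: distance between the two wheels (assumed known).
    """
    x, y, theta = pose
    d_center = 0.5 * (d_left + d_right)       # distance travelled by the robot center
    d_theta = (d_right - d_left) / baseline   # change in heading
    # Use the midpoint heading to reduce discretization error on curved paths.
    x += d_center * math.cos(theta + 0.5 * d_theta)
    y += d_center * math.sin(theta + 0.5 * d_theta)
    theta = math.atan2(math.sin(theta + d_theta), math.cos(theta + d_theta))
    return (x, y, theta)

# Hypothetical values: 135 ticks per revolution, 3.18 cm wheel radius, 10 cm baseline.
step = ticks_to_distance(135, 135, 0.0318)
pose = integrate_odometry((0.0, 0.0, 0.0), step, step, 0.1)
```

Because every step adds a small error that is never corrected, the estimate drifts without bound, which is the limitation the AprilTag landmarks are meant to address.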

Technical concepts include AprilTag-based localization, PID control for lane following, transform tree management in ROS (tf2), and coordinate frame transformations for pose estimation.

Sensor fusion - visual project highlights

The technical approach and challenges

This approach, at the technical level, involves:

extending ROS-based packages to implement AprilTag detection using the dt-apriltags library,
configuring static transformations for landmark localization in a unified world frame, and
correcting odometry drift by broadcasting transforms from estimated AprilTag poses to the Duckiebot’s base frame.
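The drift correction described above amounts to composing rigid transforms. The sketch below works in 2D with plain tuples instead of tf2 messages, so it only illustrates the arithmetic; the frame names and the helper functions `compose` and `invert` are assumptions made for this example, not part of the project or of tf2.

```python
import math

def wrap(a):
    """Wrap an angle to (-pi, pi]."""
    return math.atan2(math.sin(a), math.cos(a))

def compose(a, b):
    """Return the pose a * b: pose b (given in frame a) expressed in a's parent frame."""
    ax, ay, ath = a
    bx, by, bth = b
    return (ax + math.cos(ath) * bx - math.sin(ath) * by,
            ay + math.sin(ath) * bx + math.cos(ath) * by,
            wrap(ath + bth))

def invert(p):
    """Return the inverse of a planar pose (x, y, theta)."""
    x, y, th = p
    c, s = math.cos(th), math.sin(th)
    return (-c * x - s * y, s * x - c * y, wrap(-th))

# Static transform: where the tag sits in the world frame (hypothetical values).
tag_in_world = (2.0, 1.0, math.pi)
# Detection: where the tag appears relative to the robot base (hypothetical values).
tag_in_base = (0.5, 0.0, math.pi)

# world->base = world->tag composed with tag->base
base_in_world = compose(tag_in_world, invert(tag_in_base))

# Correction that maps the drifted odometry estimate onto the tag-based estimate.
base_in_odom = (1.4, 1.1, 0.05)   # hypothetical drifted odometry pose
odom_in_world = compose(base_in_world, invert(base_in_odom))
```

Applying `odom_in_world` to later odometry poses keeps the estimate anchored to the landmark until the next detection arrives.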

A full PID controller was also implemented for lane following, with tunable gains on the lateral and heading deviations. The derivative terms were conditionally initialized for stability.
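One way to read "conditionally initialized" is that the derivative term stays at zero until a previous error sample exists, which avoids a spike on the first control cycle. The sketch below follows that reading. The class, the gains, and the sign convention of the steering command are assumptions for illustration, not the author's implementation.

```python
class PID:
    """Minimal PID controller; the derivative term is skipped on the first sample."""

    def __init__(self, kp, ki, kd):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.integral = 0.0
        self.prev_error = None   # no derivative until a previous error is available

    def step(self, error, dt):
        """Return the control output for the current error; dt must be positive."""
        self.integral += error * dt
        if self.prev_error is None:
            derivative = 0.0
        else:
            derivative = (error - self.prev_error) / dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative

# Separate controllers for lateral offset d (meters) and heading error phi (radians).
# The gains are placeholders and would need tuning on the robot.
pid_d = PID(kp=2.0, ki=0.0, kd=1.0)
pid_phi = PID(kp=4.0, ki=0.0, kd=0.5)

def steering(d, phi, dt):
    """Angular velocity command, assuming positive errors call for a negative turn rate."""
    return -(pid_d.step(d, dt) + pid_phi.step(phi, dt))
```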

Challenges included:

remapping ROS topics for motor command propagation,
resolving frame connectivity in tf trees,
configuring accurate static transforms for AprilTag landmarks,
debugging quaternion misrepresentation during pose updates, and
correctly applying transform compositions using lookup_transform_full to compute odometry corrections.
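Quaternion bugs of the kind mentioned above often come from component ordering or from treating a yaw angle as if it were a quaternion component. As a reference point, this sketch converts between a planar heading and a unit quaternion in (x, y, z, w) order, the field order used by ROS `geometry_msgs/Quaternion`. The function names are chosen for this example.

```python
import math

def yaw_to_quaternion(yaw):
    """Unit quaternion (x, y, z, w) for a rotation of `yaw` radians about the z axis."""
    half = 0.5 * yaw
    return (0.0, 0.0, math.sin(half), math.cos(half))

def quaternion_to_yaw(q):
    """Yaw angle in radians of a unit quaternion given as (x, y, z, w)."""
    x, y, z, w = q
    return math.atan2(2.0 * (w * z + x * y), 1.0 - 2.0 * (y * y + z * z))

q = yaw_to_quaternion(math.pi / 2)
yaw = quaternion_to_yaw(q)
```

A quick sanity check when debugging is that the quaternion has unit norm and that the round trip returns the original heading.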

AprilTag and Camera Frame Axes for Localization: the camera and AprilTag coordinate frames used for pose estimation.

Duckiebot Camera Frame Axes for Sensor Fusion: the camera coordinate frame used to transform AprilTag detections.

Duckietown Map Coordinate System for Global Localization: the global coordinate system used for world-frame pose calculations.

Localization with Sensor Fusion in Duckietown: Authors

Samuel Neumann is a Ph.D. student at the University of Alberta, Canada.

Learn more

Duckietown is a modular, customizable, and state-of-the-art platform for creating and disseminating robotics and AI learning experiences.

Duckietown is designed to teach, learn, and do research: from exploring the fundamentals of computer science and automation to pushing the boundaries of knowledge.

These spotlight projects are shared to exemplify Duckietown’s value for hands-on learning in robotics and AI, enabling students to apply theoretical concepts to practical challenges in autonomous robotics, boosting competence and job prospects.

Extended Kalman Filter (EKF) SLAM for Duckiebots

Posted on April 26, 2025 | by Duckietown Admin

Project Resources

Project highlights

Figure 1. EKF-SLAM Architecture Overview

Figure 2. Extended Kalman Filter (EKF) SLAM input and output flow diagram

Figure 3. Incorporating landmarks into the odometry problem

Figure 4. Odometry after incorporating landmarks

Figure 5. Odometry before incorporating landmarks

Figure 6. EKF-SLAM Trajectory Interpolation: improved versus simple interpolation for trajectory estimation

Figure 7. EKF’s prediction-correction workflow. The visual shows how odometry data predicts the next state and AprilTag measurements refine it. By iteratively fusing these data sources, the EKF achieves accurate localization and mapping, even under noisy or incomplete sensor inputs.

Figure 8. EKF-SLAM AprilTags Detection: AprilTag detection and pose estimation steps

Figure 9. EKF-SLAM AprilTags Pose Estimation. The diagram illustrates the coordinate transformations and the pose estimation workflow.

Figure 10. EKF-SLAM Motion Model Equations. The equations describe how the Duckiebot’s position and orientation (x, y, θ) evolve over time based on its linear velocity (v) and angular velocity (ω).

Figure 11. EKF-SLAM full version (motion+measurement model)

Figure 12. EKF-SLAM measurement model (without motion model)

Figure 13. EKF-SLAM motion model (without measurement model)

Figure 14. Pose estimation using circular interpolation

Figure 15. Pose estimation using circular interpolation (DELTA_TIME=2)

Figure 16. Pose estimation using linear interpolation

Figure 17. Pose estimation using linear interpolation (DELTA_TIME=2)

Extended Kalman Filter (EKF) SLAM for Duckiebots - the objectives

This SLAM-Duckietown project addresses a famous challenge in robotics: concurrently estimating the agent’s pose and mapping the environment under uncertainty.

This project implements an Extended Kalman Filter (EKF) SLAM algorithm on Duckiebots (DB21-J4), combining odometry from wheel encoders with landmark observations from AprilTags.

The objective is to maintain an evolving posterior over the Duckiebot’s pose (x,y,θ) and landmark positions by recursively integrating noisy control inputs and observations.

This upgrade shifts Duckiebots from open-loop dead reckoning units into closed-loop, state-estimating agents. For Duckietown, it reinforces its use as an experimental ground for real-world robotics challenges, including data association, observability, filter consistency, and multi-sensor fusion.

The challenges and approach

The system applies the EKF-SLAM pipeline in two stages: motion prediction and measurement correction.

Prediction propagates the robot’s belief through a non-holonomic kinematic model under process noise, using arc-based interpolation to reduce discretization error.
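As an illustration of the prediction stage, the sketch below propagates a planar pose along a circular arc for constant linear velocity v and angular velocity ω, and returns the Jacobian needed to propagate the pose covariance. It covers only the robot block of the state (no landmarks), and all names are chosen for this example rather than taken from the project code.

```python
import math

def predict_pose(pose, v, omega, dt, eps=1e-6):
    """Arc-based motion model. Returns the new pose and its 3x3 Jacobian G."""
    x, y, theta = pose
    theta_end = theta + omega * dt
    if abs(omega) < eps:
        # Straight-line limit of the arc model.
        x_new = x + v * dt * math.cos(theta)
        y_new = y + v * dt * math.sin(theta)
        jac_x = -v * dt * math.sin(theta)
        jac_y = v * dt * math.cos(theta)
    else:
        r = v / omega   # signed turning radius
        x_new = x + r * (math.sin(theta_end) - math.sin(theta))
        y_new = y + r * (math.cos(theta) - math.cos(theta_end))
        jac_x = r * (math.cos(theta_end) - math.cos(theta))
        jac_y = r * (math.sin(theta_end) - math.sin(theta))
    theta_new = math.atan2(math.sin(theta_end), math.cos(theta_end))
    G = [[1.0, 0.0, jac_x],
         [0.0, 1.0, jac_y],
         [0.0, 0.0, 1.0]]
    return (x_new, y_new, theta_new), G

def propagate_covariance(P, G, Q):
    """Return G P G^T + Q for 3x3 matrices stored as nested lists."""
    n = 3
    GP = [[sum(G[i][k] * P[k][j] for k in range(n)) for j in range(n)] for i in range(n)]
    return [[sum(GP[i][k] * G[j][k] for k in range(n)) + Q[i][j] for j in range(n)]
            for i in range(n)]

# Quarter circle of radius 1 m: the pose should end near (1, 1, pi/2).
pose, G = predict_pose((0.0, 0.0, 0.0), v=math.pi / 2, omega=math.pi / 2, dt=1.0)
identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
zeros = [[0.0] * 3 for _ in range(3)]
P_new = propagate_covariance(identity, G, zeros)
```

Compared with a straight-line Euler step, the arc model is exact for constant v and ω over the interval, which is why it reduces discretization error when the update period is long.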

Correction incorporates AprilTag detections via a Perspective-n-Point (PnP) solution, updating the state with landmark-relative observations under observation noise. The state vector grows dynamically as new landmarks are observed, and the covariance matrix tracks both robot and landmark uncertainty.
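The dynamic growth of the state vector can be sketched as follows. When a tag is seen for the first time, its position is initialized from the current pose estimate and the robot-relative observation, and the covariance gains two rows and columns. This sketch uses a large fixed initial variance for the new landmark, a common simplification; a full implementation would derive the new covariance blocks from the observation Jacobians. The function name, the `index_of` bookkeeping dictionary, and `init_var` are assumptions for this example.

```python
import math

def add_landmark(state, cov, index_of, tag_id, obs_xy, init_var=1e3):
    """Append a newly observed landmark to the EKF-SLAM state.

    state: [x, y, theta, l1x, l1y, ...]; cov: square nested list of matching size.
    index_of: dict mapping tag id -> index of its x coordinate in `state`
              (updated in place when a landmark is added).
    obs_xy: landmark position measured in the robot frame (e.g. from a PnP solution).
    """
    if tag_id in index_of:
        return state, cov   # already mapped: handled by the regular correction step
    x, y, theta = state[0], state[1], state[2]
    dx, dy = obs_xy
    # Transform the robot-frame observation into the world frame.
    lx = x + math.cos(theta) * dx - math.sin(theta) * dy
    ly = y + math.sin(theta) * dx + math.cos(theta) * dy
    n = len(state)
    new_state = list(state) + [lx, ly]
    new_cov = [list(row) + [0.0, 0.0] for row in cov]
    new_cov.append([0.0] * n + [init_var, 0.0])
    new_cov.append([0.0] * n + [0.0, init_var])
    index_of[tag_id] = n
    return new_state, new_cov

index_of = {}
state = [1.0, 2.0, math.pi / 2]
cov = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
state, cov = add_landmark(state, cov, index_of, tag_id=7, obs_xy=(1.0, 0.0))
```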

The technical challenges include maintaining filter consistency under linearization errors, ensuring landmark observability despite partial fields of view, and synchronizing asynchronous data from wheel encoders, camera frames, and Vicon ground-truth captures.
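One simple way to align asynchronous streams is to interpolate the timestamped odometry poses at the time a camera frame was captured. The sketch below does this linearly, taking care to interpolate the heading along the shortest angular path. It is an assumption-laden illustration (function names and sample format are invented here) and corresponds to the simpler linear option rather than an arc-based interpolation.

```python
import math

def wrap_angle(a):
    """Wrap an angle to (-pi, pi]."""
    return math.atan2(math.sin(a), math.cos(a))

def interpolate_pose(t, sample_a, sample_b):
    """Linearly interpolate between two timestamped poses.

    Each sample is (timestamp, (x, y, theta)), with sample_a earlier than sample_b
    and t lying between the two timestamps.
    """
    t_a, (xa, ya, tha) = sample_a
    t_b, (xb, yb, thb) = sample_b
    if not (t_a < t_b and t_a <= t <= t_b):
        raise ValueError("t must lie between two strictly ordered samples")
    s = (t - t_a) / (t_b - t_a)
    d_theta = wrap_angle(thb - tha)   # shortest signed heading difference
    return (xa + s * (xb - xa),
            ya + s * (yb - ya),
            wrap_angle(tha + s * d_theta))

# Pose at the (hypothetical) camera timestamp 0.5 s, between two odometry samples.
odom_a = (0.0, (0.0, 0.0, 0.0))
odom_b = (1.0, (1.0, 0.0, math.pi / 2))
pose_at_frame = interpolate_pose(0.5, odom_a, odom_b)
```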

AprilTag detection is further constrained by lighting artifacts and pose ambiguity at shallow viewing angles, introducing non-Gaussian errors that the EKF, with its linearized Gaussian model, can only approximate.

Tuning noise parameters presents the classical tradeoff: assuming too little noise makes the filter overconfident and prone to divergence, while assuming too much makes it so cautious that it barely responds to new information (filter paralysis). Deployment also exposes the systemic difference between simulation and physical experiments: real Duckiebots do not move with perfect kinematics, cameras suffer from radial distortion, and computation suffers from non-deterministic latency.
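The noise-tuning tradeoff is easiest to see in a scalar Kalman filter, where the gain has a closed form. This is a one-dimensional toy, not the project's filter, and the variable names are chosen for this example: `q` is the assumed process noise variance and `r` the assumed measurement noise variance.

```python
def scalar_kalman_step(x, p, z, q, r):
    """One predict-and-correct cycle of a scalar Kalman filter with a static model.

    x, p: prior estimate and its variance; z: measurement.
    Returns the updated estimate, its variance, and the Kalman gain.
    """
    p_pred = p + q               # prediction inflates uncertainty by the process noise
    k = p_pred / (p_pred + r)    # gain: how much of the innovation is applied
    x_new = x + k * (z - x)
    p_new = (1.0 - k) * p_pred
    return x_new, p_new, k

# Same prior and measurement, three different assumed measurement noise levels.
balanced = scalar_kalman_step(0.0, 1.0, 1.0, q=0.0, r=1.0)
trusting = scalar_kalman_step(0.0, 1.0, 1.0, q=0.0, r=1e-6)
cautious = scalar_kalman_step(0.0, 1.0, 1.0, q=0.0, r=1e6)
```

With a very small `r` the gain approaches 1 and the estimate jumps to each measurement, outliers included. With a very large `r` the gain approaches 0 and the estimate barely moves, which is the scalar version of the sluggish behavior described above.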

In SLAM, everything that can drift will drift, and the role of the filter is to drift more slowly than entropy.

Extended Kalman Filter (EKF) SLAM for Duckiebots: Authors

AmirHossein Zamani is a former Duckietown student, currently pursuing his Ph.D. in Computer Science at Mila (Quebec AI Institute) and Concordia University, Canada. He is also working as an AI Research Scientist Intern at Autodesk in Montreal, Canada.

Léonard Oest O’Leary is a former Duckietown student, currently pursuing his Master of Science in Computer Science at the University of Montreal, Canada.

Kevin Lessard is a former Duckietown student, currently pursuing his Master of Science in Machine Learning at Mila – Quebec AI Institute in Montreal, Canada.

Monocular Navigation in Duckietown Using LEDNet Architecture

Posted on November 30, 2024 | by Duckietown Admin

Project Resources

Project highlights

Here is a visual tour of the author’s work on implementing monocular navigation using the LEDNet architecture in Duckietown*.

Figure 1. ViT Image Segmentation Outputs for Duckietown: Comparing 1 Block vs 3 Blocks.

Figure 2. Encoder-Decoder Architecture (SegNet) for Pixelwise Segmentation.

Figure 3. The LEDNet Architecture, showing its lightweight encoder-decoder structure.

Figure 4. LEDNet Image Segmentation of Duckietown, showing multi-scale feature pyramids for pixel-level attention.

Figure 5. LEDNet Loss Graph, showing the flattening of the loss curve after 200 epochs.

Figure 6. Simulated Duckietown Map: 'loop_empty', a simple layout with left and right bends.

Figure 7. Simulated Duckietown Map: 'loop_empty' with Obstacles, such as Duckiebots and rubber ducks.

Figure 8. Lane-Following and Obstacle-Avoidance Algorithm (Saavedra-Ruiz et al., 2022).

Figure 9. Image Segmentations: LEDNet vs. ViT 1 Block vs. ViT 3 Blocks, highlighting the detection of small obstacles.

*Images from “Monocular Robot Navigation with Self-Supervised Pretrained Vision Transformers”, M. Saavedra-Ruiz, S. Morin, L. Paull. arXiv: https://arxiv.org/pdf/2203.03682

Why monocular navigation?

Image sensors are ubiquitous for their well-known sensory traits (e.g., distance measurement, robustness, accessibility, variety of form factors, etc.). Achieving autonomy with monocular vision, i.e., using only one image sensor, is desirable, and much work has gone into approaches to achieve this task. Duckietown’s first Duckiebot, the DB17, was designed with a camera as its only sensor to highlight the importance of this challenge!

But images, due to the integrative nature of image sensors and the physics of the image generation process, are subject to motion blur, occlusions, and sensitivity to environmental lighting conditions, which challenge the effectiveness of “traditional” computer vision algorithms to extract information.

In this work, the author uses “LEDNet” to mitigate some of the known limitations of image sensors for use in autonomy. LEDNet’s encoder-decoder architecture operates at high resolution, enabling lane following and obstacle detection. The model processes images at high frame rates, allowing turns, bends, and obstacles to be recognized in time for decision-making. The high resolution improves both classification accuracy and the ability to differentiate road markings from obstacles.

LEDNet’s obstacle-avoidance algorithm can classify and detect obstacles even at higher speeds. Unlike Vision Transformer (ViT) models, LEDNet avoids missing parts of obstacles, which helps prevent robot collisions.

The model handles small obstacles by identifying them earlier and navigating around them. In the simulated Duckietown environment, LEDNet outperforms other models in lane-following and obstacle-detection tasks.

LEDNet uses “real-time” image segmentation to provide the Duckiebot with information for steering decisions. While the study was conducted in a simulation, the model’s performance indicates it would work in real-world scenarios with consistent lighting and predictable obstacles.
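To give a feel for how a segmentation mask can feed a steering decision, here is a deliberately simple sketch: it measures how far the drivable-area pixels sit from the image center and steers proportionally. This is not LEDNet and not the algorithm of Saavedra-Ruiz et al. (2022); the label value, the gain, and the sign convention (positive command turns left) are all assumptions made for illustration.

```python
def lane_offset(mask, drivable_label=1):
    """Normalized horizontal offset of drivable pixels, in [-1, 1].

    mask: 2D nested list of per-pixel class labels (rows of equal length).
    Returns None when no pixel carries the drivable label.
    """
    width = len(mask[0])
    if width < 2:
        return 0.0
    cols = [c for row in mask for c, label in enumerate(row) if label == drivable_label]
    if not cols:
        return None
    center = (width - 1) / 2.0
    return (sum(cols) / len(cols) - center) / center

def steering_command(mask, gain=1.0):
    """Proportional steering: turn toward the side where the drivable area lies."""
    offset = lane_offset(mask)
    if offset is None:
        return 0.0           # nothing recognized: hold the current heading
    return -gain * offset    # drivable area to the right gives a negative (right) turn

# Toy 2x4 mask with the drivable area on the right half of the image.
mask = [[0, 0, 1, 1],
        [0, 0, 1, 1]]
command = steering_command(mask, gain=2.0)
```

A real pipeline would weight rows by distance, treat obstacle classes separately, and filter the command over time; the point here is only that the per-pixel labels reduce to a small number of control-relevant quantities.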

The next step is to try it out!

Monocular Navigation in Duckietown Using LEDNet Architecture - the challenges

In implementing monocular navigation in this project, the author faced several challenges:

Computational demands: LEDNet’s high-resolution processing requires computational resources, particularly when handling real-time image segmentation and obstacle detection at high frame rates.
Limited handling of complex environments: the lane-following and obstacle-avoidance algorithm used in this study does not handle crossroads or junctions, limiting the model’s ability to navigate complex road structures.
Simulation vs. real-world application: The study relies on a simulated environment where lighting, obstacle behavior, and road conditions are consistent. Implementing the system in the real world introduces variability in these factors, which affects the model’s performance.
Small obstacle detection: While LEDNet performs well in detecting small obstacles compared to ViT, the detection of small obstacles is still dependent on the resolution and segmentation quality.

Project Report

Project Author

Angelo Broere is currently working as an on-call worker (Oproepkracht) at Compressor Parts Service, Netherlands.
