Visual Control for Autonomous Navigation in Duckietown

Posted on September 12, 2025 | by Duckietown Admin

General Information

Title: Visual Urban Navigation for Mobile Robots: Implementation in the Duckietown Environment
Authors: Shima Akbari, Nima Akbari, Giuseppe Oriolo, Sergio Galeani
Institution: Università degli Studi di Roma Tor Vergata, Italy
Citation: S. Akbari, N. Akbari, G. Oriolo and S. Galeani, "Visual Urban Navigation for Mobile Robots: Implementation in the Duckietown Environment," 2025 International Conference on Control, Automation and Diagnosis (ICCAD), Barcelona, Spain, 2025, pp. 1-6, doi: 10.1109/ICCAD64771.2025.11099311.

Visual Control for Autonomous Navigation in Duckietown

This research presents a visual control framework for autonomous navigation in Duckietown that uses only onboard camera feedback. The system models the Duckiebot as a unicycle with constant driving velocity and uses steering velocity as the control input. Virtual guidelines are extracted from the lane boundaries to compute two visual features: the middle point and the vanishing point on the image plane.
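To make the two visual features concrete, here is a minimal Python sketch of how they could be computed from a pair of guidelines. It assumes each guideline is given by two image points, that the vanishing point is the intersection of the two guidelines, and that the middle point is the midpoint between the guidelines on a chosen image row. These definitions and all function names are illustrative assumptions, not taken from the paper.

```python
from typing import Tuple

Point = Tuple[float, float]
Line = Tuple[Point, Point]  # a guideline described by two image points (x, y)


def x_at_row(line: Line, y: float) -> float:
    """Return the x coordinate where a non-horizontal line crosses image row y."""
    (x1, y1), (x2, y2) = line
    if y1 == y2:
        raise ValueError("a horizontal line does not cross a row at a single point")
    t = (y - y1) / (y2 - y1)
    return x1 + t * (x2 - x1)


def vanishing_point(left: Line, right: Line) -> Point:
    """Intersect the two guidelines with the standard two-line formula."""
    (x1, y1), (x2, y2) = left
    (x3, y3), (x4, y4) = right
    denom = (x1 - x2) * (y3 - y4) - (y1 - y2) * (x3 - x4)
    if abs(denom) < 1e-9:
        raise ValueError("guidelines are parallel in the image")
    a = x1 * y2 - y1 * x2
    b = x3 * y4 - y3 * x4
    px = (a * (x3 - x4) - (x1 - x2) * b) / denom
    py = (a * (y3 - y4) - (y1 - y2) * b) / denom
    return (px, py)


def middle_point(left: Line, right: Line, row: float) -> Point:
    """Midpoint between the two guidelines on the given image row (assumed definition)."""
    return ((x_at_row(left, row) + x_at_row(right, row)) / 2.0, row)


# Hypothetical guidelines in a 640x480 image, symmetric about the image center column.
left_line = ((100.0, 479.0), (300.0, 200.0))
right_line = ((540.0, 479.0), (340.0, 200.0))
vp = vanishing_point(left_line, right_line)
mp = middle_point(left_line, right_line, row=479.0)
# Horizontal offsets from the image center column; a controller would drive both to zero.
center_x = 320.0
errors = (vp[0] - center_x, mp[0] - center_x)
```

With the symmetric guidelines above, both horizontal offsets are zero, which corresponds to a robot that is centered in the lane and aligned with it.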

The controller drives these features to the image center using a mathematically derived control law. The visual features are obtained from the camera feed through a multi-stage image processing pipeline implemented in OpenCV: frame denoising, grayscale conversion, Canny edge detection, region-of-interest masking, and line detection via the Probabilistic Hough Line Transform. This setup provides robust detection of the white and yellow lane markings under varying conditions.

A scenario-driven transition system detects red lines marking intersections and activates artificial guidelines to execute controlled turns. The visual control implementation runs as a single ROS node following a publisher-subscriber architecture, deployed both in the Duckietown Simulator (gym) and in Duckietown.
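The switching between lane following and turning can be pictured as a small state machine. The Python sketch below is one hypothetical arrangement: a red-line detection triggers the turning mode, which is held for a fixed number of control steps before lane following resumes. The mode names, the class name, and the step-count exit condition are assumptions, not details from the paper.

```python
from enum import Enum, auto


class Mode(Enum):
    LANE_FOLLOWING = auto()  # guidelines come from the detected lane boundaries
    TURNING = auto()         # guidelines are replaced by artificial ones


class NavigationModeSwitch:
    """Hypothetical two-mode transition system driven by red-line detection."""

    def __init__(self, turn_duration_steps: int) -> None:
        self.mode = Mode.LANE_FOLLOWING
        self.turn_duration_steps = turn_duration_steps
        self._steps_left = 0

    def update(self, red_line_detected: bool) -> Mode:
        if self.mode is Mode.LANE_FOLLOWING and red_line_detected:
            # An intersection is ahead: start a turn of fixed duration.
            self.mode = Mode.TURNING
            self._steps_left = self.turn_duration_steps
        elif self.mode is Mode.TURNING:
            # Red lines seen while turning are ignored until the turn finishes.
            self._steps_left -= 1
            if self._steps_left <= 0:
                self.mode = Mode.LANE_FOLLOWING
        return self.mode


switch = NavigationModeSwitch(turn_duration_steps=3)
detections = [False, True, False, False, False, False]
history = [switch.update(d) for d in detections]
```

In a ROS node, `update` would be called once per camera callback, and the returned mode would select which guidelines feed the controller.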

Highlights - Visual Control for Autonomous Navigation in Duckietown

Here is a visual tour of the implementation of visual control for autonomous navigation by the authors. For all the details, check out the full paper.

Figure 1. General Scheme of Robot Navigation

Figure 2. Duckietown Real vs Simulated Setup

Figure 3. Coordinate Frames and Virtual Guidelines

Figure 4. Image Processing Steps Verification

Figure 5. Complete Image Processing Workflow

Figure 6. Turn Timing Schematics

Figure 7. Artificial Guidelines for Turning

Figure 8. Lane Centering Point Convergence

Figure 9. Velocity Convergence Graph

Figure 10. Consecutive Turns Point Evolution

Figure 11. Consecutive Turns Velocity Evolution

Figure 12. Live Camera Image Processing

Figure 13. Experimental Lane Centering

Figure 14. Experimental Velocity Evolution (Centering)

Figure 15. Experimental Turning Trials

Figure 16. Experimental Right Turn Velocity

Figure 17. Experimental Left Turn Velocity

Abstract

Here is the abstract of the work, directly in the words of the authors:

This paper presents a vision-based control framework for the autonomous navigation of wheeled mobile robots in city-like environments, including both straight roads and turns. The approach leverages Computer Vision techniques and OpenCV to extract lane line features and utilizes a previously established control law to compute the necessary steering commands.

The proposed method enables the robot to accurately follow the lanes and seamlessly handle complex maneuvers such as consecutive turns. The framework has been rigorously validated through extensive simulations and real-world experiments using physical robots equipped with the ROS framework. Experimental evaluations were conducted at the DIAG Robotics Lab at Sapienza University of Rome, Italy, demonstrating the practicality of the proposed solution in realistic settings.

This work bridges the gap between theoretical control strategies and their practical application, offering insights into vision-based navigation systems for autonomous robotics. A video demonstration of the experiments is available at https://youtu.be/tDvpwSj8X28.

Conclusion - Visual Control for Autonomous Navigation in Duckietown

Here is the conclusion according to the authors of this paper:

This paper proposed a vision-based control framework for lane-following tasks in wheeled mobile robots, validated through both simulations and real-world experiments. The approach effectively maintains the robot position at the center of lanes and enables safe left and right turns by relying solely on visual feedback from onboard camera, without requiring external localization systems or pre-mapped environments.

The system’s modular design and simplicity allow for seamless integration with other robotic systems, making it versatile for diverse urban navigation scenarios. Future research will focus on enhancing the framework to handle complex scenarios, such as autonomous lane corrections, and incorporating obstacle detection and avoidance mechanisms for improved performance in dynamic, real-world environments.

These advancements will expand the applicability of the proposed method, confirming its potential as a robust solution for autonomous navigation.

Project Authors

Shima Akbari is a PhD student in the Italian National Program in Autonomous Systems at the University of Rome Tor Vergata, Italy.

Nima Akbari is a PhD student at the University of Basel, Switzerland, working on privacy technologies for the Internet of Things.

Giuseppe Oriolo is a Full Professor of Automatic Control and Robotics at Sapienza University of Rome.

Sergio Galeani is a Full Professor at the University of Rome Tor Vergata, Italy.

Learn more

Duckietown is a platform for creating and disseminating robotics and AI learning experiences.

It is modular, customizable and state-of-the-art, and designed to teach, learn, and do research. From exploring the fundamentals of computer science and automation to pushing the boundaries of knowledge, Duckietown evolves with the skills of the user.

Sim2Real Lane Segmentation via Domain Adaptation

Posted on July 31, 2025 | by Duckietown Admin

General Information

Title: Simulation to Real Domain Adaptation for Lane Segmentation
Authors: Márton Tim, Márton Szemenyei, Róbert Moni
Institution: Budapest University of Technology and Economics, Budapest, Hungary
Citation: M. Tim, M. Szemenyei and R. Moni, "Simulation to Real Domain Adaptation for Lane Segmentation," 2020 23rd International Symposium on Measurement and Control in Robotics (ISMCR), Budapest, Hungary, 2020, pp. 1-6, doi: 10.1109/ISMCR51255.2020.9263406.

Sim2Real Lane Segmentation via Domain Adaptation

This embodied AI work investigates Sim2Real transfer, the process of applying machine learning models trained in simulation to real-world environments, for semantic lane segmentation in mobile robotics using domain adaptation techniques.

The study addresses the distributional shift between synthetic (simulated) and real-world data using unsupervised and semi-supervised learning approaches that minimize the need for manual annotation by learning from unlabeled data or limited labeled samples.

A convolutional neural network (CNN) with an encoder-decoder architecture is trained on labeled synthetic data generated in the Duckietown Gym and adapted to unlabeled real-world images captured in the physical Duckietown setup.

The method integrates:

Feature-level and pixel-level adaptation, aligning internal representations and input appearance between domains to ensure consistent segmentation.
Adversarial training, where a discriminator encourages the CNN to learn domain-invariant features.
Cycle-consistent generative adversarial networks (CycleGANs), which perform image-to-image translation to make synthetic images visually similar to real ones while preserving semantic structure.
Evaluation using mean Intersection over Union (mIoU) and pixel accuracy, both standard metrics for assessing segmentation quality.
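The two evaluation metrics named above have standard definitions that are easy to state in code. The Python sketch below computes pixel accuracy and mean Intersection over Union for small label maps given as nested lists. The function names and the tiny example masks are illustrative, and skipping classes absent from both maps is one common convention rather than something stated in the paper.

```python
from typing import List, Sequence

LabelMap = Sequence[Sequence[int]]


def _flatten(label_map: LabelMap) -> List[int]:
    return [label for row in label_map for label in row]


def pixel_accuracy(pred: LabelMap, target: LabelMap) -> float:
    """Fraction of pixels whose predicted class equals the ground-truth class."""
    p, t = _flatten(pred), _flatten(target)
    correct = sum(1 for a, b in zip(p, t) if a == b)
    return correct / len(t)


def mean_iou(pred: LabelMap, target: LabelMap, num_classes: int) -> float:
    """Average of per-class intersection / union, skipping classes absent from both maps."""
    p, t = _flatten(pred), _flatten(target)
    ious = []
    for c in range(num_classes):
        intersection = sum(1 for a, b in zip(p, t) if a == c and b == c)
        union = sum(1 for a, b in zip(p, t) if a == c or b == c)
        if union > 0:
            ious.append(intersection / union)
    return sum(ious) / len(ious)


# Toy 2x3 masks: class 1 marks lane pixels, class 0 marks background.
target_mask = [[0, 0, 1],
               [0, 1, 1]]
pred_mask = [[0, 1, 1],
             [0, 1, 1]]
accuracy = pixel_accuracy(pred_mask, target_mask)   # 5 of 6 pixels match
miou = mean_iou(pred_mask, target_mask, num_classes=2)  # mean of 2/3 and 3/4
```

Pixel accuracy can look high when background dominates the image, which is why mIoU is usually reported alongside it.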

The results demonstrate that domain adaptation enables effective Sim2Real transfer for lane detection in Duckietown with minimal supervision, advancing the deployment of robust, label-efficient perception systems in embedded robotics and autonomous navigation.

Highlights - Sim2Real lane segmentation via domain adaptation

Here is a visual tour of the implementation of lane segmentation via domain adaptation by the authors. For all the details, check out the full paper.

Abstract

Here is the abstract of the work, directly in the words of the authors:

As the cost of labelling and collecting real world data remains an issue for companies, simulator training and transfer learning slowly evolved to be the foundation of many state-of-the-art projects. In this paper these methods are applied in the Duckietown setup where self-driving agents can be developed and tested.

Our aim was to train a selected artificial neural network for right lane segmentation on simulator generated stream of images as a comparison baseline, then use domain adaptation to be more precise and stable in the real environment. We have tested and compared four knowledge transfer methods that included domain transformation using CycleGAN and semi-supervised domain adaptation via Minimax Entropy.

As the latter was previously untested in semantic segmentation according to our best knowledge, we have contributed to showing it is indeed possible and produces promising results. Finally we have shown that it could also create a model that fulfills our performance requirements of stability and accuracy. We show that the selected methods are equally eligible for the simulation to real transfer learning problem, and that the simplest method delivers the best performance.

Conclusion - Sim2Real lane segmentation via domain adaptation

Here is the conclusion according to the authors of this paper:

Our goal was to create a stable and accurate right lane segmentation network by means of simulator data and domain adaptation techniques. We have tested and compared four knowledge transfer methods that included domain transformation using CycleGAN and semi-supervised domain adaptation via Minimax Entropy. We have shown that in the given scenario simulator-trained models have relatively good performance on real images, though their stability is a key weakness.

Our findings demonstrate that domain transformation using CycleGAN has limited applicability in segmentation tasks due to its distorting effect on road geometry, however the similarity between training and testing domains did result in increased stability.

Unfortunately, histogram matching failed in our case to improve on the baseline solution, producing similar results to CycleGAN.

We have observed that one of the simplest domain adaptation methods, source and target combined domain training helped to produce the best performing model according to numerical evaluation.

We implemented and demonstrated how semi-supervised domain adaptation via Minimax Entropy, a complex, entropy-based adversarial method is applicable for segmentation tasks.

In the end, all the existing results were compared and evaluated with the conclusion that source and target combined domain training produced the best results of all investigated methods tied with SSDA via Minimax Entropy. Thereby, the usability of the latter method in segmentation tasks has also been proven.

Project Authors

Márton Tim is currently working as a deep learning engineer at Continental, Hungary.

Márton Szemenyei is an Associate Professor at Budapest University of Technology and Economics, Hungary.

Róbert Moni is currently working as a Senior Machine Learning Engineer at Continental, Hungary.

Interpretable Reinforcement Learning for Visual Policies

Posted on July 2, 2025 | by Duckietown Admin

General Information

Title: Self-Supervised Discovering of Interpretable Features for Reinforcement Learning
Authors: Wenjie Shi, Gao Huang, Shiji Song, Zhuoyuan Wang, Tingyu Lin, Cheng Wu
Institution: Tsinghua University, Beijing, China
Citation: W. Shi, G. Huang, S. Song, Z. Wang, T. Lin and C. Wu, "Self-Supervised Discovering of Interpretable Features for Reinforcement Learning," in IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 44, no. 5, pp. 2712-2724, 1 May 2022, doi: 10.1109/TPAMI.2020.3037898.

Interpretable Reinforcement Learning for Visual Policies

Figure 1. Two-Stage Framework for Interpretable Reinforcement Learning

Figure 2. Visual Attention Patterns in Reinforcement Learning

Figure 3. Expert vs Mask Policy Return Comparison

Figure 4. RL Performance on Atari Using Masked Inputs

Figure 6. Saliency Map Comparison Across Methods

Figure 9. Mask Evaluation Example on Duckietown

Figure 12. Visual Comparison of PPO, SAC, and TD3 Agent Behavior

Figure 14. Masked State Visualization Across Actor Architectures

Reinforcement Learning (RL) has enabled solving complex problems, especially in relation to visual perception in robotics. An outstanding challenge is allowing humans to make sense of the decision-making process, so as to enable deployment in safety-critical applications such as autonomous driving. This work focuses on the problem of interpretable reinforcement learning in vision-based agents.

In particular, this research introduces a self-supervised framework for interpretable reinforcement learning in vision-based agents. The focus lies in enhancing policy interpretability by generating precise attention maps through Self-Supervised Attention Mechanisms (SSAM).

The method does not rely on external labels and works using data generated by a pretrained RL agent. A self-supervised interpretable network (SSINet) is deployed to identify task-relevant visual features. The approach is evaluated across multiple environments, including Atari and Duckietown.

Key components of the method include:

A two-stage training process using pretrained policies and frozen encoders
Attention masks optimized using behavior resemblance and sparsity constraints
Quantitative evaluation of attention quality using the feature overlapping rate (FOR) and background elimination rate (BER) metrics
Comparative analysis with gradient and perturbation-based saliency methods
Application across various architectures and RL algorithms including PPO, SAC, and TD3
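The interplay between behavior resemblance and sparsity in the mask objective can be illustrated with a toy example. The Python sketch below assumes a loss made of a squared difference between the policy's actions on the full and masked states, plus a weighted mean absolute mask value. The exact loss form, the weight, and the toy policy are assumptions for illustration and are not the paper's formulation.

```python
from typing import Callable, List, Sequence

Policy = Callable[[Sequence[float]], List[float]]


def apply_mask(state: Sequence[float], mask: Sequence[float]) -> List[float]:
    """Element-wise product of a flattened state and an attention mask in [0, 1]."""
    return [s * m for s, m in zip(state, mask)]


def mask_loss(policy: Policy, state: Sequence[float], mask: Sequence[float],
              sparsity_weight: float) -> float:
    """Behavior resemblance term plus a sparsity penalty (assumed form)."""
    full_action = policy(state)
    masked_action = policy(apply_mask(state, mask))
    # Resemblance: the frozen policy should act the same on the masked state.
    resemblance = sum((a - b) ** 2 for a, b in zip(full_action, masked_action))
    # Sparsity: the mask should keep as few pixels as possible.
    sparsity = sum(abs(m) for m in mask) / len(mask)
    return resemblance + sparsity_weight * sparsity


def toy_policy(state: Sequence[float]) -> List[float]:
    """Hypothetical pretrained policy that only looks at the first two pixels."""
    return [state[0] - state[1]]


state = [0.8, 0.2, 0.5, 0.9]
loss_full = mask_loss(toy_policy, state, [1, 1, 1, 1], sparsity_weight=0.1)
loss_focused = mask_loss(toy_policy, state, [1, 1, 0, 0], sparsity_weight=0.1)
loss_wrong = mask_loss(toy_policy, state, [0, 0, 1, 1], sparsity_weight=0.1)
```

The mask that keeps only the pixels the policy actually uses scores lowest: it preserves behavior like the all-ones mask but pays a smaller sparsity penalty, while the mask that hides the relevant pixels is penalized for changing the action.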

The proposed approach isolates relevant decision-making cues, offering insight into agent reasoning. In Duckietown, the framework demonstrates how visual interpretability can aid in diagnosing performance bottlenecks and agent failures, offering a scalable model for interpretable reinforcement learning in autonomous navigation systems.

Highlights - interpretable reinforcement learning for visual policies

Here is a visual tour of the implementation of interpretable reinforcement learning for visual policies by the authors. For all the details, check out the full paper.

Abstract

Here is the abstract of the work, directly in the words of the authors:

Deep reinforcement learning (RL) has recently led to many breakthroughs on a range of complex control tasks. However, the agent’s decision-making process is generally not transparent. The lack of interpretability hinders the applicability of RL in safety-critical scenarios. While several methods have attempted to interpret vision-based RL, most come without detailed explanation for the agent’s behavior. In this paper, we propose a self-supervised interpretable framework, which can discover interpretable features to enable easy understanding of RL agents even for non-experts. Specifically, a self-supervised interpretable network (SSINet) is employed to produce fine-grained attention masks for highlighting task-relevant information, which constitutes most evidence for the agent’s decisions. We verify and evaluate our method on several Atari 2600 games as well as Duckietown, which is a challenging self-driving car simulator environment. The results show that our method renders empirical evidences about how the agent makes decisions and why the agent performs well or badly, especially when transferred to novel scenes. Overall, our method provides valuable insight into the internal decision-making process of vision-based RL. In addition, our method does not use any external labelled data, and thus demonstrates the possibility to learn high-quality mask through a self-supervised manner, which may shed light on new paradigms for label-free vision learning such as self-supervised segmentation and detection.

Conclusion - interpretable reinforcement learning for visual policies

Here is the conclusion according to the authors of this paper:

In this paper, we addressed the growing demand for human-interpretable vision-based RL from a fresh perspective. To that end, we proposed a general self-supervised interpretable framework, which can discover interpretable features for easily understanding the agent’s decision-making process. Concretely, a self-supervised interpretable network (SSINet) was employed to produce high-resolution and sharp attention masks for highlighting task-relevant information, which constitutes most evidence for the agent’s decisions. Then, our method was applied to render empirical evidences about how the agent makes decisions and why the agent performs well or badly, especially when transferred to novel scenes. Overall, our work takes a significant step towards interpretable vision-based RL. Moreover, our method exhibits several appealing benefits. First, our interpretable framework is applicable to any RL model taking as input visual images. Second, our method does not use any external labelled data. Finally, we emphasize that our method demonstrates the possibility to learn high-quality mask through a self-supervised manner, which provides an exciting avenue for applying RL to self automatically labelling and label-free vision learning such as self-supervised segmentation and detection.

Project Authors

Wenjie Shi received the BS degree from the School of Hydropower and Information Engineering, Huazhong University of Science and Technology, Wuhan, China, in 2016. He is currently working toward the Ph.D. degree in control science and engineering from the Department of Automation, Institute of Industrial Intelligence and Systems, Tsinghua University, Beijing, China.

Gao Huang (Member, IEEE) received the B.S. degree in automation from Beihang University, Beijing, China, in 2009, and the Ph.D. degree in automation from Tsinghua University, Beijing, in 2015. He is currently an Associate Professor with the Department of Automation, Tsinghua University.

Shiji Song (Senior Member, IEEE) received the Ph.D. degree in mathematics from the Department of Mathematics, Harbin Institute of Technology, Harbin, China, in 1996. He is currently a Professor at the Department of Automation, Tsinghua University, Beijing, China.

Zhuoyuan Wang (IEEE) is currently a Ph.D. student at Carnegie Mellon University, and holds a B.S. degree in control science and engineering from the Department of Automation, Tsinghua University, Beijing, China.

Tingyu Lin received the B.S. degree and the Ph.D. degree in control system from the School of Automation Science and Electrical Engineering at Beihang University in 2007 and 2014, respectively. He is now a Member of China Simulation Federation (CSF).

Cheng Wu received the M.Sc. degree in electrical engineering from Tsinghua University, Beijing, China, in 1966. He is currently a Professor with the Department of Automation, Tsinghua University.

Visual Feedback for Autonomous Lane Tracking in Duckietown

Posted on June 20, 2025 | by Duckietown Admin

General Information

Title: Lane Following with a Duckiebot Vehicle using Visual Feedback
Authors: Oscar Castro, Axel Céspedes, Roosevelt Ubaldo, Oscar E. Ramos
Institution: Universidad de Ingeniería y Tecnología (UTEC), Lima, Peru
Citation: O. Castro, A. Céspedes, R. Ubaldo and O. E. Ramos, "Lane Following with a Duckiebot Vehicle using Visual Feedback," 2019 IEEE Sciences and Humanities International Research Conference (SHIRCON), Lima, Peru, 2019, pp. 1-4, doi: 10.1109/SHIRCON48091.2019.9024875.

Visual Feedback for Autonomous Lane Tracking in Duckietown

How can vehicle autonomy be achieved by relying only on visual feedback from the onboard camera?

This work presents an implementation of lane following for the Duckiebot (DB17) using the onboard camera as the only sensor. The approach relies on real-time lane detection and pose estimation, eliminating the need for wheel encoders.

The onboard computation is provided by a Raspberry Pi, which performs low-level motor control, while high-level image processing and decision-making are offloaded to an external ROS-enabled computer.

The key technical aspects of the implemented autonomy pipeline include:

Camera calibration to correct fisheye lens distortion;
HSV-based image segmentation for lane line detection;
Aerial perspective transformation for geometric consistency;
Histogram-based color separation of continuous and dashed lines;
Piecewise polynomial fitting for path curvature estimation;
Closed-loop motion control based on computed linear and angular velocities.
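Two of the steps above, HSV-based segmentation and histogram analysis, can be sketched with the Python standard library alone. The example below classifies pixels as white, yellow, or other using HSV thresholds and then counts matching pixels per image column to locate each line. The threshold values, function names, and the column-wise histogram are assumptions for illustration; how the authors separate continuous from dashed lines is not detailed here.

```python
import colorsys
from typing import List, Sequence, Tuple

RGB = Tuple[int, int, int]


def classify_pixel(rgb: RGB) -> str:
    """Label a pixel as 'white', 'yellow', or 'other' using assumed HSV thresholds."""
    r, g, b = (channel / 255.0 for channel in rgb)
    h, s, v = colorsys.rgb_to_hsv(r, g, b)
    if s < 0.2 and v > 0.8:
        return "white"
    if 40 / 360 <= h <= 70 / 360 and s > 0.4 and v > 0.4:
        return "yellow"
    return "other"


def column_histogram(image: Sequence[Sequence[RGB]], label: str) -> List[int]:
    """Count, for every image column, the pixels classified with the given label."""
    counts = [0] * len(image[0])
    for row in image:
        for x, pixel in enumerate(row):
            if classify_pixel(pixel) == label:
                counts[x] += 1
    return counts


def peak_column(histogram: Sequence[int]) -> int:
    """Column index with the most matching pixels (first one on ties)."""
    return max(range(len(histogram)), key=lambda x: histogram[x])


# Tiny synthetic bird's-eye image: yellow line in column 1, white line in column 4.
ASPHALT, YELLOW, WHITE = (40, 40, 40), (255, 220, 0), (250, 250, 250)
image = [[ASPHALT, YELLOW, ASPHALT, ASPHALT, WHITE, ASPHALT] for _ in range(4)]
yellow_hist = column_histogram(image, "yellow")
white_hist = column_histogram(image, "white")
yellow_x = peak_column(yellow_hist)
white_x = peak_column(white_hist)
```

In a pipeline like the one described, the peak columns would seed the points passed to the polynomial fit, and a dashed line would show lower column counts than a continuous one of the same length.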

The methodology demonstrates the feasibility of using camera-based perception to control robot motion in structured environments. By using Duckiebot and Duckietown as the development platform, this work is another example of how to bridge the gap between real-world testing and cost-effective prototyping, making vehicle autonomy research more accessible in educational and research contexts.

Highlights - visual feedback for lane tracking in Duckietown

Here is a visual tour of the implementation of vehicle autonomy by the authors. For all the details, check out the full paper.

Real-Time Lane Overlay

ROS Node Communication

System Workflow Overview

Image Distortion Correction

Lane Histogram Segmentation

Perspective Warping Process

HSV and Edge Detection

Polynomial Adjustment Techniques

Duckiebot used at UTEC

Abstract

Here is the abstract of the work, directly in the words of the authors:

The autonomy of a vehicle can be achieved by a proper use of the information acquired with the sensors. Real-sized autonomous vehicles are expensive to acquire and to test on; however, the main algorithms that are used in those cases are similar to the ones that can be used for smaller prototypes. Due to these budget constraints, this work uses the Duckiebot as a testbed to try different algorithms as a first step to achieve full autonomy. This paper presents a methodology to properly use visual feedback, with the information of the robot camera, in order to detect the lane of a circuit and to drive the robot accordingly.

Conclusion - visual feedback for lane tracking in Duckietown

Here is the conclusion according to the authors of this paper:

Autonomous cars are currently a vast research area. Due to this increase in the interest of these vehicles, having a cost-effective way to implement algorithms, new applications, and to test them in a controlled environment will further help to develop this technology. In this sense, this paper has presented a methodology for following a lane using a cost-effective robot, called the Duckiebot, using visual feedback as a guide for the motion. Although the whole system was capable of detecting the lane that needs to be followed, it is still sensitive to illumination conditions. Therefore, in places with a lot of lighting and brightness variations, the lane recognition algorithm can affect the autonomy of the vehicle.
As future work, machine learning, and particularly convolutional neural networks, is devised as a means to develop robust lane detectors that are not sensitive to brightness variation. Moreover, more than one Duckiebot is intended to drive simultaneously in the Duckietown.

Project Authors

Oscar Castro is currently working at Blume, Peru.

Axel Eliam Céspedes Duran is currently working as a Laboratory Professor of the Industrial Instrumentation course at the UTEC – Universidad de Ingeniería y Tecnología, Peru.

Roosevelt Jhans Ubaldo Chavez is currently working as a Laboratory Professor of the Industrial Instrumentation course at the UTEC – Universidad de Ingeniería y Tecnología, Peru.

Oscar E. Ramos is currently working toward the Ph.D. degree in robotics with the Laboratory for Analysis and Architecture of Systems, Centre National de la Recherche Scientifique, University of Toulouse, Toulouse, France.

Reproducible Sim-to-Real Traffic Signal Control Environment

Posted on June 12, 2025 | by Duckietown Admin

General Information

Title: Reproducible and Low-cost Sim-to-Real Environment for Traffic Signal Control
Authors: Yiran Zhang, Khoa Vo, Longchao Da, Tiejin Chen, Xiaoou Liu, Hua Wei
Institution: Arizona State University, USA
Citation: Zhang, Y., Vo, K., Da, L., Chen, T., Liu, X. and Wei, H., 2025, May. Reproducible and Low-cost Sim-to-Real Environment for Traffic Signal Control. In Proceedings of the ACM/IEEE 16th International Conference on Cyber-Physical Systems (with CPS-IoT Week 2025) (pp. 1-2).

Reproducible Sim-to-Real Traffic Signal Control Environment

As urban environments become increasingly populated and automobile traffic soars, with US citizens spending on average 54 hours a year stuck on the roads, active traffic control management promises to mitigate traffic jams while maintaining (or improving) safety.

LibSignal++ is a Duckietown-based testbed for reproducible and low-cost sim-to-real evaluation of traffic signal control (TSC) algorithms. Using Duckietown enables consistent, small-scale deployment of both rule-based and learning-based TSC models.

LibSignal++ integrates visual control through camera-based sensing and object detection via the YOLO-v5 model. It features modular components, including Duckiebots, signal controllers, and an indoor positioning system for accurate vehicle trajectory tracking. The testbed supports dynamic scenario replication by enabling both manual and automated manipulation of sensor inputs and road layouts.

Key aspects of the research include:

Sim-to-real pipeline for Reinforcement Learning (RL)-based traffic signal control training and deployment
Multi-simulator training support with SUMO, CityFlow, and CARLA
Reproducibility through standardized and controllable physical components
Integration of real-world sensors and visual control systems
Comparative evaluation using rule-based policies on 3-way and 4-way intersections
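As a reference point for what a rule-based policy looks like, the Python sketch below implements a fixed-time cycle, one of the simplest rule-based signal controllers. Whether this policy is among those the authors evaluated is not stated here; the function name, phase numbering, and durations are hypothetical.

```python
from typing import Sequence


def fixed_time_phase(t: float, green_durations: Sequence[float]) -> int:
    """Return the index of the phase that is green at time t (seconds).

    Phases are served in order and the cycle repeats indefinitely.
    """
    cycle = sum(green_durations)
    if cycle <= 0:
        raise ValueError("total cycle length must be positive")
    offset = t % cycle
    for phase, duration in enumerate(green_durations):
        if offset < duration:
            return phase
        offset -= duration
    return len(green_durations) - 1  # guard against floating-point edge cases


# Hypothetical 3-way intersection: three approaches with 10 s, 10 s, and 5 s of green.
durations = [10.0, 10.0, 5.0]
schedule = [fixed_time_phase(t, durations) for t in (0.0, 9.9, 10.0, 22.0, 25.0, 37.0)]
```

A learning-based controller would replace this lookup with a policy that maps sensor observations to the next phase, which is the comparison a testbed like this is meant to support.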

The work concludes with plans to extend to Machine Learning (ML)-based TSC models and further sim-to-real adaptation.

Highlights - Reproducible Sim-to-Real Traffic Signal Control Environment

Here is a visual tour of the sim-to-real work of the authors. For all the details, check out the full paper.

Abstract

Here is the abstract of the work, directly in the words of the authors:

This paper presents a unique sim-to-real assessment environment for traffic signal control (TSC), LibSignal++, featuring a 14-ft by 14-ft scaled-down physical replica of a real-world urban roadway equipped with realistic traffic sensors such as cameras, and actual traffic signal controllers. Besides, it is supported by a precise indoor positioning system to track the actual trajectories of vehicles. To generate various plausible physical conditions that are difficult to replicate with computer simulations, this system supports automatic sensor manipulation to mimic observation changes and also supports manual adjustment of physical traffic network settings to reflect the influence of dynamic changes on vehicle behaviors. This system will enable the assessment of traffic policies that are otherwise extremely difficult to simulate or infeasible for full-scale physical tests, providing a reproducible and low-cost environment for sim-to-real transfer research on traffic signal control problems.

Results

Three traffic control policies were tested over several repeated experiments, each time measuring traffic throughput, average vehicle waiting time, and vehicle battery consumption. Standard deviations for all policies were found to be within acceptable ranges, leading the authors to confirm that the testbed delivers reproducible results in controlled environments.
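A reproducibility check of this kind boils down to summarizing each metric across repetitions. The Python sketch below computes the mean, sample standard deviation, and coefficient of variation for one metric. The throughput values are made-up placeholders for illustration only and are not results from the paper.

```python
from statistics import mean, stdev
from typing import Dict, Sequence


def summarize_repetitions(values: Sequence[float]) -> Dict[str, float]:
    """Mean, sample standard deviation, and coefficient of variation of one metric."""
    m = mean(values)
    s = stdev(values)
    return {"mean": m, "std": s, "cv": s / m}


# Placeholder throughput values (vehicles per run) for five hypothetical repetitions.
throughput_runs = [20, 22, 21, 19, 23]
summary = summarize_repetitions(throughput_runs)
```

A small coefficient of variation across repetitions of the same policy is what would support a claim of reproducibility; what counts as acceptable depends on the metric and is a judgment left to the experimenter.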

Project Authors

Yiran Zhang is affiliated with Arizona State University, USA.

Khoa Vo is affiliated with Arizona State University, USA.

Longchao Da is pursuing his Ph.D. at Arizona State University, USA.

Tiejin Chen is pursuing his Ph.D. at Arizona State University, USA.

Xiaoou Liu is pursuing her Ph.D. at Arizona State University, USA.

Hua Wei is an Assistant Professor at the School of Computing and Augmented Intelligence, Arizona State University, USA.

Learn more

Duckietown is a platform for creating and disseminating robotics and AI learning experiences.

Adapting World Models with Latent-State Dynamics Residuals

Posted on May 24, 2025 | by Akshet Patel

General Information

Title: Adapting World Models with Latent-State Dynamics Residuals
Authors: JB Lanier, Kyungmin Kim, Armin Karamzade, Yifei Liu, Ankita Sinha, Kat He, Davide Corsi, Roy Fox
Institution: University of California Irvine, USA
Citation: Lanier, J.B., Kim, K., Karamzade, A., Liu, Y., Sinha, A., He, K., Corsi, D. and Fox, R., 2025. Adapting World Models with Latent-State Dynamics Residuals. arXiv preprint arXiv:2504.02252.

Adapting World Models with Latent-State Dynamics Residuals

Training agents for robotics applications requires a substantial amount of data, which is typically costly to collect in the real world. Running simulations is therefore a logical approach to training agents. But to what degree do simulations provide information that correctly predicts behavior in the real world? In other words, how well do “things” learned in simulation transfer to reality? Sim2Real transfer is an exciting topic and an active area of research.

Simulation-based reinforcement learning often encounters transfer failures due to discrepancies between simulated and real-world dynamics.

This work introduces a method for model adaptation using Latent-State Dynamics Residuals, which correct transition functions in a learned latent space. A latent-variable world model, DRAW, is trained in simulation using variational inference to encode high-dimensional observations into compact multi-categorical latent variables.

The forward dynamics are modeled via autoregressive prediction of latent transitions. A residual learning function is trained on a small, offline real-world dataset without reward supervision to adjust the simulated dynamics. The resulting model, ReDRAW, modifies the forward dynamics logits using residual corrections and enables policy training via actor-critic reinforcement learning on imagined rollouts.
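
As a rough illustration of the idea of correcting forward dynamics logits with a residual, the sketch below adds a residual term to simulator logits for each categorical latent variable and normalizes the result with a softmax. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names, the list-of-rows layout, and the example numbers are all hypothetical, and in the actual method the residual would be produced by a learned function rather than fixed by hand.

```python
import math


def softmax(logits):
    """Numerically stable softmax over one row of class logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def corrected_latent_distribution(sim_logits, residual_logits):
    """Add a residual correction to simulator dynamics logits.

    Both arguments are lists of rows, one row of class logits per
    categorical latent variable (hypothetical layout for illustration).
    """
    return [
        softmax([s + r for s, r in zip(sim_row, res_row)])
        for sim_row, res_row in zip(sim_logits, residual_logits)
    ]


# Two categorical latent variables with three classes each (made-up values).
sim_logits = [[2.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
zero_residual = [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
shift_residual = [[-3.0, 0.0, 3.0], [0.0, 0.0, 0.0]]

unchanged = corrected_latent_distribution(sim_logits, zero_residual)
adapted = corrected_latent_distribution(sim_logits, shift_residual)
```

With a zero residual the simulator prediction is left untouched, while a nonzero residual shifts probability mass only for the latent variables it targets.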

The reward model is reused from the simulation without retraining. To generate diverse training data, the method uses Plan2Explore, which promotes exploration by maximizing model uncertainty. Visual encoders trained in simulation are reused for real-world inputs through zero-shot perception transfer, without fine-tuning.
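
Exploration driven by model uncertainty is commonly estimated as the disagreement among an ensemble of predictors. The following is a minimal sketch of that idea, assuming each ensemble member outputs a prediction vector for the same state and action; the function name and the sample values are hypothetical and are not taken from the paper.

```python
def ensemble_disagreement(predictions):
    """Mean per-dimension variance across ensemble members.

    predictions: list of prediction vectors, one per ensemble member.
    A higher value means the members disagree more, which can serve as
    an intrinsic reward that steers the agent toward unfamiliar states.
    """
    n = len(predictions)
    dims = len(predictions[0])
    total = 0.0
    for d in range(dims):
        values = [p[d] for p in predictions]
        mean = sum(values) / n
        total += sum((v - mean) ** 2 for v in values) / n
    return total / dims


# Three ensemble members, two-dimensional predictions (made-up values).
familiar = [[1.0, 2.0], [1.0, 2.0], [1.0, 2.0]]
novel = [[0.0, 2.0], [1.0, 3.0], [2.0, 4.0]]

familiar_bonus = ensemble_disagreement(familiar)
novel_bonus = ensemble_disagreement(novel)
```

States on which the members agree receive no bonus, so the collected data tends to cover regions the model has not yet learned.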

The approach avoids explicit observation-space correction and operates entirely in the latent space, achieving efficient sim-to-real policy deployment.

Highlights - adapting world models with latent-state dynamics residuals

Here is a visual tour of the sim-to-real work of the authors. For all the details, check out the full paper.

Abstract

Here is the abstract of the work, directly in the words of the authors:

Simulation-to-reality (sim-to-real) reinforcement learning (RL) faces the critical challenge of reconciling discrepancies between simulated and real-world dynamics, which can severely degrade agent performance. A promising approach involves learning corrections to simulator forward dynamics represented as a residual error function, however this operation is impractical with high-dimensional states such as images. To overcome this, we propose ReDRAW, a latent-state autoregressive world model pretrained in simulation and calibrated to target environments through residual corrections of latent-state dynamics rather than of explicit observed states. Using this adapted world model, ReDRAW enables RL agents to be optimized with imagined rollouts under corrected dynamics and then deployed in the real world. In multiple vision-based MuJoCo domains and a physical robot visual lane-following task, ReDRAW effectively models changes to dynamics and avoids overfitting in low data regimes where traditional transfer methods fail.

Limitations and Future Work - adapting world models with latent-state dynamics residuals

Here are the limitations and future work according to the authors of this paper:

A potential limitation with ReDRAW is that it excels at maintaining high target-environment performance over many updates because the residual avoids overfitting due to its low complexity. This suggests that only conceptually simple changes to dynamics may effectively be modeled with low amounts of data, warranting future investigation. We additionally want to explore if residual adaptation methods can be meaningfully applied to foundation world models, efficiently converting them from generators of plausible dynamics to generators of specific dynamics.

Did this work spark your curiosity?

Check out the following works on machine learning with Duckietown:

Project Authors

JB (John Banister) Lanier is a Computer Science PhD Student at UC Irvine, USA.

Kyungmin Kim is a Computer Science PhD Student at UC Irvine, USA.

Armin Karamzade is a Computer Science PhD Student at UC Irvine, USA.

Yifei Liu is currently an M.S. student in Robotics at Carnegie Mellon University, USA.

Ankita Sinha is currently working as a senior LLM engineer at NVIDIA, USA.

Kat He was affiliated with UC Irvine, USA during this research.

Davide Corsi is a Postdoctoral Researcher at UC Irvine, USA.

Roy Fox is an Assistant Professor and director of the Intelligent Dynamics Lab (indylab) in the Department of Computer Science in the Donald Bren School of Information & Computer Science at the University of California, Irvine.

Learn more

Duckietown is a platform for creating and disseminating robotics and AI learning experiences.

Transformer Visual Control for Dynamic Obstacle Avoidance

Posted on May 10, 2025 | by Duckietown Admin

General Information

Title: An Adaptive Obstacle Avoidance Model for Autonomous Robots Based on Dual-Coupling Grouped Aggregation and Transformer Optimization
Authors: Yuhu Tang, Ying Bai, and Qiang Chen
Institution: Hefei University, China
Citation: Tang, Y., Bai, Y., & Chen, Q. (2025). An Adaptive Obstacle Avoidance Model for Autonomous Robots Based on Dual-Coupling Grouped Aggregation and Transformer Optimization. Sensors, 25(6), 1839. https://doi.org/10.3390/s25061839

Transformer Visual Control for Dynamic Obstacle Avoidance

This work details a transformer visual control approach for autonomous robotic obstacle avoidance in dynamic environments. It introduces the GAS-H-Trans model, which integrates a dual-coupling grouped aggregation strategy with transformer-based attention mechanisms.

Key components of the approach include grouped spatial feature aggregation, Harris hawk optimization (HHO) for parameter tuning, and semantic segmentation for real-time visual perception. The output of the segmentation is used to compute potential fields for navigation. An artificial potential field (APF) method, further optimized using particle swarm optimization (PSO), enhances obstacle avoidance. The system was evaluated in Unity3D virtual environments and on datasets including KITTI and ImageNet.
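
To make the role of the attractive and repulsive gains concrete, here is a minimal sketch of the textbook artificial potential field force in 2D. It is an illustration under stated assumptions, not the paper's implementation: the function name, the parameters `k_att`, `k_rep`, and `influence`, and the example coordinates are hypothetical, and the PSO search over the gains is not shown.

```python
import math


def apf_force(position, goal, obstacles, k_att=1.0, k_rep=1.0, influence=1.0):
    """Return the 2D force from a classic attractive plus repulsive potential.

    k_att and k_rep are the attractive and repulsive gains; influence is
    the distance beyond which an obstacle exerts no force.
    """
    # Attractive term: pulls linearly toward the goal.
    fx = k_att * (goal[0] - position[0])
    fy = k_att * (goal[1] - position[1])
    for ox, oy in obstacles:
        dx, dy = position[0] - ox, position[1] - oy
        dist = math.hypot(dx, dy)
        if 0.0 < dist < influence:
            # Negative gradient of 0.5 * k_rep * (1/dist - 1/influence)^2,
            # which points away from the obstacle.
            magnitude = k_rep * (1.0 / dist - 1.0 / influence) / dist ** 2
            fx += magnitude * dx / dist
            fy += magnitude * dy / dist
    return fx, fy


free = apf_force((0.0, 0.0), (4.0, 0.0), [])
blocked = apf_force((0.0, 0.0), (4.0, 0.0), [(0.5, 0.0)])
retuned = apf_force((0.0, 0.0), (4.0, 0.0), [(0.5, 0.0)], k_rep=0.5)
```

In this made-up configuration the default gains cancel exactly, so the robot would stall in front of the obstacle, while a smaller repulsive gain restores a net pull toward the goal. Sensitivity of this kind is what makes tuning the gains worthwhile.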

The model architecture improves local and global feature extraction, enabling adaptive navigation. Simulation results demonstrate that GAS-H-Trans outperforms baseline models in segmentation accuracy and avoidance reliability. The implementation uses Transformer structures, self-attention, and heuristic optimization for enhanced environmental understanding.

Experiments using Duckietown-based simulations confirm that the proposed Transformer Visual Control strategy with GAS-H-Trans significantly improves obstacle avoidance reliability with respect to typical approaches.

Highlights - Transformer Visual Control for Dynamic Obstacle Avoidance

Here is a visual tour of this work. For all the details, check out the full paper.

GAS-H-Trans Transformer Visual Control system architecture for Duckietown robotic navigation — Figure 1. GAS-H-Trans Framework Architecture

Transformer Visual Control model architecture using GAS-H-Trans and grouped aggregation in Duckietown simulation — Figure 2. Detailed Architecture of GAS-H-Trans Model

PSO-optimized artificial potential field for Transformer Visual Control in Duckietown simulation — Figure 3. Artificial Potential Field Optimization Flow

Transformer Visual Control model accuracy chart in Duckietown obstacle avoidance task — Figure 4. Model Accuracy Comparison Across Epochs

Swin Transformer comparison with GAS-H-Trans in Transformer Visual Control on Duckietown data — Figure 6. Swin Transformer vs GAS-H-Trans Performance

Duckietown virtual environment dataset creation for Transformer Visual Control experiments — Figure 7. Dataset Creation Process

GAS-H-Trans Transformer Visual Control segmentation accuracy in Duckietown virtual dataset — Figure 8. Image Segmentation Accuracy in Duckietown Scene

Segmentation output of Transformer Visual Control with JSON mapping in Duckietown task — Figure 9. Segmentation Mask and Data Structure

Transformer Visual Control path planning with GAS-H-Trans and PSO in Duckietown — Figure 10. Obstacle Avoidance in Duckietown Environment

Comparison of traditional and PSO-optimized APF for Transformer Visual Control in Duckietown — Figure 11. Effect of PSO Optimization on APF

Enhanced Transformer Visual Control robot navigation path in Duckietown simulation — Figure 12. Optimized Obstacle Avoidance Path

Transformer Visual Control success rate with traditional vs PSO-optimized APF in Duckietown — Figure 13. Obstacle Avoidance Success Rate

Figure 5. Segmentation Output Comparison

Abstract

In the authors’ words:

Accurate obstacle recognition and avoidance are critical for ensuring the safety and operational efficiency of autonomous robots in dynamic and complex environments. Despite significant advances in deep-learning techniques in these areas, their adaptability in dynamic and complex environments remains a challenge. To address these challenges, we propose an improved Transformer-based architecture, GAS-H-Trans.

This approach uses a grouped aggregation strategy to improve the robot’s semantic understanding of the environment and enhance the accuracy of its obstacle avoidance strategy. This method employs a Transformer-based dual-coupling grouped aggregation strategy to optimize feature extraction and improve global feature representation, allowing the model to capture both local and long-range dependencies.

The Harris hawk optimization (HHO) algorithm is used for hyperparameter tuning, further improving model performance. A key innovation of applying the GAS-H-Trans model to obstacle avoidance tasks is the implementation of a secondary precise image segmentation strategy. By placing observation points near critical obstacles, this strategy refines obstacle recognition, thus improving segmentation accuracy and flexibility in dynamic motion planning. The particle swarm optimization (PSO) algorithm is incorporated to optimize the attractive and repulsive gain coefficients of the artificial potential field (APF) methods.

This approach mitigates local minima issues and enhances the global stability of obstacle avoidance. Comprehensive experiments are conducted using multiple publicly available datasets and the Unity3D virtual robot environment. The results show that GAS-H-Trans significantly outperforms existing baseline models in image segmentation tasks, achieving the highest mIoU (85.2%). In virtual environment obstacle avoidance tasks, the GAS-H-Trans + PSO-optimized APF framework achieves an impressive obstacle avoidance success rate of 93.6%. These results demonstrate that the proposed approach provides superior performance in dynamic motion planning, offering a promising solution for real-world autonomous navigation applications.

Conclusion - Transformer Visual Control for Dynamic Obstacle Avoidance

Here is the authors’ summary and overview of lessons learned from this work:

In this study, we proposed the GAS-H-Trans framework for image segmentation and dynamic obstacle avoidance in autonomous robots. The key contributions are summarized as follows. (1) Dual-coupling grouped aggregation strategy: A Transformer-based dual-coupling grouped aggregation method optimizes feature extraction and enhances global feature representation, thereby improving the model’s perception performance in dynamic motion planning. (2) Harris hawk optimization (HHO): The integration of the HHO algorithm into the GAS-Trans framework optimizes the number of Transformer layers and iterations, improving model accuracy and reducing computational costs. (3) PSO-optimized artificial potential field (APF): We integrated the PSO algorithm with APF to optimize the attractive and repulsive gain coefficients, addressing local minima issues and enhancing the global stability of the obstacle avoidance system.

This study also proposes a secondary precise image segmentation strategy. By setting the observation points near critical obstacles for fine-tuned segmentation, the flexibility and accuracy of the segmentation model’s environmental perception are effectively enhanced, thereby improving the robot’s obstacle avoidance capabilities.

Through the integration of PSO-optimized APF with image segmentation, the GAS-H-Trans + PSO-optimized APF framework demonstrated significant improvements in obstacle avoidance. In the experimental validation of this study, the obstacles remained static throughout the navigation process. Using this method, the autonomous robot dynamically adjusted its obstacle avoidance trajectory based on segmented environmental features. This integration significantly enhanced environmental perception capabilities and the accuracy of obstacle avoidance decisions, enabling more efficient navigation in static obstacle environments.

Extensive experiments on publicly available datasets (Duckiebot, KITTI, ImageNet) and in the Unity3D virtual robot environment validate the effectiveness of the proposed framework. The GAS-H-Trans framework outperformed traditional models in image segmentation tasks, achieving the highest mIoU of 85.2%. Furthermore, in virtual obstacle avoidance experiments, the GAS-H-Trans + PSO-optimized APF framework achieved an obstacle avoidance success rate of 93.6%.

These results effectively validate the proposed strategy, which combines secondary image segmentation from GAS-H-Trans with the PSO-optimized APF method, significantly improving obstacle avoidance performance in dynamic motion planning. Additionally, the GAS-H-Trans framework has the potential to be extended to fully dynamic environments by incorporating real-time object tracking and adaptive obstacle modeling. However, some limitations exist. The majority of the experiments were conducted in simulated environments, and future research will focus on validating the framework in real-world scenarios and improving real-time performance.

Additionally, the integration of multi-modal sensor data (such as LiDAR and ultrasonic sensors) will be an important direction for future work to further enhance environmental perception and robustness.

In conclusion, the new framework offers an innovative solution for autonomous robot obstacle avoidance in dynamic motion planning. Its powerful environmental perception and obstacle avoidance performance demonstrate significant potential for practical applications. With further optimization and real-world validation, this framework will play a crucial role in the future development of autonomous navigation and robotics technology.

Did this work spark your curiosity?

Check out the following works on machine learning with Duckietown:

Project Authors

Yuhu Tang is affiliated with the School of Artificial Intelligence and Big Data, Hefei University, Hefei 230601, China.

Ying Bai is affiliated with the School of Artificial Intelligence and Big Data, Hefei University, Hefei 230601, China.

Qiang Chen is affiliated with the School of Electrical Engineering and Automation and the National and Local Joint Engineering Laboratory for Renewable Energy Access to Grid Technology, Hefei University of Technology, Hefei, China, and with Hefei University, Hefei 230601, China.

Learn more

Duckietown is a platform for creating and disseminating robotics and AI learning experiences.

VAE-Based Out-of-Distribution Detectors for Embedded Deployment

VAE-Based Out-of-Distribution Detectors for Embedded Systems

Posted on April 11, 2025 | by Duckietown Admin

General Information

Title: Compressing VAE-Based Out-of-Distribution Detectors for Embedded Deployment
Authors: Aditya Bansal, Michael Yuhas, Arvind Easwaran
Institution: Nanyang Technological University, Singapore
Citation: Bansal, A., Yuhas, M., & Easwaran, A. (2024). Compressing VAE-Based Out-of-Distribution Detectors for Embedded Deployment. In Proceedings - 2024 IEEE 30th International Conference on Embedded and Real-Time Computing Systems and Applications, RTCSA 2024 (pp. 37-42).

VAE-Based Out-of-Distribution Detectors for Embedded Systems

Out-of-distribution (OOD) detection is essential for maintaining safety in machine learning systems, especially those operating in the real world. It helps identify inputs that differ significantly from the training data, which could lead to unexpected or unsafe behavior.

Variational Autoencoders (VAEs) are neural networks that compress input data into a smaller latent space (a compact set of features) and reconstruct the input from this compressed version.

In OOD detection, if the reconstruction fails or doesn’t fit the expected latent space, the input is flagged as unfamiliar, i.e., out-of-distribution. While VAEs are effective, they are computationally expensive, making them hard to deploy on small, embedded devices like Duckiebots.

To solve this challenge, building upon previous work (Embedded Out-of-Distribution Detection on an Autonomous Robot Platform), the researchers applied three model compression techniques:

Pruning: Removes low-importance weights or neurons to shrink and speed up the model.
Knowledge distillation: Trains a smaller “student” model to mimic a larger “teacher” model.
Quantization: Lowers numerical precision (e.g., from 32-bit to 8-bit) to save memory and improve speed.
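
Two of the techniques listed above can be sketched on a plain list of weights: magnitude pruning and symmetric 8-bit quantization. This is a minimal illustration under stated assumptions and not the authors' pipeline; the function names and weight values are hypothetical, real deployments would rely on a deep learning framework's own tooling, and knowledge distillation is left out because it requires a training loop.

```python
def magnitude_prune(weights, sparsity):
    """Zero out the smallest-magnitude fraction of weights."""
    k = int(len(weights) * sparsity)
    if k == 0:
        return list(weights)
    threshold = sorted(abs(w) for w in weights)[k - 1]
    pruned, removed = [], 0
    for w in weights:
        if abs(w) <= threshold and removed < k:
            pruned.append(0.0)
            removed += 1
        else:
            pruned.append(w)
    return pruned


def quantize_int8(weights):
    """Symmetric linear quantization to integers in [-127, 127]."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    return [int(round(w / scale)) for w in weights], scale


def dequantize(quantized, scale):
    """Map quantized integers back to approximate float weights."""
    return [q * scale for q in quantized]


weights = [0.8, -0.05, 0.3, 0.01, -1.27, 0.6]  # made-up layer weights
pruned = magnitude_prune(weights, sparsity=0.5)
quantized, scale = quantize_int8(pruned)
restored = dequantize(quantized, scale)
```

Each quantized weight is stored in one byte instead of four, at the cost of a rounding error of at most half the scale per weight.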

Two VAE-based OOD detectors were evaluated:

β-VAE: A variant of VAE that learns more interpretable latent features (controlled by a parameter called β).
Optical Flow Detector: Analyzes how pixels move across video frames to detect unusual motion.

Both models were trained and tested on data collected in Duckietown. They were evaluated on three metrics: Area Under the Receiver Operating Characteristic Curve (AUROC), which shows how well a model separates known from unknown inputs; memory footprint; and execution latency. The compressed models achieved faster inference times, smaller memory usage, and only minor drops in detection accuracy.
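
AUROC can be read as the probability that a randomly chosen OOD sample receives a higher anomaly score than a randomly chosen in-distribution sample. The sketch below computes it directly from that definition by comparing all pairs; the function name and the scores are hypothetical, and the pairwise loop is chosen for clarity rather than speed.

```python
def auroc(in_scores, ood_scores):
    """AUROC from pairwise comparisons of anomaly scores.

    Counts how often an OOD score exceeds an in-distribution score,
    with ties counted as half a win.
    """
    wins = 0.0
    for o in ood_scores:
        for i in in_scores:
            if o > i:
                wins += 1.0
            elif o == i:
                wins += 0.5
    return wins / (len(in_scores) * len(ood_scores))


# Made-up anomaly scores: higher means "more unfamiliar".
in_distribution = [0.1, 0.2, 0.35, 0.4]
out_of_distribution = [0.3, 0.5, 0.9]

score = auroc(in_distribution, out_of_distribution)
perfect = auroc([0.1, 0.2], [0.8, 0.9])
```

A value of 1.0 means the two groups are perfectly separated, and 0.5 corresponds to chance.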

Highlights - VAE-Based Out-of-Distribution Detectors for Embedded Systems

Here is a visual tour of the work of the authors. For all the details, check out the full paper.

Visual control architecture in Duckietown showing VAE OOD compression methodology — Figure 1. Design Methodology for VAE Compression

Visual control models in Duckietown showing β-VAE and Optical Flow OOD detectors — Figure 2. OOD Detector Block Diagrams

Visual control metrics in Duckietown with β-VAE under varying sparsity — Figure 4. β-VAE AUROC vs. Sparsity

Visual control performance in Duckietown using various compression methods — Figure 5. Compression Impact on β-VAE

Visual control optimization in Duckietown using compressed optical flow models — Figure 6. Knowledge Distillation Effects on Optical Flow Model

Figure 3. Reconstructed Images from β-VAE

Figure 7. Sparsity Impact on Optical Flow Models

Abstract

In the authors’ words:

Out-of-distribution (OOD) detectors can act as safety monitors in embedded cyber-physical systems by identifying samples outside a machine learning model’s training distribution to prevent potentially unsafe actions. However, OOD detectors are often implemented using deep neural networks, which makes it difficult to meet real-time deadlines on embedded systems with memory and power constraints. We consider the class of variational autoencoder (VAE) based OOD detectors where OOD detection is performed in latent space, and apply quantization, pruning, and knowledge distillation.

These techniques have been explored for other deep models, but no work has considered their combined effect on latent space OOD detection. While these techniques increase the VAE’s test loss, this does not correspond to a proportional decrease in OOD detection performance and we leverage this to develop lean OOD detectors capable of real-time inference on embedded CPUs and GPUs. We propose a design methodology that combines all three compression techniques and yields a significant decrease in memory and execution time while maintaining AUROC for a given OOD detector.

We demonstrate this methodology with two existing OOD detectors on a Jetson Nano and reduce GPU and CPU inference time by 20% and 28% respectively while keeping AUROC within 5% of the baseline.

Conclusion - VAE-Based Out-of-Distribution Detectors for Embedded Systems

Here are the conclusions from the authors of this paper:

We explored different neural network compression techniques on β-VAE and optical flow OOD detectors using a mobile robot powered by a Jetson Nano. Based on our analysis of results for quantization, knowledge distillation, and pruning, we proposed a design strategy to find the model with the best execution time and memory usage while maintaining some accuracy metric for a given VAE-based OOD detector. We successfully demonstrated this methodology on an optical flow OOD detector and showed that our methodology’s ability to aggressively prune and compress a model is due to the unique attributes of VAE-based OOD detection.

Despite our methodology’s good performance, it requires access to OOD samples at design time to act as a cross-validation set. In our case study, we assume OOD samples arise from a particular generating distribution, but this may not be the case in general. Furthermore, it only guides the search for a faster architecture, but does not guarantee the optimum result. Nevertheless, we believe having a design methodology that combines quantization, knowledge distillation, and pruning allows engineers to exploit the combined powers of these techniques instead of considering them individually.

Project Authors

Aditya Bansal is currently working as a Machine Learning Engineer at Adobe, United States.

Michael Yuhas is currently working as a Research Assistant at Nanyang Technological University, Singapore.

Arvind Easwaran is an Associate Professor at Nanyang Technological University, Singapore.

Learn more

Duckietown is a platform for creating and disseminating robotics and AI learning experiences.

Semantic Image Segmentation Methods in Duckietown

Posted on March 27, 2025 | by Duckietown Admin

General Information

Title: Semantic Image Segmentation Methods in the Duckietown Project
Authors: Kristina S. Lanchukovskaya; Dasha E. Shabalina; Tatiana V. Liakh
Institution: Novosibirsk State University, The Russian Federation
Citation: Lanchukovskaya, K.S., Shabalina, D.E. and Liakh, T.V., 2022, June. Semantic image segmentation methods in the duckietown project. In 2022 IEEE 23rd International Conference of Young Professionals in Electron Devices and Materials (EDM) (pp. 611-617). IEEE.

Semantic Image Segmentation Methods in Duckietown

In Duckietown, where self-driving agents (i.e., Duckiebots) operate in structured environments, segmentation is essential for lane detection, object recognition, and obstacle avoidance. Semantic Image Segmentation assigns a class label to each pixel in an image, allowing autonomous systems to interpret their surroundings.

This research evaluates four deep learning models (SegNet, U-Net, FC-DenseNet, and DeepLab-v3), comparing their efficiency, accuracy, and real-time applicability. Understanding the trade-offs between these models helps optimize perception for Duckiebots navigating Duckietown.

These models rely on Convolutional Neural Networks (CNNs) to extract hierarchical features. SegNet prioritizes memory efficiency, U-Net incorporates skip connections for improved localization, FC-DenseNet enhances feature reuse through dense connectivity, and DeepLab-v3 captures multi-scale context with atrous spatial pyramid pooling. Each model presents a balance between computational cost and segmentation accuracy, influencing its suitability for embedded systems like Duckiebots.

Implementing semantic segmentation in Duckietown enhances autonomy by enabling self-driving agents to interpret complex visual inputs. The selection of an appropriate segmentation model depends on processing constraints and real-time performance needs. By integrating optimized segmentation techniques, Duckiebots improve decision-making in structured environments.
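
Segmentation accuracy is often summarized with the mean intersection over union (mIoU) across classes. The sketch below computes it on flattened per-pixel label lists; it is a minimal illustration with hypothetical names and labels (0 for background, 1 for road) and is not claimed to be the metric used in the paper.

```python
def mean_iou(pred, target, num_classes):
    """Mean intersection over union across classes.

    pred and target are flattened lists of per-pixel class labels.
    Classes absent from both prediction and ground truth are skipped.
    """
    ious = []
    for c in range(num_classes):
        inter = sum(1 for p, t in zip(pred, target) if p == c and t == c)
        union = sum(1 for p, t in zip(pred, target) if p == c or t == c)
        if union > 0:
            ious.append(inter / union)
    return sum(ious) / len(ious)


# A six-pixel toy image: 0 = background, 1 = road (made-up labels).
target = [0, 0, 1, 1, 1, 0]
pred = [0, 1, 1, 1, 0, 0]

miou = mean_iou(pred, target, num_classes=2)
```

Here two of the six pixels are mislabeled, and each class ends up with an IoU of 0.5.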

Highlights - Semantic Image Segmentation Methods in Duckietown

Here is a visual tour of the work of the authors. For all the details, check out the full paper.

Abstract

In the authors’ words:

The article focuses on evaluation of the applicability of existing semantic segmentation algorithms for the Duckietown simulator. Duckietown is an open research project in the field of autonomously controlled robots. The article explores classical semantic image segmentation algorithms. Their analysis for applicability in Duckietown is carried out.

With the help of them, we want to make a dataset for training neural networks. The following was investigated: edge-detection techniques, threshold algorithms, region growing, segmentation algorithms based on clustering, neural networks. The article also reviewed networks designed for semantic image segmentation and machine learning frameworks, taking into account all the limitations of the Duckietown simulator.

Experiments were conducted to evaluate the accuracy of semantic segmentation algorithms on such classes of Duckietown objects as road and background. Based on the results of the analysis, region growing algorithms and clustering algorithms were selected and implemented.

Experiments were conducted to evaluate the accuracy on such classes of Duckietown objects as road, background and traffic signs. After evaluating the accuracy of the algorithms considered, it was decided to use Color segmentation, Mean Shift, Thresholding algorithms and Segmentation of signs by April-tag for image preprocessing. For neural networks, experiments were conducted to evaluate the accuracy of semantic segmentation algorithms on such classes of Duckietown objects as road and background. After evaluating the accuracy of the algorithms considered, it was decided to select the DeepLab-v3 neural network. Separate module was created for semantic image segmentation in Duckietown.

Conclusion - Semantic Image Segmentation Methods in Duckietown

Here are the conclusions from the authors of this paper:

The article analyzes the applicability of semantic segmentation algorithms in the Duckietown simulator, which simulates autopilot robots in an urban environment.

It was found that methods based on classical computer vision algorithms are inferior to methods based on neural networks in terms of stability, segmentation accuracy and speed of operation. It was proposed to use classical computer vision algorithms for marking images and preparing datasets and neural networks for segmentation on robots.

CV algorithms taking into account the features of the Duckietown simulator. Thus, classical computer vision algorithms, such as area-building algorithms and clustering algorithms, were chosen for image preprocessing. OpenCV and Scikit-image libraries were selected for the experiment. The best result during the testing was obtained using MeanShift and cv2.threshold together, and road signs were segmented most successfully using April tag.

Also, after testing the selected neural networks, it was decided to select the DeepLab-v3 neural network as an adapted semantic segmentation algorithm for the Duckietown simulator. After testing the trained DeepLab-v3 neural network model on Duckiebot, a separate module for semantic image segmentation was created in the Duckietown open research project. In the future, it is planned to add such classes of Duckietown objects as a duck in the role of a pedestrian, road markings (red, yellow, white) and Duckiebot.

Project Authors

Kristina S. Lanchukovskaya is affiliated with the department of IT, Novosibirsk State University, Novosibirsk, Russia.

Dasha E. Shabalina is affiliated with the department of IT, Novosibirsk State University, Novosibirsk, Russia.

Tatiana V. Liakh is a Senior Lecturer at the Department of Computer Science, Electrical and Space Engineering, Novosibirsk State University, Novosibirsk, Russia.

Learn more

Duckietown is a platform for creating and disseminating robotics and AI learning experiences.

Proxy Domains for Evaluation and Learning

Posted on March 15, 2025 | by Duckietown Admin

General Information

Title: On Assessing the Usefulness of Proxy Domains for Developing and Evaluating Embodied Agents
Authors: Anthony Courchesne, Andrea Censi, Liam Paull
Institution: Montreal Robotics and Embodied AI Lab (REAL) at Université de Montréal, Mila.
Citation: A. Courchesne, A. Censi and L. Paull, "On Assessing the Usefulness of Proxy Domains for Developing and Evaluating Embodied Agents," 2021 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Prague, Czech Republic, 2021, pp. 4298-4305, doi: 10.1109/IROS51168.2021.9635977.

Proxy Domains for Evaluation and Learning

Running robotics experiments in the real world is often costly in terms of time, money, and effort. For this reason, robotics development and testing rely on proxy domains (e.g., simulations) before real-world deployment. But how can we gauge how useful a proxy domain is in the development process, and are all domains equally useful?

Intuitively, the answer to the above questions will depend on the type of robot, the task it has to achieve, and the environment in which it operates. Evaluating a proxy domain’s usefulness for a specific combination of these circumstances, specifically for the training of autonomous agents, is tackled in this work by establishing quantification metrics and assessing them in Duckietown.

The key aspects of this work are:

Proxy Usefulness Metrics: introduction of Proxy Relative Predictivity Value (PRPV) and Proxy Learning Value (PLV) to measure a proxy’s ability to predict real-world performance and aid agent learning. PRPV helps identify simulations that accurately predict real-world results, while PLV measures their effectiveness in training agents.
Prediction vs. Learning: differentiation of proxies used for accurate performance prediction from those for data generation in training.
Experiments: demonstration of how tuning proxy domain parameters (e.g., sensor delays, camera angle) affects predictivity and learning efficiency.

These metrics improve proxy selection and tuning for robotics research and education, and Duckietown enables rapid prototyping of these ideas for mobile autonomous vehicles.
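
As a rough illustration of what "predictivity" can mean in practice, the sketch below scores a handful of agents in a proxy domain and in the target domain, then computes the Spearman rank correlation between the two sets of scores. This is not the paper's PRPV or PLV definition; it is only a simple rank-agreement measure, assuming that the SRCC named in Figure 3 stands for Spearman rank correlation coefficient. The function names and all scores are hypothetical.

```python
from typing import Sequence


def ranks(values: Sequence[float]) -> list[float]:
    """Return 1-based ranks, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the run of tied values that starts at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman(proxy_scores: Sequence[float], target_scores: Sequence[float]) -> float:
    """Spearman rank correlation: the Pearson correlation of the rank vectors."""
    if len(proxy_scores) != len(target_scores) or len(proxy_scores) < 2:
        raise ValueError("need two score lists of equal length >= 2")
    rp, rt = ranks(proxy_scores), ranks(target_scores)
    mean_p = sum(rp) / len(rp)
    mean_t = sum(rt) / len(rt)
    cov = sum((a - mean_p) * (b - mean_t) for a, b in zip(rp, rt))
    var_p = sum((a - mean_p) ** 2 for a in rp)
    var_t = sum((b - mean_t) ** 2 for b in rt)
    if var_p == 0 or var_t == 0:
        raise ValueError("correlation is undefined when all scores are equal")
    return cov / (var_p * var_t) ** 0.5


# Hypothetical scores for four agents (higher is better); not data from the paper.
proxy = [0.9, 0.7, 0.4, 0.2]
target = [0.8, 0.6, 0.5, 0.1]
agreement = spearman(proxy, target)  # both domains order the agents identically
```

A value near 1 means the proxy orders agents the same way the target domain does, while a value near -1 means it orders them in reverse. Under this simplified view, tuning a proxy parameter such as sensor delay would amount to picking the setting that gives the highest agreement.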

Highlights - Proxy Domains for Evaluation and Learning in Duckietown

Here is a visual tour of the work of the authors. For all the details, check out the full paper.

Proxy domains and Duckietown comparison for evaluating robotics task performance. — Figure 1. Comparing Proxy Domain Performance in Robotics

Proxy domains and Duckietown agent interaction for simulation vs. real robot evaluation. — Figure 2. Agent-Environment Interface in Proxy and Target Domains

Proxy domains and Duckietown usefulness metrics including PRPV, PPV, SRCC, and POD. — Figure 3. Proxy Usefulness Metrics Comparison

Proxy domains and Duckietown evaluation for AI Driving Olympics parameter optimization. — Figure 4. Evaluating Duckietown Proxy Domains for AIDO Challenge

Proxy domains and Duckietown side-by-side visualization of environment and trajectories. — Figure 5. Proxy vs. Target Domain in Duckietown

Proxy domains and Duckietown performance comparison for imitation learning agents. — Figure 6. Performance of an Imitation Learning Agent in Duckietown

Abstract

In the authors’ words:

In many situations it is either impossible or impractical to develop and evaluate agents entirely on the target domain on which they will be deployed. This is particularly true in robotics, where doing experiments on hardware is much more arduous than in simulation. This has become arguably more so in the case of learning-based agents. To this end, considerable recent effort has been devoted to developing increasingly realistic and higher fidelity simulators. However, we lack any principled way to evaluate how good a “proxy domain” is, specifically in terms of how useful it is in helping us achieve our end objective of building an agent that performs well in the target domain. In this work, we investigate methods to address this need. We begin by clearly separating two uses of proxy domains that are often conflated: 1) their ability to be a faithful predictor of agent performance and 2) their ability to be a useful tool for learning. In this paper, we attempt to clarify the role of proxy domains and establish new proxy usefulness (PU) metrics to compare the usefulness of different proxy domains. We propose the relative predictive PU to assess the predictive ability of a proxy domain and the learning PU to quantify the usefulness of a proxy as a tool to generate learning data. Furthermore, we argue that the value of a proxy is conditioned on the task that it is being used to help solve. We demonstrate how these new metrics can be used to optimize parameters of the proxy domain for which obtaining ground truth via system identification is not trivial.

Conclusion - Proxy Domains for Evaluation and Learning in Duckietown

Here are the conclusions from the authors of this paper:

“We introduce new metrics to assess the usefulness of proxy domains for agent learning. In a robotics setting it is common to use simulators for development and evaluation to reduce the need to deploy on real hardware. We argue that it is necessary to take into account the specific task when evaluating the usefulness of the proxy. We establish novel metrics for two specific uses of a proxy. When the proxy domain is used to predict performance in the target domain, we offer the PRPV to assess the usefulness of the proxy as a predictor, and we argue that the task needs to be imposed but not the agent. When a proxy is used to generate training data for a learning algorithm, we propose the PLV as a metric to assess usefulness of the source domain, which is dependent on a specific task and a learning algorithm. We demonstrated the use of these measures for predicting parameters in the Duckietown environment. Future work will involve more rigorous treatment of the optimization problems posed to find optimal parameters, possibly in connection with differentiable simulation environments.”

Project Authors

Anthony Courchesne is currently working as an MLOps Engineer at Maneva, Canada.

Andrea Censi is currently working as the Deputy Director, Chair of Dynamic Systems and Control at ETH Zurich, Switzerland.

Liam Paull is an Associate Professor at the Université de Montréal, Canada, and also serves as the Chief Education Officer at Duckietown.
