Enhancing Visual Domain Randomization with Real Images for Sim-to-Real Transfer

Enhancing Visual Domain Randomization for Sim2Real Transfer


One of the classic objections to machine learning approaches to embedded autonomy (i.e., creating agents that are deployed on real, physical robots) is that training requires data, data requires experiments, and experiments are “expensive” (in time, money, etc.).

The natural counterargument is to use simulation to create the training data, because simulations are much less expensive than real-world experiments: they can be run continuously and in accelerated time, they don’t require supervision, nobody gets tired, etc.

But, as the experienced roboticist knows, “simulations are doomed to succeed”. This phrase encapsulates the notion that simulations do not contain the same wealth of information as the real world: they model only what the programmer chose to include, so they do not capture the complexity of the real world. Eventually things will “work” in simulation, but does that mean they will “work” in the real world, too?

As Carl Sagan once said: “If you wish to make an apple pie from scratch, you must first invent the universe”.

Domain randomization is an approach to mitigate the limitations of simulations. Instead of training an agent on a single set of parameters defining the simulation, many simulations are run with different values of these parameters. For example, in the context of a driving simulator like Duckietown, one set of parameters could make the sky purple instead of blue, or give the lane markings slightly different geometric properties. The idea behind this approach is that the agent will be trained on a distribution of datasets that are all slightly different, hopefully making the agent more robust to real-world nuisances once deployed in a physical body.
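A minimal sketch of this idea, assuming a hypothetical simulator whose appearance is controlled by a handful of parameters. The names `SimParams`, `sky_color`, `lane_width_m` and `light_intensity`, as well as the sampling ranges, are illustrative assumptions and not the actual Duckietown simulator API.

```python
import random
from dataclasses import dataclass


@dataclass
class SimParams:
    """Hypothetical visual/geometric parameters of a driving simulator."""
    sky_color: tuple          # RGB, each channel in [0, 255]
    lane_width_m: float       # lane width in metres
    light_intensity: float    # relative scene brightness


def sample_params(rng: random.Random) -> SimParams:
    """Draw one randomized simulator configuration."""
    return SimParams(
        sky_color=tuple(rng.randint(0, 255) for _ in range(3)),
        lane_width_m=rng.uniform(0.20, 0.25),    # assumed range
        light_intensity=rng.uniform(0.5, 1.5),   # assumed range
    )


rng = random.Random(0)
# One fresh configuration per training episode, so the agent is trained
# on a distribution of worlds rather than on a single one.
episode_params = [sample_params(rng) for _ in range(5)]
```

Each entry of `episode_params` would be applied to the simulator before the corresponding episode starts.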

In this paper, the authors specifically investigate visual domain randomization.



In order to train reinforcement learning algorithms, a significant amount of experience is required, so it is common practice to train them in simulation, even when they are intended to be applied in the real world. To improve robustness, camera-based agents can be trained using visual domain randomization, which involves changing the visual characteristics of the simulator between training episodes in order to improve their resilience to visual changes in their environment.

In this work, we propose a method that includes real-world images alongside visual domain randomization in the reinforcement learning training procedure to further enhance the performance after sim-to-real transfer. We train variational autoencoders using both real and simulated frames, and the representations produced by the encoders are then used to train reinforcement learning agents.

The proposed method is evaluated against a variety of baselines, including direct and indirect visual domain randomization, end-to-end reinforcement learning, and supervised and unsupervised state representation learning.

By controlling a differential drive vehicle using only camera images, the method is tested in the Duckietown self-driving car environment. We demonstrate through our experimental results that our method improves learnt representation effectiveness and robustness by achieving the best performance of all tested methods.
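The training setup described above, a variational autoencoder fed with both simulated and real frames, can be sketched roughly as follows. This is not the authors' implementation: the function names are hypothetical, the frames are random stand-ins for flattened images, and the loss is the textbook VAE objective (squared reconstruction error plus the KL divergence to a standard normal prior).

```python
import math
import random


def mixed_batch(sim_frames, real_frames, batch_size, rng):
    """Sample a training batch from the union of simulated and real frames.

    Real frames get no special treatment: they are pooled with the
    randomized simulator frames as one more visual variant.
    """
    pool = list(sim_frames) + list(real_frames)
    return rng.sample(pool, batch_size)


def kl_to_standard_normal(mu, log_var):
    """KL( N(mu, diag(exp(log_var))) || N(0, I) ) for one latent vector."""
    return -0.5 * sum(1.0 + lv - m * m - math.exp(lv) for m, lv in zip(mu, log_var))


def vae_loss(x, x_recon, mu, log_var):
    """Squared reconstruction error plus the KL regularizer."""
    recon = sum((a - b) ** 2 for a, b in zip(x, x_recon))
    return recon + kl_to_standard_normal(mu, log_var)


rng = random.Random(0)
# Stand-ins for flattened images; a real pipeline would load camera frames.
sim_frames = [[rng.random() for _ in range(16)] for _ in range(100)]
real_frames = [[rng.random() for _ in range(20)][:16] for _ in range(20)]
batch = mixed_batch(sim_frames, real_frames, batch_size=8, rng=rng)
```

In a two-stage setup like the one the abstract describes, the encoder trained with such batches would then be used to produce the inputs of the reinforcement learning agent.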

Highlights - Enhancing Visual Domain Randomization with Real Images for Sim-to-Real Transfer

Here is a visual tour of the work of the authors. For more details, check out the full paper.

Conclusion - Enhancing Visual Domain Randomization with Real Images for Sim-to-Real Transfer

Here are the conclusions from the authors of this paper:

“In this work we proposed a novel method for learning effective image representations for reinforcement learning, whose core idea is to train a variational autoencoder using visually randomized images from the simulator, but include images from the real world as well, as if it was just another visually different version of the simulator.

We evaluated the method in the Duckietown self-driving environment on the lane-following task, and our experimental results showed that the image representations of our proposed method improved the performance of the tested reinforcement learning agents both in simulation and reality. This demonstrates the effectiveness and robustness of the representations learned by the proposed method. We benchmarked our method against a wide range of baselines, and the proposed method performed among the best in all cases.

Our experiments showed that using some type of visual domain randomization is necessary for a successful sim-to-real transfer. Variational autoencoder-based representations tended to outperform supervised representations, and both outperformed representations learned during end-to-end reinforcement learning. Also, for visual domain randomization, when using no real images, invariance regularization-based methods seemed to outperform direct methods. Based on our results, we conclude that including real images in simulation-based reinforcement learning trainings is able to enhance the real world performance of the agent – when using the two-stage approach, proposed in this paper.”

Project Authors

András Béres is currently working as a Junior Deep Learning Engineer at Continental, Hungary.

Bálint Gyires-Tóth is an associate professor at Budapest University of Technology and Economics, Hungary.

Learn more

Duckietown is a platform for creating and disseminating robotics and AI learning experiences.

It is modular, customizable and state-of-the-art, and designed to teach, learn, and do research. From exploring the fundamentals of computer science and automation to pushing the boundaries of knowledge, Duckietown evolves with the skills of the user.

Leveraging Reward Consistency for Interpretable Feature Discovery in Reinforcement Learning

Reward Consistency for Interpretable Feature Discovery in RL


What is interpretable feature discovery in reinforcement learning?

To understand this, let’s introduce a few important topics:

Reinforcement Learning (RL): A machine learning approach where an agent gains the ability to make decisions by engaging with an environment to accomplish a specific objective. Interpretable Feature Discovery in RL is an approach that aims to make the decision-making process of RL agents more understandable to humans.

The need for interpretability: In real-world applications, especially in safety-critical domains like self-driving cars, it is crucial to understand why an RL agent makes a certain decision. Interpretability helps:

  • Build trust in the system
  • Debug and improve the model
  • Ensure compliance with regulations and ethical standards
  • Understand fault if accidents arise

Feature discovery: Feature discovery in this context refers to identifying the key artifacts (features) of the environment that the RL agent is focusing on while making decisions. For example, in a self-driving car simulation, relevant features might include the position of other cars, road signs, or lane markings.



The black-box nature of deep reinforcement learning (RL) hinders them from real-world applications. Therefore, interpreting and explaining RL agents have been active research topics in recent years. Existing methods for post-hoc explanations usually adopt the action matching principle to enable an easy understanding of vision-based RL agents. In this article, it is argued that the commonly used action matching principle is more like an explanation of deep neural networks (DNNs) than the interpretation of RL agents. 

It may lead to irrelevant or misplaced feature attribution when different DNNs’ outputs lead to the same rewards or different rewards result from the same outputs. Therefore, we propose to consider rewards, the essential objective of RL agents, as the essential objective of interpreting RL agents as well. To ensure reward consistency during interpretable feature discovery, a novel framework (RL interpreting RL, denoted as RL-in-RL) is proposed to solve the gradient disconnection from actions to rewards. 

We verify and evaluate our method on the Atari 2600 games as well as Duckietown, a challenging self-driving car simulator environment. The results show that our method manages to keep reward (or return) consistency and achieves high-quality feature attribution. Further, a series of analytical experiments validate our assumption of the action matching principle’s limitations.
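The difference between action matching and reward consistency can be illustrated with a toy example. The sketch below is not the RL-in-RL framework. It assumes a hypothetical one-step lane-keeping reward and two hand-picked steering actions, one computed from the full observation and one from an observation with some features masked out.

```python
def step_reward(offset: float, steering: float) -> float:
    """Toy one-step reward: 1.0 if the car stays within the lane, else 0.0."""
    new_offset = offset + steering
    return 1.0 if abs(new_offset) <= 0.5 else 0.0


def action_matches(a_full: float, a_masked: float, tol: float = 1e-6) -> bool:
    """Action matching: the masked input must reproduce the same action."""
    return abs(a_full - a_masked) <= tol


def reward_consistent(offset: float, a_full: float, a_masked: float) -> bool:
    """Reward consistency: the masked input must lead to the same reward."""
    return step_reward(offset, a_full) == step_reward(offset, a_masked)


offset = 0.4      # current lateral offset from the lane centre (assumed units)
a_full = -0.3     # action from the full observation
a_masked = -0.5   # action from the masked observation

matches = action_matches(a_full, a_masked)
consistent = reward_consistent(offset, a_full, a_masked)
```

Here the two actions differ, so action matching rejects the mask, yet both keep the car in the lane and earn the same reward. This is the case the abstract describes as different outputs leading to the same rewards.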

Highlights - Leveraging Reward Consistency for Interpretable Feature Discovery in Reinforcement Learning

Here is a visual tour of the work of the authors. For more details, check out the full paper.


Here are the conclusions from the authors of this paper:

“In this article, we discussed the limitations of the commonly used assumption, the action matching principle, in RL interpretation methods. It is suggested that action matching cannot truly interpret the agent since it differs from the reward-oriented goal of RL. Hence, the proposed method first leverages reward consistency during feature attribution and models the interpretation problem as a new RL problem, denoted as RL-in-RL. 

Moreover, it provides an adjustable observation length for one-step reward or multistep reward (or return) consistency, depending on the requirements of behavior analyses. Extensive experiments validate the proposed model and support our concerns that action matching would lead to redundant and noncausal attention during interpretation since it is dedicated to exactly identical actions and thus results in a sort of “overfitting.”

Nevertheless, although RL-in-RL shows superior interpretability and dispenses with redundant attention, further exploration of interpreting RL tasks with explicit causality is left for future work.”

Project Authors

Qisen Yang is an Artificial Intelligence PhD Student at Tsinghua University, China.

Huanqian Wang is currently pursuing the B.E. degree in control science and engineering with the Department of Automation, Tsinghua University, Beijing, China.

Mukun Tong is currently pursuing the B.E. degree in control science and engineering with the Department of Automation, Tsinghua University, Beijing, China.

Wenjie Shi received his Ph.D. degree in control science and engineering from the Department of Automation, Institute of Industrial Intelligence and System, Tsinghua University, Beijing, China, in 2022.

Guang-Bin Huang is in the School of Electrical and Electronic Engineering, Nanyang Technological University, Singapore.

Shiji Song is currently a Professor with the Department of Automation, Tsinghua University, Beijing, China.



Towards Autonomous Driving with Small-Scale Cars: A Survey of Recent Development


Towards Autonomous Driving with Small-Scale Cars: A Survey of Recent Development by Dianzhao Li, Paul Auerbach, and Ostap Okhrin is a review that highlights the rapid development of the industry and the important contributions of small-scale car platforms to robot autonomy research.

This survey is a valuable resource for anyone looking to get their bearings in the landscape of autonomous driving research.

We are glad to see Duckietown not only included on the list, but identified as one of the platforms that started a marked increase in the number of papers published each year.

The mission of Duckietown, since it started as a class at MIT, is to democratize access to the science and technology of robot autonomy. Part of how we intended to achieve this mission was to streamline the way autonomous behaviors for non-trivial robots were developed, tested and deployed in the real world.

From 2018 to 2021 we ran several editions of the AI Driving Olympics (AI-DO): an international competition to benchmark the state of the art of embodied AI for safety-critical applications. It was a great experience, not only because it led to the development of the Challenges infrastructure, the Autolab infrastructure, and many agent baselines that are now available to the broader community and catalyze further developments, but also because it was the first time physical robots were brought to the world’s leading scientific conference in machine learning (NeurIPS, the Neural Information Processing Systems conference, known as NIPS when AI-DO was first launched).

All this infrastructure development and testing might have been instrumental in making R&D in autonomous mobile robotics more efficient. Practitioners in the field know how difficult R&D is, because final outcomes are the result of the tuple (robot) x (environment) x (task). Not standardizing everything other than the specific feature under development (i.e., not following the ceteris paribus principle) often leads to apples-to-pears comparisons, i.e., bad science, which hampers the overall progress of the field.

We are happy to see Duckietown recognized as a contributor to facilitating good science in the field. We believe that even better and more science will come in the next years, as the students being educated with the Duckietown system start their professional journeys in academia or the workforce.

We are excited to see what the future of robot autonomy will look like, and we will continue doing our best by providing tools, workflows, and comprehensive resources to facilitate the professional development of the next generations of scientists, engineers, and practitioners in the field!


Starting around 2016, with the introduction of Duckietown, BARC, and Autorally, there was a significant increase in research papers.


We report the abstract of the authors’ work:

“While engaging with the unfolding revolution in autonomous driving, a challenge presents itself, how can we effectively raise awareness within society about this transformative trend? While full-scale autonomous driving vehicles often come with a hefty price tag, the emergence of small-scale car platforms offers a compelling alternative. 

These platforms not only serve as valuable educational tools for the broader public and young generations but also function as robust research platforms, contributing significantly to the ongoing advancements in autonomous driving technology. 

This survey outlines various small-scale car platforms, categorizing them and detailing the research advancements accomplished through their usage. The conclusion provides proposals for promising future directions in the field.”

Towards Autonomous Driving with Small-Scale Cars: A Survey of Recent Development

Here is a visual tour of the work. For more details, check out the full paper.

Summary and conclusion

Here is what the authors learned from this survey:

“In this paper, we offer an overview of the current state-of-the-art developments in small-scale autonomous cars. Through a detailed exploration of both past and ongoing research in this domain, we illuminate the promising trajectory for the advancement of autonomous driving technology with small-scale cars. We initially enumerate the presently predominant small-scale car platforms widely employed in academic and educational domains and present the configuration specifics of each platform. Similar to their full-size counterparts, the deployment of hyper-realistic simulation environments is imperative for training, validating, and testing autonomous systems before real-world implementation. To this end, we show the commonly employed universal simulators and platform-specific simulators.

Furthermore, we provide a detailed summary and categorization of tasks accomplished by small-scale cars, encompassing localization and mapping, path planning and following, lane-keeping, car following, overtaking, racing, obstacle avoidance, and more. Within each benchmarked task, we classify the literature into distinct categories: end-to-end systems versus modular systems and traditional methods versus ML-based methods. This classification facilitates a nuanced understanding of the diverse approaches adopted in the field. The collective achievements of small-scale cars are thus showcased through this systematic categorization. Since this paper aims to provide a holistic review and guide, we also outline the commonly utilized in various well-known platforms. This information serves as a valuable resource, enabling readers to leverage our survey as a guide for constructing their own platforms or making informed decisions when considering commercial options within the community.

We additionally present future trends concerning small-scale car platforms, focusing on different primary aspects. Firstly, enhancing accessibility across a broad spectrum of enthusiasts: from elementary students and colleagues to researchers, demands the implementation of a comprehensive learning pipeline with diverse entry levels for the platform. Next, to complete the whole ecosystem of the platform, a powerful car body, varying weather conditions, and communications issues should be addressed in a smart city setup. These trends are anticipated to shape the trajectory of the field, contributing significantly to advancements in real-world autonomous driving research.

While we have aimed to achieve maximum comprehensiveness, the expansive nature of this topic makes it challenging to encompass all noteworthy works. Nonetheless, by illustrating the current state of small-scale cars, we hope to offer a distinctive perspective to the community, which would generate more discussions and ideas leading to a brighter future of autonomous driving with small-scale cars.”

Project Authors

Dianzhao Li

Dianzhao Li is a research assistant at the Technische Universität Dresden, Dresden, Germany.

Paul Auerbach

Paul Auerbach is with Barkhausen Institut gGmbH, Dresden, Germany.

Ostap Okhrin

Ostap Okhrin is Chair of Statistics and Econometrics at the Institute of Economics and Transport, School of Transportation, Technische Universität Dresden in Germany.



End-to-end deep RL (DRL) systems in autonomous driving environments that rely on visual input for vehicle control face potential security risks, including:

  • State Adversarial Perturbations: Subtle alterations to visual input that mislead the DRL agent, causing incorrect decision-making.
  • Reward Tampering: Manipulation of the reward signal to misguide the learning process, leading the agent to adopt unsafe or inefficient policies.

These vulnerabilities can compromise the safety and reliability of self-driving vehicles.

Vision-based reinforcement learning for lane-tracking control

Vision-based Reinforcement Learning for Lane-Tracking Control


a) Test track used for simulated reinforcement learning and baseline evaluations; b) and c) real and simulated test track used for the evaluation of the simulation-to-reality transfer

What is Vision-based Reinforcement Learning? A few important topics:

Reinforcement Learning: a machine learning paradigm where an agent learns to make decisions by interacting with an environment to achieve a goal. In this context, reinforcement learning is used to teach a vehicle how to drive within Duckietown lanes by providing rewards or penalties based on its actions.

Vision-based Control: The control of the vehicle is based on visual inputs, specifically images captured by a forward-facing camera. These images are processed by a neural network to determine appropriate steering actions, allowing the vehicle to track lanes and avoid collisions.

Simulation-to-Reality (sim2real) Transfer Learning: The trained policy, which learns to control the vehicle in a simulated environment, is transferred to real-world scenarios. The effectiveness of the trained model in real-world driving situations is evaluated, demonstrating the ability to generalize learning from simulation to reality.

Domain Randomization: This technique involves introducing variations or randomizations into the simulation environment during training. By exposing the agent to a wide range of simulated scenarios with different lighting conditions, road surfaces, and other environmental factors, domain randomization helps improve the model’s ability to generalize to unseen real-world conditions.
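As a small illustration of the lighting part of domain randomization, the sketch below rescales the brightness of a grayscale image by a random factor. The function name, the nested-list image format and the factor range are assumptions made for this example and are not the randomization settings used in the paper.

```python
import random


def randomize_brightness(image, rng, low=0.6, high=1.4):
    """Scale every pixel by one random factor, clipping to [0, 255].

    `image` is a list of rows of integer grayscale values.
    """
    factor = rng.uniform(low, high)
    return [[min(255, max(0, round(p * factor))) for p in row] for row in image]


rng = random.Random(0)
frame = [[0, 100, 200], [255, 50, 10]]

# A differently lit version of the same frame for each training episode.
variants = [randomize_brightness(frame, rng) for _ in range(4)]
```

The same pattern extends to other visual factors, such as color shifts or textures, by drawing one random setting per episode and applying it to every frame.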



The present study focused on vision-based end-to-end reinforcement learning in relation to vehicle control problems such as lane following and collision avoidance. The controller policy presented in this paper is able to control a small-scale robot to follow the right-hand lane of a real two-lane road, although its training has only been carried out in a simulation.

This model, realised by a simple, convolutional network, relies on images of a forward-facing monocular camera and generates continuous actions that directly control the vehicle. To train this policy, proximal policy optimization was used, and to achieve the generalisation capability required for real performance, domain randomisation was used. A thorough analysis of the trained policy was conducted by measuring multiple performance metrics and comparing these to baselines that rely on other methods.

To assess the quality of the simulation-to-reality transfer learning process and the performance of the controller in the real world, simple metrics were measured on a real track and compared with results from a matching simulation. Further analysis was carried out by visualising salient object maps.
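The abstract names proximal policy optimization as the training algorithm. A minimal sketch of its widely used clipped surrogate objective is given below, assuming precomputed probability ratios and advantage estimates. The clipping value `eps=0.2` is a common default and an assumption here, since the configuration used by the authors is not stated in this summary.

```python
def ppo_clip_objective(ratios, advantages, eps=0.2):
    """Average clipped surrogate objective over a batch of samples.

    ratios:      new_policy_prob / old_policy_prob for each sampled action
    advantages:  advantage estimate for each sampled action
    """
    terms = []
    for r, a in zip(ratios, advantages):
        clipped = min(max(r, 1.0 - eps), 1.0 + eps)
        # Taking the minimum removes the incentive to move the policy
        # far away from the one that collected the data.
        terms.append(min(r * a, clipped * a))
    return sum(terms) / len(terms)


objective = ppo_clip_objective(ratios=[1.5, 0.5, 1.0], advantages=[1.0, -1.0, 2.0])
```

During training this quantity is maximized with respect to the policy parameters, usually together with a value-function loss.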

Highlights - Vision-based reinforcement learning for lane-tracking control

Here is a visual tour of the work of the authors. For more details, check out the full paper.


Here are the conclusions from the authors of this paper:

“This work presented a solution to the problem of complex, vision-based lane following in the Duckietown environment using reinforcement learning to train an end-to-end steering policy capable of simulation-to-real transfer learning. It was found that the training is sensitive to problem formulation, such as the representation of actions. 

This study has demonstrated that by using domain randomisation, a moderately detailed and accurate simulation is sufficient for training end-to-end lane-following agents that operate in a real environment. The performance of these agents was evaluated by comparing some basic metrics to match real and simulated scenarios. 

Agents were also successfully trained to perform collision avoidance in addition to lane following. Finally, salient object visualisation was used to give an illustrative explanation of the inner workings of the policies in both the real and simulated domains.”

Project Authors

András Kalapos

András Kalapos is a Machine Learning PhD Student at Budapest University of Technology and Economics, Hungary.

Csaba Gór

Csaba Gór is a Machine Learning Engineer at Turbine, in Hungary.

Róbert Moni

Róbert Moni is a Senior Machine Learning Engineer at Continental.




Deep Reinforcement Learning for Autonomous Navigation on Duckietown Platform: Evaluation of Adversarial Robustness

Evaluating Adversarial Robustness in Duckietown Navigation

General Information

Deep RL for Autonomous Navigation on Duckietown Platform: Evaluation of Adversarial Robustness

Adversarial Navigation Robustness - Sequence of robot positions with DRL agent trained under adversarial and non-adversarial settings in a lane following experiment. The UAPFGSM method makes the agent move in circles with minimal perturbations, while adversarial reward tampering forces it to move in the opposite direction of the road.

What is adversarial robustness in navigation tasks all about? A few important topics:

Reinforcement Learning (RL) is a type of machine learning where agents learn to make decisions by receiving rewards or penalties based on their actions in an environment. This is great because it removes the need for curated training datasets.

Deep Reinforcement Learning (DRL) enhances RL by using deep neural networks to process complex inputs and make decisions. Deep networks are neural networks with multiple layers.

Adversarial Robustness refers to a system’s ability to resist and maintain performance despite deliberate attacks or input perturbations.

Navigation is the task of finding feasible paths between points in the environment, as Google Maps or similar systems do for us in everyday life.



Self-driving cars have gained widespread attention in recent years due to their potential to revolutionize the transportation industry. However, their success critically depends on the ability of reinforcement learning (RL) algorithms to navigate complex environments safely. In this paper, we investigate the potential security risks associated with end-to-end deep RL (DRL) systems in autonomous driving environments that rely on visual input for vehicle control, using the open-source Duckietown platform for robotics and self-driving vehicles.

We demonstrate that current DRL algorithms are inherently susceptible to attacks by designing a general state adversarial perturbation and a reward tampering approach. Our strategy involves evaluating how attacks can manipulate the agent’s decision-making process and using this understanding to create a corrupted environment that can lead the agent towards low-performing policies. We introduce our state perturbation method, accompanied by empirical analysis and extensive evaluation, and then demonstrate a targeted attack using reward tampering that leads the agent to catastrophic situations.

Our experiments show that our attacks are effective in poisoning the learning of the agent when using the gradient-based Proximal Policy Optimization algorithm within the Duckietown environment. The results of this study are of interest to researchers and practitioners working in the field of autonomous driving, DRL, and computer security, and they can help inform the development of safer and more reliable autonomous driving systems.
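To make the two attack types more concrete, here is a toy sketch and not the method of the paper. It assumes a hypothetical linear steering policy, applies a fast-gradient-sign style perturbation to its input, and models reward tampering as a simple sign flip. All names and numbers are illustrative.

```python
def sign(v: float) -> int:
    """Return -1, 0 or 1 depending on the sign of v."""
    return (v > 0) - (v < 0)


def linear_policy(weights, obs):
    """Hypothetical policy: the steering command is a weighted sum of inputs."""
    return sum(w * x for w, x in zip(weights, obs))


def fgsm_perturb(weights, obs, eps):
    """Shift each input by at most eps in the direction that raises the output.

    For a linear policy the gradient of the output with respect to the
    input is the weight vector, so only its sign is needed.
    """
    return [x + eps * sign(w) for w, x in zip(weights, obs)]


def tampered_reward(reward: float) -> float:
    """Assumed form of reward tampering: the agent sees the negated reward."""
    return -reward


weights = [0.5, -1.0, 0.0]
obs = [0.2, 0.1, 0.3]

clean_action = linear_policy(weights, obs)
adv_obs = fgsm_perturb(weights, obs, eps=0.1)
adv_action = linear_policy(weights, adv_obs)
```

Even though no input changes by more than `eps`, the output moves by `eps` times the sum of the absolute weights, which shows how small visual changes can add up to a large change in the steering command.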

Highlights - Evaluation of Adversarial Robustness Results

Here is a visual tour of the work of the authors. For more details, check out the paper link.


Here are the conclusions from the authors of this paper:

“The focus of our study was to address adversarial attacks on deep reinforcement learning (DRL) agents, specifically examining state adversarial attacks and reward-tampering attacks. 

We developed a parametric framework for state adversarial attacks and a non-parametric framework for reward tampering attacks, which enabled us to create effective attacks. We found that the performance of a DRL agent declined rapidly after the attack, and the deviation from the road was worse than that of standard DRL. 

We used salient maps to provide a clear explanation of the policies’ internal operations in both the adversarial and non-adversarial aspects. Our research provides insight into the potential vulnerabilities of DRL agents and highlights the need for more robust and secure agents to mitigate the risk of adversarial attacks. 

Moving forward, future work will focus on incorporating real-world analysis to test the performance of the DuckieBot under both adversarial and non-adversarial settings”.

Project Authors

Abdullah Hosseini is a Research and Development Specialist at Weill Cornell Medicine in Qatar.

Junaid Qadir is a Professor of Computer Engineering at Qatar University.




Vision-Based DRL Autonomous Driving Agent with Sim2Real Transfer


One way to obtain training data quickly and cheaply is to use simulation instead of real-world experiments. The question remains whether what a simulation-trained agent learns applies to the real world. Sim2Real transfer is the field of research that studies this problem.

The challenge is particularly meaningful when using vision as the primary sensing capability for robots. Vision-based deep reinforcement learning (DRL) refers to a technique where ML agents, typically modeled as multi-layered neural networks, learn to “make decisions” directly from visual input. 

The essence of RL is training agents through reward signals that reinforce desirable outcomes. This family of techniques typically leads to increased adaptability to operational scenarios.

To learn about RL and its place in the larger context of robot autonomy, check out the resources below.


To achieve fully autonomous driving, vehicles must be capable of continuously performing various driving tasks, including lane keeping and car following, both of which are fundamental and well-studied driving tasks. However, previous studies have mainly focused on individual tasks, and car following tasks have typically relied on complete leader-follower information to attain optimal performance.

To address this limitation, we propose a vision-based deep reinforcement learning (DRL) agent that can simultaneously perform lane-keeping and car-following maneuvers.

To evaluate the performance of our DRL agent, we compare it with a baseline controller and use various performance metrics for quantitative analysis. Furthermore, we conduct a real-world evaluation to demonstrate the Sim2Real transfer capability of the trained DRL agent.

To the best of our knowledge, our vision-based car following and lane-keeping agent with Sim2Real transfer capability is the first of its kind.

Highlights - Sim2Real transfer results

Here is a visual tour of the work of the authors. For all the details, check out the paper link.


This study proposes a vision-based DRL agent that can simultaneously perform lane-keeping and car-following tasks.

The overall system is divided into two modules: the perception module and the control module. The perception module extracts task-relevant attributes of the surroundings, while the control module is a DRL agent that takes these attributes as input. To evaluate the performance of the DRL agent, we compare it with a baseline algorithm in both simulation and real-world environments.
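The two-module split described above can be sketched as a simple pipeline. This is only an illustration of the structure, not the authors' implementation: the attribute names, the fixed perception output, and the hand-written proportional rule standing in for the trained DRL agent are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass
class Attributes:
    """Hypothetical task-relevant attributes; the paper's features may differ."""
    lateral_offset: float   # signed distance from the lane center
    heading_error: float    # angle between robot heading and lane direction
    leader_distance: float  # distance to the vehicle ahead

Action = Tuple[float, float]  # (speed, steering)


def placeholder_perception(image) -> Attributes:
    """Stand-in for the perception module: returns fixed attributes."""
    return Attributes(lateral_offset=0.05, heading_error=0.1, leader_distance=0.4)


def placeholder_policy(attrs: Attributes) -> Action:
    """Stand-in for the trained DRL agent: a hand-written proportional rule."""
    speed = max(0.0, min(0.5, attrs.leader_distance - 0.2))
    steering_cmd = -2.0 * attrs.lateral_offset - 1.0 * attrs.heading_error
    return speed, steering_cmd


def control_step(image,
                 perceive: Callable[[object], Attributes],
                 policy: Callable[[Attributes], Action]) -> Action:
    """Run one step: image -> attributes (perception) -> action (control)."""
    return policy(perceive(image))


# The image is unused by the placeholder perception, so None is enough here.
speed, steering_cmd = control_step(None, placeholder_perception, placeholder_policy)
print(f"speed={speed:.2f}, steering={steering_cmd:.2f}")
```

Keeping the control module's input limited to a few attributes, rather than raw pixels, is what lets either module be swapped or retrained independently.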

In the simulation, we compare the car following and lane-keeping capabilities of the DRL agent and baseline controller using various performance metrics. In the real-world environment, we demonstrate that the DRL agent can follow the leading vehicle while maintaining lane-keeping ability.

In future work, we plan to enhance our DRL agent by incorporating a comfort factor to address unstable driving behavior. Additionally, we aim to deploy more advanced algorithms for improved generalization.

Project Authors

Dianzhao Li is a research assistant at the Technische Universität Dresden, Dresden, Germany.

Ostap Okhrin is Chair of Statistics and Econometrics at the Institute of Economics and Transport, School of Transportation, Technische Universität Dresden in Germany.



Learning Skills to Navigate without a Master: A Sequential Multi-Policy Reinforcement Learning Algorithm

General Information

Reinforcement learning (RL) is a rising-star approach for developing autonomous robot agents. The essence of RL is training agents to learn policies that maximize the reward earned for desirable outcomes, which leads to increased adaptability to operational scenarios. Through iterations, robots refine their decision-making, optimizing actions based on rewards and penalties. This gives robots the flexibility to handle unpredictable situations, improving their efficiency and effectiveness in real-world tasks. To learn about RL with Duckietown, check out the resources below.


Solving complex problems using reinforcement learning necessitates breaking down the problem into manageable tasks, and learning policies to solve these tasks. These policies, in turn, have to be controlled by a master policy that takes high-level decisions. Hence learning policies involves hierarchical decision structures. However, training such methods in practice may lead to poor generalization, with either sub-policies executing actions for too few time steps or devolving into a single policy altogether. In our work, we introduce an alternative approach to learn such skills sequentially without using an overarching hierarchical policy. We propose this method in the context of environments where a major component of the objective of a learning agent is to prolong the episode for as long as possible. We refer to our proposed method as Sequential Soft Option Critic. We demonstrate the utility of our approach on navigation and goal-based tasks in a flexible simulated 3D navigation environment that we have developed. We also show that our method outperforms prior methods such as Soft Actor-Critic and Soft Option Critic on various environments, including the Atari River Raid environment and the Gym-Duckietown self-driving car simulator.


Here is a visual tour of the work of the authors. 

For all the details, check out the paper link!


In this paper, the authors propose an algorithm called “Sequential Soft Option Critic” that allows adding new skills dynamically without the need for a higher-level master policy. It applies to environments where a primary component of the objective is to prolong the episode. The authors show that this algorithm can effectively incorporate diverse skills into an overall skill set, and that it outperforms prior methods in several environments.

Learn more

Duckietown is a platform for creating and disseminating robotics and AI learning experiences.

Duckietown is modular, customizable and state-of-the-art. It is designed to teach, learn, and do research: from exploring the fundamentals of computer science and automation to pushing the boundaries of knowledge.

Monocular Robot Navigation with Self-Supervised Pretrained Vision Transformers


Duckietown’s infrastructure is used by researchers worldwide to push the boundaries of knowledge. Of the many outstanding works published, today we’d like to highlight “Monocular Robot Navigation with Self-Supervised Pretrained Vision Transformers” by Saavedra-Ruiz et al. at the University of Montreal.

Using vision transformers (ViT) to understand their surroundings, Duckiebots can detect and avoid obstacles while driving safely inside lanes. ViT is an emerging machine vision technique that has its roots in Natural Language Processing (NLP) applications. The use of this architecture in Computer Vision is recent and promising. Enjoy the read and don’t forget to reproduce these results on your Duckiebots!


“In this work, we consider the problem of learning a perception model for monocular robot navigation using few annotated images. Using a Vision Transformer (ViT) pretrained with a label-free self-supervised method, we successfully train a coarse image segmentation model for the Duckietown environment using 70 training images. Our model performs coarse image segmentation at the 8×8 patch level, and the inference resolution can be adjusted to balance prediction granularity and real-time perception constraints. We study how best to adapt a ViT to our task and environment, and find that some lightweight architectures can yield good single-image segmentations at a usable frame rate, even on CPU. The resulting perception model is used as the backbone for a simple yet robust visual servoing agent, which we deploy on a differential drive mobile robot to perform two tasks: lane following and obstacle avoidance.”


“We propose to train a classifier to predict labels for every 8×8 patch in an image. Our classifier is a fully-connected network which we apply over ViT patch encodings to predict a coarse segmentation mask:”
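The per-patch classification described in the quote can be sketched as follows. This is a simplified illustration under stated assumptions, not the authors' code: random vectors stand in for the ViT patch encodings, a single linear layer stands in for the fully-connected network, and the image size, embedding size, and number of classes are hypothetical.

```python
import random

PATCH = 8  # patch side in pixels, as stated in the quoted text


def linear(x, weights, bias):
    """One fully-connected layer: returns weights @ x + bias."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def classify_patches(encodings, weights, bias):
    """Apply the same classifier to every patch encoding; keep the argmax class."""
    labels = []
    for enc in encodings:
        logits = linear(enc, weights, bias)
        labels.append(max(range(len(logits)), key=lambda c: logits[c]))
    return labels


def to_mask(labels, rows, cols):
    """Reshape the flat list of patch labels into a rows x cols coarse mask."""
    return [labels[r * cols:(r + 1) * cols] for r in range(rows)]


random.seed(0)
height, width = 64, 96          # hypothetical image size in pixels
rows, cols = height // PATCH, width // PATCH
embed_dim, n_classes = 16, 3    # hypothetical encoding size and label count

# Random stand-ins for the ViT patch encodings and the trained classifier.
encodings = [[random.gauss(0, 1) for _ in range(embed_dim)]
             for _ in range(rows * cols)]
weights = [[random.gauss(0, 1) for _ in range(embed_dim)]
           for _ in range(n_classes)]
bias = [0.0] * n_classes

mask = to_mask(classify_patches(encodings, weights, bias), rows, cols)
print(f"coarse mask: {len(mask)} x {len(mask[0])} patches")
```

Because each label covers an 8×8 block, the mask has one entry per patch rather than per pixel, which is what keeps inference cheap.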


“In this work, we study how embodied agents with vision-based motion can benefit from ViTs pretrained via SSL methods. Specifically, we train a perception model with only 70 images to navigate a real robot in two monocular visual-servoing tasks. Additionally, in contrast to previous SSL literature for general computer vision tasks, our agent appears to benefit more from small high-throughput models rather than large high-capacity ones. We demonstrate how ViT architectures can flexibly adapt their inference resolution based on available resources, and how they can be used in robotic application depending on the precision needed by the embodied agent. Our approach is based on predicting labels for 8×8 image patches, and is not well-suited for predicting high-resolution segmentation masks, in which case an encoder-decoder architecture should be preferred. The low resolution of our predictions does not seem to hinder navigation performance however, and we foresee as an interesting research direction how those high-throughput low-resolution predictions affect safety-critical applications. Moreover, training perception models in an SSL fashion on sensory data from the robot itself rather than generic image datasets (e.g., ImageNet) appears to be a promising research avenue, and is likely to yield visual representations that are better adapted to downstream visual servoing applications.”

Learn more

The Duckietown platform offers robotics and AI learning experiences.

Duckietown is modular, customizable and state-of-the-art. It is designed to teach, learn, and do research: from exploring the fundamentals of computer science and automation to pushing the boundaries of knowledge.

AI Driving Olympics 2021: Urban League Finalists


This year’s embodied urban league challenges were lane following (LF), lane following with vehicles (LFV), and lane following with intersections (LFI). To account for differences between the real world and simulation, this edition’s finalists can make one additional submission to the real challenges to improve their scores. Finalists are the authors of AI-DO 2021 submissions in the top 5 ranks for each challenge. This year’s finalists are:


  • András Kalapos
  • Bence Haromi
  • Sampsa Ranta
  • ETU-JBR Team
  • Giulio Vaccari


  • Sampsa Ranta
  • Adrian Brucker
  • Andras Beres
  • David Bardos


  • András Kalapos
  • Sampsa Ranta
  • Adrian Brucker
  • Andras Beres

The deadline for submitting the “final” submissions is Dec. 9th, 2 pm CET. All submissions received after this time will count towards the next edition of AI-DO.

Don’t forget to join the #aido channel on the Duckietown Slack for updates!

Congratulations to all the participants, and best of luck to the finalists!

Amazon Web Services (AWS)

Join the AI Driving Olympics, 6th edition, starting now!

The 2021 AI Driving Olympics

Compete in the 2021 edition of the Artificial Intelligence Driving Olympics (AI-DO 6)!

The AI-DO serves to benchmark the state of the art of artificial intelligence in autonomous driving by providing standardized simulation and hardware environments for tasks related to multi-sensory perception and embodied AI.

Duckietown traditionally hosts AI-DO competitions biannually, with finals events held at machine learning and robotics conferences such as the International Conference on Robotics and Automation (ICRA) and the Conference on Neural Information Processing Systems (NeurIPS).

AI-DO 6 will be held in conjunction with NeurIPS 2021 and will have three leagues: urban driving, advanced perception, and racing. The winter champions will be announced during NeurIPS 2021, on December 10, 2021!

Urban driving league

The urban driving league uses the Duckietown platform and presents several challenges, each of increasing complexity.

The goal in each challenge is to develop a robotic agent for driving Duckiebots “well”. Baseline implementations are provided to test different approaches. There are no constraints on how your agents are designed.

Each challenge adds a layer of complexity: intersections, other vehicles, pedestrians, etc. You can check out the existing challenges on the Duckietown challenges server.

AI-DO 2021 features four challenges: lane following (LF), lane following with intersections (LFI), lane following with vehicles (LFV) and lane following with vehicles and intersections, multi-body, with full information (LFVI-multi-full).

All challenges have a simulation and hardware component (🚙,💻), except for LFVI-multi-full, which is simulation (💻) only.

The first phase (until Nov. 7) is a practice one. Results do not count towards leaderboards.

The second phase (Nov. 8-30) is the live competition and results count towards official leaderboards. 

Selected submissions (those that perform well enough in simulation) will be evaluated on hardware in Autolabs. The submissions that score best in Autolabs will advance to the finals.

During the finals (Dec. 1-8) one additional submission is possible for each finalist, per challenge.

Winners (top 3) of the resulting leaderboard will be declared AI-DO 2021 winter champions and celebrated live during NeurIPS 2021. We require champions to submit a short video (2 mins) introducing themselves and describing their submission.

Winners are invited to join (not mandatory) the NeurIPS event, on December 10th, 2021, starting at 11.25 GMT (Zoom link will follow).   

🎯Goal: develop robotic agents for challenges of increasing complexity
🚙Robot: Duckiebot (DB21M/J)
👀Sensors: camera, wheel encoders
🏖️Practice: Nov. 1-7, unlimited non-competing submissions
🚙Competition: Nov. 8-30, best in sim are evaluated on hardware in Autolabs
🏘️Finals: Dec. 1-8, one additional submission for Autolabs
🏆Winners: Dec. 10, 2-minute video describing the submission for the NeurIPS 2021 event

The challenges

Lane following 🚙 💻

LF – The most traditional of AI-DO challenges: have a Duckiebot navigate a road loop without intersections, pedestrians (duckies), or other vehicles. The objective is to travel the longest path in a given time while staying in the lane, i.e., not committing driving infractions.
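As a rough illustration of this objective, the sketch below adds up the distance a robot covers while it remains in the lane. This is not the official AI-DO scoring code: the trajectory format, the lane half-width, and the rule of counting only segments whose endpoints are both in the lane are assumptions made for the example.

```python
import math


def in_lane_distance(trajectory, lane_half_width):
    """Sum the length of segments whose two endpoints are both inside the lane.

    trajectory: list of (x, y, lateral_offset) samples, where lateral_offset
    is the signed distance from the lane center (hypothetical format).
    """
    total = 0.0
    for (x0, y0, d0), (x1, y1, d1) in zip(trajectory, trajectory[1:]):
        if abs(d0) <= lane_half_width and abs(d1) <= lane_half_width:
            total += math.hypot(x1 - x0, y1 - y0)
    return total


# Hypothetical samples: the third one drifts outside the lane.
trajectory = [
    (0.0, 0.0, 0.00),
    (0.3, 0.4, 0.02),
    (0.6, 0.8, 0.15),
    (0.9, 1.2, 0.01),
]
lane_half_width = 0.1  # hypothetical, in the same units as the offsets

score = in_lane_distance(trajectory, lane_half_width)
print(f"distance traveled in lane: {score:.2f}")
```

Only the first segment counts here, because the other two touch the sample that left the lane.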

Current AI-DO leaderboards: LF-sim-validation, LF-sim-testing.

Previous AI-DO leaderboards: sim-validation, sim-testing, real-validation.

A DB21 Duckiebot in a Duckietown equipped with Autolab infrastructure.

Lane following with intersections 🚙 💻

LFI – This challenge builds upon LF by increasing the complexity of the road network, now featuring 3- and/or 4-way intersections, defined according to the Duckietown appearance specifications. Traffic lights will not be present on the map. The objective is to drive the longest distance while not breaking the rules of the road, which are now more complex due to the presence of traffic signs.

Current AI-DO leaderboards: LFI-sim-validation, LFI-sim-testing.

Previous AI-DO leaderboards: sim-validation, sim-testing.

Duckiebot facing a lane following with intersections (LFI) challenge

Lane following with vehicles 🚙 💻

LFV – In this traditional AI-DO challenge, contestants seek to travel the longest path in a city without intersections or pedestrians, but with other vehicles on the road. Non-playing vehicles (i.e., not running the user’s submitted agent) can be in the same and/or opposite lanes and have variable speed.

Current AI-DO leaderboards: LFV-sim-validation, LFV-sim-testing.

Previous AI-DO leaderboards (LFV-multi variant): sim-validation, sim-testing, real-validation.

Lane following with vehicles and intersections (stateful) 💻

LFVI-multi-full – This new challenge brings together roads with intersections and other vehicles. The submitted agent is deployed on all Duckiebots on the map (-multi), and is provided with full information, i.e., the state of the other vehicles on the map (-full). This challenge is in simulation only.

Getting started

All you need to get started and participate in the AI-DO is a computer, a good internet connection, and the ambition to challenge your skills against the international community!  

We provide webinars, operation manuals, and baselines to get started.

May the duck be with you! 

Thank you to our generous sponsors!