Survey on Testbeds for Vehicle Autonomy & Robot Swarms

Posted on September 2, 2024 | by Duckietown Admin

General Information

A Survey on Small-Scale Testbeds for Connected and Automated Vehicles and Robot Swarms
Armin Mokhtarian, Jianye Xu, Patrick Scheffe, Maximilian Kloock, Simon Schäfer, Heeseung Bang, Viet-Anh Le, Sangeet Ulhas, Johannes Betz, Sean Wilson, Spring Berman, Liam Paull, Amanda Prorok, Bassam Alrifaee
RWTH Aachen University, Germany
Mokhtarian, Armin & Scheffe, Patrick & Kloock, Maximilian & Schäfer, Simon & Bang, Heeseung & Le, Viet-Anh & Sankaramangalam Ulhas, Sangeet & Betz, Johannes & Wilson, Sean & Berman, Spring & Prorok, Amanda & Alrifaee, Bassam. (2024). A Survey on Small-Scale Testbeds for Connected and Automated Vehicles and Robot Swarms. 10.13140/RG.2.2.16176.74248/1.

Survey on Testbeds for Vehicle Autonomy & Robot Swarms

“A Survey on Small-Scale Testbeds for Connected and Automated Vehicles and Robot Swarms” by Armin Mokhtarian et al. offers a comparison of current small-scale testbeds for Connected and Automated Vehicles (CAVs), Vehicle Autonomy and Robot Swarms (RS).

Small-scale autonomous vehicle testbeds are paving the way to faster and more meaningful research and development in vehicle autonomy, embodied AI, and AI robotics as a whole.

Although small-scale, often made of off-the-shelf components and relatively low-cost, these platforms provide the opportunity for deep insights into specific scientific and technological challenges of autonomy.

Duckietown, in particular, is highlighted for its modular, miniature-scale smart-city environment, which facilitates the study of autonomous vehicle localization and traffic management through onboard sensors.

Learn about robot autonomy, traditional robotics autonomy architectures, agent training, sim2real, navigation, and other topics with Duckietown, starting from the link below!

Abstract

Connected and Automated Vehicles (CAVs) and Robot Swarms (RS) have the potential to transform the transportation and manufacturing sectors into safer, more efficient, sustainable systems.

However, extensive testing and validation of their algorithms are required. Small-scale testbeds offer a cost-effective and controlled environment for testing algorithms, bridging the gap between full-scale experiments and simulations. This paper provides a structured overview of characteristics of testbeds based on the sense-plan-act paradigm, enabling the classification of existing testbeds.

Its aim is to present a comprehensive survey of various testbeds and their capabilities. We investigated 17 testbeds and present our results on the public webpage www.cpm-remote.de/testbeds.

Furthermore, this paper examines seven testbeds in detail to demonstrate how the identified characteristics can be used for classification purposes.
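To make classification by characteristics concrete, here is a minimal Python sketch of how such a catalogue could be queried. It assumes each testbed stores characteristic values grouped under the sense, plan, and act layers. The class, the function, the characteristic names, and the two testbeds are all hypothetical and do not reproduce the survey's actual characteristics or its webpage data.

```python
from dataclasses import dataclass, field

# Layers of the sense-plan-act paradigm used to group characteristics.
LAYERS = ("sense", "plan", "act")


@dataclass
class Testbed:
    """Hypothetical record: characteristic values grouped by sense/plan/act layer."""
    name: str
    characteristics: dict = field(default_factory=dict)

    def get(self, layer, key, default=None):
        if layer not in LAYERS:
            raise ValueError(f"unknown layer: {layer!r}")
        return self.characteristics.get(layer, {}).get(key, default)


def select(testbeds, layer, key, value):
    """Names of testbeds whose `key` characteristic in `layer` equals `value`."""
    return [tb.name for tb in testbeds if tb.get(layer, key) == value]


# Illustrative entries only (hypothetical testbeds and characteristic names).
catalogue = [
    Testbed("testbed_a", {"sense": {"onboard_camera": True},
                          "act": {"vehicle_type": "differential drive"}}),
    Testbed("testbed_b", {"sense": {"onboard_camera": False},
                          "act": {"vehicle_type": "car-like"}}),
]

print(select(catalogue, "sense", "onboard_camera", True))  # ['testbed_a']
```

Grouping by layer mirrors the survey's sense-plan-act structure, so a query states which part of the autonomy stack a requirement concerns.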

Highlights - Survey on Testbeds for Vehicle Autonomy & Robot Swarms

Here is a visual tour of the authors’ work. For more details, check out the full paper or the corresponding up-to-date project website.

Collage of various small-scale testbeds for Connected and Automated Vehicles (CAVs), Vehicle Autonomy and Robot Swarms, highlighting different setups and testing environments. — Figure 1. Collage showcasing diverse testbeds in the realm of Connected and Automated Vehicles and Robot Swarms.

Screenshot of a webpage displaying a list of testbeds for Connected and Automated Vehicles and Robot Swarms, as investigated in a research study. — Figure 2. Screenshot of Public Webpage Listing Investigated Testbeds in Connected Vehicles & Robot Swarms Study.

Cyber-Physical Mobility Lab at RWTH Aachen University, featuring testbeds for research on Connected and Automated Vehicles and Robot Swarms. — Figure 3. Cyber-Physical Mobility Lab at RWTH Aachen University.

IDS Scaled Smart City testbed at Cornell University, designed for research in connected vehicles and smart city technologies. — Figure 4. IDS Scaled Smart City at Cornell University.

Robotarium testbed at Georgia Institute of Technology, featuring multiple small robots in a collaborative swarm setup. — Figure 5. Robotarium Testbed at Georgia Institute of Technology.

Cambridge Minicars testbed at the Prorok Lab, Cambridge University, showcasing miniature vehicles for multi-agent systems research. — Figure 6. Cambridge Minicars at the Prorok Lab at Cambridge University.

Go-CHART testbed at Arizona State University, featuring scaled autonomous vehicles for testing control and coordination strategies. — Figure 7. The Go-CHART at Arizona State University.

F1TENTH vehicle built at the University of Pennsylvania, a scaled-down autonomous race car used for research and competitions. — Figure 8. An exemplar F1TENTH vehicle built at the University of Pennsylvania.

Duckietown testbed at MIT, featuring miniature autonomous vehicles navigating a small-scale city environment with roads and traffic signs. — Figure 9. Duckietown at Massachusetts Institute of Technology.

Conclusion - Survey on Testbeds for Vehicle Autonomy & Robot Swarms

Here are the conclusions from the authors of this paper:

“This survey provides a detailed overview of small-scale CAV/RS testbeds, with the aim of helping researchers in these fields to select or build the most suitable testbed for their experiments and to identify potential research focus areas. We structured the survey according to characteristics derived from potential use cases and research topics within the sense-plan-act paradigm.

Through an extensive investigation of 17 testbeds, we have evaluated 56 characteristics and have made the results of this analysis available on our webpage. We invited the testbed creators to assist in the initial process of gathering information and updating the content of this webpage. This collaborative approach ensures that the survey maintains its relevance and remains up to date with the latest developments.

The ongoing maintenance will allow researchers to access the most recent information. In addition, this paper can serve as a guide for those interested in creating a new testbed. The characteristics and overview of the testbeds presented in this survey can help identify potential gaps and areas for improvement.

One ongoing challenge that we identified with small-scale testbeds is the enhancement of their ability to accurately map to real-world conditions, ensuring that experiments conducted are as realistic and applicable as possible.

Overall, this paper provides a resource for researchers and developers in the fields of connected and automated vehicles and robot swarms, enabling them to make informed decisions when selecting or replicating a testbed and supporting the advancement of testbed technologies by identifying research gaps.”

Project Authors

Armin Mokhtarian is currently working as a Research Associate & PhD Candidate at RWTH Aachen University, Germany.

Patrick Scheffe is a Research Associate at Lehrstuhl Informatik 11 – Embedded Software, Germany.

Maximilian Kloock is working as a Team Manager Advanced Battery Management System Technologies at FEV Europe, Germany.

Simon Schäfer is a Visiting Researcher at the Faculty of Engineering, University of Alberta, Canada.

Heeseung Bang is currently a Postdoctoral Associate at Cornell University, USA.

Viet-Anh Le is a Visiting Graduate Student at Cornell University, USA.

Sangeet Ulhas is a PhD candidate in the Ira A. Fulton Schools of Engineering at Arizona State University, USA.

Johannes Betz is an Assistant Professor at Technische Universität München, Germany.

Sean Wilson is a Senior Research Engineer at Georgia Institute of Technology, USA.

Spring Berman is an Associate Professor at Arizona State University, USA.

Liam Paull is an Associate Professor at Université de Montréal, Canada and he is also the Chief Education Officer at Duckietown, USA.

Amanda Prorok is an Associate Professor at the University of Cambridge, UK.

Bassam Alrifaee is a Professor at Bundeswehr University Munich, Germany.

Learn more

Duckietown is a platform for creating and disseminating robotics and AI learning experiences.

It is modular, customizable and state-of-the-art, and designed to teach, learn, and do research. From exploring the fundamentals of computer science and automation to pushing the boundaries of knowledge, Duckietown evolves with the skills of the user.

Sim2Real Transfer of Multi-Agent Policies for Self-Driving

Posted on August 10, 2024 | by Duckietown Admin

General Information

Transferring Multi-Agent Reinforcement Learning Policies for Autonomous Driving using Sim-to-Real
Eduardo Candela, Leandro Parada, Luis Marques, Tiberiu-Andrei Georgescu, Yiannis Demiris, Panagiotis Angeloudis
Imperial College London, United Kingdom
E. Candela, L. Parada, L. Marques, T. -A. Georgescu, Y. Demiris and P. Angeloudis, "Transferring Multi-Agent Reinforcement Learning Policies for Autonomous Driving using Sim-to-Real," 2022 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Kyoto, Japan, 2022, pp. 8814-8820, doi: 10.1109/IROS47612.2022.9981319.

Sim2Real Transfer of Multi-Agent Policies for Self-Driving

In the field of autonomous driving, transferring policies from simulation to the real world (Sim-to-real transfer, or Sim2Real) is theoretically desirable, as it is much faster and more cost-effective to train agents in simulation rather than in the real world.

Because simulations are only representations of the real world, it always remains an open question whether policies trained in them will perform well enough in reality. This challenge is known as the “Sim-to-Real gap”.

This gap is especially pronounced in Multi-Agent Reinforcement Learning (MARL), where agent collaboration and environmental synchronization significantly complicate policy transfer.

The authors of this work propose combining “Multi-Agent Proximal Policy Optimization” (MAPPO) with domain randomization techniques to create a robust pipeline for training MARL policies that is not only effective in simulation but also adaptable to real-world conditions.
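MAPPO builds on PPO, whose core is a clipped surrogate objective that limits how far each update can move the policy away from the one that collected the data. The NumPy sketch below shows only that objective. It is evaluated on samples pooled from all agents, as when the agents share one policy. In MAPPO the advantages typically come from a centralized critic, which is omitted here. The function and argument names are illustrative. The clipping value 0.2 is a common default, not a setting reported by the authors.

```python
import numpy as np


def ppo_clip_objective(new_logp, old_logp, advantages, clip_eps=0.2):
    """PPO clipped surrogate objective (to be maximized).

    new_logp / old_logp: log-probabilities of the taken actions under the
    current policy and under the policy that collected the data.
    advantages: advantage estimates for the same samples.
    """
    ratio = np.exp(new_logp - old_logp)                       # probability ratio
    unclipped = ratio * advantages
    clipped = np.clip(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages
    # Element-wise pessimistic bound, averaged over all (agent, time) samples.
    return float(np.mean(np.minimum(unclipped, clipped)))


# Samples from two agents pooled into one batch (shared-policy setting).
old_logp = np.log(np.array([0.5, 0.5, 0.2, 0.4]))
new_logp = np.log(np.array([0.6, 0.3, 0.4, 0.4]))
advantages = np.array([1.0, -0.5, 2.0, 0.0])
print(ppo_clip_objective(new_logp, old_logp, advantages))
```

Taking the minimum of the clipped and unclipped terms removes the incentive to push the probability ratio far outside [1 - ε, 1 + ε] in a single update.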

By applying varying levels of parameter randomization (such as altering lighting conditions, lane markings, and agent behaviors), the authors improve the robustness of the trained policies and their ability to generalize to real-world conditions.
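Training with different levels of domain randomization can be sketched as sampling a fresh set of simulator parameters at the start of every episode. The sampling range widens as the level increases. In this Python sketch, the parameter names, their normalized nominal values, and the mapping from level to range are hypothetical placeholders, not the values used in the paper.

```python
import random

# Hypothetical parameters in normalized units (1.0 = nominal simulator value).
NOMINAL = {"light_intensity": 1.0, "lane_marking_width": 1.0, "agent_max_speed": 1.0}

# Hypothetical relative spread for each randomization level.
LEVELS = {"none": 0.0, "low": 0.05, "medium": 0.15, "high": 0.30}


def sample_episode_params(level, rng):
    """Draw one parameter set for an episode: each value is scaled by a
    factor drawn uniformly from [1 - spread, 1 + spread]."""
    spread = LEVELS[level]
    return {name: value * rng.uniform(1.0 - spread, 1.0 + spread)
            for name, value in NOMINAL.items()}


rng = random.Random(0)
for episode in range(3):
    params = sample_episode_params("medium", rng)
    # A real training loop would reset the simulator with `params` here
    # and then collect MAPPO rollouts for this episode.
    print(episode, {k: round(v, 3) for k, v in params.items()})
```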

Learn about training, sim2real, navigation, and other robot autonomy topics with Duckietown starting from the link below!

Abstract

Autonomous Driving requires high levels of coordination and collaboration between agents. Achieving effective coordination in multi-agent systems is a difficult task that remains largely unresolved. Multi-Agent Reinforcement Learning has arisen as a powerful method to accomplish this task because it considers the interaction between agents and also allows for decentralized training—which makes it highly scalable.

However, transferring policies from simulation to the real world is a big challenge, even for single-agent applications. Multi-agent systems add additional complexities to the Sim-to-Real gap due to agent collaboration and environment synchronization.

In this paper, we propose a method to transfer multi-agent autonomous driving policies to the real world. For this, we create a multi-agent environment that imitates the dynamics of the Duckietown multi-robot testbed, and train multi-agent policies using the MAPPO algorithm with different levels of domain randomization. We then transfer the trained policies to the Duckietown testbed and show that when using our method, domain randomization can reduce the reality gap by 90%.

Moreover, we show that different levels of parameter randomization have a substantial impact on the Sim-to-Real gap. Finally, our approach achieves significantly better results than a rule-based benchmark.

Highlights - Sim2Real Transfer of Multi-Agent Policies for Self-Driving

Here is a visual tour of the work of the authors. For more details, check out the full paper.

Image of a test environment where autonomous Duckiebot robots navigate a track while avoiding parked obstacles. The setup includes a motion capture camera system tracking the real-time pose of the vehicles. — Figure 1. Autonomous Vehicle Training on a Real-World Track Using Multi-Agent Deep Reinforcement Learning with Sim2Real Transfer.

Diagram of a test track used in both simulation and real-world settings, showing waypoints, robot positions, goal points, and the steering angle required for path following in the Duckie-MAAD environment. — Figure 2. Test Track for Duckie-MAAD Gym Environment with Waypoints and Path Following Function.

Flowchart of the Duckie-MAAD architecture showing the step update loop for multi-agent autonomous driving, including action selection, path following, inverse kinematics, domain randomization, and policy updates with MAPPO. — Figure 3. Duckie-MAAD Architecture: Step Update Loop for Sim2Real Transfer in Multi-Agent Autonomous Driving.

Graph showing the reward convergence over time during the training of a Medium Domain Randomization (D.R.) policy in a multi-agent reinforcement learning setup. — Figure 4. Reward Convergence During Training of Medium Domain Randomization (D.R.) Policy.

Bar chart comparing average rewards for multi-agent reinforcement learning (MARL) policies in simulation and real life, showing the effectiveness of medium domain randomization in bridging the Sim2Real gap. — Figure 5. Comparison of Average Rewards for MARL Policies in Simulation and Real Life.

Box plots comparing the distribution of speed and track exit metrics across different policies, including rule-based and MARL policies with varying levels of domain randomization (D.R.), in both simulation and real-world scenarios. — Figure 6. Performance Metrics Distribution for Various Policies: Speed and Track Exits.

Figure 7. Performance Metrics Distribution: Collisions and Lane Changes Across Different Policies.
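Figure 3 lists path following and inverse kinematics as steps of the Duckie-MAAD update loop. For a differential-drive robot such as the Duckiebot, inverse kinematics converts a desired forward speed v and turning rate ω into left and right wheel speeds. The Python sketch below pairs that standard conversion with a simple proportional heading controller that steers toward the next waypoint. The controller, its gain, and the wheel radius and baseline values are assumptions for illustration. They are not the path-following function used by the authors.

```python
import math

# Assumed, Duckiebot-like geometry (illustrative values, not from the paper).
WHEEL_RADIUS_M = 0.0318
BASELINE_M = 0.1  # distance between the two wheels


def heading_error(pose, waypoint):
    """Angle from the robot heading to the waypoint direction, wrapped to [-pi, pi)."""
    x, y, theta = pose
    desired = math.atan2(waypoint[1] - y, waypoint[0] - x)
    return (desired - theta + math.pi) % (2.0 * math.pi) - math.pi


def inverse_kinematics(v, omega, r=WHEEL_RADIUS_M, b=BASELINE_M):
    """Differential-drive inverse kinematics: body velocities -> wheel angular speeds (rad/s)."""
    w_left = (v - omega * b / 2.0) / r
    w_right = (v + omega * b / 2.0) / r
    return w_left, w_right


def follow_waypoint(pose, waypoint, v=0.2, k_heading=2.0):
    """One path-following step: proportional steering toward the waypoint."""
    omega = k_heading * heading_error(pose, waypoint)
    return inverse_kinematics(v, omega)


# Waypoint ahead and to the left: the right wheel should spin faster.
left, right = follow_waypoint(pose=(0.0, 0.0, 0.0), waypoint=(1.0, 1.0))
print(round(left, 2), round(right, 2))
```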

Conclusion - Sim2Real Transfer of Multi-Agent Policies for Self-Driving

Here are the conclusions from the authors of this paper:

“AVs will lead to enormous safety and efficiency benefits across multiple fields, once the complex problem of multiagent coordination and collaboration is solved. MARL can help towards this, as it enables agents to learn to collaborate by sharing observations and rewards.

However, the successful application of MARL is heavily dependent on the fidelity of the simulation environment they were trained in. We present a method to train policies using MARL and to reduce the reality gap when transferring them to the real world via adding domain randomization during training, which we show has a significant and positive impact in real performance compared to rule-based methods or policies trained without different levels of domain randomization.

It is important to mention that despite the performance improvements observed when using domain randomization, its use presents diminishing returns as seen with the overly conservative policy, for it cannot completely close the reality gap without increasing the fidelity of the simulator. Additionally, the amount of domain randomization to be used is case-specific and a theory for the selection of domain randomization remains an open question. The quantification and description of reality gaps presents another opportunity for future research.”

Project Authors

Eduardo Candela is currently working as the Co-Founder & CTO of MAIHEM (YC W24), California.

Leandro Parada is a Research Associate at Imperial College London, United Kingdom.

Luís Marques is a Doctoral Researcher in the Department of Robotics at the University of Michigan, USA.

Tiberiu Andrei Georgescu is a Doctoral Researcher at Imperial College London, United Kingdom.

Yiannis Demiris is a Professor of Human-Centred Robotics and Royal Academy of Engineering Chair in Emerging Technologies at Imperial College London, United Kingdom.

Panagiotis Angeloudis is a Reader in Transport Systems and Logistics at Imperial College London, United Kingdom.

Learn more

Duckietown is a platform for creating and disseminating robotics and AI learning experiences.

Enhancing Visual Domain Randomization with Real Images for Sim-to-Real Transfer

Enhancing Visual Domain Randomization for Sim2Real Transfer

Posted on July 27, 2024 | by Duckietown Admin

General Information

Enhancing Visual Domain Randomization with Real Images for Sim-to-Real Transfer
András Béres and Bálint Gyires-Tóth
Budapest University of Technology and Economics, Hungary
Béres, András and Gyires-Tóth, Bálint (2023) Enhancing Visual Domain Randomization with Real Images for Sim-to-Real Transfer. INFOCOMMUNICATIONS JOURNAL, 15 (1). pp. 15-25. ISSN 2061-2079

Enhancing Visual Domain Randomization with Real Images for Sim-to-Real Transfer

One of the classic objections to machine learning approaches for embedded autonomy (i.e., creating agents that are deployed on real, physical robots) is that training requires data, data requires experiments, and experiments are “expensive” (in time, money, etc.).

The natural counterargument is to use simulation to create the training data, because simulations are much less expensive than real-world experiments: they can run continuously and faster than real time, they don’t require supervision, nobody gets tired, etc.

But, as the experienced roboticist knows, “simulations are doomed to succeed”. This phrase captures the notion that simulations do not contain the same wealth of information as the real world. They are programmed to be useful for whatever the programmer intends, and they do not capture the full complexity of the real world. Eventually things will “work” in simulation, but does that mean they will “work” in the real world, too?

As Carl Sagan once said: “If you wish to make an apple pie from scratch, you must first invent the universe”.

Domain randomization is an approach to mitigating the limitations of simulations. Instead of training an agent on a single set of parameters defining the simulation, many simulations are run, each with different values of these parameters. For example, in a driving simulator like Duckietown, one set of parameters could make the sky purple instead of blue, or give the lane markings slightly different geometric properties. The idea behind this approach is that the agent is trained on a distribution of slightly different datasets. This hopefully makes the agent more robust to real-world nuisances once it is deployed in a physical body.

In this paper, the authors investigate specifically visual domain randomization.
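In a simulator such as Duckietown Gym, visual domain randomization changes rendering properties such as colors, textures, and lighting between episodes. The NumPy sketch below is a simplified stand-in that works on already-rendered frames. It draws one random per-channel color gain and one brightness shift per episode, then applies them to every frame of that episode. The function names and ranges are hypothetical. This image-space perturbation is an assumption, not the randomization implemented in the paper.

```python
import numpy as np


def sample_visual_params(rng, max_gain=0.3, max_shift=30.0):
    """Draw one visual perturbation for an episode: a gain per RGB channel and a brightness shift."""
    return {
        "gains": 1.0 + rng.uniform(-max_gain, max_gain, size=3),
        "shift": float(rng.uniform(-max_shift, max_shift)),
    }


def apply_visual_params(frame, params):
    """Apply the episode's perturbation to an (H, W, 3) uint8 RGB frame."""
    out = frame.astype(np.float32) * params["gains"] + params["shift"]
    return np.clip(out, 0, 255).astype(np.uint8)


rng = np.random.default_rng(0)
episode_params = sample_visual_params(rng)           # fixed for the whole episode
frame = np.full((120, 160, 3), 128, dtype=np.uint8)  # placeholder camera frame
randomized = apply_visual_params(frame, episode_params)
print(randomized.shape, randomized.dtype)
```

Sampling once per episode, rather than once per frame, keeps the scene visually consistent within an episode. The variation between episodes is what the agent must learn to ignore.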

Learn about RL, navigation, and other robot autonomy topics at the link below!

Abstract

In order to train reinforcement learning algorithms, a significant amount of experience is required, so it is common practice to train them in simulation, even when they are intended to be applied in the real world. To improve robustness, camera-based agents can be trained using visual domain randomization, which involves changing the visual characteristics of the simulator between training episodes in order to improve their resilience to visual changes in their environment.

In this work, we propose a method, which includes real-world images alongside visual domain randomization in the reinforcement learning training procedure to further enhance the performance after sim-to-real transfer. We train variational autoencoders using both real and simulated frames, and the representations produced by the encoders are then used to train reinforcement learning agents.

The proposed method is evaluated against a variety of baselines, including direct and indirect visual domain randomization, end-to-end reinforcement learning, and supervised and unsupervised state representation learning.

The method is tested in the Duckietown self-driving car environment by controlling a differential drive vehicle using only camera images. Our experimental results demonstrate that our method improves the effectiveness and robustness of the learnt representations, as it achieves the best performance of all tested methods.
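In the first stage of the method, a variational autoencoder is trained. Its encoder later supplies the state representation for the RL agent. The NumPy sketch below shows two standard pieces of a VAE objective: the reparameterization trick and the closed-form KL divergence between a diagonal Gaussian and a standard normal. The squared-error reconstruction term and the weight `beta` are assumptions. A real implementation would use an autodiff framework so that gradients flow through the sampling step.

```python
import numpy as np


def reparameterize(mu, log_var, rng):
    """Sample z = mu + sigma * eps with eps ~ N(0, I), so z is a deterministic function of (mu, log_var) given eps."""
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * log_var) * eps


def kl_to_standard_normal(mu, log_var):
    """KL(N(mu, diag(exp(log_var))) || N(0, I)), summed over latent dimensions, one value per sample."""
    return -0.5 * np.sum(1.0 + log_var - mu ** 2 - np.exp(log_var), axis=-1)


def vae_loss(x, x_recon, mu, log_var, beta=1.0):
    """Negative ELBO with an assumed squared-error reconstruction term, averaged over the batch."""
    recon = np.sum((x - x_recon) ** 2, axis=tuple(range(1, x.ndim)))
    return float(np.mean(recon + beta * kl_to_standard_normal(mu, log_var)))


# Batch of 4 flattened frames with 8-dimensional latent codes (placeholder values).
rng = np.random.default_rng(0)
x = rng.random((4, 64))
mu, log_var = rng.standard_normal((4, 8)), np.zeros((4, 8))
z = reparameterize(mu, log_var, rng)   # would be fed to the decoder
print(z.shape, vae_loss(x, x, mu, log_var))
```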

Highlights - Enhancing Visual Domain Randomization with Real Images for Sim-to-Real Transfer

Here is a visual tour of the work of the authors. For more details, check out the full paper.

Example frames demonstrating visual domain randomization in the Duckietown Gym self-driving simulation environment, featuring varied visual characteristics to enhance training robustness. — Figure 1. Example frames with visual domain randomization from the Duckietown Gym self-driving simulated environment.

Diagram illustrating the proposed method that combines real images with visually randomized simulated images in unsupervised state representation learning, followed by training a control agent with reinforcement learning using a pretrained encoder network. — Figure 2. High level overview of the proposed method.

A block diagram of the benchmarked baselines: direct versus invariance-regularized domain randomization, and supervised versus self-supervised state representation learning methods. — Figure 3. Benchmarked Baseline Methods for Domain Randomization and State Representation Learning.

Diagram showing the preprocessing pipeline. — Figure 4. An illustration of the preprocessing pipeline.

Images from the offline dataset displaying three distinct renderings of the same scene, showcasing varied visual perspectives. — Figure 5. Samples from the Offline Dataset with Three Different Renderings for Each Scene.

Table of evaluation results in the simulator without visual domain randomization, highlighting the proposed method in bold at the bottom row and underlining completion ratios above 70%. — Table 1. Evaluation Results in the Simulator Without Visual Domain Randomization.

Table showing evaluation results in the simulator with visual domain randomization, underlining completion ratios above 70% and highlighting the proposed method in bold at the bottom row. — Table 2. Evaluation Results in the Simulator with Visual Domain Randomization.

Table displaying evaluation results from real-world testing, underlining survival times of 20 or more and featuring the proposed method in bold at the bottom row. — Table 3. Evaluation Results in Real-World Testing.

Conclusion - Enhancing Visual Domain Randomization with Real Images for Sim-to-Real Transfer

Here are the conclusions from the authors of this paper:

“In this work we proposed a novel method for learning effective image representations for reinforcement learning, whose core idea is to train a variational autoencoder using visually randomized images from the simulator, but include images from the real world as well, as if it was just another visually different version of the simulator.

We evaluated the method in the Duckietown self-driving environment on the lane-following task, and our experimental results showed that the image representations of our proposed method improved the performance of the tested reinforcement learning agents both in simulation and reality. This demonstrates the effectiveness and robustness of the representations learned by the proposed method. We benchmarked our method against a wide range of baselines, and the proposed method performed among the best in all cases.

Our experiments showed that using some type of visual domain randomization is necessary for a successful sim-to-real transfer. Variational autoencoder-based representations tended to outperform supervised representations, and both outperformed representations learned during end-to-end reinforcement learning. Also, for visual domain randomization, when using no real images, invariance regularization-based methods seemed to outperform direct methods. Based on our results, we conclude that including real images in simulation-based reinforcement learning trainings is able to enhance the real world performance of the agent – when using the two-stage approach, proposed in this paper.”
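The core idea quoted above is to treat real-world images “as if it was just another visually different version of the simulator”. It can be sketched as a batch sampler in which the pool of real frames is simply one more domain alongside the visually randomized simulator renderings. In this Python sketch, the function name and the equal weighting of all domains are assumptions. The paper's actual sampling ratio between real and simulated frames may differ.

```python
import random


def make_mixed_batch(sim_domains, real_frames, batch_size, rng):
    """Sample a VAE training batch in which real frames form one more 'domain'.

    sim_domains: list of frame lists, one per visual randomization setting.
    real_frames: list of frames recorded on the real robot.
    """
    domains = list(sim_domains) + [real_frames]
    batch = []
    for _ in range(batch_size):
        domain = rng.choice(domains)       # each domain, including the real one, equally likely
        batch.append(rng.choice(domain))   # then a uniform frame from that domain
    return batch


# Placeholder "frames" (strings stand in for image arrays).
sim_domains = [["sim_blue_sky_0", "sim_blue_sky_1"], ["sim_purple_sky_0"]]
real_frames = ["real_0", "real_1"]
batch = make_mixed_batch(sim_domains, real_frames, batch_size=8, rng=random.Random(0))
print(batch)
```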

Project Authors

András Béres is currently working as a Junior Deep Learning Engineer at Continental, Hungary.

Bálint Gyires-Tóth is an Associate Professor at the Budapest University of Technology and Economics, Hungary.

Learn more

Duckietown is a platform for creating and disseminating robotics and AI learning experiences.

Leveraging Reward Consistency for Interpretable Feature Discovery in Reinforcement Learning

Reward Consistency for Interpretable Feature Discovery in RL

Posted on July 13, 2024 | by Duckietown Admin

General Information

Leveraging Reward Consistency for Interpretable Feature Discovery in Reinforcement Learning
Qisen Yang, Huanqian Wang, Mukun Tong, Wenjie Shi, Gao Huang, and Shiji Song
Tsinghua University, China
Q. Yang, H. Wang, M. Tong, W. Shi, G. Huang and S. Song, "Leveraging Reward Consistency for Interpretable Feature Discovery in Reinforcement Learning," in IEEE Transactions on Systems, Man, and Cybernetics: Systems, vol. 54, no. 2, pp. 1014-1025, Feb. 2024, doi: 10.1109/TSMC.2023.3312411.

Leveraging Reward Consistency for Interpretable Feature Discovery in Reinforcement Learning

What is interpretable feature discovery in reinforcement learning?

To understand this, let’s introduce a few important topics:

Reinforcement Learning (RL): A machine learning approach in which an agent learns to make decisions by interacting with an environment to accomplish a specific objective.

Interpretable Feature Discovery in RL: An approach that aims to make the decision-making process of RL agents more understandable to humans.

The need for interpretability: In real-world applications, especially in safety-critical domains like self-driving cars, it is crucial to understand why an RL agent makes a certain decision. Interpretability helps:

Build trust in the system
Debug and improve the model
Ensure compliance with regulations and ethical standards
Determine fault if accidents occur

Feature discovery: In this context, feature discovery refers to identifying the key elements (features) of the environment that the RL agent focuses on when making decisions. For example, in a self-driving car simulation, relevant features might include the position of other cars, road signs, or lane markings.
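A simple, generic way to discover which parts of an observation an agent relies on is perturbation-based attribution: occlude a region and measure how much the policy's output changes. The NumPy sketch below implements this occlusion approach for a 2-D observation. Here `policy` is any hypothetical function that returns an action vector. Note that this sketch scores regions by the change in action. That is exactly the action matching principle this paper questions as a basis for interpreting RL agents.

```python
import numpy as np


def occlusion_importance(policy, state, patch=8, fill=0.0):
    """Perturbation-based attribution for a 2-D observation.

    Each patch x patch block is replaced by `fill`; its importance is the
    L2 change in the policy's action. Scores are normalized to [0, 1].
    """
    h, w = state.shape
    base = np.asarray(policy(state), dtype=float)
    importance = np.zeros((h, w))
    for i in range(0, h, patch):
        for j in range(0, w, patch):
            occluded = state.copy()
            occluded[i:i + patch, j:j + patch] = fill
            delta = np.linalg.norm(np.asarray(policy(occluded), dtype=float) - base)
            importance[i:i + patch, j:j + patch] = delta
    peak = importance.max()
    return importance / peak if peak > 0 else importance


# Hypothetical policy that only "looks at" the top-left corner of the image.
def corner_policy(s):
    return np.array([s[:8, :8].sum()])


heatmap = occlusion_importance(corner_policy, np.ones((16, 16)))
print(heatmap[0, 0], heatmap[8, 8])   # 1.0 0.0
```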

Learn about RL, navigation, and other robot autonomy topics at the link below!

Abstract

The black-box nature of deep reinforcement learning (RL) hinders them from real-world applications. Therefore, interpreting and explaining RL agents have been active research topics in recent years. Existing methods for post-hoc explanations usually adopt the action matching principle to enable an easy understanding of vision-based RL agents. In this article, it is argued that the commonly used action matching principle is more like an explanation of deep neural networks (DNNs) than the interpretation of RL agents.

It may lead to irrelevant or misplaced feature attribution when different DNNs’ outputs lead to the same rewards or different rewards result from the same outputs. Therefore, we propose to consider rewards, the essential objective of RL agents, as the essential objective of interpreting RL agents as well. To ensure reward consistency during interpretable feature discovery, a novel framework (RL interpreting RL, denoted as RL-in-RL) is proposed to solve the gradient disconnection from actions to rewards.

We verify and evaluate our method on the Atari 2600 games as well as Duckietown, a challenging self-driving car simulator environment. The results show that our method manages to keep reward (or return) consistency and achieves high-quality feature attribution. Further, a series of analytical experiments validate our assumption of the action matching principle’s limitations.
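Figure 1 below illustrates the argument with a driving example: turning left by 0.12 or by 0.15 both avoid a collision and earn the same reward. The Python sketch below reproduces that situation with a hypothetical reward function. It compares two losses an explanation model could be trained with: an action matching loss and a reward consistency loss. The reward thresholds and both loss forms are illustrative. In a real environment, the simulator produces the reward, and it cannot be differentiated with respect to the network's outputs. This is the gradient disconnection that RL-in-RL addresses by treating interpretation as an RL problem.

```python
def toy_reward(steer_left):
    """Hypothetical reward: any left turn in [0.10, 0.20] avoids the obstacle."""
    return 1.0 if 0.10 <= steer_left <= 0.20 else 0.0


def action_matching_loss(a_ref, a_expl):
    """Penalizes any action difference, even when the outcome is identical."""
    return (a_ref - a_expl) ** 2


def reward_consistency_loss(a_ref, a_expl, reward_fn=toy_reward):
    """Penalizes only differences in the reward the two actions earn."""
    return (reward_fn(a_ref) - reward_fn(a_expl)) ** 2


# Action on the full state vs. action on the attentive (masked) state.
a_ref, a_expl = 0.15, 0.12
print(action_matching_loss(a_ref, a_expl))     # about 0.0009
print(reward_consistency_loss(a_ref, a_expl))  # 0.0
```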

Highlights - Leveraging Reward Consistency for Interpretable Feature Discovery in Reinforcement Learning

Here is a visual tour of the work of the authors. For more details, check out the full paper.

Figure showing the limitations of action matching: In the driving task illustrated in (a), two actions—turning left by 0.12 as shown in (a.1) and turning left by 0.15 as shown in (a.2)—represent the same behavior of "avoiding collision" and receive the same reward. In the scenario depicted in (b), the same actions—shown in (b.1) and (b.2)—result in different rewards depending on the task, but action matching methods would provide the same explanation for both. — Figure 1. Examples of action matching’s limitations.

Diagram of the reward-oriented interpretation method: a DNN with an encoder and decoder learns the mask and attentive state. The pretrained policy takes actions from both primitive and attentive states, with the environment providing corresponding rewards. Reward matching is hindered by disconnected gradient backward in supervised learning. — Figure 2. Architecture of Reward-Oriented Interpretation Method for Reinforcement Learning.

Diagram of interaction process: (a) shows how the pretrained policy (πpre) interacts with the environment during pretraining. (b) depicts the interpretation task where reward matching is treated as an RL problem, with the RL-in-RL policy (π̃) corresponding to the gray box in Fig. 2. The reward (r̃) is provided by both the environment (E) and the pretrained policy (πpre). — Figure 3. Interaction Process in Pretraining and Interpretation Tasks.

Algorithm 1 outlines the RL-in-RL model, in which the pretrained policy to be interpreted, π_pre, interacts with the environment. The reward function R(s_t, a_t) is provided by the environment E. The algorithm involves initializing and updating specific functions and parameters, iterating over epochs until convergence. During each epoch, the state is initialized, actions and rewards are computed, and trajectories are saved. The update step employs the Proximal Policy Optimization (PPO) algorithm to maximize a specified objective. — Algorithm 1. RL-in-RL Model.

The attentive state is an overlaid combination of the state and the attentive heatmap of feature attribution. The feature importance ranges from 0 to 1 as the heatmap color changes from blue to red. — Figure 4. Performance of the proposed RL-in-RL model in the Duckietown environment.

Figure 5. Attentive features of RL-in-RL^K with observation length K = 10.

Comparison of interpretation methods on Atari 2600 games. The RL-in-RL model and the supervision-based method are shown with overlaid heatmaps. Perturbation-based and gradient-based methods highlight attention areas on the saliency-overlaid state. Subfigures include (a) Pong, (b) Ms. Pac-Man, and (c) Space Invaders. — Figure 6. Comparisons among different interpretation methods on Atari 2600.

Figure 7. Comparisons between the best-performing action matching method and our reward-oriented RL-in-RL on the Duckietown environment.

(a) shows the percent episode return of the pretrained policy under various lane patterns, illustrating the impact of different lines (Fleft_w, Fy, Fright_w) on policy performance. (b) depicts the percent action divergence of the interpretation model under these lane patterns, comparing action consistency with the pretrained policy's actions and showing the effect of the lines on action consistency. — Figure 8. Quantitative analyses averaged across 100 random seeds.

Figure 9. Interpretation results from the action matching method, when the policy is pretrained without the middle yellow line.

Figure 10. Attention pattern of RL-in-RL^a.

Figure 11. Architecture of the encoder–decoder network in RL-in-RL.

Additional results of the RL-in-RL model on Atari 2600 games. Attentive states are visualized as in Figure 4. The games displayed are: (a) Enduro, (b) Assault, and (c) Breakout. — Figure 12. More results of the RL-in-RL model on Atari2600.

Two ablation studies on the hyperparameters of the RL-in-RL model. (a) Shows the effect of setting α = 0.1, where α is the weight of the auxiliary task loss as defined in Equation (8). (b) Shows the effect of setting β = 0.1, where β is used to speed up the training process as described in Appendix A. — Figure 13. Two ablation studies on the hyperparameters.

Figure 14. Comparative results of the RL-in-RL method and the action matching method’s ablation studies on the hyperparameter α.

Visualization of attention shift in the zigzag turn. The three rows correspond to original states, attentive states, and masked states, respectively. — Figure 15. Visualization of attention shift in the zigzag turn.

Conclusion

Here are the conclusions from the authors of this paper:

“In this article, we discussed the limitations of the commonly used assumption, the action matching principle, in RL interpretation methods. It is suggested that action matching cannot truly interpret the agent since it differs from the reward-oriented goal of RL. Hence, the proposed method first leverages reward consistency during feature attribution and models the interpretation problem as a new RL problem, denoted as RL-in-RL.

Moreover, it provides an adjustable observation length for one-step reward or multistep reward (or return) consistency, depending on the requirements of behavior analyses. Extensive experiments validate the proposed model and support our concerns that action matching would lead to redundant and noncausal attention during interpretation since it is dedicated to exactly identical actions and thus results in a sort of “overfitting.”

Nevertheless, although RL-in-RL shows superior interpretability and dispenses with redundant attention, further exploration of interpreting RL tasks with explicit causality is left for future work.”
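The difference between one-step reward consistency and multistep return consistency comes down to how many future rewards are summed over the observation window. A minimal sketch, assuming the standard discounted K-step return with a discount factor gamma (an assumption; the excerpt above does not state how the authors weight future rewards). This is a generic definition, not the authors' code.

```python
def k_step_return(rewards, gamma, k):
    """Discounted sum of the first k rewards: sum over i < k of gamma**i * r_i."""
    if k < 1:
        raise ValueError("k must be at least 1")
    return sum((gamma ** i) * r for i, r in enumerate(rewards[:k]))


if __name__ == "__main__":
    rewards = [1.0, 1.0, 1.0]
    print(k_step_return(rewards, gamma=0.5, k=1))  # one-step reward: 1.0
    print(k_step_return(rewards, gamma=0.5, k=3))  # 3-step return: 1.75
```

With K = 1 only the immediate reward has to match; larger K asks the interpretation to preserve the return over a longer horizon, which is the adjustable observation length the conclusion refers to.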

Project Authors

Qisen Yang is an Artificial Intelligence PhD Student at Tsinghua University, China.

Huanqian Wang is currently pursuing the B.E. degree in control science and engineering with the Department of Automation, Tsinghua University, Beijing, China.

Mukun Tong is currently pursuing the B.E. degree in control science and engineering with the Department of Automation, Tsinghua University, Beijing, China.

Wenjie Shi received his Ph.D. degree in control science and engineering from the Department of Automation, Institute of Industrial Intelligence and System, Tsinghua University, Beijing, China, in 2022.

Guang-Bin Huang is in the School of Electrical and Electronic Engineering, Nanyang Technological University, Singapore.

Shiji Song is currently a Professor with the Department of Automation, Tsinghua University, Beijing, China.

Learn more

Duckietown is a platform for creating and disseminating robotics and AI learning experiences.

Vision-based Reinforcement Learning for Lane-Tracking Control

Posted on June 8, 2024 | by Duckietown Admin

General Information

Vision-based Reinforcement Learning for Lane-Tracking Control
András Kalapos, Csaba Gór, Róbert Moni, István Harmati
Budapest University of Technology and Economics
Kalapos, A., Gór, C., Moni, R. and Harmati, I., 2021. Vision-based reinforcement learning for lane-tracking control. Acta IMEKO, 10(3), pp.7-14.


What is Vision-based Reinforcement Learning? A few important topics:

Reinforcement Learning: a machine learning paradigm where an agent learns to make decisions by interacting with an environment to achieve a goal. In this context, reinforcement learning is used to teach a vehicle how to drive within Duckietown lanes by providing rewards or penalties based on its actions.

Vision-based Control: The control of the vehicle is based on visual inputs, specifically images captured by a forward-facing camera. These images are processed by a neural network to determine appropriate steering actions, allowing the vehicle to track lanes and avoid collisions.

Simulation-to-Reality (sim2real) Transfer Learning: The trained policy, which learns to control the vehicle in a simulated environment, is transferred to real-world scenarios. The effectiveness of the trained model in real-world driving situations is evaluated, demonstrating the ability to generalize learning from simulation to reality.

Domain Randomization: This technique involves introducing variations or randomizations into the simulation environment during training. By exposing the agent to a wide range of simulated scenarios with different lighting conditions, road surfaces, and other environmental factors, domain randomization helps improve the model’s ability to generalize to unseen real-world conditions.
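Domain randomization, as described above, usually means drawing a fresh set of simulator parameters at the start of every training episode. A minimal sketch, assuming four hypothetical parameters with illustrative ranges; the actual parameters and ranges used in Duckietown training are not given here.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class SimParams:
    light_intensity: float   # relative brightness multiplier
    road_hue_shift: float    # degrees added to the road texture hue
    camera_height_m: float   # camera height above the ground
    camera_pitch_deg: float  # downward tilt of the camera


# Hypothetical ranges, chosen only for illustration.
RANGES = {
    "light_intensity": (0.6, 1.4),
    "road_hue_shift": (-15.0, 15.0),
    "camera_height_m": (0.09, 0.11),
    "camera_pitch_deg": (15.0, 25.0),
}


def sample_params(rng: random.Random) -> SimParams:
    """Draw one independent, uniformly random configuration per episode."""
    return SimParams(**{name: rng.uniform(lo, hi) for name, (lo, hi) in RANGES.items()})


if __name__ == "__main__":
    rng = random.Random(0)
    for episode in range(3):
        print(episode, sample_params(rng))
```

Seeding the generator keeps training runs reproducible while still exposing the agent to a different environment in every episode.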

Learn about RL, navigation and other robot autonomy topics at the link below!

Abstract

The present study focused on vision-based end-to-end reinforcement learning in relation to vehicle control problems such as lane following and collision avoidance. The controller policy presented in this paper is able to control a small-scale robot to follow the right-hand lane of a real two-lane road, although its training has only been carried out in a simulation.

This model, realised by a simple convolutional network, relies on images from a forward-facing monocular camera and generates continuous actions that directly control the vehicle. The policy was trained with proximal policy optimization, and domain randomisation was used to achieve the generalisation capability required for real-world performance. A thorough analysis of the trained policy was conducted by measuring multiple performance metrics and comparing these to baselines that rely on other methods.
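At the core of proximal policy optimization is the clipped surrogate objective, which limits how far a single update can move the policy away from the one that collected the data. A minimal sketch of the textbook formula, assuming per-sample log-probabilities and advantage estimates are already available; eps = 0.2 is a common default, not a value taken from the paper.

```python
import math


def ppo_clip_objective(new_logp, old_logp, advantages, eps=0.2):
    """Mean PPO clipped surrogate objective, to be maximised."""
    if not (len(new_logp) == len(old_logp) == len(advantages)) or not advantages:
        raise ValueError("inputs must be non-empty and of equal length")
    total = 0.0
    for lp_new, lp_old, adv in zip(new_logp, old_logp, advantages):
        ratio = math.exp(lp_new - lp_old)                  # pi_new(a|s) / pi_old(a|s)
        clipped = min(max(ratio, 1.0 - eps), 1.0 + eps)    # clip(ratio, 1-eps, 1+eps)
        total += min(ratio * adv, clipped * adv)           # pessimistic of the two
    return total / len(advantages)


if __name__ == "__main__":
    # A ratio of 1.5 with positive advantage is capped at 1 + eps = 1.2 ...
    print(ppo_clip_objective([math.log(1.5)], [0.0], [1.0]))
    # ... while with negative advantage the unclipped, more pessimistic term is kept.
    print(ppo_clip_objective([math.log(1.5)], [0.0], [-1.0]))
```

Taking the minimum of the clipped and unclipped terms means the objective never rewards pushing the probability ratio beyond the trust region.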

To assess the quality of the simulation-to-reality transfer learning process and the performance of the controller in the real world, simple metrics were measured on a real track and compared with results from a matching simulation. Further analysis was carried out by visualising salient object maps.

Highlights - Vision-based reinforcement learning for lane-tracking control

Here is a visual tour of the work of the authors. For more details, check out the full paper.

Fig 1. Illustration of the policy architecture with the notations used.

Fig 2. Explanation of the proposed orientation reward

Fig 3. Examples of domain randomised observations

Fig 4. a) Test track used for simulated reinforcement learning and baseline evaluations; b) and c) real and simulated test tracks used for the evaluation of the simulation-to-reality transfer.

Fig 5. Learning curves for the reinforcement learning agent with different action representations and reward functions.

Sequence of robot positions in a collision avoidance experiment with a policy trained using the modified orientation reward. After t = 6 s, the controlled robot follows the vehicle in front at a short but safe distance until the end of the episode (approximate distance is calculated as the distance between the centre points of the robots minus the length of a robot). — Fig 6. Sequence of Robot Positions in Collision Avoidance Experiment with Modified Orientation Reward Trained Policy
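The following-distance approximation in the Fig 6 caption is simple arithmetic on the two robots' centre points. A minimal sketch, assuming planar (x, y) centre coordinates in metres; the robot length used below is an illustrative value, not a measured Duckiebot dimension.

```python
import math


def approx_following_gap(leader_xy, follower_xy, robot_length_m):
    """Centre-to-centre distance minus one robot length (metres)."""
    return math.dist(leader_xy, follower_xy) - robot_length_m


if __name__ == "__main__":
    # Illustrative values only: centres 0.30 m apart, hypothetical 0.20 m robot length.
    print(approx_following_gap((0.50, 0.0), (0.20, 0.0), robot_length_m=0.20))
```

Subtracting one full length (half a robot in front of the follower's centre, half behind the leader's) assumes identical robots driving one behind the other in the same lane.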

Fig 7. Salient objects highlighted on observations in different domains and tasks. Blue regions represent high activations throughout the network.

Conclusion

Here are the conclusions from the authors of this paper:

“This work presented a solution to the problem of complex, vision-based lane following in the Duckietown environment using reinforcement learning to train an end-to-end steering policy capable of simulation-to-real transfer learning. It was found that the training is sensitive to problem formulation, such as the representation of actions.

This study has demonstrated that by using domain randomisation, a moderately detailed and accurate simulation is sufficient for training end-to-end lane-following agents that operate in a real environment. The performance of these agents was evaluated by comparing some basic metrics to match real and simulated scenarios.

Agents were also successfully trained to perform collision avoidance in addition to lane following. Finally, salient object visualisation was used to give an illustrative explanation of the inner workings of the policies in both the real and simulated domains.”

Project Authors

András Kalapos is a Machine Learning PhD Student at Budapest University of Technology and Economics, Hungary.

Csaba Gór is a Machine Learning Engineer at Turbine, in Hungary.

Róbert Moni is a Senior Machine Learning Engineer at Continental.

István Harmati is an Associate Professor at Budapest University of Technology and Economics.

Learn more

Duckietown is a platform for creating and disseminating robotics and AI learning experiences.

End-to-end Deep RL (DRL) systems in autonomous driving environments that rely on visual input for vehicle control face potential security risks, including:

State Adversarial Perturbations: Subtle alterations to visual input that mislead the DRL agent, causing incorrect decision-making.
Reward Tampering: Manipulation of the reward signal to misguide the learning process, leading the agent to adopt unsafe or inefficient policies.

These vulnerabilities can compromise the safety and reliability of self-driving vehicles.
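Reward tampering, as listed above, does not touch what the agent sees; it corrupts the training signal between the environment and the learner. A minimal sketch, assuming a hypothetical step function that returns (observation, reward, done) and an attacker that simply negates the reward. This sign flip is an illustration only, not the tampering scheme used in the paper.

```python
from typing import Callable, Tuple

# A step function maps an action to (observation, reward, done).
StepFn = Callable[[float], Tuple[object, float, bool]]


def tamper_rewards(step: StepFn, tamper: Callable[[float], float]) -> StepFn:
    """Wrap a step function so the learner only ever sees tampered rewards."""
    def tampered_step(action: float):
        obs, reward, done = step(action)
        return obs, tamper(reward), done
    return tampered_step


if __name__ == "__main__":
    # Toy environment: reward equals the action (e.g. forward progress).
    def toy_step(action: float):
        return None, action, False

    # Hypothetical attack: negate the reward so that progress is penalised.
    attacked = tamper_rewards(toy_step, lambda r: -r)
    print(attacked(0.5))  # (None, -0.5, False)
```

Because the observations are untouched, this kind of attack can go unnoticed until the learned policy itself starts to misbehave.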

Deep Reinforcement Learning for Autonomous Navigation on Duckietown Platform: Evaluation of Adversarial Robustness

Evaluating Adversarial Robustness in Duckietown Navigation

Posted on June 4, 2024 | by Duckietown Admin

General Information

Deep Reinforcement Learning for Autonomous Navigation on Duckietown Platform: Evaluation of Adversarial Robustness
Abdullah Hosseini, Saeid Houti, Junaid Qadir
Qatar University
A. Hosseini, S. Houti and J. Qadir, "Deep Reinforcement Learning for Autonomous Navigation on Duckietown Platform: Evaluation of Adversarial Robustness," 2023 International Symposium on Networks, Computers and Communications (ISNCC), Doha, Qatar, 2023, pp. 1-6, doi: 10.1109/ISNCC58260.2023.10323905.


What is adversarial robustness in navigation tasks all about? A few important topics:

Reinforcement Learning (RL) is a type of machine learning where agents learn to make decisions by receiving rewards or penalties based on their actions in an environment. This is great because it removes the need for curated training datasets.

Deep Reinforcement Learning (DRL) enhances RL by using deep neural networks to process complex inputs and make decisions. Deep networks are neural networks with multiple layers.

Adversarial Robustness refers to a system’s ability to resist and maintain performance despite deliberate attacks or input perturbations.

Navigation is the task of finding feasible paths between points in the environment, much like Google Maps and similar systems do for us in everyday life.
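The reward-or-penalty learning loop in the RL definition above can be written as a short agent-environment interaction. A minimal sketch, assuming a hypothetical one-dimensional lane-keeping environment (not Duckietown's simulator) where staying in the lane earns +1 per step and leaving it ends the episode with a penalty.

```python
import random


class ToyLaneEnv:
    """Hypothetical 1-D environment: keep the lateral offset within the lane."""

    def __init__(self, half_width: float = 1.0):
        self.half_width = half_width
        self.offset = 0.0

    def reset(self) -> float:
        self.offset = 0.0
        return self.offset

    def step(self, steering: float):
        self.offset += steering
        done = abs(self.offset) > self.half_width      # left the lane
        reward = -1.0 if done else 1.0                 # reward staying in lane
        return self.offset, reward, done


def run_episode(env, policy, max_steps: int = 100) -> float:
    """Agent-environment loop: observe, act, receive reward, repeat."""
    obs, total = env.reset(), 0.0
    for _ in range(max_steps):
        action = policy(obs)
        obs, reward, done = env.step(action)
        total += reward
        if done:
            break
    return total


if __name__ == "__main__":
    rng = random.Random(0)

    def random_policy(obs):
        return rng.uniform(-0.3, 0.3)

    def corrective_policy(obs):
        return -0.5 * obs  # steer back towards the lane centre

    print(run_episode(ToyLaneEnv(), random_policy))
    print(run_episode(ToyLaneEnv(), corrective_policy))  # 100.0
```

An RL algorithm would adjust the policy to increase the total reward returned by `run_episode`; here the two fixed policies only show how rewards accumulate.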

Learn about RL, navigation and other robot autonomy topics at the link below.

Abstract

Self-driving cars have gained widespread attention in recent years due to their potential to revolutionize the transportation industry. However, their success critically depends on the ability of reinforcement learning (RL) algorithms to navigate complex environments safely. In this paper, we investigate the potential security risks associated with end-to-end deep RL (DRL) systems in autonomous driving environments that rely on visual input for vehicle control, using the open-source Duckietown platform for robotics and self-driving vehicles.

We demonstrate that current DRL algorithms are inherently susceptible to attacks by designing a general state adversarial perturbation and a reward tampering approach. Our strategy involves evaluating how attacks can manipulate the agent’s decision-making process and using this understanding to create a corrupted environment that can lead the agent towards low-performing policies. We introduce our state perturbation method, accompanied by empirical analysis and extensive evaluation, and then demonstrate a targeted attack using reward tampering that leads the agent to catastrophic situations.
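A state adversarial perturbation nudges every input pixel slightly in the direction that most hurts the agent. A minimal sketch of a single fast gradient sign method (FGSM) step, assuming a hypothetical linear policy score w · x so that the input gradient is simply w; the paper's UAP-FGSM attack on a deep network is more involved and is not reproduced here.

```python
def sign(v):
    """Return -1, 0 or 1 according to the sign of v."""
    return (v > 0) - (v < 0)


def linear_score(x, w):
    """Score the agent assigns to its preferred action: w . x."""
    return sum(xi * wi for xi, wi in zip(x, w))


def fgsm_linear(x, w, eps, lo=0.0, hi=1.0):
    """One FGSM step that lowers w . x, keeping pixels in [lo, hi].

    For a linear score the input gradient is simply w, so the attack
    is x' = clip(x - eps * sign(w)).
    """
    return [min(max(xi - eps * sign(wi), lo), hi) for xi, wi in zip(x, w)]


if __name__ == "__main__":
    x = [0.5, 0.5, 0.5]          # toy "image" with three pixels
    w = [1.0, -2.0, 0.0]         # hypothetical linear policy weights
    x_adv = fgsm_linear(x, w, eps=0.1)
    print(x_adv)                 # each pixel moves by at most eps
    print(linear_score(x, w), linear_score(x_adv, w))  # drops by eps * sum(|w|) if unclipped
```

For a linear score the perturbation -eps * sign(w) does not depend on the input (apart from clipping), which gives some intuition for why a single "universal" perturbation can degrade an agent across many frames.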

Our experiments show that our attacks are effective in poisoning the learning of the agent when using the gradient-based Proximal Policy Optimization algorithm within the Duckietown environment. The results of this study are of interest to researchers and practitioners working in the field of autonomous driving, DRL, and computer security, and they can help inform the development of safer and more reliable autonomous driving systems.

Highlights - Evaluation of Adversarial Robustness Results

Here is a visual tour of the work of the authors. For more details, check out the paper link.

Fig. 1 - An illustration of the MDP framework, a fundamental concept in RL, where an agent interacts with an environment in a sequential manner to learn an optimal policy.

Fig. 2 - High-level architecture for training a DRL agent.

Average reward obtained by the agent for various reward functions in Duckietown — Fig. 3 - Average reward obtained by the agent for various reward functions.

Average reward of the agent under two different reward functions. Increasing the value of ϵ resulted in a decrease in the agent’s distance traveled and position-orientation reward as it completely deviated from the roadable area and faced the wrong direction. — Fig. 4 - Average reward of the agent under two different reward functions.

Adversarial Navigation Robustness - Sequence of robot positions with DRL agent trained under adversarial and non-adversarial settings in a lane following experiment. The UAP-FGSM method makes the agent move in circles with minimal perturbations, while adversarial reward tampering forces it to move in the opposite direction of the road. — Fig. 5 - Sequence of robot positions with DRL agent trained under adversarial and non-adversarial settings in a lane following experiment. The UAP-FGSM method makes the agent move in circles with minimal perturbations, while adversarial reward tampering forces it to move in the opposite direction of the road.

Fig. 6 - The saliency map provides insight into this aspect of the model’s behavior, particularly in the context of adversarial attacks. Due to the universal perturbation of the UAP-FGSM attack, the agent’s attention to nearby yellow dashed lines is reduced.

Conclusion

Here are the conclusions from the authors of this paper:

“The focus of our study was to address adversarial attacks on deep reinforcement learning (DRL) agents, specifically examining state adversarial attacks and reward-tampering attacks.

We developed a parametric framework for state adversarial attacks and a non-parametric framework for reward tampering attacks, which enabled us to create effective attacks. We found that the performance of a DRL agent declined rapidly after the attack, and the deviation from the road was worse than that of standard DRL.

We used salient maps to provide a clear explanation of the policies’ internal operations in both the adversarial and non-adversarial aspects. Our research provides insight into the potential vulnerabilities of DRL agents and highlights the need for more robust and secure agents to mitigate the risk of adversarial attacks.

Moving forward, future work will focus on incorporating real-world analysis to test the performance of the DuckieBot under both adversarial and non-adversarial settings”.

Project Authors

Abdullah Hosseini is a Research and Development Specialist at Weill Cornell Medicine in Qatar.

Saeid Houti is a Software Developer at Ministry of Education and Higher Education Qatar.

Junaid Qadir is a Professor of Computer Engineering at Qatar University.

Learn more

Duckietown is a platform for creating and disseminating robotics and AI learning experiences.



Vision-Based DRL Autonomous Driving Agent with Sim2Real Transfer

Posted on May 6, 2024 | by Duckietown Admin

General Information

Vision-based DRL Autonomous Driving Agent with Sim2Real Transfer
Dianzhao Li and Ostap Okhrin
Technische Universität Dresden, Dresden
D. Li and O. Okhrin, "Vision-Based DRL Autonomous Driving Agent with Sim2Real Transfer," 2023 IEEE 26th International Conference on Intelligent Transportation Systems (ITSC), Bilbao, Spain, 2023, pp. 866-873, doi: 10.1109/ITSC57777.2023.10422677.


One way to obtain quick and cheap training data is to use simulation instead of real-world experiments. The question remains whether what a simulation-trained agent learns carries over to the real world. Sim2Real transfer is the field of research that studies this problem.

The challenge is particularly meaningful when using vision as the primary sensing capability for robots. Vision-based deep reinforcement learning (DRL) refers to a technique where ML agents, typically modeled as multi-layered neural networks, learn to “make decisions” directly from visual input.

The essence of RL is training robotic agents with reward signals that favor desirable outcomes. This family of techniques typically leads to increased adaptability to operational scenarios.

To learn about RL and its place in the larger context of robot autonomy, check out the resources below.

Abstract

To achieve fully autonomous driving, vehicles must be capable of continuously performing various driving tasks, including lane keeping and car following, both of which are fundamental and well studied. However, previous studies have mainly focused on individual tasks, and car following tasks have typically relied on complete leader-follower information to attain optimal performance.

To address this limitation, we propose a vision-based deep reinforcement learning (DRL) agent that can simultaneously perform lane-keeping and car-following maneuvers.

To evaluate the performance of our DRL agent, we compare it with a baseline controller and use various performance metrics for quantitative analysis. Furthermore, we conduct a real-world evaluation to demonstrate the Sim2Real transfer capability of the trained DRL agent.

To the best of our knowledge, our vision-based car following and lane-keeping agent with Sim2Real transfer capability is the first of its kind.

Highlights - Sim2Real transfer results

Here is a visual tour of the work of the authors. For all the details, check out the paper link.

Fig. 1. The proposed DRL framework for vision-based multi-task autonomous driving agents. The perception module leverages camera images to produce impact attributes regarding the environment, then the DRL control module utilizes the information to control the agent with enhanced generalization.

Duckiebot DB21 — Fig. 2. Robot car [Duckiebot] used during the real-world evaluation. (a) Side view of the robot car, which is equipped with a front-view camera and a Jetson Nano 2GB. (b) Back view of the car with a pattern of circles.

Fig. 3. An example velocity trajectory for the leading vehicle generated by the Ornstein-Uhlenbeck process for the training process.
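An Ornstein-Uhlenbeck process produces a leader velocity that wanders randomly but keeps being pulled back towards a mean speed. A minimal sketch using the Euler-Maruyama discretisation of dv = theta (mu - v) dt + sigma dW; all parameter values, and clipping the speed at zero, are illustrative assumptions rather than the settings used in the paper.

```python
import math
import random


def ou_velocity_profile(n_steps, dt=0.1, theta=0.5, mu=0.3, sigma=0.1,
                        v0=0.0, v_min=0.0, seed=None):
    """Euler-Maruyama discretisation of dv = theta*(mu - v)*dt + sigma*dW."""
    rng = random.Random(seed)
    v, profile = v0, [v0]
    for _ in range(n_steps):
        v += theta * (mu - v) * dt + sigma * math.sqrt(dt) * rng.gauss(0.0, 1.0)
        v = max(v, v_min)  # assumption: the leader never drives backwards
        profile.append(v)
    return profile


if __name__ == "__main__":
    profile = ou_velocity_profile(50, seed=1)
    print([round(v, 3) for v in profile[:10]])
```

theta sets how strongly the speed reverts to mu, while sigma sets how much it fluctuates; with sigma = 0 the profile decays smoothly to mu.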

Fig. 4. Training results of PPO agent with ten independent seeds over one million steps.

Fig. 5. Example trajectories of DRL agent (red) and baseline controller (green) following random leader trajectory (orange) during the evaluation.

Fig. 6. Distribution of TTC and time headway for DRL agent (red) and baseline controller (green).
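Time-to-collision (TTC) and time headway are standard car-following safety metrics computed from the gap to the leader and the two speeds. A minimal sketch of their common definitions, assuming the gap is measured bumper to bumper; the exact way the paper computes them is not stated here.

```python
import math


def time_headway(gap_m, v_follower):
    """Seconds until the follower reaches the leader's current position."""
    return gap_m / v_follower if v_follower > 0 else math.inf


def time_to_collision(gap_m, v_follower, v_leader):
    """Seconds until contact if both speeds stay constant; inf if not closing."""
    closing_speed = v_follower - v_leader
    return gap_m / closing_speed if closing_speed > 0 else math.inf


if __name__ == "__main__":
    gap, v_f, v_l = 0.30, 0.30, 0.20   # illustrative values: metres and m/s
    print(time_headway(gap, v_f))            # 1.0
    print(time_to_collision(gap, v_f, v_l))  # about 3 s
```

TTC is infinite whenever the follower is not closing in, so a distribution of TTC mostly reflects the moments when the agent approaches the leader too quickly.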

Fig. 7. Example trajectories of DRL agent (red) and baseline controller (green) following a self-defined external leader trajectory (orange) during the evaluation.

Conclusion

This study proposes a vision-based DRL agent that can simultaneously perform lane-keeping and car-following tasks.

The overall system is divided into two modules: the perception module and the control module. The perception module extracts task-relevant attributes of the surroundings, while the control module is a DRL agent that takes these attributes as input. To evaluate the performance of the DRL agent, we compare it with a baseline algorithm in both simulation and real-world environments.

In the simulation, we compare the car following and lane-keeping capabilities of the DRL agent and baseline controller using various performance metrics. In the real-world environment, we demonstrate that the DRL agent can follow the leading vehicle while maintaining lane-keeping ability.

In future work, we plan to enhance our DRL agent by incorporating a comfort factor to address unstable driving behavior. Additionally, we aim to deploy more advanced algorithms for improved generalization.

Project Authors

Dianzhao Li is a research assistant at the Technische Universität Dresden, Dresden, Germany.

Ostap Okhrin is Chair of Statistics and Econometrics at the Institute of Economics and Transport, School of Transportation, Technische Universität Dresden in Germany.

Learn more

Duckietown is a platform for creating and disseminating robotics and AI learning experiences.

Learning Skills to Navigate without a Master: A Sequential Multi-Policy Reinforcement Learning Algorithm

Posted on April 4, 2024 | by Duckietown Admin

General Information

Learning Skills to Navigate without a Master: A Sequential Multi-Policy Reinforcement Learning Algorithm
Ambedkar Dukkipati, Rajarshi Banerjee, Ranga Shaarad Ayyagari, Dhaval Parmar Udaybhai
Indian Institute of Science
A. Dukkipati, R. Banerjee, R. S. Ayyagari and D. P. Udaybhai, "Learning Skills to Navigate without a Master: A Sequential Multi-Policy Reinforcement Learning Algorithm," 2022 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Kyoto, Japan, 2022, pp. 2483-2489, doi: 10.1109/IROS47612.2022.9981607.


Reinforcement learning (RL) is an increasingly popular approach for developing autonomous robot agents. The essence of RL is training agents with reward signals that favor desirable outcomes, which leads to increased adaptability to operational scenarios. Through iterations, robots refine their decision-making, optimizing actions based on rewards and penalties. This method provides robots with the flexibility to handle unpredictable situations, enhancing their efficiency and effectiveness in real-world tasks. To learn about RL with Duckietown, check out the resources below.

Abstract

Solving complex problems using reinforcement learning necessitates breaking down the problem into manageable tasks, and learning policies to solve these tasks. These policies, in turn, have to be controlled by a master policy that takes high-level decisions. Hence learning policies involves hierarchical decision structures. However, training such methods in practice may lead to poor generalization, with either sub-policies executing actions for too few time steps or devolving into a single policy altogether. In our work, we introduce an alternative approach to learn such skills sequentially without using an overarching hierarchical policy. We propose this method in the context of environments where a major component of the objective of a learning agent is to prolong the episode for as long as possible. We refer to our proposed method as Sequential Soft Option Critic. We demonstrate the utility of our approach on navigation and goal-based tasks in a flexible simulated 3D navigation environment that we have developed. We also show that our method outperforms prior methods such as Soft Actor-Critic and Soft Option Critic on various environments, including the Atari River Raid environment and the Gym-Duckietown self-driving car simulator.

Highlights

Here is a visual tour of the work of the authors.

For all the details, check out the paper link!

(a) An illustration of policies learned in our approach. (b) A representation of the state space partitioned by the termination functions of the options. Each oval corresponds to the set of states classified as non-termination states by the corresponding nested termination function. — Fig. 1. (a) An illustration of policies learned in our approach. Different policies learn different skills out of necessity to traverse the environment and not terminate the episode. The trajectory spawned by the sequence of policies that correctly learns to avoid terminating the episode is shown in green. The policy π1 is used when the agent is in the middle of the corridor. When the end of the corridor is reached, π2 is selected to take a turn and avoid a collision. (b) A representation of the state space partitioned by the termination functions of the options. Each oval corresponds to the set of states classified as non-termination states by the corresponding nested termination function.

Fig. 2. (a) & (b) The input to the neural network in our 3D navigation environment. These images obtained from the simulation are scaled (and also transformed to grayscale in the case of (a)), appended with K previous images, and sent to the agent as input. (c) A view of the Duckietown environment.
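The input pipeline in the Fig. 2 caption (grayscale conversion, then stacking the current image with K previous ones) is a common way to give a feed-forward network a sense of motion. A minimal sketch, assuming images are rows of (r, g, b) tuples and that the history is padded with the first frame at episode start (an assumption); the rescaling step is omitted.

```python
from collections import deque


def to_grayscale(rgb_image):
    """Convert rows of (r, g, b) tuples to rows of luma values (ITU-R BT.601 weights)."""
    return [[0.299 * r + 0.587 * g + 0.114 * b for (r, g, b) in row] for row in rgb_image]


class FrameStack:
    """Hold the current frame plus the K previous ones (K + 1 frames in total)."""

    def __init__(self, k):
        self.frames = deque(maxlen=k + 1)

    def reset(self, first_frame):
        # Assumption: at episode start, pad the history by repeating the first frame.
        self.frames.clear()
        self.frames.extend([first_frame] * self.frames.maxlen)
        return list(self.frames)

    def push(self, frame):
        self.frames.append(frame)  # the oldest frame is dropped automatically
        return list(self.frames)


if __name__ == "__main__":
    stack = FrameStack(k=2)
    print(stack.reset("f0"))   # ['f0', 'f0', 'f0']
    print(stack.push("f1"))    # ['f0', 'f0', 'f1']
    print(stack.push("f2"))    # ['f0', 'f1', 'f2']
```

A `deque` with `maxlen` keeps the sliding window at a fixed size without any manual index bookkeeping.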

Autonomous robot Navigation performance results obtained on various environments: SAC, SOC, SAC with HER — Fig. 3. The results obtained on various environments.

Outputs of the three learned policies when fed the corresponding input shown on the left. πω1 : orange, πω2 : blue, πω3 : green. Each policy outputs a Gaussian distribution, the active policy is the one filled with color. A positive value for the output action corresponds to turning right, and a negative value indicates a left turn. — Fig. 4. Outputs of the three learned policies when fed the corresponding input shown on the left. πω_1: orange, πω_2 : blue, πω_3 : green. Each policy outputs a Gaussian distribution, the active policy is the one filled with color. A positive value for the output action corresponds to turning right, and a negative value indicates a left turn.

Conclusion

In this paper, the authors propose an algorithm called “Sequential Soft Option Critic” that allows new skills to be added dynamically without the need for a higher-level master policy. It is applicable to environments where a primary component of the objective is to prolong the episode. The authors show that this algorithm can effectively incorporate diverse skills into an overall skill set and that it outperforms prior methods in several environments.

Learn more

Duckietown is a platform for creating and disseminating robotics and AI learning experiences.

Duckietown is modular, customizable and state-of-the-art. It is designed to teach, learn, and do research: from exploring the fundamentals of computer science and automation to pushing the boundaries of knowledge.

Monocular Robot Navigation with Self-Supervised Pretrained Vision Transformers

Posted on May 9, 2022 | by Ivano Marocchi


Duckietown’s infrastructure is used by researchers worldwide to push the boundaries of knowledge. Of the many outstanding works published, today we’d like to highlight “Monocular Robot Navigation with Self-Supervised Pretrained Vision Transformers” by Saavedra-Ruiz et al. at the University of Montreal.

Using vision transformers (ViTs) to understand their surroundings, Duckiebots are able to detect and avoid obstacles while driving safely within their lanes. ViT is an emerging machine vision technique with roots in Natural Language Processing (NLP) applications. Its use in Computer Vision is recent and promising. Enjoy the read, and don’t forget to reproduce these results on your Duckiebots!

Abstract

“In this work, we consider the problem of learning a perception model for monocular robot navigation using few annotated images. Using a Vision Transformer (ViT) pretrained with a label-free self-supervised method, we successfully train a coarse image segmentation model for the Duckietown environment using 70 training images. Our model performs coarse image segmentation at the 8×8 patch level, and the inference resolution can be adjusted to balance prediction granularity and real-time perception constraints. We study how best to adapt a ViT to our task and environment, and find that some lightweight architectures can yield good single-image segmentations at a usable frame rate, even on CPU. The resulting perception model is used as the backbone for a simple yet robust visual servoing agent, which we deploy on a differential drive mobile robot to perform two tasks: lane following and obstacle avoidance.”

Authors

Pipeline

“We propose to train a classifier to predict labels for every 8×8 patch in an image. Our classifier is a fully-connected network which we apply over ViT patch encodings to predict a coarse segmentation mask:”
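The per-patch classification described in the quote turns an image into a coarse label grid with one entry per 8×8 patch. A minimal sketch, assuming the ViT patch embeddings are already computed and using a single linear layer plus argmax as a simplified stand-in for the authors' fully-connected network; the weights below are hypothetical.

```python
PATCH = 8


def patch_grid(height, width, patch=PATCH):
    """Rows and columns of the coarse mask for a given input resolution."""
    if height % patch or width % patch:
        raise ValueError("image size must be a multiple of the patch size")
    return height // patch, width // patch


def classify_patches(embeddings, weights, biases):
    """Apply one linear layer + argmax to every patch embedding -> coarse mask."""
    mask = []
    for row in embeddings:
        mask_row = []
        for emb in row:
            logits = [sum(w * e for w, e in zip(w_c, emb)) + b
                      for w_c, b in zip(weights, biases)]
            mask_row.append(max(range(len(logits)), key=logits.__getitem__))
        mask.append(mask_row)
    return mask


if __name__ == "__main__":
    print(patch_grid(224, 224))  # (28, 28): one label per 8x8 patch
    # Two toy 2-D patch embeddings and a hypothetical 2-class linear layer.
    print(classify_patches([[[1.0, 0.0], [0.0, 1.0]]],
                           weights=[[1.0, 0.0], [0.0, 1.0]], biases=[0.0, 0.0]))  # [[0, 1]]
```

Because the mask size follows directly from the input size, lowering the inference resolution shrinks the grid and the compute, which is the granularity/speed trade-off mentioned in the abstract.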

Conclusions

“In this work, we study how embodied agents with vision-based motion can benefit from ViTs pretrained via SSL methods. Specifically, we train a perception model with only 70 images to navigate a real robot in two monocular visual-servoing tasks. Additionally, in contrast to previous SSL literature for general computer vision tasks, our agent appears to benefit more from small high-throughput models rather than large high-capacity ones. We demonstrate how ViT architectures can flexibly adapt their inference resolution based on available resources, and how they can be used in robotic application depending on the precision needed by the embodied agent. Our approach is based on predicting labels for 8×8 image patches, and is not well-suited for predicting high-resolution segmentation masks, in which case an encoder-decoder architecture should be preferred. The low resolution of our predictions does not seem to hinder navigation performance however, and we foresee as an interesting research direction how those high-throughput low-resolution predictions affect safety-critical applications. Moreover, training perception models in an SSL fashion on sensory data from the robot itself rather than generic image datasets (e.g., ImageNet) appears to be a promising research avenue, and is likely to yield visual representations that are better adapted to downstream visual servoing applications.”

Learn more

The Duckietown platform offers robotics and AI learning experiences.

Automatic Wheels and Camera Calibration for Monocular and Differential Mobile Robots

Posted on July 21, 2021 | by Konstantin Chaika

Automatic Wheels and Camera Calibration for Monocular and Differential Mobile Robots

After a robot is assembled, components such as the camera and wheels need to be calibrated. This normally requires human participation, and the result depends on human factors. We describe an approach to fully automatic calibration of a robot’s camera and wheels.

The calibration procedure collects the necessary set of images by automatically rotating the robot in front of chessboards, and then drives the robot over a marked floor to assess the curvature of its trajectory. As a result of the calibration, a coefficient k is calculated for the wheels, and the camera matrix K (which includes the focal length, the optical center, and the skew coefficient) and distortion coefficients D are calculated for the camera.
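The camera matrix K mentioned above packs the focal lengths, optical center and skew into a 3×3 matrix that maps points in the camera frame to pixel coordinates. A minimal sketch of the standard pinhole model, assuming hypothetical intrinsic values and ignoring the distortion coefficients D.

```python
def camera_matrix(fx, fy, cx, cy, skew=0.0):
    """Intrinsic matrix K = [[fx, s, cx], [0, fy, cy], [0, 0, 1]]."""
    return [[fx, skew, cx],
            [0.0, fy, cy],
            [0.0, 0.0, 1.0]]


def project(K, point_cam):
    """Pinhole projection of a 3-D point in camera coordinates (no distortion)."""
    x, y, z = point_cam
    if z <= 0:
        raise ValueError("point must lie in front of the camera")
    u_h = K[0][0] * x + K[0][1] * y + K[0][2] * z   # first row of K @ [x, y, z]
    v_h = K[1][1] * y + K[1][2] * z                 # second row of K @ [x, y, z]
    return u_h / z, v_h / z                         # divide by depth


if __name__ == "__main__":
    K = camera_matrix(fx=300.0, fy=300.0, cx=320.0, cy=240.0)  # illustrative values
    print(project(K, (0.0, 0.0, 1.0)))   # (320.0, 240.0): the optical center
    print(project(K, (0.1, 0.0, 1.0)))   # about 30 px to the right of it
```

Calibration estimates exactly these entries (plus D) from the detected chessboard corners; lens distortion would be applied to the normalized coordinates x / z, y / z before multiplying by K.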

The proposed approach has been tested on Duckiebots at Alexander Popov’s International Innovation Institute for Artificial Intelligence, Cybersecurity and Communication, SPbETU “LETI”. Its results are comparable to those of manual calibration, so it can replace a human for this task.

Camera calibration process

In its initial position, the robot stands on a part of the floor with chessboards in front of it, toward which its camera is pointed; on the other side of the robot, the floor surface is marked with ArUco markers.

Any number of chessboards can be used, limited only by the free space around the robot. Calibration accuracy depends mostly on capturing frames with the boards in varied positions, e.g., at different distances from the robot and at different angles. All boards around the robot must have the same physical size and pattern type.

In practice, camera calibration consists of the robot rotating about its axis and photographing each visible chessboard in turn. The procedure should support several “passes” during image capture. This requires tracking which board the robot is currently observing and in which direction it should turn next. The algorithm therefore reduces to alternating two actions: “get a frame from the camera” and “turn a little”. The full sequence of actions is:

Obtain a frame from the camera;
Find a chessboard in the frame;
Save the board corners found in the image;
Determine the direction of rotation according to the schedule;
Turn by one step;
Either repeat the steps above, or finish data collection and proceed with camera calibration using OpenCV.
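The capture loop above can be sketched as follows. This is a minimal sketch under stated assumptions. The callbacks `get_frame`, `find_corners` and `turn` and the `(direction, n_steps)` schedule format are hypothetical, not the authors' interface. On a real robot, `find_corners` might wrap `cv2.findChessboardCorners`, which returns a found flag and the corner array. The collected views would then go to `cv2.calibrateCamera` together with matching board coordinates.

```python
from typing import Callable, List, Optional, Sequence, Tuple

# Hypothetical stand-in for the corner array OpenCV would return.
Corners = List[Tuple[float, float]]


def collect_calibration_views(
    get_frame: Callable[[], object],
    find_corners: Callable[[object], Optional[Corners]],
    turn: Callable[[int], None],
    schedule: Sequence[Tuple[int, int]],
) -> List[Corners]:
    """Rotate in place according to `schedule`, keeping the corners from every
    frame in which a chessboard is found.

    schedule: passes as (direction, n_steps), direction +1 or -1 (assumed
    format). Several passes let each board be photographed more than once.
    """
    views: List[Corners] = []
    for direction, n_steps in schedule:
        for _ in range(n_steps):
            frame = get_frame()            # obtain a frame from the camera
            corners = find_corners(frame)  # look for a chessboard in it
            if corners is not None:
                views.append(corners)      # save the detected board corners
            turn(direction)                # turn one step as scheduled
    return views  # next step: calibrate from these views (e.g. with OpenCV)


# Simulated robot: the "frame" is just the heading index, and boards are
# visible at headings 2 and 5 (illustrative values).
heading = [0]
boards = {2, 5}


def fake_frame() -> int:
    return heading[0]


def fake_find(frame: int) -> Optional[Corners]:
    return [(float(frame), 0.0)] if frame in boards else None


def fake_turn(direction: int) -> None:
    heading[0] += direction


views = collect_calibration_views(fake_frame, fake_find, fake_turn, [(1, 6), (-1, 6)])
print(len(views))  # 4: each board seen once per pass
```

Sweeping back and forth returns the robot to its starting heading. It also yields each board twice, from slightly different viewpoints.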

Wheels calibration process

Floor markers should be oriented toward the chessboards and start as close to the robot as possible. The distance between the markers depends on the camera’s resolution, height and tilt angle. It must be chosen so that at least three recognizable markers can be in the frame at the same time. In our experiments, the markers were spaced 15 cm apart, with a marker size of 6.5 cm. The algorithm does not take the relative positions of the markers into account; however, all markers must have exactly the same orientation.
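A quick check of the spacing requirement is shown below. The post does not say how the 15 cm is measured; this sketch assumes center-to-center spacing. Under that assumption, three consecutive markers occupy 2 × 15 + 6.5 = 36.5 cm of floor. The camera must therefore see at least that length of floor along the marker line. The helper names and the `visible_floor_cm` input are hypothetical; the visible length has to be measured for a given camera mounting.

```python
def min_visible_span_cm(spacing_cm: float, marker_cm: float, n_markers: int = 3) -> float:
    """Floor length covered by n whole markers, assuming center-to-center spacing."""
    return (n_markers - 1) * spacing_cm + marker_cm


def spacing_ok(visible_floor_cm: float, spacing_cm: float, marker_cm: float,
               n_markers: int = 3) -> bool:
    """True if n markers fit in the (measured) visible floor length."""
    return visible_floor_cm >= min_visible_span_cm(spacing_cm, marker_cm, n_markers)


print(min_visible_span_cm(15.0, 6.5))  # 36.5
```

If the spacing were instead measured edge to edge, the span would be 3 × 6.5 + 2 × 15 = 49.5 cm. The check is only as good as the assumed convention.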

Let us consider the first iteration of the automatic wheel calibration algorithm:

The robot detects the marker closest to it and stores that marker’s orientation.
The robot then moves forward for a fixed time t with left and right wheel speeds ω₁ and ω₂. The speeds are computed using the calibration coefficient k, which is set to 1 for the first iteration; that is, the actual wheel speeds are assumed to be equal.
The robot again obtains the orientation of the marker closest to it and calculates the difference between the two angles.
The coefficient k_i for this step is calculated.
The robot moves back for the same time t.

To reduce the influence of errors in computing k_i, the coefficient k is adjusted by only (k_i − 1)/2 after each iteration. This update should be applied after the robot has moved back, which reduces the chance of the robot drifting outside the width of the marked area. Suppose that, after the next step, |k_i − 1.0| is smaller than a preselected threshold E. In that case the correction (k_i − 1)/2 is not applied at that iteration. If k_i is not applied for three successive iterations, the wheel calibration is considered complete.
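The update rule and stopping criterion can be sketched as a small loop. Several parts of this sketch are assumptions. The post does not say how k_i is derived from the marker angle difference, so a hypothetical callback `measure_k_i(k)` stands in for one full forward-and-back run. The stopping test as extracted is ambiguous; here a correction is skipped when |k_i − 1| < E, with `eps` playing the role of E. The simulated robot in the usage lines is a toy model.

```python
from typing import Callable


def calibrate_wheels(
    measure_k_i: Callable[[float], float],
    eps: float = 1e-3,
    required_skips: int = 3,
    max_iters: int = 100,
) -> float:
    """Damped iterative refinement of the wheel coefficient k (sketch).

    measure_k_i(k) is a hypothetical callback for one iteration: read the
    closest marker's orientation, drive forward for time t with wheel speeds
    scaled by k, read the orientation again, derive k_i from the angle
    difference, and drive back for time t.
    """
    k = 1.0  # first iteration: assume the real wheel speeds are equal
    skips = 0
    for _ in range(max_iters):
        k_i = measure_k_i(k)
        if abs(k_i - 1.0) < eps:
            # Correction is negligible (assumed test): ignore it this time.
            skips += 1
            if skips == required_skips:
                return k  # three successive ignored iterations: done
        else:
            skips = 0             # only successive skips count
            k += (k_i - 1.0) / 2  # apply half of the measured correction
    raise RuntimeError("wheel calibration did not converge")


# Simulated robot (toy model): the coefficient that drives straight is 1.1,
# and each run reports k_i = 1.1 / k.
k = calibrate_wheels(lambda k: 1.1 / k)
print(round(k, 2))  # 1.1
```

Halving each correction trades speed for robustness. A single noisy k_i can move k by at most half of its apparent error, and in the toy model the loop still converges within about ten runs.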

Accuracy Evaluation

Comparing camera calibrations requires a way to measure their error. Since calibration is performed with the OpenCV library, the error is computed with the method that library provides.
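OpenCV's `cv2.calibrateCamera` returns the overall RMS reprojection error as its first return value. The sketch below computes the same kind of quantity by hand to show what the number measures. It does not reproduce OpenCV's implementation. It assumes a simple pinhole model without lens distortion, and that the board corners have already been transformed into camera coordinates. The helper names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Point2 = Tuple[float, float]
Point3 = Tuple[float, float, float]


def project(K: List[List[float]], p: Point3) -> Point2:
    """Pinhole projection of a camera-frame point (lens distortion ignored)."""
    x, y, z = p
    fx, skew, cx = K[0]
    _, fy, cy = K[1]
    return (fx * x / z + skew * y / z + cx, fy * y / z + cy)


def rms_reprojection_error(K: List[List[float]], points_cam: Sequence[Point3],
                           detected: Sequence[Point2]) -> float:
    """Root-mean-square pixel distance between projected and detected corners."""
    if len(points_cam) != len(detected) or not detected:
        raise ValueError("need the same, non-zero number of 3D and 2D points")
    total = 0.0
    for p, (u_obs, v_obs) in zip(points_cam, detected):
        u, v = project(K, p)
        total += (u - u_obs) ** 2 + (v - v_obs) ** 2
    return math.sqrt(total / len(detected))


# Illustrative numbers: both detections are off by (3, 4) pixels.
K = [[500.0, 0.0, 320.0], [0.0, 500.0, 240.0], [0.0, 0.0, 1.0]]
pts = [(0.0, 0.0, 1.0), (0.1, 0.0, 1.0)]
det = [(323.0, 244.0), (373.0, 244.0)]
print(rms_reprojection_error(K, pts, det))  # 5.0
```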

This error measure does not apply to the wheel calibration coefficient. The coefficient's quality is instead assessed through its effect on the curvature of the robot’s trajectory. To do this, the robot was placed at a fixed distance from a straight line and oriented along it. It was then driven in manual mode straight ahead until it was two meters from the start point along the axis it had been aligned with. After the robot stopped, the difference between its initial and final distances to the line was calculated.
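The deviation itself is a simple difference. It can also be turned into a rough curvature estimate under an additional assumption that the post does not make: the drift comes from a constant-curvature arc that starts tangent to the line. A circle tangent to the line at the start point and offset by d after a distance x along the line satisfies x² + d² = 2Rd, so R = (x² + d²)/(2d). The function names below are hypothetical.

```python
import math


def straight_line_deviation(initial_cm: float, final_cm: float) -> float:
    """Change in distance to the reference line between start and stop."""
    return abs(final_cm - initial_cm)


def turning_radius_cm(deviation_cm: float, run_cm: float = 200.0) -> float:
    """Radius of a circular arc, tangent to the line at the start, that is
    offset by deviation_cm after run_cm along the line: R = (x^2 + d^2) / (2d).
    Assumes constant curvature (illustrative model, not from the post)."""
    if deviation_cm <= 0:
        return math.inf  # no measurable drift: straight line
    return (run_cm ** 2 + deviation_cm ** 2) / (2 * deviation_cm)


d = straight_line_deviation(30.0, 40.0)
print(d, turning_radius_cm(d))  # 10.0 2005.0
```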

Two metrics were estimated: reprojection error and straight-line deviation. The first reflects the quality of the camera calibration, the second the quality of the wheel calibration. The two figures below present the results of 10 independent tests compared with manual calibration.

On average, the tests showed that the proposed solution performs only slightly worse than classical manual calibration. This holds both for the camera and for the wheels when a well-calibrated camera is used. However, when the wheels and the camera are calibrated together, errors in the camera calibration can significantly affect the wheel calibration. Testing revealed a clear relationship between the reprojection error and the straight-line deviation.

Method Modifications

After this approach was integrated, it became necessary to automate the last step: moving the robot onto the field. Once calibration is complete, the robot is ready to run autonomous driving algorithms. Automating this step therefore further reduces the operator’s time: instead of moving the robot to the field manually, the operator can place the next robot at the starting position. In our case, the calibration field was located beside the road lane, so the floor markers used for wheel calibration were oriented perpendicular to the lane.

The first stage of automatically moving the robot out of the calibration zone is therefore to restore the orientation it had when wheel calibration started. This uses the same approach described earlier. Depending on the orientation of the closest floor marker, the robot rotates step by step about its axis, clockwise or counterclockwise. It stops when the absolute value of its orientation angle falls below a preselected threshold.
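The reorientation step can be sketched as a small feedback loop. The callbacks `read_angle_deg` and `rotate_step`, their sign conventions, and the 2-degree tolerance are hypothetical choices, not values from the post.

```python
def restore_heading(read_angle_deg, rotate_step, tolerance_deg=2.0, max_steps=200):
    """Rotate step by step until the heading relative to the closest floor
    marker is within +/- tolerance_deg; return the final angle.

    read_angle_deg(): hypothetical; robot heading relative to the closest
        marker in degrees, positive meaning rotated counter-clockwise.
    rotate_step(direction): hypothetical; +1 = one step counter-clockwise,
        -1 = one step clockwise.
    """
    for _ in range(max_steps):
        angle = read_angle_deg()
        if abs(angle) < tolerance_deg:
            return angle
        # Turn back toward zero: clockwise if rotated CCW, and vice versa.
        rotate_step(-1 if angle > 0 else 1)
    raise RuntimeError("heading not restored within max_steps")


# Simulated robot: starts 20 degrees off, each step turns it by 3 degrees.
heading = [20.0]


def sim_rotate(direction):
    heading[0] += 3.0 * direction


final = restore_heading(lambda: heading[0], sim_rotate)
print(final)  # -1.0
```

The rotation step must be smaller than twice the tolerance. Otherwise the robot can jump over the acceptance window and oscillate until `max_steps` is reached.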

At this point, the robot is still on the wheel calibration field, but it is now oriented toward the lane. The last step is to move it beyond the border of the marked field. It is enough to command the robot to drive straight ahead until the last marker disappears from the camera view. At that point, the robot has left the calibration zone and can be switched to lane-following mode.
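Leaving the field amounts to driving forward while markers remain in view. This is a minimal sketch with hypothetical callbacks `markers_visible` and `drive_forward_step`. The step limit is a safety guard added for illustration.

```python
def leave_calibration_field(markers_visible, drive_forward_step, max_steps=100):
    """Drive straight until no floor marker is visible; return the steps taken.

    markers_visible(): hypothetical; True while any marker is detected.
    drive_forward_step(): hypothetical; moves the robot a short distance ahead.
    """
    steps = 0
    while markers_visible():
        if steps >= max_steps:
            raise RuntimeError("still seeing markers after max_steps")
        drive_forward_step()
        steps += 1
    return steps  # robot is off the field: hand over to lane following


# Simulated field: markers stay in view until the robot has moved 5 steps.
position = [0]
n = leave_calibration_field(lambda: position[0] < 5,
                            lambda: position.__setitem__(0, position[0] + 1))
print(n)  # 5
```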

Future Work

During operation, the wheel calibration may become outdated. Several factors can cause a slight calibration mismatch. One is a change in wheel diameter due to wear of the wheel coating. Another is a slight change in motor characteristics due to wear of the plastic gearbox. A third is a change in the robot’s weight distribution, e.g., routing the cables on the other side of the chassis after charging. However, these factors have a fairly small impact, and the robot’s calibration remains satisfactory. Rather than repeating the full calibration process, a small refinement of the current calibration appears to be enough. To do this, we selected a section of road that the robots are guaranteed to traverse regularly.

Markers were then placed in this lane according to the rules described earlier: 15 cm apart, each 6.5 cm in size, along the center of the lane. The spacing need not be exact, but all markers must share the same orientation, aligned with the direction of travel in the lane where they are placed.

The first marker in the direction of travel must have a predefined ID. Any ID can be chosen, as long as it is unique within the robot’s current environment. The robot’s standard control algorithm was modified as follows. When the robot, driving in the lane, recognizes the marker with the predefined ID, it corrects its orientation relative to that marker and continues strictly straight ahead. From there, the algorithm mirrors the one described earlier. On recognizing each subsequent marker, the robot can refine its wheel calibration coefficient, apply it, and realign its orientation with that marker.
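The in-lane refinement can be sketched as a small event handler that the lane-following controller calls on each marker detection. Everything here is a hypothetical interface for illustration: the class name, the action strings and `on_markers_lost`. Per-marker k_i values are supplied by the caller, since the post does not say how they are derived. The update reuses the damped (k_i − 1)/2 rule from the initial calibration. The |k_i − 1| < eps skip test is an assumption.

```python
from typing import Optional


class InLaneCalibrationRefiner:
    """Refine the wheel coefficient k from markers laid along a lane (sketch)."""

    def __init__(self, start_marker_id: int, k: float, eps: float = 1e-3):
        self.start_marker_id = start_marker_id  # predefined, environment-unique ID
        self.k = k
        self.eps = eps
        self.active = False  # becomes True once the start marker is seen

    def on_marker(self, marker_id: int, k_i: Optional[float]) -> str:
        """Handle one marker detection and return the action to take.

        k_i: hypothetical per-step estimate from the heading change since the
        previous marker (None when there is no previous marker).
        """
        if not self.active:
            if marker_id == self.start_marker_id:
                self.active = True
                return "align_and_drive_straight"
            return "keep_lane_following"
        if k_i is not None and abs(k_i - 1.0) >= self.eps:
            self.k += (k_i - 1.0) / 2  # same damped update as initial calibration
        return "align_and_drive_straight"

    def on_markers_lost(self) -> str:
        """End of the marked section: resume normal lane following."""
        self.active = False
        return "keep_lane_following"


r = InLaneCalibrationRefiner(start_marker_id=7, k=1.0)
print(r.on_marker(3, None))  # keep_lane_following
print(r.on_marker(7, None))  # align_and_drive_straight
r.on_marker(8, 1.04)
print(round(r.k, 3))  # 1.02
```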

Conclusions

We developed a solution for fully automatic calibration of a Duckiebot’s camera and wheels. Its main feature is autonomy: one person can run the calibration of an arbitrary number of robots in parallel without being blocked while they calibrate. In addition, the robot can refine its calibration during normal operation.

Compared with the initial solution, the developed solution shows a slight loss of accuracy, mainly due to the camera calibration. However, the result is sufficient for the robot’s initial calibration and is comparable to manual calibration.
