General Information
- Transferring Multi-Agent Reinforcement Learning Policies for Autonomous Driving using Sim-to-Real
- Eduardo Candela, Leandro Parada, Luis Marques, Tiberiu-Andrei Georgescu, Yiannis Demiris, Panagiotis Angeloudis
- Imperial College London, United Kingdom
- E. Candela, L. Parada, L. Marques, T.-A. Georgescu, Y. Demiris and P. Angeloudis, "Transferring Multi-Agent Reinforcement Learning Policies for Autonomous Driving using Sim-to-Real," 2022 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Kyoto, Japan, 2022, pp. 8814-8820, doi: 10.1109/IROS47612.2022.9981319.
Sim2Real Transfer of Multi-Agent Policies for Self-Driving
In the field of autonomous driving, transferring policies from simulation to the real world (Sim-to-Real transfer, or Sim2Real) is highly desirable, as it is much faster and more cost-effective to train agents in simulation than in the real world.
Since simulations are just that, representations of the real world, it always remains an open question whether the trained policies will perform well enough once deployed. This challenge is known as the “Sim-to-Real gap”.
This gap is especially pronounced in Multi-Agent Reinforcement Learning (MARL), where agent collaboration and environment synchronization significantly complicate policy transfer.
The authors of this work propose combining Multi-Agent Proximal Policy Optimization (MAPPO) with domain randomization to create a robust pipeline for training MARL policies that are not only effective in simulation but also transfer to real-world conditions.
By applying varying levels of parameter randomization, such as altering lighting conditions, lane markings, and agent behaviors, the authors improve the robustness of the trained policies so that they generalize across a wide range of real-world scenarios. A minimal sketch of this idea follows.
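For illustration, here is a minimal Python sketch of episode-level domain randomization, assuming a gym-style Duckietown simulator whose constructor accepts simulation parameters. The parameter names, ranges, and the three discrete "levels" shown are hypothetical stand-ins, not the paper's actual settings.

```python
# Minimal sketch of episode-level domain randomization, assuming a
# gym-style Duckietown simulator whose constructor accepts simulation
# parameters. Names, ranges, and levels below are hypothetical.
import numpy as np

# Each level widens the sampling ranges around the nominal values.
RANDOMIZATION_LEVELS = {
    0: {"light_scale": (1.0, 1.0), "lane_marking_noise": (0.00, 0.00)},  # none
    1: {"light_scale": (0.8, 1.2), "lane_marking_noise": (0.00, 0.01)},  # mild
    2: {"light_scale": (0.5, 1.5), "lane_marking_noise": (0.00, 0.05)},  # strong
}

def sample_env_params(level: int, rng: np.random.Generator) -> dict:
    """Draw one set of simulator parameters for the next training episode."""
    ranges = RANDOMIZATION_LEVELS[level]
    return {name: rng.uniform(low, high) for name, (low, high) in ranges.items()}

# Typical use in a training loop: re-sample the parameters at every reset,
# so the policy never overfits to a single simulated appearance or dynamics.
rng = np.random.default_rng(seed=0)
for episode in range(3):
    params = sample_env_params(level=2, rng=rng)
    # env = DuckietownMultiAgentEnv(**params)  # hypothetical constructor
    print(f"episode {episode}: {params}")
```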
Learn about training, sim2real, navigation, and other robot autonomy topics with Duckietown starting from the link below!
Abstract
Autonomous Driving requires high levels of coordination and collaboration between agents. Achieving effective coordination in multi-agent systems is a difficult task that remains largely unresolved. Multi-Agent Reinforcement Learning has arisen as a powerful method to accomplish this task because it considers the interaction between agents and also allows for decentralized training, which makes it highly scalable.
However, transferring policies from simulation to the real world is a big challenge, even for single-agent applications. Multi-agent systems add additional complexities to the Sim-to-Real gap due to agent collaboration and environment synchronization.
In this paper, we propose a method to transfer multi-agent autonomous driving policies to the real world. For this, we create a multi-agent environment that imitates the dynamics of the Duckietown multi-robot testbed, and train multi-agent policies using the MAPPO algorithm with different levels of domain randomization. We then transfer the trained policies to the Duckietown testbed and show that when using our method, domain randomization can reduce the reality gap by 90%.
Moreover, we show that different levels of parameter randomization have a substantial impact on the Sim-to-Real gap. Finally, our approach achieves significantly better results than a rule-based benchmark.
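To make figures like the reported 90% concrete, one plausible (and deliberately simple) way to quantify a reality gap is the relative drop in mean episode return between simulation and the real testbed. The metric and the numbers below are illustrative assumptions, not the paper's exact definition or results.

```python
# Illustrative Sim-to-Real gap: the fraction of simulated performance
# lost when a policy is deployed on the real robots. This is one plausible
# formalization; the paper's exact metric may differ.
def reality_gap(sim_return: float, real_return: float) -> float:
    """Relative drop in mean episode return from simulation to reality."""
    return (sim_return - real_return) / abs(sim_return)

# Hypothetical numbers: a policy trained without randomization loses 60%
# of its simulated return on hardware; with randomization it loses only 6%,
# i.e. the gap shrinks by 90%.
baseline_gap   = reality_gap(sim_return=100.0, real_return=40.0)  # 0.60
randomized_gap = reality_gap(sim_return=100.0, real_return=94.0)  # 0.06
print(f"gap reduction: {1 - randomized_gap / baseline_gap:.0%}")  # -> 90%
```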
Highlights - Sim2Real Transfer of Multi-Agent Policies for Self-Driving
Here is a visual tour of the work of the authors. For more details, check out the full paper.
Conclusion - Sim2Real Transfer of Multi-Agent Policies for Self-Driving
Here are the conclusions from the authors of this paper:
“AVs will lead to enormous safety and efficiency benefits across multiple fields, once the complex problem of multiagent coordination and collaboration is solved. MARL can help towards this, as it enables agents to learn to collaborate by sharing observations and rewards.
However, the successful application of MARL is heavily dependent on the fidelity of the simulation environment the policies were trained in. We present a method to train policies using MARL and to reduce the reality gap when transferring them to the real world by adding domain randomization during training, which we show has a significant and positive impact on real performance compared to rule-based methods or policies trained without different levels of domain randomization.
It is important to mention that despite the performance improvements observed when using domain randomization, its use presents diminishing returns as seen with the overly conservative policy, for it cannot completely close the reality gap without increasing the fidelity of the simulator. Additionally, the amount of domain randomization to be used is case-specific and a theory for the selection of domain randomization remains an open question. The quantification and description of reality gaps presents another opportunity for future research.”
Project Authors
Eduardo Candela is the Co-Founder & CTO of MAIHEM (YC W24), based in California.
Leandro Parada is a Research Associate at Imperial College London, United Kingdom.
Luís Marques is a Doctoral Researcher in the Department of Robotics at the University of Michigan, USA.
Yiannis Demiris is a Professor of Human-Centred Robotics and Royal Academy of Engineering Chair in Emerging Technologies at Imperial College London, United Kingdom.
Panagiotis Angeloudis is a Reader in Transport Systems and Logistics at Imperial College London, United Kingdom.
Learn more
Duckietown is a platform for creating and disseminating robotics and AI learning experiences.
It is modular, customizable, and state-of-the-art, designed for teaching, learning, and research. From exploring the fundamentals of computer science and automation to pushing the boundaries of knowledge, Duckietown evolves with the skills of the user.