Sim2Real Transfer of Multi-Agent Policies for Self-Driving

General Information

Flowchart of the step update loop in the Duckie-MAAD architecture: agent action, path following, wheel velocity calculation, pose estimation, and policy update during multi-agent reinforcement learning (MARL) training.
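In rough pseudocode, one iteration of this loop could look like the following Python sketch. Every name in it (env, path_follower, wheel_velocities, and so on) is a hypothetical placeholder, not the authors' actual Duckie-MAAD API:

```python
# Hypothetical sketch of the step update loop from the figure; all names
# are placeholders, not the actual Duckie-MAAD implementation.
def step_update(env, agents, policies):
    """One MARL training step: act, follow path, drive wheels, estimate pose, learn."""
    transitions = []
    for agent in agents:
        obs = env.observe(agent)                              # local observation
        action = policies[agent.id].act(obs)                  # agent action
        waypoint = agent.path_follower.next_waypoint(action)  # path following
        v_left, v_right = agent.wheel_velocities(waypoint)    # wheel velocity calculation
        env.apply_wheel_velocities(agent, v_left, v_right)
        pose = env.estimate_pose(agent)                       # pose estimation
        reward = env.reward(agent, pose)
        transitions.append((agent.id, obs, action, reward))
    for agent in agents:
        policies[agent.id].update(transitions)                # policy update
```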

In the field of autonomous driving, transferring policies from simulation to the real world (Sim-to-Real transfer, or Sim2Real) is highly desirable: training agents in simulation is much faster and more cost-effective than training them in the real world.

Because simulations are just that, i.e., approximate representations of the real world, whether the trained policies will actually perform well enough in the real world remains an open question. This challenge is known as the "Sim-to-Real gap".

This gap is especially pronounced in Multi-Agent Reinforcement Learning (MARL), where agent collaboration and environmental synchronization significantly complicate policy transfer.

The authors of this work propose employing Multi-Agent Proximal Policy Optimization (MAPPO) in conjunction with domain randomization techniques to create a robust pipeline for training MARL policies that is not only effective in simulation but also adaptable to real-world conditions.
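MAPPO follows the centralized-training, decentralized-execution pattern: each agent acts from its own local observation, while a centralized value function sees the joint observation of all agents during training. The PyTorch sketch below shows only that structure; the layer sizes and the discrete action head are illustrative assumptions, not details from the paper.

```python
import torch
import torch.nn as nn

class DecentralizedActor(nn.Module):
    """Each agent selects actions from its own local observation (execution)."""
    def __init__(self, obs_dim: int, n_actions: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, 64), nn.Tanh(),
            nn.Linear(64, n_actions),
        )

    def forward(self, obs: torch.Tensor) -> torch.distributions.Categorical:
        # Discrete action head for simplicity; continuous control would use
        # a Gaussian head instead.
        return torch.distributions.Categorical(logits=self.net(obs))

class CentralizedCritic(nn.Module):
    """The critic values the joint observation of all agents (training only)."""
    def __init__(self, joint_obs_dim: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(joint_obs_dim, 64), nn.Tanh(),
            nn.Linear(64, 1),
        )

    def forward(self, joint_obs: torch.Tensor) -> torch.Tensor:
        return self.net(joint_obs).squeeze(-1)
```

At deployment time only the actors are needed, which is what makes execution decentralized and the approach scalable.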

By varying levels of parameter randomization, such as altering lighting conditions, lane markings, and agent behaviors, the authors enhance the robustness of the trained policies, helping them generalize across a wide range of real-world scenarios (see the sketch below).
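Concretely, domain randomization re-samples environment parameters at the start of each training episode, so a policy never overfits to a single simulator configuration. Below is a minimal Python sketch of what graded randomization "levels" could look like; the parameter names, levels, and ranges are invented for illustration and are not the values used in the paper.

```python
import random

# Illustrative randomization levels; all names and ranges here are
# hypothetical, not the paper's actual configuration.
RANDOMIZATION_LEVELS = {
    "none": {"light_scale": (1.0, 1.0), "lane_noise": 0.00, "speed_jitter": 0.00},
    "low":  {"light_scale": (0.9, 1.1), "lane_noise": 0.01, "speed_jitter": 0.05},
    "high": {"light_scale": (0.6, 1.4), "lane_noise": 0.05, "speed_jitter": 0.20},
}

def sample_episode_params(level: str) -> dict:
    """Draw a fresh set of environment parameters for one training episode."""
    cfg = RANDOMIZATION_LEVELS[level]
    lo, hi = cfg["light_scale"]
    return {
        "light_scale": random.uniform(lo, hi),                # lighting conditions
        "lane_offset": random.gauss(0.0, cfg["lane_noise"]),  # lane marking placement
        "speed_gain": 1.0 + random.uniform(-cfg["speed_jitter"],
                                           cfg["speed_jitter"]),  # agent behavior
    }
```

The higher the level, the wider the distribution of environments the policy must cope with during training, trading a little simulation performance for robustness in the real world.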

Learn about training, sim2real, navigation, and other robot autonomy topics with Duckietown starting from the link below!

Abstract

Autonomous Driving requires high levels of coordination and collaboration between agents. Achieving effective coordination in multi-agent systems is a difficult task that remains largely unresolved. Multi-Agent Reinforcement Learning has arisen as a powerful method to accomplish this task because it considers the interaction between agents and also allows for decentralized training—which makes it highly scalable. 

However, transferring policies from simulation to the real world is a big challenge, even for single-agent applications. Multi-agent systems add additional complexities to the Sim-to-Real gap due to agent collaboration and environment synchronization. 

In this paper, we propose a method to transfer multi-agent autonomous driving policies to the real world. For this, we create a multi-agent environment that imitates the dynamics of the Duckietown multi-robot testbed, and train multi-agent policies using the MAPPO algorithm with different levels of domain randomization. We then transfer the trained policies to the Duckietown testbed and show that when using our method, domain randomization can reduce the reality gap by 90%. 

Moreover, we show that different levels of parameter randomization have a substantial impact on the Sim-to-Real gap. Finally, our approach achieves significantly better results than a rule-based benchmark.
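As a back-of-the-envelope illustration of the 90% figure quoted above, suppose the reality gap is measured as the drop in some performance metric between simulation and the real testbed. The numbers below are hypothetical and only show the arithmetic, not the paper's actual results:

```python
def reality_gap(perf_sim: float, perf_real: float) -> float:
    """Performance drop when moving a policy from simulation to reality."""
    return abs(perf_sim - perf_real)

# Hypothetical numbers, for illustration only:
gap_baseline = reality_gap(perf_sim=1.00, perf_real=0.50)  # 0.50 without randomization
gap_dr       = reality_gap(perf_sim=1.00, perf_real=0.95)  # 0.05 with randomization
reduction = 1.0 - gap_dr / gap_baseline                    # 0.90
print(f"Reality gap reduced by {reduction:.0%}")           # -> Reality gap reduced by 90%
```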


Highlights - Sim2Real Transfer of Multi-Agent Policies for Self-Driving

Here is a visual tour of the work of the authors. For more details, check out the full paper.


Conclusion - Sim2Real Transfer of Multi-Agent Policies for Self-Driving

Here are the conclusions from the authors of this paper:

“AVs will lead to enormous safety and efficiency benefits across multiple fields, once the complex problem of multiagent coordination and collaboration is solved. MARL can help towards this, as it enables agents to learn to collaborate by sharing observations and rewards. 

However, the successful application of MARL is heavily dependent on the fidelity of the simulation environment the policies were trained in. We present a method to train policies using MARL and to reduce the reality gap when transferring them to the real world by adding domain randomization during training, which we show has a significant and positive impact on real-world performance compared to rule-based methods or policies trained without domain randomization. 

It is important to mention that despite the performance improvements observed when using domain randomization, its use presents diminishing returns as seen with the overly conservative policy, for it cannot completely close the reality gap without increasing the fidelity of the simulator. Additionally, the amount of domain randomization to be used is case-specific and a theory for the selection of domain randomization remains an open question. The quantification and description of reality gaps presents another opportunity for future research.”

Project Authors

Eduardo Candela

Eduardo Candela is currently working as the Co-Founder & CTO of MAIHEM (YC W24), California.

Leandro Parada

Leandro Parada is a Research Associate at Imperial College London, United Kingdom.

 

Luís Marques

Luís Marques is a Doctoral Researcher in the Department of Robotics at the University of Michigan, USA.

Tiberiu Andrei Georgescu

Tiberiu Andrei Georgescu is a Doctoral Researcher at Imperial College London, United Kingdom.

Yiannis Demiris

Yiannis Demiris is a Professor of Human-Centred Robotics and Royal Academy of Engineering Chair in Emerging Technologies at Imperial College London, United Kingdom.

Panagiotis Angeloudis

Panagiotis Angeloudis is a Reader in Transport Systems and Logistics at Imperial College London, United Kingdom.


Learn more

Duckietown is a platform for creating and disseminating robotics and AI learning experiences.

It is modular, customizable and state-of-the-art, and designed to teach, learn, and do research. From exploring the fundamentals of computer science and automation to pushing the boundaries of knowledge, Duckietown evolves with the skills of the user.