General Information
- Transferring Multi-Agent Reinforcement Learning Policies for Autonomous Driving using Sim-to-Real
- Eduardo Candela, Leandro Parada, Luis Marques, Tiberiu-Andrei Georgescu, Yiannis Demiris, Panagiotis Angeloudis
- Imperial College London, United Kingdom
- E. Candela, L. Parada, L. Marques, T.-A. Georgescu, Y. Demiris and P. Angeloudis, "Transferring Multi-Agent Reinforcement Learning Policies for Autonomous Driving using Sim-to-Real," 2022 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Kyoto, Japan, 2022, pp. 8814-8820, doi: 10.1109/IROS47612.2022.9981319.
Sim2Real Transfer of Multi-Agent Policies for Self-Driving
In the field of autonomous driving, transferring policies from simulation to the real world (Sim-to-Real transfer, or Sim2Real) is highly desirable, as it is much faster and more cost-effective to train agents in simulation than in the real world.
Since simulations are just that, representations of the real world, it always remains an open question whether the trained policies will perform well enough once deployed. This challenge is known as the “Sim-to-Real gap”.
This gap is especially pronounced in Multi-Agent Reinforcement Learning (MARL), where agent collaboration and environment synchronization significantly complicate policy transfer.
The authors of this work propose combining Multi-Agent Proximal Policy Optimization (MAPPO) with domain randomization to create a robust pipeline for training MARL policies that are not only effective in simulation but also transfer to real-world conditions.
By applying varying levels of parameter randomization, such as altering lighting conditions, lane markings, and agent behaviors, the authors improve the robustness of the trained policies so that they generalize across a wide range of real-world scenarios. A minimal sketch of this idea follows.
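For illustration, here is a minimal Python sketch of episode-level domain randomization, assuming a gym-style Duckietown simulator whose constructor accepts simulation parameters. The parameter names, ranges, and the three discrete "levels" shown are hypothetical stand-ins, not the paper's actual settings.

```python
# Minimal sketch of episode-level domain randomization, assuming a
# gym-style Duckietown simulator whose constructor accepts simulation
# parameters. Names, ranges, and levels below are hypothetical.
import numpy as np

# Each level widens the sampling ranges around the nominal values.
RANDOMIZATION_LEVELS = {
    0: {"light_scale": (1.0, 1.0), "lane_marking_noise": (0.00, 0.00)},  # none
    1: {"light_scale": (0.8, 1.2), "lane_marking_noise": (0.00, 0.01)},  # mild
    2: {"light_scale": (0.5, 1.5), "lane_marking_noise": (0.00, 0.05)},  # strong
}

def sample_env_params(level: int, rng: np.random.Generator) -> dict:
    """Draw one set of simulator parameters for the next training episode."""
    ranges = RANDOMIZATION_LEVELS[level]
    return {name: rng.uniform(low, high) for name, (low, high) in ranges.items()}

# Typical use in a training loop: re-sample the parameters at every reset,
# so the policy never overfits to a single simulated appearance or dynamics.
rng = np.random.default_rng(seed=0)
for episode in range(3):
    params = sample_env_params(level=2, rng=rng)
    # env = DuckietownMultiAgentEnv(**params)  # hypothetical constructor
    print(f"episode {episode}: {params}")
```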
Learn about training, sim2real, navigation, and other robot autonomy topics with Duckietown starting from the link below!
Abstract
Autonomous Driving requires high levels of coordination and collaboration between agents. Achieving effective coordination in multi-agent systems is a difficult task that remains largely unresolved. Multi-Agent Reinforcement Learning has arisen as a powerful method to accomplish this task because it considers the interaction between agents and also allows for decentralized training, which makes it highly scalable.
However, transferring policies from simulation to the real world is a big challenge, even for single-agent applications. Multi-agent systems add additional complexities to the Sim-to-Real gap due to agent collaboration and environment synchronization.
In this paper, we propose a method to transfer multi-agent autonomous driving policies to the real world. For this, we create a multi-agent environment that imitates the dynamics of the Duckietown multi-robot testbed, and train multi-agent policies using the MAPPO algorithm with different levels of domain randomization. We then transfer the trained policies to the Duckietown testbed and show that when using our method, domain randomization can reduce the reality gap by 90%.
Moreover, we show that different levels of parameter randomization have a substantial impact on the Sim-to-Real gap. Finally, our approach achieves significantly better results than a rule-based benchmark.
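To make figures like the reported 90% concrete, one plausible (and deliberately simple) way to quantify a reality gap is the relative drop in mean episode return between simulation and the real testbed. The metric and the numbers below are illustrative assumptions, not the paper's exact definition or results.

```python
# Illustrative Sim-to-Real gap: the fraction of simulated performance
# lost when a policy is deployed on the real robots. This is one plausible
# formalization; the paper's exact metric may differ.
def reality_gap(sim_return: float, real_return: float) -> float:
    """Relative drop in mean episode return from simulation to reality."""
    return (sim_return - real_return) / abs(sim_return)

# Hypothetical numbers: a policy trained without randomization loses 60%
# of its simulated return on hardware; with randomization it loses only 6%,
# i.e. the gap shrinks by 90%.
baseline_gap   = reality_gap(sim_return=100.0, real_return=40.0)  # 0.60
randomized_gap = reality_gap(sim_return=100.0, real_return=94.0)  # 0.06
print(f"gap reduction: {1 - randomized_gap / baseline_gap:.0%}")  # -> 90%
```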
Highlights - Sim2Real Transfer of Multi-Agent Policies for Self-Driving
Here is a visual tour of the work of the authors. For more details, check out the full paper.
Conclusion - Sim2Real Transfer of Multi-Agent Policies for Self-Driving
Here are the conclusions from the authors of this paper:
“AVs will lead to enormous safety and efficiency benefits across multiple fields, once the complex problem of multiagent coordination and collaboration is solved. MARL can help towards this, as it enables agents to learn to collaborate by sharing observations and rewards.
However, the successful application of MARL is heavily dependent on the fidelity of the simulation environment the policies were trained in. We present a method to train policies using MARL and to reduce the reality gap when transferring them to the real world by adding domain randomization during training, which we show has a significant and positive impact on real performance compared to rule-based methods or policies trained without different levels of domain randomization.
It is important to mention that despite the performance improvements observed when using domain randomization, its use presents diminishing returns as seen with the overly conservative policy, for it cannot completely close the reality gap without increasing the fidelity of the simulator. Additionally, the amount of domain randomization to be used is case-specific and a theory for the selection of domain randomization remains an open question. The quantification and description of reality gaps presents another opportunity for future research.”
Project Authors
Eduardo Candela is the Co-Founder & CTO of MAIHEM (YC W24), based in California.
Leandro Parada is a Research Associate at Imperial College London, United Kingdom.
Luís Marques is a Doctoral Researcher in the Department of Robotics at the University of Michigan, USA.
Yiannis Demiris is a Professor of Human-Centred Robotics and Royal Academy of Engineering Chair in Emerging Technologies at Imperial College London, United Kingdom.
Panagiotis Angeloudis is a Reader in Transport Systems and Logistics at Imperial College London, United Kingdom.
Learn more
Duckietown is a platform for creating and disseminating robotics and AI learning experiences.
It is modular, customizable, and state-of-the-art, designed for teaching, learning, and research. From exploring the fundamentals of computer science and automation to pushing the boundaries of knowledge, Duckietown evolves with the skills of the user.