Reinforcement Learning for the Control of Autonomous Robots

Project Resources

Objective: Develop and evaluate reinforcement learning (RL) techniques for safe and autonomous navigation in Duckietown.
Approach: Develop, train and test RL algorithms including Deep Q-Networks (DQN), Deep Deterministic Policy Gradient (DDPG), and Proximal Policy Optimization (PPO), for autonomous lane-keeping and obstacle detection on a DB21 Duckiebot.
Authors: Bruno Fournier, Sébastien Biner

RL on Duckiebots - Project highlights

Here is a visual tour of the authors’ work on implementing reinforcement learning in Duckietown.

Diagram illustrating the basic principle of reinforcement learning applied to autonomous driving, showing an agent interacting with the environment, making decisions based on rewards and feedback. — Figure 1. Principle of Reinforcement Learning in Autonomous Driving.

Diagram showing the application of reinforcement learning in the Duckietown environment, with a Duckiebot navigating simulated roadways based on RL feedback. — Figure 2. Reinforcement Learning in the Duckietown Environment.

Comparison diagram showing the differences between Q-learning and Deep Q-Networks (DQN). — Figure 3. Q-learning vs. Deep Q-Networks (DQN).

Diagram depicting the learning process with the Deep Q-Network (DQN) model, showing how actions are taken based on state inputs and updated using Q-value estimations. — Figure 4. Learning Process with the Deep Q-Network (DQN) Model.

Diagram illustrating the architecture of the Deep Deterministic Policy Gradient (DDPG) algorithm, highlighting the actor and critic networks, experience replay, and target networks. — Figure 5. Architecture of the Deep Deterministic Policy Gradient (DDPG) Algorithm.

Image of a simulation environment in Duckietown, displaying a virtual map with roads, intersections, and a Duckie. — Figure 6. Simulation Environment in Duckietown.

Diagram of the Duckiebot test track with modular square elements, including straight lines, right-angle turns, and intersections, illuminated by two Walimex Pro LED lamps. — Figure 7. Modular Test Track for Duckiebot Driving Tests.

Diagram illustrating the Duckiebot’s reward factors: the distance from the center of the lane (lane_d) and the angle relative to the lane’s centerline (lane_θ), used in the DQN reward function. — Figure 8. Reward Factors for DQN in Duckiebot Navigation.

Side-by-side images showing line detection in Duckietown before and after HSV parameter correction, illustrating improved clarity and accuracy of detected lines. — Figure 9. Line Detection Improvement with HSV Parameter Correction.

Diagram illustrating the structure of the PA2 DQN model, showing the pre-processing of RGB images before they are fed into the neural network for reinforcement learning. — Figure 10. DQN Model Structure.

Aerial view of the "loop_empty" training map used for DQN model training, featuring straight sections and both left and right turns. — Figure 11. Training Map for DQN.

Illustration highlighting the differences between simulation and reality in the context of Duckietown, including variations in color tones, camera angles, and environmental objects. — Figure 12. Differences Between Simulation and Reality.

Graph showing the average reward and average episode length for the DQN model in PA2 over multiple training episodes. — Figure 13. DQN (PA2): Average Reward and Average Episode Length.

Graph showing episode-based rewards during the first phase of DDPG training. — Figure 14. DDPG Training 1: Episode-Based Rewards.

Graph displaying episode-based rewards during the second phase of DDPG training. — Figure 15. DDPG Training 2: Episode-Based Rewards.

Graph showing the average rewards achieved during the training process. — Figure 16. Average Rewards During Training.

Graph showing the average distance traveled during each episode throughout the training. — Figure 17. Average Distance Traveled During Episodes.

Graph showing the evolution of the agent's speed throughout the training process. — Figure 18. Evolution of the Agent's Speed.

Graph showing the average reward and average episode length during Trial 1 of training. — Figure 19. Trial 1: Average Reward and Average Episode Length.

Graph showing the average reward and average episode length during Trial 2 of training. — Figure 20. Trial 2: Average Reward and Average Episode Length.

Graph showing the average reward and average episode length during Trial 3 of training. — Figure 21. Trial 3: Average Reward and Average Episode Length.

Graph showing the average reward and average episode length during Trial 4 of training. — Figure 22. Trial 4: Average Reward and Average Episode Length.

Visualization of the agent's trajectory on the evaluation track during testing. — Figure 23. Agent Trajectory on the Evaluation Track.

Graph showing the average reward and average episode length during Trial 5 of training. — Figure 24. Trial 5: Average Reward and Average Episode Length.

Graph showing the average reward and average episode length during Trial 6 of training. — Figure 25. Trial 6: Average Reward and Average Episode Length.

Visualization of the robot's trajectory as it negotiates a bend on the track. — Figure 26. Trajectory Taken by the Robot to Negotiate a Bend.

Graph showing the evolution of the safety factor throughout the training process. — Figure 27. Evolution of the Safety Factor.

Why reinforcement learning for the control of Duckiebots in Duckietown?

This thesis explores the use of reinforcement learning (RL) techniques to enable autonomous navigation in Duckietown. Reinforcement learning is a type of machine learning in which an agent learns to make decisions by performing actions in an environment and receiving feedback in the form of rewards or penalties. The agent's goal is to maximize its long-term (cumulative) reward.
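To make the idea of rewards and long-term return concrete, here is a minimal sketch in Python. It is not the authors' reward function: it only assumes the two reward factors named in Figure 8 (distance from the lane center and angle relative to the centerline) and combines them in a hypothetical way. The names `lane_reward` and `discounted_return`, the limits `max_d` and `max_theta`, and the discount factor `gamma` are all illustrative assumptions.

```python
import math
from typing import Sequence


def lane_reward(lane_d: float, lane_theta: float,
                max_d: float = 0.1, max_theta: float = math.pi / 4) -> float:
    """Hypothetical per-step reward built from the two factors in Figure 8.

    lane_d     : signed distance from the lane center (meters, assumed unit)
    lane_theta : signed angle to the lane centerline (radians, assumed unit)
    Returns 1.0 when centered and aligned, and a fixed penalty of -1.0
    when either factor exceeds its (assumed) limit.
    """
    if abs(lane_d) > max_d or abs(lane_theta) > max_theta:
        return -1.0  # assumed penalty for leaving the lane or turning too far
    d_term = 1.0 - abs(lane_d) / max_d
    theta_term = 1.0 - abs(lane_theta) / max_theta
    return 0.5 * (d_term + theta_term)


def discounted_return(rewards: Sequence[float], gamma: float = 0.99) -> float:
    """Cumulative discounted reward: r_0 + gamma * r_1 + gamma^2 * r_2 + ..."""
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
    return g
```

An agent that maximizes `discounted_return` over an episode is rewarded for staying centered and aligned over many steps, not just for the next one. The weighting of the two terms and the size of the penalty are design choices that would need tuning in practice.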

This work focuses on implementing and comparing several RL algorithms, specifically Deep Q-Network (DQN), Deep Deterministic Policy Gradient (DDPG), and Proximal Policy Optimization (PPO), to analyze their performance in autonomous navigation. RL enables agents to learn behaviors by interacting with their environment and adapting to dynamic conditions. The PPO model demonstrated smooth driving while using grayscale images for improved computational efficiency.
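The efficiency gain from grayscale input can be illustrated with a short preprocessing sketch. This is an assumption-laden example, not the project's pipeline: the function name `to_grayscale`, the luminance weights (the common ITU-R BT.601 values), and the downsampling `stride` are all chosen here for illustration, and the source does not state the actual input resolution.

```python
import numpy as np


def to_grayscale(rgb: np.ndarray, stride: int = 4) -> np.ndarray:
    """Convert an H x W x 3 uint8 RGB frame to a smaller grayscale image.

    The result is float32 in [0, 1]. Subsampling every `stride`-th pixel
    is a crude (assumed) way to shrink the observation further.
    """
    # BT.601 luminance weights; an assumption, other weightings are possible.
    weights = np.array([0.299, 0.587, 0.114], dtype=np.float32)
    gray = rgb.astype(np.float32) @ weights / 255.0  # shape: H x W
    return gray[::stride, ::stride]
```

For a hypothetical 480 x 640 RGB frame, this turns 921,600 input values into a 120 x 160 image with 19,200 values, which is the kind of reduction that makes a policy network cheaper to evaluate on limited hardware.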

Another feature of this project is the integration of YOLO v5, an object detection model, which allowed the Duckiebot to recognize and stop for obstacles, improving its safety capabilities. This integration of perception and RL enabled the Duckiebot not only to follow lanes but also to navigate autonomously, making ‘real-time’ adjustments based on its surroundings.
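One simple way to combine an object detector with an RL policy is to let the detector veto the policy's action. The sketch below shows that idea only; it does not use the YOLO v5 API and is not the authors' integration. The `Detection` structure, the `gate_action` function, the label `"duckie"`, and the thresholds `min_conf` and `near_y` are hypothetical names and values introduced for illustration.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple

Action = Tuple[float, float]  # assumed (linear velocity, steering) pair
STOP_ACTION: Action = (0.0, 0.0)


@dataclass(frozen=True)
class Detection:
    """Hypothetical, detector-agnostic summary of one detected object."""
    label: str
    confidence: float
    y_bottom: float  # bottom edge of the box, normalized: 0 = top, 1 = bottom


def gate_action(action: Action,
                detections: Iterable[Detection],
                min_conf: float = 0.5,
                near_y: float = 0.6,
                obstacle_labels: Tuple[str, ...] = ("duckie",)) -> Action:
    """Return STOP_ACTION if a confident, nearby obstacle is detected.

    A box whose bottom edge is low in the image is treated as "near";
    this is a rough assumption that holds for a forward-facing camera.
    Otherwise the RL policy's action is passed through unchanged.
    """
    for det in detections:
        if (det.label in obstacle_labels
                and det.confidence >= min_conf
                and det.y_bottom >= near_y):
            return STOP_ACTION
    return action
```

Keeping the stop rule outside the learned policy means the safety behavior does not depend on what the policy happened to learn, at the cost of a hand-tuned threshold for what counts as "near".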

By transferring trained models from simulation to physical Duckiebots (Sim2Real), the thesis evaluates the feasibility of applying these models to real-world autonomous driving scenarios. This work showcases how reinforcement learning and object detection can be combined to advance the development of safe, autonomous navigation systems, providing insights that could eventually be adapted for full-scale vehicles.

Reinforcement learning for the control of Duckiebots in Duckietown - the challenges

Implementing reinforcement learning in this project posed a number of challenges, summarized below:

Transfer from Simulation to Reality (Sim2Real): Models trained in simulations often encountered difficulties when applied to real-world Duckiebots, requiring adjustments for accurate and stable performance.
Computational Constraints: Limited processing power on the Duckiebots made it challenging to run complex RL models and object detection algorithms simultaneously.
Stability and Safety of Learning Models: Guaranteeing that the Duckiebot’s actions were safe and did not lead to erratic behaviors or collisions required fine-tuning and extensive testing of the RL algorithms.
Obstacle Detection and Avoidance: Integrating YOLO v5 for obstacle detection posed challenges in ensuring smooth integration with RL, as both systems needed to work harmoniously for obstacle avoidance.

These challenges were addressed through algorithm optimization, iterative model testing, and adjustments to the hyperparameters.

Reinforcement learning for the control of Duckiebots in Duckietown: Results

Reinforcement learning for the control of Duckiebots in Duckietown: Authors

Bruno Fournier is currently pursuing a Master of Science in Engineering, Data Science at the HES-SO Haute école spécialisée de Suisse occidentale, Switzerland.

Sébastien Biner is currently pursuing a Bachelor of Science in Automotive and Vehicle Technology at the Berner Fachhochschule BFH, Switzerland.

Learn more

Duckietown is a modular, customizable and state-of-the-art platform for creating and disseminating robotics and AI learning experiences.

It is designed to teach, learn, and do research: from exploring the fundamentals of computer science and automation to pushing the boundaries of knowledge.