Safe Reinforcement Learning (Safe-RL) in Duckietown
Project Resources
Objective: Implement safe reinforcement learning (Safe-RL) to train Duckiebots to follow a lane while keeping the robots within the boundaries of the road.
Approach: Deep Q Learning
Authors: Jan Steinmüller, Dr. Amr Alanwar Abdelhafez
Safe-RL Duckietown Project – In his thesis titled “Safe-RL-Duckietown“, Jan Steinmüller used safe reinforcement learning to train Duckiebots to follow a lane while keeping said robots safe during training.
Safe reinforcement learning involves learning policies that maximize expected returns while maintaining reasonable system performance and respecting safety constraints throughout both the learning and deployment phases. Reinforcement learning itself is a machine learning paradigm in which agents learn to make decisions by maximizing cumulative rewards through interaction with an environment, without requiring a pre-collected training dataset or an explicit model of the environment.
The final result was a trained agent capable of following lanes while avoiding unsafe positions.
This is an open source project, and can be reproduced and improved upon through the Duckietown platform.
Process diagram of action selection and safety layer
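The loop sketched below illustrates the general idea of pairing a Deep Q-Learning agent with a safety layer. It is a minimal, hypothetical example: the `QNet` architecture, the `ACTIONS` set, the `is_safe` predicate, and all thresholds are assumptions made for illustration, not the implementation from the thesis. The Q-network proposes actions in order of preference, and the safety layer discards any candidate whose predicted pose would leave the lane.

```python
import random
import torch
import torch.nn as nn

# Hypothetical discrete action set: (left wheel speed, right wheel speed) pairs.
ACTIONS = [(0.2, 0.2), (0.3, 0.1), (0.1, 0.3), (0.0, 0.0)]

class QNet(nn.Module):
    """Small Q-network mapping a lane-pose observation to action values."""
    def __init__(self, obs_dim=3, n_actions=len(ACTIONS)):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, 64), nn.ReLU(),
            nn.Linear(64, n_actions),
        )

    def forward(self, obs):
        return self.net(obs)

def is_safe(obs, action):
    """Placeholder safety predicate: reject actions whose predicted
    lateral offset would cross the lane boundary (assumed threshold)."""
    d, phi, _ = obs.tolist()                   # lateral offset, heading, extra feature
    v_l, v_r = action
    d_pred = d + 0.1 * (v_l + v_r) / 2 * phi   # crude one-step prediction
    return abs(d_pred) < 0.15                  # assumed lane half-width in metres

def select_action(qnet, obs, epsilon=0.1):
    """Epsilon-greedy selection followed by the safety layer:
    unsafe candidates are skipped before the final choice."""
    with torch.no_grad():
        q_values = qnet(obs)
    order = torch.argsort(q_values, descending=True).tolist()
    if random.random() < epsilon:
        random.shuffle(order)
    for idx in order:                          # best-ranked action that passes the check
        if is_safe(obs, ACTIONS[idx]):
            return idx
    return ACTIONS.index((0.0, 0.0))           # fall back to stopping if nothing is safe
```

A training loop would then execute the chosen action, observe the reward, and update the Q-network as in standard Deep Q-Learning; the safety layer only filters which actions reach the wheels.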
Results of the Safe-RL Duckietown Project
Safe Reinforcement Learning: Results and Conclusions
Based on the results, it can be concluded that adding a safety layer carries no practical disadvantage for reinforcement learning, since execution times with and without the layer were very similar. Moreover, the dramatically improved safety of the vehicle helps training, because fewer actions with low or negative rewards are executed. As a result, reinforcement learning agents with safety layers learn faster and execute fewer unsafe actions.
Unfortunately, manual observation and intervention by the user were still necessary; however, their frequency was clearly reduced. This further improved learning, because the robots in testing could not tell when an outside intervention had occurred, which could otherwise cause an action to be rewarded incorrectly.
It was also concluded that this implementation did not reach perfect safety, so fully autonomous reinforcement learning training without any human intervention has not yet been achieved. Several factors were identified that could further improve safety and the recovery rate, along with some major problems that are not direct results of the reinforcement learning or the safety layer.
These problems could be addressed in different ways, for example by improving the open-source lane filter nodes or by adding more sensors or cameras to the robot to extend the input data available to the agent. Another area left untouched during this project is other vehicles in the current lane: the safety layer could be extended with features that keep the robot from hitting other vehicles, as sketched below.
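As a rough illustration of that last suggestion, a safety predicate of the kind sketched earlier could be combined with a check on the estimated distance to a leading vehicle. The function below is purely hypothetical; the observation layout, prediction step, and all thresholds are invented, and it only indicates where such a constraint would slot in.

```python
def is_safe_extended(obs, action, lead_vehicle_distance=None,
                     lane_half_width=0.15, min_gap=0.30):
    """Combine the lane-boundary check with a hypothetical following-distance
    constraint; all thresholds are illustrative, not values from the thesis."""
    d, phi, _ = obs                               # lateral offset, heading, extra feature
    v_l, v_r = action
    d_pred = d + 0.1 * (v_l + v_r) / 2 * phi      # same crude one-step prediction as before
    within_lane = abs(d_pred) < lane_half_width
    clear_ahead = (lead_vehicle_distance is None
                   or lead_vehicle_distance > min_gap + 0.1 * (v_l + v_r) / 2)
    return within_lane and clear_ahead
```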
Jan Steinmüller is a computer science student working in the computer networks and information security research group at Hochschule Bremen in Germany.
Duckietown is a modular, customizable and state-of-the-art platform for creating and disseminating robotics and AI learning experiences.
It is designed to teach, learn, and do research: from exploring the fundamentals of computer science and automation to pushing the boundaries of knowledge.
Project cSLAM – Simultaneous Localization and Mapping (SLAM) is a successful approach for robots to estimate their position and orientation in the world they operate in, while at the same time creating a representation of their surroundings.
This project, centralized SLAM (or cSLAM), enables a Duckiebot to localize itself, while the watchtowers and Duckiebots work together to build a map of the city. The task is achieved by using the camera of the Duckiebot, together with watchtowers located along the path, to detect AprilTags attached to the tiles, the traffic signs, and the Duckiebot itself.
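The tag detection step can be sketched as follows. This is a minimal illustration assuming the dt-apriltags Python bindings; the camera intrinsics and tag size are made-up placeholder values (real ones come from the robot's calibration), and it is not the project's actual detection node.

```python
import cv2
from dt_apriltags import Detector  # Duckietown's AprilTag bindings (assumed available)

detector = Detector(families="tag36h11")

# Illustrative camera intrinsics (fx, fy, cx, cy) and tag size in metres.
CAMERA_PARAMS = (305.0, 305.0, 320.0, 240.0)
TAG_SIZE = 0.065

def detect_tags(image_path):
    """Detect AprilTags in a grayscale frame and return (id, translation) pairs."""
    gray = cv2.imread(image_path, cv2.IMREAD_GRAYSCALE)
    tags = detector.detect(
        gray,
        estimate_tag_pose=True,
        camera_params=CAMERA_PARAMS,
        tag_size=TAG_SIZE,
    )
    return [(t.tag_id, t.pose_t.ravel()) for t in tags]
```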
P.S. Anatidaephilia is Latin for loving, and being addicted to, the idea that somewhere, somehow, a duck is watching you.
cSLAM algorithm physical architecture for localizing Duckiebots. Watchtowers are traffic lights without lights, and are used to transform Duckietowns into Autolabs.
Duckietown cSLAM logical architecture for merging sensor measurements from robots and the city.
AprilTag detections from Watchtowers and Duckiebots are harmonized by solving a minimization problem.
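Conceptually, each detection constrains the relative pose between an observer (watchtower or Duckiebot) and a tag, and the optimizer finds the poses that best satisfy all constraints at once. The snippet below is a deliberately simplified 2D least-squares sketch of that idea, not the project's actual optimizer; the watchtower positions, measured offsets, and problem size are invented for illustration.

```python
import numpy as np
from scipy.optimize import least_squares

# Known watchtower positions (x, y) and the relative offsets each one measured
# to the same Duckiebot (all values invented for illustration).
towers = np.array([[0.0, 0.0], [2.0, 0.0], [0.0, 2.0]])
measured_offsets = np.array([[1.02, 0.98], [-1.05, 1.01], [0.99, -0.97]])

def residuals(pose):
    """For each watchtower, compare the offset implied by the candidate pose
    with the offset that the watchtower actually measured."""
    predicted_offsets = pose - towers
    return (predicted_offsets - measured_offsets).ravel()

# Minimize the stacked residuals to get the pose most consistent with all detections.
result = least_squares(residuals, np.zeros(2))
print("fused Duckiebot position:", result.x)
```

In the full system the same principle applies to many tags, robots, and watchtowers at once, with full 3D poses and timestamps instead of a single 2D point.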
RViz reconstruction of experimental localization outcomes.
cSLAM Autolab watchtowers network diagnostic
cSLAM Project Results
(Turn on the sound for best experience!)
This work developed into a paper; check the article here.