Interpretable Reinforcement Learning for Visual Policies

General Information

Reinforcement Learning (RL) has been used to solve complex problems, especially those involving visual perception in robotics. An outstanding challenge is allowing humans to make sense of the agent's decision-making process, which is a prerequisite for deployment in safety-critical applications such as autonomous driving. This work focuses on the problem of interpretable reinforcement learning in vision-based agents.

In particular, this research introduces a self-supervised framework for interpretable reinforcement learning in vision-based agents. The focus lies in enhancing policy interpretability by generating precise attention maps through Self-Supervised Attention Mechanisms (SSAM). 

The method does not rely on external labels and works using data generated by a pretrained RL agent. A self-supervised interpretable network (SSINet) is deployed to identify task-relevant visual features. The approach is evaluated across multiple environments, including Atari and Duckietown. 

Key components of the method include:

  • A two-stage training process using pretrained policies and frozen encoders
  • Attention masks optimized using behavior resemblance and sparsity constraints
  • Quantitative evaluation using FOR and BER metrics for attention quality
  • Comparative analysis with gradient and perturbation-based saliency methods
  • Application across various architectures and RL algorithms including PPO, SAC, and TD3
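
To make the two training signals in the list above concrete, here is a minimal sketch of a mask objective. It assumes a frozen policy that maps an observation to an action vector and a mask with values in [0, 1]; behavior resemblance is approximated as the mean squared difference between the actions taken on the original and on the masked input, and sparsity as the mean absolute mask value. The function name `mask_objective`, the weight `sparsity_weight`, and the random linear "policy" are illustrative assumptions, not the authors' exact loss or the SSINet architecture.

```python
import numpy as np


def mask_objective(policy, obs, mask, sparsity_weight=0.01):
    """Behavior-resemblance + sparsity objective for an attention mask (illustrative sketch)."""
    masked_obs = obs * mask  # keep only the pixels the mask attends to
    # Behavior resemblance: the frozen policy should act the same on masked and full inputs.
    resemblance = np.mean((policy(obs) - policy(masked_obs)) ** 2)
    # Sparsity: an L1-style penalty pushes the mask to attend to as few pixels as possible.
    sparsity = np.mean(np.abs(mask))
    return float(resemblance + sparsity_weight * sparsity)


# Hypothetical frozen policy: a fixed random linear map from an 84x84 frame to 3 action values.
rng = np.random.default_rng(0)
weights = rng.normal(size=(3, 84 * 84))


def policy(observation):
    return weights @ observation.ravel()


obs = rng.random((84, 84))
full_mask = np.ones_like(obs)
empty_mask = np.zeros_like(obs)
print(mask_objective(policy, obs, full_mask))   # only the sparsity penalty remains
print(mask_objective(policy, obs, empty_mask))  # no sparsity cost, but behavior diverges
```

In a real setup the mask would be produced by a trainable network and this objective minimized by gradient descent while the policy stays frozen; the sketch only evaluates the objective for two fixed masks.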

The proposed approach isolates relevant decision-making cues, offering insight into the agent's reasoning. In Duckietown, the framework demonstrates how visual interpretability can help diagnose performance bottlenecks and agent failures, pointing toward a scalable approach to interpretable reinforcement learning in autonomous navigation systems.

Highlights - interpretable reinforcement learning for visual policies

Here is a visual tour of the implementation of interpretable reinforcement learning for visual policies by the authors. For all the details, check out the full paper.

Abstract

Here is the abstract of the work, directly in the words of the authors:

Deep reinforcement learning (RL) has recently led to many breakthroughs on a range of complex control tasks. However, the agent’s decision-making process is generally not transparent. The lack of interpretability hinders the applicability of RL in safety-critical scenarios. While several methods have attempted to interpret vision-based RL, most come without detailed explanation for the agent’s behavior. In this paper, we propose a self-supervised interpretable framework, which can discover interpretable features to enable easy understanding of RL agents even for non-experts. Specifically, a self-supervised interpretable network (SSINet) is employed to produce fine-grained attention masks for highlighting task-relevant information, which constitutes most evidence for the agent’s decisions. We verify and evaluate our method on several Atari 2600 games as well as Duckietown, which is a challenging self-driving car simulator environment. The results show that our method renders empirical evidences about how the agent makes decisions and why the agent performs well or badly, especially when transferred to novel scenes. Overall, our method provides valuable insight into the internal decision-making process of vision-based RL. In addition, our method does not use any external labelled data, and thus demonstrates the possibility to learn high-quality mask through a self-supervised manner, which may shed light on new paradigms for label-free vision learning such as self-supervised segmentation and detection.

Conclusion - interpretable reinforcement learning for visual policies

Here is the conclusion according to the authors of this paper:

In this paper, we addressed the growing demand for human-interpretable vision-based RL from a fresh perspective. To that end, we proposed a general self-supervised interpretable framework, which can discover interpretable features for easily understanding the agent’s decision-making process. Concretely, a self-supervised interpretable network (SSINet) was employed to produce high-resolution and sharp attention masks for highlighting task-relevant information, which constitutes most evidence for the agent’s decisions. Then, our method was applied to render empirical evidences about how the agent makes decisions and why the agent performs well or badly, especially when transferred to novel scenes. Overall, our work takes a significant step towards interpretable vision-based RL. Moreover, our method exhibits several appealing benefits. First, our interpretable framework is applicable to any RL model taking as input visual images. Second, our method does not use any external labelled data. Finally, we emphasize that our method demonstrates the possibility to learn high-quality mask through a self-supervised manner, which provides an exciting avenue for applying RL to self automatically labelling and label-free vision learning such as self-supervised segmentation and detection.

Did this work spark your curiosity?

Project Authors

Wenjie Shi received the BS degree from the School of Hydropower and Information Engineering, Huazhong University of Science and Technology, Wuhan, China, in 2016. He is currently working toward the Ph.D. degree in control science and engineering from the Department of Automation, Institute of Industrial Intelligence and Systems, Tsinghua University, Beijing, China.

Gao Huang (Member, IEEE) received the B.S. degree in automation from Beihang University, Beijing, China, in 2009, and the Ph.D. degree in automation from Tsinghua University, Beijing, in 2015. He is currently an Associate Professor with the Department of Automation, Tsinghua University.

Shiji Song (Senior Member, IEEE) received the Ph.D. degree in mathematics from the Department of Mathematics, Harbin Institute of Technology, Harbin, China, in 1996. He is currently a Professor at the Department of Automation, Tsinghua University, Beijing, China.

Zhuoyuan Wang (IEEE) is currently a Ph.D. student at Carnegie Mellon University, and holds a B.S. degree in control science and engineering from the Department of Automation, Tsinghua University, Beijing, China.

Tingyu Lin received the B.S. degree and the Ph.D. degree in control system from the School of Automation Science and Electrical Engineering at Beihang University in 2007 and 2014, respectively. He is now a member of the China Simulation Federation (CSF).

Cheng Wu received the M.Sc. degree in electrical engineering from Tsinghua University, Beijing, China, in 1966. He is currently a Professor with the Department of Automation, Tsinghua University.

Learn more

Duckietown is a platform for creating and disseminating robotics and AI learning experiences.

It is modular, customizable and state-of-the-art, and designed to teach, learn, and do research. From exploring the fundamentals of computer science and automation to pushing the boundaries of knowledge, Duckietown evolves with the skills of the user.

Visual Feedback for Autonomous Navigation in Duckietown

Features for Efficient Autonomous Navigation in Duckietown

Project Resources

Project highlights

Visual Feedback for Autonomous Navigation in Duckietown - the objectives

This project, by students at TUM (Technical University of Munich), builds on the preexisting Duckietown autonomy stack to add, reintegrate, or improve much-needed autonomous navigation features: improved control (pure pursuit instead of PID), red stop line detection, AprilTag detection, intersection navigation, and obstacle detection (using YOLOv3), making Duckietowns more complex and interesting!

The resulting agent includes modules for lane following, stop line detection, and intersection handling using AprilTags, following the legacy infrastructure of Duckietown.

The autonomy pipeline relies heavily on vision as the primary means of perception: lane edges are projected from image space to the ground plane using an inverse perspective mapping obtained from a camera calibration procedure.
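
On a flat floor, this image-to-ground mapping reduces to a planar homography. The sketch below assumes a 3x3 matrix `H` that maps pixel coordinates (u, v) to ground coordinates (x, y); the numbers in `H` are made up for illustration (on a Duckiebot they come from the extrinsic calibration), and `ground_project` is a hypothetical helper, not a Duckietown API.

```python
import numpy as np


def ground_project(homography, pixels):
    """Map image pixels (u, v) to ground-plane points (x, y) using a 3x3 homography."""
    pixels = np.asarray(pixels, dtype=float)
    ones = np.ones((pixels.shape[0], 1))
    homogeneous = np.hstack([pixels, ones]) @ homography.T  # each row is [x', y', w]
    return homogeneous[:, :2] / homogeneous[:, 2:3]         # divide out the projective scale w


# Hypothetical homography for illustration only; a real one comes from extrinsic calibration.
H = np.array([[0.0, -0.001, 0.5],
              [-0.001, 0.0, 0.32],
              [0.0, 0.0, 1.0]])
print(ground_project(H, [[320, 240], [100, 400]]))
```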

The Duckiebot then estimates a dynamic target point by offsetting yellow or white lane markers depending on visibility. The curvature is computed based on the geometric relation between the Duckiebot and the goal point, and the steering command is derived from this curvature.
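
A minimal sketch of these two steps, assuming lane-marker points have already been ground-projected into the robot frame (x forward, y to the left). The `half_lane` offset and the function names are assumptions for illustration; the curvature uses the standard pure pursuit relation κ = 2y/L², where L is the distance to the target point, and the angular velocity follows as ω = vκ.

```python
def target_point(yellow=None, white=None, half_lane=0.1):
    """Lookahead target in the robot frame (x forward, y left) from ground-projected markers.

    Each visible marker is shifted sideways toward the lane centre by `half_lane`
    metres (an assumed value); if both are visible, the two estimates are averaged.
    Shifting along y assumes the lane runs roughly along the robot's x axis.
    """
    candidates = []
    if yellow is not None:  # the yellow centre line is on the robot's left
        candidates.append((yellow[0], yellow[1] - half_lane))
    if white is not None:   # the white edge line is on the robot's right
        candidates.append((white[0], white[1] + half_lane))
    if not candidates:
        return None         # no marker visible: the caller must fall back (e.g. slow down)
    xs, ys = zip(*candidates)
    return (sum(xs) / len(xs), sum(ys) / len(ys))


def pure_pursuit(target, linear_speed):
    """Curvature of the arc through the robot and `target`, and the matching angular velocity."""
    x, y = target
    curvature = 2.0 * y / (x * x + y * y)  # kappa = 2y / L^2, L = distance to the target
    return curvature, linear_speed * curvature
```

For a target 1 m ahead and 1 m to the left, `pure_pursuit((1.0, 1.0), 0.2)` gives a curvature of 1.0 m⁻¹ and ω = 0.2 rad/s.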

The Duckiebot velocity and angular velocity are then modulated using a second-degree polynomial function based on detected path geometry.
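
The source does not give the polynomial or its input. As one plausible reading, the sketch below lowers the forward speed as a quadratic function of the absolute path curvature and then recomputes the angular velocity so that the robot stays on the same arc; the coefficients `c1`, `c2` and the speed limits are hypothetical.

```python
def modulate_speed(curvature, v_max=0.3, v_min=0.1, c1=0.05, c2=0.1):
    """Forward speed as a second-degree polynomial in |curvature| (hypothetical coefficients)."""
    k = abs(curvature)
    v = v_max - c1 * k - c2 * k * k   # slow down more in tighter curves
    v = max(v_min, min(v_max, v))     # clamp to the allowed speed range
    return v, v * curvature           # omega = v * kappa keeps the commanded arc unchanged
```

On a straight path (`curvature=0.0`) the robot drives at `v_max` with zero angular velocity; in very tight curves the speed saturates at `v_min`.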

Visual input from an onboard monocular camera is processed through a lane filter with adaptive Gaussian variance scaling relative to frame timing.
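
As a simplified illustration of time-dependent variance scaling, the sketch below tracks only the lateral offset d as a 1D Gaussian: the prediction step inflates the variance in proportion to the time elapsed since the previous frame, and the update step fuses a new measurement with a standard Kalman gain. This is a stand-in chosen for brevity, not the histogram-based lane filter of the Duckietown stack, and the noise constants are assumptions.

```python
from dataclasses import dataclass


@dataclass
class LateralBelief:
    mean: float      # estimated lateral offset d from the lane centre (m)
    variance: float  # uncertainty of that estimate (m^2)


def predict(belief, dt, noise_rate=0.02):
    """Inflate the variance in proportion to the time since the last frame (assumed noise rate)."""
    return LateralBelief(belief.mean, belief.variance + noise_rate * dt)


def update(belief, measurement, meas_variance=0.0004):
    """Fuse one lane-detection measurement of d with a 1D Kalman update."""
    gain = belief.variance / (belief.variance + meas_variance)
    mean = belief.mean + gain * (measurement - belief.mean)
    return LateralBelief(mean, (1.0 - gain) * belief.variance)
```

A late frame (large `dt`) leaves the belief more uncertain, so the next measurement is weighted more heavily; a frame that arrives on time barely changes the prior variance.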

When approaching an intersection, the Duckiebot detects stop lines using HSV color segmentation. AprilTag detection determines the intersection decision, with tag IDs mapped to turn directions.
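
A numpy-only sketch of both decisions follows. It assumes the frame has already been converted to HSV with OpenCV's convention (hue in [0, 180), saturation and value in [0, 255]), for example with `cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)`. The thresholds, the lower-half region of interest, and the tag-ID table are hypothetical values, not the project's calibrated parameters.

```python
import numpy as np

# Hypothetical thresholds in OpenCV's HSV convention.
RED_HUE_LOW, RED_HUE_HIGH = 10, 170
MIN_SAT, MIN_VAL = 100, 100

# Hypothetical mapping from AprilTag IDs to turn decisions.
TAG_TO_TURN = {1: "left", 2: "straight", 3: "right"}


def red_mask(hsv):
    """Boolean mask of red pixels; red wraps around hue 0, so two hue bands are combined."""
    h, s, v = hsv[..., 0], hsv[..., 1], hsv[..., 2]
    red_hue = (h <= RED_HUE_LOW) | (h >= RED_HUE_HIGH)
    return red_hue & (s >= MIN_SAT) & (v >= MIN_VAL)


def stop_line_ahead(hsv, min_fraction=0.05):
    """Report a stop line when enough of the lower image half is red."""
    lower_half = hsv[hsv.shape[0] // 2:]
    return bool(red_mask(lower_half).mean() >= min_fraction)


def turn_for_tag(tag_id, default="straight"):
    """Look up the turn direction for a detected tag ID."""
    return TAG_TO_TURN.get(tag_id, default)
```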

Every module is implemented as an independent ROS package with dedicated launch files, coordinated via a central launch file. A YOLOv3 object detection model, trained on a custom Duckietown dataset, provides real-time obstacle recognition.

The challenges and approach

One major hurdle was integrating object detection models like Single-Shot Detector (SSD) and YOLO with the Duckiebot’s ROS-based camera system.

While the SSD model was trained on a custom Duckietown dataset, ROS publisher-subscriber mismatches prevented live inference. Transitioning to the YOLO model involved adapting annotation formats and re-training for compatibility with the YOLO architecture. In lane following, the default controller from Duckietown demos showed high deviation, prompting the implementation of a modified pure pursuit approach. 
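
The source does not specify which annotation format the SSD dataset used. Assuming Pascal VOC-style boxes given as pixel corners (xmin, ymin, xmax, ymax), converting them to YOLO's label format (class index followed by box center, width, and height, all normalized by the image size) can be sketched as follows; `voc_to_yolo` is a hypothetical helper.

```python
def voc_to_yolo(box, image_width, image_height, class_id):
    """Convert a pixel-corner box (xmin, ymin, xmax, ymax) into one YOLO label line."""
    xmin, ymin, xmax, ymax = box
    x_center = (xmin + xmax) / 2.0 / image_width
    y_center = (ymin + ymax) / 2.0 / image_height
    width = (xmax - xmin) / image_width
    height = (ymax - ymin) / image_height
    return f"{class_id} {x_center:.6f} {y_center:.6f} {width:.6f} {height:.6f}"
```

For example, `voc_to_yolo((0, 0, 100, 50), 200, 100, 0)` returns `"0 0.250000 0.250000 0.500000 0.500000"`.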

Additional challenges arose from the Duckiebot's limited computational resources: high CPU load caused processing delays when all modules ran concurrently. The approach focused on modular development, isolating lane following, stop line detection, and intersection navigation into separate ROS packages with fine-tuned parameters. The pure pursuit algorithm was adapted for ground-projected lane estimation, dynamic speed control, and target point calculation based on visible lane markers. Integration of AprilTag-based intersection logic and LED signaling provided directional control at intersections.

This structured, iterative methodology enabled real-time, vision-guided behavior within the Duckiebot's computational constraints.

Project Report

Did this work spark your curiosity?

Visual Feedback for Autonomous Navigation in Duckietown: Authors

Servesh Khandwe is currently working as a Software Engineer at Porsche Digital, Germany.

Ayush Kumar is currently working as a Research Assistant at Fraunhofer IIS, Germany.

Parth Karkar is currently working as an Analytical Consultant at Mutares SE & Co. KGaA, Germany.

Learn more

Duckietown is a modular, customizable, and state-of-the-art platform for creating and disseminating robotics and AI learning experiences.

Duckietown is designed to teach, learn, and do research: from exploring the fundamentals of computer science and automation to pushing the boundaries of knowledge.

These spotlight projects are shared to exemplify Duckietown’s value for hands-on learning in robotics and AI, enabling students to apply theoretical concepts to practical challenges in autonomous robotics, boosting competence and job prospects.

Visual Feedback for Lane Tracking in Duckietown

Visual Feedback for Autonomous Lane Tracking in Duckietown

General Information

How can vehicle autonomy be achieved by relying only on visual feedback from the onboard camera?

This work presents an implementation of lane following for the Duckiebot (DB17) that uses the onboard camera as its only source of feedback. The approach relies on real-time lane detection and pose estimation, eliminating the need for wheel encoders.

The onboard computation is provided by a Raspberry Pi, which performs low-level motor control, while high-level image processing and decision-making are offloaded to an external ROS-enabled computer.

The key technical aspects of the implemented autonomy pipeline include:

  • Camera calibration to correct fisheye lens distortion;

  • HSV-based image segmentation for lane line detection;

  • Aerial perspective transformation for geometric consistency;

  • Histogram-based color separation of continuous and dashed lines;

  • Piecewise polynomial fitting for path curvature estimation;

  • Closed-loop motion control based on computed linear and angular velocities.
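
The last two steps of the list can be sketched as follows, assuming lane pixels have already been segmented and warped into a bird's-eye view with x pointing forward and y lateral (in meters). A single polynomial segment is fitted with `numpy.polyfit`; a piecewise fit would apply the same call to consecutive ranges of x. The lookahead distance `x_ahead`, the constant forward speed, and the use of ω = vκ as the control law are assumptions, not necessarily the authors' controller.

```python
import numpy as np


def fit_lane(xs, ys, degree=2):
    """Fit y = f(x) to bird's-eye-view lane pixels; coefficients are highest power first."""
    return np.polyfit(xs, ys, degree)


def curvature_at(coeffs, x):
    """Signed curvature of y = f(x) at x: f'' / (1 + f'^2)^(3/2)."""
    d1 = np.polyval(np.polyder(coeffs, 1), x)
    d2 = np.polyval(np.polyder(coeffs, 2), x)
    return d2 / (1.0 + d1 ** 2) ** 1.5


def velocity_command(coeffs, x_ahead=0.2, v=0.2):
    """Constant forward speed; angular velocity follows the fitted path curvature ahead."""
    return v, v * curvature_at(coeffs, x_ahead)
```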

The methodology demonstrates the feasibility of using camera-based perception to control robot motion in structured environments. By using the Duckiebot and Duckietown as the development platform, this work is another example of how to bridge the gap between real-world testing and cost-effective prototyping, making vehicle autonomy research more accessible in educational and research contexts.

Highlights - visual feedback for lane tracking in Duckietown

Here is a visual tour of the implementation of vehicle autonomy by the authors. For all the details, check out the full paper.

Abstract

Here is the abstract of the work, directly in the words of the authors:

The autonomy of a vehicle can be achieved by a proper use of the information acquired with the sensors. Real-sized autonomous vehicles are expensive to acquire and to test on; however, the main algorithms that are used in those cases are similar to the ones that can be used for smaller prototypes. Due to these budget constraints, this work uses the Duckiebot as a testbed to try different algorithms as a first step to achieve full autonomy. This paper presents a methodology to properly use visual feedback, with the information of the robot camera, in order to detect the lane of a circuit and to drive the robot accordingly.

Conclusion - visual feedback for lane tracking in Duckietown

Here is the conclusion according to the authors of this paper:

Autonomous cars are currently a vast research area. Due to this increase in the interest of these vehicles, having a cost-effective way to implement algorithms, new applications, and to test them in a controlled environment will further help to develop this technology. In this sense, this paper has presented a methodology for following a lane using a cost-effective robot, called the Duckiebot, using visual feedback as a guide for the motion. Although the whole system was capable of detecting the lane that needs to be followed, it is still sensitive to illumination conditions. Therefore, in places with a lot of lighting and brightness variations, the lane recognition algorithm can affect the autonomy of the vehicle.
As future work, machine learning, and particularly convolutional neural networks, is devised as a means to develop robust lane detectors that are not sensitive to brightness variation. Moreover, more than one Duckiebot is intended to drive simultaneously in the Duckietown.

Did this work spark your curiosity?

Project Authors

Oscar Castro is currently working at Blume, Peru.

Axel Eliam Céspedes Duran is currently working as a Laboratory Professor of the Industrial Instrumentation course at the UTEC – Universidad de Ingeniería y Tecnología, Peru.

Roosevelt Jhans Ubaldo Chavez is currently working as a Laboratory Professor of the Industrial Instrumentation course at the UTEC – Universidad de Ingeniería y Tecnología, Peru.

Oscar E. Ramos is currently working toward the Ph.D. degree in robotics with the Laboratory for Analysis and Architecture of Systems, Centre National de la Recherche Scientifique, University of Toulouse, Toulouse, France.

Pure Pursuit Lane Following with Obstacle Avoidance

Project Resources

Project highlights

Pure Pursuit Controller with Dynamic Speed and Turn Handling
Pure Pursuit with Image Processing-Based Obstacle Detection
Duckiebots Avoiding Obstacles with Pure Pursuit Control

Pure Pursuit Lane Following with Obstacle Avoidance - the objectives

Pure pursuit is a geometric path-tracking algorithm used in autonomous vehicle control systems. It selects a target (lookahead) point on the trajectory, computes the curvature of the circular arc that connects the vehicle to that point, and derives from the vehicle's kinematics the angular velocity required to follow this arc.

Unlike proportional-integral-derivative (PID) control, which continuously corrects the control output based on the tracking error, its integral, and its derivative, pure pursuit steers toward a lookahead point on the trajectory; with a suitably chosen lookahead distance, this typically yields smooth convergence to the path with little oscillation. Because it does not depend on derivative or integral feedback, it also reduces complexity in environments with sparse or noisy error signals.
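
Selecting the target point is the step that distinguishes pure pursuit. A minimal sketch, assuming the path is a polyline of points in the robot frame with the robot at the origin: walk along the segments and return the first point whose distance from the robot equals the lookahead distance, found by intersecting each segment with the lookahead circle. `lookahead_point` is a hypothetical helper, not part of the project's code.

```python
import math
from typing import Optional, Sequence, Tuple

Point = Tuple[float, float]


def lookahead_point(path: Sequence[Point], lookahead: float) -> Optional[Point]:
    """First point along `path` at distance `lookahead` from the robot at (0, 0)."""
    for (x0, y0), (x1, y1) in zip(path, path[1:]):
        dx, dy = x1 - x0, y1 - y0
        a = dx * dx + dy * dy
        if a == 0.0:
            continue  # skip repeated points
        # Solve |p0 + t * d|^2 = lookahead^2 for t, a quadratic in t.
        b = 2.0 * (x0 * dx + y0 * dy)
        c = x0 * x0 + y0 * y0 - lookahead * lookahead
        disc = b * b - 4.0 * a * c
        if disc < 0.0:
            continue  # this segment never touches the lookahead circle
        t = (-b + math.sqrt(disc)) / (2.0 * a)  # larger root: the crossing further along the path
        if 0.0 <= t <= 1.0:
            return (x0 + t * dx, y0 + t * dy)
    return None  # the whole path lies inside the lookahead circle
```

For a straight path along x sampled at 1 m intervals, `lookahead_point([(0, 0), (1, 0), (2, 0), (3, 0)], 2.5)` returns `(2.5, 0.0)`. A larger lookahead smooths the motion but cuts corners; a smaller one tracks tightly but can oscillate.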

This project aims to implement a pure pursuit-based lane following system integrated with obstacle avoidance for autonomous Duckiebot navigation. The goal is to enable real-time tracking of lane centerlines while maintaining safety through detection and response to dynamic obstacles such as other Duckiebots or cones.

The pipeline includes a modified ground projection system, an adaptive pure pursuit controller for path tracking, and both image processing and deep learning-based object detection modules for obstacle recognition and avoidance.

The challenges and approach

The primary challenges in this project include robust target point estimation under variable lighting and environmental conditions, real-time object detection with limited computational resources, and smooth trajectory control in the presence of dynamic obstacles.

The approach involves modular integration of perception, planning, and control subsystems.

For perception, the system uses both classical image processing methods and a trained deep learning model for object detection, enabling redundancy and simulation compatibility.

For planning and control, the pure pursuit controller dynamically adjusts speed and steering based on the estimated target point and obstacle proximity. Target point estimation is achieved through ground projection, a transformation that maps image coordinates to real-world planar coordinates using a calibrated camera model. Real-time parameter tuning and feedback mechanisms are included to handle variations in frame rate and sensor noise.

Obstacle positions are also ground-projected and used to trigger stop conditions within a defined safety zone, ensuring collision avoidance through reactive control.
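
A minimal sketch of this reactive rule, assuming obstacle detections have already been ground-projected into the robot frame (x forward, y to the left, in meters). The rectangular zone and its dimensions are hypothetical choices; the project's actual safety zone is not specified here.

```python
from typing import Iterable, Tuple

Point = Tuple[float, float]


def in_safety_zone(obstacle: Point, length: float = 0.3, half_width: float = 0.1) -> bool:
    """True if a ground-projected obstacle lies inside the rectangle directly ahead."""
    x, y = obstacle
    return 0.0 <= x <= length and abs(y) <= half_width


def speed_command(nominal_speed: float, obstacles: Iterable[Point]) -> float:
    """Reactive rule: stop if any obstacle is inside the safety zone, otherwise keep going."""
    return 0.0 if any(in_safety_zone(o) for o in obstacles) else nominal_speed
```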

Looking for similar projects?

Pure Pursuit Lane Following with Obstacle Avoidance: Authors

Soroush Saryazdi is currently leading the Neural Networks team at Matic, supervised by Navneet Dalal.

Dhaivat Bhatt is currently working as a Machine learning research engineer at Samsung AI centre, Toronto.

Reproducible Sim-to-Real Traffic Signal Control Environment

General Information

As urban environments become increasingly populated and automobile traffic soars (US citizens spend on average 54 hours a year stuck in traffic), active traffic control management promises to mitigate traffic jams while maintaining, or even improving, safety.

LibSignal++ is a Duckietown-based testbed for reproducible and low-cost sim-to-real evaluation of traffic signal control (TSC) algorithms. Using Duckietown enables consistent, small-scale deployment of both rule-based and learning-based TSC models.

LibSignal++ integrates visual control through camera-based sensing and object detection via the YOLOv5 model. It features modular components, including Duckiebots, signal controllers, and an indoor positioning system for accurate vehicle trajectory tracking. The testbed supports dynamic scenario replication by enabling both manual and automated manipulation of sensor inputs and road layouts.

Key aspects of the research include:

  • Sim-to-real pipeline for Reinforcement Learning (RL)-based traffic signal control training and deployment
  • Multi-simulator training support with SUMO, CityFlow, and CARLA
  • Reproducibility through standardized and controllable physical components
  • Integration of real-world sensors and visual control systems
  • Comparative evaluation using rule-based policies on 3-way and 4-way intersections
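
The rule-based policies that were compared are not named here. As an illustration of what a rule-based TSC policy looks like, here is a sketch of max pressure, a well-known rule from the traffic signal control literature: each phase serves a set of movements, and the controller activates the phase whose movements have the largest total difference between upstream and downstream queues. The lane names and queue counts are hypothetical; in a testbed like this one, queues could be estimated from camera detections.

```python
from typing import Dict, List, Tuple

Movement = Tuple[str, str]  # (incoming lane, outgoing lane)


def max_pressure_phase(phases: Dict[str, List[Movement]], queues: Dict[str, int]) -> str:
    """Pick the phase whose movements have the largest total pressure.

    Pressure of a movement = vehicles queued on its incoming lane
    minus vehicles on its outgoing lane.
    """
    def pressure(phase: str) -> int:
        return sum(queues.get(src, 0) - queues.get(dst, 0) for src, dst in phases[phase])

    return max(phases, key=pressure)


# Hypothetical 4-way intersection with two phases.
phases = {
    "NS": [("N_in", "S_out"), ("S_in", "N_out")],
    "EW": [("E_in", "W_out"), ("W_in", "E_out")],
}
queues = {"N_in": 4, "S_in": 3, "E_in": 1, "W_in": 0, "S_out": 1}
print(max_pressure_phase(phases, queues))  # NS has pressure 6, EW has pressure 1
```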

The work concludes with plans to extend the testbed to Machine Learning (ML)-based TSC models and to further sim-to-real adaptation.

Highlights - Reproducible Sim-to-Real Traffic Signal Control Environment

Here is a visual tour of the sim-to-real work of the authors. For all the details, check out the full paper.

Abstract

Here is the abstract of the work, directly in the words of the authors:

This paper presents a unique sim-to-real assessment environment for traffic signal control (TSC), LibSignal++, featuring a 14-ft by 14-ft scaled-down physical replica of a real-world urban roadway equipped with realistic traffic sensors such as cameras, and actual traffic signal controllers. Besides, it is supported by a precise indoor positioning system to track the actual trajectories of vehicles. To generate various plausible physical conditions that are difficult to replicate with computer simulations, this system supports automatic sensor manipulation to mimic observation changes and also supports manual adjustment of physical traffic network settings to reflect the influence of dynamic changes on vehicle behaviors. This system will enable the assessment of traffic policies that are otherwise extremely difficult to simulate or infeasible for full-scale physical tests, providing a reproducible and low-cost environment for sim-to-real transfer research on traffic signal control problems.

Results

Three traffic control policies were tested over a number of experiment repetitions, each time measuring traffic throughput, average vehicle waiting time, and vehicle battery consumption. The standard deviations for all policies were within acceptable ranges, leading the authors to conclude that the testbed can deliver reproducible results in controlled environments.

TSC policies test
Did this work spark your curiosity?

Project Authors

Yiran Zhang is associated with Arizona State University, USA.

Khoa Vo is associated with Arizona State University, USA.

Longchao Da is pursuing his Ph.D. at Arizona State University, USA.

Tiejin Chen is pursuing his Ph.D. at Arizona State University, USA.

Xiaoou Liu is pursuing her Ph.D. at Arizona State University, USA.

Hua Wei is an Assistant Professor at the School of Computing and Augmented Intelligence, Arizona State University, USA.

Autonomous Navigation System Development in Duckietown

Project Resources

Project highlights

Autonomous Navigation System Development in Duckietown - the objectives

The primary objective of this project is to develop and refine an Autonomous Navigation System within the Duckietown environment, leveraging ROS-based control and computer vision to enable reliable lane following and safe intersection navigation. This includes calibrating sensor inputs, particularly from the camera, IMU, and encoders, and integrating Dijkstra's algorithm for shortest-path planning. The project aims to ensure that the Duckiebot can autonomously detect lanes, stop lines, and obstacles while dynamically computing the shortest path to any designated point within the mapped environment. Additionally, the system is designed to transition smoothly between operational states (lane following, intersection handling, and recovery) using a refined Finite State Machine (FSM), all while maintaining robust communication within the ROS ecosystem.
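
Shortest-path planning on a map graph can be sketched with Dijkstra's algorithm and Python's `heapq` priority queue. The graph below is hypothetical: nodes stand for intersections or waypoints and edge weights for travel distances; the project's actual map encoding is not described here.

```python
import heapq
from typing import Dict, List, Tuple

Graph = Dict[str, List[Tuple[str, float]]]


def shortest_path(graph: Graph, start: str, goal: str) -> Tuple[float, List[str]]:
    """Dijkstra's algorithm on a weighted directed graph; returns (cost, node list)."""
    dist = {start: 0.0}
    prev: Dict[str, str] = {}
    queue = [(0.0, start)]
    visited = set()
    while queue:
        d, node = heapq.heappop(queue)
        if node in visited:
            continue  # stale queue entry
        visited.add(node)
        if node == goal:
            break
        for neighbour, cost in graph.get(node, []):
            new_d = d + cost
            if new_d < dist.get(neighbour, float("inf")):
                dist[neighbour] = new_d
                prev[neighbour] = node
                heapq.heappush(queue, (new_d, neighbour))
    if goal not in dist:
        return float("inf"), []  # goal unreachable
    path = [goal]
    while path[-1] != start:
        path.append(prev[path[-1]])
    return dist[goal], path[::-1]


# Hypothetical city graph with distances in tiles.
city = {"A": [("B", 1.0), ("C", 4.0)], "B": [("C", 1.0), ("D", 5.0)], "C": [("D", 1.0)], "D": []}
print(shortest_path(city, "A", "D"))  # (3.0, ['A', 'B', 'C', 'D'])
```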

Project Report

The challenges and approach

The project faced several challenges, beginning with hardware constraints such as limited wheel traction and battery life, which affected motion stability and operating time. Integrating various ROS packages, some with incomplete documentation and inconsistent coding practices, complicated the development of a reliable and maintainable codebase.

The method adopted involved precise sensor calibration to ensure accurate perception and control, including camera intrinsic and extrinsic calibration for better interpretation of visual data, and adjustment of wheel parameters to keep motion balanced. The lane following module required tuning of gain, trim, and heading correction parameters to adapt to Duckietown's environment. The original FSM-based intersection navigation system was re-engineered because its node transitions were unreliable; it was replaced with a distance-based approach for intersection stops and turns, yielding deterministic and reliable behavior. The city map was represented as a structured graph on which Dijkstra's algorithm performs path planning that adapts to real-time inputs from the perception system. Custom web dashboards built with React.js and roslibjs facilitated monitoring and debugging by providing live data feedback and control interfaces.

Through this iterative process, the project achieved a robust autonomous navigation system capable of precise path planning and safe maneuvering within Duckietown.
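
Distance-based triggers of this kind are typically built on wheel-encoder odometry. The sketch assumes 135 encoder ticks per wheel revolution and a wheel radius of 0.0318 m, values chosen to resemble a Duckiebot that should be checked against the robot's own calibration; `ticks_to_distance` and `reached` are hypothetical helpers.

```python
import math

# Assumed values; verify them against your robot's calibration.
TICKS_PER_REV = 135
WHEEL_RADIUS = 0.0318  # metres


def ticks_to_distance(ticks: int) -> float:
    """Distance rolled by one wheel for a given number of encoder ticks."""
    return 2.0 * math.pi * WHEEL_RADIUS * ticks / TICKS_PER_REV


def reached(start_ticks, current_ticks, target_distance):
    """Distance-based trigger, e.g. 'drive a fixed distance past the stop line, then turn'."""
    left = ticks_to_distance(current_ticks[0] - start_ticks[0])
    right = ticks_to_distance(current_ticks[1] - start_ticks[1])
    return (left + right) / 2.0 >= target_distance
```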

Did this work spark your curiosity?

Autonomous Navigation System Development in Duckietown: Authors

Julien-Alexandre Bertin Klein is currently pursuing a Bachelor of Science (BSc.) in Information Engineering at the Technical University of Munich, Germany.

Andrea Pellegrin is currently pursuing a Bachelor of Science (BSc.) in Information Engineering at the Technical University of Munich, Germany.

Fathia Ismail is currently pursuing a Bachelor of Science (BSc.) in Information Engineering at the Technical University of Munich, Germany.

PID Control Lane Following in Duckietown

Autonomous Navigation and Parking in Duckietown

Project Resources

Project highlights

Static parameters in a dynamic environment are pre-programmed failure points.

Autonomous Navigation and Parking in Duckietown: the objectives

The objectives of this project include the development of a closed-loop PID control mechanism for continuous lane following, the use of AprilTag detection for intersection decision-making, and a state-driven behavior architecture to transition between tasks such as stopping, turning, and parking.
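
A discrete PID controller of the kind described can be sketched as follows. Output clamping models actuator limits, and the integral is only committed while the output is unsaturated, a simple anti-windup scheme. The gains and the error signal (for example, a weighted combination of lateral offset and heading error from lane pose estimation) are hypothetical.

```python
class PID:
    """Discrete PID controller with output clamping and conditional-integration anti-windup."""

    def __init__(self, kp, ki, kd, output_limit):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.output_limit = output_limit
        self.integral = 0.0
        self.prev_error = None

    def step(self, error, dt):
        """Return the control output for the current error and time step dt (s)."""
        derivative = 0.0 if self.prev_error is None else (error - self.prev_error) / dt
        self.prev_error = error
        candidate_integral = self.integral + error * dt
        output = self.kp * error + self.ki * candidate_integral + self.kd * derivative
        if abs(output) <= self.output_limit:
            self.integral = candidate_integral  # only integrate while not saturated
        return max(-self.output_limit, min(self.output_limit, output))


# Hypothetical gains; the error would come from the lane pose estimate.
heading_pid = PID(kp=4.0, ki=0.1, kd=0.2, output_limit=3.0)
omega = heading_pid.step(error=0.05, dt=0.033)
```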

The system uses wheel encoder data for dead-reckoning-based motion execution in the absence of visual cues, and applies HSV-based color segmentation to detect and respond to static and dynamic obstacles. Visual servoing is used for parking alignment based on AprilTag localization. The control logic is modular and supports parameter tuning for hardware variability, with temporal filtering to suppress redundant detections and ensure stability.
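
Dead reckoning from wheel encoders follows standard differential-drive kinematics: the average of the two wheel displacements moves the robot forward, and their difference divided by the wheel baseline rotates it. In the sketch below the `baseline` of 0.1 m is an assumed value, and the per-wheel distances would come from encoder ticks.

```python
import math
from typing import Tuple

Pose = Tuple[float, float, float]  # x (m), y (m), theta (rad)


def update_pose(pose: Pose, d_left: float, d_right: float, baseline: float = 0.1) -> Pose:
    """Differential-drive dead reckoning from per-wheel travelled distances (metres)."""
    x, y, theta = pose
    d_center = (d_left + d_right) / 2.0
    d_theta = (d_right - d_left) / baseline
    # Midpoint integration: advance along the average heading over the step.
    x += d_center * math.cos(theta + d_theta / 2.0)
    y += d_center * math.sin(theta + d_theta / 2.0)
    return (x, y, theta + d_theta)
```

Equal wheel distances move the robot straight ahead; equal and opposite distances rotate it in place.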