General Information
- Title: Self-Supervised Discovering of Interpretable Features for Reinforcement Learning
- Authors: Wenjie Shi, Gao Huang, Shiji Song, Zhuoyuan Wang, Tingyu Lin, Cheng Wu
- Institution: Tsinghua University, Beijing, China
- Citation: W. Shi, G. Huang, S. Song, Z. Wang, T. Lin and C. Wu, "Self-Supervised Discovering of Interpretable Features for Reinforcement Learning," in IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 44, no. 5, pp. 2712-2724, 1 May 2022, doi: 10.1109/TPAMI.2020.3037898.
Interpretable Reinforcement Learning for Visual Policies
Reinforcement Learning (RL) has enabled solving complex problems, especially in relation to visual perception in robotics. An outstanding challenge is allowing humans to make sense of the decision-making process, so as to enable deployment in safety-critical applications such as autonomous driving. This work focuses on the problem of interpretable reinforcement learning in vision-based agents.
In particular, this research introduces a self-supervised framework that enhances policy interpretability by generating precise attention maps through a Self-Supervised Attention Mechanism (SSAM).
The method does not rely on external labels and operates on data generated by a pretrained RL agent. A self-supervised interpretable network (SSINet) is trained to identify task-relevant visual features. The approach is evaluated across multiple environments, including Atari 2600 games and Duckietown.
Key components of the method include:
- A two-stage training process using pretrained policies and frozen encoders (see the sketch after this list)
- Attention masks optimized using behavior resemblance and sparsity constraints
- Quantitative evaluation using FOR and BER metrics for attention quality
- Comparative analysis with gradient and perturbation-based saliency methods
- Application across various architectures and RL algorithms including PPO, SAC, and TD3
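To make these pieces concrete, below is a minimal sketch of the SSINet training stage. This is a hypothetical PyTorch reconstruction based on the description above, not the authors' code: the encoder is taken from the pretrained agent and frozen, only a mask decoder is trained, and the loss combines behavior resemblance (the masked observation should elicit the same action distribution as the full one) with a sparsity penalty. The architecture sizes, the policy interface, and the loss weighting are all assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SSINetSketch(nn.Module):
    """Hypothetical SSINet-style mask network: a frozen encoder taken from a
    pretrained RL agent, plus a trainable decoder emitting a [0, 1] mask."""
    def __init__(self, encoder: nn.Module, feat_channels: int = 64):
        super().__init__()
        self.encoder = encoder
        for p in self.encoder.parameters():  # stage one is done: freeze it
            p.requires_grad = False
        self.decoder = nn.Sequential(        # illustrative upsampling decoder
            nn.Conv2d(feat_channels, 32, 3, padding=1), nn.ReLU(),
            nn.Upsample(scale_factor=4, mode="bilinear", align_corners=False),
            nn.Conv2d(32, 1, 3, padding=1), nn.Sigmoid(),
        )

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        feats = self.encoder(obs)            # assumed: B x C x h x w feature map
        return self.decoder(feats)           # B x 1 x H x W attention mask

def ssinet_loss(policy, ssinet, obs, lam=0.1):
    """Behavior resemblance plus sparsity (form and weights are assumptions)."""
    with torch.no_grad():
        target = F.softmax(policy(obs), dim=-1)   # pretrained agent's actions
    mask = ssinet(obs)
    masked_logits = policy(mask * obs)            # agent sees attended pixels only
    resemblance = F.kl_div(F.log_softmax(masked_logits, dim=-1),
                           target, reduction="batchmean")
    return resemblance + lam * mask.mean()        # mean(mask) encourages sparsity
```

Because the supervision signal comes entirely from the pretrained agent's own outputs, no external labels are needed, which is what makes the procedure self-supervised.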
The proposed approach isolates the visual cues that drive the agent's decisions, offering insight into its reasoning. In Duckietown, the framework demonstrates how visual interpretability can aid in diagnosing performance bottlenecks and agent failures, offering a scalable approach to interpretable reinforcement learning in autonomous navigation systems.
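For the kind of inspection described here, the learned mask can simply be blended over the raw frame. The helper below is a hypothetical visualization utility (not part of the paper's code), assuming an RGB observation and a mask already resized to the frame's resolution:

```python
import numpy as np

def overlay_attention(frame: np.ndarray, mask: np.ndarray,
                      alpha: float = 0.5) -> np.ndarray:
    """frame: H x W x 3 uint8 RGB image; mask: H x W floats in [0, 1].
    Returns the frame with a red heatmap of the attention mask blended in."""
    heat = np.zeros_like(frame)
    heat[..., 0] = (mask * 255).astype(np.uint8)  # mask drives the red channel
    blended = (1 - alpha) * frame.astype(np.float32) + alpha * heat.astype(np.float32)
    return blended.astype(np.uint8)
```

Watching such overlays frame by frame is one way attention maps can help diagnose failures, for example when an agent transferred to a novel scene attends to the wrong regions.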
Highlights - interpretable reinforcement learning for visual policies
Here is a visual tour of the implementation of interpretable reinforcement learning for visual policies by the authors. For all the details, check out the full paper.
Abstract
Here is the abstract of the work, directly in the words of the authors:
Deep reinforcement learning (RL) has recently led to many breakthroughs on a range of complex control tasks. However, the agent’s decision-making process is generally not transparent. The lack of interpretability hinders the applicability of RL in safety-critical scenarios. While several methods have attempted to interpret vision-based RL, most come without detailed explanation for the agent’s behavior. In this paper, we propose a self-supervised interpretable framework, which can discover interpretable features to enable easy understanding of RL agents even for non-experts. Specifically, a self-supervised interpretable network (SSINet) is employed to produce fine-grained attention masks for highlighting task-relevant information, which constitutes most evidence for the agent’s decisions. We verify and evaluate our method on several Atari 2600 games as well as Duckietown, which is a challenging self-driving car simulator environment. The results show that our method renders empirical evidences about how the agent makes decisions and why the agent performs well or badly, especially when transferred to novel scenes. Overall, our method provides valuable insight into the internal decision-making process of vision-based RL. In addition, our method does not use any external labelled data, and thus demonstrates the possibility to learn high-quality mask through a self-supervised manner, which may shed light on new paradigms for label-free vision learning such as self-supervised segmentation and detection.
Conclusion - interpretable reinforcement learning for visual policies
Here is the conclusion according to the authors of this paper:
In this paper, we addressed the growing demand for human-interpretable vision-based RL from a fresh perspective. To that end, we proposed a general self-supervised interpretable framework, which can discover interpretable features for easily understanding the agent’s decision-making process. Concretely, a self-supervised interpretable network (SSINet) was employed to produce high-resolution and sharp attention masks for highlighting task-relevant information, which constitutes most evidence for the agent’s decisions. Then, our method was applied to render empirical evidences about how the agent makes decisions and why the agent performs well or badly, especially when transferred to novel scenes. Overall, our work takes a significant step towards interpretable vision-based RL. Moreover, our method exhibits several appealing benefits. First, our interpretable framework is applicable to any RL model taking as input visual images. Second, our method does not use any external labelled data. Finally, we emphasize that our method demonstrates the possibility to learn high-quality mask through a self-supervised manner, which provides an exciting avenue for applying RL to self automatically labelling and label-free vision learning such as self-supervised segmentation and detection.
Did this work spark your curiosity?
Check out the following works on vehicle autonomy with Duckietown:
Project Authors
Wenjie Shi received the BS degree from the School of Hydropower and Information Engineering, Huazhong University of Science and Technology, Wuhan, China, in 2016. He is currently working toward the PhD degree in control science and engineering at the Department of Automation, Institute of Industrial Intelligence and System, Tsinghua University, Beijing, China.
Gao Huang (Member, IEEE) received the BS degree in automation from Beihang University, Beijing, China, in 2009, and the PhD degree in automation from Tsinghua University, Beijing, in 2015. He is currently an associate professor with the Department of Automation, Tsinghua University.
Shiji Song (Senior Member, IEEE) received the PhD degree in mathematics from the Department of Mathematics, Harbin Institute of Technology, Harbin, China, in 1996. He is currently a professor at the Department of Automation, Tsinghua University, Beijing, China.
Zhuoyuan Wang is currently working toward the BS degree in control science and engineering in the Department of Automation, Tsinghua University, Beijing, China.
Tingyu Lin received the BS and PhD degrees in control systems from the School of Automation Science and Electrical Engineering, Beihang University, Beijing, China, in 2007 and 2014, respectively. He is a member of the China Simulation Federation (CSF).
Cheng Wu received the MSc degree in electrical engineering from Tsinghua University, Beijing, China, in 1966. He is currently a professor with the Department of Automation, Tsinghua University.
Learn more
Duckietown is a platform for creating and disseminating robotics and AI learning experiences.
It is modular, customizable and state-of-the-art, and designed to teach, learn, and do research. From exploring the fundamentals of computer science and automation to pushing the boundaries of knowledge, Duckietown evolves with the skills of the user.