Real-Time Reinforcement Learning in Duckiematrix


Objective: Evaluate how computation delays affect Reinforcement Learning performance in Duckiematrix autonomous driving tasks.
Approach: Simulate real-time delays in Duckietown and compare classical RL with action-conditioned Real-Time RL policies.
Authors: Guillaume Gagné-Labelle, Gabriel Sasseville and Nicolas Bosteels

Real-Time Reinforcement Learning in Duckiematrix - objectives and approach

The objective of this project is to evaluate Reinforcement Learning performance in the Duckiematrix simulation of Duckietown under real-time constraints, by quantifying how computation delay affects policy performance, reward, and episode length in autonomous driving tasks.

The approach implements Soft Actor-Critic (SAC) Reinforcement Learning models in the Duckiematrix simulation and introduces controlled fixed and variable time delays into the environment loop. Classical Reinforcement Learning policies, $\pi(a_t \mid s_t)$, are then compared with action-conditioned Real-Time Reinforcement Learning policies, $\pi(a_t \mid s_{t-1}, a_{t-1})$, using evaluation reward, reward variance, and episode length across multiple delay distributions.
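The delay mechanism described above can be sketched in a few lines of Python. This is a minimal illustration under assumptions that do not come from the project: a toy one-dimensional lane-keeping environment stands in for Duckiematrix, delays are counted in whole environment steps rather than seconds, and the variable regime draws exponentially distributed delays. All names (`ToyLaneEnv`, `fixed_delay`, `variable_delay`, `run_episode`) are hypothetical. While a new action is still being computed, the robot keeps executing the previous one; the action-conditioned variant simply passes that in-flight action to the policy alongside the observation.

```python
import random
from typing import Callable, List, Tuple

DelaySampler = Callable[[random.Random], int]


class ToyLaneEnv:
    """Hypothetical 1-D stand-in for a Duckiematrix lane-following task."""

    def __init__(self, max_steps: int = 200) -> None:
        self.max_steps = max_steps

    def reset(self) -> float:
        self.offset, self.t = 0.5, 0  # lateral offset from the lane center
        return self.offset

    def step(self, action: float) -> Tuple[float, float, bool]:
        self.offset += 0.1 * action
        self.t += 1
        reward = 1.0 - abs(self.offset)  # higher reward closer to the center
        done = abs(self.offset) > 1.0 or self.t >= self.max_steps
        return self.offset, reward, done


def fixed_delay(k: int) -> DelaySampler:
    """Every action takes exactly k steps to compute."""
    return lambda rng: k


def variable_delay(mean_k: float) -> DelaySampler:
    """Assumed regime: exponential delay with mean ~mean_k, rounded to whole steps."""
    return lambda rng: int(round(rng.expovariate(1.0 / mean_k))) if mean_k > 0 else 0


def run_episode(env: ToyLaneEnv, policy: Callable[[Tuple[float, ...]], float],
                delay: DelaySampler, action_conditioned: bool,
                rng: random.Random) -> Tuple[float, int]:
    obs = env.reset()
    applied = 0.0                          # action the robot is executing right now
    pending: List[Tuple[int, float]] = []  # (arrival_step, action), in issue order
    total, steps, done = 0.0, 0, False
    while not done:
        # Classical input: observation only. Action-conditioned: also the action in effect.
        inp = (obs, applied) if action_conditioned else (obs,)
        pending.append((steps + delay(rng), policy(inp)))
        # Switch to the newest action whose computation has finished by this step.
        ready = [i for i, (arrival, _) in enumerate(pending) if arrival <= steps]
        if ready:
            applied = pending[ready[-1]][1]
            pending = pending[ready[-1] + 1:]  # older, slower computations are now stale
        obs, reward, done = env.step(applied)
        total += reward
        steps += 1
    return total, steps
```

For example, `run_episode(ToyLaneEnv(), lambda inp: -2.0 * inp[0], fixed_delay(3), False, random.Random(0))` runs a simple proportional controller under a fixed three-step delay, and swapping in `variable_delay(3.0)` gives a variable regime with roughly the same mean delay. A delay of zero recovers the standard, instantaneous MDP loop.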

Real-Time Reinforcement Learning in Duckiematrix - highlights

Figure 1: Reinforcement Learning SAC Sampling Time Distribution (variability of the SAC model's inference latency)

Figure 2: Reward vs Delay - Classical RL (performance degradation as delay increases)

Figure 3: Learning Curves - Classical RL (training under delay conditions)

Figure 4: Episode Length vs Delay - Classical RL

Figure 5: Performance Distributions - Classical RL (effect of delay variability)

Figure 6: Loss Convergence - Classical RL

Figure 7: Reward vs Delay - Real-Time RL Action Conditioning

Figure 8: Learning Curves - Real-Time RL Action Conditioning

Figure 9: Episode Length vs Delay - Real-Time RL

Figure 10: Performance Distributions - Real-Time RL

Figure 11: Loss Convergence - Real-Time RL

Figure 12: Side-by-side comparison of reward vs delay for classical and real-time RL

Figure 13: Episode length comparison across RL methods

Figure 14: Training progress comparison, classical vs real-time RL

Figure 15: Performance distribution comparison, classical vs real-time RL

Figure 16: Loss convergence patterns, classical vs real-time RL

Figure 17: Performance vs Mean Delay in the variable delay experiment (lines = delay regime)

Figure 18: Absolute Difference vs Fixed Baseline in the variable delay experiment (lines = delay regime)

The challenges

The main challenge in this project is modeling Reinforcement Learning in Duckiematrix under real-time constraints. Computation delay violates the Markov Decision Process assumption that state–action transitions are instantaneous: by the time an action is executed, the state it was computed from has already changed. This state–action mismatch leads to policy instability and reward degradation. Fixed and variable delay distributions introduce non-stationarity and increase the variance of the evaluation reward, with failure modes at higher delays (≥0.1 s for classical RL and ≥1.0 s for both methods). In addition, stochastic latency from neural network inference affects policy execution timing, convergence behavior, and sample efficiency.
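The Markov violation can be made concrete with a short derivation. It assumes, purely for illustration, a one-step computation delay: the action $a_{t-1}$ executed between steps $t-1$ and $t$ was computed by a classical policy $\pi$ from the stale observation $s_{t-2}$, and $P$ denotes the environment's transition kernel.

```latex
\begin{aligned}
p(s_t \mid s_{t-1}, s_{t-2}) &= \int P(s_t \mid s_{t-1}, a)\, \pi(a \mid s_{t-2})\, da
  && \text{(classical input: still depends on } s_{t-2}\text{)} \\
p(s_t \mid s_{t-1}, a_{t-1}) &= P(s_t \mid s_{t-1}, a_{t-1})
  && \text{(action-conditioned input: fixed by the environment)}
\end{aligned}
```

The first line shows that, given only the latest observation $s_{t-1}$, the next state still depends on the older $s_{t-2}$, so the observations alone are not Markov. The second shows that once the action in flight is included, the transition is again determined by the environment alone, which is the motivation for conditioning the policy on $(s_{t-1}, a_{t-1})$.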

Additional challenges include maintaining stability in Soft Actor-Critic training under delayed feedback, handling missing training metrics, ensuring robustness across delay distributions, and evaluating performance with consistent metrics (reward, reward variance, and episode length) across multiple experimental conditions.
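The evaluation metrics can be computed consistently with a few lines of standard-library Python. This sketch assumes each evaluation episode is logged as a (total reward, episode length) pair and that results are grouped by (delay regime, mean delay); the regime labels, the use of population variance, and the reading of "fixed baseline" as the fixed-delay run with the same mean delay are assumptions, and `summarize` and `diff_vs_fixed` are hypothetical helpers.

```python
from statistics import mean, pvariance
from typing import Dict, List, Tuple

Episode = Tuple[float, int]  # (total episode reward, episode length in steps)


def summarize(episodes: List[Episode]) -> Dict[str, float]:
    """Mean reward, reward variance, and mean episode length for one condition."""
    rewards = [r for r, _ in episodes]
    lengths = [n for _, n in episodes]
    return {
        "mean_reward": mean(rewards),
        "reward_variance": pvariance(rewards),  # population variance: defined for 1 episode
        "mean_length": mean(lengths),
    }


def diff_vs_fixed(results: Dict[Tuple[str, float], List[Episode]]) -> Dict[Tuple[str, float], float]:
    """|mean reward(regime, d) - mean reward(fixed, d)| for every non-fixed regime.

    Conditions without a fixed-delay run at the same mean delay are skipped.
    """
    out: Dict[Tuple[str, float], float] = {}
    for (regime, d), episodes in results.items():
        if regime != "fixed" and ("fixed", d) in results:
            fixed_mean = summarize(results[("fixed", d)])["mean_reward"]
            out[(regime, d)] = abs(summarize(episodes)["mean_reward"] - fixed_mean)
    return out
```

Keying results by regime and mean delay mirrors the layout of the variable-delay figures, where each line is one delay regime plotted against its mean delay.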


Real-Time Reinforcement Learning in Duckiematrix: authors

Gabriel Sasseville is a Ph.D. student at Mila Institute in Montreal, Canada.

Guillaume Gagné-Labelle is a collaborating researcher at Mila Institute in Montreal, Canada.

Nicolas Bosteels is a Master’s Research student at Mila Institute in Montreal, Canada.

Learn more

Duckietown is a modular, customizable platform for robotics and artificial intelligence education, enabling hands-on learning and real-world experimentation with autonomous systems.

Designed for teaching, learning, and research, Duckietown supports the full spectrum of autonomy development, from foundational computer science and robotics concepts to advanced AI and self-driving systems research.

These spotlight projects are shared to demonstrate how Duckietown bridges theory and practice in robotics and AI, empowering students to apply machine learning and autonomy techniques to physical robots while building practical skills valued in academic research and industry.