General Information

Enhancing Visual Domain Randomization with Real Images for Sim-to-Real Transfer
András Béres and Bálint Gyires-Tóth
Budapest University of Technology and Economics, Hungary
Béres, András and Gyires-Tóth, Bálint (2023) Enhancing Visual Domain Randomization with Real Images for Sim-to-Real Transfer. INFOCOMMUNICATIONS JOURNAL, 15 (1). pp. 15-25. ISSN 2061-2079

One of the classical objections to machine learning approaches to embedded autonomy (i.e., creating agents that are deployed on real, physical robots) is that training requires data, data requires experiments, and experiments are “expensive” (in time, money, etc.).

The natural counterargument is to use simulation to create the training data, because simulations are much less expensive than real-world experiments: they can be run continuously and in accelerated time, they don’t require supervision, nobody gets tired, etc.

But, as the experienced roboticist knows, “simulations are doomed to succeed”. This phrase captures the notion that simulations do not contain the same wealth of information as the real world: they model only what the programmer chose to include, so they do not capture the complexity of the real world. Eventually things will “work” in simulation, but does that mean they will “work” in the real world, too?

As Carl Sagan once said: “If you wish to make an apple pie from scratch, you must first invent the universe”.

Domain randomization is an approach to mitigating the limitations of simulations. Instead of training an agent on a single set of simulation parameters, many simulations are run, each with different values of these parameters. For example, in a driving simulator like Duckietown, one set of parameters could make the sky purple instead of blue, or give the lane markings slightly different geometric properties. The idea is that the agent is trained on a distribution of environments that are all slightly different, which hopefully makes it more robust to real-world nuisances once deployed in a physical body.
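The per-episode sampling idea can be sketched in a few lines of Python. This is only an illustration: the parameter names (sky_rgb, lane_width_scale, light_intensity) and their ranges are assumptions made up for this example, not the settings used by the Duckietown simulator or by the authors.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class VisualParams:
    """Hypothetical visual parameters for one simulated episode."""
    sky_rgb: tuple            # sky color, each channel in 0..255
    lane_width_scale: float   # multiplier on the nominal lane-marking width
    light_intensity: float    # multiplier on the nominal scene brightness


def sample_visual_params(rng):
    """Draw one random set of visual parameters (ranges are assumptions)."""
    sky_rgb = tuple(rng.randint(0, 255) for _ in range(3))
    lane_width_scale = rng.uniform(0.9, 1.1)
    light_intensity = rng.uniform(0.5, 1.5)
    return VisualParams(sky_rgb, lane_width_scale, light_intensity)


def sample_episodes(num_episodes, seed=0):
    """Return one independently sampled parameter set per training episode."""
    rng = random.Random(seed)
    return [sample_visual_params(rng) for _ in range(num_episodes)]


if __name__ == "__main__":
    for params in sample_episodes(3):
        print(params)
```

Each training episode would then be rendered with its own parameter set, so the agent never sees exactly the same visual world twice.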

In this paper, the authors investigate specifically visual domain randomization.

Abstract

In order to train reinforcement learning algorithms, a significant amount of experience is required, so it is common practice to train them in simulation, even when they are intended to be applied in the real world. To improve robustness, camera-based agents can be trained using visual domain randomization, which involves changing the visual characteristics of the simulator between training episodes in order to improve their resilience to visual changes in their environment.

In this work, we propose a method, which includes real-world images alongside visual domain randomization in the reinforcement learning training procedure to further enhance the performance after sim-to-real transfer. We train variational autoencoders using both real and simulated frames, and the representations produced by the encoders are then used to train reinforcement learning agents.
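A rough way to picture the first stage is a training batch that mixes simulated and real frames, scored with the standard variational autoencoder objective (a reconstruction term plus a KL term). The sketch below uses only the Python standard library. The function names, the real_fraction mixing ratio, the squared-error reconstruction term, and the beta weight are assumptions chosen for illustration; the abstract does not specify these details.

```python
import math
import random


def mixed_batch(sim_frames, real_frames, batch_size, real_fraction, rng):
    """Draw one batch mixing real and simulated frames (ratio is an assumption)."""
    num_real = round(batch_size * real_fraction)
    num_sim = batch_size - num_real
    batch = rng.sample(real_frames, num_real) + rng.sample(sim_frames, num_sim)
    rng.shuffle(batch)  # avoid a fixed real-then-simulated ordering
    return batch


def kl_to_standard_normal(mu, log_var):
    """KL divergence between N(mu, diag(exp(log_var))) and N(0, I)."""
    return -0.5 * sum(
        1.0 + lv - m * m - math.exp(lv) for m, lv in zip(mu, log_var)
    )


def vae_loss(x, x_recon, mu, log_var, beta=1.0):
    """Squared-error reconstruction term plus beta-weighted KL term."""
    reconstruction = sum((a - b) ** 2 for a, b in zip(x, x_recon))
    return reconstruction + beta * kl_to_standard_normal(mu, log_var)


if __name__ == "__main__":
    rng = random.Random(0)
    sim = [("sim", i) for i in range(100)]    # placeholders for simulated frames
    real = [("real", i) for i in range(20)]   # placeholders for real frames
    print(mixed_batch(sim, real, batch_size=10, real_fraction=0.3, rng=rng))
```

In a real pipeline the frames would be image tensors and the encoder and decoder would be neural networks; the point here is only that real frames enter the batch in the same way as any other visual variant of the simulator.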

The proposed method is evaluated against a variety of baselines, including direct and indirect visual domain randomization, end-to-end reinforcement learning, and supervised and unsupervised state representation learning.

By controlling a differential drive vehicle using only camera images, the method is tested in the Duckietown self-driving car environment. We demonstrate through our experimental results that our method improves learnt representation effectiveness and robustness by achieving the best performance of all tested methods.

Highlights - Enhancing Visual Domain Randomization with Real Images for Sim-to-Real Transfer

Here is a visual tour of the work of the authors. For more details, check out the full paper.

Example frames demonstrating visual domain randomization in the Duckietown Gym self-driving simulation environment, featuring varied visual characteristics to enhance training robustness. — Figure 1. Example frames with visual domain randomization from the Duckietown Gym self-driving simulated environment.

Diagram illustrating the proposed method that combines real images with visually randomized simulated images in unsupervised state representation learning, followed by training a control agent with reinforcement learning using a pretrained encoder network. — Figure 2. High level overview of the proposed method.

A block diagram showing the performance of direct versus invariance regularized domain randomization and supervised versus self-supervised state representation learning methods. — Figure 3. Benchmarked Baseline Methods for Domain Randomization and State Representation Learning.

Diagram showing the preprocessing pipeline. — Figure 4. An illustration of the preprocessing pipeline.

Images from the offline dataset displaying three distinct renderings of the same scene, showcasing varied visual perspectives. — Figure 5. Samples from the Offline Dataset with Three Different Renderings for Each Scene.

Table of evaluation results in the simulator without visual domain randomization, highlighting the proposed method in bold at the bottom row and underlining completion ratios above 70%. — Table 1. Evaluation Results in the Simulator Without Visual Domain Randomization.

Table showing evaluation results in the simulator with visual domain randomization, underlining completion ratios above 70% and highlighting the proposed method in bold at the bottom row. — Table 2. Evaluation Results in the Simulator with Visual Domain Randomization.

Table displaying evaluation results from real-world testing, underlining survival times of 20 or more and featuring the proposed method in bold at the bottom row. — Table 3. Evaluation Results in Real-World Testing.

Conclusion - Enhancing Visual Domain Randomization with Real Images for Sim-to-Real Transfer

Here are the conclusions from the authors of this paper:

“In this work we proposed a novel method for learning effective image representations for reinforcement learning, whose core idea is to train a variational autoencoder using visually randomized images from the simulator, but include images from the real world as well, as if it was just another visually different version of the simulator.

We evaluated the method in the Duckietown self-driving environment on the lane-following task, and our experimental results showed that the image representations of our proposed method improved the performance of the tested reinforcement learning agents both in simulation and reality. This demonstrates the effectiveness and robustness of the representations learned by the proposed method. We benchmarked our method against a wide range of baselines, and the proposed method performed among the best in all cases.

Our experiments showed that using some type of visual domain randomization is necessary for a successful sim-to-real transfer. Variational autoencoder-based representations tended to outperform supervised representations, and both outperformed representations learned during end-to-end reinforcement learning. Also, for visual domain randomization, when using no real images, invariance regularization-based methods seemed to outperform direct methods. Based on our results, we conclude that including real images in simulation-based reinforcement learning trainings is able to enhance the real world performance of the agent – when using the two-stage approach, proposed in this paper.”
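The two-stage structure mentioned in the conclusions can be illustrated with a toy sketch: a pretrained encoder is frozen, and only a small policy on top of its output is trained. Everything below is a hypothetical stand-in (a linear encoder, a linear policy with a tanh output, the names make_frozen_encoder and policy_action); the authors' networks and reinforcement learning algorithm are not reproduced here.

```python
import math


def make_frozen_encoder(weights):
    """Return an encoder whose weights are copied once and never updated."""
    frozen = [list(row) for row in weights]  # private copy of pretrained weights

    def encode(pixels):
        # Toy linear encoder: one latent value per weight row.
        return [sum(w * p for w, p in zip(row, pixels)) for row in frozen]

    return encode


def policy_action(latent, policy_weights):
    """Toy linear policy on the latent vector, squashed into [-1, 1]."""
    return math.tanh(sum(w * z for w, z in zip(policy_weights, latent)))


if __name__ == "__main__":
    encoder = make_frozen_encoder([[1.0, 0.0], [0.0, 1.0]])  # stage 1 result
    policy_weights = [0.5, -0.5]                             # trained in stage 2
    latent = encoder([0.2, 0.8])
    print(policy_action(latent, policy_weights))
```

During the second stage only policy_weights would be adjusted by the reinforcement learning algorithm, while the encoder stays as it was after representation learning.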

Project Authors

András Béres is currently working as a Junior Deep Learning Engineer at Continental, Hungary.

Bálint Gyires-Tóth is an associate professor at Budapest University of Technology and Economics, Hungary.

Learn more

Duckietown is a platform for creating and disseminating robotics and AI learning experiences.

It is modular, customizable and state-of-the-art, and designed to teach, learn, and do research. From exploring the fundamentals of computer science and automation to pushing the boundaries of knowledge, Duckietown evolves with the skills of the user.