Interactive Learning with Corrective Feedback for Policies based on Deep Neural Networks

Interactive Learning with Corrective Feedback for Policies based on Deep Neural Networks

Deep Reinforcement Learning (DRL) has become a powerful strategy to
solve complex decision making problems based on Deep Neural Networks (DNNs).

However, it is highly data demanding, so unfeasible in physical systems for most
applications. In this work, we approach an alternative Interactive Machine Learning (IML) strategy for training DNN policies based on human corrective feedback,
with a method called Deep COACH (D-COACH). This approach not only takes advantage of the knowledge and insights of human teachers as well as the power of
DNNs, but also has no need of a reward function (which sometimes implies the
need of external perception for computing rewards). We combine Deep Learning
with the COrrective Advice Communicated by Humans (COACH) framework, in
which non-expert humans shape policies by correcting the agent’s actions during
execution. The D-COACH framework has the potential to solve complex problems
without much data or time required. 

Experimental results validated the efficiency of the framework in three different problems (two simulated, one with a real robot),with state spaces of low and high dimensions, showing the capacity to successfully learn policies for continuous action spaces like in the Car Racing and Cart-Pole problems faster than with DRL.

Introduction

Deep Reinforcement Learning (DRL) has obtained unprecedented results in decisionmaking problems, such as playing Atari games [1], or beating the world champion inGO [2]. 

Nevertheless, in robotic problems, DRL is still limited in applications with
real-world systems [3]. Most of the tasks that have been successfully addressed with
DRL have two common characteristics: 1) they have well-specified reward functions, and 2) they require large amounts of trials, which means long training periods
(or powerful computers) to obtain a satisfying behavior. These two characteristics
can be problematic in cases where 1) the goals of the tasks are poorly defined or
hard to specify/model (reward function does not exist), 2) the execution of many
trials is not feasible (real systems case) and/or not much computational power or
time is available, and 3) sometimes additional external perception is necessary for
computing the reward/cost function.

On the other hand, Machine Learning methods that rely on transfer of human
knowledge, Interactive Machine Learning (IML) methods, have shown to be time efficient for obtaining good performance policies and may not require a well-specified
reward function; moreover, some methods do not need expert human teachers for
training high performance agents [4–6]. In previous years, IML techniques were
limited to work with low-dimensional state spaces problems and to the use of function approximation such as linear models of basis functions (choosing a right basis
function set was crucial for successful learning), in the same way as RL. But, as
DRL have showed, by approximating policies with Deep Neural Networks (DNNs)
it is possible to solve problems with high-dimensional state spaces, without the need
of feature engineering for preprocessing the states. If the same approach is used in
IML, the DRL shortcomings mentioned before can be addressed with the support of
human users who participate in the learning process of the agent.
This work proposes to extend the use of human corrective feedback during task
execution to learn policies with state spaces of low and high dimensionality in continuous action problems (which is the case for most of the problems in robotics)
using deep neural networks.

We combine Deep Learning (DL) with the corrective advice based learning
framework called COrrective Advice Communicated by Humans (COACH) [6],
thus creating the Deep COACH (D-COACH) framework. In this approach, no reward functions are needed and the amount of learning episodes is significantly reduced in comparison to alternative approaches. D-COACH is validated in three different tasks, two in simulations and one in the real-world.

Conclusions

This work presented D-COACH, an algorithm for training policies modeled with
DNNs interactively with corrective advice. The method was validated in a problem
of low-dimensionality, along with problems of high-dimensional state spaces like
raw pixel observations, with a simulated and a real robot environment, and also
using both simulated and real human teachers.

The use of the experience replay buffer (which has been well tested for DRL) was
re-validated for this different kind of learning approach, since this is a feature not
included in the original COACH. The comparisons showed that the use of memory
resulted in an important boost in the learning speed of the agents, which were able
to converge with less feedback, and to perform better even in cases with a significant
amount of erroneous signals.

The results of the experiments show that teachers advising corrections can train
policies in fewer time steps than a DRL method like DDPG. So it was possible
to train real robot tasks based on human corrections during the task execution, in
an environment with a raw pixel level state space. The comparison of D-COACH
with respect to DDPG, shows how this interactive method makes it more feasible
to learn policies represented with DNNs, within the constraints of physical systems.
DDPG needs to accumulate millions of time steps of experience in order to obtain

Did you find this interesting?

Read more Duckietown based papers here.

Duckietown: An open, inexpensive and flexible platform for autonomy education and research

Duckietown: An open, inexpensive and flexible platform for autonomy education and research

Duckietown is an open, inexpensive and flexible platform for autonomy education and research. The platform comprises small autonomous vehicles (“Duckiebots”) built from off-the-shelf components, and cities (“Duckietowns”) complete with roads, signage, traffic lights, obstacles, and citizens (duckies) in need of transportation. The Duckietown platform offers a wide range of functionalities at a low cost. Duckiebots sense the world with only one monocular camera and perform all processing onboard with a Raspberry Pi 2, yet are able to: follow lanes while avoiding obstacles, pedestrians (duckies) and other Duckiebots, localize within a global map, navigate a city, and coordinate with other Duckiebots to avoid collisions. Duckietown is a useful tool since educators and researchers can save money and time by not having to develop all of the necessary supporting infrastructure and capabilities. All materials are available as open source, and the hope is that others in the community will adopt the platform for education and research.

Did you find this interesting?

Read more Duckietown based papers here.

Learning autonomous systems — An interdisciplinary project-based experience

Learning autonomous systems — An interdisciplinary project-based experience

With the increased influence of automation into every part of our lives, tomorrow’s engineers must be capable working with autonomous systems. The explosion of automation and robotics has created a need for a massive increase in engineers who possess the skills necessary to work with twenty-first century systems. Autonomous Systems (MEEM4707) is a new senior/graduate level elective course with goals of: 1) preparing the next generation of skilled engineers, 2) creating new opportunities for learning and well informed career choices, 3) increasing confidence in career options upon graduation, and 4) connecting academic research to the students world. Presented in this paper is the developed curricula, key concepts of the project-based approach, and resources for other educators to implement a similar course at their institution. In the course, we cover the fundamentals of autonomous robots in a hands-on manner through the use of a low-cost mobile robot. Each student builds and programs their own robot, culminating in operation of their autonomous mobile robot in a miniature city environment. The concepts covered in the course are scalable from middle school through graduate school. Evaluation of student learning is completed using pre/post surveys, student progress in the laboratory environment, and conceptual examinations.

Did you find this interesting?

Read more Duckietown based papers here.

Deep Trail-Following Robotic Guide Dog in Pedestrian Environments for People who are Blind and Visually Impaired – Learning from Virtual and Real Worlds

Deep Trail-Following Robotic Guide Dog in Pedestrian Environments for People who are Blind and Visually Impaired - Learning from Virtual and Real Worlds

Navigation in pedestrian environments is critical to enabling independent mobility for the blind and visually impaired (BVI) in their daily lives. White canes have been commonly used to obtain contact feedback for following walls, curbs, or man-made trails, whereas guide dogs can assist in avoiding physical contact with obstacles or other pedestrians. However, the infrastructures of tactile trails or guide dogs are expensive to maintain. Inspired by the autonomous lane following of self-driving cars, we wished to combine the capabilities of existing navigation solutions for BVI users. We proposed an autonomous, trail-following robotic guide dog that would be robust to variances of background textures, illuminations, and interclass trail variations. A deep convolutional neural network (CNN) is trained from both the virtual and realworld environments. Our work included major contributions: 1) conducting experiments to verify that the performance of our models trained in virtual worlds was comparable to that of models trained in the real world; 2) conducting user studies with 10 blind users to verify that the proposed robotic guide dog could effectively assist them in reliably following man-made trails.

Did you find this interesting?

Read more Duckietown based papers here.

Integration of open source platform Duckietown and gesture recognition as an interactive interface for the museum robotic guide

Integration of open source platform Duckietown and gesture recognition as an interactive interface for the museum robotic guide

In recent years, population aging becomes a serious problem. To decrease the demand for labor when navigating visitors in museums, exhibitions, or libraries, this research designs an automatic museum robotic guide which integrates image and gesture recognition technologies to enhance the guided tour quality of visitors. The robot is a self-propelled vehicle developed by ROS (Robot Operating System), in which we achieve the automatic driving based on the function of lane-following via image recognition. This enables the robot to lead guests to visit artworks following the preplanned route. In conjunction with the vocal service about each artwork, the robot can convey the detailed description of the artwork to the guest. We also design a simple wearable device to perform gesture recognition. As a human machine interface, the guest is allowed to interact with the robot by his or her hand gestures. To improve the accuracy of gesture recognition, we design a two phase hybrid machine learning-based framework. In the first phase (or training phase), k-means algorithm is used to train historical data and filter outlier samples to prevent future interference in the recognition phase. Then, in the second phase (or recognition phase), we apply KNN (k-nearest neighboring) algorithm to recognize the hand gesture of users in real time. Experiments show that our method can work in real time and get better accuracy than other methods.

Did you find this interesting?

Read more Duckietown based papers here.

Hybrid control and learning with coresets for autonomous vehicles

Hybrid control and learning with coresets for autonomous vehicles

Modern autonomous systems such as driverless vehicles need to safely operate in a wide range of conditions. A potential solution is to employ a hybrid systems approach, where safety is guaranteed in each individual mode within the system. This offsets complexity and responsibility from the individual controllers onto the complexity of determining discrete mode transitions. In this work we propose an efficient framework based on recursive neural networks and coreset data summarization to learn the transitions between an arbitrary number of controller modes that can have arbitrary complexity. Our approach allows us to efficiently gather annotation data from the large-scale datasets that are required to train such hybrid nonlinear systems to be safe under all operating conditions, favoring underexplored parts of the data. We demonstrate the construction of the embedding, and efficient detection of switching points for autonomous and non-autonomous car data. We further show how our approach enables efficient sampling of training data, to further improve either our embedding or the controllers.

Did you find this interesting?

Read more Duckietown based papers here.

Towards blockchain-based robonomics: autonomous agents behavior validation

Towards blockchain-based robonomics: autonomous agents behavior validation

The decentralized trading market approach, where both autonomous agents and people can consume and produce services expanding own opportunities to reach goals, looks very promising as a part of the Fourth Industrial revolution. The key component of the approach is a blockchain platform that allows an interaction between agents via liability smart contracts. Reliability of a service provider is usually determined by a reputation model. However, this solution only warns future customers about an extent of trust to the service provider in case it could not execute any previous liabilities correctly. From the other hand a blockchain consensus protocol can additionally include a validation procedure that detects incorrect liability executions in order to suspend payment transactions to questionable service providers. The paper presents the validation methodology of a liability execution for agent-based service providers in a decentralized trading market, using the Model Checking method based on the mathematical model of finite state automata and Temporal Logic properties of interest. To demonstrate this concept, we implemented the methodology in the Duckietown application, moving an autonomous mobile robot to achieve a mission goal with the following behavior validation at the end of a completed scenario.

Did you find this interesting?

Read more Duckietown based papers here.

Learning for Multi-robot Cooperation in Partially Observable Stochastic Environments with Macro-actions

Learning for Multi-robot Cooperation in Partially Observable Stochastic Environments with Macro-actions

This paper presents a data-driven approach for multi-robot coordination in partially-observable domains based on Decentralized Partially Observable Markov Decision Processes (Dec-POMDPs) and macro-actions (MAs). Dec-POMDPs provide a general framework for cooperative sequential decision making under uncertainty and MAs allow temporally extended and asynchronous action execution. To date, most methods assume the underlying Dec-POMDP model is known a priori or a full simulator is available during planning time. Previous methods which aim to address these issues suffer from local optimality and sensitivity to initial conditions. Additionally, few hardware demonstrations involving a large team of heterogeneous robots and with long planning horizons exist. This work addresses these gaps by proposing an iterative sampling based Expectation-Maximization algorithm (iSEM) to learn polices using only trajectory data containing observations, MAs, and rewards. Our experiments show the algorithm is able to achieve better solution quality than the state-of-the-art learning-based methods. We implement two variants of multi-robot Search and Rescue (SAR) domains (with and without obstacles) on hardware to demonstrate the learned policies can effectively control a team of distributed robots to cooperate in a partially observable stochastic environment.

Did you find this interesting?

Read more Duckietown based papers here.

Duckietown: An Innovative Way to Teach Autonomy

Duckietown: An Innovative Way to Teach Autonomy

Teaching robotics is challenging because it is a multidisciplinary, rapidly evolving and experimental discipline that integrates cutting-edge hardware and software. This paper describes the course design and first implementation of Duckietown, a vehicle autonomy class that experiments with teaching innovations in addition to leveraging modern educational theory for improving student learning. We provide a robot to every student, thanks to a minimalist platform design, to maximize active learning; and introduce a role-play aspect to increase team spirit, by modeling the entire class as a fictional start-up (Duckietown Engineering Co.). The course formulation leverages backward design by formalizing intended learning outcomes (ILOs) enabling students to appreciate the challenges of: (a) heterogeneous disciplines converging in the design of a minimal self-driving car, (b) integrating subsystems to create complex system behaviors, and (c) allocating constrained computational resources. Students learn how to assemble, program, test and operate a self-driving car (Duckiebot) in a model urban environment (Duckietown), as well as how to implement and document new features in the system. Traditional course assessment tools are complemented by a full scale demonstration to the general public. The “duckie” theme was chosen to give a gender-neutral, friendly identity to the robots so as to improve student involvement and outreach possibilities. All of the teaching materials and code is released online in the hope that other institutions will adopt the platform and continue to evolve and improve it, so to keep pace with the fast evolution of the field.

Did you find this interesting?

Read more Duckietown based papers here.