Interactive Learning with Corrective Feedback for Policies based on Deep Neural Networks

Deep Reinforcement Learning (DRL) has become a powerful strategy for solving complex decision-making problems with Deep Neural Networks (DNNs). However, it is highly data demanding, which makes it infeasible for most applications on physical systems. In this work, we take an alternative Interactive Machine Learning (IML) approach to training DNN policies from human corrective feedback, with a method called Deep COACH (D-COACH). This approach not only takes advantage of the knowledge and insights of human teachers and of the power of DNNs, but also needs no reward function (computing rewards sometimes requires external perception). We combine Deep Learning with the COrrective Advice Communicated by Humans (COACH) framework, in which non-expert humans shape policies by correcting the agent’s actions during execution. The D-COACH framework has the potential to solve complex problems without requiring much data or time.

Experimental results validated the efficiency of the framework in three different problems (two simulated, one with a real robot), with state spaces of low and high dimensions, showing the capacity to learn policies for continuous action spaces, as in the Car Racing and Cart-Pole problems, faster than with DRL.

Introduction

Deep Reinforcement Learning (DRL) has obtained unprecedented results in decision-making problems, such as playing Atari games [1] or beating the world champion in Go [2].

Nevertheless, in robotic problems, DRL is still limited in applications with
real-world systems [3]. Most of the tasks that have been successfully addressed with
DRL have two common characteristics: 1) they have well-specified reward functions, and 2) they require large numbers of trials, which means long training periods
(or powerful computers) to obtain a satisfying behavior. These two characteristics
can be problematic in cases where 1) the goals of the tasks are poorly defined or
hard to specify/model (reward function does not exist), 2) the execution of many
trials is not feasible (real systems case) and/or not much computational power or
time is available, and 3) sometimes additional external perception is necessary for
computing the reward/cost function.

On the other hand, Machine Learning methods that rely on the transfer of human knowledge, known as Interactive Machine Learning (IML) methods, have been shown to be time efficient for obtaining high-performance policies and may not require a well-specified reward function; moreover, some methods do not need expert human teachers for training high-performance agents [4–6]. In previous years, IML techniques were limited to problems with low-dimensional state spaces and to function approximators such as linear models of basis functions (choosing the right set of basis functions was crucial for successful learning), in the same way as RL. But, as DRL has shown, approximating policies with Deep Neural Networks (DNNs) makes it possible to solve problems with high-dimensional state spaces without feature engineering to preprocess the states. If the same approach is used in IML, the DRL shortcomings mentioned above can be addressed with the support of human users who participate in the agent’s learning process.

This work proposes to extend the use of human corrective feedback during task
execution to learn policies with state spaces of low and high dimensionality in continuous action problems (which is the case for most of the problems in robotics)
using deep neural networks.

We combine Deep Learning (DL) with the corrective advice based learning
framework called COrrective Advice Communicated by Humans (COACH) [6],
thus creating the Deep COACH (D-COACH) framework. In this approach, no reward functions are needed and the number of learning episodes is significantly reduced in comparison to alternative approaches. D-COACH is validated in three different tasks, two in simulation and one in the real world.
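To make the idea of corrective advice concrete, here is a minimal sketch, not the authors' implementation. It assumes the teacher only signals a direction h in {-1, 0, +1}, that the corrected action is the executed action shifted by a fixed error magnitude in that direction, and that the policy is then fit to this corrected action with one gradient step. The linear policy (standing in for a DNN), the simulated teacher, and all names and constants are hypothetical.

```python
def corrected_action(action, feedback, error_magnitude=0.2):
    """Shift the executed action in the direction the teacher indicated."""
    return action + error_magnitude * feedback


class LinearPolicy:
    """Tiny stand-in for a DNN policy: action = weights . state."""

    def __init__(self, n_features, learning_rate=0.1):
        self.weights = [0.0] * n_features
        self.learning_rate = learning_rate

    def act(self, state):
        return sum(w * s for w, s in zip(self.weights, state))

    def update(self, state, target):
        # One gradient step on 0.5 * (act(state) - target) ** 2.
        error = self.act(state) - target
        self.weights = [w - self.learning_rate * error * s
                        for w, s in zip(self.weights, state)]


def simulated_teacher(state, action, tolerance=0.05):
    """Hypothetical teacher that wants action = 0.8 * state[0]."""
    desired = 0.8 * state[0]
    if abs(desired - action) <= tolerance:
        return 0  # close enough: no correction
    return 1 if desired > action else -1


def train(epochs=1000):
    policy = LinearPolicy(n_features=2)
    # Each state is [x, 1.0]; the constant 1.0 acts as a bias input.
    states = [[x, 1.0] for x in (-1.0, -0.5, 0.0, 0.5, 1.0)]
    corrections = 0
    for _ in range(epochs):
        for state in states:
            action = policy.act(state)
            h = simulated_teacher(state, action)
            if h != 0:
                policy.update(state, corrected_action(action, h))
                corrections += 1
    return policy, corrections
```

Note that no reward appears anywhere in the loop: the only learning signal is the direction of the correction.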

Conclusions

This work presented D-COACH, an algorithm for training policies modeled with
DNNs interactively with corrective advice. The method was validated in a problem
of low-dimensionality, along with problems of high-dimensional state spaces like
raw pixel observations, with a simulated and a real robot environment, and also
using both simulated and real human teachers.

The use of the experience replay buffer (which has been well tested for DRL) was
re-validated for this different kind of learning approach, since this is a feature not
included in the original COACH. The comparisons showed that the use of memory
resulted in an important boost in the learning speed of the agents, which were able
to converge with less feedback, and to perform better even in cases with a significant
amount of erroneous signals.
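The role of the memory can be illustrated with a small sketch of a buffer that stores corrected samples and replays them in minibatches. This is an assumption about the general shape of such a buffer, not the paper's code; the class name, capacity, and sampling scheme are hypothetical.

```python
import random
from collections import deque


class CorrectionBuffer:
    """Fixed-size memory of (state, corrected_action) pairs."""

    def __init__(self, capacity=1000, seed=0):
        # A deque with maxlen silently drops the oldest sample when full.
        self.samples = deque(maxlen=capacity)
        self.rng = random.Random(seed)

    def add(self, state, corrected_action):
        self.samples.append((tuple(state), corrected_action))

    def sample(self, batch_size):
        # Never ask for more samples than are stored.
        k = min(batch_size, len(self.samples))
        return self.rng.sample(list(self.samples), k)

    def __len__(self):
        return len(self.samples)
```

Each time the teacher gives a correction, the new pair would be added and the policy updated both on that pair and on a replayed minibatch, so that past corrections keep influencing the policy without the teacher repeating them.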

The results of the experiments show that teachers advising corrections can train policies in fewer time steps than a DRL method like DDPG. It was thus possible to train real robot tasks based on human corrections during task execution, in an environment with a raw pixel level state space. The comparison of D-COACH with DDPG shows how this interactive method makes it more feasible to learn policies represented with DNNs within the constraints of physical systems.
DDPG needs to accumulate millions of time steps of experience in order to obtain

Did you find this interesting?

Read more Duckietown based papers here.

Duckietown in Ghana – Teaching robotics to brilliant students

July 2018: Vincent Mai travels from Canada to teach a 2-week Duckietown class to some of the brightest high school students in Ghana.

The email – Montreal, January 2018

On the morning of January 29th, 2018, I received an email. It was a call for international researchers to mentor, for two weeks, a small group of teenagers selected from among the brightest in Ghana. Robotics was one of the possible topics.

At 4 pm, I had applied.

I was lucky enough to grow up in a part of the world where sciences are available to children. I spent summers at Polytechnique Montreal, playing with electromagnets and making rockets fly with vinegar and baking soda. I also remember visiting the MIT Museum in Boston, where I was impressed by the bio-inspired swimming robots. There is no doubt that these activities encouraged 17-year-old me to choose physics engineering for my bachelor studies, which then turned into robotics at the graduate level.

The MISE Foundation

The call from the MISE Foundation was a triple opportunity.

First, I could pass on the passion I was given when I was their age. Second, I would participate, in my small, modest way, in the reduction of education inequalities between developing and developed countries. Countries like Ghana can only benefit from brilliant Ghanaians considering maths, computer science or robotics as a career.

Finally, it was a unique opportunity for me to discover and learn from people living in an environment that is totally different from mine, with other values, objectives and challenges. It is not every day you can spend two weeks in Ghana.

After some exchanges with Joel, the organizer, with motivation letters, project plan and visa paperwork, it was decided: I was going to Accra from July 20th to August 6th.

The preparation – Montreal, June 2018

My specialty is working with autonomous mobile robots: this is what I wanted to teach. I was going to see the brightest young minds of a whole country. I needed to challenge them: I could not go there with a drag-and-drop programmed Lego.

I chose an option that was close to me. Duckietown is a project-based graduate course given at Université de Montréal by my PhD supervisor, Prof. Liam Paull. It allows students to learn the challenges of autonomous vehicles by having miniature cars run in a controlled environment. A Duckiebot is a simple 2-wheel car commanded by a Raspberry Pi. Its only sensor is a camera.

Beyond my familiarity with Duckietown, I chose it because making a Duckiebot drive autonomously is a very concrete problem, which involves a lot of interesting concepts: computer vision, localization, control, and integration of all these on a controller. Also, for teenagers, the Duckie is a great mascot.

I had not yet taken the Duckietown course. Preparing took me a month and a half of installing, reverse engineering, and documenting. The objective I designed for the kids? Having a Duckiebot named Moose follow the lanes at a constant speed, without leaving the road or crossing the middle line.

It was inspired by a demo that was already implemented in the Duckiebot. I could not ask the kids to implement the whole code, so I cut out only the most critical parts of it. I also wrote presentations and exercises, planning each of the 10 days we would spend together, 6 hours a day. I packed the sports mats to build the road, a couple of extra pieces in case something broke, and the print-outs of the presentations. I was ready.

Packed Duckietown

Or, I hoped I was. It was not simple to adapt the contents of a graduate course for kids whose math and programming level I did not know. Did they know how to multiply matrices? What about Bayes’ law? Could I ask them to use NumPy? When I asked Liam for advice, he told me with a smile: “I guess you’ll have to go with the flow…”

The building – Accra, August 2018

Accra is a large city, spread along the shore of the Atlantic Ocean. Its people are particularly smiling and welcoming. The Lincoln Community School, a private institution hosting the MISE Foundation summer school, has beautiful and calm facilities which allowed us to give the classes in a proper environment. There were 24 children in total: 12 were training for the International Maths Olympics with two mentors, while three teams of 4 students would work with a mentor on projects like mine. The two other projects were adversarial attacks on image classifiers and stereo vision.

The first two days, we did maths. I tested their level: they did not know most of what was necessary to go on. Vector operations, integrals, probabilities… We went through these in a very short time: they amazed me by the speed at which they understood.

For the next five days, we went through the project setup. We started simple, understanding how we can drive the Duckiebot with a joystick. We had to set up Moose, discover ROS, and use it to send commands to the motors.

We followed with the real project: autonomous mobile robotics.

  • See-Think-Act cycle;

  • computer vision for line extraction, from RGB images to Canny edge detection and Hough transform;

  • camera calibration for ground projection, from image sensors to homography matrix;

  • Bayesian estimator for localization, with dynamic prediction and measurement update;

  • and finally, proportional control for outputting the right commands to the wheels.
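The Bayesian estimator step in the list above can be sketched as a one-dimensional histogram filter: the belief over the robot's lateral position is a list of probabilities, the prediction step shifts it according to the motion, and the measurement update multiplies it by a likelihood and renormalizes. This is an illustrative sketch, not the code used in the class or in Duckietown; the function names, the motion noise model and the numbers are assumptions.

```python
def normalize(belief):
    """Scale a list of non-negative weights so that it sums to 1."""
    total = sum(belief)
    return [b / total for b in belief]


def predict(belief, shift, p_move=0.8):
    """Dynamic prediction: move the belief by `shift` cells.

    With probability p_move the robot moves as commanded; otherwise it
    stays where it was. Positions are clamped at the edges of the grid.
    """
    n = len(belief)
    new_belief = [0.0] * n
    for i, b in enumerate(belief):
        j = min(max(i + shift, 0), n - 1)
        new_belief[j] += p_move * b
        new_belief[i] += (1.0 - p_move) * b
    return new_belief


def update(belief, likelihood):
    """Measurement update: Bayes' rule, cell by cell."""
    return normalize([b * l for b, l in zip(belief, likelihood)])


def most_likely_cell(belief):
    return max(range(len(belief)), key=lambda i: belief[i])


# Start with no idea where the robot is across 5 lateral cells.
prior = [0.2] * 5
# The camera suggests the robot is most likely in cell 2.
posterior = update(prior, [0.1, 0.2, 0.9, 0.2, 0.1])
# The robot is then commanded to move one cell to the right.
predicted = predict(posterior, shift=1)
```

The same two steps repeat at every camera frame: predict from the wheel commands, then correct with what the camera sees.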

Building Moose

Moose the Duckiebot, up and running!

For each of these steps, the students wrote their version of the code. Then, we made a final version together that we implemented in Moose.
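As an illustration of the last step, proportional control, here is a minimal sketch for a two-wheel robot. It is not the code written in class: the gains, the wheel baseline and the sign conventions (d positive when the robot is left of the lane center, phi positive when it is heading left, omega positive for a counterclockwise turn) are assumptions chosen for the example.

```python
def lane_following_command(d, phi, v=0.2, k_d=3.0, k_phi=1.5):
    """Proportional controller for lane following.

    d   : lateral offset from the lane center in meters (left is positive)
    phi : heading error in radians (pointing left is positive)
    Returns the forward speed v and the angular speed omega.
    """
    omega = -(k_d * d + k_phi * phi)  # steer against the error
    return v, omega


def wheel_speeds(v, omega, baseline=0.1):
    """Convert (v, omega) into left and right wheel speeds.

    baseline is the distance between the two wheels in meters.
    """
    v_left = v - omega * baseline / 2.0
    v_right = v + omega * baseline / 2.0
    return v_left, v_right
```

For example, `wheel_speeds(*lane_following_command(d=0.05, phi=0.0))` makes the left wheel turn faster than the right one, which steers the robot back to the right, toward the lane center.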

The experiments – Accra, August 2018

In the next two days, the students had to think about what they would do for their research projects. The experiments would be done together but the projects would be individual. Each of them decided to focus on one aspect of autonomous cars. Kwadwo decided to go for speed: he tested the limits of the car as if it were an autonomous ambulance. Abrahim was more concerned about safety: was Moose better than humans at driving? Oheneba thought about the reduction of greenhouse gas emissions and William about reducing traffic. In both cases, they argued that if autonomous cars were to improve the situation, they first had to be accepted by humans and therefore be safe and reliable. They tested Moose in differently lit scenes, with white sheets on the road (snow) or with a slightly wrong wheel calibration, to see how it would cope with these conditions.

On the last day, they individually presented their research to a committee formed by the three project mentors. We asked them difficult questions for 15 minutes, testing them and pushing them to think beyond what they had learned in these 2 weeks. We judged them based on the Intel ISEF criteria (Research project, Methodology, Execution, Creativity and Presentation).

Presenting in front of the judging committee

The closing ceremony – Accra, August 2018

Saturday was parents’ day. The students made a general presentation of their projects, making the parents laugh uneasily every time they asked “Is everything clear?” At least, I think most of the parents enjoyed the demonstration: it is always nice to see a Duckiebot run!

Finally, at the closing ceremony, the students who had the best presentation grades were rewarded. I was proud that Kwadwo was named Scholar of the Year, winning a Mobile Robotics book and the right to represent Ghana at the Intel ISEF conference in Phoenix, Arizona, in May 2019. He will present his project with the Duckiebot!

The students and organizers also gave each of us a beautiful gift: an honorary scarf on which is written “Ayeekoo”. In the local languages, it means: “Job well done.”

I hope I did my job well, and that William, Oheneba, Kwadwo and Abrahim will remember Moose the Duckiebot when they choose their careers. I know that, in any case, these four brilliant young men will continue to shine. On my side, I really enjoyed the experience. I will make sure I don’t miss an opportunity to teach teenagers again using Duckietown, whether it is in another country or here, in Montreal.

The best team!

Important note

I had four boys in my group. You can notice on the picture below that, out of the 24 students, only 3 girls participated in the MISE Foundation program. When I asked Joel about it, he told me he has a very difficult time getting women to participate. At least 6 more girls were invited, but their parents would pressure them not to do maths and science, and discourage them from going to the Summer School. They feel this is not what a woman should be doing. I find this situation very frustrating. Ghana is a country with strong family values that are different from the ones I am used to. It is not our role as international researchers to tell them what is good and what is not. And, to be fair, software engineering presents similar ratios in Canada, even if the reasons are less tangible (maybe?).

On the other hand, engineers and scientists build the world around us, and they do so according to the needs they feel. Men cannot build everything women need. I strongly encourage any girl, in any country, who reads this blog post and who is interested in maths and computer science, to stand up for what she wants to do. We need you here, to build tomorrow’s world together.

MISE 2018 – Ayeekoo!

You can help the Duckietown Foundation fund similar experiences in Africa and elsewhere in the world by reaching out and donating.

Tell us your story

Are you an instructor, learner, researcher or professional with a Duckietown story to tell? Reach out to us!

The AI Driving Olympics at NIPS 2018

Authors:

Andrea Censi, Liam Paull, Jacopo Tani, Julian Zilly, Thomas Ackermann, Oscar Beijbom, Berabi Berkai, Gianmarco Bernasconi, Anne Kirsten Bowser, Simon Bing, Pin-Wei David Chen, Yu-Chen Chen, Maxime Chevalier-Boisvert, Breandan Considine, Andrea Daniele, Justin De Castri, Maurilio Di Cicco, Manfred Diaz, Paul Aurel Diederichs, Florian Golemo, Ruslan Hristov, Lily Hsu, Yi-Wei Daniel Huang, Chen-Hao Peter Hung, Qing-Shan Jia, Julien Kindle, Dzenan Lapandic, Cheng-Lung Lu, Sunil Mallya, Bhairav Mehta, Aurel Neff, Eryk Nice, Yang-Hung Allen Ou, Abdelhakim Qbaich, Josefine Quack, Claudio Ruch, Adam Sigal, Niklas Stolz, Alejandro Unghia, Ben Weber, Sean Wilson, Zi-Xiang Xia, Timothius Victorio Yasin, Nivethan Yogarajah, Yoshua Bengio, Tao Zhang, Hsueh-Cheng Wang, Matthew Walter, Stefano Soatto, Magnus Egerstedt, Emilio Frazzoli

Published in the RSS Workshop on New Benchmarks, Metrics, and Competitions for Robotic Learning

Link: Available here

How Duckietown inspired a 14 year old girl to become a tech entrepreneur

We host a guest post by Valeria Cagnina, who had the luck to meet our team very early – in fact, when the first Duckietown was still being built – and she helped with the tape!

Nothing is impossible…the word itself says “I’m possible”!

I discovered robotics when I was 11 years old, thanks to a digital plant made with Arduino that I saw at the Milan CoderDojo. I really liked robotics and decided I would like to make my own robot.

So I searched online for a robot I could make myself. I found some videos on the web about a robot from MIT. I really loved this wonderful robot… but I was too young and I didn’t have the skills necessary to build it. So I surfed online to search for other types that would be easier to build, but the dream of going to see this cool robot at MIT in Boston stayed in my mind.

After a while, following YouTube videos and making my own, I built my first robot by myself at 11 years old: it could move around a room and avoid obstacles thanks to its distance sensor, programmed with Arduino.

In Italy it was not so common to make a robot at 11, so I was able to share this experience at a lot of events and conferences, which led me to speak at a TEDx event at 14 years old.

By chance, at the same age, I travelled to the United States to visit New York, Boston and Canada… at the beginning it seemed a normal holiday…

I convinced my parents to extend our trip to spend more time around MIT. We went sightseeing in Boston and at MIT but it wasn’t enough for me! I wanted to look inside this place that was so magical to me, and I especially wanted to talk with the engineers that build and program robots! Maybe I would see that same robot that I found when I was 11 years old!

The early stages of Duckietown at MIT

 

I left my parents visiting the rest of Boston and I started to go alone around the MIT departments, trying to open every door that I found in front of me.

While I was walking, I was looking through the laboratory windows and my attention was caught by an empty room (I mean with no humans inside 😀!) full of duckies and with a sort of track for cars on the floor.

What was this room about? What was the purpose of these duckies? I was very, very curious about it and had many questions, but there was no one in the lab!

Obviously I never give up: I absolutely believe that nothing is impossible. So, every day until my departure for the next leg of our trip, I continued to go around MIT, passing in front of THAT lab and hoping to find someone in it.

Finally one day I saw some people inside the lab doing something. I was really excited! I watched them from the window. I absolutely wanted to know what they were doing: one of them was soldering, another one was using duct tape. Suddenly they saw me and they invited me into the lab! What an astonishment for me!

Immediately they asked me a lot of questions: why was a 14 year old roaming MIT alone, why was I so excited about that lab… Then one of them (I didn’t know his name) asked if I wanted to help build “Duckietown”. He told me about the project (at that time it wasn’t started yet) and he asked me about myself and the first robot I built. After an afternoon spent together, I discovered that this strange guy was Andrea Censi, one of the founders of the Duckietown project! Amazing!

Andrea proposed a challenge to me: I had to try to make my own Duckietown robot, a Duckiebot. Since it was a university project, I was able to follow the online tutorials and ask lots of questions to all the other Duckietown members on the communication forum, Slack. He had only one request of me: he told me that even though the robot was hard to build and program, I shouldn’t give up.

I was so happy that I immediately agreed. I was handed the robot kit, a list of various links and some Duckies ☺.

Now it was my turn! I didn’t want to disappoint Andrea, so as soon as I arrived in Italy I put myself to work but, wow, building the Duckiebot was very hard! I spent an entire afternoon trying to comprehend just 4 lines of the tutorial. I began to ask questions on Slack and I tried, I tried and I tried again.

I had never worked with Linux before, so that was a completely new world for me. I started from the beginning, with no knowledge at all, but I worked for a few months until I received a message from Andrea: “Do you want to spend some time here, in Boston, working with us in Duckietown?” Of course I was willing, I couldn’t wait, it was an amazing proposal!

So I became a Duckietown Senior Tester at 15 years old and I spent almost all the summer inside the labs of MIT. My task was simplifying the university-level tutorial and making it accessible to the high-school students (like me ☺) as well as making the Duckiebot, which had now evolved!

Thanks to the help of Andrea and Liam (the other founder) I finally succeeded in programming my robot: it was now able to drive autonomously in Duckietown. It felt like a dream come true!

Spending the summer in Duckietown at MIT allowed me to discover a completely new world: I understood that education could be playful and that learning could be fun!

 

Valeria's Duckiebot (back)
Valeria's Duckiebot (side)

The AI Driving Olympics at NIPS 2018

General Information


Duckietown is a platform for creating and disseminating robotics and AI learning experiences.

It is modular, customizable and state-of-the-art, and designed to teach, learn, and do research. From exploring the fundamentals of computer science and automation to pushing the boundaries of knowledge, Duckietown evolves with the skills of the user.

Duckiebots are ready to conquer the world!

Dear friends of Duckietown:

We are excited to bring you tremendous news about the Duckietown project.

Over the past years we have had support from many enthusiastic individuals who have donated their time and efforts to help the Duckietown project grow, and grown it has!

Duckietown started at MIT in 2016 – almost two years ago. Now Duckietown classes have been taught in 10 countries with more than 700 alumni.

The last months have been a transformative period for the project, as we prepare to jump to the next level in terms of scope and reach.

The Duckietown Foundation

We have established the Duckietown Foundation, a non-profit entity that will lead the Duckietown project.

Our mission: make the world excited about the beauty, the fun, the importance, and the challenges of robotics and artificial intelligence, through learning experiences that are tangible, accessible, and inclusive.

The Duckietown Foundation will serve as the coordination point for the development of Duckietown. As a non-profit, the foundation can accept donations from individuals and companies for the promotion of affordable and fun robotics learning programs around the world.

A Kickstarter

We are organizing a Kickstarter to make it easier for people to obtain Duckiebots and Duckietowns.

This solves the biggest hurdle so far in reproducing the Duckietown experience: the lack of a one-click solution to acquire the hardware.

Also, working with thousands of pieces allows us to drive down the price and to design our own custom boards.

See: Our Kickstarter

A donation program

As much as we aim to have affordable hardware, in certain parts of the world the only realistic price is $0.

That is why we have included a donate-a-Duckiebot and donate-a-class program through the Kickstarter.

Become a friend of Duckietown and support the distribution of low-cost and playful AI and robotics education to even more schools across the globe by backing our Kickstarter campaign.

To learn more about how to support Duckietown, reach out to [email protected]

A new website…

We’ve designed a new website that better serves users of the platform by offering support forums and more organized access to the teaching materials.

See: The new forums.

See: New “duckumentation” site docs.duckietown.com

… and 700 more new websites

We want people to share their Duckietown experiences with other Duckie-enthusiasts, whether they be far or near. That’s now possible through upwards of 700 “community” subsites, each with a blog and a forum.

For more information, see the post Communities sites launched.

The AI Driving Olympics

In addition to its role as an education platform, Duckietown is a useful research tool.

We are happy to announce that Duckietown is the official platform for the AI Driving Olympics, a machine learning competition to be held at NIPS 2018 and ICRA 2019, the two largest machine learning and robotics conferences in the world. We challenge you to put your coding to the test and join the competition.

That’s all for now! Thanks for listening –

The Duckietown project relies on an active and engaged community, which is why we want you to stay involved! Support robotics education and research: sign up on our website! Back our Kickstarter! Compete in the AI Driving Olympics!

 

For any additional information, or if you would like to help us in other ways, please see here for how to help us.

Duckietown: An open, inexpensive and flexible platform for autonomy education and research

Duckietown is an open, inexpensive and flexible platform for autonomy education and research. The platform comprises small autonomous vehicles (“Duckiebots”) built from off-the-shelf components, and cities (“Duckietowns”) complete with roads, signage, traffic lights, obstacles, and citizens (duckies) in need of transportation. The Duckietown platform offers a wide range of functionalities at a low cost. Duckiebots sense the world with only one monocular camera and perform all processing onboard with a Raspberry Pi 2, yet are able to: follow lanes while avoiding obstacles, pedestrians (duckies) and other Duckiebots, localize within a global map, navigate a city, and coordinate with other Duckiebots to avoid collisions. Duckietown is a useful tool since educators and researchers can save money and time by not having to develop all of the necessary supporting infrastructure and capabilities. All materials are available as open source, and the hope is that others in the community will adopt the platform for education and research.
