Congratulations to the winners of the second edition of the AI Driving Olympics!

Team JetBrains came out on top on all 3 challenges

It was a busy (and squeaky) few days at the International Conference on Robotics and Automation in Montreal for the organizers and competitors of the AI Driving Olympics. 

The finals were kicked off by a semifinals round, in which we ran the top 5 submissions from the Lane Following in Simulation leaderboard. The finalists (JBRRussia and MYF) moved on to the more complicated challenges of Lane Following with Vehicles and Lane Following with Vehicles and Intersections.

Results from the AI-DO2 Finals event on May 22, 2019 at ICRA

If you couldn’t make it to the event and missed the live stream on Facebook, here’s a short video of the first run of the JetBrains Lane Following submission.

Thanks to everyone who competed, dropped in to say hello, and cheered on the finalists by sending the song of the Duckie down the corridors of the Palais des Congrès.

A few pictures from the event

Don't know much about the AI Driving Olympics?

It is an accessible and reproducible autonomous car competition designed with straightforward standardized hardware, software and interfaces.

Get Started

Step 1: Build and test your agent with our available templates and baselines

Step 2: Submit to a challenge

Check out the leaderboard

View your submission in simulation

Step 3: Run your submission on a robot

in a Robotarium

AI-DO Robotarium Evaluations Underway

Autolab evaluations underway

We have started evaluating the submissions in our Duckietown “Robotarium” (aka Autolab):

Duckiebot onboard camera feed

Robotarium watchtower camera feed

To queue your submissions for robotarium evaluation, please follow these instructions:

You need to use the `--challenge` option to specify 3 challenges: the two simulated ones (validation and testing) and the hardware one:

  • dts challenges submit --challenge aido2-LF-sim-validation,aido2-LF-sim-testing,aido2-LF-real-validation
  • dts challenges submit --challenge aido2-LFV-sim-validation,aido2-LFV-sim-testing,aido2-LFV-real-validation
  • dts challenges submit --challenge aido2-LFVI-sim-validation,aido2-LFVI-sim-testing,aido2-LFVI-real-validation

We will evaluate submissions by participants that are in the top part of the leaderboard in the simulated testing challenge.

Robotarium evaluation capacity is limited, so we will evaluate submissions in a round-robin fashion across users. We aim to evaluate everyone in the top 10 of the simulated challenge, and more if capacity allows.

Participants can have multiple submissions in the “real” challenges. We will evaluate them first according to “user priority”, then by recency. Priority can be set through the web interface using the button at the top right.

Deadlines

The challenges will close May 21 at 8pm Montreal (EDT) time. Please check the server timestamp for the precise time in your time zone.

Round 2 of the AI Driving Olympics is underway!

The AI-DO is back!

We are excited to announce that we are now ready to accept submissions for AI-DO 2, which will culminate in a live competition event to be held at ICRA 2019 this May 20-22.

The AI Driving Olympics is a global robotics competition that comprises a series of challenges based on autonomous driving. The AI-DO provides a standardized simulation and robotics platform that people from around the world use to engage in friendly competition, while simultaneously advancing the field of robotics and AI. 

Check out our official press release.

The finals of AI-DO 1 at NeurIPS, December 2018

We want to see your classical robotics and machine-learning-based algorithms go head to head on the competition track. Get started today!

Want to learn more or join the competition? Information and getting-started instructions are here.

IEEE flyer

If you've already joined the competition we want to hear from you! 

Share your pictures on Facebook and Twitter

 Get involved in the community by:

asking for help

offering help

AI-DO 1 at NeurIPS report. Congratulations to our winners!

The winners of AIDO-1 at NeurIPS


There was a great turnout for the first AI Driving Olympics competition, which took place at the NeurIPS conference in Montreal, Canada on Dec 8, 2018. In the finals, the submissions from the top five competitors were run from five different locations on the competition track.

Our top five competitors were awarded $3000 worth of AWS credits (thank you AWS!) and a trip to one of nuTonomy’s offices for a ride in one of their self-driving cars (thanks APTIV!).


WINNER

Team Panasonic R&D Center Singapore & NUS

(Wei Gao)


Check out the submission.

The approach: We used the random template for its flexibility and created a debug framework to test the algorithm. After that, we created one Python package for our algorithm and used the random template to call it directly. The algorithm contains three parts: 1. Perception, 2. Prediction, and 3. Control. Prediction plays the most important role when the robot is at a sharp turn, where the camera cannot observe useful information.
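The three-stage loop described above can be sketched as follows. Every function body here is an illustrative stand-in (the team's actual perception and prediction code is not shown in this post); the point is the control flow, where prediction takes over when perception fails at a sharp turn.

```python
def perceive(observation):
    """Toy perception: return the lateral lane offset if markings are
    visible, else None (stand-in for the real vision pipeline)."""
    return observation.get("offset")

def predict(last_pose):
    """Toy prediction: assume the pose persists (constant model)."""
    return last_pose

def steer(pose, gain=-2.0):
    """Toy proportional controller on the lateral offset (assumed gain)."""
    return gain * pose

def control_step(observation, last_pose):
    pose = perceive(observation)      # 1. Perception
    if pose is None:                  # camera sees no markings (sharp turn)
        pose = predict(last_pose)     # 2. Prediction takes over
    return steer(pose), pose          # 3. Control
```

Running one step with a visible lane uses perception; an empty observation falls back to the predicted pose.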

2nd Place

Jon Plante


Check out the submission.

The approach: “I tried to imitate what a human does when he follows a lane. I believe the human tries to center himself at all times in the lane using the two lines as guides. I think the human implicitly projects the two lines into the horizon, and the point where they intersect is where the human directs the vehicle.”

3rd Place

Vincent Mai


Check out the submission.

The approach: “The AI-DO application I made was using the ROS lane following baseline. After running it out of the box, I noticed a couple of problems and corrected them by changing several parameters in the code.”

 

 


4th Place

Team JetBrains

(Mikita Sazanovich)


Check out the submission.

The approach: “We used our framework for parallel deep reinforcement learning. Our network consisted of five convolutional layers (1st layer with 32 9×9 filters, each following layer with 32 5×5 filters), followed by two fully connected layers (with 768 and 48 neurons) that took as an input four last frames downsampled to 120 by 160 pixels and filtered for white and yellow color. We trained it with Deep Deterministic Policy Gradient algorithm (Lillicrap et al. 2015). The training was done in three stages: first, on a full track, then on the most problematic regions, and then on a full track again.”
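The input preprocessing described here (downsampling to 120×160 and filtering for white and yellow) might look roughly like the sketch below. The thresholds, the use of RGB rather than another colour space, and the binary-mask output are assumptions for illustration, not the team's actual code.

```python
import numpy as np

def preprocess(frame, out_h=120, out_w=160):
    """Downsample an RGB frame and keep only white/yellow lane markings.

    Hypothetical sketch of the preprocessing described by Team JetBrains;
    thresholds and colour space are assumptions.
    """
    h, w, _ = frame.shape
    # Naive nearest-neighbour downsample to out_h x out_w.
    ys = np.arange(out_h) * h // out_h
    xs = np.arange(out_w) * w // out_w
    small = frame[ys][:, xs].astype(np.float32) / 255.0

    r, g, b = small[..., 0], small[..., 1], small[..., 2]
    # White: all channels bright; yellow: red and green bright, blue dark.
    white = (r > 0.7) & (g > 0.7) & (b > 0.7)
    yellow = (r > 0.6) & (g > 0.6) & (b < 0.4)
    return (white | yellow).astype(np.float32)   # (120, 160) binary mask

def stack_frames(frames):
    """Stack the last four preprocessed frames into a (4, 120, 160) input."""
    return np.stack([preprocess(f) for f in frames[-4:]], axis=0)
```

The stacked (4, 120, 160) tensor would then feed the convolutional stack described in the quote.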

5th Place

Team SAIC Moscow

(Anton Mashikhin)


Check out the submission.

The approach: “Our solution is based on a reinforcement learning algorithm. We used Twin Delayed DDPG (TD3) and an Ape-X-like distributed scheme. One of the key insights was to add a PID controller as an additional explorative policy, which significantly improved learning speed and quality.”
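The PID-as-explorer idea can be sketched as follows; the gains, the mixing probability, and the single-steering-action interface are all illustrative assumptions rather than the team's actual code. Instead of purely random exploration noise, the agent sometimes executes a sensible PID action, which seeds the replay buffer with reasonable driving behavior.

```python
import random

def pid_steering(offset, heading, kp=2.0, kd=1.5):
    """Toy proportional-derivative lane controller (illustrative gains)."""
    return -kp * offset - kd * heading

def explore_action(policy_action, offset, heading, p_pid=0.3, rng=random):
    """With probability p_pid, act with the PID controller instead of the
    learned policy -- the 'additional explorative policy' described above."""
    if rng.random() < p_pid:
        return pid_steering(offset, heading)
    return policy_action
```

The same transitions are stored in the replay buffer regardless of which policy produced them, so the learner benefits from the PID's demonstrations for free.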

A few photos from the day

AI-DO1 Submission Deadline: Thursday Dec 6 at 11:59pm PST

We’re just about at the end of the road for the 2018 AI Driving Olympics.

There’s certainly been some action on the leaderboard these last few days and it’s going down to the wire. Don’t miss your chance to see your name up there and win the amazing prizes donated by nuTonomy and Amazon AWS!

Submissions will close at 11:59pm PST on Thursday Dec. 6.

Please join us at NeurIPS for the live competition 3:30-5:00pm EST in room 511!

AI-DO I Interactive Tutorials

The AI Driving Olympics, presented by the Duckietown Foundation with help from our partners and sponsors, is now in full swing. Check out the leaderboard!

We now have templates for ROS, PyTorch, and TensorFlow, as well as an agnostic template.

We also have baseline implementations using the classical pipeline, imitation learning with data from both simulation and real Duckietown logs, and reinforcement learning.

We are excited to announce that we will be hosting a series of interactive tutorials for competitors to get started. These tutorials will be streamed live from our Facebook page.

See here for the full tutorial schedule.

Interactive Learning with Corrective Feedback for Policies based on Deep Neural Networks


Deep Reinforcement Learning (DRL) has become a powerful strategy for solving complex decision-making problems based on Deep Neural Networks (DNNs). However, it is highly data-demanding, which makes it infeasible for most applications on physical systems. In this work, we pursue an alternative Interactive Machine Learning (IML) strategy for training DNN policies based on human corrective feedback, with a method called Deep COACH (D-COACH). This approach not only takes advantage of the knowledge and insights of human teachers as well as the power of DNNs, but also needs no reward function (which sometimes implies the need for external perception to compute rewards). We combine Deep Learning with the COrrective Advice Communicated by Humans (COACH) framework, in which non-expert humans shape policies by correcting the agent’s actions during execution. The D-COACH framework has the potential to solve complex problems without requiring much data or time.

Experimental results validated the efficiency of the framework in three different problems (two simulated, one with a real robot), with state spaces of low and high dimensions, showing the capacity to successfully learn policies for continuous action spaces, as in the Car Racing and Cart-Pole problems, faster than with DRL.

Introduction

Deep Reinforcement Learning (DRL) has obtained unprecedented results in decision-making problems, such as playing Atari games [1] or beating the world champion in Go [2].

Nevertheless, in robotic problems, DRL is still limited in applications with real-world systems [3]. Most of the tasks that have been successfully addressed with DRL have two common characteristics: 1) they have well-specified reward functions, and 2) they require large amounts of trials, which means long training periods (or powerful computers) to obtain a satisfying behavior. These two characteristics can be problematic in cases where 1) the goals of the task are poorly defined or hard to specify/model (no reward function exists), 2) the execution of many trials is not feasible (as with real systems) and/or little computational power or time is available, and 3) additional external perception is necessary for computing the reward/cost function.

On the other hand, Machine Learning methods that rely on the transfer of human knowledge, namely Interactive Machine Learning (IML) methods, have been shown to be time-efficient at obtaining well-performing policies and may not require a well-specified reward function; moreover, some methods do not need expert human teachers to train high-performance agents [4–6]. In previous years, IML techniques were limited to problems with low-dimensional state spaces and to function approximators such as linear models of basis functions (choosing the right basis function set was crucial for successful learning), just as in RL. But, as DRL has shown, by approximating policies with Deep Neural Networks (DNNs) it is possible to solve problems with high-dimensional state spaces without feature engineering for preprocessing the states. If the same approach is used in IML, the DRL shortcomings mentioned above can be addressed with the support of human users who participate in the agent’s learning process.

This work proposes to extend the use of human corrective feedback during task execution to learn policies with low- and high-dimensional state spaces in continuous-action problems (the case for most problems in robotics) using deep neural networks.

We combine Deep Learning (DL) with the corrective-advice-based learning framework COrrective Advice Communicated by Humans (COACH) [6], thus creating the Deep COACH (D-COACH) framework. In this approach, no reward functions are needed and the number of learning episodes is significantly reduced in comparison to alternative approaches. D-COACH is validated on three different tasks: two in simulation and one in the real world.
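The corrective update at the heart of COACH-style learning can be sketched in a few lines. A linear policy stands in for the paper's DNN here, and the error magnitude `e` and learning rate are illustrative assumptions rather than values from the paper; the mechanism is what matters: the teacher's binary correction shifts the executed action, and the policy is trained supervised toward the corrected action.

```python
import numpy as np

class CoachPolicy:
    """Minimal sketch of a COACH-style corrective-feedback learner.

    A linear policy stands in for the DNN; `e` (error magnitude) and
    `lr` (learning rate) are illustrative assumptions.
    """

    def __init__(self, state_dim, e=0.2, lr=0.1):
        self.w = np.zeros(state_dim)
        self.e = e    # how far one correction shifts the action
        self.lr = lr

    def act(self, state):
        return float(self.w @ state)

    def update(self, state, h):
        """One corrective step: h = +1 / -1 from the teacher, 0 = no advice."""
        if h == 0:
            return
        a = self.act(state)
        target = a + self.e * h                  # teacher-corrected action
        # Supervised step toward the target: gradient of 0.5 * (a - target)^2
        self.w -= self.lr * (a - target) * state
```

Each `+1` nudges the policy's output upward in that state, each `-1` downward, with no reward function anywhere in the loop.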

Conclusions

This work presented D-COACH, an algorithm for interactively training policies modeled with DNNs using corrective advice. The method was validated on a low-dimensional problem along with problems with high-dimensional state spaces such as raw pixel observations, in both simulated and real robot environments, and with both simulated and real human teachers.

The use of the experience replay buffer (well tested in DRL) was re-validated for this different kind of learning approach, since this feature is not included in the original COACH. The comparisons showed that the use of memory resulted in an important boost in the learning speed of the agents, which were able to converge with less feedback and to perform better even in cases with a significant amount of erroneous signals.
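A minimal version of such a replay buffer simply stores state/corrected-action pairs and replays random mini-batches; the capacity and sampling scheme below are illustrative, not the paper's settings.

```python
import random
from collections import deque

class ReplayBuffer:
    """Sketch of an experience replay buffer for corrective feedback.

    Stores (state, corrected_action) pairs; the oldest entries are
    evicted automatically once `capacity` is reached.
    """

    def __init__(self, capacity=1000):
        self.buf = deque(maxlen=capacity)

    def add(self, state, corrected_action):
        self.buf.append((state, corrected_action))

    def sample(self, batch_size):
        # Sample without replacement, capped at the current buffer size.
        return random.sample(list(self.buf), min(batch_size, len(self.buf)))

    def __len__(self):
        return len(self.buf)
```

Replaying these stored corrections between feedback events is what lets the agent keep improving even while the teacher stays silent.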

The results of the experiments show that teachers advising corrections can train policies in fewer time steps than a DRL method like DDPG. It was thus possible to train real robot tasks based on human corrections during task execution, in an environment with a raw-pixel-level state space. The comparison of D-COACH with DDPG shows how this interactive method makes it more feasible to learn policies represented with DNNs within the constraints of physical systems: DDPG needs to accumulate millions of time steps of experience in order to obtain a satisfying behavior.

Did you find this interesting?

Read more Duckietown based papers here.