There was a great turnout for the first AI Driving Olympics competition, which took place at the NeurIPS conference in Montreal, Canada on Dec 8, 2018. In the finals, the submissions from the top five competitors were run from five different locations on the competition track.
Our top five competitors were each awarded $3000 worth of AWS credits (thank you, AWS!) and a trip to one of nuTonomy’s offices for a ride in one of their self-driving cars (thanks, APTIV!).
Team Panasonic R&D Center Singapore & NUS
The approach: We used the random template for its flexibility and built a debug framework to test the algorithm. We then packaged our algorithm as a single Python package and called it directly from the random template. The algorithm consists of three parts: 1. Perception, 2. Prediction, and 3. Control. Prediction plays the most important role at sharp turns, where the camera cannot observe useful information.
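The perception/prediction/control split above can be sketched as a simple loop in which prediction takes over whenever perception fails, as it does at sharp turns. This is a minimal illustration with hypothetical function names and gains, not the team's actual code:

```python
def perceive(image):
    """Return (lateral_offset, heading_error), or None if no lane is visible.

    A real implementation would run line detection on the camera image;
    this stub simply passes through whatever the caller supplies.
    """
    return image

def predict(last_pose, last_command, dt=0.1):
    """Dead-reckon the lane pose forward using the last steering command."""
    offset, heading = last_pose
    return (offset + heading * dt, heading + last_command * dt)

def control(pose, k_offset=-2.0, k_heading=-1.0):
    """Proportional controller on lateral offset and heading error."""
    offset, heading = pose
    return k_offset * offset + k_heading * heading

def step(image, last_pose, last_command):
    """One control step: perceive if possible, otherwise predict."""
    pose = perceive(image)
    if pose is None:  # camera blind (e.g. sharp turn): fall back to prediction
        pose = predict(last_pose, last_command)
    return pose, control(pose)
```

When the camera returns nothing useful, the robot keeps driving on the propagated estimate instead of freezing, which is why prediction matters most exactly where perception is weakest.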
The approach: “I tried to imitate what a human does when following a lane. I believe a human tries to stay centered in the lane at all times, using the two lane lines as guides. I think the human implicitly projects the two lines to the horizon, and the point where they intersect is where the human directs the vehicle.”
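The idea of projecting both lane lines to their intersection and steering toward it can be sketched with a little image-space geometry. Line parameters, image width, and the gain below are illustrative assumptions, not the competitor's values:

```python
def intersect(line1, line2):
    """Intersect two image-space lines given as (slope, intercept)."""
    m1, b1 = line1
    m2, b2 = line2
    x = (b2 - b1) / (m1 - m2)  # assumes the lines are not parallel
    return x, m1 * x + b1

def steer_toward_vanishing_point(left_line, right_line,
                                 image_width=640, gain=0.01):
    """Steer proportionally to how far the lane lines' intersection
    (the approximate vanishing point) sits from the image centre."""
    vx, _ = intersect(left_line, right_line)
    return gain * (image_width / 2 - vx)
```

When the vehicle is centered and aligned, the two lines meet at the middle of the image and the steering command is zero; drifting to one side shifts the intersection and produces a corrective command.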
The approach: “The AI-DO application I made used the ROS lane-following baseline. After running it out of the box, I noticed a couple of problems and corrected them by changing several parameters in the code.”
The approach: “We used our framework for parallel deep reinforcement learning. Our network consisted of five convolutional layers (the first layer with 32 9×9 filters, each following layer with 32 5×5 filters), followed by two fully connected layers (with 768 and 48 neurons). It took as input the last four frames, downsampled to 120 by 160 pixels and filtered for white and yellow color. We trained it with the Deep Deterministic Policy Gradient algorithm (Lillicrap et al., 2015). The training was done in three stages: first on the full track, then on the most problematic regions, and then on the full track again.”
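The input pipeline described above (downsample to 120×160, keep white and yellow pixels, stack the last four frames) can be sketched in NumPy. The color thresholds and the stride-based downsampling here are illustrative assumptions, not the team's actual preprocessing:

```python
import numpy as np

def preprocess(frame):
    """Downsample an RGB frame to 120x160 and keep only white/yellow
    pixels as a binary mask. Thresholds are illustrative."""
    h, w, _ = frame.shape
    small = frame[:: h // 120, :: w // 160][:120, :160]  # naive stride downsample
    r, g, b = small[..., 0], small[..., 1], small[..., 2]
    white = (r > 180) & (g > 180) & (b > 180)
    yellow = (r > 150) & (g > 150) & (b < 100)
    return (white | yellow).astype(np.float32)

def stack_frames(history):
    """Stack the last four preprocessed frames into one network input."""
    return np.stack(history[-4:], axis=0)  # shape (4, 120, 160)
```

Masking out everything except the white and yellow lane markings throws away lighting and texture variation, which shrinks the state space the policy has to learn from.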
The approach: Our solution is based on a reinforcement learning algorithm. We used Twin Delayed DDPG (TD3) with an Ape-X-like distributed scheme. One of the key insights was to add a PID controller as an additional explorative policy, which significantly improved learning speed and quality.