Embedded out-of-distribution detection on an autonomous robot platform

Introduction

Machine learning is becoming more and more common in cyber-physical systems, many of which are safety critical, e.g. autonomous vehicles, UAVs, and surgical robots.  However, machine learning systems can only provide accurate outputs when their input data is similar to their training data.  For example, if an object detector in an autonomous vehicle is trained on images containing various classes of objects, but no ducks, what will it do when it encounters a duck at runtime?  One way to address this challenge is to detect inputs that lie outside the training distribution: out-of-distribution (OOD) detection.  Many OOD detector architectures have been explored; however, the cyber-physical domain adds its own challenges: hard real-time requirements and resource-constrained hardware.  In this paper, we implement a real-time OOD detector on the Duckietown framework and use it to demonstrate both the challenges and the importance of OOD detection in cyber-physical systems.

Out-of-Distribution Detection

Machine learning systems perform best when their test data is similar to their training data.  In some applications unreliable results from a machine learning algorithm may be a mere nuisance, but in other scenarios they can be safety critical.  OOD detection is one method to ensure that machine learning systems remain safe during test time.  The goal of the OOD detector is to determine if the input sample is from a different distribution than that of the training data.  If an OOD sample is detected, the detector can raise a flag indicating that the output of the machine learning system should not be considered safe, and that the system should enter a new control regime.  In an autonomous vehicle, this may mean handing control back to the driver, or bringing the vehicle to a stop as soon as practically possible.

In this paper we consider an existing β-VAE based OOD detection architecture.  This architecture takes advantage of the information bottleneck in a variational auto-encoder (VAE) to learn the distribution of the training data.  The VAE undergoes unsupervised training with the goal of minimizing the difference between a prior distribution over the latent space, p(z), and the approximate posterior produced by the encoder, q(z|x).  At test time, the Kullback-Leibler (KL) divergence between p(z) and q(z|x) is used to assign an OOD score to each input sample.  Because training minimized the divergence between these two distributions on in-distribution data, in-distribution samples encountered at runtime should receive a low OOD score, while OOD samples should receive a higher one.
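As a concrete illustration, here is a minimal sketch of how such a score could be computed from the encoder output, assuming a diagonal-Gaussian posterior and a standard normal prior (the function and variable names are ours for illustration, not taken from the detector's actual implementation):

```python
import numpy as np

def ood_score(mu, log_var):
    """KL divergence between the encoder posterior N(mu, diag(sigma^2)) and the
    standard normal prior N(0, I), summed over latent dimensions.  Closed form
    for diagonal Gaussians: 0.5 * sum(sigma^2 + mu^2 - 1 - log(sigma^2))."""
    return 0.5 * np.sum(np.exp(log_var) + mu ** 2 - 1.0 - log_var)

# An encoder output close to the prior scores near zero (in-distribution);
# one far from the prior scores much higher (out-of-distribution).
print(ood_score(np.zeros(16), np.zeros(16)))        # ~0.0
print(ood_score(3.0 * np.ones(16), np.ones(16)))    # large
```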

Duckietown

We used Duckietown to implement our OOD detector.  Duckietown provides a natural test bed because:

  • It is modular and easy to learn: the focus of our research is on implementing an OOD detector, not building a robot from scratch
  • It is a resource constrained system: the RPi on the DB18 is powerful enough to handle navigation tasks, but constrained enough that real-time performance is not guaranteed.  It serves as a good analog for a system in which an OOD detector shares a CPU with perception, planning, and control software.
  • It is open source: this eliminates the need to purchase and manage licenses, allows us to directly check the source code when we encounter implementation issues, and allows us to contribute back to the community once our project is finished.
  • It is low-cost: we’re not made of money 🙂

In our experiment, we used the stock DB18 robot.  Because we took advantage of the existing Duckietown framework, we only had to write three ROS nodes ourselves:
  • Lane following node: a simple OpenCV-based lane follower that navigates based on camera images.  This represents the perception and planning system for the mobile robot that we are trying to protect.  In our system the lane following node takes 640×480 RGB images and updates the planned trajectory at a rate of 5Hz.
  • OOD detection node: this node also takes images directly from the camera, but its job is to raise a flag when an OOD input appears (an image with an OOD score greater than some threshold).  On the RPi, with no GPU or TPU, VAE inference takes a considerable amount of time, so the detection node does not have a target rate; instead it always uses the most recent camera frame, dropping any frames that arrive while an OOD score is being computed (a sketch of this pattern appears after this list).
  • Motor control node: during normal operation it takes the trajectory planned by the lane following node and sends it to the wheels.  However, if it receives a signal from the OOD detection node, it begins emergency braking.
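The frame-dropping behaviour of the OOD detection node is the part most worth spelling out.  Below is a minimal, hypothetical sketch of that pattern in rospy; the topic names, message types, threshold, and node structure are placeholders rather than our actual implementation:

```python
#!/usr/bin/env python
# Hypothetical sketch: keep only the latest camera frame and score it whenever
# the detector is free, so frames that arrive mid-inference are simply dropped.
import threading
import rospy
from sensor_msgs.msg import CompressedImage
from std_msgs.msg import Bool

class OODDetectionNode(object):
    def __init__(self, threshold):
        self.threshold = threshold
        self.latest_frame = None
        self.lock = threading.Lock()
        self.flag_pub = rospy.Publisher("~ood_alert", Bool, queue_size=1)
        # queue_size=1: ROS itself discards frames we are too slow to read
        rospy.Subscriber("~image/compressed", CompressedImage,
                         self.on_image, queue_size=1, buff_size=2 ** 22)

    def on_image(self, msg):
        with self.lock:
            self.latest_frame = msg        # overwrite: older frames are dropped

    def spin(self):
        while not rospy.is_shutdown():
            with self.lock:
                frame, self.latest_frame = self.latest_frame, None
            if frame is None:
                rospy.sleep(0.01)          # nothing new yet
                continue
            score = self.compute_ood_score(frame)   # slow VAE inference on the RPi
            if score > self.threshold:
                self.flag_pub.publish(Bool(data=True))

    def compute_ood_score(self, frame):
        raise NotImplementedError("run the beta-VAE encoder and return the KL score")

if __name__ == "__main__":
    rospy.init_node("ood_detection_node")
    OODDetectionNode(threshold=rospy.get_param("~threshold", 50.0)).spin()
```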

The Experiment

Our experiment considers the emergency stopping distance required by the Duckiebot when an OOD input is detected.  In our setup the Duckiebot drives forward along a straight track.  The area in front of the robot is divided into two zones: the risk zone and the safe zone.  The risk zone is the area where an obstacle, if present, poses a risk to the Duckiebot.  The safe zone is further away and to the sides; unknown obstacles may be present there, but they do not pose an immediate threat to the robot.  An obstacle that did not appear in the training set is placed in the safe zone in front of the robot.  As the robot drives forward along the track, the obstacle eventually enters the risk zone.  Upon its entry into the risk zone, we measure how far the Duckiebot travels before the OOD detector triggers an emergency stop.

We defined the risk zone as the area 60cm directly in front of our Duckiebot.  We repeated the experiment 40 times and found that with our system architecture, the Duckiebot stopped on average 14.5cm before the obstacle.  However, in 5 iterations of the experiment, the Duckiebot collided with the stationary obstacle.

We wanted to analyze what led to the collisions in those five cases.  We started by looking at the times taken by our various nodes.  We plotted the distributions of end-to-end stopping times, image-capture-to-detection-start times, OOD detector execution times, and detection-result-to-motor-stop times.  We observed a long tail on the OOD detector execution times, which led us to suspect that the collisions occurred when the OOD detector took too long to produce a result.  This hypothesis was bolstered by the fact that even when a collision had occurred, the last logged OOD score was above the detection threshold; it had simply been produced too late.  We also looked at the final two OOD detection times for each collision and found that in every case both were above the median detector execution time.  This highlights the importance of real-time scheduling when performing OOD detection in a cyber-physical system.
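For readers who want to reproduce this kind of analysis, here is a small sketch of the latency breakdown described above; the timestamp arrays and their names are illustrative, not our actual logs:

```python
import numpy as np

def latency_breakdown(t_capture, t_det_start, t_det_end, t_stop):
    """Split end-to-end stopping latency into its three stages and report the
    median and worst case of each, to expose a long tail in detector execution.
    Inputs are per-event timestamps (seconds) for image capture, detector start,
    detector finish, and motor stop."""
    stages = {
        "capture -> detector start": np.asarray(t_det_start) - np.asarray(t_capture),
        "detector execution":        np.asarray(t_det_end) - np.asarray(t_det_start),
        "detection -> motor stop":   np.asarray(t_stop) - np.asarray(t_det_end),
    }
    return {name: (np.median(d), np.max(d)) for name, d in stages.items()}
```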

We also wanted to analyze what would happen if we adjusted the OOD detection threshold.  Because we had logged the OOD score every time the detector ran, we were able to interpolate the position of the robot at each detection time and determine where the robot would have stopped for different OOD detection thresholds.  We observe that there is a tradeoff associated with moving the detection threshold.  If the threshold is lowered, the frequency of collisions can be reduced and even eliminated.  However, the mean stopping distance also moves further from the obstacle, and the robot is more likely to stop spuriously while the obstacle is still outside the risk zone.
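A sketch of this threshold-sweep analysis, under the assumption that each run logs detection timestamps with their OOD scores alongside a separately timestamped position trace (names are illustrative, not our actual code):

```python
import numpy as np

def stop_positions(det_times, det_scores, pos_times, positions, thresholds):
    """For each candidate threshold, estimate where the robot would have
    stopped: the interpolated robot position at the first detection whose
    OOD score exceeds that threshold (NaN if it is never exceeded)."""
    det_pos = np.interp(det_times, pos_times, positions)  # position at each detection time
    stops = []
    for threshold in thresholds:
        hits = np.nonzero(np.asarray(det_scores) > threshold)[0]
        stops.append(det_pos[hits[0]] if hits.size else np.nan)
    return np.array(stops)
```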

Next Steps

In this paper we successfully implemented an OOD detector on a mobile robot, but our experiment leaves many open questions:

  • How does the performance of other OOD detector architectures compare with the β-VAE detector we used in this paper?
  • How can we guarantee the real-time performance of an OOD detector on a resource-constrained system, especially when sharing a CPU with other computationally intensive tasks like perception, planning, and control?
  • Does the performance vary when detecting more complex OOD scenarios: dynamic obstacles, turning corners, etc.?

Did you find this interesting?

Read more Duckietown-based papers here.