Monocular Visual Odometry

Monocular Visual Odometry for Duckiebot Navigation

Project Resources

Why Monocular Visual Odometry?

Monocular Visual Odometry (VO) falls under the “perception” block of the traditional robot autonomy architecture. 

Perception in robot autonomy involves transforming sensor data into actionable information to accomplish a given task in the environment.

Perception is crucial because it allows robots to build a representation of themselves within the environment they operate in, which in turn enables them to navigate and avoid static or dynamic obstacles, forming the foundation for effective autonomy.

The function of monocular visual odometry is to estimate the robot’s pose over time by analyzing the sequence of images captured by a single camera. 

VO in this project is implemented through the following steps:

  1. Image acquisition: the node receives images from the camera, which serve as the primary source of data for motion estimation.

  2. Feature extraction: key features (points of interest) are extracted from the images using methods like ORB, SURF, or SIFT, which highlight salient details in the scene.

  3. Feature matching: the extracted features from consecutive images are matched, identifying how certain points have moved from one image to the next.

  4. Outlier filtering: erroneous or mismatched features are filtered out, improving the accuracy of the feature matches. In this project, histogram fitting is used to discard outliers (see the first sketch after this list).

  5. Rotation estimation: the filtered feature matches are used to estimate the rotation of the Duckiebot, determining how the orientation has changed.

  6. Translation estimation: simultaneously, the node estimates the translation, i.e., how much the Duckiebot has moved in space.

  7. Camera information and kinematics inputs: additional information from the camera (e.g., intrinsic parameters) and kinematic data (e.g., velocity) help refine the translation and rotation estimates (sketched further below).

  8. Path and odometry outputs: the final estimated motion is used to update the Duckiebot’s odometry (evolution of pose estimate over time) and the path it follows within the environment.
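
To make steps 2-4 concrete, the following is a minimal sketch of feature extraction, matching, and histogram-based outlier rejection using OpenCV. It assumes grayscale frames from the Duckiebot camera; the ORB settings, the angle-histogram interpretation of "histogram fitting", and the helper names are illustrative choices, not the authors' exact implementation.

```python
import cv2
import numpy as np

def extract_and_match(prev_gray, cur_gray, nfeatures=500):
    """Steps 2-3: ORB feature extraction and brute-force matching."""
    orb = cv2.ORB_create(nfeatures=nfeatures)
    kp1, des1 = orb.detectAndCompute(prev_gray, None)
    kp2, des2 = orb.detectAndCompute(cur_gray, None)
    if des1 is None or des2 is None:
        return np.empty((0, 2)), np.empty((0, 2))

    # Hamming norm suits ORB's binary descriptors; cross-checking keeps
    # only matches that are mutual nearest neighbours.
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    matches = matcher.match(des1, des2)

    pts_prev = np.float32([kp1[m.queryIdx].pt for m in matches])
    pts_cur = np.float32([kp2[m.trainIdx].pt for m in matches])
    return pts_prev, pts_cur

def histogram_filter(pts_prev, pts_cur, n_bins=36):
    """Step 4 (one plausible reading of 'histogram fitting'): keep only
    matches whose displacement direction falls in the dominant bin of an
    angle histogram; mismatches tend to point in arbitrary directions."""
    flow = pts_cur - pts_prev
    angles = np.arctan2(flow[:, 1], flow[:, 0])
    hist, edges = np.histogram(angles, bins=n_bins, range=(-np.pi, np.pi))
    dominant = np.argmax(hist)
    keep = (angles >= edges[dominant]) & (angles < edges[dominant + 1])
    return pts_prev[keep], pts_cur[keep]
```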

Monocular visual odometry is challenging, but it provides a low-cost, camera-based solution for real-time motion estimation in dynamic environments.
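
Steps 5-7 can be sketched with OpenCV's epipolar-geometry routines: an essential matrix estimated from the filtered matches yields the rotation and a unit-norm translation direction, while the metric scale must come from elsewhere. Scaling that direction by the commanded linear velocity and the inter-frame time is one plausible way to use the kinematic input from step 7; the camera matrix K, the scale handling, and the pose-accumulation convention shown here are assumptions, not the project's exact code.

```python
import cv2
import numpy as np

def estimate_motion(pts_prev, pts_cur, K, v_linear, dt):
    """Steps 5-7: estimate relative rotation and (scaled) translation."""
    # Essential matrix with RANSAC provides additional outlier rejection.
    E, mask = cv2.findEssentialMat(pts_prev, pts_cur, K,
                                   method=cv2.RANSAC, prob=0.999, threshold=1.0)
    if E is None:
        return np.eye(3), np.zeros((3, 1))

    # recoverPose returns the rotation and a unit-norm translation direction.
    _, R, t, _ = cv2.recoverPose(E, pts_prev, pts_cur, K, mask=mask)

    # Monocular VO cannot observe metric scale; here we assume the commanded
    # linear velocity (from the kinematics node) times the inter-frame
    # interval approximates the distance travelled between frames.
    scale = v_linear * dt
    return R, scale * t

# Accumulating per-frame motion into a global pose (input to step 8),
# starting from cur_R = identity, cur_t = zero and updating every frame:
#   cur_t = cur_t + cur_R @ t_rel
#   cur_R = R_rel @ cur_R
```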

Monocular Visual Odometry: the challenges

Implementing monocular visual odometry involves processing images at runtime, which presents several challenges that affect performance.
  • Extracting and matching visual features from consecutive images is a fundamental task in monocular VO. This process can be hindered by factors such as low-texture areas, motion blur, variations in lighting conditions, and occlusions.
  • Monocular VO systems face inherent scale ambiguity since a single camera cannot directly measure depth. The system must infer scale from visual features, which can be error-prone and less accurate in the absence of depth cues.
  • Running VO algorithms requires significant computational resources, particularly when processing high-resolution images at a high frequency. The Raspberry Pi used in the Duckiebot has limited processing power and memory, which constrains the performance of the visual odometry pipeline (newer Duckiebots, such as the DB21J, use an NVIDIA Jetson Nano for computation).
  • Monocular VO systems, like all odometry systems relying on dead-reckoning models, are susceptible to long-term drift and divergence due to cumulative errors in feature tracking and pose estimation.
This project addresses these challenges by implementing robust feature extraction and matching algorithms (ORB by default) and optimizing parameters to handle dynamic environments and computational constraints. Moreover, it integrates visual odometry with the existing Duckiebot autonomy pipeline, leveraging the finite state machine for accurate pose estimation and navigation.
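
For step 8 and the integration with the rest of the Duckiebot stack, the node ultimately publishes the accumulated pose as odometry and as a path. The ROS skeleton below shows one way this could be wired up; the topic names and the use of Twist2DStamped for the kinematic velocity are assumptions made for illustration, not necessarily the repository's actual interface.

```python
#!/usr/bin/env python
import rospy
from geometry_msgs.msg import PoseStamped
from nav_msgs.msg import Odometry, Path
from sensor_msgs.msg import CameraInfo, CompressedImage
from duckietown_msgs.msg import Twist2DStamped  # assumed source of velocity


class VisualOdometryNode:
    def __init__(self):
        self.path = Path()
        self.K = None
        self.last_velocity = 0.0

        # Inputs: camera frames, intrinsics, and kinematic velocity
        # (topic names are hypothetical).
        rospy.Subscriber("~image/compressed", CompressedImage, self.cb_image)
        rospy.Subscriber("~camera_info", CameraInfo, self.cb_camera_info)
        rospy.Subscriber("~car_cmd", Twist2DStamped, self.cb_velocity)

        # Outputs: odometry (pose estimate over time) and the followed path.
        self.pub_odom = rospy.Publisher("~odometry", Odometry, queue_size=1)
        self.pub_path = rospy.Publisher("~path", Path, queue_size=1)

    def cb_camera_info(self, msg):
        self.K = msg.K  # intrinsic parameters used to refine the estimate

    def cb_velocity(self, msg):
        self.last_velocity = msg.v

    def cb_image(self, msg):
        # The feature pipeline and motion estimation sketched above would
        # run here, producing an updated pose estimate.
        pose = PoseStamped()
        pose.header = msg.header

        odom = Odometry()
        odom.header = msg.header
        odom.pose.pose = pose.pose
        self.pub_odom.publish(odom)

        self.path.header = msg.header
        self.path.poses.append(pose)
        self.pub_path.publish(self.path)


if __name__ == "__main__":
    rospy.init_node("visual_odometry_node")
    VisualOdometryNode()
    rospy.spin()
```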

Project Highlights

Here is the output of the authors’ work. Check out the GitHub repository for more details!

Monocular Visual Odometry: Results

Monocular Visual Odometry: Authors

Gianmarco Bernasconi is a former Duckietown student of the Autonomous Mobility on Demand class at ETH Zurich, and currently works as a Senior Research Engineer at Motional, Singapore.

Tomasz Firynowicz is a former Duckietown student and teaching assistant of the Autonomous Mobility on Demand class at ETH Zurich, and currently works as a Software Engineer at Dentsply Sirona, Switzerland. Tomasz was a mentor on this project.

Guillem Torrente Martí is a former Duckietown student and teaching assistant of the Autonomous Mobility on Demand class at ETH Zurich, and currently works as a Robotics Engineer at SonyAI, Japan. Guillem was a mentor on this project.

Yang Liu is a former Duckietown student and teaching assistant of the Autonomous Mobility on Demand class at ETH Zurich, and is currently a Doctoral Student at EPFL, Switzerland. Yang was a mentor on this project.

Learn more

Duckietown is a modular, customizable and state-of-the-art platform for creating and disseminating robotics and AI learning experiences.

It is designed to teach, learn, and do research: from exploring the fundamentals of computer science and automation to pushing the boundaries of knowledge.