Robotics and Automation Engineering Journal

A Practical Map Needs Direct Visual Odometry

**Zonghai Chen*, Jikai Wang and Zhenhua Ge**

Department of Automation, University of Science and Technology of China, China

Submission: July 11, 2017; Published: September 06, 2017

*Corresponding author: Zonghai Chen, Knowledge Representation and Intelligent Information Technology Laboratory, Department of Automation, University of Science and Technology of China, China, Tel: 0551-63601514; Email: chenzh@ustc.edu.cn

How to cite this article: Zonghai C, Jikai W, Zhenhua G. A Practical Map Needs Direct Visual Odometry. Robot Autom Eng J. 2017; 1(1): 555555. DOI: 10.19080/RAEJ.2017.01.555555

Abstract

Visual odometry (VO) is the process of sequentially estimating the ego-motion of an agent (e.g., a vehicle, human, or robot) using only the input of one or more cameras attached to it. VO plays an important role in Simultaneous Localization and Mapping (SLAM), which has been widely applied in many fields. Progress in VO would greatly promote the transfer of visual mapping and navigation to industry. This paper focuses on direct methods and summarizes recent direct algorithms from a technical perspective. Image representation and the frame matching strategy are the two key components of VO, and the paper is organized around these two aspects. A qualitative analysis of existing work is then presented. The paper serves as both a position paper and a tutorial.

Keywords: Direct visual odometry; SLAM; Image representation

Abbreviations: VO: Visual odometry; SLAM: Simultaneous Localization and Mapping

Introduction

As fundamental building blocks for many emerging technologies, from autonomous cars and UAVs to virtual and augmented reality, real-time methods for SLAM and VO have made significant progress. For a long time, the field was dominated by feature-based methods, which are sensitive to environment texture [1]. In addition, the sparse maps constructed by such methods are insufficient for further applications such as navigation. In recent years, a number of direct formulations have become popular. Because these methods operate directly on raw pixel intensities, some researchers [2] believe that their main limitation is their reliance on consistent appearance between matched pixels, which is seldom satisfied in robotic applications. Handling challenging illumination conditions is a long-standing problem for direct VO, and some methods focus on improving robustness to illumination.

[1] combined binary feature descriptors with a direct tracking framework to obtain robust performance, which enabled visual odometry to be applied in low-textured and poorly lit environments. [3,4] modeled illumination changes with the gain+bias model. Such methods are limited by definition and fail in applications where non-global and nonlinear intensity deformations are common [1]. More sophisticated techniques have been presented. [5] proposed a direct visual tracking approach in which robustness to lighting changes is ensured by using a model of illumination changes together with an appropriate geometric model of image motion. [6] found that line segments are abundant and less sensitive to lighting changes, and accordingly proposed a method for computing camera motion based on 3D line segments.
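The gain+bias model mentioned above can be illustrated with a minimal sketch. It assumes that pixel correspondences are already known and fits a single global gain and bias by least squares; in [3,4] these parameters are estimated together with the motion, so the function names and the toy data here are illustrative assumptions only. The second example shows why a global model cannot absorb a local change such as a shadow.

```python
def fit_gain_bias(ref, cur):
    """Least-squares fit of cur ~ gain * ref + bias over matched intensities."""
    n = len(ref)
    mean_ref = sum(ref) / n
    mean_cur = sum(cur) / n
    var_ref = sum((r - mean_ref) ** 2 for r in ref)
    cov = sum((r - mean_ref) * (c - mean_cur) for r, c in zip(ref, cur))
    gain = cov / var_ref if var_ref > 0 else 1.0  # flat patch: fall back to gain 1
    bias = mean_cur - gain * mean_ref
    return gain, bias


def photometric_residuals(ref, cur, gain=1.0, bias=0.0):
    """Residuals after compensating the reference with the gain+bias model."""
    return [c - (gain * r + bias) for r, c in zip(ref, cur)]


# Hypothetical matched intensities (not data from any cited work).
ref = [10.0, 50.0, 90.0, 130.0]

# Global change: every pixel is scaled and offset in the same way.
global_change = [1.2 * r + 5.0 for r in ref]
gain, bias = fit_gain_bias(ref, global_change)
global_residuals = photometric_residuals(ref, global_change, gain, bias)

# Local change: only the last two pixels are darkened (e.g. by a shadow).
local_change = [10.0, 50.0, 45.0, 65.0]
gain_l, bias_l = fit_gain_bias(ref, local_change)
local_residuals = photometric_residuals(ref, local_change, gain_l, bias_l)
```

In the first case the residuals vanish up to rounding; in the second they remain large, which is the failure mode for non-global intensity deformations noted in [1].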

It is believed that these methods either impose stringent scene constraints or rely heavily on dense depth estimates, which are not always available. Other methods focus on improving robustness and accuracy by modifying the estimation framework. In [7], a sensor model based on the t-distribution and a motion model based on a constant-velocity assumption were incorporated into the minimization problem to guide and stabilize motion estimation in dynamic environments. [8] extended this work to semi-dense monocular visual odometry, adding depth map estimation for selected pixels. Direct and feature-based methods have also been combined to improve efficiency and robustness simultaneously.
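To make the role of a t-distribution sensor model more concrete, the sketch below computes per-residual weights of the form (ν + 1) / (ν + r²/σ²), the usual weight obtained from a Student t-distribution with ν degrees of freedom. The value ν = 5, the fixed-point update of the scale σ², and the toy residuals are assumptions of this sketch and are not taken from [7].

```python
def t_distribution_weights(residuals, dof=5.0, iters=20, eps=1e-12):
    """Return (weights, sigma2) for residuals under a Student t model.

    sigma2 is re-estimated by a fixed-point iteration so that large
    residuals (outliers) contribute less to the scale estimate.
    """
    n = len(residuals)
    sigma2 = max(sum(r * r for r in residuals) / n, eps)
    for _ in range(iters):
        sigma2 = sum(r * r * (dof + 1.0) / (dof + r * r / sigma2)
                     for r in residuals) / n
        sigma2 = max(sigma2, eps)  # guard against all-zero residuals
    weights = [(dof + 1.0) / (dof + r * r / sigma2) for r in residuals]
    return weights, sigma2


# Hypothetical photometric residuals: four inliers and one outlier,
# e.g. a pixel on a moving object.
residuals = [0.1, -0.2, 0.15, -0.1, 8.0]
weights, sigma2 = t_distribution_weights(residuals)
```

In a weighted least-squares motion estimate, each pixel's contribution would be multiplied by its weight, so the outlier has much less influence than the inliers.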

SVO [9] combined direct alignment with feature points and can be used with high-frame-rate cameras. Edges are important evidence, especially in man-made environments. Line-based VO without bundle adjustment has recently been used with stereo and RGB-D cameras [10-13]. Kuse et al. [13] minimized the geometric error of each pixel to its nearby edge pixels using a distance transform. Gradient-based optimization methods are commonly applied to compute the relative motion and can be classified into three categories: the forward compositional approach, the inverse compositional formulation, and the efficient second-order minimization approach [3]. These optimization methods differ in search efficiency.
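As an illustration of why the inverse compositional formulation is efficient, the following minimal sketch aligns two 1D signals under a pure translation. The template gradient and the (scalar) Hessian are computed once and reused in every iteration, and the warp is updated with the inverse of the increment. The 1D setting, the Gaussian test signal, and all function names are simplifying assumptions; a real VO system would warp pixels with a 6-DoF pose and depth.

```python
import math


def make_signal(center, n=64, sigma=5.0):
    """Sample a Gaussian bump; a stand-in for one image row."""
    return [math.exp(-((i - center) ** 2) / (2.0 * sigma ** 2)) for i in range(n)]


def sample(signal, x):
    """Linearly interpolate the signal at position x (clamped to the border)."""
    x = min(max(x, 0.0), len(signal) - 1.0)
    i = min(int(x), len(signal) - 2)
    t = x - i
    return (1.0 - t) * signal[i] + t * signal[i + 1]


def inverse_compositional_shift(template, image, iters=10, margin=8):
    """Estimate p such that image(x + p) matches template(x)."""
    idx = range(margin, len(template) - margin)
    # Precomputed once: central-difference gradient of the template.
    grad = {i: (template[i + 1] - template[i - 1]) / 2.0 for i in idx}
    hessian = sum(g * g for g in grad.values())
    if hessian == 0.0:
        raise ValueError("template has no gradient; alignment is ill-posed")
    p = 0.0
    for _ in range(iters):
        # Residual between the warped image and the template.
        b = sum(grad[i] * (sample(image, i + p) - template[i]) for i in idx)
        dp = b / hessian
        p -= dp  # inverse update: for translation, compose with -dp
        if abs(dp) < 1e-6:
            break
    return p


template = make_signal(30.0)
image = make_signal(32.5)  # same bump, shifted by 2.5 samples
estimated_shift = inverse_compositional_shift(template, image)
```

In a forward formulation the image gradient would have to be re-evaluated at the warped positions in every iteration; here only the residual is recomputed.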

Discussion

The performance of direct methods varies across datasets collected under different environmental conditions, so a comprehensive comparison under a unified quantitative standard is impractical. From a qualitative perspective, combining features with direct frameworks gives VO higher robustness to illumination changes and fast motion, but lower precision than standard direct methods. In addition, the inherent drawbacks of these features cannot be completely avoided. Integrating re-projection errors and sensor errors into the objective function extends the application scenarios of VO to low-dynamic environments. Selecting only significant pixels, rather than using all pixels, can greatly improve the efficiency of VO.
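The idea of selecting significant pixels can be sketched as follows. Here "significant" is assumed to mean a gradient magnitude above a fixed threshold, which is one common criterion; the threshold value and the toy image are illustrative assumptions rather than settings from any of the cited methods.

```python
def select_significant_pixels(image, threshold):
    """Return (x, y) coordinates whose gradient magnitude exceeds threshold.

    image is a list of rows of intensities; border pixels are skipped
    because central differences are not defined there.
    """
    height, width = len(image), len(image[0])
    selected = []
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            gx = (image[y][x + 1] - image[y][x - 1]) / 2.0
            gy = (image[y + 1][x] - image[y - 1][x]) / 2.0
            if (gx * gx + gy * gy) ** 0.5 > threshold:
                selected.append((x, y))
    return selected


# Hypothetical 6x6 image with a vertical step edge between columns 2 and 3.
image = [[0.0, 0.0, 0.0, 100.0, 100.0, 100.0] for _ in range(6)]
pixels = select_significant_pixels(image, threshold=10.0)
```

Only the 8 interior pixels next to the edge are kept out of 36, so the photometric error would be evaluated on a small fraction of the image.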

Conclusion

Because individual pixels have low discriminative power, the photometric error is a highly non-convex function, and its minimization therefore requires a good initial guess. A prior motion model and a coarse-to-fine strategy are the two main solutions. Fundamentally, to improve robustness and precision, the whole VO framework should be designed to ensure that most pixel pairs are matched correctly under uncertain conditions. Direct methods, which exploit much more image information and create dense or semi-dense maps, remain attractive and promising as visual SLAM develops.
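The coarse-to-fine strategy mentioned above can be illustrated with a minimal 1D sketch. To keep it short, the gradient-based optimizer is replaced by a brute-force search over integer shifts within a small radius, which plays the role of a local optimizer that only succeeds near a good initial guess. The pyramid construction by pairwise averaging, the search radius, and the test signal are all assumptions of this sketch.

```python
import math


def bump(center, n=64, sigma=4.0):
    """Sample a Gaussian bump; a stand-in for one image row."""
    return [math.exp(-((i - center) ** 2) / (2.0 * sigma ** 2)) for i in range(n)]


def downsample(signal):
    """Halve the resolution by averaging neighbouring pairs."""
    return [(signal[i] + signal[i + 1]) / 2.0 for i in range(0, len(signal) - 1, 2)]


def build_pyramid(signal, levels):
    """Return [finest, ..., coarsest]."""
    pyramid = [signal]
    for _ in range(levels - 1):
        pyramid.append(downsample(pyramid[-1]))
    return pyramid


def mean_squared_error(ref, cur, shift):
    """Mean squared photometric error of cur[i + shift] against ref[i]."""
    total, count = 0.0, 0
    for i in range(len(ref)):
        j = i + shift
        if 0 <= j < len(cur):
            total += (cur[j] - ref[i]) ** 2
            count += 1
    return total / count if count else float("inf")


def coarse_to_fine_shift(ref, cur, levels=3, radius=1):
    """Estimate an integer shift, refining it from the coarsest level down."""
    ref_pyr = build_pyramid(ref, levels)
    cur_pyr = build_pyramid(cur, levels)
    shift = 0
    for level in range(levels - 1, -1, -1):
        shift *= 2  # propagate the estimate to the next finer level
        candidates = range(shift - radius, shift + radius + 1)
        shift = min(candidates,
                    key=lambda s: mean_squared_error(ref_pyr[level], cur_pyr[level], s))
    return shift


ref = bump(20.0)
cur = bump(26.0)  # same signal displaced by 6 samples
single_level = coarse_to_fine_shift(ref, cur, levels=1)  # local search only
multi_level = coarse_to_fine_shift(ref, cur, levels=3)   # returns 6
```

With a single level the search cannot reach the true displacement, because it lies outside the search radius; with three levels the same local search recovers it, since the displacement shrinks by a factor of two at each coarser level.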