A Practical Map Needs Direct Visual Odometry

Visual odometry (VO) is the process of sequentially estimating the ego-motion of an agent (e.g., a vehicle or a robot) from its camera images.


Introduction
As the fundamental building blocks of many emerging technologies, from autonomous cars and UAVs to virtual and augmented reality, real-time methods for SLAM and VO have made significant progress. For a long time the field was dominated by feature-based methods, which are sensitive to scene texture [1]. In addition, the sparse maps constructed by such methods are insufficient for downstream applications such as navigation. In recent years, a number of direct formulations, which operate directly on raw pixel intensities, have become popular. Some researchers [2] argue that the main limitation of direct methods is their reliance on consistent appearance between matched pixels, an assumption seldom satisfied in robotic applications. Handling challenging illumination conditions has long been a major issue for direct VO. Some methods therefore focus on improving robustness to illumination.
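The core idea of a direct formulation can be sketched in a few lines: pose is estimated by minimizing a photometric error computed on raw intensities. The following is a minimal illustrative sketch, not the formulation of any specific cited paper; the 2D translation stands in for the full warp, which in real VO depends on pixel depth and a 6-DoF pose.

```python
import numpy as np

def photometric_error(ref, cur, pixels, shift):
    """Sum of squared intensity differences over the given pixel set.

    ref, cur : 2D float arrays (grayscale images)
    pixels   : (row, col) coordinates in the reference image
    shift    : (dr, dc) candidate motion (a toy stand-in for the warp)
    """
    dr, dc = shift
    err = 0.0
    for r, c in pixels:
        r2, c2 = r + dr, c + dc
        if 0 <= r2 < cur.shape[0] and 0 <= c2 < cur.shape[1]:
            err += (ref[r, c] - cur[r2, c2]) ** 2
    return err

# A bright patch moves one pixel to the right between frames; a brute-force
# search over horizontal shifts recovers that motion as the error minimum.
ref = np.zeros((8, 8)); ref[3:5, 3:5] = 1.0
cur = np.zeros((8, 8)); cur[3:5, 4:6] = 1.0
pixels = [(3, 3), (3, 4), (4, 3), (4, 4)]
best = min((photometric_error(ref, cur, pixels, (0, d)), d) for d in range(-2, 3))
print(best[1])  # → 1
```

Real systems replace the brute-force search with gradient-based optimization over a 6-DoF pose, which is exactly where the photometric-consistency assumption discussed above becomes critical.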
[1] combined binary feature descriptors with a direct tracking framework to obtain robust performance, enabling visual odometry in low-textured and poorly lit environments. [3,4] modeled illumination changes with a gain-plus-bias model. Such affine models are limited by definition and fail in applications where non-global and nonlinear intensity deformations are common [1]. More sophisticated techniques have also been presented. [5] proposed a direct visual tracking approach in which robustness to lighting changes is obtained by coupling an illumination model that evolves over time with an appropriate geometric model of image motion. [6] observed that line segments are abundant and less sensitive to lighting changes, and accordingly proposed a camera-motion estimation method based on 3D line segments.
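The gain-plus-bias model referred to above assumes the current intensities relate to the reference ones by a single affine map, I_cur ≈ a·I_ref + b. A minimal sketch (function and variable names are illustrative) shows how a and b can be recovered in closed form by linear least squares:

```python
import numpy as np

def fit_gain_bias(i_ref, i_cur):
    """Fit i_cur ≈ a * i_ref + b over matched intensity samples."""
    A = np.stack([i_ref, np.ones_like(i_ref)], axis=1)
    (a, b), *_ = np.linalg.lstsq(A, i_cur, rcond=None)
    return a, b

i_ref = np.array([10.0, 20.0, 30.0, 40.0])
i_cur = 1.5 * i_ref + 5.0            # simulated global gain and bias
a, b = fit_gain_bias(i_ref, i_cur)
print(round(a, 3), round(b, 3))      # → 1.5 5.0
```

Because a and b are global, this model cannot represent local or nonlinear intensity deformations, which is precisely the limitation noted in the text.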
These methods, however, either impose stringent scene constraints or rely heavily on dense depth estimates, which are not always available. Other methods focus on improving robustness and accuracy by modifying the estimation framework. In [7], a sensor model based on the t-distribution and a constant-velocity motion model were incorporated into the minimization problem to guide and stabilize motion estimation in dynamic environments. [8] extended this work to semi-dense monocular visual odometry, adding depth-map estimation for selected pixels. Direct and feature-based methods have also been combined to improve efficiency and robustness simultaneously.
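The effect of a t-distribution sensor model like the one in [7] is to down-weight large residuals (e.g., from moving objects) in each iteratively re-weighted least-squares step. The sketch below is illustrative only: nu = 5 degrees of freedom is a common choice, and where [7] estimates the scale by a fixed-point iteration, this sketch substitutes a simple median-absolute-deviation estimate for brevity.

```python
import numpy as np

def t_weights(residuals, nu=5.0):
    """Per-residual weights w_i = (nu + 1) / (nu + (r_i / sigma)^2)."""
    sigma = 1.4826 * np.median(np.abs(residuals))  # robust scale (MAD)
    return (nu + 1.0) / (nu + (residuals / sigma) ** 2)

r = np.array([0.1, -0.2, 0.15, 8.0])   # last residual: likely an outlier
w = t_weights(r)
print(w[3] < 0.1 * w[0])               # → True: outlier strongly down-weighted
```

In the next Gauss-Newton step each squared residual is multiplied by its weight, so the outlier barely influences the pose update while well-matched pixels keep near-full influence.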
Robotics & Automation Engineering Journal

SVO [9] combined direct alignment with feature points and can be used with high-frame-rate cameras. Edges are important evidence, especially in man-made environments. Line-based VO without bundle adjustment has recently been applied to stereo and RGB-D cameras [10-13]. Kuse et al. [13] minimized the geometric error from reprojected edge pixels to their nearby edge pixels through a distance transform. Gradient-based optimization methods are commonly applied to compute the relative motion and fall into three categories: the forward compositional approach, the inverse compositional formulation, and the efficient second-order minimization approach [3]. These optimization methods differ in search efficiency.
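The distance-transform idea used by Kuse et al. [13] can be illustrated with a toy sketch (not their implementation): precompute, for every pixel, the distance to the nearest edge pixel in the reference frame; the geometric error of a candidate motion is then a cheap lookup at each reprojected edge pixel. The brute-force distance transform below is for clarity only.

```python
import numpy as np

def distance_transform(edges):
    """For each pixel, distance to the nearest edge pixel (brute force)."""
    ys, xs = np.nonzero(edges)
    pts = np.stack([ys, xs], axis=1)
    h, w = edges.shape
    dt = np.empty((h, w))
    for r in range(h):
        for c in range(w):
            dt[r, c] = np.min(np.hypot(pts[:, 0] - r, pts[:, 1] - c))
    return dt

def geometric_error(dt, edge_pixels, shift):
    dr, dc = shift
    return sum(dt[r + dr, c + dc] for r, c in edge_pixels)

edges = np.zeros((9, 9)); edges[:, 5] = 1       # vertical edge at column 5
dt = distance_transform(edges)
cur_edge = [(2, 3), (3, 3), (4, 3)]             # same edge observed 2 px left
best = min(range(0, 5), key=lambda d: geometric_error(dt, cur_edge, (0, d)))
print(best)  # → 2
```

The key property is that the error lookup costs O(1) per edge pixel after the one-time transform, which is what makes this kind of edge alignment efficient in practice.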

Discussion
The performance of direct methods is unstable across datasets collected under different environmental conditions, so a comprehensive comparison under a unified quantitative standard is impractical. From a qualitative perspective, combining features with a direct framework gives VO higher robustness to illumination changes and fast motion, but lower precision than standard direct methods; moreover, the inherent drawbacks of those features cannot be completely avoided. Integrating re-projection errors and sensor errors into the objective function extends the application scenarios of VO to low-dynamic environments. By selecting significant pixels instead of all pixels, the efficiency of VO can be greatly improved.
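The pixel-selection point can be made concrete: semi-dense and sparse direct methods typically keep only pixels with large image gradient, since low-gradient pixels contribute little constraint. A minimal sketch (threshold and names are illustrative, not taken from any cited system):

```python
import numpy as np

def select_pixels(img, thresh=0.1):
    """Keep pixels whose gradient magnitude exceeds a threshold."""
    gy, gx = np.gradient(img.astype(float))
    mag = np.hypot(gx, gy)
    return np.argwhere(mag > thresh)        # (row, col) of salient pixels

img = np.zeros((16, 16)); img[:, 8:] = 1.0  # one vertical intensity step
sel = select_pixels(img)
print(len(sel), img.size)                   # far fewer residuals than pixels
```

Here only the two columns straddling the step survive selection, so the residual count drops from 256 to 32 while the informative structure is retained.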

Conclusion
Because individual pixels have low discriminative power, minimizing the photometric error is a highly non-convex problem and thus requires a good initial guess for the optimization; a prior motion model and a coarse-to-fine strategy are the two main remedies. Intrinsically, to improve robustness and precision, the whole VO framework should be designed so that most pixel pairs are matched correctly under uncertain conditions. Direct methods, which exploit much more image information and create dense or semi-dense maps, remain attractive and promising as visual SLAM develops.
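The coarse-to-fine strategy mentioned above can be sketched as an image pyramid: alignment is solved first at the coarsest level, where the photometric objective is smoother and the basin of convergence larger, and the estimate is then upscaled to initialize the next finer level. A minimal pyramid construction (illustrative only):

```python
import numpy as np

def half_sample(img):
    """Downsample by 2 via 2x2 block averaging."""
    h, w = img.shape[0] // 2 * 2, img.shape[1] // 2 * 2
    i = img[:h, :w]
    return 0.25 * (i[0::2, 0::2] + i[1::2, 0::2] + i[0::2, 1::2] + i[1::2, 1::2])

def build_pyramid(img, levels=3):
    pyr = [img.astype(float)]
    for _ in range(levels - 1):
        pyr.append(half_sample(pyr[-1]))
    return pyr  # pyr[0] finest ... pyr[-1] coarsest

pyr = build_pyramid(np.zeros((64, 48)))
print([p.shape for p in pyr])  # → [(64, 48), (32, 24), (16, 12)]
```

A translation estimated at one level is doubled when passed to the next finer level, so each stage starts close enough to its optimum for gradient-based minimization to converge despite the non-convexity of the full-resolution objective.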