Links & Report

https://github.com/yunjinli/tum_visnav22_project

VisNav_VIO_Project_w0034_w0035.pdf

Method description

Status Quo

During the first phase of the practical course, we were expected to construct a visual odometry (VO) method that uses image pairs from a stereo camera setup to perform SLAM. Starting from a single initial image pair, the VO detects ORB keypoints in both images and matches them between the two images using their BRIEF descriptors. The matches are further filtered using the epipolar constraint and RANSAC. A bag-of-words approach is used to track the features over multiple images.
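The epipolar filtering step can be sketched as follows. This is a minimal numpy illustration, not the course framework's API: `epipolar_inliers` and its threshold are hypothetical names, and we assume the relative pose (R, t) maps camera-1 coordinates into camera 2, so a correct match of unit bearing vectors satisfies $x_2^T [t]_\times R \, x_1 = 0$.

```python
import numpy as np

def skew(t):
    # Skew-symmetric matrix so that skew(t) @ x == np.cross(t, x).
    return np.array([[0.0, -t[2], t[1]],
                     [t[2], 0.0, -t[0]],
                     [-t[1], t[0], 0.0]])

def epipolar_inliers(x1, x2, R, t, eps=1e-3):
    # x1, x2: (N, 3) unit bearing vectors in camera 1 and camera 2.
    # R, t: relative pose such that X2 = R @ X1 + t.
    # A correct correspondence satisfies x2^T E x1 = 0 with E = [t]_x R.
    E = skew(t) @ R
    residuals = np.abs(np.einsum('ni,ij,nj->n', x2, E, x1))
    return residuals < eps
```

In practice this algebraic test is evaluated inside the RANSAC loop, with the inlier set deciding which hypothesized pose survives.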

A map is then initialized from the initial camera pair: the matched features are triangulated between the two cameras and added to the map as 3D landmarks (map points). After this initialization, we iteratively add new camera pairs and corresponding landmarks using PnP in a RANSAC scheme. Since minor errors occur in all of the previous steps, we have to optimize the 6DoF pose of all cameras as well as the 3DoF position of all landmarks using Bundle Adjustment.
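The triangulation of a landmark from two views can be sketched with the standard linear (DLT) method. This is an illustrative numpy version, assuming normalized image coordinates and (3, 4) projection matrices; the actual implementation in the framework may differ.

```python
import numpy as np

def triangulate(P1, P2, u1, u2):
    # Linear (DLT) triangulation of one point from two pinhole views.
    # P1, P2: (3, 4) projection matrices; u1, u2: image coordinates (2,).
    # Each view contributes two rows of the homogeneous system A X = 0.
    A = np.vstack([
        u1[0] * P1[2] - P1[0],
        u1[1] * P1[2] - P1[1],
        u2[0] * P2[2] - P2[0],
        u2[1] * P2[2] - P2[1],
    ])
    _, _, Vt = np.linalg.svd(A)
    X = Vt[-1]                    # null-space vector = homogeneous point
    return X[:3] / X[3]           # homogeneous -> Euclidean landmark
```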

To make this optimization real-time capable, we do not add all frames to the optimization problem, but only keyframes that are created based on a set of criteria. A sliding optimization window is additionally used to further decrease the number of parameters in the optimization problem and to keep its size roughly constant over time. Old keyframes and their respective observed 3D landmarks are marginalized and are no longer optimized in the Bundle Adjustment problem.
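The window bookkeeping amounts to a bounded queue of keyframes. The sketch below is purely illustrative (the class and attribute names are hypothetical, not the framework's API), and it only shows which frames leave the problem, not the actual marginalization of their states.

```python
from collections import deque

class SlidingWindow:
    # Hypothetical sketch of sliding-window keyframe bookkeeping.
    def __init__(self, max_keyframes=7):
        self.max_keyframes = max_keyframes
        self.keyframes = deque()   # frame ids currently in the optimization
        self.marginalized = []     # frame ids removed from the problem

    def add_keyframe(self, frame_id):
        self.keyframes.append(frame_id)
        if len(self.keyframes) > self.max_keyframes:
            # Oldest keyframe (with its exclusively observed landmarks)
            # is marginalized out of the Bundle Adjustment problem.
            self.marginalized.append(self.keyframes.popleft())
```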

Even though the visual odometry already achieves quite good results on the simple dataset used during the exercises, its capabilities are still quite limited. Therefore, the goal of this project is to extend the current VO baseline to a Visual-Inertial Odometry (VIO) by integrating IMU measurements.

IMU model and motion integration

The IMU gives us acceleration and angular velocity. Note that these measurements are affected by white noise and slowly varying biases. Because of the time constraints of the project, we keep things as simple as possible and therefore neglect both the noise and the biases in our VIO system. Based on previous work, we can derive the state at $t+\Delta t$ as follows.

$\begin{aligned} \textbf{R}(t+\Delta t)&=\textbf{R}(t) \text{Exp}(\boldsymbol{\omega} (t) \Delta t) \\ \textbf{v}(t+\Delta t)&=\textbf{v}(t)+\textbf{g}\Delta t + \textbf{R}(t) \textbf{a}(t)\Delta t \\ \textbf{p}(t+\Delta t)&=\textbf{p}(t)+\textbf{v}(t)\Delta t+\frac{1}{2}\textbf{g}\Delta t^2+\frac{1}{2}\textbf{R}(t)\textbf{a}(t)\Delta t^2 \end{aligned}$

Note that $\textbf{R}(t)$, $\textbf{v}(t)$, and $\textbf{p}(t)$ are the rotation, velocity, and position estimated at time $t$, all expressed in the world coordinate frame. The assumption we make here is that $\boldsymbol{\omega}(t)$ and $\textbf{a}(t)$ are constant over the interval $[t, t+\Delta t]$. This is the reason why the integration above is only reasonable when $\Delta t$ is small.
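One Euler step of this bias-free model can be sketched directly from the equations above. This is a minimal numpy illustration: the function names and the gravity convention $\textbf{g} = (0, 0, -9.81)$ are our assumptions, and $\text{Exp}$ is implemented via Rodrigues' formula.

```python
import numpy as np

def skew(v):
    # Skew-symmetric matrix so that skew(v) @ x == np.cross(v, x).
    return np.array([[0.0, -v[2], v[1]],
                     [v[2], 0.0, -v[0]],
                     [-v[1], v[0], 0.0]])

def so3_exp(phi):
    # Rodrigues' formula: Exp maps an axis-angle vector to a rotation matrix.
    theta = np.linalg.norm(phi)
    if theta < 1e-9:
        return np.eye(3) + skew(phi)   # first-order approximation near 0
    K = skew(phi / theta)
    return np.eye(3) + np.sin(theta) * K + (1 - np.cos(theta)) * (K @ K)

GRAVITY = np.array([0.0, 0.0, -9.81])  # assumed world-frame gravity

def integrate(R, v, p, omega, a, dt):
    # One Euler step of the bias-free IMU model; state is in the world frame.
    R_new = R @ so3_exp(omega * dt)
    v_new = v + GRAVITY * dt + R @ a * dt
    p_new = p + v * dt + 0.5 * GRAVITY * dt**2 + 0.5 * (R @ a) * dt**2
    return R_new, v_new, p_new
```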

IMU preintegration

As already mentioned, IMU measurements arrive at a high frequency, while camera frames usually do not. To handle the high-frequency IMU data, we therefore preintegrate all consecutive IMU measurements between two frames and compute an overall delta state. This delta state is later used to compute the residual of the so-called preintegrated factor. We essentially use the framework developed in basalt, but neglect all bias terms, to compute the delta state and evaluate the preintegrated factor, which is to be minimized. The illustration of IMU preintegration between frames can be seen in the figure below: the preintegrated factor is computed from all IMU measurements between consecutive frames and later serves as a constraint in the optimization.

$\begin{aligned} \textbf{r}_{\Delta \textbf{R}}&=\text{Log}(\Delta \textbf{R}\,\textbf{R}_j^T\textbf{R}_i) \\ \textbf{r}_{\Delta \textbf{v}}&=\textbf{R}_i^T(\textbf{v}_j-\textbf{v}_i-\textbf{g}\Delta t)-\Delta \textbf{v} \\ \textbf{r}_{\Delta \textbf{p}}&=\textbf{R}_i^T(\textbf{p}_j-\textbf{p}_i-\textbf{v}_i\Delta t-\frac{1}{2}\textbf{g}\Delta t^2)-\Delta \textbf{p} \end{aligned}$

Figure: IMU preintegration between consecutive frames (optimization_window.png)
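The residual evaluation of the preintegrated factor can be sketched as below. This is a hedged numpy illustration of the bias-free residual equations, not basalt's actual implementation: the function names are ours, $\text{Log}$ is implemented from the rotation-matrix trace, and the position residual includes the $\textbf{v}_i \Delta t$ term consistent with the motion integration above.

```python
import numpy as np

def so3_log(R):
    # Inverse of Exp: rotation matrix -> axis-angle vector.
    cos_theta = np.clip((np.trace(R) - 1.0) / 2.0, -1.0, 1.0)
    theta = np.arccos(cos_theta)
    if theta < 1e-9:
        return np.zeros(3)
    w = np.array([R[2, 1] - R[1, 2], R[0, 2] - R[2, 0], R[1, 0] - R[0, 1]])
    return theta / (2.0 * np.sin(theta)) * w

def preintegration_residuals(dR, dv, dp, Ri, vi, pi, Rj, vj, pj, g, dt):
    # Bias-free preintegrated-factor residuals between frames i and j.
    # dR, dv, dp: preintegrated deltas; (R, v, p): world-frame states.
    r_R = so3_log(dR @ Rj.T @ Ri)
    r_v = Ri.T @ (vj - vi - g * dt) - dv
    r_p = Ri.T @ (pj - pi - vi * dt - 0.5 * g * dt**2) - dp
    return r_R, r_v, r_p
```

When the states between frames i and j are exactly consistent with the preintegrated deltas, all three residuals vanish; the optimizer drives them toward zero jointly with the reprojection errors.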

Demo

https://youtu.be/aNgcuXywrX4

https://youtu.be/fA9gDHygKfg