# Structure from Motion (SfM)¶

Theia has implementations of many common Structure from Motion (sfm) algorithms. We attempt to use a generic interface whenever possible so as to maximize compatibility with other libraries.

## Projection Matrix¶

We provide two convenience matrices that are commonly used in multiview geometry. The first is a `TransformationMatrix` which is an affine transformation matrix composed of rotation and translation of the form: \(\left[R | t\right]\), i.e., the extrinsic parameters of a camera. The `TransformationMatrix` is merely a typedef of the Eigen affine transformation matrix. Similarly, a `ProjectionMatrix` class is defined that representsthe intrinsic and extrinsic parameters, nameley matrices of the form: \(K\left[R | t \right]\) where \(K\) is a 3x3 matrix of the camera intrinsics (e.g., focal length, principle point, and radial distortion). The `ProjectionMatrix` is merely a typedef of an Eigen 3x4 matrix.

## Camera and CameraPose¶

At the core of SfM are cameras which provide us with observations of 3D points. Theia uses a `Camera` struct to maintain all imaging and view information. This includes information about the image captured, the feature positions (both in the image planes and in 3D space), and feature descriptors.

```
struct Camera {
EIGEN_MAKE_ALIGNED_OPERATOR_NEW
Camera() {}
explicit Camera(const CameraPose& pose) : pose_(pose) {}
// The camera pose describing the position, orientation, and intrinsics of the
// camera.
CameraPose pose_;
// Width of the camera in pixels.
int width_;
// Height of the camera in pixels.
int height_;
// Original measured feature positions.
std::vector<Eigen::Vector2d> feature_position_2D_distorted_;
// Feature positions after correcting for radial distortion.
std::vector<Eigen::Vector2d> feature_position_2D_;
// Descriptors.
std::vector<Eigen::VectorXf> descriptors_;
std::vector<Eigen::BinaryVectorX> binary_descriptors_;
// 3D feature IDs. This assumes that a container of 3D feature positions is
// owned elsewhere. This is essentially the track ID, and the main usage is
// for reconstructions where many 2D points correspond to a single 3D point.
std::vector<size_t> feature_3D_ids_;
// If the 3D positions are stored locally, then we can keep them in a
// container here.
std::vector<Eigen::Vector3d> feature_positions_3D_;
};
```

The pose of the camera is contained in the `CameraPose` object. This
class contains extrinsic (rotation and translation) and intrinsic calibration
(focal length, principle point, radial distortion) information for the camera,
and provides many convenience functions for various image and point
transformations.

classCameraPose¶

CameraPose()¶The default constructor. Sets all values to identity values.

The

CameraPoseclass must be initialized with theInitializePose()method rather than with the constructor. There are several variations of this method so as to be flexible to the information that the user has available:

- void
InitializePose(const Eigen::Matrix3d&rotation, const Eigen::Vector3d&translation, const Eigen::Matrix3d&calibration, const doublek1, const doublek2, const doublek3, const doublek4)¶Initialize the camera pose with the full extrinsic (rotation and translation) and intrinsic (calibration matrix and radial distortion) parameters. The extrinsic parameters should provide world-to-camera transformations.

- void
InitializePose(const Eigen::Matrix<double, 3, 4>&projection_matrix, const doublek1, const doublek2, const doublek3, const doublek4)¶Initialize the pose with a projection matrix given by P = K * [R | t], where K is the calibration matrix, R is the rotation matrix, and t is the translation. The projection matrix provided should be a world-to-image transformation (as opposed to world-to-camera).

- void
InitializePose(const Eigen::Matrix<double, 3, 4>&transformation_matrix, const Eigen::Matrix3d&calibration, const doublek1, const doublek2, const doublek3, const doublek4)¶Initialize the pose with a given transformation matrix that defines the world-to-camera transformation.

- void
InitializePose(const CameraPose&pose)¶Copy constructor.

It is important to be able to access components of the camera pose, so we provide getter functions for all relative information:

- Eigen::Matrix3d
rotation_matrix()const¶Get the rotation componenet of the transformation matrix.

- Eigen::Vector3d
translation()const¶Get the translation componenet of the transformation matrix.

- Eigen::Vector3d
position()const¶Get the camera position in the world coordinate system defined as position = -R’ * t.

- Eigen::Matrix3d
calibration_matrix()const¶Get the 3x3 camera calibration matrix defined by K = diag(f, f, 1).

- double
focal_length()const¶Get the focal length of the camera.

- void
radial_distortion(double*k1, double*k2, double*k3, double*k4)const¶Returns the radial distortion parameters.

- Eigen::Matrix<double, 3, 4>
projection_matrix()const¶Returns the full projection matrix that describes the camera pose.

- Eigen::Matrix<double, 3, 4>
transformation_matrix()const¶Returns the transformation matrix \(T = [R | t]\) i.e., the extrinsic parameters.

Finally, the most important functionality of a camera is that it projects points in the world into the image. Modeling this projection is a crucial part of structure from motion (and all projective geometry!), so we provide transformation functions to perform these tasks for you. We define three coordinate systems: the world coordinate system, camera coordinate system, and image coordinate system. The world coordinate system is defined as the 3D coordinate system relative to some world origin. 3D points in SfM models are typically defined with respect to the world coordinate system. The camera coordinate system is the coordinate system centered around the camera. That is, with the camera at the origin looking down the z-axis. Finally, the image coordinate system is the camera coordinate system projected onto the image plane. The image coordinate system is defined in pixels, and may only be reconciled with respect to the other coordinate systems when the intrinsic paramters are known.

So, given a point in the world coordinate system, \(X_w\), we can transform that point with a translation and rotation \(T=[R | t]\) such that \(X_c = T * X_w\) is a point in the camera coordinate system. To transform \(X_c\) into image coordinates, we must apply the camera calibration matrix, \(K\) such that \(X_i = K * T * X_w\) is a point in the image plane (in pixels).

The functions below provide these transformations for a single point, as well as optimized transformations for transforming multiple points at the same time.

- void
WorldToCamera(const Eigen::Vector3d&world_point, Eigen::Vector3d*camera_point)const¶Transforms a point from the world coordinate system to the camera coordinate system.

- void
WorldToCamera(const std::vector<Eigen::Vector3d>&world_point, std::vector<Eigen::Vector3d>*camera_point)const¶Transformation method for multiple points.

- void
CameraToWorld(const Eigen::Vector3d&camera_point, Eigen::Vector3d*world_point)const¶Transforms a point from the camera coordinate system to the world coordinate system.

- void
CameraToWorld(const std::vector<Eigen::Vector3d>&camera_point, std::vector<Eigen::Vector3d>*world_point)const¶Transformation method for multiple points.

- void
CameraToImage(const Eigen::Vector3d&camera_point, Eigen::Vector2d*image_point)const¶Projects the 3D points in camera coordinates into the image plane using the calibration matrix of the camera.

- void
CameraToImage(const std::vector<Eigen::Vector3d>&camera_point, std::vector<Eigen::Vector2d>*image_point)const¶Projection method for multiple points. NOTE: this method is void, and does not indicate whether points are in front of behind the camera.

- bool
WorldToImage(const Eigen::Vector3d&world_point, Eigen::Vector2d*image_point)const¶Projects the 3D points in world coordinates into the image plane using the projection matrix of the camera. Returns true if the point is in front of the camera and false otherwise.

- void
WorldToImage(const std::vector<Eigen::Vector3d>&world_point, std::vector<Eigen::Vector2d>*image_point)const¶Projection method for multiple points. NOTE: this method is void, and does not indicate whether points are in front of behind the camera.

Correcting radial distortion can be a common operation for SfM so that the images may be as geomtetrically correct as possible. The following two functions will undistort image points based on the intrinsic paramters of the camera.

- void
UndistortImagePoint(const Eigen::Vector2d&distorted_point, Eigen::Vector2d*undistorted_point)const¶Undistorts the image point using the radial distortion parameters.

- void
UndistortImagePoint(const std::vector<Eigen::Vector2d>&distorted_point, std::vector<Eigen::Vector2d>*undistorted_point)const¶Undistort multiple points at the same time.

## 2-View Triangulation¶

Triangulation in structure from motion calculates the 3D position of an image coordinate that has been tracked through several, if not many, images.

- bool
Triangulate(const ProjectionMatrix&pose_left, const ProjectionMatrix&pose_right, const Eigen::Vector2d&point_left, const Eigen::Vector2d&point_right, Eigen::Vector3d*triangulated_point)¶2-view triangulation using the DLT method described in [HartleyZisserman]. The poses are the (potentially calibrated) poses of the two cameras, and the points are the 2D image points of the matched features that will be used to triangulate the 3D point. If there was an error computing the triangulation (e.g., the point is found to be at infinity) then

falseis returned. On successful triangulation,trueis returned.

## N-View Triangulation¶

- bool
TriangulateNViewSVD(const std::vector<ProjectionMatrix>&poses, const std::vector<Eigen::Vector2d>&points, Eigen::Vector3d*triangulated_point)¶

- bool
TriangulateNView(const std::vector<ProjectionMatrix>&poses, const std::vector<Eigen::Vector2d>&points, Eigen::Vector3d*triangulated_point)¶We provide two N-view triangluation methods that minimizes an algebraic approximation of the geometric error. The first is the classic SVD method presented in [HartleyZisserman]. The second is a custom algebraic minimization. Note that we can derive an algebraic constraint where we note that the unit ray of an image observation can be stretched by depth \(\alpha\) to meet the world point \(X\) for each of the \(n\) observations:

\[\alpha_i \bar{x_i} = P_i X,\]for images \(i=1,\ldots,n\). This equation can be effectively rewritten as:

\[\alpha_i = \bar{x_i}^\top P_i X,\]which can be substituted into our original constraint such that:

\[\bar{x_i} \bar{x_i}^\top P_i X = P_i X\]\[0 = (P_i - \bar{x_i} \bar{x_i}^\top P_i) X\]We can then stack this constraint for each observation, leading to the linear least squares problem:

\[\begin{split}\begin{bmatrix} (P_1 - \bar{x_1} \bar{x_1}^\top P_1) \\ \vdots \\ (P_n - \bar{x_n} \bar{x_n}^\top P_n) \end{bmatrix} X = \textbf{0}\end{split}\]This system of equations is of the form \(AX=0\) which can be solved by extracting the right nullspace of \(A\). The right nullspace of \(A\) can be extracted efficiently by noting that it is equivalent to the nullspace of \(A^\top A\), which is a 4x4 matrix.

## Similarity Transformation¶

- void
AlignPointCloudsICP(const intnum_points, const doubleleft[], const doubleright[], doublerotation[3 * 3], doubletranslation[3])¶We implement ICP for point clouds. We use Besl-McKay registration to align point clouds. We use SVD decomposition to find the rotation, as this is much more likely to find the global minimum as compared to traditional ICP, which is only guaranteed to find a local minimum. Our goal is to find the transformation from the left to the right coordinate system. We assume that the left and right models have the same number of points, and that the points are aligned by correspondence (i.e. left[i] corresponds to right[i]).

- void
AlignPointCloudsUmeyama(const intnum_points, const doubleleft[], const doubleright[], doublerotation[3 * 3], doubletranslation[3], double*scale)¶This function estimates the 3D similiarty transformation using the least squares method of [Umeyama]. The returned rotation, translation, and scale align the left points to the right such that \(Right = s * R * Left + t\).

- void
GdlsSimilarityTransform(const std::vector<Eigen::Vector3d>&ray_origin, const std::vector<Eigen::Vector3d>&ray_direction, const std::vector<Eigen::Vector3d>&world_point, std::vector<Eigen::Quaterniond>*solution_rotation, std::vector<Eigen::Vector3d>*solution_translation, std::vector<double>*solution_scale)¶Computes the solution to the generalized pose and scale problem based on the paper “gDLS: A Scalable Solution to the Generalized Pose and Scale Problem” by Sweeney et. al. [SweeneyGDLS]. Given image rays from one coordinate system that correspond to 3D points in another coordinate system, this function computes the rotation, translation, and scale that will align the rays with the 3D points. This is used for applications such as loop closure in SLAM and SfM. This method is extremely scalable and highly accurate because the cost function that is minimized is independent of the number of points. Theoretically, up to 27 solutions may be returned, but in practice only 4 real solutions arise and in almost all cases where n >= 6 there is only one solution which places the observed points in front of the camera. The rotation, translation, and scale are defined such that: \(sp_i + \alpha_i d_i = RX_i + t\) where the observed image ray has an origin at \(p_i\) in the unit direction \(d_i\) corresponding to 3D point \(X_i\).

ray_origin: the origin (i.e., camera center) of the image ray used in the 2D-3D correspondence.

ray_direction: Normalized image rays corresponding to model points. Must contain at least 4 points.

world_point: 3D location of features. Must correspond to the image_ray of the same index. Must contain the same number of points as image_ray, and at least 4.

solution_rotation: the rotation quaternion of the candidate solutions

solution_translation: the translation of the candidate solutions

solution_scale: the scale of the candidate solutions