Recovering Position on a Line

$$ \begin{bmatrix} s u \\ s v \\ s \end{bmatrix} = \begin{bmatrix} p_{00} & p_{01} \\ p_{10} & p_{11} \\ p_{20} & p_{21} \end{bmatrix} \begin{bmatrix} \lambda X \\ \lambda \end{bmatrix} $$

$$ \begin{aligned} u &= \frac{su}{s} = \frac{p_{00}\lambda X + p_{01}\lambda}{p_{20}\lambda X + p_{21}\lambda} \\ \Rightarrow X &= \frac{p_{01} - u p_{21}}{u p_{20} - p_{00}} \\ v &= \frac{sv}{s} = \frac{p_{10}\lambda X + p_{11}\lambda}{p_{20}\lambda X + p_{21}\lambda} \\ \Rightarrow X &= \frac{p_{11} - v p_{21}}{v p_{20} - p_{10}} \end{aligned} $$

Only one of the two equations is needed to recover $$X$$.
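As a quick numerical check of the two inversions above, the following sketch projects a known position and recovers it from either image coordinate. The matrix `P` and the function names are hypothetical, chosen only for illustration.

```python
import numpy as np

def project_line_point(P, X):
    """Project position X on the line with a 3x2 matrix P; returns (u, v)."""
    su, sv, s = P @ np.array([X, 1.0])
    return su / s, sv / s

def recover_from_u(P, u):
    """Invert the u equation: X = (p01 - u*p21) / (u*p20 - p00)."""
    return (P[0, 1] - u * P[2, 1]) / (u * P[2, 0] - P[0, 0])

def recover_from_v(P, v):
    """Invert the v equation: X = (p11 - v*p21) / (v*p20 - p10)."""
    return (P[1, 1] - v * P[2, 1]) / (v * P[2, 0] - P[1, 0])

# Hypothetical projection matrix, not taken from any real camera.
P = np.array([[2.0, 1.0],
              [0.5, 3.0],
              [0.1, 1.0]])
X_true = 4.0
u, v = project_line_point(P, X_true)
X_u = recover_from_u(P, u)  # recovered from u alone
X_v = recover_from_v(P, v)  # recovered from v alone
```

Both `X_u` and `X_v` should agree with `X_true` up to floating-point error, which is why a single image coordinate suffices.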

Simplification:

$$ \begin{bmatrix} s u \\ s \end{bmatrix} = \begin{bmatrix} p_{00} & p_{01} \\ p_{20} & p_{21} \end{bmatrix} \begin{bmatrix} \lambda X \\ \lambda \end{bmatrix} $$

$$ \begin{bmatrix} s v \\ s \end{bmatrix} = \begin{bmatrix} p_{10} & p_{11} \\ p_{20} & p_{21} \end{bmatrix} \begin{bmatrix} \lambda X \\ \lambda \end{bmatrix} $$

Degrees of freedom: 3

Recovering Position on a Plane

$$ \begin{bmatrix} s u \\ s v \\ s \end{bmatrix} = \begin{bmatrix} p_{00} & p_{01} & p_{02} \\ p_{10} & p_{11} & p_{12} \\ p_{20} & p_{21} & p_{22} \end{bmatrix} \begin{bmatrix} \lambda X \\ \lambda Y \\ \lambda \end{bmatrix} $$

Let $$Q$$ be the inverse of $$P^p$$ and multiply both sides by $$Q$$:

$$ \begin{bmatrix} q_{00} & q_{01} & q_{02} \\ q_{10} & q_{11} & q_{12} \\ q_{20} & q_{21} & q_{22} \end{bmatrix} \begin{bmatrix} s u \\ s v \\ s \end{bmatrix} = \begin{bmatrix} \lambda X \\ \lambda Y \\ \lambda \end{bmatrix} $$

$$ \begin{aligned} X &= \frac{\lambda X}{\lambda} = \frac{q_{00}u + q_{01}v + q_{02}}{q_{20}u + q_{21}v + q_{22}} \\ Y &= \frac{\lambda Y}{\lambda} = \frac{q_{10}u + q_{11}v + q_{12}}{q_{20}u + q_{21}v + q_{22}} \end{aligned} $$
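The inversion can be sketched in a few lines of NumPy. The matrix `P_plane` below is a made-up invertible 3x3 plane-to-image matrix, and `image_to_plane` is a hypothetical helper name.

```python
import numpy as np

def image_to_plane(P_plane, u, v):
    """Map pixel (u, v) back to plane coordinates (X, Y) using Q = inv(P_plane)."""
    Q = np.linalg.inv(P_plane)
    lX, lY, l = Q @ np.array([u, v, 1.0])  # (lambda*X, lambda*Y, lambda) up to scale
    return lX / l, lY / l

# Hypothetical plane-to-image matrix, for illustration only.
P_plane = np.array([[800.0, 20.0, 320.0],
                    [10.0, 780.0, 240.0],
                    [0.01, 0.02, 1.0]])

# Forward-project a known plane point, then recover it from its pixel.
X_true, Y_true = 1.5, -0.7
su, sv, s = P_plane @ np.array([X_true, Y_true, 1.0])
X_rec, Y_rec = image_to_plane(P_plane, su / s, sv / s)
```

The unknown scale $$s$$ cancels in the final division, so passing $$(u, v, 1)$$ instead of $$(su, sv, s)$$ gives the same result.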

Recovering Position in 3D Space

$$ \begin{bmatrix} s u \\ s v \\ s \end{bmatrix} = \begin{bmatrix} p_{00} & p_{01} & p_{02} & p_{03} \\ p_{10} & p_{11} & p_{12} & p_{13} \\ p_{20} & p_{21} & p_{22} & p_{23} \end{bmatrix} \begin{bmatrix} \lambda X \\ \lambda Y \\ \lambda Z \\ \lambda \end{bmatrix} $$

$$ \begin{aligned} u &= \frac{su}{s} = \frac{p_{00}X + p_{01}Y + p_{02}Z + p_{03}}{p_{20}X + p_{21}Y + p_{22}Z + p_{23}} \\ v &= \frac{sv}{s} = \frac{p_{10}X + p_{11}Y + p_{12}Z + p_{13}}{p_{20}X + p_{21}Y + p_{22}Z + p_{23}} \end{aligned} $$

Each equation defines a plane. Their intersection defines a line (ray) along which the world point must lie.

It is generally not possible to recover the world position of an image point from a single view, even with a calibrated camera. But if the same world point is observed in another view, it also lies on the ray of that view, and its 3D location can be resolved.

This is the essence of stereo vision.

One point observed by one camera gives 2 equations in 3 unknowns; the same point observed by another camera gives another 2 equations -> over-constrained with 4 equations to solve for 3 unknowns -> linear least squares.
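A minimal sketch of this least-squares solve is shown below. Each image coordinate is rearranged into a linear equation in $$(X, Y, Z)$$, for example $$(p_{00} - u p_{20})X + (p_{01} - u p_{21})Y + (p_{02} - u p_{22})Z = u p_{23} - p_{03}$$. The intrinsics `K`, the two projection matrices, and the function names are assumptions made for illustration.

```python
import numpy as np

def project(P, X):
    """Project a 3D point X with a 3x4 projection matrix P; returns (u, v)."""
    su, sv, s = P @ np.append(X, 1.0)
    return su / s, sv / s

def triangulate_lstsq(P1, uv1, P2, uv2):
    """Stack 2 linear equations per view and solve the 4x3 system for (X, Y, Z)."""
    rows, rhs = [], []
    for P, (u, v) in ((P1, uv1), (P2, uv2)):
        rows.append(P[0, :3] - u * P[2, :3])
        rhs.append(u * P[2, 3] - P[0, 3])
        rows.append(P[1, :3] - v * P[2, :3])
        rhs.append(v * P[2, 3] - P[1, 3])
    X, *_ = np.linalg.lstsq(np.array(rows), np.array(rhs), rcond=None)
    return X

# Hypothetical calibrated stereo pair: identical intrinsics, second camera
# displaced along x (translation -0.5 in its own coordinates).
K = np.array([[700.0, 0.0, 320.0],
              [0.0, 700.0, 240.0],
              [0.0, 0.0, 1.0]])
P1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = K @ np.hstack([np.eye(3), np.array([[-0.5], [0.0], [0.0]])])

X_true = np.array([0.2, -0.1, 4.0])
X_rec = triangulate_lstsq(P1, project(P1, X_true), P2, project(P2, X_true))
```

With noise-free projections the recovered point matches `X_true`; with noisy pixels the same call returns the least-squares estimate.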

Triangulation

A world point $$X$$ is observed at pixel coordinates $$(u, v)$$. Given the CCD parameters, these are translated into image plane coordinates $$(x, y)$$:

$$ \begin{aligned} u &= k_u x + u_0 \Rightarrow x = \frac{u - u_0}{k_u} \\ v &= k_v y + v_0 \Rightarrow y = \frac{v - v_0}{k_v} \end{aligned} $$

Given focal length $$f$$, translate $$(x, y)$$ into a ray vector $$p$$ in the 3D space, which is related to the camera coordinates $$X_c$$ via unknown depth $$Z_c$$:

$$ p = \begin{bmatrix} x \\ y \\ f \end{bmatrix} = \begin{bmatrix} \frac{fX_c}{Z_c} \\ \frac{fY_c}{Z_c} \\ \frac{fZ_c}{Z_c} \end{bmatrix} = \frac{f}{Z_c}X_c $$
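The pixel-to-ray conversion can be sketched as follows. The CCD parameters and focal length are invented values, and `pixel_to_ray` is a hypothetical helper name.

```python
import numpy as np

def pixel_to_ray(u, v, k_u, k_v, u_0, v_0, f):
    """Convert pixel (u, v) into the ray vector p = (x, y, f) in camera coordinates."""
    x = (u - u_0) / k_u
    y = (v - v_0) / k_v
    return np.array([x, y, f])

# Hypothetical CCD parameters (pixels per metre, principal point) and focal length.
k_u, k_v, u_0, v_0, f = 100000.0, 100000.0, 320.0, 240.0, 0.008

# Project a known camera-frame point to a pixel, then convert it back to a ray.
Xc = np.array([0.5, -0.25, 4.0])
x, y = f * Xc[0] / Xc[2], f * Xc[1] / Xc[2]
u, v = k_u * x + u_0, k_v * y + v_0
p = pixel_to_ray(u, v, k_u, k_v, u_0, v_0, f)
expected = (f / Xc[2]) * Xc  # p = (f / Z_c) X_c
```

The ray `p` is parallel to `Xc`; only the depth $$Z_c$$ is lost in the projection.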

Given two camera coordinate systems related by known rotation $$R$$ & translation $$T$$ s.t. $$X_c' = RX_c + T$$, and since $$X_c'$$ and $$p'$$ are parallel, we can derive the triangulation equation:

$$ \begin{aligned} X_c' \times p' &= 0 \\ (RX_c + T) \times p' &= 0 \\ \left(\frac{Z_c}{f}Rp + T\right) \times p' &= 0 \end{aligned} $$

The only unknown is the depth $$Z_c$$, and the triangulation equation provides 3 equations for it (the components of the cross product). If the equations are inconsistent, the 2 rays do not intersect, and the image features in the two cameras do not correspond. Otherwise, solve for $$Z_c$$ and recover the full 3D scene structure:

$$ X_c = \frac{Z_c}{f}p $$
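One way to solve the triangulation equation is to write it as $$\frac{Z_c}{f}a + b = 0$$ with $$a = Rp \times p'$$ and $$b = T \times p'$$, and take the least-squares depth. The sketch below does this and returns the residual as a consistency check; the pose, focal length, and test point are assumed values.

```python
import numpy as np

def depth_from_rays(p, p_prime, R, T, f):
    """Least-squares Z_c from ((Z_c / f) R p + T) x p' = 0, plus the residual norm."""
    a = np.cross(R @ p, p_prime)
    b = np.cross(T, p_prime)
    Z_c = -f * (a @ b) / (a @ a)
    residual = np.linalg.norm((Z_c / f) * a + b)  # near 0 when the rays intersect
    return Z_c, residual

# Hypothetical relative pose with X_c' = R X_c + T: a small rotation about y.
f = 1.0
c, s = np.cos(0.1), np.sin(0.1)
R = np.array([[c, 0.0, s],
              [0.0, 1.0, 0.0],
              [-s, 0.0, c]])
T = np.array([-0.5, 0.0, 0.05])

# Synthesize corresponding rays from a known point in the first camera's frame.
Xc_true = np.array([0.3, -0.2, 5.0])
Xc_prime = R @ Xc_true + T
p = f * Xc_true / Xc_true[2]
p_prime = f * Xc_prime / Xc_prime[2]

Z_c, residual = depth_from_rays(p, p_prime, R, T, f)
Xc_rec = (Z_c / f) * p  # full 3D position, X_c = (Z_c / f) p
```

A residual well above numerical noise suggests the two features are not a true correspondence.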

Epipolar Geometry

Epipolar planes defined by different world points intersect along the baseline
Epipolar lines hence intersect at the epipole

Essential Matrix

$$ \begin{aligned} X_c' &= RX_c + T \\ T \times X_c' &= T \times RX_c + T \times T \\ T \times X_c' &= T \times RX_c \\ X_c' \cdot (T \times X_c') &= X_c' \cdot (T \times RX_c) \\ X_c' \cdot (T \times RX_c) &= 0 \\ X_c' \cdot ([T]_x RX_c) &= 0 \\ X_c'^T E X_c &= 0 \end{aligned} $$

where $$E = [T]_x R$$.

Skew-symmetric matrix

$$[T]_x = \begin{bmatrix} 0 & -T_z & T_y \\ T_z & 0 & -T_x \\ -T_y & T_x & 0 \end{bmatrix}$$

where

$$T = \begin{bmatrix} T_x \\ T_y \\ T_z \end{bmatrix}$$

Note:

A skew-symmetric matrix has 2 identical singular values & 1 zero singular value
Multiplication by a rotation matrix does not change the singular values of a matrix
Hence, $$E$$ also has 2 identical singular values & 1 zero singular value

Degrees of freedom: 5 (3 for $$R$$, 2 for $$T$$, since $$T$$ is only known up to scale)
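The properties above can be checked numerically. The sketch below builds $$E = [T]_x R$$ for an arbitrary, made-up pose and inspects its singular values; `skew` and `rot_z` are hypothetical helper names.

```python
import numpy as np

def skew(t):
    """Return [t]_x such that skew(t) @ v equals np.cross(t, v)."""
    tx, ty, tz = t
    return np.array([[0.0, -tz, ty],
                     [tz, 0.0, -tx],
                     [-ty, tx, 0.0]])

def rot_z(theta):
    """Rotation by theta radians about the z axis."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, 0.0],
                     [s, c, 0.0],
                     [0.0, 0.0, 1.0]])

# Hypothetical pose; |T| = 1.3 for this choice of T.
T = np.array([0.3, -0.4, 1.2])
R = rot_z(0.25)
E = skew(T) @ R

# Two equal singular values (equal to |T|) and one zero singular value.
singular_values = np.linalg.svd(E, compute_uv=False)

# The essential-matrix constraint X_c'^T E X_c = 0 for an arbitrary point.
Xc = np.array([0.7, -1.1, 3.0])
Xc_prime = R @ Xc + T
constraint = Xc_prime @ E @ Xc
```

For this pose the two non-zero singular values equal $$|T| = 1.3$$, which also illustrates why the scale of $$T$$ cannot be separated from the scale of $$E$$.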

Epipolar Constraint

The equation also holds for ray vectors:

$$ \begin{aligned} p'^T E p &= 0 \\ \Rightarrow p'^T N &= 0 \end{aligned} $$

where $$N = Ep = [T]_x Rp$$ is the normal vector of the epipolar plane. This constrains $$p'$$ to lie on the epipolar plane.

If a point x is observed in one image, its corresponding point in the other image must lie on the epipolar line -> reduces the search for correspondence from 2D to 1D.
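A small sketch of the constraint used as a correspondence test: the true match gives $$p'^T N \approx 0$$, while a ray moved off the epipolar line does not. The pose and point are assumed values, and the focal length is taken as 1 for simplicity.

```python
import numpy as np

def skew(t):
    """Return [t]_x such that skew(t) @ v equals np.cross(t, v)."""
    tx, ty, tz = t
    return np.array([[0.0, -tz, ty],
                     [tz, 0.0, -tx],
                     [-ty, tx, 0.0]])

# Hypothetical relative pose with X_c' = R X_c + T: a small rotation about y.
c, s = np.cos(0.1), np.sin(0.1)
R = np.array([[c, 0.0, s],
              [0.0, 1.0, 0.0],
              [-s, 0.0, c]])
T = np.array([-0.5, 0.0, 0.05])
E = skew(T) @ R

# A world point in the first camera's frame and its ray in each view (f = 1).
Xc = np.array([0.3, -0.2, 5.0])
Xc_prime = R @ Xc + T
p = Xc / Xc[2]
p_prime = Xc_prime / Xc_prime[2]

# Normal of the epipolar plane; with f = 1 its components are also the
# coefficients of the epipolar line N_x x' + N_y y' + N_z = 0 in the other image.
N = E @ p
on_line = p_prime @ N                                 # true correspondence
off_line = (p_prime + np.array([0.0, 0.1, 0.0])) @ N  # ray shifted off the line
```

In a matcher, candidates in the second image would only need to be searched along the line defined by `N`.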

Locations of Epipoles

Consider the epipole $$e$$ in the image of the left camera, with position vector $$p_e$$. Since the epipole lies on the baseline, its position in the right camera's coordinate system is $$\lambda T$$ for some scalar $$\lambda$$.

Relate the coordinate systems by $$R$$ and $$T$$:

$$ \begin{aligned} \lambda T &= R p_e + T \\ \lambda (T \times T) &= T \times R p_e + T \times T \\ [T]_x R p_e &= 0 \\ E p_e &= 0 \end{aligned} $$

-> Epipole in the left image lies in the right nullspace of $$E$$.

-> Epipole in the right image lies in the left nullspace of $$E$$.

To have a non-trivial solution for the epipole, $$E$$ must be non-invertible ($$\det E = 0$$), so its rank is at most 2.
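Both null spaces can be read off an SVD of $$E$$, as in the sketch below. The pose is an assumed example; the epipole directions are recovered only up to sign and scale.

```python
import numpy as np

def skew(t):
    """Return [t]_x such that skew(t) @ v equals np.cross(t, v)."""
    tx, ty, tz = t
    return np.array([[0.0, -tz, ty],
                     [tz, 0.0, -tx],
                     [-ty, tx, 0.0]])

# Hypothetical relative pose with X_c' = R X_c + T: a small rotation about y.
c, s = np.cos(0.1), np.sin(0.1)
R = np.array([[c, 0.0, s],
              [0.0, 1.0, 0.0],
              [-s, 0.0, c]])
T = np.array([-0.5, 0.0, 0.05])
E = skew(T) @ R

U, singular_values, Vt = np.linalg.svd(E)
e_left = Vt[-1]      # right null vector: E @ e_left = 0
e_right = U[:, -1]   # left null vector:  e_right @ E = 0
```

Here `e_right` is parallel to $$T$$ and `e_left` is parallel to $$R^T T$$, both directions along the baseline expressed in the respective camera frames.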

Decomposition

$$E = [T]_x R = SR$$, $$S$$: skew-symmetric, $$R$$: orthogonal

Suppose the SVD gives $$E = UDV^T$$; then there are 2 possible factorizations:

$$ \begin{aligned} S &= UZU^T \\ R &= UWV^T \text{ or } UW^TV^T \end{aligned} $$

where

$$Z = \begin{bmatrix} 0 & 1 & 0 \\ -1 & 0 & 0 \\ 0 & 0 & 0 \end{bmatrix}$$, $$W = \begin{bmatrix} 0 & -1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & -1 \end{bmatrix}$$.

Since $$ST = T \times T = 0$$, $$T = u_2$$ (last column of $$U$$).
Since $$E$$ is only determined up to an unknown scale, so is $$T$$.

Four possible configurations:

$$R = UWV^T$$, $$T = u_2$$
$$R = UWV^T$$, $$T = -u_2$$
$$R = UW^TV^T$$, $$T = u_2$$
$$R = UW^TV^T$$, $$T = -u_2$$

The configurations differ in the direction of the translation vector and in the rotation of the second camera. Only one will give a reconstructed point in front of both cameras, hence testing whether a single triangulated point is in front of both cameras is sufficient to select the true configuration.
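The selection procedure can be sketched as below, assuming unit focal length and noise-free rays. Two details are assumptions of this sketch, not of the notes: each candidate rotation is negated when its determinant is negative (valid because $$E$$ is only defined up to sign and the SVD does not fix the signs of $$U$$ and $$V$$), and the depths in both cameras are obtained by a small least-squares solve of $$Z R p + T = Z' p'$$. All function names are hypothetical.

```python
import numpy as np

def skew(t):
    """Return [t]_x such that skew(t) @ v equals np.cross(t, v)."""
    tx, ty, tz = t
    return np.array([[0.0, -tz, ty],
                     [tz, 0.0, -tx],
                     [-ty, tx, 0.0]])

def decompose_essential(E):
    """Return the four (R, T) candidates, with T of unit length."""
    U, _, Vt = np.linalg.svd(E)
    W = np.array([[0.0, -1.0, 0.0],
                  [1.0, 0.0, 0.0],
                  [0.0, 0.0, -1.0]])
    t = U[:, 2]  # last column of U, spans the left null space of E
    candidates = []
    for Wk in (W, W.T):
        R = U @ Wk @ Vt
        if np.linalg.det(R) < 0:  # assumption: enforce a proper rotation
            R = -R
        candidates.extend([(R, t), (R, -t)])
    return candidates

def depths(R, T, p, p_prime):
    """Solve Z * (R p) + T = Z' * p' for the depths (Z, Z') with f = 1."""
    A = np.column_stack([R @ p, -p_prime])
    (Z, Z_prime), *_ = np.linalg.lstsq(A, -T, rcond=None)
    return Z, Z_prime

def select_configuration(E, p, p_prime):
    """Pick the candidate that puts the triangulated point in front of both cameras."""
    for R, T in decompose_essential(E):
        Z, Z_prime = depths(R, T, p, p_prime)
        if Z > 0 and Z_prime > 0:
            return R, T
    return None

# Hypothetical ground-truth pose and one correspondence.
c, s = np.cos(0.1), np.sin(0.1)
R_true = np.array([[c, 0.0, s],
                   [0.0, 1.0, 0.0],
                   [-s, 0.0, c]])
T_true = np.array([-0.5, 0.0, 0.05])
E = skew(T_true) @ R_true

Xc = np.array([0.3, -0.2, 5.0])
Xc_prime = R_true @ Xc + T_true
p, p_prime = Xc / Xc[2], Xc_prime / Xc_prime[2]

R_sel, T_sel = select_configuration(E, p, p_prime)
T_unit = T_true / np.linalg.norm(T_true)
```

The recovered translation is a unit vector, so it matches `T_true` only in direction. With noisy data it is safer to test several correspondences and take a majority vote, since a single point near the baseline can give unreliable depth signs.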

Stereo Vision I