• Home
  • About
    • Ziqing Zhao photo

      Ziqing Zhao

      The shortest answer is doing.

    • Learn More
    • Email
    • LinkedIn
    • Github
  • Posts
    • All Posts
    • All Tags
  • Projects

Image Formation

12 Mar 2019

Reading time ~3 minutes

[1] R. Szeliski, Computer Vision: Algorithms and Applications, 2010.
[2] R. Szeliski, A. Hai-zhou, Computer Vision: Algorithms and Applications, Tsinghua University Press, 2012.

1. Geometric Primitives and Transformations

1.1 Geometric Primitives

2D points. A 2D point can be represented in Cartesian form \(\mathbf{x} = (x, y) \in \mathbb{R}^2\) or in homogeneous coordinates \(\tilde{\mathbf{x}} = (\tilde{x}, \tilde{y}, \tilde{w}) \in \mathbb{P}^2\) with \(\mathbb{P}^2 = \mathbb{R}^3 \setminus \{(0,0,0)\}\). The homogeneous vector relates to its Cartesian counterpart through \(\tilde{\mathbf{x}} = (\tilde{x}, \tilde{y}, \tilde{w}) = \tilde{w}(x, y, 1) = \tilde{w}\,\bar{\mathbf{x}},\) where \(\bar{\mathbf{x}} = (x, y, 1)\) is the augmented vector.

2D lines. A line expressed in homogeneous coordinates \(\tilde{\mathbf{l}} = (a, b, c)\) satisfies \(\bar{\mathbf{x}} \cdot \tilde{\mathbf{l}} = ax + by + c = 0\). The intersection of two lines is \(\tilde{\mathbf{x}} = \tilde{\mathbf{l}}_1 \times \tilde{\mathbf{l}}_2\), whereas the line through two points is \(\tilde{\mathbf{l}} = \tilde{\mathbf{x}}_1 \times \tilde{\mathbf{x}}_2\). Least-squares fitting recovers a line from noisy points or the best intersection of almost-concurrent lines.

2D conics. Conics are described by \(\tilde{\mathbf{x}}^{T} Q \tilde{\mathbf{x}} = 0\) and play a key role in multi-view geometry and camera calibration.

3D points. Analogously, a 3D point can be written as \(\mathbf{x} = (x, y, z) \in \mathbb{R}^3\) or \(\tilde{\mathbf{x}} = (\tilde{x}, \tilde{y}, \tilde{z}, \tilde{w}) \in \mathbb{P}^3\) with augmented form \(\bar{\mathbf{x}} = (x, y, z, 1)\) so that \(\tilde{\mathbf{x}} = \tilde{w}\,\bar{\mathbf{x}}\).

3D planes. A plane with homogeneous parameters \(\tilde{\mathbf{m}} = (a, b, c, d)\) satisfies \(\bar{\mathbf{x}} \cdot \tilde{\mathbf{m}} = ax + by + cz + d = 0\).

3D lines. A line defined by two points \((\mathbf{p}, \mathbf{q})\) can be parameterized as \(\mathbf{r} = (1-\lambda)\mathbf{p} + \lambda \mathbf{q}\) with \(0 \le \lambda \le 1\). Homogeneously, \(\tilde{\mathbf{r}} = \mu \tilde{\mathbf{p}} + \lambda \tilde{\mathbf{q}}\), which has excess degrees of freedom. Plücker coordinates remedy this with \(\mathbf{L} = \tilde{\mathbf{p}}\,\tilde{\mathbf{q}}^{T} - \tilde{\mathbf{q}}\,\tilde{\mathbf{p}}^{T},\) subject to \(\det(\mathbf{L}) = 0\), leaving four degrees of freedom.

3D quadrics. Quadrics mirror conics via \(\tilde{\mathbf{x}}^{T} Q \tilde{\mathbf{x}} = 0\).

1.2 Two-Dimensional Transformations

Planar transformations include translation, Euclidean transforms (rotation plus translation), similarity transforms (uniform scale + Euclidean), affine transforms, and projective transforms.

Translation. \(x' = x + t, \qquad x' = [\,\mathbf{I} \ \ t\,]\bar{x},\) or in homogeneous form, \(\bar{x}' = \begin{bmatrix} \mathbf{I} & t \\ 0^{T} & 1 \end{bmatrix} \bar{x},\) which enables chaining via \(3\times3\) matrices.

Euclidean transformation (rotation + translation). \(x' = \mathbf{R}x + t = [\,\mathbf{R} \ \ t\,]\bar{x},\) where \(\mathbf{R} = \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix}\) is an orthogonal rotation matrix satisfying \(\mathbf{R}\mathbf{R}^{T} = \mathbf{I}\) and \(|\mathbf{R}|=1\).

Similarity transformation (scale + rotation + translation). \(x' = s\mathbf{R}x + t = [\,s\mathbf{R} \ \ t\,]\bar{x} = \begin{bmatrix} a & -b & t_x \\ b & a & t_y \end{bmatrix}\bar{x},\) with \(a^2 + b^2 = s^2\). Similarities preserve angles between lines.

Affine transformation. \(x' = \mathbf{A}\bar{\mathbf{x}}, \qquad x' = \begin{bmatrix} a_{00} & a_{01} & a_{02} \\ a_{10} & a_{11} & a_{12} \end{bmatrix}\bar{x},\) where \(\mathbf{A}\) is \(2\times3\). Affine maps preserve parallelism.

Projective transforms extend this framework by allowing arbitrary \(3 \times 3\) homogeneous matrices and are handled similarly, though they are not detailed further here.



Computer Vision Share Tweet +1