Linear problems like $\mathbf{Ax} = \mathbf{b}$ , with $\mathbf{x}$ the unknown solution, appear in many applications. In theory, one needs to invert matrix $\mathbf{A}$ such that $\mathbf{x} = \mathbf{A}^{-1}\mathbf{b}$ . When $\mathbf{A}$ is very small, for example a 3×3 matrix, one can compute the inverse matrix if its determinant is non-zero. When the matrix is large, a direct inversion of the matrix is in general not a good idea, so other methods must be used.

Since we are interested in $\mathbf{x}$ , rather than the inverse of $\mathbf{A}$ , it is possible to search for $\mathbf{x}$ through the minimization of a quadratic function $f(\mathbf{x}) = \frac{1}{2}\mathbf{x}^T\mathbf{Ax} - \mathbf{x}^T\mathbf{b}$ , for which the solution $\mathbf{x}$ of the linear problem, is a minimizer of the quadratic function. To see this, the solution of the minimization problem is located where its its gradient at $\mathbf{x}$ becomes zero, provided that the minimization is unique. Since the gradient of the minimization problem is exactly our linear problem, the minimizer solves the linear problem.

A method for solving this minimization problem is for example the Gradient Descent method. Please note that Gradient Descent is in general used for non-linear problems, but can also used for linear problems. Although the method is not very efficient for solving linear problems, it gives a good insight in other methods like the Conjugate Gradient method.

Derivation of Gradient Descent for solving linear problems

As the name of the method suggests, the approximation of solution $\mathbf{x}$ descents along the gradient of the minimization problem, until it reaches the minimum of the quadratic function. The gradient of $f(\mathbf{x})$ at $\mathbf{x}$ is just $\mathbf{Ax} - \mathbf{b}$ . Suppose this gradient direction at $\mathbf{x}$ is known as $\mathbf{p}$ , one takes a step from $\mathbf{x}$ in direction $\mathbf{p}$ until a minimum is found, i.e., $\mathbf{x}_{i+1} = \mathbf{x}_i + \alpha \mathbf{p}_i$ , with $\alpha$ the unknown step-size. At this minimum, the gradient of $f(\mathbf{x})$ and $\mathbf{p}$ are perpendicular. Using this observation, the step-size $\alpha \mathbf{p}$ can be computed as follows:

Update rule for the approximation, with $\alpha$ the unknown step-size.

$\mathbf{x}_{i+1} = \mathbf{x}_i + \alpha \mathbf{p}_i$

Left-multiply by $-\mathbf{A}$ .

$-\mathbf{Ax}_{i+1} = -\mathbf{Ax}_i - \alpha\mathbf{Ap}_i$

Add $\mathbf{b}$ .

$\mathbf{b}-\mathbf{Ax}_{i+1} = \mathbf{b}-\mathbf{Ax}_i - \alpha\mathbf{Ap}_i$

$\mathbf{r}_{i+1} = \mathbf{r}_i - \alpha \mathbf{Ap}_i$

Please note that residual $\mathbf{r}$ is the negative gradient of $f$ , so $\alpha$ is computed as follows by using the property that the gradient and $\mathbf{p}$ are perpendicular.

$\mathbf{p}_i^T\mathbf{r}_{i+1} = \mathbf{p}_i^T\mathbf{r}_i - \alpha \mathbf{p}_i^T\mathbf{Ap}_i$

Since $\mathbf{p}_i^T\mathbf{r}_{i+1} = 0$ at the minimum, we have:

$\alpha = \frac{\mathbf{p}_i^T\mathbf{r}_i}{\mathbf{p}_i^T\mathbf{Ap}_i}$

Given $\alpha$ , both $\mathbf{x}_{i+1}$ and $\mathbf{r}_{i+1}$ can be computed. In the Gradient Descent method, $\mathbf{p}_{i+1}$ is set using $\mathbf{r}_{i+1}$ . By repeating this procedure, eventually $\mathbf{x}$ reaches the global minimizer for $f$ , and so a solution for the linear problem is obtained. When solving a non-linear problem, $\alpha$ is set small enough to ensure that $f(\mathbf{x}_{i+1}) < f(\mathbf{x}_i)$ , and $\mathbf{p}$ is set to $\nabla f(\mathbf{x})$ .