Linear problems like , with the unknown solution, appear in many applications. In theory, one needs to invert matrix such that . When is very small, for example a 3×3 matrix, one can compute the inverse matrix if its determinant is non-zero. When the matrix is large, a direct inversion of the matrix is in general not a good idea, so other methods must be used.

Since we are interested in , rather than the inverse of , it is possible to search for through the minimization of a quadratic function , for which the solution of the linear problem, is a minimizer of the quadratic function. To see this, the solution of the minimization problem is located where its its gradient at becomes zero, provided that the minimization is unique. Since the gradient of the minimization problem is exactly our linear problem, the minimizer solves the linear problem.

A method for solving this minimization problem is for example the Gradient Descent method. Please note that Gradient Descent is in general used for non-linear problems, but can also used for linear problems. Although the method is not very efficient for solving linear problems, it gives a good insight in other methods like the Conjugate Gradient method.

# Derivation of Gradient Descent for solving linear problems

As the name of the method suggests, the approximation of solution descents along the gradient of the minimization problem, until it reaches the minimum of the quadratic function. The gradient of at is just . Suppose this gradient direction at is known as , one takes a step from in direction until a minimum is found, i.e., , with the unknown step-size. At this minimum, the gradient of and are perpendicular. Using this observation, the step-size can be computed as follows:

Update rule for the approximation, with the unknown step-size.

Left-multiply by .

Add .

Please note that residual is the negative gradient of , so is computed as follows by using the property that the gradient and are perpendicular.

Since at the minimum, we have:

Given , both and can be computed. In the Gradient Descent method, is set using . By repeating this procedure, eventually reaches the global minimizer for , and so a solution for the linear problem is obtained. When solving a non-linear problem, is set small enough to ensure that , and is set to .