Newton-Raphson: The Rest of the Story
|
|
On the main page the tale of the monks and the sign-posts provides you with the basic concept of why Newton-Raphson iteration works. But it does nothing to delineate the conditions under which it does and does not work, nor does it explain the almost magical phenomenon of quadratic convergence. That is, it doesn't explain why the number of digits accuracy doubles with each iteration. On this page we take a deeper look at what's going on when we apply Newton-Raphson. We will see, using mathematical methods rather than metaphor, why it behaves the way it does, and we will see the pitfalls of the method.
Recall from the main page that Newton-Raphson is a method that allows us to approximate the solution to f(x) = 0 provided that f'(x) exists. To zero in on the solution for x, we chose a first guess, x0, and iterated
xn+1 = xn - |
f(xn) |
The way to analyze this is to define a new function, g(x):
g(x) = x - |
f(x) |
and then examine the more general case of iterating xn+1 = g(xn).
When you iterate a function, g(x), by taking g(x), then g(g(x)), then g(g(g(x))), and so on, if it converges to anything, it must converge to a value of x where x = g(x). It should be pretty clear why. If x ≠ g(x), then the iteration of g on x is still "in motion" and hasn't converged yet.
We call a value of x that solves the equation,
x = g(x), a fixed point of g. Iterating a function on
itself over and over, as we have done here with g(x), is called
fixed point iteration.
To the right you can see what happens when we do this with the example function,
g(x) = e-x, (shown in blue). The line,
y = x, is shown in green. The fixed point of any function is where
it intersects the line, y = x. The red shows the process of fixed point
iteration graphically. The first guess, x0, is at about
x = 0.95. The point labeled on the graph as x0 is
actually (x0,g(x0)). We locate the horizontal
position, x = g(x0) = x1, by following the horizontal
line from (x0,g(x0)) until we encounter the
line, y = x. In this case we have to follow it to the left. From there,
following the vertical line until we encounter the curve, y = g(x),
locates the point, (x1,g(x1)). From there we
follow the horizontal to y = x again and then the vertical to
y = g(x) again to find (x2,g(x2)).
And so on.
You are encouraged to try this example on your calculator. Choose a first guess, 0 < x0 < 1, then iterate the function, e-x, over and over on that value and see what happens. You do this by alternately pressing the "+/-" key and the ex key. Many calculators don't have an ex key, and you have to press "Shift" or "Inv" followed by the ln key to get the ex function.
The example here appears to converge, but this is not the case with every
example. If you let g(x) = 1/x, for example, choosing x0
as anything but either 0, 1 or -1 results in the iteration oscillating between
the same two values over and over again. So that is an example that does not converge, even
though it has fixed points at x = 1 and x = -1.
Another example is of nonconvergence is the function,
g(x) = -ln(x).
This has the same fixed point as g(x) = e-x. How do I know this?
Because if you take the fixed point equation for the e-x function,
x = e-x, then take the log of both sides and take the negative
of both sides of the result, you get -ln(x) = x, which is the fixed point
equation for -ln(x) function. Since the two functions have equivalent
fixed point equations, they must have identical fixed points.
The graph on the right shows a first guess of x0 of about 0.6. Just as we did with the first example, we follow the horizontal from (x0,g(x0)) to y = x, then follow the vertical to y = g(x) to find (x1,g(x1)). We do the same again from there to find (x2,g(x2)). And we find that x2 is farther from the fixed point than x0. It's pretty clear that the same pattern continues as you iterate this, and further iterations keep getting you farther and farther from the fixed point. So what is different about this function that keeps its fixed point iteration from converging when the first example did converge?
Suppose g(x) is any function that has a fixed point, and let xf be that fixed point. Hence g(xf) = xf. As before we start with a first guess, x0, and we iterate g(x) on it to get x1, x2, and so on.
Now suppose that xf is at the center of an interval, [xf-Δx, xf+Δx], where if both a and b are any two values in that interval, then it is always the case that
g(b) - g(a) |
≤ L |
for some positive L. This is called a Lipschitz condition with L being the Lipschitz constant (See a biography of Lipschitz). If your first guess, x0, is in this interval, [xf-Δx, xf+Δx], then the Lipschitz condition requires that
g(xf) - g(x0) |
≤ L |
Now notice in the numerator that g(xf) = xf and g(x0) = x1. The above becomes
xf - x1 |
≤ L |
which is the same as
|xf - x1| |
≤ L |
or equivalently |xf - x1| ≤ L|xf - x0|. If L < 1, then |xf - x1| < |xf - x0|. Or in other words, if the Lipschitz constant, L, is less than one, applying the iteration on the first guess gives us a result that is closer to the fixed point, xf, than the first guess was. In addition it guarantees that x1 is also in the interval, [xf-Δx, xf+Δx]. Hence when you iterate again on x1, the same thing happens, and you find that |xf - x2| ≤ L|xf - x1| ≤ L2|xf - x0|. This time we are guaranteed that x2 will also be in the Lipschitz interval. Clearly as you continue to iterate, every xn is in the Lipschitz interval and that at each iteration, |xf - xn| ≤ L|xf - xn-1|. From that you can conclude that |xf - xn| ≤ Ln|xf - x0|. Still restricting L < 1, it follows that |xf - xn| must converge to zero as n goes to infinity, because Ln goes to zero. From that you have to conclude that xn converges to the fixed point, xf, as n goes to infinity.
So what Lipschitz conditions can we attach to the functions, g(x) = e-x and g(x) = -ln(x)? It turns out that Lipschitz conditions are intimately related to a function's derivative through the mean value theorem. Suppose you have a Lipschitz interval, [xf-Δx, xf+Δx], in which |g'(x)| never exceeds some value, L. The mean value theorem asserts that if a and b are on the interval, then there is some value, c, between a and b where
g'(c) = |
g(b) - g(a) |
Since c is also in the interval, you know that |g'(c)| ≤ L. That means that absolute value of the quotient above can't be any larger than L. And by definition, that imposes a Lipschitz condition with a Lipschitz constant of L on the entire interval.
From the above you must conclude the following: If the fixed point of g(x) occurs within such an interval where |g'(x)| < 1 throughout, then by choosing your first guess, x0, in that interval, fixed point iteration will necessarily converge to the fixed point. Observe that g(x) = e-x satisfies this for 0 < x < ∞. Clearly its fixed point is in that range somewhere, and that is why fixed point iteration converges for e-x.
What about g(x) = -ln(x)? Suppose in some interval, [xf-Δx, xf+Δx], that |g'(x)| is never less than some value, L. You can make the same kind of argument as before using the mean value theorem that the Lipschitz constant for such an interval must be greater than or equal to the minimum magnitude of the derivative. In the case of g(x) = -ln(x), the fixed point is inside an interval where its derivative is greater than 1 -- that is, for this function, |g'(x)| > 1 for 0 < x < 1, and its fixed point is clearly in that range. So fixed point iteration on this function can never converge.
0.600000000000000 0.548811636094026 0.577635844258915 0.561223619437972 0.570510548780604 0.565236784068812 0.568225584061220 0.566529806871672 0.567491330229635 0.566945936306695 0.567255229510642 | 0.567079808452920 0.567179294918487 0.567122871061844 0.567154871224198 0.567136722466622 0.567147015386912 0.567141177817933 0.567144488553342 0.567142610891110 0.567143675794371 0.567143071841543 |
Getting back to g(x) = e-x, if you tried the fixed point iteration
for this on your calculator, you would see that although it converges, its convergence
doesn't have the sweetness of the convergence you get with Newton-Raphson. The table on the
right shows what happens when you start with x0 = 0.6 and
iterate this function on a 15 digit calculator. You can see that the approximations gain
one digit of accuracy roughly every four iterations
(actual value of the fixed point to 15 figures is
xf = 0.567143290409784). This is consistent with
0.5 < L < 0.6 -- that is L4 ≈ 0.1.
But it is clearly not showing the behavior we saw with Newton-Raphson, where the number
of digits accuracy doubled with each iteration. At the beginning of this discussion, we saw
that Newton-Raphson is, at its core, a form of fixed point iteration. So what is magic
about Newton-Raphson that makes it converge more rapidly than ordinary fixed point iteration?
When you do Newton-Raphson on the example, f(x) = x2 - 2 = 0, you are effectively doing fixed point iteration on
g(x) = x - |
x2 - 2 |
The graph on the right shows this function and its fixed point. What stands out on this graph is that at the fixed point, g'(x) appears to be zero. This begs the question: is it always the case that Newton-Raphson iterates a function whose derivative is zero at its fixed point?
In the general case of doing Newton-Raphson to find the solution to f(x) = 0, you do fixed point iteration on
g(x) = x - |
f(x) |
Using the quotient rule, we find
g'(x) = 1 - |
f'(x)2 - f(x)f"(x) |
f(x)f"(x) |
Recall that the fixed point of g(x), xf, is also the solution to f(xf) = 0. So at xf, the above equation tells us that g'(xf) = 0 whenever f'(xf) ≠ 0. This last condition is the same as saying that the point where f(x) = 0 is not also a critical point.
The key to seeing why Newton-Raphson converges the way it does is to see what the Lipschitz constant, L, is when x is close to the fixed point, xf. Suppose that at the nth iteration, xn is in the interval, [xf-Δx, xf+Δx]. Let C be the maximum value of
|
f"(x) |
over that interval. Then the Lipschitz constant, L, is no larger than |f(xf±Δx)C|. But f(x) goes to zero as x goes to xf. This means that as Δx gets closer and closer to zero, so does the Lipschitz constant. And as the Lipschitz constant gets closer and closer to zero, the iteration converges faster and faster. If L = 0.01 on the nth iteration for example, then the approximation for xn+1 for will be one hundred times closer to xf than was xn. And that will make the next Lipschitz constant one hundred times smaller as well, and so on. From this you can see why the number of accurate digits doubles with each iteration of Newton-Raphson.
You can also see from the above discussion that the conditions for this type of convergence is simply that the symmetric interval around xf that you start your iteration in must contain no critical points, f(x) must be differentiable throughout the interval, and it must be the case that
f(x)f"(x) |
< 1 |
throughout the interval, [xf-Δx, xf+Δx], from
which your first guess is taken.