Advanced Analysis, Notes 5: Hilbert spaces (application: Von Neumann’s mean ergodic theorem)
by Orr Shalit
In this lecture we give an application of the elementary theory of operators on Hilbert space by proving von Neumann’s mean ergodic theorem. See also this treatment by Terry Tao on his blog.
For today’s lecture we will require the following simple fact which I forgot to mention in the previous one.
Exercise A: Let $A \in B(H)$. Then
$$\ker A = (\operatorname{Im} A^*)^\perp.$$
1. The basic problem of ergodic theory
In the study of discrete dynamical systems, one considers the action of some map $T$ on a space $X$. Ergodic theory is the part of dynamical systems theory in which one is interested in the action of a measure preserving transformation $T\colon X \to X$ on a measure space $(X, \mu)$.
Perhaps surprisingly, the origins of ergodic theory are in mathematical physics – statistical mechanics, to be precise. If you are interested in learning more about how what we discuss here is related to physics, I suggest you take a look at the section “Ergodic theory: an introduction” in Reed and Simon’s Volume I of the series Methods of Modern Mathematical Physics.
Since our goal here is merely to illustrate how operator theory can be applied to ergodic theory, we will work in the simplest possible setup: our space will be the unit interval $[0,1]$, and our transformation $T\colon [0,1] \to [0,1]$ will be piecewise continuous (we will not, however, need to assume that $T$ is invertible). Anybody who took a course in measure theory can replace $[0,1]$ and $T$ by whatever they desire. The operator theoretic details will remain the same.
We assume further that $T$ is measure preserving. By this we mean that for every piecewise continuous function $f$ on $[0,1]$,
$$\int_0^1 f(T(x))\,dx = \int_0^1 f(x)\,dx.$$
It is not entirely clear at first whether there are interesting examples of measure preserving maps.
Examples:
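- For $\alpha \in (0,1)$, let $T(x) = x + \alpha \pmod 1$. Then
$$\int_0^1 f(T(x))\,dx = \int_0^{1-\alpha} f(x+\alpha)\,dx + \int_{1-\alpha}^1 f(x+\alpha-1)\,dx = \int_\alpha^1 f(x)\,dx + \int_0^\alpha f(x)\,dx,$$
and this is equal to $\int_0^1 f(x)\,dx$.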
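- Let $T(x) = 2x$ for $x \in [0, \frac12)$ and $T(x) = 2x - 1$ for $x \in [\frac12, 1]$. Then
$$\int_0^1 f(T(x))\,dx = \int_0^{1/2} f(2x)\,dx + \int_{1/2}^1 f(2x-1)\,dx = \frac12\int_0^1 f(x)\,dx + \frac12\int_0^1 f(x)\,dx = \int_0^1 f(x)\,dx,$$
so $T$ is measure preserving. (Remark: note that $T([0,\frac12)) = [0,1)$, so $T$ does not do what one might naively think that a “measure preserving” map should do: it doubles the length of every interval inside $[0,\frac12)$. However, $T$ does satisfy that the measure of $T^{-1}(I)$ is equal to the measure of $I$ for every interval $I \subseteq [0,1]$, and this turns out to be the important property.)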
for
and
for
.
- etc.
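The two computations above can also be checked numerically. The following Python sketch (our own illustration, not part of the original notes) approximates both sides of the measure preserving identity with a midpoint rule, for the rotation by $\alpha = \sqrt2 - 1$ and for the map of the second example. The test function `f`, the value of $\alpha$, the grid size and all helper names are arbitrary choices made for this sketch; agreement up to discretization error is a sanity check, not a proof.

```python
import math

def rotation(alpha):
    """Return the map x -> x + alpha (mod 1) from the first example."""
    return lambda x: (x + alpha) % 1.0

def doubling(x):
    """The map of the second example: 2x on [0, 1/2) and 2x - 1 on [1/2, 1]."""
    return 2 * x if x < 0.5 else 2 * x - 1

def integral(g, n=200_000):
    """Midpoint-rule approximation of the integral of g over [0, 1]."""
    return sum(g((k + 0.5) / n) for k in range(n)) / n

def f(x):
    """An arbitrary piecewise continuous test function (our choice)."""
    return x * x + (1.0 if x < 0.3 else -0.5)

T_rot = rotation(math.sqrt(2) - 1)
space_integral = integral(f)                       # exact value: 1/3 - 0.05
rot_integral = integral(lambda x: f(T_rot(x)))     # integral of f o T for the rotation
dbl_integral = integral(lambda x: f(doubling(x)))  # integral of f o T for the second example
print(space_integral, rot_integral, dbl_integral)
```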
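We pick a point $x \in [0,1]$, and we start moving it around the space $[0,1]$ by applying $T$ again and again. We get a sequence $x, T(x), T^2(x), T^3(x), \ldots$ in $[0,1]$, where $T^n$ denotes the $n$-fold composition $T \circ \cdots \circ T$ (and $T^0(x) = x$). The basic problem in ergodic theory is to determine the statistical behavior of this sequence. To quantify the phrase “statistical behavior”, one may study the large $N$ behavior of the so-called time averages
$$\frac{1}{N}\sum_{n=0}^{N-1} f(T^n(x))$$
for piecewise continuous functions $f$ on $[0,1]$, say.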
Why would this be interesting?
Suppose, for example, that $f$ is the indicator function of some interval: $f(x) = 1$ if and only if $x \in [a,b]$, otherwise $f(x) = 0$. In this case the sum $\sum_{n=0}^{N-1} f(T^n(x))$ counts how many of the first $N$ points $x, T(x), \ldots, T^{N-1}(x)$ of the sequence lie in the interval $[a,b]$. The time averages therefore measure the fraction of the “time” that the sequence spends inside $[a,b]$. When one takes the limit $N \to \infty$, if that limit exists, one gets a measure of how much time the sequence spends inside $[a,b]$ in the long run. If the sequence $\{T^n(x)\}_{n=0}^\infty$ behaves in a completely “random” manner, what would be your best guess for the limit of $\frac{1}{N}\sum_{n=0}^{N-1} f(T^n(x))$? If we think of the position of $T^n(x)$ as being uniformly distributed on $[0,1]$, then the best guess, intuitively, would be that
$$\lim_{N\to\infty} \frac{1}{N}\sum_{n=0}^{N-1} f(T^n(x)) = b - a.$$
Note that $b - a = \int_0^1 f(t)\,dt$, so our guess is
$$\lim_{N\to\infty} \frac{1}{N}\sum_{n=0}^{N-1} f(T^n(x)) = \int_0^1 f(t)\,dt \qquad (*)$$
for indicator functions. Now maybe we would like to use some more complicated function to measure the distribution of the sequence $\{T^n(x)\}$. But if (*) holds for indicator functions, then by linearity it holds for step functions, and then also presumably for other functions by some limiting process.
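To get a feeling for (*), one can simply compute time averages. The sketch below (an illustration we added; the helper names are hypothetical) iterates the rotation $T(x) = x + \alpha \pmod 1$ with $\alpha = \sqrt2 - 1$ and averages the indicator function of $[a,b] = [0.2, 0.5]$. For irrational $\alpha$ a classical theorem of Weyl says that every orbit is equidistributed, so here the averages should approach $b - a = 0.3$.

```python
import math

def time_average(f, T, x, N):
    """Return (1/N) * sum_{n=0}^{N-1} f(T^n(x)), computed by iterating T."""
    total = 0.0
    for _ in range(N):
        total += f(x)
        x = T(x)
    return total / N

alpha = math.sqrt(2) - 1   # an irrational rotation number (our choice)
a, b = 0.2, 0.5            # the interval [a, b]

def rotate(x):
    """T(x) = x + alpha (mod 1)."""
    return (x + alpha) % 1.0

def indicator(x):
    """Indicator function of [a, b]."""
    return 1.0 if a <= x <= b else 0.0

for N in (10, 1_000, 100_000):
    print(N, time_average(indicator, rotate, 0.1, N))
```

We deliberately avoid the map of the second example in this experiment: in binary floating point, $x \mapsto 2x \bmod 1$ shifts out one bit per step, so every computed orbit eventually lands on $0$ (for a starting point such as $0.1$, after a few dozen steps) and the time averages say nothing about the true dynamics.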
The equality (*) is also sometimes called the “Ergodic Hypothesis”. It describes a situation where the time average (that is, starting at a point $x$ and taking the average of repeated measurements $f(x), f(T(x)), f(T^2(x)), \ldots$) is equal to the space average $\int_0^1 f(t)\,dt$ (which is the expected value of $f$ on the probability space $[0,1]$ with Lebesgue measure). I certainly do not claim that we have justified (*); in fact (*) does not always hold. All that we said is that we might expect (*) to hold if the sequence $\{T^n(x)\}$ is spread out on the interval in a random or uniform way. The various ergodic theorems make this very loose discussion precise.
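To see that (*) can fail, take the first example with $\alpha = \frac12$, i.e. $T(x) = x + \frac12 \pmod 1$. This map is measure preserving, but every orbit has only two points, and for instance $\cos(4\pi x)$ is a non-constant function that satisfies $f \circ T = f$. The Python sketch below (our own, using exact rational arithmetic from the standard `fractions` module to avoid rounding issues) computes time averages of the indicator of $[0, \frac14)$: they depend on the starting point, and neither equals the space average $\frac14$.

```python
from fractions import Fraction

def time_average(f, T, x, N):
    """Return (1/N) * sum_{n=0}^{N-1} f(T^n(x)) in exact rational arithmetic."""
    total = Fraction(0)
    for _ in range(N):
        total += f(x)
        x = T(x)
    return total / N

def half_turn(x):
    """Rotation by 1/2: T(x) = x + 1/2 (mod 1)."""
    return (x + Fraction(1, 2)) % 1

def indicator(x):
    """Indicator function of [0, 1/4)."""
    return 1 if 0 <= x < Fraction(1, 4) else 0

space_average = Fraction(1, 4)   # integral of the indicator over [0, 1]
for x0 in (Fraction(1, 10), Fraction(3, 10)):
    print(x0, time_average(indicator, half_turn, x0, 1000), space_average)
```

Starting from $\frac{1}{10}$ the orbit alternates between $\frac{1}{10}$ and $\frac35$, giving the average $\frac12$; starting from $\frac{3}{10}$ it never enters $[0,\frac14)$, giving $0$.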
2. The mean ergodic theorem
The mean ergodic theorem discusses the validity of (*) in the setting of $L^2[0,1]$. (The reason why it is called mean is that convergence in the $L^2$ norm used to be called convergence in the mean.) The first thing we have to do is to make sense of the composition $f \circ T$ in $L^2[0,1]$. The problem is that if $f \in L^2[0,1]$, then $f$ is not determined by the values it attains at individual points $x \in [0,1]$, so it is not clear what we mean by $f \circ T$. This problem is solved very smoothly in the setting of Hilbert space.
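Define a transformation $U$ on the space $PC[0,1]$ of piecewise continuous functions on $[0,1]$ by $Uf = f \circ T$. By our assumptions on $T$, $U$ is well defined, linear and isometric on $PC[0,1]$, which is a dense subspace of $L^2[0,1]$. Isometry follows by applying the measure preserving property to $|f|^2$:
$$\|Uf\|^2 = \int_0^1 |f(T(x))|^2\,dx = \int_0^1 |f(x)|^2\,dx = \|f\|^2.$$
By Exercise B from the previous lecture, $U$ extends in a unique way to an isometry $U\colon L^2[0,1] \to L^2[0,1]$. We continue to write $f \circ T$ for $Uf$ even for $f \in L^2[0,1]$.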
Definition 1: A transformation $T\colon [0,1] \to [0,1]$ is said to be ergodic if $f \circ T = f$ (for $f \in L^2[0,1]$) implies that $f$ is constant.
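Theorem 2 (Mean ergodic theorem): Let $T\colon [0,1] \to [0,1]$ be a measure preserving (piecewise continuous) ergodic transformation. Then for all $f \in L^2[0,1]$,
$$\frac{1}{N}\sum_{n=0}^{N-1} f \circ T^n \longrightarrow \int_0^1 f(t)\,dt$$
in the norm of $L^2[0,1]$, where the right-hand side is understood as a constant function.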
Remark: Recalling (*), one immediately sees the weakness of this theorem. It does not tell us what happens to the time averages at any particular point $x$; it only gives us norm convergence of the sequence of functions $\frac{1}{N}\sum_{n=0}^{N-1} f \circ T^n$. Typically, pointwise (or almost everywhere pointwise) convergence theorems are harder to prove.
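Theorem 2 is a statement about $L^2$ norms, and for simple examples that norm can be computed. The sketch below (our illustration; the names and parameters are hypothetical choices) takes the rotation by $\alpha = \sqrt2 - 1$, a standard example of an ergodic map, together with the mean zero function $f(x) = \cos(2\pi x)$, and approximates $\left\|\frac{1}{N}\sum_{n=0}^{N-1} f\circ T^n - \int_0^1 f\right\|_2$ with a midpoint rule on a grid of 256 nodes.

```python
import math

ALPHA = math.sqrt(2) - 1   # irrational rotation number (arbitrary choice)
GRID = 256                 # midpoint nodes used to approximate the L^2 norm on [0, 1]

def f(x):
    """Test function with integral 0, so the limit in Theorem 2 is the zero function."""
    return math.cos(2 * math.pi * x)

def ergodic_average(x, N):
    """(1/N) * sum_{n=0}^{N-1} f(T^n(x)) for the rotation T(x) = x + ALPHA (mod 1)."""
    return sum(f((x + n * ALPHA) % 1.0) for n in range(N)) / N

def l2_error(N):
    """Midpoint-rule value of || (1/N) sum_{n<N} f o T^n - integral of f ||_2."""
    nodes = [(k + 0.5) / GRID for k in range(GRID)]
    return math.sqrt(sum(ergodic_average(x, N) ** 2 for x in nodes) / GRID)

for N in (10, 100, 1000):
    print(N, l2_error(N))
```

Here $\frac{1}{N}\sum_{n=0}^{N-1} f\circ T^n$ is again of the form $A\cos(2\pi x) + B\sin(2\pi x)$, so the midpoint rule computes its $L^2$ norm exactly up to rounding; for a general $f$ the grid would only give an approximation.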
The operator $Uf = f \circ T$ that we defined above is an isometry, and in particular it satisfies $\|U\| \le 1$. An operator $T$ satisfying $\|T\| \le 1$ is said to be a contraction. Theorem 2 is an immediate consequence of the following theorem.
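Theorem 3: Let $H$ be a Hilbert space and let $T \in B(H)$ be a contraction. Denote $M = \{h \in H : Th = h\}$, and let $P_M$ be the orthogonal projection onto the closed subspace $M$. Then for all $h \in H$,
$$\frac{1}{N}\sum_{n=0}^{N-1} T^n h \longrightarrow P_M h$$
in the norm.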
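A finite-dimensional sanity check of Theorem 3: on $\mathbb{R}^3$, let $T$ fix $e_1$ and rotate the $(e_2, e_3)$-plane by one radian. This $T$ is an isometry, hence a contraction, with $M = \operatorname{span}\{e_1\}$, so $P_M h$ just keeps the first coordinate of $h$. The powers $T^n h$ keep rotating and do not converge, but their averages do. The sketch below is our own illustration; the angle, the vector $h$ and the helper names are arbitrary.

```python
import math

THETA = 1.0   # rotation angle in radians (arbitrary choice)

def T(v):
    """Fix e1 and rotate the (e2, e3)-plane by THETA; an isometry, hence a contraction."""
    x, y, z = v
    c, s = math.cos(THETA), math.sin(THETA)
    return (x, c * y - s * z, s * y + c * z)

def cesaro_average(h, N):
    """Return (1/N) * sum_{n=0}^{N-1} T^n h."""
    acc, v = [0.0, 0.0, 0.0], h
    for _ in range(N):
        acc = [p + q for p, q in zip(acc, v)]
        v = T(v)
    return [p / N for p in acc]

h = (1.0, 2.0, 3.0)
P_M_h = (h[0], 0.0, 0.0)   # M = span{e1}, so P_M keeps only the first coordinate
avg = cesaro_average(h, 20_000)
print(avg, P_M_h)
```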
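To see how Theorem 2 follows from Theorem 3, apply Theorem 3 to the isometry $Uf = f \circ T$ on $H = L^2[0,1]$ (note that $U^n f = f \circ T^n$), and note simply that if $T$ is ergodic, then $M = \{f \in L^2[0,1] : f \circ T = f\}$ is the one dimensional space of constant functions (constants are always invariant, and ergodicity rules out everything else). Writing $1$ for the constant function, which is a unit vector in $L^2[0,1]$, we get (by Theorem 15 in Notes 2)
$$P_M f = \langle f, 1\rangle 1 = \int_0^1 f(t)\,dt.$$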
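Proof: The first step of the proof requires a bit of inspiration. Recall that $\|T^*\| = \|T\| \le 1$. Let $h \in M$ be a unit vector (no loss of generality, since $M$ is a subspace), and consider $\langle T^*h, h\rangle = \langle h, Th\rangle = \langle h, h\rangle = 1$. From the Cauchy-Schwarz inequality,
$$1 = \langle T^*h, h\rangle \le \|T^*h\|\,\|h\| \le 1,$$
so $\langle T^*h, h\rangle = \|T^*h\|\,\|h\|$. From the equality part of Cauchy-Schwarz this can only happen when $T^*h = ch$ for some scalar $c$. But reconsidering $\langle T^*h, h\rangle = 1$ it is evident that $c = 1$. Thus $T^*h = h$. Because of the symmetry of the * operation ($T^*$ is also a contraction, and $T^{**} = T$), we get a nice little result: for a contraction $T$,
$$Th = h \iff T^*h = h.$$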
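The nice little result can be tested on small real matrices, where $T^*$ is the transpose. In the Python sketch below (a hypothetical example of ours) `T` is a non-normal contraction fixing $e_1$, and the code confirms that $T^*e_1 = e_1$ as well. The matrix `S` also fixes $e_1$ but has norm larger than $1$ (since $\|Se_2\| = \sqrt5$), and for it $S^*e_1 \neq e_1$, which shows that the contraction hypothesis cannot be dropped.

```python
def matvec(A, v):
    """Matrix-vector product for a matrix given as a list of rows."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]

def transpose(A):
    """For real matrices the adjoint is the transpose."""
    return [list(col) for col in zip(*A)]

e1 = [1.0, 0.0, 0.0]

# A non-normal contraction: block diagonal with blocks [1] and [[0, 0.9], [0, 0]],
# so its norm is max(1, 0.9) = 1. It fixes e1.
T = [[1.0, 0.0, 0.0],
     [0.0, 0.0, 0.9],
     [0.0, 0.0, 0.0]]

# Not a contraction: S e2 = (1, 2, 0) has norm sqrt(5) > 1. It also fixes e1.
S = [[1.0, 1.0, 0.0],
     [0.0, 2.0, 0.0],
     [0.0, 0.0, 1.0]]

print(matvec(T, e1), matvec(transpose(T), e1))   # both equal e1
print(matvec(S, e1), matvec(transpose(S), e1))   # S* e1 = (1, 1, 0) differs from e1
```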
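Next, we try to understand what the decomposition $H = M \oplus M^\perp$ looks like. Note that $M = \ker(I - T)$. By our nice little result, $M = \ker(I - T^*)$, and by Exercise A (applied to $A = I - T^*$, whose adjoint is $I - T$) this means $M = \operatorname{Im}(I - T)^\perp$. Therefore (by Proposition 9 in Notes 4), $M^\perp = \overline{\operatorname{Im}(I - T)}$. So $T$ induces the decomposition
$$H = \ker(I - T) \oplus \overline{\operatorname{Im}(I - T)}.$$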
If $h \in M$, then $\frac{1}{N}\sum_{n=0}^{N-1} T^n h = h = P_M h$ for every $N$, and the conclusion of the theorem holds for this $h$.
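Let $h \in \operatorname{Im}(I - T)$. Then $h$ has the form $h = g - Tg$ for some $g \in H$. We compute
$$\left\|\frac{1}{N}\sum_{n=0}^{N-1} T^n h\right\| = \frac{1}{N}\left\|\sum_{n=0}^{N-1} \left(T^n g - T^{n+1} g\right)\right\| = \frac{1}{N}\left\|g - T^N g\right\| \le \frac{2\|g\|}{N} \xrightarrow[N\to\infty]{} 0.$$
We have used the fact that $\|T\| \le 1$, so that the sequence $\{T^N g\}$ is bounded by $\|g\|$. Since $h \in \operatorname{Im}(I - T) \subseteq M^\perp$, we have $P_M h = 0$, so the conclusion of the theorem holds for this $h$ as well.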
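The telescoping identity $\frac{1}{N}\sum_{n=0}^{N-1} T^n(g - Tg) = \frac{1}{N}(g - T^N g)$ is easy to check numerically. In this sketch (our own; the rotation angle, the vector $g$ and $N$ are arbitrary choices) $T$ is a rotation of $\mathbb{R}^2$, so $T^N g$ does not shrink, and the bound $\frac{2\|g\|}{N}$ is what drives the convergence.

```python
import math

THETA = 1.0   # rotation angle (arbitrary); rotations of R^2 are isometries, hence contractions

def T(v):
    """Rotate a vector in R^2 by THETA."""
    c, s = math.cos(THETA), math.sin(THETA)
    return (c * v[0] - s * v[1], s * v[0] + c * v[1])

def power(v, n):
    """Return T^n v."""
    for _ in range(n):
        v = T(v)
    return v

g = (0.3, -1.2)
h = tuple(p - q for p, q in zip(g, T(g)))   # h = g - Tg lies in Im(I - T)

N = 500
avg, v = [0.0, 0.0], h
for _ in range(N):                          # avg = (1/N) * sum_{n<N} T^n h
    avg = [p + q / N for p, q in zip(avg, v)]
    v = T(v)

lhs = math.hypot(*avg)                                                  # ||(1/N) sum T^n h||
telescoped = math.hypot(*(p - q for p, q in zip(g, power(g, N)))) / N  # ||g - T^N g|| / N
bound = 2 * math.hypot(*g) / N                                          # 2 ||g|| / N
print(lhs, telescoped, bound)
```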
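Now consider $h \in \overline{\operatorname{Im}(I - T)}$, and fix $\epsilon > 0$. There is some $h' \in \operatorname{Im}(I - T)$ such that $\|h - h'\| < \epsilon$. By the previous paragraph, there is some $N_0$ such that $\left\|\frac{1}{N}\sum_{n=0}^{N-1} T^n h'\right\| < \epsilon$ for all $N \ge N_0$. Then for $N \ge N_0$ we break up $\frac{1}{N}\sum_{n=0}^{N-1} T^n h$ as $\frac{1}{N}\sum_{n=0}^{N-1} T^n (h - h') + \frac{1}{N}\sum_{n=0}^{N-1} T^n h'$ to find that
$$\left\|\frac{1}{N}\sum_{n=0}^{N-1} T^n h\right\| \le \frac{1}{N}\sum_{n=0}^{N-1} \|T^n\|\,\|h - h'\| + \left\|\frac{1}{N}\sum_{n=0}^{N-1} T^n h'\right\| < \epsilon + \epsilon = 2\epsilon,$$
demonstrating that $\frac{1}{N}\sum_{n=0}^{N-1} T^n h \to 0 = P_M h$. Thus the theorem holds for all $h \in \overline{\operatorname{Im}(I - T)} = M^\perp$.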
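Finally, since $H = M \oplus M^\perp$, and both $h \mapsto \frac{1}{N}\sum_{n=0}^{N-1} T^n h$ and $P_M$ are linear, the theorem holds on all of $H$.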