### Advanced Analysis, Notes 5: Hilbert spaces (application: Von Neumann’s mean ergodic theorem)

#### by Orr Shalit

In this lecture we give an application of elementary operators-on-Hilbert-space theory, by proving von Neumann’s mean ergodic theorem. See also this treatment by Terry Tao on his blog.

For today’s lecture we will require the following simple fact which I forgot to mention in the previous one.

**Exercise A:** Let $A \in B(H)$. Then $\|A^*\| = \|A\|$.

#### 1. The basic problem of ergodic theory

In the study of discrete dynamical systems, one considers the action of some map $\sigma$ on a space $X$. Ergodic theory is the part of dynamical systems theory in which one is interested in the action of a measure preserving transformation $\sigma$ on a measure space $(X, \mu)$.

Perhaps surprisingly, the origins of ergodic theory are in mathematical physics – statistical mechanics, to be precise. If you are interested in learning more about how what we discuss here is related to physics, I suggest you take a look at the section “Ergodic theory: an introduction” in Reed and Simon’s Volume I of the series *Methods of Modern Mathematical Physics*.

Since our goal here is merely to illustrate how operator theory can be applied to ergodic theory, we will work in the simplest possible setup: our space will be the unit interval $[0,1]$ equipped with Lebesgue measure $m$, and our transformation $\sigma : [0,1] \to [0,1]$ will be piecewise continuous (we will not, however, need to assume that $\sigma$ is invertible). Anybody who took a course in measure theory can replace $[0,1]$ and $\sigma$ by whatever they desire. The operator theoretic details will remain the same.

We assume further that $\sigma$ is **measure preserving**. By this we mean that for all $0 \leq a \leq b \leq 1$,

$$m\left( \sigma^{-1}([a,b]) \right) = b - a .$$

It is not entirely clear at first whether there are interesting examples of measure preserving maps.

**Examples:**

- For $\alpha \in [0,1)$, let $\sigma(x) = x + \alpha \ (\mathrm{mod}\ 1)$. Then $\sigma^{-1}([a,b]) = [a - \alpha, b - \alpha] \ (\mathrm{mod}\ 1)$, and this is an interval, or a union of two intervals, of total length $b - a$.
- Let $\sigma(x) = 2x$ for $x \in [0, 1/2]$ and $\sigma(x) = 2x - 1$ for $x \in (1/2, 1]$. Then $\sigma^{-1}([a,b]) = [a/2, b/2] \cup [(a+1)/2, (b+1)/2]$, a union of two intervals of total length $b - a$, so $\sigma$ is measure preserving. (**Remark:** Note that $\sigma([0,1/2]) = [0,1]$, so $\sigma$ does not do what one might naively think that a "measure preserving" map should do. However, $\sigma$ does satisfy that the measure of $\sigma^{-1}([a,b])$ is equal to the measure of $[a,b]$ for every interval $[a,b]$, and this turns out to be the important property.)
- The tent map: $\sigma(x) = 2x$ for $x \in [0, 1/2]$ and $\sigma(x) = 2 - 2x$ for $x \in (1/2, 1]$.
- etc.
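The measure preserving property of the second example can be checked numerically. Here is a small sketch (my illustration, not part of the notes) that estimates $m(\sigma^{-1}([a,b]))$ for the doubling map by counting grid points whose image lands in $[a,b]$:

```python
# Sketch (not from the notes): numerically estimate m(sigma^{-1}([a, b]))
# for the doubling map sigma(x) = 2x (mod 1) of the second example.

def sigma(x):
    """The doubling map on [0, 1]."""
    return 2 * x if x <= 0.5 else 2 * x - 1

def preimage_measure(a, b, n=200_000):
    """Fraction of an n-point grid on [0, 1] whose image under sigma
    lands in [a, b]; this approximates m(sigma^{-1}([a, b]))."""
    hits = sum(1 for i in range(n) if a <= sigma((i + 0.5) / n) <= b)
    return hits / n

# The preimage of [0.2, 0.7] is [0.1, 0.35] ∪ [0.6, 0.85],
# of total length 0.5 = 0.7 - 0.2, as measure preservation demands.
```

The forward image of a set may well be longer than the set itself; it is the preimage whose measure is conserved, as the remark above emphasizes.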

We pick a point $x_0 \in [0,1]$, and we start moving it around the space by applying $\sigma$ again and again. We get a sequence $x_0, \sigma(x_0), \sigma^2(x_0), \ldots$ in $[0,1]$. The basic problem in ergodic theory is to determine the statistical behavior of this sequence. To quantify the phrase "statistical behavior", one may study the large $N$ behavior of the so-called **time averages**

$$\frac{1}{N} \sum_{n=0}^{N-1} f\left( \sigma^n(x_0) \right)$$

for piecewise continuous functions $f : [0,1] \to \mathbb{C}$, say.

Why would this be interesting?

Suppose, for example, that $f$ is the indicator function of some interval $[a,b]$: $f(x) = 1$ if and only if $x \in [a,b]$, otherwise $f(x) = 0$. In this case the sum $\sum_{n=0}^{N-1} f(\sigma^n(x_0))$ counts the number of times that the orbit visited the interval $[a,b]$ in the first $N$ steps that $x_0$ takes along the sequence $x_0, \sigma(x_0), \sigma^2(x_0), \ldots$. The time averages therefore measure the fraction of the "time" that the sequence spends inside $[a,b]$. When one takes the limit $N \to \infty$, if that limit exists, one gets a measure of how much time the sequence spends inside $[a,b]$ in the long run. If the sequence behaves in a completely "random" manner, what would be your best guess for the limit of the time averages? If we think of the probability of being at a certain point in $[0,1]$ as being uniformly distributed on $[0,1]$, then the best guess, intuitively, would be that

$$\lim_{N \to \infty} \frac{1}{N} \sum_{n=0}^{N-1} f\left( \sigma^n(x_0) \right) = b - a .$$

Note that $b - a = \int_0^1 f \, dm$, so our guess is

$$\lim_{N \to \infty} \frac{1}{N} \sum_{n=0}^{N-1} f\left( \sigma^n(x_0) \right) = \int_0^1 f \, dm \qquad (*)$$

for indicator functions. Now maybe we would like to use some more complicated function $f$ to measure the distribution of the sequence $x_0, \sigma(x_0), \sigma^2(x_0), \ldots$. But if (*) holds for indicator functions then it holds for step functions, and then also presumably for other functions by some limiting process.

The equality (*) is also sometimes called the "Ergodic Hypothesis". It describes a situation where the time average (that is, starting at a point $x_0$ and taking the average of the repeated measurements $f(\sigma^n(x_0))$) is equal to the **space average** $\int_0^1 f \, dm$ (which is the expected value of $f$ on the probability space $([0,1], m)$). I certainly do not claim that we have justified (*); in fact (*) does not always hold. All that we said is that we might expect (*) to hold if the sequence is spread out on the interval in a random or uniform way. The various ergodic theorems proved in ergodic theory make this very loose discussion precise.
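For the rotation example, (*) can be observed numerically. The following sketch (mine, not from the notes; for irrational $\alpha$ it is Weyl's equidistribution theorem that actually justifies the convergence) compares a time average of an indicator function with the space average $b - a$:

```python
# Sketch (not from the notes): for the rotation sigma(x) = x + alpha (mod 1)
# with irrational alpha, the time average of the indicator of [a, b]
# approaches the space average b - a, as the ergodic hypothesis (*) predicts.
import math

def time_average(x0, alpha, a, b, N):
    """(1/N) * sum_{n=0}^{N-1} f(sigma^n(x0)) for f the indicator of [a, b]."""
    x, hits = x0, 0
    for _ in range(N):
        if a <= x <= b:
            hits += 1
        x = (x + alpha) % 1.0
    return hits / N

alpha = (math.sqrt(5) - 1) / 2   # an irrational rotation angle
# time_average(0.0, alpha, 0.25, 0.75, 100_000) comes out close to 0.5 = b - a
```

For rational $\alpha$ the orbit is periodic and the limit depends on $x_0$, so (*) can fail; this is one way to see that some hypothesis like ergodicity is needed.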

#### 2. The mean ergodic theorem

The mean ergodic theorem discusses the validity of (*) in the setting of $L^2[0,1]$. (The reason why it is called the *mean* ergodic theorem is that convergence in the $L^2$ norm used to be called *convergence in the mean*.) The first thing we have to do is to make sense of the composition $f \circ \sigma$ in $L^2[0,1]$. The problem is that if $f \in L^2[0,1]$, then $f$ is not defined by the values it attains at points $x \in [0,1]$ (it is only an equivalence class of functions defined almost everywhere), so it is not clear what we mean by $f \circ \sigma$. This problem is solved very smoothly in the setting of Hilbert space.

Define a transformation $U$ by $Uf = f \circ \sigma$. By our assumptions on $\sigma$, $U$ is well defined, linear and isometric on the space of piecewise continuous functions (measure preservation gives $\int_0^1 |f \circ \sigma|^2 \, dm = \int_0^1 |f|^2 \, dm$), which is a dense subspace of $L^2[0,1]$. By Exercise B from the previous lecture, $U$ extends in a unique way to an isometry $U : L^2[0,1] \to L^2[0,1]$. We continue to write $Uf = f \circ \sigma$ even for $f \in L^2[0,1]$.

**Definition 1:** *A measure preserving transformation $\sigma$ is said to be **ergodic** if $f \circ \sigma = f$ for $f \in L^2[0,1]$ implies that $f$ is constant (almost everywhere).*

**Theorem 2 (Mean ergodic theorem):** *Let $\sigma$ be a measure preserving (piecewise continuous) ergodic transformation of $[0,1]$. Then for all $f \in L^2[0,1]$,*

$$\lim_{N \to \infty} \frac{1}{N} \sum_{n=0}^{N-1} f \circ \sigma^n = \int_0^1 f \, dm$$

*in the $L^2$ norm.*

**Remark:** Recalling (*), one immediately sees the weakness of this theorem. It does not tell us what happens at a particular point $x_0$ in the time averages, but only gives us norm convergence of the sequence of functions $\frac{1}{N} \sum_{n=0}^{N-1} f \circ \sigma^n$. Typically, pointwise (or almost everywhere pointwise) convergence theorems are harder to prove.

The operator $U$ that we defined above is an isometry, and in particular it satisfies $\|U\| \leq 1$. An operator $A$ satisfying $\|A\| \leq 1$ is said to be a **contraction**. Theorem 2 is an immediate consequence of the following theorem.

**Theorem 3:** *Let $H$ be a Hilbert space and let $A \in B(H)$ be a contraction. Denote $K = \{v \in H : Av = v\}$, and let $P_K$ be the orthogonal projection of $H$ onto $K$. Then for all $x \in H$,*

$$\lim_{N \to \infty} \frac{1}{N} \sum_{n=0}^{N-1} A^n x = P_K x$$

*in the norm of $H$.*

To see how Theorem 2 follows from Theorem 3, note simply that if $\sigma$ is ergodic, then $K = \{f \in L^2[0,1] : Uf = f\}$ is the one dimensional space of constant functions, so (by Theorem 15 in Notes 2) $P_K f = \langle f, 1 \rangle 1 = \left( \int_0^1 f \, dm \right) 1$.

**Proof:** The first step of the proof requires a bit of inspiration. Recall that $\|A^*\| = \|A\| \leq 1$ (Exercise A), so $A^*$ is also a contraction. Let $v$ be a unit vector satisfying $Av = v$, and consider $\langle A^* v, v \rangle = \langle v, Av \rangle = \|v\|^2 = 1$. From the Cauchy-Schwarz inequality, $|\langle A^* v, v \rangle| \leq \|A^* v\| \|v\| \leq 1$, so $|\langle A^* v, v \rangle| = \|A^* v\| \|v\|$. From the equality part of Cauchy-Schwarz this can only happen when $A^* v = c v$ for some scalar $c$. But reconsidering $\langle A^* v, v \rangle = 1$, it is evident that $c = 1$, thus $A^* v = v$. Because of the symmetry of the $*$ operation ($A^{**} = A$), we get a nice little result: *for a contraction $A$,*

$$Av = v \iff A^* v = v .$$

Next, we try to understand what the decomposition of $H$ induced by $A$ looks like. Note that $\left( \operatorname{ran}(I - A) \right)^\perp = \ker(I - A^*)$. By our nice little result, $\ker(I - A^*) = \ker(I - A) = K$. Therefore (by Proposition 9 in Notes 4), $\overline{\operatorname{ran}(I - A)} = K^\perp$. So $A$ induces the decomposition

$$H = K \oplus \overline{\operatorname{ran}(I - A)} .$$

If $x \in K$, then $\frac{1}{N} \sum_{n=0}^{N-1} A^n x = x = P_K x$, and the conclusion of the theorem holds for this $x$.

Let $x \in \operatorname{ran}(I - A)$. Then $x$ has the form $x = y - Ay$ for some $y \in H$. We compute (the sum telescopes)

$$\left\| \frac{1}{N} \sum_{n=0}^{N-1} A^n (y - Ay) \right\| = \frac{1}{N} \left\| y - A^N y \right\| \leq \frac{2 \|y\|}{N} \longrightarrow 0 = P_K x .$$

We have used the fact that $\|A^N\| \leq \|A\|^N \leq 1$, so that the sequence $\{A^N y\}$ is bounded (and $P_K x = 0$ because $x \in \operatorname{ran}(I - A) \subseteq K^\perp$).

Now consider $x \in \overline{\operatorname{ran}(I - A)}$, and fix $\epsilon > 0$. There is some $x' \in \operatorname{ran}(I - A)$ such that $\|x - x'\| < \epsilon / 2$. For some $N_0$, $\left\| \frac{1}{N} \sum_{n=0}^{N-1} A^n x' \right\| < \epsilon / 2$ for all $N \geq N_0$. Then for $N \geq N_0$ we break up $x$ as $x = (x - x') + x'$ to find that

$$\left\| \frac{1}{N} \sum_{n=0}^{N-1} A^n x \right\| \leq \frac{1}{N} \sum_{n=0}^{N-1} \left\| A^n (x - x') \right\| + \left\| \frac{1}{N} \sum_{n=0}^{N-1} A^n x' \right\| < \frac{\epsilon}{2} + \frac{\epsilon}{2} = \epsilon ,$$

demonstrating that $\frac{1}{N} \sum_{n=0}^{N-1} A^n x \to 0 = P_K x$. Thus the theorem holds for all $x \in \overline{\operatorname{ran}(I - A)}$.

Finally, since $H = K \oplus \overline{\operatorname{ran}(I - A)}$, every $x \in H$ can be written as $x = P_K x + (I - P_K) x$ with $(I - P_K) x \in \overline{\operatorname{ran}(I - A)}$; by linearity of the averages and the two cases above, the theorem holds on all of $H$.
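A finite dimensional toy case of Theorem 3 can be checked by hand or by machine. In the sketch below (my illustration, not part of the notes), $A$ is the cyclic shift on $\mathbb{R}^3$, a unitary and hence a contraction; its fixed space $K$ consists of the constant vectors, so $P_K x$ is the vector each of whose entries is the mean of the entries of $x$, and the Cesàro averages $\frac{1}{N} \sum_{n=0}^{N-1} A^n x$ converge to it:

```python
# Sketch (not from the notes): Theorem 3 for the cyclic shift
# A(x1, x2, x3) = (x3, x1, x2) on R^3, a unitary and hence a contraction.
# Its fixed vectors are the constant vectors, so P_K x has every entry
# equal to the mean of the entries of x.

def shift(v):
    """The cyclic shift A on R^3."""
    return [v[2], v[0], v[1]]

def cesaro_average(x, N):
    """(1/N) * sum_{n=0}^{N-1} A^n x for A the cyclic shift."""
    total = [0.0, 0.0, 0.0]
    v = list(x)
    for _ in range(N):
        total = [t + vi for t, vi in zip(total, v)]
        v = shift(v)
    return [t / N for t in total]

# For N divisible by 3 the average equals P_K x exactly:
# cesaro_average([1.0, 2.0, 6.0], 3000) gives [3.0, 3.0, 3.0].
```

Note that $A^N x$ itself does not converge here (the orbit is periodic); it is only the Cesàro averages that settle down, which is exactly what the *mean* ergodic theorem asserts.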