Advanced Analysis, Notes 6: Banach spaces (basics, the Hahn-Banach Theorems)

by Orr Shalit

Recall that a norm on a (real or complex) vector space X is a function \| \cdot \| : X \rightarrow [0, \infty) that satisfies for all x,y \in X and all scalars a the following:

  1. \|x\| = 0 \Leftrightarrow x = 0.
  2. \|ax\| = |a| \|x\|.
  3. \|x + y \| \leq \|x\| + \|y\|.

A vector space with a norm on it is said to be a normed space. Inner product spaces are normed spaces. However, many norms of interest are not induced by an inner product. In fact:

Exercise A: A norm is induced by an inner product if and only if it satisfies the parallelogram law:

\|x+y\|^2 + \|x-y\|^2 = 2 \|x\|^2 + 2\|y\|^2 .

Instead of solving this exercise, you might prefer to read this old paper where Jordan and von Neumann prove this.

Using Exercise A, it is not hard to show that some very frequently occurring norms, such as the sup norm on C(X) or the operator norm on B(H), are not induced by inner products. The latter example shows that even if one is working in the setting of Hilbert spaces one is led to study other normed spaces. We now begin our study of normed spaces and, in particular, Banach spaces.
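As a quick sanity check of the claim about the sup norm, here is a minimal sketch in Python. The helper names (sup_norm, euclid_norm, parallelogram_gap) are illustrative and not part of the notes. It evaluates both sides of the parallelogram law on two vectors in \mathbb{R}^2, viewed as functions on a two-point space.

```python
import math

def sup_norm(v):
    """Sup norm of a finite sequence (a function on a finite set)."""
    return max(abs(t) for t in v)

def euclid_norm(v):
    """Euclidean norm, which is induced by the standard inner product."""
    return math.sqrt(sum(t * t for t in v))

def parallelogram_gap(norm, x, y):
    """Return (left side) - (right side) of the parallelogram law."""
    s = [a + b for a, b in zip(x, y)]
    d = [a - b for a, b in zip(x, y)]
    lhs = norm(s) ** 2 + norm(d) ** 2
    rhs = 2 * norm(x) ** 2 + 2 * norm(y) ** 2
    return lhs - rhs

x, y = [1.0, 0.0], [0.0, 1.0]
gap_sup = parallelogram_gap(sup_norm, x, y)        # 1 + 1 - (2 + 2), nonzero
gap_euclid = parallelogram_gap(euclid_norm, x, y)  # zero up to rounding
```

A nonzero gap for a single pair of vectors is enough to rule out an inner product; a zero gap for one pair of course proves nothing by itself.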

1. Basics

Our main objects of interest will now be normed vector spaces, over either the real or the complex numbers. Every norm naturally induces a metric. A Banach space is a normed vector space which is complete with respect to the norm.

Example 1: Let X be a topological space. The space C_b(X) of bounded continuous functions equipped with the sup norm \|f\|_\infty = \sup_{x\in X}|f(x)| is a Banach space. This is usually proved in a course in topology. At least when X is a metric space, the proof is similar to the proof from a course in analysis which shows that the uniform limit of continuous functions is continuous.

Example 2: Let p \geq 1, and let \ell^p = \{a = (a_n)_{n=1}^\infty : \sum_{n=1}^\infty |a_n|^p < \infty\}, equipped with the norm \|a\|_p = (\sum_{n=1}^\infty |a_n|^p)^{1/p}. Then \ell^p is a Banach space. At the outset it may not be clear that \|\cdot \|_p satisfies the triangle inequality, or that \ell^p is closed under addition (unless p=1, in which case it is obvious). These two facts follow from Minkowski’s inequality. Once these facts are established, the proof that \ell^p is complete is obtained from the proof that \ell^2 is complete (see Notes 1) by replacing 2 by p.

In the study of the spaces \ell^p, Hölder’s inequality and Minkowski’s inequality are indispensable, so the reader is advised to review these.
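For readers who like to see the two inequalities in action before reviewing the proofs, the following is a small numerical sketch. It only tests random finite sequences, so it is evidence and not a proof; the function names and the tolerance are assumptions made for this illustration.

```python
import random

def p_norm(a, p):
    """The p-norm of a finite sequence, for 1 <= p < infinity."""
    return sum(abs(t) ** p for t in a) ** (1.0 / p)

def check_inequalities(p, n=50, trials=200, seed=0):
    """Test Hoelder and Minkowski on random vectors; requires p > 1."""
    rng = random.Random(seed)
    q = p / (p - 1.0)  # conjugate exponent: 1/p + 1/q = 1
    tol = 1e-9         # slack for floating point rounding
    for _ in range(trials):
        a = [rng.uniform(-1, 1) for _ in range(n)]
        b = [rng.uniform(-1, 1) for _ in range(n)]
        # Hoelder: sum |a_i b_i| <= ||a||_p ||b||_q
        if sum(abs(s * t) for s, t in zip(a, b)) > p_norm(a, p) * p_norm(b, q) + tol:
            return False
        # Minkowski: ||a + b||_p <= ||a||_p + ||b||_p
        if p_norm([s + t for s, t in zip(a, b)], p) > p_norm(a, p) + p_norm(b, p) + tol:
            return False
    return True

ok_small_p = check_inequalities(1.5)
ok_large_p = check_inequalities(3.0)
```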

Example 3: Let \ell^\infty be the space of bounded sequences with the norm \|a\|_\infty = \sup_n |a_n|. Then \ell^\infty is a Banach space.

Proposition 4: Every normed space X can be embedded isometrically in a unique Banach space which contains X as a dense subset.

The proof of Proposition 4 is just like that of Theorem 5 in Notes 1, only simpler. The unique Banach space containing X as a dense subspace is called the completion of X.

Exercise B: Let \ell_0 denote the vector space of all finitely supported sequences. For 1 \leq p \leq \infty, equip \ell_0 with the \| \cdot \|_p norm. Compute the completion of \ell_0.

Exercise C: A function f \in C(\mathbb{R}^n) is said to have compact support if supp f := \overline{\{x: f(x) \neq 0\}} is compact. A function f is said to vanish at infinity if for all \epsilon >0 there is a compact K such that |f(x)| < \epsilon for all x \notin K. Let C_c(\mathbb{R}^n) denote the space of all compactly supported continuous functions, and let C_0(\mathbb{R}^n) denote the space of all continuous functions vanishing at infinity. Prove that the completion of C_c(\mathbb{R}^n) with respect to the sup norm is C_0(\mathbb{R}^n).

Example 5: If (X, \mu) is a measure space and p \geq 1, then one may define the Lebesgue space L^p(X, \mu) as the space of all measurable functions f on X for which \int_X |f|^p d \mu < \infty (functions that agree almost everywhere are identified), with the norm \|f\|_p = (\int_X |f|^p d\mu)^{1/p}. If X = \{1, 2, \ldots\} and \mu is the counting measure then we get Example 2 (the counting measure is the measure that assigns to each set S the value \mu(S) = the number of elements in S).

Example 6: For those who are not yet very comfortable with measure spaces, another way to obtain (instances of) the previous example is as follows. Let D be an open subset in \mathbb{R}^n. Denote by C_c(D) the continuous functions of compact support on D. Define a norm on C_c(D) by \|f\|_p = (\int_D |f(t)|^p dt)^{1/p}. Then L^p(D) = L^p(D, dx) can be identified with (and can also be defined as) the completion of C_c(D) with respect to this norm.

Exercise D: Prove that L^p(D) is separable. Prove that \ell^p is separable for p \in [1,\infty) and non-separable for p = \infty .

2. Bounded operators and dual spaces

Example 7: Let X and Y be normed spaces. We define the space of bounded linear operators from X to Y, B(X,Y), just as we have in the Hilbert space case (see these notes). We equip B(X,Y) with the operator norm \|T\| = \sup_{\|x\|=1} \|Tx\|. It is simple to show that the operator norm is truly a norm; for example, the triangle inequality follows from \|(A+B)x\| \leq \|Ax\| + \|Bx\| \leq (\|A\|+\|B\|)\|x\|. It is also easy to show that Proposition 1 from Notes 4 holds: a linear map between two normed spaces is bounded if and only if it is continuous, if and only if it is continuous at some point. A slightly deeper fact is the following.

Proposition 8: If Y is a Banach space, then B(X,Y) is also a Banach space.

Proof: Suppose that \{A_n\} is a Cauchy sequence in B(X,Y). For all x \in X, \{A_n x\} is a Cauchy sequence in the complete space Y. Let A(x) = \lim A_n x. It is easy to see that the map A is linear and bounded. Even though we wrote down Ax = \lim A_n x, we still did not show that A_n \rightarrow A in B(X,Y), because this involves showing that \|A_n - A\| \rightarrow 0. Given t > 0, if N is chosen so large that \|A_n - A_m\| is smaller than t for all m,n \geq N, then for all x \in X and all m \geq N,

\|(A-A_m)x\| = \lim_n \|(A_n - A_m)x\| \leq t \|x\|.

Thus \|A-A_m\| \leq t for m \geq N, thus A_n \rightarrow A.
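To make the operator norm of Example 7 concrete, here is a hedged sketch for a matrix acting between finite dimensional real spaces equipped with the sup norm. It uses the standard fact that in this setting the operator norm is the maximal absolute row sum, and compares it with a brute force maximum over sign vectors (the extreme points of the unit ball, where the convex function x \mapsto \|Ax\|_\infty attains its maximum). The matrix A and the function names are made up for the illustration.

```python
from itertools import product

def sup_norm(v):
    return max(abs(t) for t in v)

def apply(A, x):
    """Matrix-vector product, with A given as a list of rows."""
    return [sum(a * t for a, t in zip(row, x)) for row in A]

def max_row_sum(A):
    """Closed form for the operator norm with respect to sup norms."""
    return max(sum(abs(a) for a in row) for row in A)

def op_norm_bruteforce(A):
    """Maximum of ||Ax||_inf over all sign vectors x in {-1, 1}^n."""
    n = len(A[0])
    return max(sup_norm(apply(A, s)) for s in product([-1.0, 1.0], repeat=n))

A = [[1.0, -2.0, 0.5],
     [0.0, 3.0, -1.0]]
closed_form = max_row_sum(A)
brute_force = op_norm_bruteforce(A)
```

The brute force loop visits 2^n sign vectors, so it is only meant for very small n.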

One of the most important instances of the above proposition is when Y is equal to \mathbb{R} or \mathbb{C}.

Definition 9: Let X be a normed space. We denote the space of all bounded linear functionals (i.e., bounded linear maps into the scalar field) by X^*. The space X^* is called the dual space or the conjugate space of X.

Remark: In some parts of the literature, the dual space is denoted by X'.


  1. Every Hilbert space can be identified (anti-linearly) with its dual.
  2. (\ell^1)^* = \ell^\infty: every b \in \ell^\infty gives rise to an F_b \in (\ell^1)^* by way of F_b(a) = \sum_i a_i b_i.
  3. If 1<p<\infty, and q satisfies 1/p + 1/q = 1, then (\ell^p)^* = \ell^q, as above.
  4. We will later see that (\ell^\infty)^* \neq \ell^1.

We will discuss some more examples in the next set of notes, where we will also prove (\ell^p)^* = \ell^q.
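As a partial check on item 2 of the list above, the following worked computation shows that the functional F_b has norm exactly \|b\|_\infty. It does not show that every functional on \ell^1 arises this way. The symbol e_n, for the sequence with 1 in the nth place and 0 elsewhere, is notation introduced only for this computation.

```latex
\begin{align*}
|F_b(a)| &= \Big|\sum_i a_i b_i\Big|
  \leq \sum_i |a_i|\,|b_i|
  \leq \|b\|_\infty \sum_i |a_i|
  = \|b\|_\infty \|a\|_1
  && \Rightarrow \quad \|F_b\| \leq \|b\|_\infty, \\
|F_b(e_n)| &= |b_n|, \qquad \|e_n\|_1 = 1
  && \Rightarrow \quad \|F_b\| \geq \sup_n |b_n| = \|b\|_\infty .
\end{align*}
```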

3. Isomorphisms

Having now several examples of Banach spaces, it is of interest to know what is the relationship between these spaces. Recall that Hilbert spaces which had orthonormal bases of the same cardinality turned out to be isomorphic, so they all enjoy the same structure and the same geometry. When it comes to Banach spaces the landscape is much richer. Although we hardly have any tools to begin mapping the landscape at this point, we raise the issue now, so we can return to it from time to time as we proceed.

A bounded operator T \in B(X,Y) is said to be an isomorphism if it is bijective and if it has a bounded inverse. It is said to be an isometry if \|Tx\|=\|x\| for all x \in X. X and Y are said to be isomorphic if there is an isomorphism between them, and isometrically isomorphic (or simply isometric) if there is an isometric isomorphism between them.

Note that the notion of isomorphism that we defined here is much weaker than the one we defined for Hilbert spaces (although within the class of Hilbert spaces they are equivalent). Indeed, a Hilbert space cannot be isometrically isomorphic to a normed space that is not Hilbert, but it certainly can be isomorphic. In fact, all n-dimensional complex normed spaces are isomorphic to \mathbb{C}^n with the standard inner product.
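The simplest instance of the last claim is the identity map between \mathbb{C}^n with the sup norm and \mathbb{C}^n with the Euclidean norm: it is an isomorphism because \|z\|_\infty \leq \|z\|_2 \leq \sqrt{n} \|z\|_\infty, but it is not an isometry when n \geq 2. The sketch below tests these two bounds on random vectors; the function names and tolerance are assumptions for the illustration.

```python
import math
import random

def sup_norm(z):
    return max(abs(t) for t in z)

def euclid_norm(z):
    return math.sqrt(sum(abs(t) ** 2 for t in z))

def norms_comparable(n, trials=500, seed=1):
    """Check ||z||_inf <= ||z||_2 <= sqrt(n) ||z||_inf on random z in C^n."""
    rng = random.Random(seed)
    tol = 1e-12  # slack for floating point rounding
    for _ in range(trials):
        z = [complex(rng.uniform(-1, 1), rng.uniform(-1, 1)) for _ in range(n)]
        s, e = sup_norm(z), euclid_norm(z)
        if not (s <= e + tol and e <= math.sqrt(n) * s + tol):
            return False
    return True

comparable = norms_comparable(5)
# The vector (1, 1) shows that the identity map is not an isometry:
not_isometric = sup_norm([1, 1]) != euclid_norm([1, 1])
```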

At this point we do not have much more to say about which of our examples of Banach spaces is isomorphic to which. Here is something simple that we can say.

Exercise E: Prove that \ell^p is not isomorphic to \ell^\infty for any p<\infty.

4. Topological vector spaces

Before moving on, I have to say that Banach spaces, though by far more general than Hilbert spaces, do not cover all the kinds of topological vector spaces that arise in analysis or in applications.

Consider for example the space C(\Omega), where \Omega \subseteq \mathbb{R}^n. When \Omega is not compact there exist unbounded continuous functions on it. Thus one cannot define the sup norm for all elements of C(\Omega). However, for every compact subset K of \Omega and every f \in C(\Omega), we can define \|f\|_K = \sup_{x\in K} |f(x)|. If K_1, K_2, \ldots is a sequence of compact sets such that \Omega = \cup K_i and such that every compact subset of \Omega is contained in some K_i (when \Omega is open this can be arranged by requiring K_i \subseteq \textrm{int} K_{i+1}), then we may define a measure of distance between elements in C(\Omega) by

d(f,g) = \sum_i 2^{-i} \frac{\|f-g\|_{K_i}}{1+ \|f-g\|_{K_i}}.

One shows that d is a complete metric on C(\Omega), and that a sequence \{f_n\} converges to f in (C(\Omega),d) if and only if f_n \rightarrow f uniformly on every compact subset of \Omega.
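To get a feeling for this metric, consider the hypothetical example \Omega = \mathbb{R}, K_i = [-i,i] and f_n(x) = x/n. Each f_n is unbounded, so f_n does not converge to 0 uniformly on \mathbb{R}, yet \|f_n\|_{K_i} = i/n tends to 0 for every fixed i. The sketch below evaluates a truncation of the series defining d(f_n, 0); the truncation length is an arbitrary choice.

```python
def seminorm_K(n, i):
    """sup over K_i = [-i, i] of |x / n|, which equals i / n exactly."""
    return i / n

def d_to_zero(n, terms=60):
    """Truncated series for d(f_n, 0), where f_n(x) = x / n on Omega = R."""
    total = 0.0
    for i in range(1, terms + 1):
        s = seminorm_K(n, i)
        total += 2.0 ** (-i) * s / (1.0 + s)
    return total

distances = [d_to_zero(n) for n in (10, 100, 1000)]
```

Since s/(1+s) \leq s and \sum_i i 2^{-i} = 2, one gets d(f_n, 0) \leq 2/n, which agrees with the decreasing values computed above.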

It can be shown that there is no norm one can define on C(\Omega) that induces this metric. Other examples of interesting function spaces which are not linearly homeomorphic to Banach spaces are C^\infty(\Omega) or H(\Omega) (the space of holomorphic functions on \Omega) when \Omega \subseteq \mathbb{C}.

A vector space which is also a complete metric space with a translation invariant metric which has a convex local base at 0 is called a Fréchet space. There are even more general topological vector spaces of interest in analysis, especially locally convex topological vector spaces. Some of the theory we shall develop can also be developed for these spaces. I chose to stick to the more concrete Banach spaces for didactic purposes. We will discuss some specific locally convex topologies when they arise in the study of Banach spaces and their operators.

5. The Hahn-Banach extension theorems

Definition 10: Let X be a real vector space. A sub-linear functional is a function p: X \rightarrow \mathbb{R} such that 

  1. p(x+y) \leq p(x) + p(y)  for all x, y \in X
  2. p(cx) = c p(x)  for all x \in X and c \geq 0

Theorem 11 (HB extension theorem, sub-linear functional version): Let X be a real vector space, and let p be a sub-linear functional on X. Suppose that Y \subseteq X is a subspace, and that f is a linear functional on Y such that f(y) \leq p(y) for all y \in Y. Then there exists a linear functional F on X such that F\big|_Y = f and F(x) \leq p(x) for all x \in X.

Proof: We may suppose that Y \neq X, otherwise there is nothing to prove. The first part of the proof involves extending f only to a slightly larger subspace. The second part of the proof then involves using the first part to show that f can be extended all the way up to X.

Let x \notin Y be nonzero, and define W = \{y + cx : y \in Y, c \in \mathbb{R}\}. We will extend f to a functional F on W such that F(w) \leq p(w) for all w \in W. Since x is independent of Y, we are free to define F(x) to be any real number, and that determines uniquely an extension F of f. The issue here is to choose F(x) so that F is smaller than p on W. What we need is

(*) f(y) + cF(x) = F(y) + cF(x) = F(y+cx) \leq p(y+cx)

for all y \in Y, c \in \mathbb{R}. We may divide (*) by |c|, and we may replace y by any multiple of y, to convert condition (*) to

F(x) \leq p(x-y) + f(y) and f(y) - p(y-x) \leq F(x)

for all y \in Y. Working backwards, we see that if there exists any t \in \mathbb{R} such that

\sup \{f(z) - p(z-x) : z \in Y\} \leq t \leq \inf\{p(x-y) + f(y) : y \in Y\}

then we can define F(x) = t, and that defines the extension F on W that is dominated by p. But for all y,z,

f(z) - f(y) = f(z-y) \leq p(z-y) \leq p(z-x) + p(x - y)

so f(z) - p(z-x) \leq f(y) + p(x-y), so such a t exists, and the first part of the proof is complete.

And now, the second part of the proof. If X is finite dimensional, then we proceed by induction. If X is not finite dimensional, we use Zorn’s lemma as follows. Let P be the collection of all pairs (Z,g) such that

  1. Z is a linear subspace of X containing Y.
  2. g is a linear functional on Z that extends f.
  3. g(z) \leq p(z) for all z \in Z.

The pair (Y,f) is in P, so P is not empty. Let P be partially ordered by the rule (Z,g) \leq (Z',g') if and only if Z' \supseteq Z and g'\big|_Z = g. It is easy to see that every chain in P has an upper bound (take the union of the subspaces in the chain, with the functional that restricts to each member of the chain), thus by Zorn’s lemma P has a maximal element (W, F). Now, W must be X, otherwise we would be able to extend F further by the first part of the proof, contradicting maximality. Thus, F is the required extension of f, and the proof is complete.
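The one-step extension in the first part of the proof can be followed by hand in a small case. The choices below (X = \mathbb{R}^2, p the \ell^1 norm, Y the first coordinate axis, x = (0,1)) are a hypothetical example chosen for illustration and do not come from the notes.

```latex
\begin{align*}
& X = \mathbb{R}^2, \quad p(x_1,x_2) = |x_1| + |x_2|, \quad
  Y = \{(s,0) : s \in \mathbb{R}\}, \quad f(s,0) = s \leq |s| = p(s,0), \quad x = (0,1). \\
& \sup_{z \in Y} \{ f(z) - p(z-x) \} = \sup_{s \in \mathbb{R}} \{ s - |s| - 1 \} = -1
  \quad (\text{attained for } s \geq 0), \\
& \inf_{y \in Y} \{ p(x-y) + f(y) \} = \inf_{s \in \mathbb{R}} \{ |s| + 1 + s \} = 1
  \quad (\text{attained for } s \leq 0), \\
& \text{so any } t \in [-1,1] \text{ is admissible, and }
  F(x_1,x_2) = x_1 + t x_2 \leq |x_1| + |x_2| = p(x_1,x_2).
\end{align*}
```

In particular the extension is not unique in general: every value of t in the interval [-1,1] gives a different extension dominated by p.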

Theorem 12 (HB extension theorem, bounded functional version): Let Y be a subspace of a normed space X, and let f \in Y^*. Then there exists F \in X^* such that F\big|_Y = f and \|F\| = \|f\|.

Proof: We prove the theorem first for X a real space. Define p(x) = \|f\| \|x\| for all x \in X. This is easily seen to be a sub-linear functional on X, which dominates f on Y. By the HB extension theorem (Theorem 11), there exists a linear functional F extending f such that F(x) \leq p(x) for all x. Applying this to -x gives -F(x) \leq p(-x) = p(x), so |F(x)| \leq \|f\| \|x\| for all x, thus \|F\| \leq \|f\|. Since F extends f, we also have \|F\| \geq \|f\|, so this is an equality.

Suppose now that X is a normed space over the complex numbers. Then it is also a normed space over the reals. Define g = Re f. Then g is a bounded real functional on Y, and \|g\| \leq \|f\|. By the previous paragraph, g extends to a bounded real functional G on X such that \|G\| = \|g\|. The natural hope is that there is a complex linear functional F such that G = Re F which does what we want. But a short computation shows that the real part of a complex functional F determines it completely, in the sense that F(x) = Re F(x) - i Re F(ix). Thus we have no other option but to define F(x) = G(x) - i G(ix). It is now a matter of some computations to show that F so defined is complex linear and extends f. To see that \|F\| = \|f\|, it suffices to show that \|F\| \leq \|G\| (why?).

Let x \in X, and write F(x) = r e^{i t}. Then |F(x)| = r = e^{-i t}F(x)= F(e^{-i t} x) = G(e^{-i t} x) - i G(ie^{-i t} x). But this is a real number, so its imaginary part vanishes and we get |F(x)| = G(e^{-i t} x) \leq \|G\| \|x\|, which proves \|F\| \leq \|G\|.
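The identity F(x) = Re F(x) - i Re F(ix) used in the proof above is easy to test numerically. In the sketch below, f is a made-up complex linear functional on \mathbb{C}^2 (its coefficients are arbitrary), G is its real part, and F is rebuilt from G alone.

```python
def f(z, a=(2 - 1j, 0.5 + 3j)):
    """A complex linear functional on C^2: f(z) = a_1 z_1 + a_2 z_2."""
    return a[0] * z[0] + a[1] * z[1]

def G(z):
    """The real part of f; this is real linear but not complex linear."""
    return f(z).real

def F(z):
    """Rebuild a complex functional from its real part: G(z) - i G(iz)."""
    iz = [1j * t for t in z]
    return G(z) - 1j * G(iz)

samples = [(1 + 0j, 0j), (0j, 1j), (0.3 - 2j, -1.5 + 0.25j)]
max_error = max(abs(F(z) - f(z)) for z in samples)
```

The reason it works is that Re f(iz) = Re (i f(z)) = - Im f(z), so G(z) - i G(iz) = Re f(z) + i Im f(z).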

Corollary 13: Let X be a normed space and let x \in X be nonzero. Then there exists F \in X^* such that \|F\| = 1 and F(x) = \|x\|.

Proof: Let Y = \textrm{span}\{ x\} and define f on Y by f(cx) = c\|x\|. When x \neq 0 we have |f(cx)| = \|cx\|, so \|f\| = 1. Now extend f using Hahn-Banach (Theorem 12).
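In concrete spaces the functional of Corollary 13 can often be written down explicitly, without appealing to Hahn-Banach. The sketch below does this for \mathbb{R}^n with the p-norm, 1 < p < \infty, using the finite dimensional analogue of the identification (\ell^p)^* = \ell^q mentioned earlier: the functional y \mapsto \sum c_i y_i has norm \|c\|_q. The formula for the coefficients c and the function names are assumptions of this illustration, not part of the notes.

```python
import math

def p_norm(v, p):
    return sum(abs(t) ** p for t in v) ** (1.0 / p)

def norming_coefficients(x, p):
    """Coefficients c of F(y) = sum(c_i * y_i) with F(x) = ||x||_p.

    Uses c_i = sign(x_i) |x_i|^(p-1) / ||x||_p^(p-1).
    Assumes x != 0 and 1 < p < infinity.
    """
    scale = p_norm(x, p) ** (p - 1)
    return [math.copysign(abs(t) ** (p - 1), t) / scale for t in x]

p = 3.0
q = p / (p - 1.0)  # conjugate exponent
x = [1.0, -2.0, 0.5, 0.0]
c = norming_coefficients(x, p)
F_x = sum(ci * xi for ci, xi in zip(c, x))  # should equal ||x||_p
norm_F = p_norm(c, q)                       # should equal 1
```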

Corollary 14: Let x be an element in a normed space X for which f(x) = 0 for all f \in X^*. Then x = 0.

6. The Hahn-Banach separation theorems

Theorem 15 (HB separation theorem, subspace-point): Let M be a linear subspace of a normed space X, and let x \in X be such that d(x,M) = \inf\{\|x-m\| : m \in M\} = d. Then there exists F \in X^* such that \|F\|\leq 1, F(x) = d and F(M) = \{0\}.

Proof: On \textrm{span}\{x,M\} we define a linear functional by f(cx+m) = cd. By Hahn-Banach, we will be done once we show that \|f\| \leq 1. But \|cx+m\| = |c| \|x-(-c^{-1}m)\| \geq |c|d (when c \neq 0), thus

|f(cx+m)| = |cd| \leq \|cx+m\| .

Corollary 16: Let M be a subspace of a normed space X. A point x \in X is in \overline{M} if and only if f(x) = 0 for all f \in X^* that vanish on M.

Corollary 16 is very useful in approximation theory. There is a proof of the Stone-Weierstrass Theorem that is based on this corollary (and also the Krein-Milman theorem, which we shall learn in a future lecture). Just to give you another taste, Corollary 16 (together with some complex analysis) can also be used to prove the following theorem of Müntz.

Theorem 17 (Müntz approximation theorem): Let \{t_k\}_{k=1}^\infty be an increasing sequence of positive reals. Let 0<a<b. Then the linear space spanned by the monomials \{x^{t_k}\} is dense in C[a,b] if and only if \sum_{k=1}^\infty \frac{1}{t_k} = \infty.

In particular, polynomials in prime powers of x are dense in C[a,b], isn’t that neat? See Theorem 3 in this post for the part of the proof of Müntz’s theorem that uses Hahn-Banach (the sufficiency part, which is the more interesting part).

We return to our discussion of separation theorems. Let us fix for the rest of the post a normed space X over the reals. Since every complex space is also a real space, the following separation theorems can be applied to complex spaces (with the modification that every f appearing in the statements and definitions be replaced by its real part Re f).

The conclusion of Corollary 16 can be restated by saying that there is a functional that “separates” a subspace from a point. To make the notion of separation more geometrically intuitive, we introduce the following terminology.

Definition 18: A hyperplane in X is a subset of the form F^{-1}(c), where F is a linear functional on X and c \in \mathbb{R}. If A, B \subseteq X, we say that the hyperplane F^{-1}(c)  separates A from B if 

F(A) \leq c \leq F(B).

We say that this hyperplane strictly separates A  from B  if there is some \epsilon > 0 such that 

F(A) \leq c-\epsilon < c+ \epsilon \leq F(B).

It is instructive to draw a picture that goes with this definition. Though a statement about a hyperplane separating two sets is just a statement about functionals, it is convenient to use the geometric terminology. To make our geometric vocabulary complete, we need the following result.

Exercise F: Prove that a hyperplane F^{-1}(c) is closed if and only if it is not dense, and this happens if and only if F is continuous.

To obtain the separation theorems, we will make use of the following device. Recall the definition of convex set from Notes 2.

Definition 19 (the Minkowski functional): Let C \subseteq X be a convex and open set containing 0. Define p : X \rightarrow [0,\infty) by

p(x) = \inf\{t>0 : t^{-1}x \in C\}.

p is called the Minkowski functional of C.

Lemma 20: Let C \subseteq X be a convex and open set containing 0, and let p be its Minkowski functional. Then p is a sub-linear functional, C = \{x \in X : p(x) < 1\}, and there exists some M>0 such that p(x) \leq M\|x\| for all x \in X.

Proof: Since C is open and contains 0, there is some r > 0 such that the closed ball B(0,r) \subset C, thus for every x \neq 0, \frac{r}{\|x\|}x \in C, hence p(x) \leq \frac{1}{r}\|x\|. That takes care of the last assertion.

Assume that x \in C. The set C is open, so (1+r)x \in C for sufficiently small r >0, whence p(x) \leq (1+r)^{-1} < 1.

Assume that p(x) < 1. Thus there is some 0<t<1 for which t^{-1}x \in C. But since x lies on the segment connecting 0 and t^{-1}x, and C is convex and contains 0, x \in C. Thus C = \{x \in X : p(x)<1\}.

Clearly, p(0) = 0. For c>0, p(cx) can be written as \inf \{ct>0: (ct)^{-1}cx \in C\} = c\inf \{t>0 : t^{-1} x \in C\} = c p(x).

We proceed to prove that p is sub-additive. Let x,y \in X and r > 0. From the definition (p(x) + r)^{-1}x and (p(y) + r)^{-1}y are in C. Then every convex combination

(*)  t(p(x) + r)^{-1}x + (1-t)(p(y) + r)^{-1}y

( t \in [0,1] ) will also be in C. We now choose the value of t cleverly, so that the coefficients of x and y will be the same, and in that way we will get something times x+y. The solution of the equation

t(p(x) + r)^{-1} = (1-t)(p(y) + r)^{-1}

is t = \frac{p(x)+r}{p(x) +p(y) + 2 r}, and plugging that value of t in (*) gives (p(x) + p(y) + 2r)^{-1} (x+y) \in C. Thus, p(x+y) \leq p(x) + p(y) + 2r. Letting r tend to 0 we obtain the result.
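The definition of the Minkowski functional translates directly into a bisection procedure, which may help in building intuition for Lemma 20. The sketch below uses a hypothetical set C, the open ellipse \{(x,y) : x^2/4 + y^2 < 1\} in \mathbb{R}^2, for which p(x,y) = \sqrt{x^2/4 + y^2} is known in closed form. The upper bound hi and the iteration count are arbitrary assumptions.

```python
import math

def in_C(v):
    """Membership in the open convex set C = {(x, y): x^2/4 + y^2 < 1}."""
    x, y = v
    return x * x / 4.0 + y * y < 1.0

def minkowski(v, member=in_C, hi=1e6, iters=200):
    """Approximate p(v) = inf{t > 0 : v/t in C} by bisection.

    Relies on the fact that v/t lies in C for every t > p(v), which holds
    because C is convex and contains 0. Assumes p(v) < hi.
    """
    lo = 0.0
    for _ in range(iters):
        mid = (lo + hi) / 2.0
        if mid > 0 and member((v[0] / mid, v[1] / mid)):
            hi = mid  # v/mid is in C, so p(v) <= mid
        else:
            lo = mid  # v/mid is not in C, so p(v) >= mid
    return hi

u, w = (1.0, 1.0), (3.0, -0.5)
p_u, p_w = minkowski(u), minkowski(w)
p_sum = minkowski((u[0] + w[0], u[1] + w[1]))
exact_u = math.sqrt(u[0] ** 2 / 4.0 + u[1] ** 2)
```

For this particular C the Minkowski functional is itself a norm, because C is symmetric and bounded; for a general convex open set containing 0 it is only sub-linear.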

Theorem 21 (HB separation theorem, convex-open): Let A,B \subseteq X be two nonempty disjoint convex sets, such that B is open. Then there exists a closed hyperplane which separates A and B.

Proof: Let us first prove the case where A consists of one point. By translation invariance of all the notions involved, we may assume that A = \{x\} and 0 \in B. We want to use the HB extension theorem (Theorem 11). Let p be the Minkowski functional of B. Let f be the linear functional sending every \lambda x \in \textrm{span}\{x\} to \lambda. We will show that f(\lambda x) \leq p(\lambda x) for all \lambda. Now, p(x) \geq1, so if \lambda \geq 0, then f(\lambda x) = \lambda \leq \lambda p(x) = p(\lambda x). If \lambda < 0, then f(\lambda x) < 0 \leq p(\lambda x). By the HB extension theorem f can be extended to a functional F on X such that F(y) \leq p(y) for all y \in X. This F satisfies F(x) = 1 and F(y) \leq p(y) < 1 for y \in B, so the hyperplane F^{-1}(1) separates A and B (Question: why is this hyperplane closed?).

Now let us treat the general case. The set A - B = \{a-b : a \in A, b \in B\} is convex and open, and does not contain 0. By the previous paragraph, there is a closed hyperplane that separates A - B and \{0\} as follows

F(A-B) < 0,

so F(a) < F(b) for all a,b. If c \in [\sup F(A), \inf F(B)] then F^{-1}(c) separates A and B.
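To illustrate the one-point case in the proof above, here is a worked example in which the extension can be written explicitly. The setting (X = \mathbb{R}^2 with the Euclidean norm and inner product, B the open unit ball, and a point x_0 outside B) is a hypothetical choice made for this illustration.

```latex
\begin{align*}
& B = \{y \in \mathbb{R}^2 : \|y\|_2 < 1\}, \qquad A = \{x_0\}, \quad \|x_0\|_2 \geq 1, \qquad
  p(y) = \|y\|_2 \ \text{(the Minkowski functional of } B\text{)}, \\
& f(\lambda x_0) = \lambda, \qquad
  F(y) = \frac{\langle y, x_0 \rangle}{\|x_0\|_2^2} \ \text{extends } f, \\
& F(y) \leq \frac{\|y\|_2 \|x_0\|_2}{\|x_0\|_2^2} = \frac{\|y\|_2}{\|x_0\|_2} \leq \|y\|_2 = p(y)
  \quad \text{(Cauchy-Schwarz and } \|x_0\|_2 \geq 1\text{)}, \\
& F(y) < 1 = F(x_0) \ \text{for all } y \in B,
  \ \text{so the hyperplane } F^{-1}(1) \text{ separates } B \text{ from } A.
\end{align*}
```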

Theorem 22 (HB separation theorem, compact-closed): Let A,B \subseteq X be two nonempty disjoint convex sets, such that A is closed and B is compact. Then there exists a closed hyperplane which strictly separates A and B.

Proof: As above, we consider A - B, call this set C. As above, C is convex and does not contain 0. Since B is compact, it follows that C is closed. Let B(0,r) be a small ball disjoint from C. By Theorem 21, there is a functional F \in X^* and c \in \mathbb{R} such that

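F(a)-F(b) \leq c \leq \inf F(B(0,r)) = -\|F\|r

for all a\in A, b \in B. Since F \neq 0, this gives F(a) + \|F\|r \leq F(b) for all a \in A, b \in B, with \|F\| r > 0. It follows that A and B can be strictly separated by a hyperplane defined by F, for example F^{-1}(c') with c' = \sup F(A) + \|F\|r/2 and \epsilon = \|F\|r/2.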