In high school calculus, one is often inundated with various series convergence tests, and it is often a headache to determine which convergence test to use on a particular series. None of the high school convergence tests works on every series. For example, the ratio test is inconclusive for the harmonic series \(\sum 1/n\), while the integral test settles it immediately. This raises a natural question: is there a single universal test, that is, an algorithm that decides whether an arbitrary series converges?
The answer is no. A short and clever argument shows that an algorithm capable of determining whether an arbitrary sequence (or, equivalently, series) converges could be used to solve the halting problem, and thus no such algorithm can exist. This result may be disappointing to high school math students. It also suggests that any notion of a "barrier" between the convergent series and the divergent series would be fuzzy at best.

There are many ways to define what such a threshold could be. Intuitively, we would like to say that the threshold is some "critical rate of decay" on the terms of a series (modulo trivial modifications to the series) such that a series converges if and only if its terms decay at least at the critical rate. There are several different ways of making this precise. We will discuss the following one.

Definition: Let \(\{a_n\}_{n\in\mathbb{N}}\) be a sequence of positive numbers. The series \(\sum_{n=1}^{\infty}{a_n}\) is said to be a threshold series (and the sequence \(\{a_n\}_{n\in\mathbb{N}}\) a threshold sequence) if \(\sum_{n=1}^{\infty}{a_n|c_n|}<\infty\) if and only if the sequence \(\{c_n\}_{n\in\mathbb{N}}\) is bounded.

Indeed, this notion of a threshold series captures what we intuitively want. Of course, multiplying the terms of a convergent positive series by a bounded collection of coefficients preserves convergence, so our notion of threshold series includes all such modifications (this is what we meant by quantifying a rate of decay of the terms of a series modulo trivial modifications). The main point is that modification by an unbounded collection of coefficients will always tip a threshold series over the edge into the realm of divergent series, no matter how slowly the coefficients grow. Thus, a threshold series is a series that exhibits a "critical rate of decay" in its terms.
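To see what the definition rules out, consider the candidate \(a_n=1/n^2\). The unbounded coefficients \(c_n=\sqrt{n}\) give \(\sum a_n c_n=\sum n^{-3/2}<\infty\), so \(1/n^2\) is not a threshold sequence: convergence survives an unbounded modification. A quick numerical sketch (partial sums only, so this illustrates the claim rather than proves it; the choice of \(a_n\) and \(c_n\) is mine):

```python
# Numerical sketch: a_n = 1/n^2 is NOT a threshold sequence, because the
# unbounded coefficients c_n = sqrt(n) leave sum(a_n * c_n) convergent.

def partial_sum(terms, N):
    """Sum the first N terms of a series given as a function n -> term."""
    return sum(terms(n) for n in range(1, N + 1))

a = lambda n: 1.0 / n**2
c = lambda n: n**0.5          # unbounded coefficients

# Partial sums of sum a_n * c_n = sum n^(-3/2), which converges.
s_10k  = partial_sum(lambda n: a(n) * c(n), 10_000)
s_100k = partial_sum(lambda n: a(n) * c(n), 100_000)

# The partial sums have nearly stabilized, consistent with convergence,
# even though the coefficients c_n grow without bound.
print(s_10k, s_100k, s_100k - s_10k)
```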
It turns out that our hunch that the barrier between convergent and divergent series is fuzzy is correct, in the sense that threshold series do not exist. To prove this result, we will need the following lemma.

Lemma: Let \(X\) and \(Y\) be topological spaces and let \(\Omega\) be a dense subset of \(Y\). If \(\varphi\colon X\to Y\) is an open map, then \(\varphi^{-1}(\Omega)\) is dense in \(X\).

Proof: Pick \(x\in X\) and an open neighborhood \(U\) of \(x\). Since \(\varphi\) is an open map and \(U\) is nonempty, \(\varphi(U)\) is a nonempty open subset of \(Y\). Since \(\Omega\) is dense, there exists \(y\in\varphi(U)\cap\Omega\). In particular, since \(y\in\varphi(U)\), there exists \(x'\in U\) such that \(\varphi(x')=y\in\Omega\). Hence, \(x'\in\varphi^{-1}(\Omega)\), so \(U\cap\varphi^{-1}(\Omega)\) is nonempty. \(\square\)

Now we may proceed with our main argument. We will reason by contradiction. Suppose that there exists a threshold sequence \(\{a_n\}_{n\in\mathbb{N}}\). Let \(B(\mathbb{N})\) be the space of bounded functions on \(\mathbb{N}\), interpreted as a Banach space with the uniform norm (i.e., the supremum norm). Let \(\mu\) be the counting measure on \(\mathbb{N}\), so that \(L^1(\mu)\) is precisely the collection of absolutely summable sequences. Define the map \(T\colon B(\mathbb{N})\to L^1(\mu)\) given by \((Tf)(n)=a_nf(n)\) for every \(n\in\mathbb{N}\) and \(f\in B(\mathbb{N})\). Because \(\{a_n\}_{n\in\mathbb{N}}\) is a threshold sequence, the image of \(T\) is a subset of \(L^1(\mu)\), so \(L^1(\mu)\) is a valid codomain. It is straightforward to check that \(T\) is a surjective linear map between Banach spaces; surjectivity uses the other direction of the threshold property, since for \(h\in L^1(\mu)\) we have \(\sum_{n=1}^{\infty}{a_n\left|\frac{h(n)}{a_n}\right|}=\sum_{n=1}^{\infty}{|h(n)|}<\infty\), so the sequence \(\left\{\frac{h(n)}{a_n}\right\}_{n\in\mathbb{N}}\) is bounded and \(T\) maps it to \(h\). Suppose that we have a sequence of functions \(\{g_n\}_{n\in\mathbb{N}}\subseteq B(\mathbb{N})\) such that \(g_n\to g\) and \(Tg_n\to h\), where the convergence occurs with respect to the norms of the relevant Banach spaces. Pick \(\epsilon>0\) and fix \(k\in\mathbb{N}\).
Since \(g_n\to g\) in uniform norm, we have that \(g\) is the pointwise limit of the \(g_n\). In particular, we may pick \(N_1\) so large that for all \(n>N_1\) we have \(|g_n(k)-g(k)|<\frac{\epsilon}{2a_k}\). Since \(Tg_n\to h\) in the \(L^1\) norm, we pick \(N_2\) so large that for all \(n>N_2\) we have \[|a_kg_n(k)-h(k)|\leq\sum_{j=1}^{\infty}{|a_jg_n(j)-h(j)|}=\sum_{j=1}^{\infty}{|Tg_n(j)-h(j)|}=\|Tg_n-h\|_1<\frac{\epsilon}{2}.\] Now, for \(n>\max{(N_1,N_2)}\), we have \[\begin{split} |(Tg)(k)-h(k)|&=|a_kg(k)-h(k)|\\ &=|a_kg(k)-a_kg_n(k)+a_kg_n(k)-h(k)|\\ &\leq|a_kg(k)-a_kg_n(k)|+|a_kg_n(k)-h(k)|\\ &<\frac{\epsilon}{2}+\frac{\epsilon}{2}=\epsilon. \end{split}\] Since \(\epsilon\) is arbitrary, we have that \((Tg)(k)=h(k)\). Thus, \(Tg=h\). We have shown that \(T\) is a closed linear map, hence \(T\) is continuous by the closed graph theorem. Now since \(T\) is a surjective continuous linear map, \(T\) is open by the open mapping theorem.

Let us define \[S=\left\{f\in B(\mathbb{N}): \left|f^{-1}(\mathbb{R}\setminus\{0\})\right|<\infty\right\},\] the set of finitely supported sequences. Regarding \(S\) as a subset of \(L^1(\mu)\) as well, it is clear that \(T^{-1}(S)=S\). Notice that for any \(f\in S\), if \(\chi_{\mathbb{N}}\in B(\mathbb{N})\) is the constant function \(1\), \[\sup_{n\in\mathbb{N}}{|\chi_{\mathbb{N}}(n)-f(n)|}\geq\sup_{n\in f^{-1}(\{0\})}{|\chi_{\mathbb{N}}(n)-f(n)|}=1.\] Hence, \(S\) is not dense in \(B(\mathbb{N})\). Now pick \(\epsilon>0\) and \(f\in L^1(\mu)\). By definition, \(\sum_{n=1}^{\infty}{|f(n)|}<\infty\), so we may pick \(N\) so large that for \(m\geq N\) we have \(\sum_{n=m}^{\infty}{|f(n)|}<\epsilon\). Define the function \(g\) by \[g(n)=\begin{cases} f(n) & n<N\\ 0 & n\geq N. \end{cases}\] Note that \(g\in S\) by construction. Moreover, \[\|f-g\|_1=\sum_{n=1}^{\infty}{|f(n)-g(n)|}=\sum_{n=N}^{\infty}{|f(n)|}<\epsilon.\] This means that \(S\) is dense in \(L^1(\mu)\). So \(S\) is dense in \(L^1(\mu)\) but not in \(B(\mathbb{N})\), and \(T\colon B(\mathbb{N})\to L^1(\mu)\) is an open map with \(T^{-1}(S)=S\).
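The two density claims can be checked numerically on a concrete example. Below, truncation makes the \(\ell^1\) distance to \(f(n)=1/n^2\) small, while every finitely supported sequence stays at uniform distance at least \(1\) from the constant sequence \(\chi_{\mathbb{N}}\) (a sketch with a finite horizon standing in for \(\mathbb{N}\); the example function and horizon are my choices):

```python
# Sketch: truncations g = f * 1_{n < N} approximate f in the l^1 norm,
# but finitely supported sequences cannot approximate the constant 1
# in the uniform (sup) norm.

HORIZON = 1_000_000  # finite stand-in for N, large enough to see the tail vanish

f = [1.0 / n**2 for n in range(1, HORIZON + 1)]

def l1_dist_to_truncation(N):
    """||f - g||_1 where g keeps the first N terms of f and zeroes the rest."""
    return sum(f[N:])

# The l^1 error of truncation goes to 0: S is dense in L^1(mu).
err_100 = l1_dist_to_truncation(100)
err_10k = l1_dist_to_truncation(10_000)

# Any finitely supported g has sup_n |1 - g(n)| >= 1, witnessed at any
# index outside its support: S is not dense in B(N).
def sup_dist_to_one(support_size):
    g = [1.0] * support_size + [0.0] * (HORIZON - support_size)
    return max(abs(1.0 - g[n]) for n in range(HORIZON))

print(err_100, err_10k, sup_dist_to_one(500))
```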
This contradicts the lemma, completing the argument. This is one of my favorite problems because it provides some insight into something I often wondered about when I was younger (the "barrier" between convergent and divergent series) using some comparatively abstract techniques from functional analysis. A lot was swept under the rug via the open mapping and closed graph theorems. I find it fascinating that these abstract results can tell us something relatively tangible about series.

The problem of measuring how quickly the terms of a series decay isn't just one from my big bag of problems that I find interesting. It is in fact well studied in complex analysis, where many results are known regarding how quickly the zeros of an entire function grow. Allegedly, this has significant ramifications in analytic number theory. At some point in the future, I will talk about the critical exponent, the order of an entire function, and the relationship between the growth of an entire function and the distribution of its zeros.
One of my favorite ideas in all of mathematics is to study the topology of a space by studying functions on the space. This is the underlying idea of Morse theory, which I hope to learn more about. A huge set of examples of this idea that I am more familiar with comes from complex analysis.
One may observe that in the examples that I have given, the functions we are studying to probe the topology possess some non-topological properties. For instance, holomorphic functions famously have some extremely strong properties, most of which are not topological in nature at all. The same goes for harmonic functions, which share many properties with holomorphic functions. In general, differentiability is not a property of a function that interacts much with the topology of the domain. So it is natural to wonder whether we can study the topology of a space by studying functions that obey no assumption other than that they interact somehow with the topology. It also seems reasonable that the topology should be uniquely determined by such functions. More precisely, let \(X\) be a topological space and let \(C(X)\) be the ring of real-valued continuous functions on \(X\).

Question: Can one recover the topology on \(X\) given \(C(X)\)?

In this blog post, we will show that the answer is in the affirmative. We will focus on the following special case: fix \(X\) to be a compact Hausdorff topological space. Consider the spectrum of the ring \(C(X)\), which we denote \(\text{Spec }C(X)\). We can interpret the spectrum as a topological space by giving it the Zariski topology. Let \(\mathscr{M}\) be the set of maximal ideals of \(C(X)\). Since every maximal ideal is prime, \(\mathscr{M}\subseteq\text{Spec }C(X)\), and we can endow \(\mathscr{M}\) with the subspace topology. We sometimes refer to \(\mathscr{M}\) as the maximal spectrum of \(C(X)\). The incredible fact which we will prove is that \(X\) is homeomorphic to \(\mathscr{M}\).

For every \(x\in X\) define \(I_x=\left\{f\in C(X)\colon f(x)=0\right\}\). Clearly, \(I_x\) is an ideal of \(C(X)\). What is less clear is that \(I_x\) is always a maximal ideal. We will show this in two different ways. In the first method, we will show that any ideal properly containing \(I_x\) is the full ring \(C(X)\).
Fix \(x\in X\) and pick \(g\in C(X)\setminus I_x\). Since \(X\) is Hausdorff, \(\{x\}\) is closed, and since \(g\) is continuous, \(g^{-1}(\{0\})\) is closed (and disjoint from \(\{x\}\), since \(g\notin I_x\)). Recall that every compact Hausdorff space is normal (\(T_4\)), so by Urysohn's lemma, there exists a continuous function \(f\colon X\to\mathbb{R}\) such that \(f(x)=0\) but \(f(y)=1\) for all \(y\in g^{-1}(\{0\})\). Notice that \(f\in I_x\). Moreover, \(f\) and \(g\) have no common zeros by construction. Therefore, \(f^2+g^2\in\langle f,g\rangle\) is always positive, so the multiplicative inverse \(\frac{1}{f^2+g^2}\) exists in \(C(X)\). Since ideals absorb multiplication by arbitrary ring elements, \(\chi_X=(f^2+g^2)\cdot\frac{1}{f^2+g^2}\in\langle f,g\rangle\). So the ideal \(\langle f,g\rangle\) contains the identity element of the ring, and thus \[C(X)=\langle f,g\rangle\subseteq\langle I_x,g\rangle\subseteq C(X).\] Hence, \(I_x\) is a maximal ideal as claimed. We have established that \(\{I_x\}_{x\in X}\subseteq\mathscr{M}\).

It turns out that this method is quite clumsy. A quicker way to establish that \(I_x\) is maximal is to notice that it is the kernel of the evaluation homomorphism \(C(X)\to\mathbb{R}\) that maps \(f\mapsto f(x)\). Since the homomorphism is clearly surjective, the first isomorphism theorem tells us that \(C(X)/I_x\cong\mathbb{R}\), which is a field. This immediately tells us that \(I_x\) is maximal. Hence, Urysohn's lemma is not (yet) required. The fact that \(\{I_x\}_{x\in X}\subseteq\mathscr{M}\) is purely algebraic.

We want to establish the reverse inclusion as well. This is tantamount to showing that every maximal ideal of \(C(X)\) is of the form \(I_x\) for an appropriate choice of \(x\in X\). Let us study a "rogue" maximal ideal \(I\) that is not of the form \(I_x\) for any \(x\in X\). Since \(I\) is maximal and \(I_x\) is maximal for every \(x\in X\), the containment \(I\subseteq I_x\) would immediately imply \(I=I_x\).
Hence, \(I\) is not contained in any ideal of the form \(I_x\). This means that for each \(x\in X\), there exists \(f_x\in I\) such that \(f_x(x)\neq0\). For each \(x\in X\), by the continuity of \(f_x\) and the fact that \(f_x(x)\neq0\), there exists an open neighborhood \(U_x\) of \(x\) such that \(0\notin f_x(U_x)\). This gives us an open cover \(\{U_x\}_{x\in X}\) (notice that to form this open cover, we are invoking the axiom of choice). By compactness, we may extract a finite subcover \(\left\{U_{x_j}\right\}_{j=1}^{n}\). By construction, for each \(x\in X\), there exists at least one \(1\leq j\leq n\) such that \(f_{x_j}(x)\neq0\). So the functions \(f_{x_1},\dots,f_{x_n}\) have no common zero. This means that the function \(f_{x_1}^2+\dots+f_{x_n}^2\) is always positive, and so \(\frac{1}{f_{x_1}^2+\dots+f_{x_n}^2}\) is a well-defined continuous function on \(X\). Since \(f_{x_1}^2+\dots+f_{x_n}^2\in I\), we have that \(\chi_X=(f_{x_1}^2+\dots+f_{x_n}^2)\cdot\frac{1}{f_{x_1}^2+\dots+f_{x_n}^2}\in I\). This is a contradiction: no maximal ideal is the unit ideal. Hence, no "rogue" maximal ideals exist. This establishes that \(\mathscr{M}=\{I_x\}_{x\in X}\).

Notice that the paragraph above uses the same sum-of-squares trick that we used when we clumsily showed that \(\{I_x\}_{x\in X}\subseteq\mathscr{M}\). In particular, we are using the general fact that the ideal generated by any finite collection of functions in \(C(X)\) that share no common zero is the unit ideal; this is essentially what we proved in the previous paragraph.

Now consider the well-defined map \(\varphi\colon X\to\mathscr{M}\) defined by \(\varphi(x)=I_x\). The above establishes that this map is a surjection. A more subtle point is injectivity. This is where we truly need Urysohn's lemma. Pick \(x,y\in X\) to be distinct points.
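The sum-of-squares trick can be made concrete on \(X=[0,1]\). The two functions below are hypothetical choices sharing no common zero, so \(f^2+g^2\) is bounded away from \(0\) on the compact \(X\), its reciprocal is a bona fide element of \(C(X)\), and the constant function \(1\) lies in \(\langle f,g\rangle\) (a grid-based sketch, not a proof):

```python
# Sketch of the sum-of-squares trick on X = [0, 1]: two continuous
# functions with no common zero generate the unit ideal, because
# f^2 + g^2 is bounded below by a positive constant on a compact space.
import math

f = lambda y: y - 0.5            # vanishes only at y = 0.5
g = lambda y: math.cos(3.0 * y)  # cos(1.5) != 0, so no common zero with f

grid = [k / 10_000 for k in range(10_001)]

# On the grid, f^2 + g^2 stays away from 0 ...
min_sq = min(f(y)**2 + g(y)**2 for y in grid)

# ... so 1/(f^2 + g^2) is continuous, and (f^2 + g^2) * 1/(f^2 + g^2) is
# the constant function 1 -- the identity of C(X) lies in the ideal <f, g>.
h = lambda y: 1.0 / (f(y)**2 + g(y)**2)
unit_error = max(abs((f(y)**2 + g(y)**2) * h(y) - 1.0) for y in grid)

print(min_sq, unit_error)
```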
Since compact Hausdorff spaces are normal, and \(\{x\}\) and \(\{y\}\) are disjoint closed sets, by Urysohn's lemma there exists \(f\in C(X)\) such that \(f(x)=0\) and \(f(y)=1\neq0\). This shows that \(I_x\neq I_y\), which establishes that \(\varphi\) is an injection and thus a bijection.

We will establish that \(\varphi\) is in fact a homeomorphism. To do this, we will construct a basis for the topology of \(X\) and for the topology of \(\mathscr{M}\), and show that \(\varphi\) induces a bijection between those bases. For each \(f\in C(X)\), define \[U_f=f^{-1}\left(\mathbb{R}\setminus\{0\}\right),\qquad \tilde{U}_f=\left\{I\in\mathscr{M}\colon f\notin I\right\}.\] We claim that \(\{U_f\}_{f\in C(X)}\) and \(\{\tilde{U}_f\}_{f\in C(X)}\) form bases for the topologies on \(X\) and \(\mathscr{M}\), respectively. To check this, we will use the following standard result from point-set topology. A collection \(\mathscr{E}\) of open subsets of a topological space is a basis for the topology if and only if, for every open set \(U\) and every point \(x\in U\), there exists \(E\in\mathscr{E}\) such that \(x\in E\subseteq U\).
We continue to establish the claim that \(\{\tilde{U}_f\}_{f\in C(X)}\) forms a basis for the topology on \(\mathscr{M}\). This is easy with a little knowledge of the Zariski topology on the spectrum of a ring. Define \[X_f=\left\{I\in\text{Spec }C(X)\colon f\notin I\right\}.\] It is a standard fact that \(\{X_f\}_{f\in C(X)}\) forms a basis for the Zariski topology. Since \(\tilde{U}_f=\mathscr{M}\cap X_f\) for every \(f\in C(X)\), it is clear that \(\{\tilde{U}_f\}_{f\in C(X)}\) forms a basis for the subspace topology on \(\mathscr{M}\).

Finally, we will establish that for every \(f\in C(X)\), we have \(\varphi(U_f)=\tilde{U}_f\). But this can be done in a single line. \[\varphi(U_f)=\left\{I_x\in\mathscr{M}\colon f(x)\neq0\right\}=\left\{I\in\mathscr{M}\colon f\notin I\right\}=\tilde{U}_f.\] We conclude that \(\varphi\) is a homeomorphism.

What is interesting is how we employed the assumptions that \(X\) is Hausdorff and compact. Urysohn's lemma was used in a crucial way to establish that \(\varphi\) is injective, and for this we needed that \(X\) is normal (which uses both assumptions). The compactness assumption was used by itself in the proof that \(\varphi\) is surjective (i.e., the proof of the fact that \(\mathscr{M}=\{I_x\}_{x\in X}\)). However, the astute reader may argue that by proving that \(\{U_f\}_{f\in C(X)}\) forms a basis for the topology on \(X\), we accomplished exactly what we wanted to: we found a way to reconstruct the topology of \(X\) given \(C(X)\). In particular, we used the elements of \(C(X)\) to construct a basis for the topology on \(X\). In doing this, we used neither compactness nor the Hausdorff assumption directly; the construction makes sense for any topological space (though for \(\{U_f\}_{f\in C(X)}\) to actually be a basis, the space must carry enough continuous functions; complete regularity is the relevant condition). The issue is that the construction relies heavily on an understanding of the individual continuous functions in \(C(X)\).
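For \(X=[0,1]\), the basic open sets \(U_f\) are easy to see concretely: the tent-like function below (a hypothetical choice of mine) is continuous and nonzero exactly on a prescribed interval \((a,b)\), so \(U_f=(a,b)\) recovers an arbitrary basic open set of \(X\). A grid-based sketch:

```python
# Sketch: for X = [0,1], each open interval (a,b) arises as U_f for the
# continuous function f(y) = max(0, (y-a)(b-y)), which is nonzero
# exactly on (a, b). Checked on a grid.

a, b = 0.25, 0.75

def f(y):
    return max(0.0, (y - a) * (b - y))

grid = [k / 1000 for k in range(1001)]

# U_f restricted to the grid: points where f is nonzero.
U_f = [y for y in grid if f(y) != 0.0]

inside_ok = all(a < y < b for y in U_f)                       # U_f inside (a,b)
outside_ok = all(f(y) == 0.0 for y in grid if not (a < y < b))  # f = 0 off (a,b)
print(inside_ok, outside_ok, min(U_f), max(U_f))
```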
Usually, it is very difficult to compute preimages of arbitrary continuous functions on \(X\). Hence, we would like a better, more direct way to characterize the topology on \(X\). Showing that \(X\) is homeomorphic to \(\mathscr{M}\) (at the expense of some assumptions) gives us a complete picture of the topology (not just a basis), and it relies more on the ring structure of \(C(X)\) than on the actual behavior of the functions in \(C(X)\). From a theoretical point of view, this is a "nicer" characterization of the topology: it is an entirely algebraic one. So while it is true that \(C(X)\) always uniquely determines the topology on \(X\), there is an especially nice algebraic way to represent this topology in the case that \(X\) is compact and Hausdorff.

This raises the question: what goes wrong with our algebraic characterization when we remove either the assumption of compactness or the Hausdorff assumption? Since the Hausdorff assumption is a separation axiom, it is fairly intuitive why things may go wrong when it is removed. What is more interesting is what happens if we remove compactness. Let us study what happens when we remove the compactness assumption from a topological subspace \(X\subseteq\mathbb{R}\). By the Heine-Borel theorem, compactness in this context is equivalent to being closed and bounded, so let us separately remove the assumption of being closed and the assumption of being bounded to see what goes wrong in both cases.

First, suppose \(X=(0,1)\). This is a set that is bounded but not closed. Let \(J=\left\{f\in C(X)\colon\lim_{y\to 1^-}{f(y)}=0\right\}\). It is easy to check that \(J\) is a proper ideal. However, \(J\) is not contained in \(I_x\) for any \(x\in X\), because the function \(g(y)=y-y^2\) is in \(J\) but not in any \(I_x\), since \(g\) is positive on \(X\). Therefore, any maximal ideal containing \(J\) (one exists, since \(J\) is a proper ideal) is not equal to any of the \(I_x\). So in this case, the inclusion \(\{I_x\}_{x\in X}\subseteq\mathscr{M}\) is strict.
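The behavior of \(g(y)=y-y^2\) near the missing endpoint can be sketched numerically: \(g\) is strictly positive on \((0,1)\) (so it lies in no \(I_x\)), yet \(g(y)\to0\) as \(y\to1^-\), placing it in \(J\). A finite grid check, illustrative only:

```python
# Sketch: on X = (0,1), g(y) = y - y^2 is positive everywhere on X
# (so g lies in no ideal I_x), yet g(y) -> 0 as y -> 1^-, so g lies in
# the ideal J of functions vanishing at the missing endpoint.

g = lambda y: y - y * y

# Strict positivity on a grid interior to (0, 1):
grid = [k / 10_000 for k in range(1, 10_000)]
always_positive = all(g(y) > 0 for y in grid)

# Decay toward the missing endpoint y = 1: g(1 - eps) = (1 - eps) * eps.
near_one = [g(1 - 10.0**(-j)) for j in range(1, 7)]

print(always_positive, near_one)
```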
Now, suppose that \(X=[0,\infty)\). This is a set that is closed but not bounded. In this case, let \(J=\left\{f\in C(X)\colon\lim_{y\to\infty}{f(y)}=0\right\}\). Once again, this is a proper ideal. Moreover, the function \(g(y)=e^{-y}\) is in \(J\) but in none of the \(I_x\), since \(g\) is positive on \(X\). So in this case as well, the inclusion \(\{I_x\}_{x\in X}\subseteq\mathscr{M}\) is strict.

There is one last interesting note. Recall that when we formed an open cover in the argument, we remarked that we were invoking the axiom of choice. This was used to establish that \(\mathscr{M}=\{I_x\}_{x\in X}\). It turns out that this equality can be proven without the axiom of choice using only the assumption that \(X\) is a complete, totally bounded metric space.

Here is a proof of the analyticity of harmonic functions. This is a bit hard to do from scratch, so I will assume various facts and use notation that is standard (at least in Evans). We denote the volume of the unit ball in \(\mathbb{R}^n\) by \(\alpha(n)\). The surface measure of the unit sphere in \(\mathbb{R}^n\) is then \(n\alpha(n)\). We will also denote multi-indices using \(\alpha\).
We begin with the assumption that \(u\) is harmonic in the open subset \(U\subseteq\mathbb{R}^n\). We will employ the following bound on the derivatives of \(u\). Let \(\alpha\) be a multi-index of order \(|\alpha|=k\), and let \(B_r(x_0)\subseteq U\). Then, \[|D^{\alpha}u(x_0)|\leq\frac{C_k}{r^{n+k}}||u||_{L^1(B_r(x_0))},\] where the constants \(C_k\) are given by \(C_0=\frac{1}{\alpha(n)}\) and \(C_k=\frac{(2^{n+1}nk)^k}{\alpha(n)}\) for \(k>0\). This estimate follows from strong induction on \(k\) and the mean-value property of harmonic functions.

Fix \(x_0\in U\). If we set \(r=\frac{1}{4}\inf_{x\in\partial U}{|x-x_0|}\) (which is positive since \(U\) is open) and \(M=\frac{||u||_{L^1(B_{2r}(x_0))}}{\alpha(n)r^n}<\infty\), and apply the above estimate on the derivatives, we obtain for any \(x\in B_r(x_0)\) \[|D^{\alpha}u(x)|\leq\frac{(2^{n+1}n|\alpha|)^{|\alpha|}}{r^{n+|\alpha|}\alpha(n)}||u||_{L^1(B_r(x))}=\left[\frac{||u||_{L^1(B_r(x))}}{\alpha(n)r^n}\right]\left(\frac{2^{n+1}n|\alpha|}{r}\right)^{|\alpha|}.\] By the triangle inequality, \(B_r(x)\subseteq B_{2r}(x_0)\), so the bracketed factor on the right-hand side of the above inequality is at most \(M\). This gives us \[|D^{\alpha}u(x)|\leq M\left(\frac{2^{n+1}n}{r}\right)^{|\alpha|}|\alpha|^{|\alpha|}.\]

Now, for every \(k\in\mathbb{N}\), notice that \(\frac{k^k}{k!}\) is merely one term in the series expansion of \(e^k\), so \(\frac{k^k}{k!}\leq e^k\), i.e., \(k^k\leq k!\,e^k\). It follows that \(|\alpha|^{|\alpha|}\leq|\alpha|!\,e^{|\alpha|}\). Moreover, the multinomial theorem states that \((x_1+\dots+x_n)^k=\sum_{|\alpha|=k}{\binom{|\alpha|}{\alpha}x^{\alpha}}\), where \(x^{\alpha}=\prod_{j=1}^{n}{x_j^{\alpha_j}}\) and \(\binom{|\alpha|}{\alpha}=\frac{|\alpha|!}{\alpha!}=\frac{|\alpha|!}{\prod_{j=1}^{n}{\alpha_j!}}\). It follows that \(n^k=(1+\dots+1)^k=\sum_{|\alpha|=k}{\frac{|\alpha|!}{\alpha!}}\).
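The two counting facts in play here, the bound \(k^k\leq k!\,e^k\) (each \(k^k/k!\) being one term of the series for \(e^k\)) and the multinomial identity \(n^k=\sum_{|\alpha|=k}|\alpha|!/\alpha!\), can be verified directly for small parameters; a sketch by brute-force enumeration:

```python
# Sketch: verify k^k <= k! * e^k and the multinomial identity
# n^k = sum over |alpha| = k of |alpha|!/alpha!, by enumerating
# multi-indices for small n and k.
import math
from itertools import product

def multi_indices(n, k):
    """All multi-indices alpha in N^n with |alpha| = k."""
    return [a for a in product(range(k + 1), repeat=n) if sum(a) == k]

def multinomial(alpha):
    """|alpha|! / alpha! = |alpha|! / (alpha_1! * ... * alpha_n!)."""
    out = math.factorial(sum(alpha))
    for a in alpha:
        out //= math.factorial(a)
    return out

# Identity n^k = sum over |alpha| = k of |alpha|!/alpha!
identity_holds = all(
    sum(multinomial(a) for a in multi_indices(n, k)) == n**k
    for n in range(1, 5) for k in range(0, 6)
)

# Bound k^k <= k! * e^k, because k^k/k! is a single term of e^k's series.
bound_holds = all(k**k <= math.factorial(k) * math.e**k for k in range(1, 20))

print(identity_holds, bound_holds)
```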
So for any fixed \(\alpha\), \(\frac{|\alpha|!}{\alpha!}\) is a single term in the expansion of \(n^{|\alpha|}\), and thus \(\frac{|\alpha|!}{\alpha!}\leq n^{|\alpha|}\), or \(|\alpha|!\leq n^{|\alpha|}\alpha!\). Combining all of these estimates, first using \(|\alpha|^{|\alpha|}\leq|\alpha|!\,e^{|\alpha|}\), we have \[||D^{\alpha}u||_{L^{\infty}(B_r(x_0))}\leq\sup_{x\in B_r(x_0)}{|D^{\alpha}u(x)|}\leq M\left(\frac{2^{n+1}n}{r}\right)^{|\alpha|}|\alpha|^{|\alpha|}\leq M\left(\frac{2^{n+1}ne}{r}\right)^{|\alpha|}|\alpha|!.\] Then, since \(|\alpha|!\leq n^{|\alpha|}\alpha!\), we obtain the estimate \[||D^{\alpha}u||_{L^{\infty}(B_r(x_0))}\leq M\left(\frac{2^{n+1}n^2e}{r}\right)^{|\alpha|}\alpha!\]

The Taylor expansion of \(u\) about \(x_0\) is given by \[\sum_{\alpha}{\frac{D^{\alpha}u(x_0)}{\alpha!}(x-x_0)^{\alpha}}.\] We wish to show that this power series converges in some neighborhood of \(x_0\). To do this, we study the remainder term, \[R_N(x)=u(x)-\sum_{k=0}^{N-1}{\sum_{|\alpha|=k}{\frac{D^{\alpha}u(x_0)}{\alpha!}(x-x_0)^{\alpha}}}.\] Consider the function \(g\colon\mathbb{R}\to\mathbb{R}\) given by \(g(t)=u(x_0+t(x-x_0))\). By applying the mean-value remainder form of Taylor's theorem to the Taylor expansion of \(g\) about \(0\), evaluated at \(t=1\), we obtain \[R_N(x)=g(1)-\sum_{k=0}^{N-1}{\sum_{|\alpha|=k}{\frac{D^{\alpha}u(x_0)}{\alpha!}(x-x_0)^{\alpha}}}=\sum_{|\alpha|=N}{\frac{D^{\alpha}u(x_0+\lambda(x-x_0))}{\alpha!}(x-x_0)^{\alpha}}\] for some \(\lambda\in[0,1]\). Now, suppose that we are considering \(x\) such that \(|x-x_0|<\frac{r}{2^{n+2}n^3e}\). In particular, we will have \(x\in B_r(x_0)\), so by the convexity of the ball, \(x_0+\lambda(x-x_0)\in B_r(x_0)\) and we will have access to our estimate of the \(L^{\infty}\) norm of \(D^{\alpha}u\).
Combining all of the bounds, we obtain \[|R_N(x)|\leq M\sum_{|\alpha|=N}{\left(\frac{2^{n+1}n^2e}{r}\right)^N\left(\frac{r}{2^{n+2}n^3e}\right)^N}=M\sum_{|\alpha|=N}{\frac{1}{(2n)^N}}.\] Each multi-index of order \(N\) can be recorded by an \(N\)-tuple of integers from the set \(\{1,2,\dots,n\}\) (listing, with multiplicity, the coordinates on which the \(N\) derivatives fall), so there are at most \(n^N\) multi-indices in the sum. Hence, \[M\sum_{|\alpha|=N}{\frac{1}{(2n)^N}}\leq\frac{Mn^N}{(2n)^N}=\frac{M}{2^N}.\] Thus, the remainder vanishes as we take \(N\to\infty\), and the Taylor series converges in the neighborhood we have defined.

One of the greatest advantages of the Lebesgue integral over the Riemann integral is its good behavior under limits. There are three big theorems in measure theory that demonstrate this feature of the Lebesgue integral. The first of these is the monotone convergence theorem.
Monotone Convergence Theorem: If \(\{f_n\}\) is a sequence in \(L^+\) such that \(f_j\leq f_{j+1}\) for all \(j\), then \(\int{\lim_{n\to\infty}{f_n}}=\lim_{n\to\infty}{\int{f_n}}\).

Proof: Suppose that the measure space we are working in is \((X,\mathscr{M},\mu)\). First, note that \(\{f_n\}\) is an increasing sequence of functions, so its pointwise limit exists in \([0,\infty]\); put \(f=\lim_{n\to\infty}{f_n}\). Likewise, \(\{\int{f_n}\}\) is an increasing sequence in \([0,\infty]\), so \(\lim_{n\to\infty}{\int{f_n}}\) exists, at least in the extended real numbers. Secondly, since \(\int{f_n}\leq\int{f}\) for all \(n\), we also have that \(\lim_{n\to\infty}{\int{f_n}}\leq\int{f}\). It remains to prove the reverse inequality.

Pick a simple function \(\phi\) such that \(0\leq\phi\leq f\). Let \(\alpha\in(0,1)\) be arbitrary. Define \(E_n=\{x\in X\colon f_n(x)\geq \alpha\phi(x)\}\). Since the \(f_n\) are increasing, \(f_n(x)\geq\alpha\phi(x)\) implies \(f_m(x)\geq\alpha\phi(x)\) for all \(m\geq n\). Hence, the \(E_n\) are an increasing sequence of sets: \(E_1\subseteq E_2\subseteq \dots\). Fix \(x\in X\). If \(\phi(x)=0\), then trivially \(x\in E_n\) for every \(n\). Otherwise, \(\alpha\phi(x)<\phi(x)\leq f(x)\), and since \(\{f_n(x)\}\) is an increasing sequence converging to \(f(x)\), there exists some \(N\) such that \(f_n(x)\geq\alpha\phi(x)\) for all \(n>N\), so \(x\in E_n\) for all \(n>N\). In particular, \(X=\bigcup_{n=1}^{\infty}{E_n}\). By the definitions, we have \[\int{f_n}\geq\int_{E_n}{f_n}\geq\alpha\int_{E_n}{\phi}.\] Hence, \[\lim_{n\to\infty}{\int{f_n}}\geq\alpha\lim_{n\to\infty}{\int_{E_n}{\phi}}.\] Recall that for \(E\in\mathscr{M}\), the map \(E\mapsto\int_{E}{\phi}\) is itself a measure, say \(\nu\).
Since the \(E_n\) are increasing, by continuity from below of the measure \(\nu\), we have that \[\lim_{n\to\infty}{\int_{E_n}{\phi}}=\int_{\bigcup_{n=1}^{\infty}{E_n}}{\phi},\] so our inequality becomes \[\lim_{n\to\infty}{\int{f_n}}\geq\alpha\int{\phi}.\] Since this holds for every \(\alpha\in(0,1)\), we may let \(\alpha\to1^-\) to obtain \[\lim_{n\to\infty}{\int{f_n}}\geq\int{\phi}.\] Finally, taking the supremum over all simple functions \(\phi\leq f\), we obtain the desired inequality \[\lim_{n\to\infty}{\int{f_n}}\geq\int{f}.\] \(\square\)

The definition of \(\int{f}\) requires us to consider the supremum of the set of integrals of simple functions that do not exceed \(f\), but the monotone convergence theorem tells us that we may instead compute the integral as \(\int{f}=\lim_{n\to\infty}{\int{\phi_n}}\), where \(\{\phi_n\}\) is an increasing sequence of simple functions that converges to \(f\) almost everywhere. It is a standard first result in measure theory that such sequences of simple functions exist.

The monotone convergence theorem allows us to interchange integrals with limits when the limit is taken over an increasing sequence of functions. But if the dream is to be able to interchange the integral with a limit for all sequences of functions, we must prepare to be disappointed. The classic pathology is the sequence of indicator functions \(f_n=\chi_{(n,n+1)}\). Of course, the sequence \(\{f_n\}\) converges pointwise to the function \(f(x)=0\), but \(\int{f_n}=1\) for all \(n\), so we clearly cannot interchange the limit and the integral in this scenario. The issue here is that even though there is convergence to a nice function, a non-negligible mass is preserved in every function of the sequence. A related example is given by \(f_n=n\chi_{\left(0,\frac{1}{n}\right)}\). Once again, the integral of each function in the sequence is \(1\), but the sequence converges pointwise to \(0\).
Again, the issue is that a non-negligible mass is preserved in every function of the sequence. A reasonable objection may be the following. In both examples above, the convergence is pointwise but not uniform, so perhaps the limit and integral fail to be interchangeable because the convergence is not strong enough. This is also wrong: consider \(f_n=\frac{1}{n}\chi_{(0,n)}\). In this case, \(f_n\to 0\) not just pointwise, but uniformly. Nonetheless, \(\int{f_n}=1\) for every \(n\), so the limit and integral cannot be interchanged. It turns out that the relationship between the standard modes of convergence and the limiting behavior of integrals is a subtle one, which I will talk about in the future. So the failures that we have demonstrated are truly an inherent limitation of the Lebesgue integral, not a weakness of the form of convergence. Nonetheless, there is something we can say when we have no condition on the sequence of functions (not even the condition that they converge to something).

Fatou's Lemma: If \(\{f_n\}\) is any sequence in \(L^+\), then \[\int{\liminf{f_n}}\leq\liminf{\int{f_n}}.\]

Proof: For every \(k\geq1\) and \(j\geq k\) we have that \(\inf_{n\geq k}{f_n}\leq f_j\) and thus \(\int{\inf_{n\geq k}{f_n}}\leq\int{f_j}\). Taking the infimum over \(j\geq k\) on the right-hand side preserves this: \[\int{\inf_{n\geq k}{f_n}}\leq\inf_{j\geq k}{\int{f_j}}.\] For every \(k\), let \(g_k=\inf_{n\geq k}{f_n}\) and consider the sequence of functions \(\{g_k\}\). Clearly, \(g_1\leq g_2\leq\dots\). So by the monotone convergence theorem, we have that \[\int{\liminf_{n\to\infty}{f_n}}=\int{\lim_{k\to\infty}{g_k}}=\lim_{k\to\infty}{\int{g_k}}\leq\lim_{k\to\infty}{\inf_{j\geq k}{\int{f_j}}}=\liminf_{n\to\infty}{\int{f_n}}.\] \(\square\)

Fatou's lemma can be used to derive another proof of the monotone convergence theorem.
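The three escaping-mass examples can be checked numerically. Each \(f_n\) has integral exactly \(1\) (height times width, read off the indicator structure), while the pointwise limit is \(0\), and the third family even converges uniformly. A sketch; the sample points and horizons are arbitrary choices of mine:

```python
# Sketch: three sequences of functions, each of integral 1 for every n,
# all converging pointwise to 0: mass escapes to the right, concentrates
# at 0, or spreads out flat. The last converges uniformly as well.

def f_travel(n, x): return 1.0 if n < x < n + 1 else 0.0         # chi_(n,n+1)
def f_spike(n, x):  return float(n) if 0 < x < 1.0 / n else 0.0  # n*chi_(0,1/n)
def f_flat(n, x):   return 1.0 / n if 0 < x < n else 0.0         # (1/n)*chi_(0,n)

# Each integral is height * width = 1 (no quadrature needed for
# indicators); check the products up to floating-point rounding.
all_one = all(
    abs(1.0 * 1.0 - 1.0) < 1e-12 and
    abs(n * (1.0 / n) - 1.0) < 1e-12 and
    abs((1.0 / n) * n - 1.0) < 1e-12
    for n in range(1, 200)
)

# At any fixed x, f_travel and f_spike are eventually 0:
pointwise_zero = all(
    f_travel(100, x) == 0.0 and f_spike(100, x) == 0.0
    for x in (0.3, 7.5, 42.0)
)

# f_flat -> 0 uniformly: its sup norm is exactly 1/n.
sup_norms = [max(f_flat(n, x) for x in (0.5, 1.5, n - 0.5))
             for n in (10, 100, 1000)]

print(all_one, pointwise_zero, sup_norms)
```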
Second Proof (Monotone Convergence Theorem): Recall that in the first proof, it was immediate that \(\lim_{n\to\infty}{\int{f_n}}\leq\int{f}\), so it suffices to show the reverse inequality. But this is immediate from Fatou's lemma: since a limit, when it exists, equals the limit inferior, we have \(f=\liminf_{n\to\infty}{f_n}\) and \(\lim_{n\to\infty}{\int{f_n}}=\liminf_{n\to\infty}{\int{f_n}}\). \(\square\)

Fatou's lemma has weak hypotheses but is correspondingly weak as a result. Moreover, one may ask: what about sequences in \(L^1\)? So far we have only been dealing with sequences in \(L^+\). The dominated convergence theorem is the main result that we use in practice, and it is arguably one of the most important results in measure theory.

Dominated Convergence Theorem: Let \(\{f_n\}\) be a sequence in \(L^1\) that converges almost everywhere, and suppose there exists \(g\in L^1\) with \(|f_n|\leq g\) almost everywhere for all \(n\). Then \(\lim_{n\to\infty}{f_n}\in L^1\) and \(\int{\lim_{n\to\infty}{f_n}}=\lim_{n\to\infty}{\int{f_n}}\).

Proof: Let \(\lim_{n\to\infty}{f_n}=f\) almost everywhere. Some standard results show that \(f\) is measurable. Since \(|f|\leq g\in L^1\), we must have that \(f\in L^1\). Suppose first that the \(f_n\) are real-valued. By the hypothesis, we have that \(g+f_n\geq0\) and \(g-f_n\geq0\) almost everywhere. Now we can compute \[\begin{split} \int{g}+\int{f}&=\int{g+f}\\ &=\int{\left(g+\lim_{n\to\infty}{f_n}\right)}\\ &=\int{\liminf_{n\to\infty}{\left(g+f_n\right)}}\\ &\leq\liminf_{n\to\infty}{\int{(g+f_n)}}\\ &=\int{g}+\liminf_{n\to\infty}{\int{f_n}}, \end{split}\] where we invoke Fatou's lemma to obtain the inequality. Similarly, \[\begin{split} \int{g}-\int{f}&=\int{g-f}\\ &=\int{\left(g-\lim_{n\to\infty}{f_n}\right)}\\ &=\int{\liminf_{n\to\infty}{\left(g-f_n\right)}}\\ &\leq\liminf_{n\to\infty}{\int{(g-f_n)}}\\ &=\int{g}-\limsup_{n\to\infty}{\int{f_n}}.
\end{split}\] Rearranging the two inequalities gives us \(\liminf_{n\to\infty}{\int{f_n}}\geq\int{f}\) and \(\int{f}\geq\limsup_{n\to\infty}{\int{f_n}}\). So it follows that \[\int{f}=\lim_{n\to\infty}{\int{f_n}}.\] In general, if the \(f_n\) are not real-valued, we can break each \(f_n\) into the positive and negative parts of its real and imaginary parts and apply the same procedure as above to arrive at the conclusion. \(\square\)

The dominated convergence theorem shows up in the proofs of many other foundational theorems of measure theory, as it permits one to interchange a limit and an integral, a very useful operation, in exchange for a mild hypothesis.

To get a better feel for the three big convergence theorems, we will interpret them in a special case. Consider the measure space \((\mathbb{N},\mathscr{P}(\mathbb{N}),\mu)\) where \(\mu\) is the counting measure. Let \(\{f_n\}\) be an arbitrary sequence in \(L^+\) (with respect to this measure space). What does this really mean? For each \(n\), we may construct a family of simple functions \(\{\phi_m^n\}\) as follows. First, set \(k_1=1\) and define \[\phi_1^n=f_n(k_1)\chi_{f_n^{-1}(f_n(k_1))}.\] For \(m>1\), define \[A_m=\mathbb{N}\setminus\bigcup_{j=1}^{m-1}{f_n^{-1}(f_n(k_j))}.\] If \(A_m=\varnothing\), set \(\phi_m^n=\phi_{m-1}^n\). Otherwise, let \(k_m\) be the minimal element of \(A_m\) and set \[\phi_m^n=\phi_{m-1}^n+f_n(k_m)\chi_{f_n^{-1}(f_n(k_m))}.\] Observe that by construction, \(\lim_{m\to\infty}{\phi_m^n}=f_n\) everywhere, and for every \(m\), \(\phi_1^n\leq\phi_2^n\leq\dots\leq f_n\). So by the monotone convergence theorem, \[\int{f_n}=\lim_{m\to\infty}{\int{\phi_m^n}}=\sum_{k=1}^{\infty}{f_n(k)},\] where the sum comes from the construction of \(\phi_m^n\) and the fact that \(\mu\) is the counting measure. So we may interpret \(\int{f_n}\) as simply the sum of its values. In particular, there is a correspondence between functions in \(L^+\) with respect to our measure space and sequences of numbers in \([0,\infty]\).
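The level-set construction of the \(\phi_m\) can be carried out concretely for one function on a finite horizon standing in for \(\mathbb{N}\) (a sketch; the function \(f(k)=1/k^2\) and the horizon are arbitrary choices of mine): each stage covers the level set of the smallest uncovered point, the counting-measure integrals \(\int\phi_m\) increase, and their limit is the sum of the values of \(f\).

```python
# Sketch: the increasing simple functions phi_m built from the level sets
# of f, for f(k) = 1/k^2 on a finite horizon standing in for N. Their
# counting-measure integrals (sums of values) increase to sum f.

HORIZON = 500
f = {k: 1.0 / k**2 for k in range(1, HORIZON + 1)}

covered = set()
phi = {k: 0.0 for k in f}
phi_integrals = []   # integral of phi_m at each stage m

for stage in range(1, HORIZON + 1):
    uncovered = [k for k in f if k not in covered]
    if not uncovered:
        break
    k_m = min(uncovered)                          # minimal uncovered element
    level_set = {k for k in f if f[k] == f[k_m]}  # f^{-1}(f(k_m))
    for k in level_set:
        phi[k] = f[k_m]
    covered |= level_set
    # counting-measure integral of phi_m is the sum of its values
    phi_integrals.append(sum(phi.values()))

monotone = all(a <= b for a, b in zip(phi_integrals, phi_integrals[1:]))
total = sum(f.values())   # the "integral" of f as a sum of its values
print(monotone, phi_integrals[-1], total)
```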
Let \(\{S_n\}\) be a family of such sequences, where \[S_n=s_{n,1},s_{n,2},\dots\] and let \(\{f_n\}\) be the family of functions in \(L^+\) such that \(f_n(k)=s_{n,k}\). Let \(T\) be the sequence given by \[T=t_1,t_2,\dots\] where \(t_k=\liminf_{n\to\infty}{s_{n,k}}\). Corresponding to this sequence is the function \(\liminf_{n\to\infty}{f_n}\). Hence, Fatou's lemma tells us that \[\sum_{k=1}^{\infty}{\liminf_{n\to\infty}{s_{n,k}}}\leq\liminf_{n\to\infty}\sum_{k=1}^{\infty}{s_{n,k}}.\] Now, let us further insist that the sequences \(S_n\) are such that for each \(k\), \(s_{1,k}\leq s_{2,k}\leq\dots\). Of course, for each \(k\), \(\lim_{n\to\infty}{s_{n,k}}\) is guaranteed to exist, since a nondecreasing sequence in \([0,\infty]\) always converges there. Then by the monotone convergence theorem, \[\lim_{n\to\infty}{\sum_{k=1}^{\infty}{s_{n,k}}}=\sum_{k=1}^{\infty}{\lim_{n\to\infty}{s_{n,k}}}.\] We need some extra work to interpret the dominated convergence theorem in this context. Integrals of functions \(\mathbb{N}\to\mathbb{C}\) can still be interpreted as the infinite sum of their values: split the function into the positive and negative parts of its real and imaginary parts, apply our work above on \(L^+\) functions, and use the linearity of the integral. Suppose that \(\{f_n\}\) is a sequence of functions \(\mathbb{N}\to\mathbb{C}\) in \(L^1\) such that \(f_n\to f\) almost everywhere and there is a nonnegative \(g\in L^1\) such that \(g\geq|f_n|\) almost everywhere for each \(n\). Note that the functions \(|f_n|\) are in \(L^+\), and by our previous interpretation of such functions, they correspond to sequences of numbers from \([0,\infty]\). Hence, the condition \(f_n\in L^1\) is equivalent to \[\sum_{k=1}^{\infty}{|f_n(k)|}<\infty.\] Observe that in the measure space \((\mathbb{N},\mathscr{P}(\mathbb{N}),\mu)\), the only null set is the empty set. Hence, convergence almost everywhere is equivalent to convergence everywhere in this context. 
In particular, for each \(k\), \(f(k)=\lim_{n\to\infty}{f_n(k)}\). Moreover, notice that \(g\) is in \(L^+\), so \[\sum_{k=1}^{\infty}{|f_n(k)|}\leq\sum_{k=1}^{\infty}{g(k)}<\infty.\] So the interpretation is the following. Let \(\{S_n\}\) be a family of sequences of complex numbers \[S_n=s_{n,1},s_{n,2},\dots,\] such that for each \(k\), \(\lim_{n\to\infty}{s_{n,k}}\) exists, and there exists a sequence of nonnegative real numbers \(s_1,s_2,\dots\) with \[\sum_{k=1}^{\infty}{|s_{n,k}|}\leq\sum_{k=1}^{\infty}{s_k}<\infty,\] for each \(n\). By the dominated convergence theorem, \[\sum_{k=1}^{\infty}{\lim_{n\to\infty}{s_{n,k}}}=\lim_{n\to\infty}{\sum_{k=1}^{\infty}{s_{n,k}}},\] and moreover, \[\sum_{k=1}^{\infty}{\lim_{n\to\infty}{|s_{n,k}|}}<\infty.\] There are certain theoretical preliminaries that precede the construction of most measures. One example of a measure that is more or less immediate from the definition of measures is the counting measure. It is the measure \(\mu\) on the measurable space \((X,\mathscr{P}(X))\) given by
\[\mu(E)=\sum_{x\in E}{1}.\] It is intuitive that this ought to be a measure, and it is easily checked that it is one. However, the modern construction of most other "nice" measures requires quite a bit of groundwork. The first important result in this direction is the following. Carathéodory's Theorem: Let \(\mu^*\) be an outer measure on \(X\). The collection \(\mathscr{M}\) of \(\mu^*\)-measurable subsets of \(X\) is a \(\sigma\)-algebra on \(X\), and the restriction of \(\mu^*\) to \(\mathscr{M}\) is a complete measure. As a side note, the definition of \(\mu^*\)-measurability is sometimes called the Carathéodory criterion: a subset \(A\) is \(\mu^*\)-measurable if for every \(B\) in the power set of \(X\), we have that \(\mu^*(B)=\mu^*(A\cap B)+\mu^*(A^c\cap B)\). Proof: It is clear from the definition of the Carathéodory criterion, and the fact that \((A^c)^c=A\), that \(\mathscr{M}\) is closed under complements. It remains to show that \(\mathscr{M}\) is closed under countable unions to establish that it is a \(\sigma\)-algebra. First we show that it is closed under finite unions. Suppose \(A,B\in\mathscr{M}\). Let \(E\) be an arbitrary subset of \(X\). Then, since \(A\) is \(\mu^*\)-measurable, \[\mu^*(E)=\mu^*(E\cap A)+\mu^*(E\cap A^c).\] We can split up the first term using the \(\mu^*\)-measurability of \(B\). This gives us \[\mu^*(E)=\mu^*((E\cap A)\cap B)+\mu^*((E\cap A)\cap B^c)+\mu^*(E\cap A^c).\] Performing a similar expansion on the last term gives us \[\mu^*(E)=\mu^*(E\cap A\cap B)+\mu^*(E\cap A\cap B^c)+\mu^*(E\cap A^c\cap B)+\mu^*(E\cap A^c\cap B^c).\] In fact, the Carathéodory criterion is the natural condition on sets that allows them to be "building blocks" of other sets in the \(\mu^*\) sense. That is, for any finite collection of \(n\) \(\mu^*\)-measurable sets, the outer measure of any set can be expressed as the sum of the outer measures of the intersections of that set with the \(2^n\) pieces that the collection forms. 
Now, observe (a Venn diagram helps) that \(A\cup B=(A\cap B)\cup(A\cap B^c)\cup(A^c\cap B)\). It follows that \(E\cap(A\cup B)=(E\cap A\cap B)\cup(E\cap A\cap B^c)\cup(E\cap A^c\cap B)\). Subadditivity then yields \[\mu^*(E\cap(A\cup B))\leq\mu^*(E\cap A\cap B)+\mu^*(E\cap A^c\cap B)+\mu^*(E\cap A\cap B^c).\] To this, we add the De Morgan's law identity \(\mu^*(E\cap(A\cup B)^c)=\mu^*(E\cap A^c\cap B^c)\) to obtain \[\mu^*(E\cap (A\cup B))+\mu^*(E\cap(A\cup B)^c)\leq\mu^*(E).\] Of course, the reverse inequality follows immediately from subadditivity. Hence, \(\mathscr{M}\) is closed under finite unions and is at least an algebra. The outer measure is also at least finitely additive on \(\mathscr{M}\): take \(A,B\in\mathscr{M}\) to be disjoint. Then, we have \[\mu^*(A\cup B)=\mu^*((A\cup B)\cap A)+\mu^*((A\cup B)\cap A^c)=\mu^*(A)+\mu^*(B).\] We need to extend these results to the countable case in order to establish the first part of the theorem. It suffices to consider a countable collection of disjoint sets from \(\mathscr{M}\), because the union of an arbitrary countable collection from \(\mathscr{M}\) can be expressed as the union of a countable collection of disjoint sets from \(\mathscr{M}\) via the standard trick. So we let \(A_1,A_2,\dots\in\mathscr{M}\) be pairwise disjoint. Define \[B_n=\bigcup_{j=1}^{n}{A_j}\qquad B=\bigcup_{j=1}^{\infty}{A_j}.\] For any \(n\), since \(A_n\in\mathscr{M}\), we have that \[\mu^*(E\cap B_n)=\mu^*(E\cap B_n\cap A_n)+\mu^*(E\cap B_n\cap A_n^c).\] By the disjointness of the \(A_j\), the first term is \(\mu^*(E\cap A_n)\) while the second term is \(\mu^*(E\cap B_{n-1})\). Repeating this inductively, we obtain that \[\mu^*(E\cap B_n)=\sum_{j=1}^{n}{\mu^*(E\cap A_j)}.\] Now, observe that \(B_n\subseteq B\), so \(B^c\subseteq B_n^c\) and \(E\cap B^c\subseteq E\cap B_n^c\). By monotonicity, we then have that \(\mu^*(E\cap B^c)\leq\mu^*(E\cap B_n^c)\). 
To this we add the identity above (reversed) to obtain \[\mu^*(E\cap B^c)+\sum_{j=1}^{n}{\mu^*(E\cap A_j)}\leq\mu^*(E\cap B_n^c)+\mu^*(E\cap B_n)=\mu^*(E).\] The far RHS comes from the fact that \(B_n\in\mathscr{M}\), since we have already established that \(\mathscr{M}\) is an algebra. Now, we take \(n\to\infty\) so that \[\begin{split} \mu^*(E)&\geq\mu^*(E\cap B^c)+\sum_{j=1}^{\infty}{\mu^*(E\cap A_j)}\\ &\geq\mu^*(E\cap B^c)+\mu^*\left(\bigcup_{j=1}^{\infty}{E\cap A_j}\right)\\ &=\mu^*(E\cap B^c)+\mu^*(E\cap B)\\ &\geq\mu^*(E). \end{split}\] The second line comes from countable subadditivity, the third line is by definition, and the last line comes from finite subadditivity. Note that we cannot immediately suppose that we have equality in the last line, because that would assume that \(B\in\mathscr{M}\), which is what we want to show in the first place. In any case, we have bounded \(\mu^*(E\cap B^c)+\mu^*(E\cap B)\) above and below by \(\mu^*(E)\), so in fact we do have equality in the last line. This establishes that \(\mathscr{M}\) is a \(\sigma\)-algebra. Countable additivity on \(\mathscr{M}\) comes immediately by taking \(E=B\). It remains to show that \(\mu^*|_{\mathscr{M}}\) is a complete measure. We already know that \(\mu^*\) will vanish on subsets of null sets by monotonicity. So we just need to show that subsets of null sets in \(\mathscr{M}\) are themselves in \(\mathscr{M}\). Let \(\mu^*(A)=0\). Then, by subadditivity, \(\mu^*(E)\leq\mu^*(E\cap A)+\mu^*(E\cap A^c)\), while by monotonicity, \(\mu^*(E\cap A)=0\) and \(\mu^*(E\cap A^c)\leq\mu^*(E)\), hence, \[\mu^*(E)\leq\mu^*(E\cap A^c)\leq\mu^*(E).\] Hence, \(A\) is \(\mu^*\)-measurable. In particular, if \(A'\subseteq A\) then we know that \(\mu^*(A')=0\) and by the above, \(A'\in\mathscr{M}\). We are done. 
\(\square\) Combining Carathéodory's theorem with the fact that a premeasure on an algebra induces an outer measure, and every set in the algebra is measurable with respect to that outer measure, one can establish that a premeasure on an algebra can be extended to a measure on the generated \(\sigma\)-algebra. In fact, it is precisely the restriction of the induced outer measure to the generated \(\sigma\)-algebra. It turns out that this is the natural extension of the premeasure in the sense that the extension is unique if the premeasure is \(\sigma\)-finite, and in the general case, any other measure that extends the premeasure will be equal to the extension defined above on sets with finite measure. The construction of the Lebesgue measure thus proceeds by defining the natural premeasure on the set of half-open half-closed intervals on the real line, checking that this is indeed a premeasure on an algebra, and then invoking the above result, which tells us that it extends to a measure on the Borel \(\sigma\)-algebra of \(\mathbb{R}\). The completion of this measure is what we call Lebesgue measure. Taking the completion explodes the domain of the measure to a pretty large \(\sigma\)-algebra. In fact, the \(\sigma\)-algebra of Lebesgue measurable sets is so large that it has been shown to be impossible to prove the existence of a Lebesgue nonmeasurable set without the axiom of choice. That the rationals are dense in the real numbers is a basic property that any student would learn in a first course in topology or real analysis. The standard proof uses the Archimedean property of the real numbers. Here, I outline a different way to prove that an irrational number can be approximated to arbitrary accuracy by rational numbers. This is, in some sense, the "hardest" case to prove, if we attempt to show that rational numbers and irrational numbers can both be approximated by rationals, separately.
The idea is that a finitely generated additive subgroup of \(\mathbb{R}\) need not be discrete. Let \(x\) be irrational and \(p\) rational. Consider the set of real numbers of the form \(ax+bp\) where \(a,b\in\mathbb{Z}\). Suppose there is a nontrivial solution to \(ax+bp=0\). Then, \(x=-\frac{bp}{a}\in\mathbb{Q}\), a contradiction. Hence, no nonzero integer multiple of \(x\) is equal to an integer multiple of \(p\). If we let \(p=1\), we discover a consequence of this: if we move around a circle in steps, rotating by \(x\) full turns each step, we will never land on the same point more than once. Let \(S\) denote the set of points of the circle that we reach by performing these steps. Our reasoning shows that \(S\) is countably infinite. If we partition the circle into \(N\) equal arcs, where \(N\) is any positive integer, it follows from the pigeonhole principle that there exists an arc that contains infinitely many of the points of \(S\). This means that for any arbitrarily small neighborhood size, one may find a point on the circle whose neighborhood of that size contains infinitely many points from \(S\). By rotating the starting point of our steps, we rotate the point on the circle that is guaranteed such a neighborhood. So every neighborhood of every point on the circle contains infinitely many points from \(S\). That is, \(S\) is dense in the circle. Now let \(p=q\), where \(q\) is some fixed nonzero rational number. Since \(S\) is dense in the circle, and the integer multiples of \(q\) can be represented as points in the circle, we have that for any \(\epsilon>0\), we can find \(m,n\in\mathbb{Z}\) (with \(n\neq0\)) such that \[|mq+nx|<\epsilon\Longrightarrow\left|x+\frac{mq}{n}\right|<\frac{\epsilon}{|n|}\leq\epsilon.\] So \(-\frac{mq}{n}\) is a rational number that is within \(\epsilon\) of \(x\). 
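The pigeonhole step above can be carried out explicitly. The following sketch (my own, not from the source; `good_rational` is a hypothetical helper) partitions the circle into \(N\) arcs, walks through the fractional parts of \(0\cdot x,1\cdot x,\dots,N\cdot x\), and waits for two of them to land in the same arc; the difference of their indices gives integers \(m,n\) with \(|nx-m|<\frac{1}{N}\).

```python
# Sketch of the pigeonhole argument above (my own code, not from the
# source), with x = sqrt(2) as a concrete irrational. Among the N+1
# fractional parts of 0*x, 1*x, ..., N*x, two must fall in the same arc
# of length 1/N, and their difference yields m, n with |n*x - m| < 1/N.
import math

def good_rational(x, N):
    buckets = {}  # arc index -> smallest j whose fractional part landed there
    for j in range(N + 1):
        i = int((j * x) % 1.0 * N)
        if i in buckets:
            n = j - buckets[i]          # nonzero by the pigeonhole principle
            m = round(n * x)            # nearest integer to n*x
            return m, n                 # |x - m/n| < 1/(|n|*N) <= 1/N
        buckets[i] = j
    raise AssertionError("pigeonhole guarantees a collision")

m, n = good_rational(math.sqrt(2), 1000)
assert abs(math.sqrt(2) - m / n) < 1 / 1000
```

This is the constructive face of the density argument: the collision inside one arc is exactly the "infinitely many points of \(S\) in a small arc" step, specialized to produce a single good rational approximation.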
Two days ago, I had an interesting interaction with some other students in my REU that destroyed two misconceptions I have had in analysis for a long time.
First, we introduce the idea of a bump function. Let \(M\) be a smooth manifold with \(A\subseteq U\subseteq M\), where \(A\) is closed and \(U\) is open. We define a bump function for \(A\) supported in \(U\) to be a continuous function \(\psi\colon M\to\mathbb{R}\) with \(0\leq\psi\leq1\) on \(M\), \(\psi=1\) on \(A\), and \(\mathrm{supp}\ \psi\subseteq U\). So far so good. It is not particularly surprising that such a function exists. We only require that \(\psi\) be continuous. Here is where my intuition had failed me for years up to this point: smooth bump functions exist! This was a complete shock to me. I was not familiar with any smooth functions that were constant in some neighborhood, and non-constant elsewhere. Moreover, I had thought about this scenario many times before and concluded that it ought to be impossible. I figured that if a function was of class \(C^{\infty}\), then it could not be constant (so that its derivatives of all orders vanished) and then suddenly get kicked and start moving. I believed that for such a "kick" to occur, one of the function's derivatives must be discontinuous, staying at zero in some neighborhood before suddenly jumping to some nonzero value. Alas, I was wrong. Smooth bump functions do exist. This follows from the existence of smooth partitions of unity. Most of the technical work that goes into proving that smooth bump functions exist is actually hidden under the rug of proving that smooth partitions of unity exist. But taking that for granted, we note that \(U\) and \(M\setminus A\) form an open cover of \(M\). Then, one may find a smooth partition of unity \(\{\psi,\lambda\}\) subordinate to this cover. By definition, \(\lambda\) vanishes on \(A\), so \(1=\psi+\lambda=\psi\) on \(A\). Indeed, this \(\psi\) is the desired bump function! Having crushed one of my childhood misconceptions, another was crushed the other day when I saw the following proposition in Lee's Introduction to Smooth Manifolds. 
Proposition: Let \(M\) be a smooth manifold with or without boundary, \(p\in M\), \(v\in T_pM\). If \(f,g\in C^{\infty}(M)\) agree on some neighborhood of \(p\), then \(vf=vg\). The statement of this theorem is not particularly relevant, but the hypothesis immediately caught my eye. Why is it stipulated that \(f\) and \(g\) agree just on some neighborhood of \(p\)? I had always figured that the evolution of a smooth function was determined by its behavior in any neighborhood (similar to how continuous functions are determined by their behavior on dense subsets). The motivation here was the uniqueness of solutions to differential equations and dynamical systems. Once one describes a global relationship between a function and its derivatives, and an initial condition, the evolution of the function past the starting point is determined. I believed that if we strengthen the agreement between two functions to occur not at a single point, but on an entire neighborhood, and require that both functions be \(C^{\infty}\), then the evolutions of the functions beyond the neighborhood must also agree. In other words, I did not understand why the proposition said that \(f\) and \(g\) agree on some neighborhood, because I thought that that occurs precisely when they agree everywhere. Alas, this is also wrong. Upon discussing this with some other students at my REU, I found the counterexample I was looking for. Using our previous notation, if one considers \(g=\psi f\), where \(\psi\) is a bump function, both functions agree on any neighborhood contained in \(A\) by construction, but are very free to disagree outside of \(A\)! Somehow, it seems that bump functions allow us to construct strange functions whose derivatives fail to propagate local information globally, even though the functions are infinitely smooth! 
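The classical construction of a smooth bump function on the real line (well known, though not spelled out in the text above) starts from \(g(x)=e^{-1/x}\) for \(x>0\) and \(g(x)=0\) otherwise, which is \(C^\infty\) despite being identically zero on a half-line. A quick numerical sketch of the resulting bump \(\psi\), which is \(1\) on \([-1,1]\) and \(0\) outside \((-2,2)\):

```python
# Standard smooth bump construction (not from the source): g(x) = e^{-1/x}
# for x > 0, else 0, is C-infinity. Then
#   psi(x) = g(2 - |x|) / (g(2 - |x|) + g(|x| - 1))
# equals 1 on [-1, 1], equals 0 outside (-2, 2), and is smooth everywhere,
# since the denominator is never zero.
import math

def g(x):
    return math.exp(-1.0 / x) if x > 0 else 0.0

def psi(x):
    a = g(2 - abs(x))   # positive exactly when |x| < 2
    b = g(abs(x) - 1)   # positive exactly when |x| > 1
    return a / (a + b)  # a + b > 0 for every real x

assert psi(0.0) == 1.0 and psi(0.5) == 1.0    # = 1 on [-1, 1]
assert psi(2.5) == 0.0 and psi(-3.0) == 0.0   # = 0 outside (-2, 2)
assert 0.0 < psi(1.5) < 1.0                   # transition region
```

This is exactly the kind of function that "gets kicked" without any derivative jumping: every derivative of \(g\) at \(0\) is zero, and yet \(g\) is nonzero immediately to the right.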
This observation is further strengthened by the fact that bump functions are actually used in the proof of the proposition written above (and the proposition itself is a statement about how tangent vectors do not care about global behavior—they only care about local behavior). No contradiction is reached with my intuition stemming from unique solutions to differential equations. After all, having a relationship between a function and its derivatives that holds everywhere is quite a strong condition, and it's not any surprise that unique evolution is forced in that scenario. After a little bit of digging, I found the gap in my knowledge. The identity theorem states that analytic functions that agree on some neighborhood must agree everywhere. So it seems that I have stumbled upon a crucial qualitative difference between analytic functions and smooth functions. In particular, analyticity is a strong enough condition for local behavior (at least in a neighborhood) to propagate globally, while smoothness is not. Obviously, any concrete examples that I attempted to think of were analytic. My intuition is still grappling with this. But it's good to know that I was wrong for so long. Now that we have the tools to do so, let us discuss Kantorovich's theorem.
Let \(\vec{a}_0\) be a point in \(\mathbb{R}^n\), \(U\) an open neighborhood of \(\vec{a}_0\) in \(\mathbb{R}^n\), and \(f\colon U\to\mathbb{R}^n\) a differentiable mapping, with its derivative \([Df(\vec{a}_0)]\) invertible. Define \[\vec{h}_0=-[Df(\vec{a}_0)]^{-1}f(\vec{a}_0),\ \vec{a}_1=\vec{a}_0+\vec{h}_0,\ U_1=B_{|\vec{h}_0|}(\vec{a}_1).\] If \(\overline{U}_1\subset U\) and the derivative \([Df(\vec{x})]\) satisfies a Lipschitz condition with Lipschitz ratio \(M\) for all points in \(\overline{U}_1\), and if the inequality \[|f(\vec{a}_0)|\left|[Df(\vec{a}_0)]^{-1}\right|^2M\leq\frac{1}{2}\] is satisfied, the equation \(f(\vec{x})=\vec{0}\) has a unique solution in the closed ball \(\overline{U}_1\), and Newton's method with initial guess \(\vec{a}_0\) converges to it. What a mouthful, eh? To understand what's really being said here, we need to unpack the inequality. The inequality essentially encodes the three heuristic conditions that we discussed in the previous part.
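To make the hypotheses concrete, here is a one-dimensional sanity check (my own example, not from the source): take \(f(x)=x^2-2\) with initial guess \(\vec{a}_0=1\). Then \(\vec{h}_0=\frac{1}{2}\), \(U_1=B_{1/2}(1.5)=(1,2)\), and \(f'(x)=2x\) has Lipschitz ratio \(M=2\), so the Kantorovich quantity is \(1\cdot\left(\frac{1}{2}\right)^2\cdot2=\frac{1}{2}\): the inequality holds (with equality), and Newton's method should converge to \(\sqrt{2}\in\overline{U}_1=[1,2]\).

```python
# One-dimensional check of Kantorovich's hypotheses (my own example, not
# from the source): f(x) = x^2 - 2, initial guess a0 = 1.
import math

f = lambda x: x * x - 2.0
Df = lambda x: 2.0 * x

a0 = 1.0
h0 = -f(a0) / Df(a0)   # h0 = 0.5
a1 = a0 + h0           # a1 = 1.5, so U1 = B_0.5(1.5) = (1, 2)
M = 2.0                # f'(x) = 2x has Lipschitz ratio 2 everywhere

# Kantorovich's inequality: |f(a0)| * |Df(a0)^{-1}|^2 * M <= 1/2
assert abs(f(a0)) * (1.0 / Df(a0)) ** 2 * M <= 0.5

# Newton's method from a0 converges to the unique root in [1, 2].
x = a0
for _ in range(10):
    x -= f(x) / Df(x)
assert abs(x - math.sqrt(2)) < 1e-12 and 1.0 <= x <= 2.0
```

Here the hypothesis is satisfied right at the boundary value \(\frac{1}{2}\), and the iteration indeed stays inside \([1,2]\) while converging to \(\sqrt{2}\).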
\[|f(\vec{a}_i)|\leq\frac{M}{2}|\vec{h}_{i-1}|^2\leq\frac{M}{2^i}|\vec{h}_0|^2,\] and by the squeeze theorem, we get the desired result as we take the limit \(i\to\infty\). A hefty task, I know. The theorem was only proven in 1948! The proof of uniqueness proceeds by showing that if \(\vec{y}\in\overline{U}_1\) is a root, then \(\vec{y}\) must be the limit of \(i\mapsto\vec{a}_i\), which suffices due to the uniqueness of limits. As it turns out, this statement of the theorem can actually be strengthened, provided that we use a different norm on linear transformations in the inequalities of the theorem statement. Recall that we originally used the Frobenius norm. Now, we introduce the operator norm of a linear transformation. For a linear transformation \(A\colon\mathbb{R}^n\to\mathbb{R}^m\), the operator norm is defined as \[||A||=\sup{|A\vec{x}|},\] where the supremum is taken over all \(\vec{x}\in\mathbb{R}^n\) of unit length. As it turns out, we can replace the Frobenius norm with the operator norm in the inequalities of Kantorovich's theorem, and the theorem remains true. The reason why this gives us a stronger form of the theorem is that \[||A||\leq|A|\] for any linear transformation. That is, the operator norm of a linear transformation is always at most the Frobenius norm of the linear transformation. This means that the inequalities of Kantorovich's theorem are easier to satisfy if we are allowed to use the operator norm instead. So in fact, we are not groping around in the dark when we use Newton's method. There are conditions where Newton's method is guaranteed to converge. Note that while Kantorovich gives a sufficient condition for convergence, it is not necessary. Now we can be a bit more justified when we use Newton's method in proving the implicit and inverse function theorems. Back in April, I discussed the basics of manifolds here, and mentioned that I might begin a blog series on the foundations of the implicit function theorem. I start the journey here.
I have thought long and hard about whether I should keep this exploration in the blog, or perhaps flesh it out and push it to the main mathematics page. Ultimately, I have decided to keep it here for a few reasons.
The material I am referencing and learning from is Vector Calculus, Linear Algebra, and Differential Forms, by Hubbard. This is the primary textbook for the MATH 31-series at UC San Diego. Newton's method is a central technique in numerical analysis, and the single-variable form is often introduced in single-variable calculus. The generalization to multiple variables is pretty direct. We wish to find roots of a differentiable function \(f\colon U\to\mathbb{R}^n\), where \(U\) is an open subset of \(\mathbb{R}^n\). We make a guess at the root, say \(\vec{a}_0\). Now, we determine the root of the linearization of \(f\) at \(\vec{a}_0\). This comes out to finding the solution \(\vec{a}_1\) to the equation \[f(\vec{a}_0)+[Df(\vec{a}_0)](\vec{a}_1-\vec{a}_0)=\vec{0}.\] If everything goes well, the solution \(\vec{a}_1\) is unique. Furthermore, if things are good, we can iterate this process, and the sequence \(i\mapsto\vec{a}_i\) converges to a root of \(f\). There are a few things to notice here. The first thing to notice is that for unique \(\vec{a}_i\) to exist at each step, the linear equation defining \(\vec{a}_i\) must have a unique solution: \[[Df(\vec{a}_{i-1})](\vec{a}_i-\vec{a}_{i-1})=-f(\vec{a}_{i-1}).\] For this to be the case, \([Df]\) needs to be a square matrix. This explains why the domain and codomain of \(f\) must have the same dimension. Furthermore, we must have that \([Df(\vec{a}_i)]\) is always invertible. These are two obvious conditions that must be met for Newton's method to converge to a root. There are some other intuitive conditions that we could impose that would ensure that Newton's method converges. Obviously, our initial guess should be somewhat close to a root in the sense that the function evaluated at that point is almost 0. That is, \(|f(\vec{a}_0)|\) should be small. Furthermore, the derivative should be large (in magnitude). 
This way, we have a large change in \(f\) from where we start, and so we have a larger likelihood of approaching a root from that initial guess. Finally, the derivative of \(f\) must not change too much. If it did, then it would not matter that we have a large change in \(f\) where we start – that change in \(f\) can itself vary drastically as we move a bit away from \(\vec{a}_0\) if the derivative changes too much. We have already made it precise what we mean by \(\vec{a}_0\) being "close" to a root (\(|f(\vec{a}_0)|\) should be small). Now we make our third point more precise. A natural way to measure the rate of change of the derivative is to take the second derivative. But perhaps even more natural than that is to skip the second derivative entirely and directly bound the change in the images of two points under the derivative. To do this, we introduce the idea of a Lipschitz condition. Let \(U\) be an open subset of \(\mathbb{R}^n\) and let \(f\colon U\to\mathbb{R}^n\). Then \(f\) satisfies a Lipschitz condition on \(V\subset U\) with Lipschitz ratio \(M\) if for all \(\vec{x},\vec{y}\in V\), \[|f(\vec{x})-f(\vec{y})|\leq M|\vec{x}-\vec{y}|.\] This makes pretty good sense. In fact, contraction mappings, which we have discussed earlier, satisfy a Lipschitz condition by definition. In particular, they are Lipschitz functions with \(0\leq M<1\). Another observation we can quickly make is that any Lipschitz function must also be continuous. What about differentiability? It turns out that deducing anything about differentiability from a Lipschitz condition is nontrivial, but indeed, Rademacher's theorem states that if \(f\) is Lipschitz on \(V\), then it is also differentiable almost everywhere on \(V\). Where the function happens to be differentiable and Lipschitz, the Lipschitz condition immediately places a bound on the derivative. In fact, as we will see, the converse is also true. A bounded derivative implies Lipschitz. 
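The Newton iteration described earlier is short enough to sketch in full. The following is my own toy example (not from the source): the \(2\times2\) system \(f(x,y)=(x^2+y^2-1,\ x-y)\), whose root in the positive quadrant is \(\left(\frac{1}{\sqrt{2}},\frac{1}{\sqrt{2}}\right)\); each step solves \([Df(\vec{a}_{i-1})]\vec{h}=-f(\vec{a}_{i-1})\) by inverting the Jacobian directly.

```python
# Minimal sketch of multivariable Newton's method (my own example, not
# from the source): f(x, y) = (x^2 + y^2 - 1, x - y), root (1/√2, 1/√2).
import math

def f(x, y):
    return (x * x + y * y - 1.0, x - y)

def Df(x, y):
    # Jacobian matrix [[2x, 2y], [1, -1]]
    return ((2 * x, 2 * y), (1.0, -1.0))

def newton_step(x, y):
    # Solve [Df](h) = -f for h by inverting the 2x2 Jacobian.
    (a, b), (c, d) = Df(x, y)
    det = a * d - b * c            # must be nonzero (Df invertible)
    f1, f2 = f(x, y)
    h1 = (-d * f1 + b * f2) / det
    h2 = (c * f1 - a * f2) / det
    return x + h1, y + h2

x, y = 1.0, 0.5                    # initial guess a0
for _ in range(20):
    x, y = newton_step(x, y)
r = 1.0 / math.sqrt(2.0)
assert abs(x - r) < 1e-12 and abs(y - r) < 1e-12
```

Both invertibility conditions from the discussion above are visible here: the Jacobian is square because domain and codomain are both \(\mathbb{R}^2\), and `det` must stay nonzero along the iteration.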
But before we proceed further, we need to clarify something. We need the derivative to be Lipschitz. But the derivative is a matrix, and it is unclear how the norm of a matrix is defined. For our purposes, we use the Frobenius norm, which is just an extension of the usual vector norm. For a matrix \(A\) with entries \(a_{i,j}\), it is defined by \[|A|^2=\sum_{i,j}{a_{i,j}^2}.\] So now it makes sense for us to say that a "derivative is bounded". What we really mean is that the Frobenius norm of the Jacobian is bounded. Computing Lipschitz ratios is somewhat of an art, like integrating elementary functions. It requires a bit of fiddling with inequalities, and typically we take the region over which we wish to verify that the function is Lipschitz to be some open ball around the origin, so that some inequalities come for free. As it turns out, there is a more systematic way to do this – using the fact that a bounded derivative implies Lipschitz. Since we are bounding the derivative in the first place (i.e. trying to establish that the derivative is Lipschitz), we will actually be working with second partial derivatives. Let \(U\subset\mathbb{R}^n\) be an open ball and \(f\colon U\to\mathbb{R}^n\) be a \(C^2\) mapping. If \[\left|\frac{\partial^2 f_i}{\partial x_k\partial x_j}(\vec{x})\right|\leq c_{i,j,k}\] for any \(\vec{x}\in U\) and for all triples \(1\leq i,j,k\leq n\), then the derivative of \(f\) is Lipschitz with Lipschitz ratio \[M=\sqrt{\sum_{1\leq i,j,k\leq n}{c_{i,j,k}^2}}\] over \(U\). The cost of using a systematic approach such as this is that the Lipschitz ratios that we obtain aren't quite as strong as the ones we could get from fiddling with inequalities ourselves. But the main utility is that using the second derivatives to find a Lipschitz ratio for the derivative is quite straightforward. The proof of this statement is a simple application of the mean value theorem. 
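Here is a small numerical check of that bound on a concrete mapping (my own example, not from the source): \(f(x,y)=(x^2+y^2,\ xy)\). Its second partials are constant, bounded by \(c_{1,1,1}=c_{1,2,2}=2\) and \(c_{2,1,2}=c_{2,2,1}=1\) (the rest zero), so the recipe gives \(M=\sqrt{10}\) as a Lipschitz ratio for the derivative in the Frobenius norm.

```python
# Checking the second-derivative Lipschitz bound above on a concrete
# mapping (my own example): f(x, y) = (x^2 + y^2, x*y).
import math, random

def Df(x, y):
    # Jacobian [[2x, 2y], [y, x]], flattened row-major
    return [2 * x, 2 * y, y, x]

def frob(A):
    # Frobenius norm: square root of the sum of squared entries
    return math.sqrt(sum(a * a for a in A))

# M = sqrt(sum of c_{i,j,k}^2) with c's = 2, 2, 1, 1 (rest zero)
M = math.sqrt(2**2 + 2**2 + 1**2 + 1**2)   # sqrt(10)

random.seed(0)
for _ in range(1000):
    p = (random.uniform(-1, 1), random.uniform(-1, 1))
    q = (random.uniform(-1, 1), random.uniform(-1, 1))
    diff = [a - b for a, b in zip(Df(*p), Df(*q))]
    dist = math.hypot(p[0] - q[0], p[1] - q[1])
    assert frob(diff) <= M * dist + 1e-12
```

For this particular \(f\) one can compute by hand that the best Frobenius-norm ratio for \(Df\) is \(\sqrt{5}\), which illustrates the remark above: the systematic bound \(\sqrt{10}\) is valid but not tight.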
Now that we have the machinery to quantify any possible condition that would ensure that Newton's method converges, we can state Kantorovich's theorem. This theorem gives a sufficient (but not necessary) condition for Newton's method to converge. As such, it is an enormously important theoretical result. Without it, no one would really be justified in using Newton's method! We will discuss the theorem in the next post. The summer has arrived and I'm back in Florida! Let us begin with a cutie.
Problem: Consider the number \[k(n)=1\underbrace{4...4}_{\text{$n$ fours}}.\] For what values of \(n\) is \(k(n)\) a perfect square? Solution: \(k(0)\) is obviously a perfect square and \(k(1)\) is obviously not. For \(n>1\), we have that \(k(n)\) is even, since it ends with the digit 4. Any even square is divisible by 4. By long division, we find that for \(n>1\), \[\frac{k(n)}{4}=36\underbrace{1...1}_{\text{$n-2$ ones}}.\] So it suffices to find the \(n\) for which this number is a square. Observe that squares are congruent to either 0 or 1 modulo 4 (if you've stuck around here for some time, you'd notice that we LOVE taking squares modulo 4). But, by inspecting the last two digits, we see that \(\frac{k(n)}{4}\) is congruent to 3 modulo 4 whenever \(n>3\). So, we only need to check \(n=2\) and \(n=3\). Indeed, \(144=12^2\) and \(1444=38^2\). So \(k(n)\) is square precisely when \(n\in\{0,2,3\}\). \(\square\) I have a few ideas about what I plan on doing next here. I want to talk about Lebesgue integration. The theory of Riemann integration that we thoroughly developed in MATH 31CH is a powerful conceptual tool, but it has several weaknesses. For example, the Riemann integral is not well-defined over \(\mathbb{R}^n\) for functions with unbounded support. The Lebesgue integral allows us to integrate over unbounded supports. This is just one of the advantages it gives. The Lebesgue integral is also considerably more well-behaved under limits, and we'll see that this gives rise to the dominated convergence theorem (DCT), regarding the interchangeability of integrals and limits (which is something that many of us take for granted *cough*MAO heuristics*cough*). This can be extended even further to talk about the interchangeability of differentiation and integration when they are composed on a function. Yes, that's right. The theory of Lebesgue integration is what forms the rigorous underpinnings of Feynman's trick! 
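The solution above is easy to double-check by brute force (my own quick script, not part of the original solution):

```python
# Brute-force check of the problem above: k(n) = 1 followed by n fours
# is a perfect square exactly for n in {0, 2, 3} (checking n up to 60).
from math import isqrt

def k(n):
    return int("1" + "4" * n)

squares = [n for n in range(61) if isqrt(k(n)) ** 2 == k(n)]
assert squares == [0, 2, 3]
```

The modular argument of course covers all \(n\) at once; the script just confirms the small cases and gives some empirical comfort for the rest.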
I will be going through these topics soon, in a main page post. I like this idea, because it sort of pulls together a bunch of stuff that I've seen or worked on for the past few weeks (my final project in MATH 31CH was on Lebesgue integration and Feynman's trick, which a homework problem in physics happened to use, and I also recently stumbled into a problem where one needs to apply DCT). I don't expect that I'll be proving certain things, such as DCT itself and the existence of the Lebesgue integral, because the proofs are quite technical, and honestly you'd be better off reading a textbook if you're looking for those. Instead, I will try my best to show the motivation behind the Lebesgue integral and its utility in approaching problems that are pretty much intractable with Riemann's theory. I should have more time to work on this now. After all, UCSD decided to cancel one of my summer classes (MATH 120A)....