Complex differentiability is a famously strong condition, especially when it holds in a neighborhood of every point of the domain. One longstanding question of mine concerned the qualitative difference between the complex derivative and the standard Fréchet derivative from multivariable calculus.
To be precise, suppose that \(f\colon\mathbb{R}^2\to\mathbb{R}^2\) is a function given by \(f(x,y)=(u(x,y),v(x,y))\). Let \(h\colon\mathbb{C}\to\mathbb{R}^2\) be the canonical identification \(h(x+yi)=(x,y)\), and let \(g\colon\mathbb{C}\to\mathbb{C}\) be given by \(g=h^{-1}\circ f\circ h\). Observe that there is a natural correspondence between \(f\) and \(g\), and we very well could have defined \(g\) first and then \(f\) (formally, \(f\) and \(g\) are related by conjugation by the identification \(h\)). What is the relationship between the Fréchet differentiability of \(f\) and the complex differentiability of \(g\)? When both derivatives exist, what is the relationship between the Fréchet derivative \(Df\) and the complex derivative \(g'\)?

First, suppose \(g\) is complex differentiable at \(z\in\mathbb{C}\). Then, \[g'(z)=\lim_{w\to0}{\frac{g(z+w)-g(z)}{w}}\] exists. Pick \(\epsilon>0\). By the above, there exists \(\delta>0\) such that \(0<|w|<\delta\) implies that \[\left|\frac{g(z+w)-g(z)}{w}-g'(z)\right|=\frac{|g(z+w)-g(z)-wg'(z)|}{|w|}<\epsilon.\] In other words, we have that \[\lim_{w\to0}{\frac{|g(z+w)-g(z)-wg'(z)|}{|w|}}=0.\] Write \(\alpha=\mathrm{Re}\ g'(z)\), \(\beta=\mathrm{Im}\ g'(z)\), \(x=\mathrm{Re}\ w\), and \(y=\mathrm{Im}\ w\). Now, if we define the linear operator \[A=\begin{bmatrix} \alpha & -\beta\\ \beta & \alpha \end{bmatrix},\] we have that \[wg'(z)=(x+yi)(\alpha+\beta i)=\alpha x-\beta y+(\alpha y+\beta x)i=h^{-1}\left(\begin{bmatrix} \alpha & -\beta\\ \beta & \alpha \end{bmatrix}\begin{bmatrix} x\\ y \end{bmatrix}\right)=(h^{-1}\circ A\circ h)(w).\] Observe that \(h\) is an isometry between \(\mathbb{C}\) and \(\mathbb{R}^2\), so distances between corresponding points are the same irrespective of which space we decide to compute them in. In particular, if \(t=h(w)\) and \(s=h(z)\), then \[\frac{|g(z+w)-g(z)-wg'(z)|}{|w|}=\frac{|(h\circ g)(z+w)-(h\circ g)(z)-h(wg'(z))|}{|h(w)|}.\] But by our work above, \(h(wg'(z))=(A\circ h)(w)\), and \(h\circ g=f\circ h\), so \[\frac{|g(z+w)-g(z)-wg'(z)|}{|w|}=\frac{|(h\circ g)(z+w)-(h\circ g)(z)-(A\circ h)(w)|}{|h(w)|}=\frac{|f(s+t)-f(s)-At|}{|t|}.\] Taking \(t\to\vec{0}\) is equivalent to taking \(w\to0\), hence \[\lim_{t\to\vec{0}}{\frac{|f(s+t)-f(s)-At|}{|t|}}=\lim_{w\to0}{\frac{|g(z+w)-g(z)-wg'(z)|}{|w|}}=0.\] So \(f\) is Fréchet differentiable at \(h(z)\), with Fréchet derivative \(A\), which has a simple relationship with \(g'(z)\).

On the other hand, if we start with the assumption that \(f\) is Fréchet differentiable, it is not necessarily true that \(g\) is complex differentiable. For example, suppose \(f(x,y)=(x^2+y^2,0)\), so \(g(z)=|z|^2\). Obviously, \(f\) is Fréchet differentiable everywhere. But, suppose that \(z_0=a+bi\) is a nonzero complex number. Then, we can compute the limit \[L_1=\lim_{x\to0}{\frac{g(z_0+x)-g(z_0)}{x}},\] where \(x\) is restricted to the real numbers. Along this path, we compute \[\begin{split} L_1&=\lim_{x\to0}{\frac{(a+x)^2+b^2-a^2-b^2}{x}}\\ &=\lim_{x\to0}{\frac{x^2+2ax}{x}}\\ &=\lim_{x\to0}{(x+2a)}\\ &=2a. \end{split}\] On the other hand, we can take a limit along the imaginary axis. Define \[L_2=\lim_{y\to0}{\frac{g(z_0+yi)-g(z_0)}{yi}}.\] Then, \[\begin{split} L_2&=\lim_{y\to0}{\frac{a^2+(b+y)^2-a^2-b^2}{yi}}\\ &=\lim_{y\to0}{\frac{y^2+2by}{yi}}\\ &=\frac{1}{i}\lim_{y\to0}{(y+2b)}\\ &=-2bi. \end{split}\] Of course, since \(z_0\neq0\), we have that \(L_1\neq L_2\), so \(g'(z_0)\) fails to exist. The complex derivative of \(g\) only exists at \(0\), where it is easily checked that \(g'(0)=0\).
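To make this concrete, here is a small numerical sketch (my own, in Python with NumPy, not from the original argument) that estimates the Jacobian of \(f\) by finite differences and checks whether it has the rotation-scaling form of \(A\) above. For \(g(z)=z^2\) the Jacobian matches \(\begin{bmatrix}\alpha&-\beta\\\beta&\alpha\end{bmatrix}\); for \(g(z)=|z|^2\) it does not (away from the origin).

```python
import numpy as np

def jacobian(f, s, eps=1e-6):
    """Estimate the Frechet derivative (Jacobian) of f: R^2 -> R^2 at s
    by central finite differences."""
    J = np.zeros((2, 2))
    for j in range(2):
        d = np.zeros(2)
        d[j] = eps
        J[:, j] = (f(s + d) - f(s - d)) / (2 * eps)
    return J

def f_sq(s):   # g(z) = z^2  ->  u = x^2 - y^2, v = 2xy
    return np.array([s[0]**2 - s[1]**2, 2 * s[0] * s[1]])

def f_mod(s):  # g(z) = |z|^2  ->  u = x^2 + y^2, v = 0
    return np.array([s[0]**2 + s[1]**2, 0.0])

s = np.array([1.0, 2.0])  # the point z = 1 + 2i
for f in (f_sq, f_mod):
    J = jacobian(f, s)
    alpha, beta = J[0, 0], J[1, 0]
    A = np.array([[alpha, -beta], [beta, alpha]])
    print(J.round(6), "has rotation-scaling form:",
          np.allclose(J, A, atol=1e-4))
# For g(z) = z^2 the check passes, and g'(z) = alpha + beta*i = 2z;
# for g(z) = |z|^2 it fails, consistent with the computation above.
```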
So the complex differentiability of \(g\) is stronger than the Fréchet differentiability of \(f\). This leads to the next natural question: what do we need in addition to the Fréchet differentiability of \(f\) to ensure the complex differentiability of \(g\)? One can easily see that our argument that the complex differentiability of \(g\) implies the Fréchet differentiability of \(f\) works in reverse. In particular, if the Fréchet derivative of \(f\) has the form of \(A\) above, then the complex derivative of \(g\) exists at the point in question, where it equals \(\alpha+\beta i\).

We can say more if we assume continuity of the complex derivative of \(g\) and of the partial derivatives of \(u\) and \(v\). Suppose \(g'\) exists and is continuous in an open connected set. One may compute the complex derivative of \(g\) in two different ways. Just as we did above, if one computes the limit along the real axis, it is immediate that \[g'(z)=\frac{\partial u}{\partial x}(h(z))+i\frac{\partial v}{\partial x}(h(z)).\] On the other hand, computing the limit along the imaginary axis gives us \[g'(z)=\frac{\partial v}{\partial y}(h(z))-i\frac{\partial u}{\partial y}(h(z)).\] Equating these two expressions for \(g'(z)\) yields the Cauchy-Riemann equations \[\frac{\partial u}{\partial x}=\frac{\partial v}{\partial y}\qquad \frac{\partial u}{\partial y}=-\frac{\partial v}{\partial x}.\]
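As a quick symbolic sanity check (a Python/SymPy sketch of my own, not part of the derivation), one can verify the Cauchy-Riemann equations for a concrete holomorphic function such as \(g(z)=e^z\), where \(u=e^x\cos y\) and \(v=e^x\sin y\), and confirm that the two expressions for \(g'\) agree:

```python
import sympy as sp

x, y = sp.symbols('x y', real=True)
# g(z) = exp(z) corresponds to u = e^x cos y, v = e^x sin y
u = sp.exp(x) * sp.cos(y)
v = sp.exp(x) * sp.sin(y)

# Cauchy-Riemann equations: u_x = v_y and u_y = -v_x
print(sp.simplify(sp.diff(u, x) - sp.diff(v, y)))   # 0
print(sp.simplify(sp.diff(u, y) + sp.diff(v, x)))   # 0

# Both expressions for g'(z) agree: u_x + i v_x == v_y - i u_y
gp1 = sp.diff(u, x) + sp.I * sp.diff(v, x)
gp2 = sp.diff(v, y) - sp.I * sp.diff(u, y)
print(sp.simplify(gp1 - gp2))                       # 0
# Indeed g' = e^x (cos y + i sin y) = e^z, as expected.
```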
These equations are actually precisely the extra nudge that Fréchet differentiability needs to translate into complex differentiability (with the assumptions of continuity we mentioned before). To see this, suppose that \(u\) and \(v\) satisfy the Cauchy-Riemann equations in an open connected set \(G\) and that they have continuous partial derivatives. Fix \(z=x+yi\in G\). Since \(G\) is open, there exists \(r>0\) such that \(B_r(z)\subseteq G\). Let \(w=s+ti\) with \(|w|<r\) (we write \(w\) for the increment so as not to clash with the identification \(h\)). Then, by the mean value theorem, there exist \(s_1,t_1\in\mathbb{R}\) such that \(|s_1|<|s|\), \(|t_1|<|t|\), and \[u(x+s,y+t)-u(x,y+t)=u_x(x+s_1,y+t)s\] \[u(x,y+t)-u(x,y)=u_y(x,y+t_1)t.\] Now, we may define \[\varphi(s,t)=[u(x+s,y+t)-u(x,y)]-[u_x(x,y)s+u_y(x,y)t].\] Notice that \[\begin{split} \frac{\varphi(s,t)}{s+ti}&=\frac{[u(x+s,y+t)-u(x,y)]-[u_x(x,y)s+u_y(x,y)t]}{s+ti}\\ &=\frac{[u(x+s,y+t)-u(x,y+t)]+[u(x,y+t)-u(x,y)]-[u_x(x,y)s+u_y(x,y)t]}{s+ti}\\ &=\frac{u_x(x+s_1,y+t)s+u_y(x,y+t_1)t-[u_x(x,y)s+u_y(x,y)t]}{s+ti}\\ &=\frac{s}{s+ti}[u_x(x+s_1,y+t)-u_x(x,y)]+\frac{t}{s+ti}[u_y(x,y+t_1)-u_y(x,y)]. \end{split}\] Observe that \(\left|\frac{s}{s+ti}\right|,\left|\frac{t}{s+ti}\right|\leq 1\). Moreover, since \(|s_1|<|s|\) and \(|t_1|<|t|\), taking \(s,t\to0\) forces \(s_1,t_1\to0\). Hence, by the continuity of \(u_x\) and \(u_y\), the last line above gives \[\lim_{s+ti\to0}{\frac{\varphi(s,t)}{s+ti}}=0.\] Similarly, one can check that \[\psi(s,t)=[v(x+s,y+t)-v(x,y)]-[v_x(x,y)s+v_y(x,y)t]\] satisfies \[\lim_{s+ti\to0}{\frac{\psi(s,t)}{s+ti}}=0.\] Now, we can compute \[\begin{split} \frac{g(z+w)-g(z)}{s+ti}&=\frac{u(x+s,y+t)+iv(x+s,y+t)-u(x,y)-iv(x,y)}{s+ti}\\ &=\frac{u(x+s,y+t)-u(x,y)}{s+ti}+i\frac{v(x+s,y+t)-v(x,y)}{s+ti}\\ &=\frac{u_x(x,y)s+u_y(x,y)t+\varphi(s,t)}{s+ti}+i\frac{v_x(x,y)s+v_y(x,y)t+\psi(s,t)}{s+ti}\\ &=\frac{u_x(x,y)s+u_y(x,y)t}{s+ti}+i\frac{v_x(x,y)s+v_y(x,y)t}{s+ti}+\frac{\varphi(s,t)+i\psi(s,t)}{s+ti}\\ &=u_x(x,y)+iv_x(x,y)+\frac{\varphi(s,t)+i\psi(s,t)}{s+ti}. \end{split}\] The last line is where we finally invoke the Cauchy-Riemann equations: substituting \(u_y=-v_x\) and \(v_y=u_x\), the numerator \(u_x s+u_y t+i(v_x s+v_y t)\) collapses to \((u_x+iv_x)(s+ti)\).

Taking the limit \(s+ti\to0\) and using our above results on how \(\varphi\) and \(\psi\) behave in this limit, we obtain \[g'(z)=(u_x\circ h)(z)+i(v_x\circ h)(z).\] Since the partial derivatives of \(u\) and \(v\) are continuous and \(h\) is continuous, we conclude that \(g'\) is continuous. That is, \(g\) is continuously differentiable on \(G\). It turns out that complex differentiability in a neighborhood of every point of the domain (holomorphicity) is a strong condition that implies pretty much all the nice things that you would want. In fact, it implies that the function is infinitely complex differentiable and analytic. In the future, I will talk about these relationships, which form the core of complex analysis.
There are certain theoretical preliminaries that precede the construction of most measures. One example of a measure that is more or less immediate from the definition of measures is the counting measure. It is the measure \(\mu\) on the measurable space \((X,\mathscr{P}(X))\) given by
\[\mu(E)=\sum_{x\in E}{1}.\] That is, \(\mu(E)\) is the number of elements of \(E\), with \(\mu(E)=\infty\) when \(E\) is infinite. It is intuitive that this ought to be a measure, and it is easily checked that it is one. However, the modern construction of most other "nice" measures requires quite a bit of groundwork. The first important result in this direction is the following.

Carathéodory's Theorem: Let \(\mu^*\) be an outer measure on \(X\). The collection \(\mathscr{M}\) of \(\mu^*\)-measurable subsets of \(X\) is a \(\sigma\)-algebra on \(X\), and the restriction of \(\mu^*\) to \(\mathscr{M}\) is a complete measure.

As a side note, the notion of \(\mu^*\)-measurability here is sometimes called the Carathéodory criterion: a subset \(A\) is \(\mu^*\)-measurable if for every \(B\) in the power set of \(X\), we have that \(\mu^*(B)=\mu^*(A\cap B)+\mu^*(A^c\cap B)\).

Proof: Since the Carathéodory criterion is symmetric in \(A\) and \(A^c\) (recall \((A^c)^c=A\)), it is clear that \(\mathscr{M}\) is closed under complements. It remains to show that \(\mathscr{M}\) is closed under countable unions to establish that it is a \(\sigma\)-algebra. First we show that it is closed under finite unions. Suppose \(A,B\in\mathscr{M}\). Let \(E\) be an arbitrary subset of \(X\). Then, since \(A\) is \(\mu^*\)-measurable, \[\mu^*(E)=\mu^*(E\cap A)+\mu^*(E\cap A^c).\] We can split up the first term using the \(\mu^*\)-measurability of \(B\). This gives us \[\mu^*(E)=\mu^*((E\cap A)\cap B)+\mu^*((E\cap A)\cap B^c)+\mu^*(E\cap A^c).\] Performing a similar expansion on the last term gives us \[\mu^*(E)=\mu^*(E\cap A\cap B)+\mu^*(E\cap A\cap B^c)+\mu^*(E\cap A^c\cap B)+\mu^*(E\cap A^c\cap B^c).\] In fact, the Carathéodory criterion is the natural condition that allows sets to act as "building blocks" of other sets in the \(\mu^*\) sense: for any finite collection of \(n\) \(\mu^*\)-measurable sets, the outer measure of any set can be expressed as the sum of the outer measures of the intersections of that set with the \(2^n\) pieces that the collection carves out.

Now, observe (a Venn diagram helps here) that \(A\cup B=(A\cap B)\cup(A\cap B^c)\cup(A^c\cap B)\). It follows that \(E\cap(A\cup B)=(E\cap A\cap B)\cup(E\cap A\cap B^c)\cup(E\cap A^c\cap B)\). Subadditivity then yields \[\mu^*(E\cap(A\cup B))\leq\mu^*(E\cap A\cap B)+\mu^*(E\cap A^c\cap B)+\mu^*(E\cap A\cap B^c).\] To this, we add the De Morgan's law identity \(\mu^*(E\cap(A\cup B)^c)=\mu^*(E\cap A^c\cap B^c)\) to obtain \[\mu^*(E\cap (A\cup B))+\mu^*(E\cap(A\cup B)^c)\leq\mu^*(E).\] Of course, the reverse inequality follows immediately from subadditivity. Hence, \(\mathscr{M}\) is closed under finite unions and is at least an algebra. The outer measure is also at least finitely additive on \(\mathscr{M}\): take \(A,B\in\mathscr{M}\) to be disjoint. Then, we have \[\mu^*(A\cup B)=\mu^*((A\cup B)\cap A)+\mu^*((A\cup B)\cap A^c)=\mu^*(A)+\mu^*(B).\]

We need to extend these results to the countable case in order to establish the first part of the theorem. It suffices to consider countable collections of disjoint sets from \(\mathscr{M}\), because the union of an arbitrary countable collection from \(\mathscr{M}\) can be expressed as the union of a countable collection of disjoint sets from \(\mathscr{M}\) via the standard trick (replace \(A_n\) with \(A_n\setminus\bigcup_{j<n}{A_j}\), which lies in \(\mathscr{M}\) since \(\mathscr{M}\) is an algebra). So we let \(A_1,A_2,\dots\in\mathscr{M}\) be pairwise disjoint.
Define \[B_n=\bigcup_{j=1}^{n}{A_j}\qquad B=\bigcup_{j=1}^{\infty}{A_j}.\] For any \(n\), since \(A_n\in\mathscr{M}\), we have that \[\mu^*(E\cap B_n)=\mu^*(E\cap B_n\cap A_n)+\mu^*(E\cap B_n\cap A_n^c).\] Since the \(A_j\) are pairwise disjoint, the first term is \(\mu^*(E\cap A_n)\) while the second term is \(\mu^*(E\cap B_{n-1})\). Repeating this inductively, we obtain that \[\mu^*(E\cap B_n)=\sum_{j=1}^{n}{\mu^*(E\cap A_j)}.\] Now, observe that \(B_n\subseteq B\), so \(B^c\subseteq B_n^c\) and \(E\cap B^c\subseteq E\cap B_n^c\). By monotonicity, we then have that \(\mu^*(E\cap B^c)\leq\mu^*(E\cap B_n^c)\). To this we add the identity above to obtain \[\mu^*(E\cap B^c)+\sum_{j=1}^{n}{\mu^*(E\cap A_j)}\leq\mu^*(E\cap B_n^c)+\mu^*(E\cap B_n)=\mu^*(E).\] The far RHS comes from the fact that \(B_n\in\mathscr{M}\), since we have already established that \(\mathscr{M}\) is an algebra. Now, we take \(n\to\infty\) so that \[\begin{split} \mu^*(E)&\geq\mu^*(E\cap B^c)+\sum_{j=1}^{\infty}{\mu^*(E\cap A_j)}\\ &\geq\mu^*(E\cap B^c)+\mu^*\left(\bigcup_{j=1}^{\infty}{E\cap A_j}\right)\\ &=\mu^*(E\cap B^c)+\mu^*(E\cap B)\\ &\geq\mu^*(E). \end{split}\] The second line comes from countable subadditivity, the third line is by definition, and the last line comes from finite subadditivity. Note that we cannot immediately assert equality in the last line, because that would amount to assuming \(B\in\mathscr{M}\), which is what we want to show in the first place. In any case, we have bounded \(\mu^*(E\cap B^c)+\mu^*(E\cap B)\) above and below by \(\mu^*(E)\), so in fact we do have equality throughout. This establishes that \(\mathscr{M}\) is a \(\sigma\)-algebra. Countable additivity on \(\mathscr{M}\) comes immediately by taking \(E=B\).

It remains to show that \(\mu^*|_{\mathscr{M}}\) is a complete measure. We already know that \(\mu^*\) will vanish on subsets of null sets by monotonicity. So we just need to show that subsets of null sets in \(\mathscr{M}\) are themselves in \(\mathscr{M}\). Let \(\mu^*(A)=0\). Then, by subadditivity, \(\mu^*(E)\leq\mu^*(E\cap A)+\mu^*(E\cap A^c)\), while by monotonicity, \(\mu^*(E\cap A)=0\) and \(\mu^*(E\cap A^c)\leq\mu^*(E)\), hence \[\mu^*(E)\leq\mu^*(E\cap A^c)\leq\mu^*(E).\] Hence, \(A\) is \(\mu^*\)-measurable. In particular, if \(A'\subseteq A\), then \(\mu^*(A')=0\) by monotonicity, and by the above, \(A'\in\mathscr{M}\). We are done. \(\square\)
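To see the Carathéodory criterion in action on a toy example (my own Python sketch, not part of the proof above): take \(X=\{0,1,2\}\) with the outer measure \(\mu^*(\emptyset)=0\) and \(\mu^*(E)=1\) for every nonempty \(E\) (this is monotone and subadditive, hence a valid outer measure). A brute-force check shows that only \(\emptyset\) and \(X\) are \(\mu^*\)-measurable, so \(\mathscr{M}\) can be much smaller than the power set.

```python
from itertools import chain, combinations

X = frozenset({0, 1, 2})

def subsets(s):
    """All subsets of s, as frozensets."""
    s = list(s)
    return [frozenset(c) for c in
            chain.from_iterable(combinations(s, r) for r in range(len(s) + 1))]

def mu_star(E):
    """A (non-additive) outer measure: 0 on the empty set, 1 otherwise."""
    return 0 if len(E) == 0 else 1

def caratheodory_measurable(A):
    """Check mu*(B) == mu*(A & B) + mu*(A^c & B) for every B in P(X)."""
    Ac = X - A
    return all(mu_star(B) == mu_star(A & B) + mu_star(Ac & B)
               for B in subsets(X))

print([set(A) for A in subsets(X) if caratheodory_measurable(A)])
# [set(), {0, 1, 2}] -- only the trivial sets survive the criterion
```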
Combining Carathéodory's theorem with the fact that a premeasure on an algebra induces an outer measure, with respect to which every set in the algebra is measurable, one can establish that a premeasure on an algebra extends to a measure on the generated \(\sigma\)-algebra. In fact, it is precisely the restriction of the induced outer measure to the generated \(\sigma\)-algebra. This is the natural extension of the premeasure in the sense that the extension is unique if the premeasure is \(\sigma\)-finite, and in the general case, any other measure that extends the premeasure will agree with this one on sets of finite measure. The construction of the Lebesgue measure thus proceeds by defining the natural premeasure on finite disjoint unions of half-open intervals on the real line, checking that this is indeed a premeasure on an algebra, and then invoking the above result, which tells us that the premeasure extends to a measure on the Borel \(\sigma\)-algebra of \(\mathbb{R}\). The completion of this measure is what we call Lebesgue measure.

Taking the completion explodes the domain of the measure into a rather large \(\sigma\)-algebra. In fact, the \(\sigma\)-algebra of Lebesgue measurable sets is so large that it is provably impossible, without the axiom of choice, to exhibit a Lebesgue nonmeasurable set.

The summer has progressed quite a bit. My REU at UC Santa Barbara has ended. I still cannot say much about the work that I did (I will have to wait until a preprint is up on arXiv before I can really talk about technical details). However, I can say that broadly, the research was in Lie theory. I have presented the work at the Southern California Math REU Conference and at the YMC, and I plan to apply to a few other conferences.
Overall, the experience was great. I learned a lot about Lie theory and about how math research is conducted in general. I found my lack of experience in algebra to be the main bottleneck. Oftentimes, I was left gawking at some basic algebraic facts (such as the isomorphism theorems, universal properties, etc.). This was very reminiscent of the struggles I faced when studying algebraic topology this past spring quarter. Not only was the subject conceptually very difficult, but I was often stuck on the algebra. My deficiencies in algebra should be resolved this upcoming academic year.

I have decided to drop my physics major. Superficially, this may seem like pretty major news, but honestly, not much has changed. Since early last year, I had been intending to pursue a PhD program in mathematics anyway. This is not to say that I am no longer interested in physics—many of my mathematical interests come from mathematical physics. The main issue is that the physics major at UCSD is structured in a very annoying way. It forces me to take a very particular sequence of classes in a very particular order. All things considered, the sequence ends up being far too slow for my liking. Students are denied access to the serious physics classes until their third year—but that is the time that one should really be thinking about grad school! When I was constructing my schedule for the upcoming fall quarter, I realized that I had hit a crossroads. If I continued with physics, I would have to take various lab classes with first- and second-year students. Alternatively, if I focused entirely on math, I would be able to immediately take some graduate courses. I decided that the opportunity cost of staying in physics was just too great.

My schedule for the fall quarter consists of abstract algebra, logic, graduate-level complex analysis, and graduate-level real analysis. I am especially excited about logic: this is a class that I have really wanted to take at some point in my life, and now I am able to since I am no longer a physics major! I will also be taking the Putnam seminar with Daniel Kane once again. As this will be my final year before graduate school applications, I do wish to perform well. After resting for the past two weeks, I have finally begun preparations for that. My schedule should keep me busy this fall, and I do not think I will be doing much else other than what I have indicated, though I may do a little bit of reading on the side. Graduate school applications seem to be looming right around the corner and that worries me a bit, but I have about a year left. Hopefully, I can make it count.

That the rationals are dense in the real numbers is a basic property that any student would learn in a first course in topology or real analysis. The standard proof uses the Archimedean property of the real numbers. Here, I outline a different way to prove that an irrational number can be approximated to arbitrary accuracy by rational numbers. This is, in some sense, the "hardest" case, if we attempt to show separately that rational numbers and irrational numbers can both be approximated by rationals.
The idea is that a finitely generated additive subgroup of \(\mathbb{R}\) need not be discrete. Let \(x\) be irrational and \(p\) a nonzero rational. Consider the set of real numbers of the form \(ax+bp\) where \(a,b\in\mathbb{Z}\). Suppose there is a nontrivial solution to \(ax+bp=0\). If \(a\neq0\), then \(x=-\frac{bp}{a}\in\mathbb{Q}\), a contradiction; if \(a=0\), then \(bp=0\) forces \(b=0\), since \(p\neq0\). Hence, no nonzero integer multiple of \(x\) is equal to an integer multiple of \(p\).

If we let \(p=1\), we discover a consequence of this: if we move around a circle in steps, each step consisting of \(x\) full rotations, we will never land on the same point twice. Let \(S\) denote the set of points of the circle that we reach by performing these steps. Our reasoning shows that \(S\) is countably infinite. Now partition the circle into \(N\) equal arcs, where \(N\) is any positive integer. It follows from the pigeonhole principle that some arc contains at least two points of \(S\), say the points reached after \(n_1\) and \(n_2\) steps, with \(n_1<n_2\). Performing \(n_2-n_1\) steps therefore moves a point of the circle by less than \(1/N\) of a full rotation, so the points of \(S\) reached after \(n_2-n_1,\ 2(n_2-n_1),\ 3(n_2-n_1),\dots\) steps pass within \(1/N\) of every point of the circle. Since \(N\) was arbitrary, every point of the circle has points of \(S\) arbitrarily close to it. That is, \(S\) is dense in the circle.

Now let \(p=q\), where \(q\) is some fixed nonzero rational number. Since the corresponding set \(S\) is dense in the circle, we have that for any \(\epsilon>0\), we can find \(m,n\in\mathbb{Z}\) (with \(n\neq0\)) such that \[|mq+nx|<\epsilon\Longrightarrow\left|x+\frac{mq}{n}\right|<\frac{\epsilon}{|n|}\leq\epsilon.\] So \(-\frac{mq}{n}\) is a rational number that is within \(\epsilon\) of \(x\).
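Here is a small numerical illustration (my own Python sketch, not part of the argument): scanning the multiples \(nx \bmod 1\) of an irrational \(x\) until one lands within \(\epsilon\) of an integer produces integers \(m,n\) with \(|m+nx|<\epsilon\), hence a rational approximation \(-\frac{m}{n}\approx x\), exactly as above with \(q=1\).

```python
import math

def rational_approx(x, eps):
    """Walk through n*x mod 1 until it lands within eps of an integer,
    yielding m, n with |m + n*x| < eps, i.e. |x - (-m/n)| < eps/n."""
    n = 1
    while True:
        frac = (n * x) % 1.0
        dist = min(frac, 1.0 - frac)  # distance from n*x to the nearest integer
        if dist < eps:
            m = -round(n * x)         # nearest integer, negated
            return m, n
        n += 1

x = math.sqrt(2)
for eps in (1e-1, 1e-3, 1e-6):
    m, n = rational_approx(x, eps)
    print(f"eps={eps:g}:  sqrt(2) ~ {-m}/{n} = {-m/n:.10f}")
# The fractions produced (7/5, 1393/985, ...) are continued-fraction
# convergents of sqrt(2).
```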
Two days ago, I had an interesting interaction with some other students at my REU that destroyed two misconceptions I had held in analysis for a long time.

First, we introduce the idea of a bump function. Let \(M\) be a smooth manifold with \(A\subseteq U\subseteq M\), where \(A\) is closed and \(U\) is open. We define a bump function for \(A\) supported in \(U\) to be a continuous function \(\psi\colon M\to\mathbb{R}\) with \(0\leq\psi\leq1\) on \(M\), \(\psi=1\) on \(A\), and \(\mathrm{supp}\ \psi\subseteq U\). So far so good. It is not particularly surprising that such a function exists. We only require that \(\psi\) be continuous.

Here is where my intuition had failed me for years up to this point: smooth bump functions exist! This was a complete shock to me. I was not familiar with any smooth functions that were constant in some neighborhood and non-constant elsewhere. Moreover, I had thought about this scenario many times before and concluded that it ought to be impossible. I figured that if a function was of class \(C^{\infty}\), then it could not be constant on some region (so that its derivatives of all orders vanished there) and then suddenly get kicked and start moving. I believed that for such a "kick" to occur, one of the function's derivatives must be discontinuous, staying at zero in some neighborhood before suddenly jumping to some nonzero value. Alas, I was wrong. Smooth bump functions do exist. This follows from the existence of smooth partitions of unity. Most of the technical work that goes into proving that smooth bump functions exist is actually hidden under the rug of proving that smooth partitions of unity exist. But taking that for granted, we note that \(U\) and \(M\setminus A\) form an open cover of \(M\). Then, one may find a smooth partition of unity \(\{\psi,\lambda\}\) subordinate to this cover. By definition, \(\mathrm{supp}\ \lambda\subseteq M\setminus A\), so \(\lambda\) vanishes on \(A\), and hence \(1=\psi+\lambda=\psi\) on \(A\). Indeed, this \(\psi\) is the desired bump function!
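On the real line, one can write such a function down explicitly. Below is a small Python sketch (my own, using the classical \(e^{-1/t}\) construction rather than partitions of unity) of a smooth \(\psi\) that equals \(1\) on \(A=[-1,1]\) and is supported in \(U=(-2,2)\):

```python
import numpy as np

def g(t):
    """exp(-1/t) for t > 0, and 0 otherwise: smooth but not analytic at 0."""
    t = np.asarray(t, dtype=float)
    out = np.zeros_like(t)
    pos = t > 0
    out[pos] = np.exp(-1.0 / t[pos])
    return out

def step(t):
    """Smooth transition: 0 for t <= 0, 1 for t >= 1, in between otherwise.
    The denominator is always positive, so this is well defined."""
    return g(t) / (g(t) + g(1.0 - t))

def psi(x):
    """Smooth bump: psi = 1 on A = [-1, 1], supp psi inside U = (-2, 2)."""
    return step(2.0 - np.abs(x))

xs = np.linspace(-3, 3, 13)
print(np.round(psi(xs), 4))
# Every derivative of psi vanishes at x = +/-1 and +/-2, yet psi is
# constant on [-1, 1] and identically 0 outside (-2, 2): the "kick"
# happens with all derivatives continuous.
```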
Having crushed one of my childhood misconceptions, another was crushed the other day when I saw the following proposition in Lee's Introduction to Smooth Manifolds.

Proposition: Let \(M\) be a smooth manifold with or without boundary, \(p\in M\), \(v\in T_pM\). If \(f,g\in C^{\infty}(M)\) agree on some neighborhood of \(p\), then \(vf=vg\).

The statement of this proposition is not particularly relevant here, but the hypothesis immediately caught my eye. Why is it stipulated that \(f\) and \(g\) agree just on some neighborhood of \(p\)? I had always figured that the evolution of a smooth function was determined by its behavior on any neighborhood (similar to how continuous functions are determined by their behavior on dense subsets). The motivation here was the uniqueness of solutions to differential equations and dynamical systems. Once one prescribes a global relationship between a function and its derivatives, together with an initial condition, the evolution of the function past the starting point is determined. I believed that if we strengthen the agreement between two functions to occur not at a single point but on an entire neighborhood, and both functions are \(C^{\infty}\), then the evolutions of the functions beyond the neighborhood must also agree. In other words, I did not understand why the proposition said that \(f\) and \(g\) agree on some neighborhood, because I thought that that occurs precisely when they agree everywhere. Alas, this is also wrong. Upon discussing this with some other students at my REU, I found the counterexample I was looking for.

Using our previous notation, if one considers \(g=\psi f\), where \(\psi\) is a bump function, both functions agree on any neighborhood contained in \(A\) by construction, but are very free to disagree outside of \(A\)! Somehow, it seems that bump functions allow us to construct strange functions whose derivatives fail to propagate local information globally, even though the functions are infinitely smooth! This observation is further strengthened by the fact that bump functions are actually used in the proof of the proposition written above (and the proposition itself is a statement about how tangent vectors do not care about global behavior—they only care about local behavior). No contradiction is reached with my intuition stemming from unique solutions to differential equations. After all, having a relationship between a function and its derivatives that holds everywhere is quite a strong condition, and it is no surprise that unique evolution is forced in that scenario.

After a little bit of digging, I found the gap in my knowledge. The identity theorem states that analytic functions on a connected domain that agree on some neighborhood must agree everywhere. So it seems that I have stumbled upon a crucial qualitative difference between analytic functions and smooth functions. In particular, analyticity is a strong enough condition for local behavior (at least in a neighborhood) to propagate globally, while smoothness is not. Unsurprisingly, every concrete example that I had attempted to think of was analytic. My intuition is still grappling with this. But it's good to know that I was wrong for so long.

Here is a problem from Chapter 1 of B. C. Hall's Lie Groups, Lie Algebras, and Representations that I am working on for my REU reading.
Problem: A subset \(E\) of a matrix Lie group \(G\) is called discrete if for each \(A\) in \(E\) there is a neighborhood \(U\) of \(A\) in \(G\) such that \(U\) contains no point in \(E\) except for \(A\). Suppose that \(G\) is a path-connected matrix Lie group and \(N\) is a discrete normal subgroup of \(G\). Show that \(N\) is contained in the center of \(G\).

Solution: To solve this problem, I was immediately reminded of the map \(\Phi_X\colon N\to G\) given by \(\Phi_X(A)=XAX^{-1}\) that Hall introduces earlier in the chapter, where \(X\in G\) and \(A\in N\). This map is obviously continuous in both \(X\) and \(A\) due to the nature of matrix multiplication. More motivation for thinking of this map comes from the fact that \(\Phi_X\) actually maps into \(N\), since \(N\) is normal. I saw that if \(A\) was in the center, it must be true that \(\Phi_X(A)=XX^{-1}A=A\) for any choice of \(X\in G\). Moreover, the converse is true, since \(XAX^{-1}=A\) implies \(XA=AX\). So it sufficed to show that \(XAX^{-1}=A\).

A common construction in path-connected topological spaces is to draw paths connecting a point of interest to an "important point" in the space. I have seen this construction before in algebraic topology, where numerous proofs involving path-lifting require paths to be drawn from a chosen point in the fiber (which determines the lift) to some other point of interest. In our case, \(G\) is a topological group, and an important element of any group is the identity. So, using path-connectedness, we let \(\lambda_X\colon[0,1]\to G\) be a path in \(G\) such that \(\lambda_X(0)=X\) and \(\lambda_X(1)=I\). Now define \(f\colon[0,1]\to G\) by \[f(t)=\Phi_{\lambda_X(t)}(A).\] What I had to show was clear: \(\Phi\) evaluated at \(A\) must be constant along the path \(\lambda_X\). In other words, \(f\) needed to be the constant function with image \(\{A\}\).

At this point, I began fiddling with a literal notion of "closeness". One may put the metric induced by the Hilbert-Schmidt (Frobenius) norm on \(G\) to turn it into a metric space, and do some \(\epsilon\)-\(\delta\) calculations to determine that in fact \(f(t)=A\) for all \(t\). But I later realized that this is not a nice solution, since we are imposing additional (unnecessary) structure on \(G\). It turns out that we can prove the claim entirely topologically. First note that as a composition of continuous functions, \(f\) is continuous. Since \(N\) is discrete, around each \(Y\in\mathrm{Im}\ f\) (note \(\mathrm{Im}\ f\subseteq N\)), there exists an open neighborhood \(U_Y\subseteq G\) that intersects \(N\) only at \(Y\). By the continuity of \(f\), the sets of the form \(f^{-1}(U_Y)\) are open in \([0,1]\) (with the subspace topology inherited from \(\mathbb{R}\)) and form an open cover of \([0,1]\), which is compact by the Heine-Borel theorem. So we can extract a finite subcover. But since the image of \(f\) lies in \(N\) and \(U_Y\cap N=\{Y\}\), each preimage \(f^{-1}(U_Y)\) is just \(f^{-1}(\{Y\})\), so the sets in the cover are pairwise disjoint. Because \([0,1]\) is connected, the only way to write it as a finite union of pairwise disjoint sets that are open in \([0,1]\) is if \([0,1]\) is the only (nonempty) set in the union. So one of the preimages is \([0,1]\), the entire domain of \(f\). That is, there is only one element in the image of \(f\). Since \(f(1)=IAI=A\), this element is \(A\); in particular, \(f(0)=XAX^{-1}=A\). \(\square\)
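As a concrete numerical illustration of the constancy of \(f\) (a Python sketch of my own, not part of Hall's problem): in \(G=SU(2)\), the subgroup \(\{\pm I\}\) is a discrete normal subgroup, and conjugating \(A=-I\) along a path from a random \(X\) to \(I\) leaves it fixed at every parameter value.

```python
import numpy as np
from scipy.linalg import expm, logm

rng = np.random.default_rng(0)

# A random element of SU(2): exponentiate a traceless anti-Hermitian matrix.
a, b, c = rng.normal(size=3)
H = np.array([[1j * a, b + 1j * c],
              [-b + 1j * c, -1j * a]])   # anti-Hermitian, traceless
X = expm(H)

A = -np.eye(2)      # element of the discrete normal subgroup {+I, -I}
L = logm(X)         # so that t -> expm((1 - t) L) is a path from X to I

for t in np.linspace(0, 1, 5):
    lam_t = expm((1 - t) * L)            # lambda_X(t): X at t=0, I at t=1
    f_t = lam_t @ A @ np.linalg.inv(lam_t)
    print(round(t, 2), np.allclose(f_t, A))  # True at every t: f is constant
```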
Yeah, I know. Long time no see. I've been busy. I do intend to add all the stuff that has been on my mind eventually. Anyway, a quick update. Overall, good progress, I would say. Not much more I could ask for during a pandemic.

It is easy to see that the set of all \(3\times3\) real matrices, which I will denote by \(\mathbb{M}_{3\times3}\), is a vector space. Consider \(H\subset\mathbb{M}_{3\times3}\) defined by
\[H=\{A\in\mathbb{M}_{3\times3}\ \text{such that }\det{(A+A^T)}=0\}.\] Is \(H\) a subspace of \(\mathbb{M}_{3\times3}\)? It turns out the answer is no. \(H\) contains the zero matrix and is closed under scalar multiplication, but it is not closed under vector addition. It is sufficient to find \(A,B\in H\) with \(A+B\notin H\). Easy, right? Just take \[A=\begin{bmatrix} 1 & 2 & 1\\ 2 & 1 & 2\\ 1 & 2 & 1 \end{bmatrix}\] \[B=\begin{bmatrix} \sqrt{2} & \sqrt{3} & 2\sqrt{2}\\ \sqrt{3} & \sqrt{2} & \sqrt{3}\\ 2\sqrt{2} & \sqrt{3} & \sqrt{2} \end{bmatrix},\] and we are done. \(\square\)

As satisfying as it would be to just leave these matrices without any trace of where they come from, that would contradict the mantra: a slick write-up is not an instructional one. The first thing to observe is that a matrix plus its transpose is always a symmetric matrix, since \((A+A^T)^T=A^T+A=A+A^T\). So the condition \(\det{(A+A^T)}=0\) is really saying that a certain symmetric matrix related to \(A\) is singular. Moreover, if a matrix is already symmetric, then it is equal to its transpose, so the sum of itself with its transpose is two times itself. That is, if \(A\) is symmetric, \(A+A^T=2A\). But determinants are multilinear (linear in each column vector), so \(\det{(2A)}=2^n\det{A}\) if \(A\) is an \(n\times n\) matrix. This means that if we choose a symmetric \(A\in\mathbb{M}_{3\times3}\) that is singular, then \(A\in H\), since \(\det{(A+A^T)}=\det{(2A)}=8\det{A}=0\). In other words, if we choose a symmetric matrix from \(H\), instead of worrying about the sum of that matrix with its transpose, we can just focus on the matrix itself (that is, if \(A\) is symmetric, then \(\det{(A+A^T)}=0\) if and only if \(\det{A}=0\)).

So let's investigate symmetric \(3\times3\) matrices. They look like this: \[S=\begin{bmatrix} a & b & c\\ b & d & e\\ c & e & f \end{bmatrix}.\] Full disclosure: I stumbled upon a Stack Exchange post which inspired me to consider the special case of a \(3\times3\) symmetric matrix where \(d=f=a\) and \(e=b\): \[S=\begin{bmatrix} a & b & c\\ b & a & b\\ c & b & a \end{bmatrix}.\] We compute the determinant of such a matrix: \[\begin{split} \det{S}&=a(a^2-b^2)-b(ab-bc)+c(b^2-ac)\\ &=a(a^2-b^2)-b^2(a-c)+b^2c-ac^2\\ &=a^3-b^2a+b^2c-ac^2-b^2(a-c)\\ &=a(a+c)(a-c)-b^2(a-c)-b^2(a-c)\\ &=(a-c)(a^2+ac-2b^2). \end{split}\] In essence, there are three ways that this special-case symmetric matrix can be singular: only the first factor is zero, only the second factor is zero, or both factors are zero.

The sum of two symmetric matrices is also symmetric, so applying our previous reasoning, if we select \(A\) and \(B\) to be symmetric, instead of worrying about the determinant \(\det{(A+B+(A+B)^T)}\), we need only worry about \(\det{(A+B)}\). Now watch what happens if we let \(A\) and \(B\) be special-case symmetric matrices in \(H\), with \(A\) having only the first factor of its determinant equal to zero and \(B\) having only the second factor of its determinant equal to zero. In other words, write \[A=\begin{bmatrix} a & b & c\\ b & a & b\\ c & b & a \end{bmatrix}\] \[B=\begin{bmatrix} x & y & z\\ y & x & y\\ z & y & x \end{bmatrix}\] with \(a-c=0\neq a^2+ac-2b^2\) and \(x^2+xz-2y^2=0\neq x-z\). In that case, \[A+B=\begin{bmatrix} p & q & r\\ q & p & q\\ r & q & p \end{bmatrix},\] where \(p=a+x\), \(q=b+y\), and \(r=c+z\).
Clearly, since \(a=c\), it follows that \[p-r=a+x-c-z=x-z\neq0,\] while, using \(x^2+xz-2y^2=0\), \[p^2+pr-2q^2=(a+x)^2+(a+x)(c+z)-2(b+y)^2=a^2+ac-2b^2+2ax+az+cx-4by=\xi.\] So as long as we choose \(a,b,c,x,y,z\) such that \(\xi\) is nonzero, the determinant of \(A+B\) is guaranteed to be nonzero! This is easy to accomplish. Setting \(a=c=1\) and \(b=2\) satisfies our conditions for the matrix \(A\). Setting \(x=\sqrt{2}\), \(y=\sqrt{3}\), and \(z=2\sqrt{2}\) satisfies our conditions for the matrix \(B\), and some quick algebra verifies that \(\xi=-6+5\sqrt{2}-8\sqrt{3}\neq0\). The result follows!

Definitely a fun problem. I reckon there is a simpler solution. My apartment-mate found a counterexample against the closure of \(H\) under addition by trying random things in MATLAB. I haven't yet looked into why that solution works, or its corresponding generalization. Stick around for updates!
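A quick numerical check of the counterexample (a Python/NumPy sketch of my own; MATLAB would do just as well):

```python
import numpy as np

def in_H(M, tol=1e-9):
    """Check the defining condition of H: det(M + M^T) == 0 (numerically)."""
    return abs(np.linalg.det(M + M.T)) < tol

r2, r3 = np.sqrt(2), np.sqrt(3)
A = np.array([[1, 2, 1],
              [2, 1, 2],
              [1, 2, 1]], dtype=float)
B = np.array([[r2, r3, 2 * r2],
              [r3, r2, r3],
              [2 * r2, r3, r2]])

print(in_H(A), in_H(B), in_H(A + B))  # True True False: H is not closed under +
```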
As you can tell, there has been a bit of a hiatus. Haven't really been productive (beyond just working on my classes) in the meanwhile. But I'm back now! I suppose this will be a bit of a meta post, followed by some electromagnetic theory.

First of all, I'm working on a research project (along with principal investigator Dr. Thomas Siegert). The project pertains to the impact of small solar system bodies (SSSBs) on gamma ray data from INTEGRAL. I'll probably talk a bit more about this once I move farther into the project and have a better grasp of things. I'm also giving a lecture on linear algebra at the San Diego Math Circle (SDMC). I plan on drawing inspiration from some of the stuff that I've exposited on here. In particular, I really want to show the students that linear algebra is a lot more than just Euclidean vectors and solving simultaneous linear equations. That lecture will be happening this Saturday (11/14).

Some of the heuristic arguments in physics rub me the wrong way (due to the lack of mathematical rigor), but I cannot deny the importance of being able to reason quickly with heuristics in physics (and in general, using ad hoc methods that aren't entirely rigorous is, in my opinion, both natural and fine on the path to a more rigorous solution). Here are some examples.

The electric field and potential are related by \(\vec{E}=-\nabla\phi\). It follows that the divergence of the electric field is related to the Laplacian of the potential: \[\nabla\cdot\vec{E}=-\nabla^2\phi.\] Gauss' law tells us that \(\int_{S}{\vec{E}\cdot\mathrm{d}\vec{a}}=\frac{Q}{\epsilon_0}=\frac{1}{\epsilon_0}\int_{V}{\rho\ \mathrm{d}v}\), where \(\rho\) is the charge density. However, Gauss' theorem (the divergence theorem) says that the flux integral over the surface is equal to the integral of the divergence over the enclosed volume. Since this holds for every volume, equating the integrands gives us \(\nabla\cdot\vec{E}=\frac{\rho}{\epsilon_0}\), hence \[\nabla^2\phi=-\frac{\rho}{\epsilon_0}.\] But when \(\rho=0\), as is the case in empty space where there are no charges, the potential function must satisfy Laplace's equation \[\nabla^2\phi=0.\] This equation is pretty important, and there is quite a bit of theory behind it, as it pops up not only here but in other areas (such as heat transfer). I don't even remember much of the basic theory we learned about it in MATH 110, but there is one property of functions that satisfy Laplace's equation (called harmonic functions) that is relevant here.

Theorem: Let \(f\colon U\to\mathbb{R}\) be a harmonic function over the set \(U\subset\mathbb{R}^3\). Then, the average value of \(f\) over any sphere contained in \(U\) is equal to the value of \(f\) at the center of that sphere. An analogous result holds in two dimensions as well, if \(U\subset\mathbb{R}^2\) and we replace spheres with circles.

Proof (sketch): I know of two ways of thinking about this. Ironically, neither of them is entirely rigorous. One of them is something you'd expect to encounter in an applied math class (like MATH 110), and the other is a physical argument that applies just to electric potentials (provided in Purcell). The applied math-y way is to consider the Taylor expansions of \(f\) incremented and decremented by \(h\) in each variable, one at a time. Then, you add each pair (for instance, \(f(x+h,y,z)+f(x-h,y,z)\)). This cancels the odd-order terms in each pair-sum, leaving something like \(2f+h^2f_{xx}\) plus higher-order corrections, so you can isolate the repeated second partial derivative terms (the partial with respect to \(x\) and then \(x\), etc.) in each pair-sum.
Plugging these expressions into Laplace's equation, one finds that, neglecting the higher-order terms, the value of \(f\) at the center is equal to the average of the values of the function at the points incremented and decremented by \(h\) away from the center, in each direction. Heuristically, we can extend this to every direction in a full sphere around the center point.

Physically, we consider a point charge \(P\) with charge \(Q\) and a sphere \(\Omega\) that has a charge \(q\) distributed uniformly over it, such that the center of \(\Omega\) is a distance \(R\) from \(P\). We can compute how much work it takes to construct this configuration in two different ways. The first way is to note that due to the shell theorems, outside of \(\Omega\), it is electromagnetically indistinguishable from a point charge with charge \(q\) at its center. Hence, the work required to assemble the configuration is the same as the work required to bring two point charges together (or more precisely, bring \(P\) in from infinity). This is simply \(\frac{Qq}{4\pi\epsilon_0R}\). On the other hand, instead of bringing \(P\) in from infinity, we can bring \(\Omega\) in from infinity. Then, the work done must be equal to the average over \(\Omega\) of the potential due to \(P\), times the total charge of \(\Omega\). But clearly, the work done doesn't change from us thinking about it this way! So we still have a work of \(\frac{Qq}{4\pi\epsilon_0R}\). This means that the average potential over \(\Omega\) is simply \(\frac{Q}{4\pi\epsilon_0R}\), which is precisely the potential at the center of \(\Omega\) due to \(P\). So the electric potential satisfies the theorem in charge-free regions when there is only one point charge in the universe; superposition gives the general result for arbitrary charge distributions.
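Here is a quick Monte Carlo check of the mean value property (my own Python sketch, not part of either argument above): average the potential \(\phi(\vec{r})=1/|\vec{r}|\) of a unit point charge (constants dropped) over a sphere not containing the charge, and compare with the value at the sphere's center.

```python
import numpy as np

rng = np.random.default_rng(1)

def phi(r):
    """Potential of a unit point charge at the origin (up to constants)."""
    return 1.0 / np.linalg.norm(r, axis=-1)

center = np.array([3.0, 0.0, 0.0])  # sphere center, distance R = 3 from charge
radius = 1.0                         # sphere stays away from the origin

# Sample points uniformly on the sphere's surface (normalized Gaussians).
v = rng.normal(size=(200_000, 3))
pts = center + radius * v / np.linalg.norm(v, axis=1, keepdims=True)

print(phi(pts).mean())  # ~ 0.3333 = 1/R, matching the value at the center
print(phi(center))      # 0.3333...
```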
A simple corollary of this result is that no harmonic function can have local extrema (or at least, extrema will be on the boundary of the domain over which the function is harmonic). This leads naturally into Earnshaw's theorem, which states that no electrostatic configuration provides a stable equilibrium for a point charge. To see this, observe that if there were a point in space with a stable equilibrium, the potential there would have to be a local minimum, a contradiction. Alternatively, the electric field at such a point would have to have negative divergence, and Gauss' law then says a negative charge density must exist at that point, contradicting our assumption that we were considering empty space. There's quite a bit more to talk about, but I'll end it here for now. I've procrastinated quite a bit and I need to catch up on other stuff!

Here is an interesting problem that has surprisingly far-reaching consequences. Consider a permutation \(\sigma\colon S\to S\) of a list, say \(S=\{1,2,\dots,n\}\), sending each \(i\) to \(k_i=\sigma(i)\in S\). Define a transposition to be a permutation that only swaps two elements of the list it acts on. It is clear that every permutation can be decomposed into transpositions. Seriously. It is intuitively obvious. But if we wanted to be a bit more rigorous, observe that we can give a state to every element \(j\in S\). Call \(j\) correct if \(j=\sigma(j)\), that is, \(j\) is a fixed point under the permutation, and incorrect otherwise. For every incorrect element in \(S\), there must exist another incorrect element in \(S\) such that transposing the two makes the former element correct. So, we can iteratively exclude correct elements and perform the transpositions that fix the incorrect ones, one at a time. Since \(S\) has finite cardinality, this algorithm terminates, and in fact leaves every element correct. So any permutation can be decomposed into transpositions.

But this does not say anything about the decomposition itself. In fact, there are infinitely many sequences of transpositions that, when composed, yield a given permutation. This is trivial, since we don't have to stop after the aforementioned algorithm terminates and everything is in the correct spot. You could perform another transposition to "mess things up" and then reverse it, and continue that an arbitrary number of times. However, a powerful invariant hides behind all of these decompositions: for a given permutation, the number of transpositions in any decomposition always has the same parity. This intuitively makes a lot of sense, though it's not very clear how to prove it. I suppose it would perhaps be possible to do so by performing a case analysis with my states (by considering transpositions of the form "incorrect/correct to correct/incorrect", "correct/correct to incorrect/incorrect", etc.). But this is a bit ugly. There is a much more elegant way, though it is arguably more complicated if you are required to prove the preliminaries from scratch as well.

This parity of a permutation is called its signature, and we are concerned with its existence and uniqueness. The signature of a permutation is \(1\) when the permutation can only be decomposed into an even number of transpositions and \(-1\) otherwise. But there is actually an equivalent definition of the signature with which it is much easier to probe the questions of existence and uniqueness. Let \(P_n\) denote the set of all permutations that act on lists of cardinality \(n\). Then the signature is defined as the function \[\varsigma\colon P_n\to\{-1,1\}\] such that \(\varsigma(\sigma_1\circ\sigma_2)=\varsigma(\sigma_1)\varsigma(\sigma_2)\) for all \(\sigma_1,\sigma_2\in P_n\) and \(\varsigma(\tau)=-1\) if \(\tau\) is a transposition. The signature is usually denoted by "sgn", but due to MathJax limitations, I will denote it with \(\varsigma\). We need to show that this definition is equivalent to our parity definition, and also that the signature exists and is unique (using this definition). Showing the equivalence of the definitions is trivial.
Since the signature of a composition is the product of the signatures, and the signature of a transposition is \(-1\), we must have that \(\varsigma(\sigma)=(-1)^{T(\sigma)}\), where \(T(\sigma)\) is the number of transpositions in any one of the decompositions of \(\sigma\). Now we must prove that this function exists and is unique. Proving existence is the hard part. For any permutation \(\sigma\in P_n\), define the permutation matrix \(M_{\sigma}\) as the matrix that maps \[M_{\sigma}\vec{e}_i=\vec{e}_{\sigma(i)},\] where \(\vec{e}_i\) is just a standard basis vector in \(\mathbb{R}^n\). It follows that \(M_{\sigma}\) is some \(n\times n\) matrix with only 0's and 1's (a single 1 in each row and each column), which I won't compute explicitly (though it isn't hard to). It is easy to see that for \(\sigma_1,\sigma_2\in P_n\), we have \(M_{\sigma_1\circ\sigma_2}=M_{\sigma_1}M_{\sigma_2}\).

Here comes our leap of faith. Define \[\varsigma(\sigma)=\det{M_{\sigma}}.\] This fits our first property, since the determinant of a product is the product of the determinants! Furthermore, the second property is satisfied as well, since one can easily show that for a transposition \(\tau\), the matrix \(M_{\tau}\) is found by swapping two columns of the identity matrix (and so the determinant is negated, due to the antisymmetry of the determinant – this argument is sometimes phrased in terms of the determinant of an elementary matrix representing the transposition of two columns). Hence, the function \(\varsigma\) exists. As for uniqueness: the two defining properties determine \(\varsigma\) completely, since every permutation decomposes into transpositions; if \(\sigma\) decomposes into \(k\) of them, the properties force \(\varsigma(\sigma)=(-1)^k\). By the equivalence of definitions, this means that the number of transpositions in every decomposition of \(\sigma\) into transpositions always has the same parity! The signature ends up being quite important for various definitions in the theory of differential forms in vector calculus, such as the wedge product. All of this from just viewing a permutation as a series of transpositions.
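To tie the two definitions together computationally, here is a small Python sketch (mine, for illustration): compute \(\varsigma(\sigma)=\det M_{\sigma}\) and compare it against the parity of a greedy decomposition into transpositions, in the spirit of the fix-one-element-at-a-time algorithm described above.

```python
import itertools
import numpy as np

def perm_matrix(sigma):
    """M_sigma with M e_i = e_{sigma(i)}; sigma is a tuple on {0,...,n-1}."""
    n = len(sigma)
    M = np.zeros((n, n))
    for i in range(n):
        M[sigma[i], i] = 1.0
    return M

def parity_by_transpositions(sigma):
    """Greedily fix positions with swaps; return (-1)^(number of swaps)."""
    lst, swaps = list(sigma), 0
    for i in range(len(lst)):
        while lst[i] != i:
            j = lst[i]
            lst[i], lst[j] = lst[j], lst[i]  # a single transposition
            swaps += 1
    return (-1) ** swaps

for sigma in itertools.permutations(range(4)):
    det_sign = round(np.linalg.det(perm_matrix(sigma)))
    assert det_sign == parity_by_transpositions(sigma)
print("determinant sign and transposition parity agree on all of P_4")
```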