The Jacobian & Divergence

In the last post, I gave a naive idea of where the Jacobian determinant comes from when changing variables: it was the volume of an approximate parallelotope. The rest of the post proved that the determinant does indeed measure volume, or more precisely the change in volume under linear transformations. However, proving that the approximate parallelotopes are good enough to compute the integral is a very difficult problem. So instead I'm going to try to bypass it altogether by using the divergence theorem. The volume intuition for the Jacobian is lost; it instead appears almost out of nowhere, as a byproduct of the algebra.

Definition 1. Given \( \varphi: {\mathbb R}^n \to {\mathbb R}^n \), the Jacobian matrix \( J(\varphi) \) is given by \[ J(\varphi)_{i,j} = \partial_j \varphi_i \] Notation: \( \varphi_i \) means the projection of \( \varphi \) to the \( i- \)th coordinate and \( \partial_j \) means the partial derivative with respect to the \( j- \)th variable.

Theorem 1 (Change of Variables). Suppose \( R, \tilde R \subset {\mathbb R}^n \) are “nice” regions, \( \varphi : R \to \tilde R \) is a homeomorphism with continuous second partial derivatives, and \( J = J(\varphi) \) is nonsingular on \( R \). Then for all continuous functions \( f: {\mathbb R}^n \to {\mathbb R} \), we have \[ \int_{\tilde R} f\, dV^n = \int_R (f\circ\varphi) |\det J| \, dV^n \] Note: For simplicity's sake, I'll use a single integral sign \( \int \) even when dealing with multiple integrals. The superscript on the differentials will denote the dimension of the integral: \( dV^3 \) for regular triple integrals and \( dS^2 \) for surface integrals.
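Before the proof, here is a small numerical sanity check of Theorem 1 for \( n=2 \). It is only a sketch under assumptions of my choosing: the map `phi` is the polar-coordinate map on \( R = [1,2] \times [0, \pi/2] \), the test function is \( f(x,y) = x \), and the Jacobian of Definition 1 is approximated by central differences. The left-hand side can be computed directly in Cartesian coordinates as \( 8/3 - 1/3 = 7/3 \) (a quarter disk of radius \( a \) contributes \( a^3/3 \)).

```python
import math

def phi(r, theta):
    """Polar-coordinate map, standing in for the homeomorphism of Theorem 1."""
    return (r * math.cos(theta), r * math.sin(theta))

def jacobian(phi, p, h=1e-6):
    """Approximate J[i][j] = d(phi_i)/d(x_j) at the point p by central differences."""
    n = len(p)
    J = [[0.0] * n for _ in range(n)]
    for j in range(n):
        up = list(p)
        up[j] += h
        dn = list(p)
        dn[j] -= h
        fu, fd = phi(*up), phi(*dn)
        for i in range(n):
            J[i][j] = (fu[i] - fd[i]) / (2 * h)
    return J

def det2(J):
    """Determinant of a 2 x 2 matrix."""
    return J[0][0] * J[1][1] - J[0][1] * J[1][0]

def f(x, y):
    """Illustrative integrand (an assumption, any continuous f would do)."""
    return x

def pulled_back_integral(m=200):
    """Midpoint rule for the integral of (f o phi) |det J| over R = [1,2] x [0, pi/2]."""
    dr = 1.0 / m
    dth = (math.pi / 2) / m
    total = 0.0
    for a in range(m):
        r = 1.0 + (a + 0.5) * dr
        for b in range(m):
            th = (b + 0.5) * dth
            weight = abs(det2(jacobian(phi, (r, th))))
            total += f(*phi(r, th)) * weight * dr * dth
    return total

rhs = pulled_back_integral()
lhs = 7.0 / 3.0  # integral of x over the quarter annulus, computed in Cartesian coordinates
print(lhs, rhs)
```

The two numbers should agree up to the discretization error of the midpoint rule.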

To prove this, I will use the divergence theorem, which relates the integral over regions like \( R, \tilde R \) to the net outward flux of some vector field across their respective boundaries \( \Sigma, \tilde \Sigma \), which are homeomorphic to \( S^{n-1} \). To be clear, the theorem as stated and proven only applies to very special cases: the regions must be “nice” enough for the divergence theorem to apply, and the homeomorphism must have continuous second partial derivatives (or at least, its first partials exist and its mixed partials are equal). In all the references for the theorem that I have seen, the regions are simply open in \( {\mathbb R}^n \) and the homeomorphism has continuous first-order partial derivatives. See [Shwartz], [Lax1], and [Lax2] in the references for more complete proofs with very different approaches.

Divergence Theorem. Suppose \( \Sigma \subset {\mathbb R}^{n} \) is a simple closed \( (n-1)- \)manifold which encloses a region \( R \), and \( \vec F \) is a continuously differentiable \( n \)-dimensional vector field defined on \( R \) and its boundary. Then \[ \oint_\Sigma \vec F \bullet \vec n \, dS^{n-1} = \int_R \left( \nabla \bullet \vec F\right) \, dV^{n} \]
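To make the statement concrete, here is a rough numerical check for \( n=2 \) on the unit disk. The field \( \vec F = \left< x^3, y^3 \right> \) is just an illustrative choice of mine; both sides should come out close to \( 3\pi/2 \). The boundary uses the normal \( \left< r_2', -r_1' \right> \), which is outward when \( r \) runs counter-clockwise, and the region integral is an iterated midpoint sum in Cartesian coordinates, so no change of variables is used.

```python
import math

def F(x, y):
    """Illustrative vector field (an assumption, not from the post)."""
    return (x ** 3, y ** 3)

def div_F(x, y):
    """Divergence of F, computed by hand."""
    return 3 * x ** 2 + 3 * y ** 2

def flux(m=2000):
    """Midpoint rule for the flux of F across the unit circle r(t) = (cos t, sin t)."""
    dt = 2 * math.pi / m
    total = 0.0
    for k in range(m):
        t = (k + 0.5) * dt
        x, y = math.cos(t), math.sin(t)
        dx, dy = -math.sin(t), math.cos(t)  # r'(t)
        nx, ny = dy, -dx                    # <r_2', -r_1'>, outward for counter-clockwise r
        Fx, Fy = F(x, y)
        total += (Fx * nx + Fy * ny) * dt
    return total

def volume_integral(m=600):
    """Iterated midpoint rule for the integral of div F over the unit disk."""
    dx = 2.0 / m
    total = 0.0
    for a in range(m):
        x = -1.0 + (a + 0.5) * dx
        half = math.sqrt(1.0 - x * x)  # this column covers -half <= y <= half
        dy = 2.0 * half / m
        for b in range(m):
            y = -half + (b + 0.5) * dy
            total += div_F(x, y) * dx * dy
    return total

print(flux(), volume_integral(), 1.5 * math.pi)
```

The region integral converges more slowly than the flux because of the square-root behaviour of the column height near \( x = \pm 1 \), so only a few digits of agreement should be expected there.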

When \( n=2 \), this is the flux form of Green's Theorem. The case \( n=3 \) is the divergence theorem taught at the end of Calc 3; see this sketch proof. It isn't too hard to see how to generalize that proof to higher dimensions. The real issue is that all proofs of Green's Theorem or the Divergence Theorem rely on the fact that “nice” regions (no formal definition I can find) can be decomposed into finitely many simple regions.

Proof of Thm 1, \( n=2 \): Let \( r: I \to {\mathbb R}^2 \) be a parametrization of \( \Sigma \), the boundary of \( R \). By our hypothesis, \( \tilde r = \varphi \circ r \) is a parametrization of \( \tilde \Sigma \). Reversing the orientation of \( r \) if necessary, suppose that \( \tilde r \) is oriented counter-clockwise. Also define \( f^1 \) as a partial antiderivative of \( f \) w.r.t. the first variable, so that \( \partial_1 f^1 = f \), and define \( f^2 \) similarly. Now apply the divergence theorem to \( \tilde R \): \[ \begin{aligned} \int_{\tilde R} f \, dV^2 &= \frac{1}{2} \oint_{\tilde \Sigma} \left< f^1, f^2 \right> \bullet \vec n \, dS^1 \\ &= \frac{1}{2} \int_I \left< f^1, f^2 \right>_{\tilde r} \bullet \left< \tilde r_2', -\tilde r_1'\right>_t \, dt \\ & \qquad \text{apply chain-rule and use matrix notation} \\ &= \frac{1}{2} \int_I \begin{bmatrix}f^1 & f^2 \end{bmatrix}_{\varphi \circ r} \begin{bmatrix} \partial_1 \varphi_2 \cdot r_1' + \partial_2 \varphi_2 \cdot r_2' \\ -\partial_1 \varphi_1 \cdot r_1' - \partial_2 \varphi_1 \cdot r_2' \end{bmatrix}_t \, dt; \\ & \qquad \partial_j \varphi_i \text{ is evaluated at } r(t) \text{ in the above matrix} \\ &= \frac{1}{2} \int_I \begin{bmatrix}f^1\circ \varphi & f^2\circ \varphi \end{bmatrix}_{r} \begin{bmatrix} \partial_2 \varphi_2 & -\partial_1 \varphi_2 \\ -\partial_2 \varphi_1 & \partial_1 \varphi_1\end{bmatrix}_r \begin{bmatrix} r_2' \\ - r_1' \end{bmatrix}_t \, dt \\ &= \pm \frac{1}{2} \oint_\Sigma \left(\begin{bmatrix}f^1\circ \varphi & f^2\circ \varphi \end{bmatrix} \begin{bmatrix} \partial_2 \varphi_2 & -\partial_1 \varphi_2 \\ -\partial_2 \varphi_1 & \partial_1 \varphi_1\end{bmatrix}\right) \bullet \vec n \, dS^1 \\ &= \int_R (f\circ \varphi) | \det J | \, dV^2 \end{aligned} \]\(~\Box\)

The plus/minus sign accounts for the fact that \( \left< r_2', - r_1' \right> \) might be an inward normal vector for \( \Sigma \). We have a minus sign iff the change of orientation of \( r \) was necessary at the beginning of the proof, and the change of orientation is necessary iff \( \det J < 0 \) on \( R \). It is a tedious exercise to verify that the divergence of the last vector field is in fact \( 2(f \circ \varphi) \det J \); this is where the hypothesis on the second partial derivatives is needed, since the second-order terms cancel only when the mixed partials of \( \varphi \) are equal. Since the sign on the outside matches the sign of the determinant, we simply get its absolute value. I am also using the fact that \( J \) is nonsingular; by continuity, this means \( \det J \) never changes sign on \( R \). Finally, notice that the \( 2 \times 2 \) matrix in the last flux integral is the cofactor matrix of the Jacobian!
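The divergence identity can at least be spot-checked numerically. The sketch below uses choices of my own: \( f(x,y) = xy + 1 \) with hand-computed partial antiderivatives `f1`, `f2`, and the map \( \varphi(u,v) = (u + 0.3v^2, \, v + 0.2u^3) \) with its Jacobian written out by hand. The divergence of the row vector \( \begin{bmatrix} f^1\circ\varphi & f^2\circ\varphi \end{bmatrix} \) times the cofactor matrix is approximated by central differences and compared with \( 2(f\circ\varphi)\det J \).

```python
def f(x, y):
    return x * y + 1.0

def f1(x, y):
    """Partial antiderivative of f in the first variable: d/dx f1 = f."""
    return 0.5 * x * x * y + x

def f2(x, y):
    """Partial antiderivative of f in the second variable: d/dy f2 = f."""
    return 0.5 * x * y * y + y

def phi(u, v):
    """Illustrative smooth map (an assumption, not from the post)."""
    return (u + 0.3 * v * v, v + 0.2 * u ** 3)

def dphi(u, v):
    """Jacobian of phi computed by hand, J[i][j] = d_j phi_i."""
    return [[1.0, 0.6 * v], [0.6 * u * u, 1.0]]

def G(u, v):
    """Row vector [f1 o phi, f2 o phi] multiplied by the cofactor matrix of J."""
    x, y = phi(u, v)
    J = dphi(u, v)
    C = [[J[1][1], -J[1][0]], [-J[0][1], J[0][0]]]
    a, b = f1(x, y), f2(x, y)
    return (a * C[0][0] + b * C[1][0], a * C[0][1] + b * C[1][1])

def divergence(G, u, v, h=1e-5):
    """Central-difference approximation of d_1 G_1 + d_2 G_2."""
    d1 = (G(u + h, v)[0] - G(u - h, v)[0]) / (2 * h)
    d2 = (G(u, v + h)[1] - G(u, v - h)[1]) / (2 * h)
    return d1 + d2

def claimed(u, v):
    """The right-hand side 2 (f o phi) det J."""
    x, y = phi(u, v)
    J = dphi(u, v)
    return 2.0 * f(x, y) * (J[0][0] * J[1][1] - J[0][1] * J[1][0])

for point in [(0.5, -0.4), (1.2, 0.7)]:
    print(divergence(G, *point), claimed(*point))
```

A check at a couple of points proves nothing, of course, but it is a cheap way to catch a sign error in the cofactor matrix.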

Definition 2. For any \( n \times n- \)matrix \( M \), the cofactor matrix \( C(M) \) is the matrix whose entries are the signed minors: \[ C(M)_{i,j} = (-1)^{i+j} M_{ij} \] Recall: the minor \( M_{ij} \) is the determinant of the matrix that you get by deleting the \( i- \)th row and \( j- \)th column.

Proof of Thm 1, \( n=3 \): Let \( C = C(J) \) be the cofactor matrix of the Jacobian, let \( r, \tilde r:U \to {\mathbb R}^3 \) be parametrizations of \( \Sigma, \tilde \Sigma \) with \( \tilde r = \varphi \circ r \) as before, and assume that \( \partial_1 \tilde r \times \partial_2 \tilde r \) is the outward normal vector on \( \tilde \Sigma \). Again define \( f^1, f^2, f^3 \) as partial antiderivatives of \( f \) w.r.t. the corresponding variables. Then by the divergence theorem on \( \tilde R \): \[ \begin{aligned} \int_{\tilde R} f \, dV^3 &= \frac{1}{3} \oint_{\tilde \Sigma} \left< f^1, f^2, f^3 \right> \bullet \vec n \, dS^2 \\ &= \frac{1}{3} \int_U \left< f^1, f^2, f^3 \right>_{\tilde r} \bullet \left( \partial_1 \tilde r \times \partial_2 \tilde r \right)_{(t_1, t_2)} \, dt_1 \, dt_2 \\ &= \frac{1}{3} \int_U \begin{bmatrix}f^1 & f^2 & f^3 \end{bmatrix}_{\varphi \circ r} \left( \partial_1 \tilde r \times \partial_2 \tilde r \right)^\top _{(t_1, t_2)} \, dt_1 \, dt_2 \\ &= \frac{1}{3} \int_U \begin{bmatrix}f^1 & f^2 & f^3 \end{bmatrix}_{\varphi \circ r} \cdot \left. C \right|_r \cdot \left( \partial_1 r \times \partial_2 r \right)^\top _{(t_1, t_2)} \, dt_1 \, dt_2 \\ &= \pm \frac{1}{3} \oint_\Sigma \left(\begin{bmatrix}f^1\circ \varphi & f^2\circ \varphi & f^3 \circ \varphi \end{bmatrix}C\right) \bullet \vec n \, dS^2 \\ &= \int_R (f\circ \varphi) | \det J | \, dV^3 \end{aligned} \] Just as before, the divergence of the last vector field is \( 3 (f\circ \varphi) \det J \) and the plus/minus sign will match the sign of the determinant to give an absolute value.\(~\Box\)

So for an ordinary surface in \( {\mathbb R}^3 \), the normal vector is the cross-product of the two partial derivatives of your parametrization. That is, given a parametrization \( r: U \to {\mathbb R}^{3} \) of a surface \( \Sigma \), let \( \partial_i r = \left< \partial_i r_1, \, \partial_i r_2, \, \partial_i r_3\right> \); then we use \( \partial_1 r \times \partial_2 r \) or its negative as the outward normal vector. So for dimensions \( n \ge 3 \), I'll need a generalization of the cross-product which takes \( n-1 \) vectors and produces a vector orthogonal to all of them.

Definition 3. If \( n \ge 3 \), then the wedge product is a multilinear \( (n-1)- \)ary operation \( \wedge \) on \( {\mathbb R}^{n} \) such that

  • \( \wedge(e_{i+1}, e_{i+2}, \ldots, e_{i+n-1}) = (-1)^{(i-1)(n-1)} e_{i} \) for \( 1 \le i \le n \); ~ the sign is always \( +1 \) when \( n \) is odd
  • \( \wedge(v_1, v_2, \ldots, v_{n-1}) = \vec 0 \) if \( v_i = v_j \) for some \( i \neq j \); ~ we say \( \wedge \) is alternating

Note: \( e_1, \ldots, e_{n} \) are the standard basis elements for \( {\mathbb R}^{n} \) and their subscripts are taken \( \bmod\ n. \) This is a special case of the exterior product/algebra that I mentioned at the end of the last post. It captures the essence of the cross product, dot product, and the determinant. In particular, it allows us to form orthogonal vectors: \[ v_i \bullet \wedge(v_1, \ldots, v_{n-1}) = 0 \quad \forall i \] The operation is essentially the \( n \times n \) determinant of a matrix whose first row is filled with the standard basis for \( {\mathbb R}^n \) and whose subsequent rows are the vectors \( v_1, \ldots, v_{n-1} \), just like the cross-product when \( n=3. \)
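The determinant description is easy to turn into code. The sketch below is my own reading of it: `wedge` expands the formal determinant along its first row of basis vectors, so the coefficient of \( e_{j} \) is a signed \( (n-1) \times (n-1) \) minor. It checks the orthogonality property on three arbitrary vectors in \( {\mathbb R}^4 \), and that \( n=3 \) gives the usual cross product. It also shows that in \( {\mathbb R}^4 \) the determinant gives \( \wedge(e_3, e_4, e_1) = -e_2 \): for even \( n \) a cyclic shift of the basis is an odd permutation, so the sign of the cyclic rule depends on \( i \).

```python
def det(M):
    """Determinant by Laplace expansion along the first row."""
    if len(M) == 1:
        return M[0][0]
    return sum(
        (-1) ** j * M[0][j] * det([row[:j] + row[j + 1:] for row in M[1:]])
        for j in range(len(M))
    )

def wedge(*vectors):
    """Generalized cross product of n-1 vectors in R^n.

    Expands the formal determinant whose first row is (e_1, ..., e_n) and
    whose remaining rows are the given vectors.
    """
    n = len(vectors) + 1
    if any(len(v) != n for v in vectors):
        raise ValueError("need n-1 vectors of length n")
    return [
        (-1) ** j * det([list(v[:j]) + list(v[j + 1:]) for v in vectors])
        for j in range(n)
    ]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

# Arbitrary integer vectors in R^4 (illustrative assumptions).
v1 = [1, 0, 2, -1]
v2 = [0, 1, 1, 3]
v3 = [2, -1, 0, 1]
w = wedge(v1, v2, v3)
print(w, [dot(v, w) for v in (v1, v2, v3)])

# n = 3 recovers the ordinary cross product.
print(wedge([1, 0, 0], [0, 1, 0]))

# n = 4: a cyclic shift of the standard basis picks up a sign.
e1, e2, e3, e4 = [1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]
print(wedge(e2, e3, e4), wedge(e3, e4, e1))
```

Orthogonality holds because \( v_i \bullet \wedge(v_1, \ldots, v_{n-1}) \) is the determinant of a matrix with a repeated row.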

Disclaimer: I don't know the official definition of the flux integral across a higher-dimensional manifold. However, my intuition tells me it is something like this:

Definition 4. Suppose \( \Sigma \subset {\mathbb R}^n \) is an \( (n-1)- \)manifold with a parametrization \( r:U \to {\mathbb R}^n \) and let \( \vec F \) be an \( n \)-dimensional vector field. Then the net outward flux across \( \Sigma \) is given by \[ \oint_\Sigma \vec F \bullet \vec n \, dS^{n-1} = \int_U \vec F \bullet \wedge\left(\partial_1 r, \ldots, \partial_{n-1} r\right) \, dt_1 \cdots dt_{n-1} \] where the order of the arguments in \( \wedge( \ldots ) \) can be changed to ensure the product is an outward normal vector.
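As a sanity check that this definition gives the expected answer in the one case I can verify, here is the formula evaluated for \( n=3 \), where the wedge is the ordinary cross product. The setup is my own choice: the unit sphere with the usual spherical parametrization, the field \( \vec F = \left< x, y, z \right> \), and partial derivatives of \( r \) taken by central differences. Since \( \nabla \bullet \vec F = 3 \) and the unit ball has volume \( 4\pi/3 \), the flux should be close to \( 4\pi \).

```python
import math

def r(t1, t2):
    """Unit sphere: t1 is the polar angle in [0, pi], t2 the azimuth in [0, 2 pi]."""
    return (
        math.sin(t1) * math.cos(t2),
        math.sin(t1) * math.sin(t2),
        math.cos(t1),
    )

def partial(r, t, k, h=1e-6):
    """Central-difference approximation of the k-th partial derivative of r at t."""
    up = list(t)
    up[k] += h
    dn = list(t)
    dn[k] -= h
    return [(a - b) / (2 * h) for a, b in zip(r(*up), r(*dn))]

def cross(a, b):
    """Cross product, i.e. the wedge of two vectors in R^3."""
    return [
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    ]

def F(x, y, z):
    """Illustrative vector field with divergence 3 (an assumption)."""
    return (x, y, z)

def flux(m=200):
    """Midpoint rule for the integral of F . (d_1 r x d_2 r) over U = [0, pi] x [0, 2 pi]."""
    d1 = math.pi / m
    d2 = 2 * math.pi / m
    total = 0.0
    for i in range(m):
        t1 = (i + 0.5) * d1
        for j in range(m):
            t2 = (j + 0.5) * d2
            normal = cross(partial(r, (t1, t2), 0), partial(r, (t1, t2), 1))
            total += sum(a * b for a, b in zip(F(*r(t1, t2)), normal)) * d1 * d2
    return total

print(flux(), 4 * math.pi)
```

With this ordering of the parameters, \( \partial_1 r \times \partial_2 r = \sin(t_1)\, r \) already points outward, so no reordering of the arguments is needed here.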

In this general case, the argument will rely on the wedge relation \[ \left(\wedge\left(\partial_1 \tilde r, \ldots, \partial_{n-1}\tilde r \right)\right)^\top = C \left(\wedge\left(\partial_1 r, \ldots, \partial_{n-1} r\right)\right)^\top \] and the divergence relation \[ \nabla \bullet \left(\begin{bmatrix}f^1\circ \varphi & \cdots & f^n \circ \varphi \end{bmatrix}C\right) = n (f \circ \varphi) \det J \] Since the orientation of the normal vector on \( \Sigma \) is chosen such that the corresponding normal vector on \( \tilde \Sigma \) is outward, we have the same interplay between the plus/minus sign on the integral and the sign of the determinant to get the absolute value. The wedge and divergence relations are the crux of my argument; surprisingly, the divergence one relies on a similar relation highlighted in the proof by Lax (see [Lax1]). I believe the relations hold and I have sketched proofs of them, but writing them out is too tedious and I'm not really worried about their validity.

I'm only worried about what it takes to be a “nice” region. On the bright side, this is a different (hopefully simpler?) problem from the one posed by the naive approach in the last post. The issue in the naive approach was that \( \varphi \) was potentially warping our (rectangular) partition of \( R \) so badly that the approximate parallelotopes in \( \tilde R \) were not good enough to approximate the integral. This second approach is no longer concerned with the local properties of the homeomorphism \( \varphi \) but with the global properties of the regions themselves. I think this is a good place to end my foray into the change of variables formula. I managed to appreciate properties of the determinant and exterior algebra that I did not fully understand just a month ago. I also got a basic understanding of what differential forms are and how they trivialize this whole discussion.

References

Lax, Peter D. “Change of Variables in Multiple Integrals.” Amer. Math. Monthly 106.6 (1999): 497-501. Web.

Lax, Peter D. “Change of Variables in Multiple Integrals II.” Amer. Math. Monthly 108.2 (2001): 115-119. Web.

Shwartz, J. “The Formula for Change in Variables in a Multiple Integral.” Amer. Math. Monthly 61.2 (1954): 81-85. Web.