Calculus & Linear Algebra II

Chapter 22

22 Critical points in $n$-dimensions

This chapter brings together a great deal of what we have studied so far in this course.

The goal is to be able to classify critical points of functions of any number of variables.

Recall that the Taylor series of a smooth function $f$ of $n$ variables about a point $\mathbf{x}=\mathbf{x}_0$ is \begin{align*} f(\mathbf{x})=&\;f(\mathbf{x}_0)+(\nabla f(\mathbf{x}_0))^T(\mathbf{x}-\mathbf{x}_0)+\frac{1}{2}(\mathbf{x}-\mathbf{x}_0)^TH(\mathbf{x}_0)(\mathbf{x}-\mathbf{x}_0)\\ &\;+\langle \text{ higher order terms}\rangle, \end{align*}



where $\mathbf{x}= \left( \begin{array}{c} x_1\\x_2\\\vdots \\ x_n \end{array}\right)$ and $H(\mathbf{x}_0)$ is the Hessian matrix of second partial derivatives of $f$ at $\mathbf{x}_0$:
$$ H(\mathbf{x}_0)=\left( \begin{array}{cccc} \frac{\partial^2 f}{\partial x_1 \partial x_1}(\mathbf{x}_0)&\frac{\partial^2 f}{\partial x_1 \partial x_2}(\mathbf{x}_0)&\cdots&\frac{\partial^2 f}{\partial x_1 \partial x_n}(\mathbf{x}_0)\\ \frac{\partial^2 f}{\partial x_2 \partial x_1}(\mathbf{x}_0)&\frac{\partial^2 f}{\partial x_2 \partial x_2}(\mathbf{x}_0)&\cdots&\frac{\partial^2 f}{\partial x_2 \partial x_n}(\mathbf{x}_0)\\ \vdots&\vdots&\ddots&\vdots\\ \frac{\partial^2 f}{\partial x_n \partial x_1}(\mathbf{x}_0)&\frac{\partial^2 f}{\partial x_n \partial x_2}(\mathbf{x}_0)&\cdots&\frac{\partial^2 f}{\partial x_n \partial x_n}(\mathbf{x}_0) \end{array} \right).$$

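Since $f$ is smooth, its mixed partial derivatives commute, $\frac{\partial^2 f}{\partial x_i \partial x_j}=\frac{\partial^2 f}{\partial x_j \partial x_i}$ (Clairaut's theorem). Hence $H(\mathbf{x}_0)=H(\mathbf{x}_0)^T$, i.e., $H(\mathbf{x}_0)$ is a real symmetric matrix.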
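When the second derivatives are tedious to compute by hand, $H(\mathbf{x}_0)$ can be approximated numerically. The following Python sketch (using NumPy; the function `hessian_fd`, the step size `h` and the test function are illustrative choices, not part of the course material) approximates each entry by central differences and checks that the result is symmetric, as it should be for a smooth $f$.

```python
import numpy as np

def hessian_fd(f, x0, h=1e-4):
    """Approximate the Hessian of f: R^n -> R at x0 by central differences.

    Entry (i, j) approximates d^2 f / (dx_i dx_j) with the four-point formula
    [f(x+h e_i+h e_j) - f(x+h e_i-h e_j) - f(x-h e_i+h e_j) + f(x-h e_i-h e_j)] / (4 h^2).
    """
    x0 = np.asarray(x0, dtype=float)
    n = x0.size
    H = np.zeros((n, n))
    E = np.eye(n) * h  # row i is the step h * e_i
    for i in range(n):
        for j in range(n):
            H[i, j] = (f(x0 + E[i] + E[j]) - f(x0 + E[i] - E[j])
                       - f(x0 - E[i] + E[j]) + f(x0 - E[i] - E[j])) / (4 * h * h)
    return H

# Illustrative test function f(x) = x1^2 x2 + exp(x2 x3)
def f(x):
    return x[0] ** 2 * x[1] + np.exp(x[1] * x[2])

H = hessian_fd(f, [1.0, 0.5, -1.0])
print(np.round(H, 4))
print("symmetric:", np.allclose(H, H.T, atol=1e-6))
```

For this $f$ the exact Hessian at $(1, 0.5, -1)$ has entries $f_{11}=2x_2=1$, $f_{12}=2x_1=2$, $f_{13}=0$, $f_{22}=x_3^2e^{x_2x_3}$, $f_{23}=(1+x_2x_3)e^{x_2x_3}$ and $f_{33}=x_2^2e^{x_2x_3}$, which the approximation should reproduce to several decimal places.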




22.1 Classification of critical points in $n$ dimensions

In the following, let $f:\mathbb{R}^n\longrightarrow \mathbb{R}$.

Definition 1. A point $\mathbf{x}_0$ is said to be a critical point if $\nabla f(\mathbf{x}_0)=\mathbf{0}$ or $\nabla f(\mathbf{x}_0)$ is undefined.

Definition 2. A critical point $\mathbf{x}_0$ satisfying $ \nabla f(\mathbf{x}_0)=\mathbf{0} $ is a local maximum if there exists some $\epsilon > 0$ such that
$\qquad \quad f(\mathbf{x}_0) \ge f(\mathbf{x})$ for all $\mathbf{x}$ such that $||\mathbf{x}-\mathbf{x}_0||\lt \epsilon.$




Definition 3. A critical point $\mathbf{x}_0$ satisfying $ \nabla f(\mathbf{x}_0)=\mathbf{0} $ is a local minimum if there exists some $\epsilon > 0$ such that
$\qquad \quad f(\mathbf{x}_0) \le f(\mathbf{x})$ for all $\mathbf{x}$ such that $||\mathbf{x}-\mathbf{x}_0||\lt \epsilon.$

Definition 4. A critical point $\mathbf{x}_0$ satisfying $ \nabla f(\mathbf{x}_0)=\mathbf{0} $ is a saddle point if it is neither a local maximum nor a local minimum, i.e., for every $\epsilon > 0$ there exist points $\mathbf{x}_1, \mathbf{x}_2$ with $||\mathbf{x}_1-\mathbf{x}_0||<\epsilon$ and $||\mathbf{x}_2-\mathbf{x}_0||<\epsilon$ such that $$f(\mathbf{x}_1)\lt f(\mathbf{x}_0)\lt f(\mathbf{x}_2).$$



22.2 Examples in 2D

1. $f(x_1,x_2) = x_1^2+x_2^2.$ Setting $ \nabla f = \left( \begin{array}{c} 2x_1\\ 2x_2 \end{array} \right)=\mathbf 0 $ gives the critical point $(0,0)$. Since $f(x_1,x_2)\ge 0=f(0,0)$ for all $(x_1,x_2)$, the critical point at $(0,0)$ is a minimum.

2. $f(x_1,x_2) = -x_1^2-x_2^2.$ Setting $ \nabla f = \left( \begin{array}{c} -2x_1\\ -2x_2 \end{array} \right)=\mathbf 0 $ gives the critical point $(0,0)$. Since $f(x_1,x_2)\le 0=f(0,0)$ for all $(x_1,x_2)$, the critical point at $(0,0)$ is a maximum.

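3. $f(x_1,x_2) = -x_1^2+x_2^2.$ Setting $ \nabla f = \left( \begin{array}{r} -2x_1\\ 2x_2 \end{array} \right)=\mathbf 0 $ gives the critical point $(0,0)$. Along the $x_1$-axis $f(x_1,0)=-x_1^2\lt 0$ for $x_1\neq 0$, while along the $x_2$-axis $f(0,x_2)=x_2^2\gt 0$ for $x_2\neq 0$, so the critical point at $(0,0)$ is a saddle point.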
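Definitions 2 to 4 compare $f(\mathbf{x}_0)$ with nearby values of $f$. The Python sketch below illustrates this on the three examples by evaluating $f$ at equally spaced points on a small circle around $(0,0)$. The function `probe_2d`, the radius `eps` and the number of directions `n_dirs` are hypothetical choices; finitely many samples at one radius can only suggest the type of a critical point, never prove it.

```python
import math

def probe_2d(f, x0, eps=1e-3, n_dirs=360):
    """Compare f(x0) with f at n_dirs points on the circle of radius eps about x0.

    Heuristic illustration of Definitions 2-4 only: it does not prove anything.
    """
    f0 = f(*x0)
    lower = higher = False
    for k in range(n_dirs):
        t = 2 * math.pi * k / n_dirs
        fx = f(x0[0] + eps * math.cos(t), x0[1] + eps * math.sin(t))
        if fx < f0:
            lower = True   # found a nearby point below f(x0)
        elif fx > f0:
            higher = True  # found a nearby point above f(x0)
    if lower and higher:
        return "saddle (suggested)"
    if higher:
        return "minimum (suggested)"
    if lower:
        return "maximum (suggested)"
    return "no change detected"

examples = {
    "x1^2 + x2^2": lambda x1, x2: x1**2 + x2**2,
    "-x1^2 - x2^2": lambda x1, x2: -x1**2 - x2**2,
    "-x1^2 + x2^2": lambda x1, x2: -x1**2 + x2**2,
}
for name, f in examples.items():
    print(name, "->", probe_2d(f, (0.0, 0.0)))
```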

22.3 Critical points by Taylor series

In MATH1052 we used the "second derivative test" for functions of two variables.

In MATH2001/7000 we consider a variant of this test that generalises easily to higher dimensions.

Let $\mathbf{x}_0$ be a critical point satisfying $\nabla f(\mathbf{x}_0)=\mathbf{0}$. Then the first-order term vanishes, so the Taylor series about $\mathbf{x}_0$ is $$f(\mathbf{x})=f(\mathbf{x}_0)+\frac{1}{2}(\mathbf{x}-\mathbf{x}_0)^TH(\mathbf{x}_0) (\mathbf{x}-\mathbf{x}_0)+ \langle \text{ higher order terms }\rangle.$$



Without loss of generality, we take $\mathbf{x}_0=\mathbf{0}$ (by translating the variables if necessary). Then $$f(\mathbf{x})=f(\mathbf{0})+\frac{1}{2}\mathbf{x}^TH\mathbf{x}+\langle \text{ higher order terms }\rangle,$$ where $H=H(\mathbf{0}).$ Since the higher order terms shrink faster than $||\mathbf{x}||^2$ as $\mathbf{x}\to\mathbf{0}$, the behaviour of $f$ near the critical point $\mathbf{0}$ is governed by the second order term $\frac{1}{2}\mathbf{x}^TH\mathbf{x}$, except in directions where this term vanishes.
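To see the quadratic term dominate numerically, take the illustrative function $f(x,y)=x^2+3xy-y^2+x^3$, which has $\nabla f(\mathbf{0})=\mathbf{0}$ and $H(\mathbf{0})=\left( \begin{array}{cc} 2 & 3\\ 3 & -2 \end{array} \right)$. For a displacement $\mathbf{h}=(h_1,h_2)$ the remainder $f(\mathbf{h})-f(\mathbf{0})-\frac{1}{2}\mathbf{h}^TH\mathbf{h}$ is exactly $h_1^3$. The Python sketch below (the function and direction are hypothetical choices) shrinks $\mathbf{h}$ along a fixed unit direction and checks that the remainder divided by $||\mathbf{h}||^2$ tends to $0$.

```python
import numpy as np

def f(v):
    x, y = v
    return x**2 + 3 * x * y - y**2 + x**3

H = np.array([[2.0, 3.0], [3.0, -2.0]])   # Hessian of f at the origin (computed by hand)
direction = np.array([0.6, 0.8])          # a fixed unit vector

ratios = []
for t in [1e-1, 1e-2, 1e-3]:
    h = t * direction                     # displacement with ||h|| = t
    quad = 0.5 * h @ H @ h                # second order Taylor term
    remainder = f(h) - f(np.zeros(2)) - quad
    ratios.append(remainder / t**2)       # should tend to 0 as t -> 0
    print(f"||h|| = {t:.0e}: quadratic term = {quad:.3e}, "
          f"remainder/||h||^2 = {remainder / t**2:.3e}")
```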


Since $H$ is real symmetric, it is orthogonally diagonalisable (spectral theorem), i.e., there exists an orthogonal matrix $P$ such that $P^THP=D$ for some diagonal matrix $D$.




Let $\left\{ \mathbf e_1, \mathbf e_2, \ldots, \mathbf e_n\right\}$ be an orthonormal set of eigenvectors of $H$, with $H\mathbf e_i=\lambda_i\mathbf e_i$. Form the orthogonal matrix \[ P = \big ( \mathbf e_1~|~ \mathbf e_2~|~ \ldots~|~ \mathbf e_n \big). \] Then $P^THP = D,$ with $ D = \left( \begin{array}{cccc} \lambda_1 & 0 &\cdots & 0 \\ 0 & \lambda_2 & \ddots & \vdots \\ \vdots & \ddots & \ddots & 0 \\ 0 & \cdots & 0 & \lambda_n \\ \end{array} \right), \; \lambda_i\in \mathbb{R}. $

Equivalently, since $P^{-1}=P^T$, we have $H = PDP^T.$
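In numerical work this orthogonal diagonalisation is usually delegated to a symmetric eigensolver. The sketch below uses NumPy's `numpy.linalg.eigh`, which returns the real eigenvalues of a symmetric matrix in ascending order together with orthonormal eigenvectors as the columns of a matrix; the matrix `H` is an arbitrary illustrative example. It checks $P^TP=I$, $P^THP=D$ and $H=PDP^T$.

```python
import numpy as np

# An illustrative real symmetric matrix standing in for a Hessian H
H = np.array([[4.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 2.0]])

# eigh: eigenvalues (ascending) and orthonormal eigenvectors (columns of P)
lam, P = np.linalg.eigh(H)
D = np.diag(lam)

print("eigenvalues:", lam)
print("P^T P = I:  ", np.allclose(P.T @ P, np.eye(3)))
print("P^T H P = D:", np.allclose(P.T @ H @ P, D))
print("H = P D P^T:", np.allclose(P @ D @ P.T, H))
```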



The diagonalisation suggests the change of variables $\mathbf{y}=P^T\mathbf{x}$ (equivalently $\mathbf{x}=P\mathbf{y}$, since $P$ is orthogonal). Then $$ \mathbf{x}^TH\mathbf{x} =\mathbf{x}^TPDP^T\mathbf{x}=(P^T\mathbf{x})^TD(P^T\mathbf{x})=\mathbf{y}^T D \mathbf{y}. $$ Note that the critical point is still at $\mathbf{y}=\mathbf{0},$ because $P^T\mathbf{0}=\mathbf{0}.$

Let $F$ denote the function $f$ expressed in the new coordinates $ \mathbf{y}= \left( \begin{array}{c} y_1\\y_2\\ \vdots \\ y_n \end{array} \right)$, i.e., $F(\mathbf{y})=f(\mathbf{x}(\mathbf{y}))=f(P\mathbf{y}).$


Then \begin{align*} F(\mathbf{y})=&\;f(\mathbf{0})+\frac{1}{2} \mathbf{y}^TD\mathbf{y}+\langle \text{ higher order terms }\rangle\\ =&\;f(\mathbf{0})+\frac{1}{2}\left(\lambda_1y_1^2 +\lambda_2y_2^2+\dots +\lambda_ny_n^2\right)+\langle \text{ higher order terms }\rangle. \end{align*}
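The identity $\mathbf{x}^TH\mathbf{x}=\lambda_1y_1^2+\dots+\lambda_ny_n^2$ with $\mathbf{y}=P^T\mathbf{x}$ is easy to spot-check numerically. In the Python sketch below a random symmetric matrix stands in for $H(\mathbf{0})$ and a random vector for $\mathbf{x}$; both are purely illustrative, and `numpy.linalg.eigh` supplies $P$ and the $\lambda_i$.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4))
H = (A + A.T) / 2            # random symmetric matrix standing in for H(0)
lam, P = np.linalg.eigh(H)   # H = P diag(lam) P^T

x = rng.standard_normal(4)   # an arbitrary displacement from the critical point
y = P.T @ x                  # new coordinates y = P^T x

lhs = x @ H @ x              # x^T H x
rhs = np.sum(lam * y**2)     # lambda_1 y_1^2 + ... + lambda_n y_n^2
print("x^T H x =", lhs, " sum lambda_i y_i^2 =", rhs)
print("x recovered as P y:", np.allclose(P @ y, x))
```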



The type of critical point is determined by the signs taken by the quadratic form $\lambda_1y_1^2 +\dots +\lambda_ny_n^2.$ There are four cases to consider:

Case 1: If $\lambda_i\gt 0$ for all $i =1, 2, \ldots , n,$ then the quadratic form is strictly positive in every direction from the critical point. Thus we have a local minimum.

Case 2: If $\lambda_i\lt 0$ for all $i =1, 2, \ldots , n,$ then the quadratic form is strictly negative in every direction from the critical point. We have a local maximum.



Case 3: If some pair $\lambda_i, \lambda_j$ ($i\neq j$) have opposite signs, then the quadratic form is positive in some directions and negative in others. We have a saddle point.

Case 4: If all non-zero $\lambda_i$ have the same sign but some $\lambda_i=0,$ then the quadratic form alone cannot identify the type of critical point, because along the directions with $\lambda_i=0$ the higher order terms decide. The test is inconclusive.
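The four cases translate directly into a short routine. The sketch below is one possible implementation using NumPy's `numpy.linalg.eigvalsh` (eigenvalues of a symmetric matrix); the name `classify_critical_point` and the tolerance `tol`, below which an eigenvalue is treated as zero, are assumptions made for illustration, since computed eigenvalues are rarely exactly $0$. The Hessians at the origin of the examples in 22.2 are $\text{diag}(2,2)$, $\text{diag}(-2,-2)$ and $\text{diag}(-2,2)$. Case 4 is genuinely inconclusive: $x_1^4+x_2^2$ (a minimum at the origin) and $x_1^3+x_2^2$ (not an extremum) both have Hessian $\text{diag}(0,2)$ at the origin.

```python
import numpy as np

def classify_critical_point(H, tol=1e-10):
    """Classify a critical point (grad f = 0) from the symmetric Hessian H there.

    Eigenvalues with |lambda| <= tol are treated as zero (an illustrative choice).
    """
    lam = np.linalg.eigvalsh(np.asarray(H, dtype=float))
    has_pos = bool(np.any(lam > tol))
    has_neg = bool(np.any(lam < -tol))
    has_zero = bool(np.any(np.abs(lam) <= tol))
    if has_pos and has_neg:
        return "saddle"            # Case 3: eigenvalues of opposite sign
    if has_zero:
        return "inconclusive"      # Case 4: zero eigenvalue(s), the rest of one sign
    if has_pos:
        return "local minimum"     # Case 1: all eigenvalues positive
    return "local maximum"         # Case 2: all eigenvalues negative

for H in (np.diag([2.0, 2.0]), np.diag([-2.0, -2.0]),
          np.diag([-2.0, 2.0]), np.diag([0.0, 2.0])):
    print(np.diag(H), "->", classify_critical_point(H))
```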





22.4 Example: $Q=ax^2+bxy+cy^2$

From MATH1052: assuming $a\neq 0$, complete the square in $x$: \begin{align*} Q =&\; a \left[ \left(x+\frac{b}{2a}y\right)^2+ \frac{4ac -b^2}{4a^2}y^2 \right]\\ =&\; a \left[ \left(x+\frac{b}{2a}y\right)^2+ \frac{D}{4a^2}y^2 \right]\\ =&\; a \left[ u^2+ \frac{D}{4a^2}v^2 \right]. \end{align*}



Thus we have \[ Q= a \left[ u^2+ \frac{D}{4a^2}v^2 \right], \] where $D = 4ac-b^2$, $u = x+ \frac{b}{2a}y$ and $v = y$.

Case 1. $a\gt 0, D\gt 0:$ minimum

Case 2. $a\lt 0, D\gt 0:$ maximum

Case 3. $D\lt 0:$ saddle

Case 4. $D = 0:$ inconclusive
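The MATH1052 classification needs only $a$ and $D=4ac-b^2$. A minimal Python sketch, assuming integer (or otherwise exact) coefficients so that the test $D=0$ is reliable, is below; `completed_square` evaluates $a\left[u^2+\frac{D}{4a^2}v^2\right]$ so the identity can be spot-checked against $ax^2+bxy+cy^2$. Both function names are hypothetical. Note that $D\gt 0$ forces $4ac\gt b^2\ge 0$, so $a\neq 0$ whenever the code reaches its final branch.

```python
def classify_by_completing_square(a, b, c):
    """Classify the critical point (0, 0) of Q = a x^2 + b x y + c y^2 using D = 4ac - b^2."""
    D = 4 * a * c - b * b
    if D < 0:
        return "saddle"
    if D == 0:
        return "inconclusive"
    return "minimum" if a > 0 else "maximum"   # D > 0 guarantees a != 0

def completed_square(a, b, c, x, y):
    """Evaluate a [u^2 + D/(4a^2) v^2] with u = x + b y/(2a), v = y (requires a != 0)."""
    D = 4 * a * c - b * b
    u = x + b / (2 * a) * y
    v = y
    return a * (u**2 + D / (4 * a**2) * v**2)

print(classify_by_completing_square(1, 0, 1))    # x^2 + y^2
print(classify_by_completing_square(-1, 0, -1))  # -x^2 - y^2
print(classify_by_completing_square(-1, 0, 1))   # -x^2 + y^2
print(completed_square(2, 3, 5, 0.7, -1.3), 2 * 0.7**2 + 3 * 0.7 * -1.3 + 5 * (-1.3)**2)
```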



In MATH2001 we analyse the same expression using eigenvalues instead. Write $Q$ as follows:

\[ Q = \frac{1}{2} \left( \begin{array}{cc} x & y \end{array} \right) \underbrace{\left( \begin{array}{cc} 2a & b\\ b & 2c \end{array} \right)}_{{\Large H}} \left( \begin{array}{c} x \\ y \end{array} \right) \]

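The eigenvalues of $H$ satisfy $0 = \text{det}(H-\lambda I) =\text{det} \left( \begin{array}{cc} 2a - \lambda & b\\ b & 2c - \lambda \end{array} \right) = \lambda^2-2(a+c)\lambda+(4ac-b^2). $

$\Rightarrow$ $\lambda_{\pm} = (a+c)\pm \sqrt{(a+c)^2-D},\,$ where $ \,D = \text{det}(H)=4ac-b^2. $ Since $(a+c)^2-D=(a-c)^2+b^2\ge 0$, both eigenvalues are real.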
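As a sanity check of this formula, the Python sketch below compares $\lambda_\pm$ with the eigenvalues returned by `numpy.linalg.eigvalsh` for one illustrative choice $a=1$, $b=3$, $c=2$, for which $\lambda_\pm=3\pm\sqrt{10}$. The discriminant is evaluated in the equivalent form $(a-c)^2+b^2$, which avoids a tiny negative value from rounding. The helper name `eigenvalues_formula` is hypothetical.

```python
import math
import numpy as np

def eigenvalues_formula(a, b, c):
    """Return (lambda_plus, lambda_minus) for H = [[2a, b], [b, 2c]]."""
    # (a + c)^2 - D with D = 4ac - b^2 simplifies to (a - c)^2 + b^2 >= 0
    r = math.sqrt((a - c) ** 2 + b ** 2)
    return (a + c) + r, (a + c) - r

a, b, c = 1.0, 3.0, 2.0
H = np.array([[2 * a, b], [b, 2 * c]])
lam_plus, lam_minus = eigenvalues_formula(a, b, c)
print("formula: ", lam_minus, lam_plus)
print("eigvalsh:", np.linalg.eigvalsh(H))              # ascending order
print("product vs det(H):", lam_plus * lam_minus, np.linalg.det(H))
```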



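Orthogonally diagonalising $H$ as in 22.3 gives $ Q = \frac{1}{2}\left(\lambda_{+}\zeta_{+}^2 + \lambda_{-}\zeta_{-}^2\right), $ $\;\; \left( \begin{array}{c} \zeta_{+} \\ \zeta_{-} \end{array} \right) = P^T \left( \begin{array}{c} x \\ y \end{array} \right), $ where $P$ is the orthogonal matrix whose columns are unit eigenvectors of $H$ for $\lambda_{+}$ and $\lambda_{-}$.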



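Case 1. $D\lt 0$ $\Rightarrow$ $\lambda_{+}\lambda_{-}=\text{det}(H)=D\lt 0$ $\Rightarrow$ $\lambda_{+}, \lambda_{-}$ have opposite signs $\Rightarrow$ saddle.

Case 2. $D= 0$ $\Rightarrow$ $\lambda_{+}\lambda_{-}=0$ $\Rightarrow$ $ \lambda_{+}=0$ or $\lambda_{-} = 0$ $\Rightarrow$ inconclusive.

Case 3. $D\gt 0$ $\Rightarrow$ $4ac\gt b^2\ge 0$ $\Rightarrow$ $ac \gt 0$. Also $\lambda_{+}\lambda_{-}=D\gt 0$, so $\lambda_{+}$ and $\lambda_{-}$ have the same sign, namely the sign of $\lambda_{+}+\lambda_{-}=2(a+c)$.

  • If $a\gt 0$, then $c\gt 0$, so $\lambda_{\pm} \gt 0$ and we have a local minimum.
  • If $a\lt 0$, then $c\lt 0$, so $\lambda_{\pm} \lt 0$ and we have a local maximum.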
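The two approaches must agree. The Python sketch below checks this by brute force over all integer coefficients $a,b,c\in\{-3,\dots,3\}$, comparing the MATH1052 rule (signs of $a$ and $D$) with the signs of the eigenvalues of $H$ computed by `numpy.linalg.eigvalsh`. The coefficient range, the tolerance `tol` and the function names are arbitrary illustrative choices.

```python
import itertools
import numpy as np

def by_discriminant(a, b, c):
    """MATH1052 rule: signs of a and D = 4ac - b^2."""
    D = 4 * a * c - b * b
    if D < 0:
        return "saddle"
    if D == 0:
        return "inconclusive"
    return "minimum" if a > 0 else "maximum"

def by_eigenvalues(a, b, c, tol=1e-9):
    """MATH2001 rule: signs of the eigenvalues of H = [[2a, b], [b, 2c]]."""
    lam = np.linalg.eigvalsh(np.array([[2 * a, b], [b, 2 * c]], dtype=float))
    if lam.min() < -tol and lam.max() > tol:
        return "saddle"
    if np.any(np.abs(lam) <= tol):
        return "inconclusive"
    return "minimum" if lam.min() > 0 else "maximum"

mismatches = [
    (a, b, c)
    for a, b, c in itertools.product(range(-3, 4), repeat=3)
    if by_discriminant(a, b, c) != by_eigenvalues(a, b, c)
]
print("cases checked:", 7 ** 3, "mismatches:", mismatches)
```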

