Chapter 2 Convex functions

2.1 Definition

Definition 2.1 \(f: \mathbf R^n \rightarrow \mathbf R\) is convex if \(\mathbf{dom} f\) is a convex set and \[\begin{equation} \tag{2.1} f(\underbrace{\theta x + (1-\theta) y}_\text{convex mixture}) \leq \theta f(x) + (1-\theta) f(y) \end{equation}\] for all \(x, y \in \mathbf{dom} f\), \(0\leq \theta \leq 1\).

i.e. “Any chord is above function.”

  • \(f\) is concave if \(-f\) is convex.

2.2 Examples

convex:

  • affine functions \(ax+b\)
  • exponential \(e^{ax}\)
  • powers: \(x^a\) on \(R_{++}\), for \(a\geq 1\) or \(a\leq 0\)
  • powers of absolute value: \(| x |^p\) on \(R\), for \(p\geq 1\)
  • negative entropy: \(x \log x\) on \(R_{++}\)

concave:

  • affine functions \(ax+b\) (lol!)
  • powers: \(x^a\) on \(R_{++}\), for \(0 \leq a \leq 1\)
  • log: \(\log x\) on \(R_{++}\)

2.3 Examples on \(R^n\) and \(R^{m\times n}\)

Affine functions are both convex and concave; all norms are convex.

2.3.1 On \(R^n\)

  • affine: \(a^Tx+b\)
  • norms: \(\| x \|_p = (\sum_{i=1}^n | x_i | ^p )^{1/p}\) for \(p\geq 1\); \(\| x\|_\infty = max_k |x_k|\)

2.3.2 On \(R^{m\times n}\) (matrices)

Affine: \[\begin{align} \tag{2.2} f(X) & = & \langle A, X \rangle + b \\ & = & \mathbf{tr}(A^TX) + b \end{align}\]

Spectral norm (maximum singular value norm): \[\begin{align} \tag{2.3} f(X)= \| X \|_2 = \sigma_{\mathrm{max}}(X) = (\lambda_{\mathrm{max}}(X^TX))^{1/2} \end{align}\]

2.4 Restriction of a convex function to a line

Theorem 2.1 \(f: \mathbf R^n \rightarrow \mathbf R\) is convex iff the function \(g: \mathbf R \rightarrow \mathbf R\) \[\begin{equation*} g(t) = f(x + tv), \mathbf{dom} g = \{ t | x + tv \in \mathbf{dom} f \} \end{equation*}\] is convex in \(t\) for any \(x \in \mathbf{dom} f\), \(v \in \mathbf R^n\).

This fact (2.1) is very important when speculating :) and proving whether a function is convex or concave.

2.5 Extension of a function outside of its domain

extended-value extension \(\bar f\) of \(f\) is \[\begin{align} \tag{2.4} \bar f = f(x), x\in \mathbf{dom} f\\ \bar f = \infty, x \notin \mathbf{dom} f \end{align}\]

This usually simplifies the notations when doing analysis.

2.6 1st-order condition

Theorem 2.2 differentiable \(f\) with convex domain is convex iff \[\begin{equation*} f(y) \geq f(x) + \nabla f(x)^T (y-x) \text{ for all } x,y \in \mathbf{dom} f \end{equation*}\]

i.e. the 1st-order Taylor expansion (local) is the global under-estimator for \(f\).

2.7 2nd-order conditions

Theorem 2.3 twice differentiable \(f\) with convex domain is convex iff \[\begin{equation*} \nabla^2 f(x) \succeq 0 \text{ for all } x \in \mathbf{dom} f. \end{equation*}\] If (not iff) \(\nabla^2 f(x) \succ 0\), then \(f\) is strictly convex.

2.8 Examples

Example 2.1 quadratic function: \(f(x) = (1/2)x^TPx+ q^T x + r\) (with \(P\in S^n\)) \[\begin{equation*} \nabla f(x) = Px+q, \nabla^2 f(x) = P \end{equation*}\] convex if \(P\succeq 0\).
Example 2.2 least squares objective: \(f(x) = \|Ax-b\|^2_2\) \[\begin{equation*} \nabla f(x) = 2A^T(Ax-b), \nabla^2 f(x) = 2A^TA, \end{equation*}\] is convex.
Example 2.3 quadratic-over-linear: \(f(x,y) = x^2/y\) \[\begin{equation*} \nabla^2 f(x,y) = \frac 2 {y^3}\begin{bmatrix} y \\ -x \end{bmatrix} \begin{bmatrix} y \\ -x \end{bmatrix}^T \succeq 0, \end{equation*}\] is convex if \(y > 0\).

Example 2.4 log-sum-exp (softmax): \(f(x) = \log \sum^n_{k=1} e^{x_k}\), \(z_k = \exp x_k\) \[\begin{equation*} \nabla^2 f(x) = \text{<horrible mess>} = \frac 1 {\mathbf{1}^{T}z} \mathbf{diag}(z) - \frac 1 {(\mathbf{1}^T z)^2}zz^T, \end{equation*}\] is convex.

Proof is omitted, but very important!

2.9 Epigraph and sublevel set

Definition 2.2 \(\alpha\)-sublevel set of \(f: \mathbf R^n \rightarrow \mathbf R\): \[\begin{equation*} C_\alpha = \{ x\in \mathbf{dom} f | f(x) \leq \alpha \} \end{equation*}\]

Sublevel sets of convex functions are convex (but not in reverse).

Definition 2.3 epigraph of \(f: \mathbf R^n \rightarrow \mathbf R\): \[\begin{equation*} \mathbf{epi} f = \{ (x,t) \in R^{n+1} | x\in \mathbf{dom} f, f(x) \leq t \} \end{equation*}\]

\(f\) is convex iff \(\mathbf{epi} f\) is a convex set.

2.10 Jensen’s inequality

An extension to (2.1) is the Jensen’s inequality for expectations: \[\begin{equation} \tag{2.5} f(\mathbf E z) \leq \mathbf E f(z) \end{equation}\] for any random variable \(z\).

For a discrete distribution, (2.1) reduces to equality \(P(B) = 1 - P(A)\).

2.11 Positive weighted sum and composition with affine functions

Note that if \(f\) is convex, \(f(Ax+b)\) is also convex.

Examples:

  • log barrier: \[\begin{equation*} f(x) = - \sum^m \log(b_i - a_i^Tx), \mathbf{dom} f = \{ x| a_i^Tx < b_i\} \end{equation*}\]

2.12 Pointwise maximum

Proposition 2.1 if \(f_1,\dots, f_m\) are convex, then \(f(x) = max\{f_1(x), \dots, f_m(x)\}\) is convex.
Example 2.5 piecewise-linear: \(\max_{i=1,\dots,m}(a_i^Tx+b_i)\) is convex.

Example 2.6 sum of \(r\) largest components of \(x\in R^n\): \[\begin{equation*} f(x) = x_{[1]} + \cdots + x_{[r]} \end{equation*}\] is convex.

Note: \(f(x)\) can be seen as the max of \(C_n^r\) sums.

2.13 Pointwise supremum

Proposition 2.2 if \(f(x,y)\) is convex in \(x\) for each \(y \in \mathcal A\), then \[\begin{equation} \tag{2.6} g(x) = \sup_{y\in \mathcal A} f(x, y) \end{equation}\] is convex.
Example 2.7 support function of a set \(C\): \(S_C(x) = \sup_{y\in C}y^Tx\)
Example 2.8 max eigenvalue of symmetric matrix: for \(X\in S^n\), \[\begin{equation} \tag{2.7} \lambda_{max}(X) = \sup_{\|y\|_2=1}\underbrace{y^TXy}_{\text{linear in }X} \end{equation}\] is convex.

2.14 Composition

\[\begin{equation} \tag{2.8} f(x) = h(g(x)) \end{equation}\]

  • \(f\) is convex if
    • \(g\) convex, \(\tilde h\) convex and nondecreasing
    • \(g\) concave, \(\tilde h\) convex and nonincreasing

Note that \(\tilde h\) is the extended value function so be careful!

2.15 Vector Composition

Similar to 2.14

2.16 Pointwise infinium

Proposition 2.3 if \(f(x,y)\) is convex in \((x,y)\) and \(C\) is a convex set, then \[\begin{equation} \tag{2.9} g(x) = \inf_{y\in \mathcal C} f(x, y) \end{equation}\] is convex.

Example 2.9 Schur complement \(f(x,y) = x^TAx + 2x^TBy + y^TCy\) with \[\begin{equation} \tag{2.10} \begin{bmatrix} A & B \\ B^T & C \end{bmatrix} \succeq 0, C \succ 0 \end{equation}\]

minimizing over \(y\) gives \(g(x) = \inf_y f(x,y) = x^T(A-BC^{-1}B^T)x\) (Schur complement).

\(g\) is convex, thus the Schur complement \(A-BC^{-1}B^T \succeq 0\).

Example 2.10 distance to a set \[\begin{equation} \tag{2.11} \mathbf{dist}(x, S) = \inf_{y\in S} \|x-y\| \end{equation}\] is convex if \(S\) is convex.

2.17 Perspective function

Definition 2.4 the perspective of a function \(f: \mathbf R^n \rightarrow \mathbf R\) is the function \(g: \mathbf R^n \times R \rightarrow \mathbf R\), \[\begin{equation} g(x,t) = tf(x/t), \mathbf{dom} g = \{ (x,t) | x/t \in \mathbf{dom} f, t>0 \} \end{equation}\] \(g\) is convex if f is convex.
Example 2.11 \(f(x) = x^Tx\) is convex, then \(g(x, t) = x^Tx/t\) is convex for \(t>0\).
Example 2.12 negative log \(f(x) = -\log x\) is convex, thus the relative entropy \[\begin{equation} \tag{2.12} g(x,t) = t\log t - t \log x \end{equation}\] is convex on \(\mathbf R^2_{++}\). (This is related to KL divergence)
Example 2.13 if \(f\) convex, then \[\begin{equation} \tag{2.13} g(x) = (c^Tx+d) f \left ( (Ax+b)/(c^Tx+d) \right ) \end{equation}\] is convex on \(\{ x | c^Tx+d > 0 , (Ax+b)/(c^Tx+d) \in \mathbf{dom} f \}\)

2.18 The conjugate function

Definition 2.5 the conjugate of a function \(f\) is \[\begin{equation} \tag{2.14} f^*(y) = \sup_{x\in \mathbf{dom} f}(y^Tx-f(x)) \end{equation}\]

Note that \(f^*\) is convex regardless of what \(f\) is.

And an interesting fact is that \[\begin{equation} \mathbf{epi} (f^{env}) = \mathbf{conv}(\mathbf{epi} f) \end{equation}\] (convex envelope)

2.19 Quasiconvexity

Definition 2.6 \(f: R^n \rightarrow R\) is quasiconvex (unimodal) if \(\mathbf{dom} f\) is convex and the sublevel sets \[\begin{equation} \tag{2.15} S_\alpha = \{ x\in \mathbf{dom} f | f(x) \leq \alpha \} \end{equation}\] are convex for all \(\alpha\).
  • \(f\) is quasiconcave if \(-f\) is quasiconvex
  • \(f\) is quasilinear if it is both quasiconvex and quasiconcave

2.19.1 Examples

  • \(\sqrt{|x|}\) is quasiconvex on \(R\)
  • \(\mathrm{ceil}(x) = \inf \{ z\in Z | z \geq x \}\) is quasiconvex
  • \(\log x\) is quasilinear on \(R_{++}\)
  • \(f(x_1, x_2) = x_1 x_2\) is quasiconcave on \(R^2_{++}\)
  • linear fractional functions are quasilinear (!)
  • distance ratio \[\begin{equation} \tag{2.16} f(x) = \frac{\| x-a \|_2}{\| x-b \|_2}, \mathrm{dom}f=\{ x| \|x-a\|_2 \leq \| x-b \|_2 \} \end{equation}\] is quasiconvex

Note that the sum of two quasiconvex functions are generally not quasiconvex.

2.19.1.1 Internal rate of return (IRR)

  • cash flow \(x=(x_0,\dots, x_n)\), \(x_i\) is payment in period \(i\)
  • assume \(x_0 < 0\) and \(\sum x_n > 0\)
  • present value of cash flow \(x\) for interest rate \(r\): \[\begin{equation} \tag{2.17} \mathrm{PV}(x,r) = \sum_{i=0}^{n} (1+r)^{-i}x_i \end{equation}\]
  • IRR is smallest interest rate for which \(PV(x,r) = 0\): \[\begin{equation} \tag{2.18} \mathrm{IRR}(x) = \inf \{r \geq 0 | PV(x,r) = 0\} \end{equation}\]
  • IRR is quasiconcave: superlevel set is intersection of halfspaces \[\begin{equation} \tag{2.19} \mathrm{IRR}(x)\geq R \Longleftrightarrow \mathrm{PV}(x,r) \geq 0\text{ for } 0\leq r \leq R \end{equation}\]

2.20 Properties of quasiconvex functions

Theorem 2.4 Jensen’s inequality for quasiconvex functions if \(f\) quasiconvex, then \[\begin{equation} \tag{2.20} 0\leq \theta \leq 1 \Longrightarrow f(\theta x + (1- \theta)y) \leq \max \{f(x), f(y) \} \end{equation}\]

This fact is shown in (2.1).

Jensen's inequality for quasiconvex functions

Figure 2.1: Jensen’s inequality for quasiconvex functions

Theorem 2.5 first order condition: differentiable \(f\) with convex domain is quasiconvex iff \[\begin{equation} \tag{2.21} f(y)\leq f(x) \Longrightarrow \nabla f(x)^T (y-x) \leq 0 \end{equation}\]

2.21 Log-concavity

  • \(\log f\) is concave
  • many probability densities are log-concave, e.g. the Gaussian \[\begin{equation} \tag{2.22} \frac 1 {\sqrt{(2\pi)^n \det \Sigma}} \exp (-\frac 1 2 (x-\bar x)^T\Sigma^{-1}(x-\bar x)) \end{equation}\]
  • cumulative Gaussian \(\Phi\) is log-concave \[\begin{equation} \tag{2.23} \Phi (x) = \frac 1 {\sqrt{2\pi}} \int_{-\infty}^x \exp(-u^2/2) du \end{equation}\]

2.21.1 Properties

Proposition 2.4 twice differentiable \(f\) with convex domain is log-concave iff \[\begin{equation} \tag{2.24} \nabla^2f(x) \preceq \frac {\nabla f(x) \nabla f(x)^T}{f(x)} \end{equation}\]

(2.4) basically says that you can have 1 positive eigenvalue for a log-concave function.

  • product of log-concave is still log-concave
  • sum, not really
  • integration: yes, see following (2.5)
Proposition 2.5 if \(f: R^n \times R^m \rightarrow R\) is log-concave, then \[\begin{equation*} g(x) = \int f(x,y) dy \end{equation*}\] is log-concave.

b/c (2.5), we have the follow interesting results:

Proposition 2.6 convolution preserves log-concavity: \[\begin{equation} \tag{2.25} (f*g)(x) = \int f(x-y) g(y) dy \end{equation}\] is log-concave.
Proposition 2.7 if \(C\subseteq R^n\) convex and \(y\) is a random variable with log-concave pdf then \[\begin{equation} \tag{2.26} f(x) = \mathbf{prob} (x+y\in C) \end{equation}\] is log-concave.
Proof. \[\begin{equation*} f(x) = \int g(x+y)p(y)dy, g(u) = \begin{cases}1 & u\in C \\ 0 & u\notin C \end{cases} \end{equation*}\] \(p\) is pdf of \(y\).

Note that 2.7 directly relates to the yield optimization problem in operational research.

2.22 Generic inequalities

Definition 2.7 \(f: \mathbf R^n \rightarrow \mathbf R^m\) is \(K\)-convex if \(\mathbf{dom} f\) is a convex set and \[\begin{equation} \tag{2.27} f(\theta x + (1-\theta) y) \preceq_K \theta f(x) + (1-\theta) f(y) \end{equation}\] for all \(x, y \in \mathbf{dom} f\), \(0\leq \theta \leq 1\).

One particularly interesting case is for matrices: \(f: \mathbf S^n \rightarrow \mathbf S^m\).

Example 2.14 \(f(X) = X^2\) is \(S_+^m\)-convex.