
Semidefinite programming

Semidefinite programming (SDP) is a subfield of mathematical programming concerned with the optimization of a linear objective function over the intersection of the cone of positive semidefinite matrices with an affine space, i.e., a spectrahedron.

Motivation and definition
Initial motivation

A linear programming problem is one in which we wish to maximize or minimize a linear objective function of real variables over a polytope. In semidefinite programming, we instead use real-valued vectors and are allowed to take the dot product of vectors; nonnegativity constraints on real variables in LP (linear programming) are replaced by semidefiniteness constraints on matrix variables in SDP (semidefinite programming). Specifically, a general semidefinite programming problem can be defined as any mathematical programming problem of the form

: \begin{array}{rl} {\displaystyle \min_{x^1, \ldots, x^n \in \mathbb{R}^n}} & {\displaystyle \sum_{i,j \in [n]} c_{i,j} (x^i \cdot x^j)} \\ \text{subject to} & {\displaystyle \sum_{i,j \in [n]} a_{i,j,k} (x^i \cdot x^j) \leq b_k} \text{ for all } k \\ \end{array}

where the c_{i,j}, a_{i,j,k}, and b_k are real numbers and x^i \cdot x^j is the dot product of x^i and x^j.

Equivalent formulations

An n \times n matrix M is said to be positive semidefinite if it is the Gram matrix of some vectors (i.e., if there exist vectors x^1, \ldots, x^n such that m_{i,j} = x^i \cdot x^j for all i,j). If this is the case, we denote this as M \succeq 0. There are several other equivalent characterizations of positive semidefiniteness; for example, the positive semidefinite matrices are exactly the self-adjoint matrices with only non-negative eigenvalues.

Denote by \mathbb{S}^n the space of all n \times n real symmetric matrices, equipped with the inner product (where {\rm trace} denotes the trace)

: \langle A, B \rangle := {\rm trace}(A^T B) = \sum_{i=1}^n \sum_{j=1}^n A_{ij} B_{ij}.

We can rewrite the mathematical program given in the previous section equivalently as

: \begin{array}{rl} {\displaystyle\min_{X \in \mathbb{S}^n}} & \langle C, X \rangle \\ \text{subject to} & \langle A_k, X \rangle \leq b_k, \quad k = 1,\ldots,m \\ & X \succeq 0 \end{array}

where the i,j entry of C is \frac{c_{i,j} + c_{j,i}}{2} from the previous section, and A_k is the symmetric n \times n matrix whose i,j entry is \frac{a_{i,j,k} + a_{j,i,k}}{2} from the previous section. Thus the matrices C and A_k are symmetric and the above inner products are well-defined. By adding slack variables appropriately, this SDP can be converted to the equational form

: \begin{array}{rl} {\displaystyle\min_{X \in \mathbb{S}^n}} & \langle C, X \rangle \\ \text{subject to} & \langle A_k, X \rangle = b_k, \quad k = 1,\ldots,m \\ & X \succeq 0. \end{array}

For convenience, an SDP may be specified in a slightly different but equivalent form. For example, linear expressions involving nonnegative scalar variables may be added to the program specification. This remains an SDP because each such variable can be incorporated into the matrix X as a diagonal entry (X_{ii} for some i); to ensure that X_{ii} \geq 0, the constraints X_{ij} = 0 can be added for all j \neq i. As another example, for any positive semidefinite matrix X there exists a set of vectors \{ v_i \} such that the i,j entry of X is X_{ij} = v_i \cdot v_j, the scalar product of v_i and v_j. Therefore, SDPs are often formulated in terms of linear expressions on scalar products of vectors. Given a solution to the SDP in the standard form, the vectors \{ v_i \} can be recovered in O(n^3) time (e.g., by using an incomplete Cholesky decomposition of X).
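The Gram-vector recovery just described can be sketched in NumPy on a toy instance (the vectors below are illustrative choices, not from the article):

```python
import numpy as np

# Sketch: recover vectors {v_i} with X[i, j] = v_i . v_j from a PSD matrix X
# via Cholesky factorization. We build X as a Gram matrix of known vectors.
V = np.array([[1.0, 0.0, 0.0],
              [0.5, 1.0, 0.0],
              [-0.5, 2.0, 1.0]])   # assumed example vectors (rows)
X = V @ V.T                        # Gram matrix: positive semidefinite
L = np.linalg.cholesky(X)          # factorization X = L @ L.T
# The rows of L serve as the recovered vectors v_i:
print(np.allclose(L @ L.T, X))
```

Since V here is lower triangular with a positive diagonal, the Cholesky factor recovers V exactly; in general any factorization X = L L^T yields a valid set of vectors.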
Relations to other optimization problems
The space of semidefinite matrices is a convex cone. Therefore, SDP is a special case of conic optimization, which is a special case of convex optimization. When the matrix C is diagonal, the inner product \langle C, X \rangle is equivalent to the dot product of the diagonal of C and the diagonal of X. Analogously, when the matrices A_k are diagonal, the corresponding inner products are equivalent to dot products. In these products, only the diagonal elements of X are used, so we can add constraints equating the off-diagonal elements of X to 0. The condition X \succeq 0 is then equivalent to the condition that all diagonal elements of X are non-negative, and the resulting SDP becomes a linear program in which the variables are the diagonal elements of X.
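As a small numerical sketch of this LP reduction (illustrative values, using NumPy):

```python
import numpy as np

# Sketch: with diagonal C and diagonal X, the matrix inner product <C, X>
# collapses to a dot product of diagonals, and X >= 0 (PSD) collapses to
# elementwise nonnegativity of the diagonal -- i.e., an LP.
C = np.diag([2.0, 3.0, 1.0])            # diagonal objective matrix
x_diag = np.array([1.0, 0.5, 4.0])      # diagonal of a candidate X
X = np.diag(x_diag)

# <C, X> = trace(C^T X) equals the dot product of the diagonals:
assert np.isclose(np.trace(C.T @ X), C.diagonal() @ x_diag)

# PSD condition on a diagonal matrix = nonnegative diagonal:
print(np.all(np.linalg.eigvalsh(X) >= 0) == np.all(x_diag >= 0))
```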
Duality theory
Definitions

Analogously to linear programming, given a general SDP of the form

: \begin{array}{rl} {\displaystyle\min_{X \in \mathbb{S}^n}} & \langle C, X \rangle \\ \text{subject to} & \langle A_i, X \rangle = b_i, \quad i = 1,\ldots,m \\ & X \succeq 0 \end{array}

(the primal problem, or P-SDP), we define the dual semidefinite program (D-SDP) as

: \begin{array}{rl} {\displaystyle\max_{y \in \mathbb{R}^m}} & b^T y \\ \text{subject to} & {\displaystyle\sum_{i=1}^m} y_i A_i \preceq C \end{array}

where for any two matrices P and Q, P \succeq Q means P - Q \succeq 0.

Weak duality

The weak duality theorem states that the value of the primal SDP is at least the value of the dual SDP. Therefore, any feasible solution to the dual SDP lower-bounds the primal SDP value, and conversely, any feasible solution to the primal SDP upper-bounds the dual SDP value. This is because

: \langle C, X \rangle - b^T y = \langle C, X \rangle - \sum_{i=1}^m y_i b_i = \langle C, X \rangle - \sum_{i=1}^m y_i \langle A_i, X \rangle = \langle C - \sum_{i=1}^m y_i A_i, X \rangle \geq 0,

where the last inequality holds because both matrices are positive semidefinite. This quantity is referred to as the duality gap.

Strong duality

When the values of the primal and dual SDPs are equal, the SDP is said to satisfy the strong duality property. Unlike linear programs, where every dual linear program has optimal objective equal to the primal objective, not every SDP satisfies strong duality; in general, the value of the dual SDP may lie strictly below the value of the primal. However, the P-SDP and D-SDP satisfy the following properties:

(i) Suppose the primal problem (P-SDP) is bounded below and strictly feasible (i.e., there exists X_0 \in \mathbb{S}^n, X_0 \succ 0 such that \langle A_i, X_0 \rangle = b_i, i = 1,\ldots,m). Then there is an optimal solution y^* to (D-SDP) and

:\langle C, X^* \rangle = b^T y^*.
(ii) Suppose the dual problem (D-SDP) is bounded above and strictly feasible (i.e., \sum_{i=1}^m (y_0)_i A_i \prec C for some y_0 \in \R^m). Then there is an optimal solution X^* to (P-SDP) and the equality from (i) holds.

A sufficient condition for strong duality to hold for an SDP (and, in general, for any convex optimization problem) is Slater's condition. It is also possible to attain strong duality for SDPs without additional regularity conditions by using an extended dual problem proposed by Ramana.
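The weak-duality computation can be checked numerically on a toy instance (all matrices below are illustrative choices, not from the article):

```python
import numpy as np

# Toy check of weak duality: for a primal-feasible X and a dual-feasible y,
# the gap <C, X> - b^T y equals <C - sum_i y_i A_i, X> and is nonnegative
# because the trace inner product of two PSD matrices is nonnegative.
C = np.array([[2.0, 1.0], [1.0, 2.0]])   # PSD (eigenvalues 1 and 3)
A1 = np.eye(2)
X = np.array([[1.0, 0.5], [0.5, 1.0]])   # primal candidate, PSD
b = np.array([np.trace(A1 @ X)])         # define b so <A1, X> = b1 holds
y = np.array([0.5])                      # dual feasible: C - 0.5*A1 is PSD

gap = np.trace(C @ X) - b @ y
# The algebraic identity from the text:
assert np.isclose(gap, np.trace((C - y[0] * A1) @ X))
print(gap >= 0)
```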
Examples
Example 1

Consider three random variables A, B, and C. A given set of correlation coefficients \rho_{AB}, \rho_{AC}, \rho_{BC} is possible if and only if

:\begin{pmatrix} 1 & \rho_{AB} & \rho_{AC} \\ \rho_{AB} & 1 & \rho_{BC} \\ \rho_{AC} & \rho_{BC} & 1 \end{pmatrix} \succeq 0.

This matrix is called the correlation matrix. Suppose that we know from some prior knowledge (empirical results of an experiment, for example) that -0.2 \leq \rho_{AB} \leq -0.1 and 0.4 \leq \rho_{BC} \leq 0.5. Setting \rho_{AB} = x_{12}, \rho_{AC} = x_{13}, \rho_{BC} = x_{23}, the problem of determining the smallest and largest values that \rho_{AC} can take is given by:

:\begin{array}{rl} {\displaystyle\min/\max} & x_{13} \\ \text{subject to} & -0.2 \leq x_{12} \leq -0.1 \\ & 0.4 \leq x_{23} \leq 0.5 \\ & \begin{pmatrix} 1 & x_{12} & x_{13} \\ x_{12} & 1 & x_{23} \\ x_{13} & x_{23} & 1 \end{pmatrix} \succeq 0 \end{array}

This can be formulated as an SDP: we handle the inequality constraints by augmenting the variable matrix and introducing slack variables, for example

: \mathrm{tr}\left(\left(\begin{array}{cccccc} 0 & 1 & 0 & 0 & 0 & 0\\ 0 & 0 & 0 & 0 & 0 & 0\\ 0 & 0 & 0 & 0 & 0 & 0\\ 0 & 0 & 0 & 1 & 0 & 0\\ 0 & 0 & 0 & 0 & 0 & 0\\ 0 & 0 & 0 & 0 & 0 & 0\end{array}\right)\cdot\left(\begin{array}{cccccc} 1 & x_{12} & x_{13} & 0 & 0 & 0\\ x_{12} & 1 & x_{23} & 0 & 0 & 0\\ x_{13} & x_{23} & 1 & 0 & 0 & 0\\ 0 & 0 & 0 & s_{1} & 0 & 0\\ 0 & 0 & 0 & 0 & s_{2} & 0\\ 0 & 0 & 0 & 0 & 0 & s_{3}\end{array}\right)\right) = x_{12} + s_{1} = -0.1.

Solving this SDP gives the minimum and maximum values of \rho_{AC} = x_{13} as -0.978 and 0.872, respectively.

Example 2

Consider the problem

: minimize \frac{(c^T x)^2}{d^T x}
: subject to Ax + b \geq 0

where we assume that d^T x > 0 whenever Ax + b \geq 0.
Introducing an auxiliary variable t, the problem can be reformulated as

: minimize t
: subject to Ax + b \geq 0, \; \frac{(c^T x)^2}{d^T x} \leq t.

In this formulation, the objective is a linear function of the variables x, t. The first restriction can be written as

: \textbf{diag}(Ax + b) \succeq 0,

where the matrix \textbf{diag}(Ax + b) is the square matrix whose diagonal entries are the elements of the vector Ax + b. The second restriction can be written as

: t d^T x - (c^T x)^2 \geq 0.

Defining D as

: D = \left[\begin{array}{cc} t & c^T x \\ c^T x & d^T x \end{array}\right],

the theory of Schur complements shows, since d^T x > 0, that the second restriction is equivalent to

: D \succeq 0

(Boyd and Vandenberghe, 1996). The semidefinite program associated with this problem is

: minimize t
: subject to \left[\begin{array}{ccc} \textbf{diag}(Ax+b) & 0 & 0 \\ 0 & t & c^T x \\ 0 & c^T x & d^T x \end{array}\right] \succeq 0.

Example 3 (Goemans–Williamson max cut approximation algorithm)

Semidefinite programs are important tools for developing approximation algorithms for NP-hard maximization problems. The first approximation algorithm based on an SDP is due to Michel Goemans and David P. Williamson (JACM, 1995).

Other applications

Semidefinite programming has been applied to find approximate solutions to combinatorial optimization problems, such as the solution of the max cut problem with an approximation ratio of 0.87856. SDPs are also used in geometry to determine tensegrity graphs, arise in control theory as LMIs, and appear in inverse elliptic coefficient problems as convex, non-linear, semidefiniteness constraints. Semidefinite programming is also widely used in physics to constrain conformal field theories with the conformal bootstrap.
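As a rough numerical sketch of Examples 1 and 2 (NumPy, illustrative values; the Example 1 scan uses the determinant characterization of 3×3 correlation matrices rather than an actual SDP solver):

```python
import numpy as np

# Example 1: for fixed feasible x12, x23, the 3x3 correlation matrix is PSD
# iff its determinant is nonnegative, i.e. x13 lies in
# [x12*x23 - s, x12*x23 + s] with s = sqrt((1-x12^2)(1-x23^2)).
# Scanning the box of allowed x12, x23 recovers the bounds on rho_AC:
lo, hi = 1.0, -1.0
for x12 in np.linspace(-0.2, -0.1, 101):
    for x23 in np.linspace(0.4, 0.5, 101):
        s = np.sqrt((1 - x12**2) * (1 - x23**2))
        lo = min(lo, x12 * x23 - s)
        hi = max(hi, x12 * x23 + s)
print(round(lo, 3), round(hi, 3))   # matches -0.978 and 0.872 from the text

# Example 2: by the Schur complement, D is PSD exactly when
# t*d^T x - (c^T x)^2 >= 0, given d^T x > 0. Check on illustrative data:
c, d, x = np.array([1.0, 0.0]), np.array([0.0, 1.0]), np.array([2.0, 3.0])
for t in (2.0, 1.0):                # t = 2 is feasible, t = 1 is not
    D = np.array([[t, c @ x], [c @ x, d @ x]])
    psd = np.linalg.eigvalsh(D).min() >= -1e-9
    assert psd == (t * (d @ x) - (c @ x) ** 2 >= 0)
```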
Run-time complexity
The semidefinite feasibility problem (SDF) is the following decision problem: given an SDP, decide whether it has at least one feasible solution. The exact run-time complexity of this problem is unknown (as of 1997). However, Ramana proved the following:

• In the Turing machine model, SDF is in NP if and only if it is in co-NP. Therefore, SDF is not NP-complete unless NP = co-NP.
• In the Blum–Shub–Smale machine model, SDF is in the intersection of NP and co-NP.
Algorithms for solving SDPs
There are several types of algorithms for solving SDPs. These algorithms output the value of the SDP up to an additive error \epsilon in time that is polynomial in the program description size and \log(1/\epsilon).

Ellipsoid method

The ellipsoid method is a general method for convex programming, and can be used in particular to solve SDPs: provided the feasible region is suitably bounded, it outputs the value of the SDP up to an additive error \epsilon in time polynomial in the program description size and \log(1/\epsilon).

Interior point methods

Most SDP solvers are based on this approach.

First-order methods

First-order methods for conic optimization avoid computing, storing, and factorizing a large Hessian matrix, and scale to much larger problems than interior point methods, at some cost in accuracy. A first-order method is implemented in the Splitting Cone Solver (SCS). Another first-order method is the alternating direction method of multipliers (ADMM). This method requires in every step a projection onto the cone of semidefinite matrices.

Bundle method

The code ConicBundle formulates the SDP problem as a nonsmooth optimization problem and solves it by the spectral bundle method of nonsmooth optimization. This approach is very efficient for a special class of linear SDP problems.

Other solving methods

Algorithms based on the augmented Lagrangian method (PENSDP) are similar in behavior to the interior point methods and can be specialized to some very large scale problems. Other algorithms use low-rank information and a reformulation of the SDP as a nonlinear programming problem (SDPLR, ManiSDP).

Approximate methods

Algorithms that solve SDPs approximately have been proposed as well. The main goal of such methods is to achieve lower complexity in applications where approximate solutions are sufficient and complexity must be minimal. A prominent method that has been used for data detection in multiple-input multiple-output (MIMO) wireless systems is Triangular Approximate SEmidefinite Relaxation (TASER), which operates on the Cholesky decomposition factors of the semidefinite matrix instead of the semidefinite matrix itself.
This method calculates approximate solutions for a max-cut-like problem that are often comparable to solutions from exact solvers, but in only 10–20 algorithm iterations. Hazan has developed an approximate algorithm for solving SDPs with the additional constraint that the trace of the variables matrix must be 1.
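The per-iteration projection onto the PSD cone that ADMM-style methods require can be sketched as follows (`project_psd` is a hypothetical helper, not part of any named solver):

```python
import numpy as np

def project_psd(A):
    """Euclidean projection of a symmetric matrix onto the PSD cone:
    eigendecompose and clip negative eigenvalues to zero."""
    w, Q = np.linalg.eigh(A)
    return Q @ np.diag(np.clip(w, 0.0, None)) @ Q.T

A = np.array([[1.0, 2.0], [2.0, 1.0]])        # eigenvalues 3 and -1: not PSD
P = project_psd(A)
print(np.linalg.eigvalsh(P).min() >= -1e-10)  # the projection is PSD
```

This is the standard Frobenius-norm projection; its cost is dominated by one eigendecomposition per iteration, which is what limits plain ADMM on very large matrices.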
Preprocessing algorithms
Facial reduction algorithms are algorithms used to preprocess SDP problems by inspecting the constraints of the problem. These can be used to:

• Detect lack of strict feasibility;
• Delete redundant rows and columns;
• Reduce the size of the variable matrix.