Providing a function for solving KKT equations.

The most expensive step of
each iteration of conelp() or coneqp() is the solution of a set of linear
equations (‘KKT equations’) of the form
    [ P   A'   G'    ] [ ux ]   [ bx ]
    [ A   0    0     ] [ uy ] = [ by ]                        (7.1)
    [ G   0   -W'*W  ] [ uz ]   [ bz ]
(with P = 0 in conelp()). The matrix W depends on the current iterates and
is defined as follows. We use the notation of sections 7.1 and 7.2.
Let u be a vector partitioned according to the cone structure,

    u = ( ul, uq_0, ..., uq_{M-1}, vec(us_0), ..., vec(us_{N-1}) ),

where ul is the component associated with the componentwise inequalities,
uq_k are the second-order cone components, and us_k are the semidefinite
cone components.  Then W is a block-diagonal matrix,

    W*u = ( Wl*ul, Wq_0*uq_0, ..., Wq_{M-1}*uq_{M-1},
            Ws_0*vec(us_0), ..., Ws_{N-1}*vec(us_{N-1}) ),

with the following diagonal blocks.
- The first block is a positive diagonal scaling with a vector d:

      Wl = diag(d),   Wl^{-1} = diag(d)^{-1}.

  This transformation is symmetric:

      Wl' = Wl.
- The next M blocks are positive multiples of hyperbolic Householder
  transformations:

      Wq_k = beta_k * ( 2 * v_k * v_k' - J ),
      Wq_k^{-1} = (1/beta_k) * ( 2 * J * v_k * v_k' * J - J ),
      k = 0, ..., M-1,

  where

      beta_k > 0,   v_k' * J * v_k = 1,   J = [1, 0; 0, -I],

  and the first component of each v_k is positive.  These
  transformations are also symmetric:

      Wq_k' = Wq_k.
- The last N blocks are congruence transformations with nonsingular
  matrices:

      Ws_k * vec(us_k) = vec( r_k' * us_k * r_k ),   k = 0, ..., N-1.

  In general, this operation is not symmetric:

      Ws_k' * vec(us_k) = vec( r_k * us_k * r_k' ),   k = 0, ..., N-1.
It is often possible to exploit problem structure to solve (7.1) faster than by
standard methods. The last argument kktsolver of conelp() and coneqp()
allows the user to supply a Python function for solving the KKT equations.
This function will be called as "f = kktsolver(W)", where W is a
dictionary that contains the parameters of the scaling:
- W['d'] is the positive vector that defines the diagonal scaling;
  W['di'] is its componentwise inverse.
- W['beta'] and W['v'] are lists of length M with the coefficients and
  vectors that define the hyperbolic Householder transformations.
- W['r'] is a list of length N with the matrices that define the
  congruence transformations.  W['rti'] is a list of length N with the
  transposes of the inverses of the matrices in W['r'].
The function call "f = kktsolver(W)" should return a routine for solving
the KKT system (7.1) defined by W.  It will be called as "f(bx, by, bz)".
On entry, bx, by, bz contain the right-hand side.  On exit, they should
contain the solution of the KKT system, with the last component scaled,
i.e., on exit,

    bx := ux,   by := uy,   bz := W*uz.

In other words, the function solves

    [ P   A'   G'*W^{-1} ] [ ux   ]   [ bx ]
    [ A   0    0         ] [ uy   ] = [ by ]
    [ G   0   -W'        ] [ W*uz ]   [ bz ].
Specifying constraints via Python functions.

In the default use of conelp()
and coneqp(), the linear constraints and the quadratic term in the objective
are parameterized by CVXOPT matrices G, A, P. It is possible to specify these
parameters via Python functions that evaluate the corresponding matrix-vector
products and their adjoints.
If the argument G of conelp() or coneqp() is a Python function, it should be
defined as follows:
    G(x, y [, alpha[, beta[, trans]]])

This evaluates the matrix-vector products

    y := alpha * G * x + beta * y   (trans = 'N'),
    y := alpha * G' * x + beta * y  (trans = 'T').

The default values of the optional arguments must be alpha = 1.0,
beta = 0.0, trans = 'N'.
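For example, the following sketch implements this convention for a dense
matrix Gmat (a hypothetical name for data in the enclosing scope); in
practice the point is to evaluate the products without storing G
explicitly.

def G(x, y, alpha = 1.0, beta = 0.0, trans = 'N'):
    # y := alpha * Gmat * x + beta * y, or the product with Gmat.T.
    if trans == 'N':
        y[:] = alpha * (Gmat * x) + beta * y
    else:
        y[:] = alpha * (Gmat.T * x) + beta * y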
Similarly, if the argument A is a Python function, then it must be defined as
follows.
    A(x, y [, alpha[, beta[, trans]]])

This evaluates the matrix-vector products

    y := alpha * A * x + beta * y   (trans = 'N'),
    y := alpha * A' * x + beta * y  (trans = 'T').

The default values of the optional arguments must be alpha = 1.0,
beta = 0.0, trans = 'N'.
If the argument P of coneqp() is a Python function, then it must be defined as
follows:
    P(x, y [, alpha[, beta]])

This evaluates the matrix-vector product

    y := alpha * P * x + beta * y.

The default values of the optional arguments must be alpha = 1.0,
beta = 0.0.
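For example, if the quadratic term has the form 2*B'*B for some matrix B
(a hypothetical name), the products can be evaluated with two
matrix-vector multiplications, without ever forming B'*B; the 1-norm
regularized least-squares example below uses exactly this structure.

def P(x, y, alpha = 1.0, beta = 0.0):
    # y := alpha * (2*B'*B) * x + beta * y, using two products with B.
    y[:] = alpha * 2.0 * (B.T * (B * x)) + beta * y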
If G, A or P are Python functions, then the argument kktsolver must also be
provided.
Example: 1-norm approximation

The optimization problem

    minimize  ||P*u - q||_1

can be formulated as a linear program

    minimize    1'*v
    subject to  -v <= P*u - q <= v.

By exploiting the structure in the inequalities, the cost of an iteration
of an interior-point method can be reduced to the cost of a least-squares
problem of the same dimensions.  (See section 11.8.2 in the book Convex
Optimization.)  The code below takes advantage of this fact.
from cvxopt import blas, lapack, solvers, matrix, spmatrix, mul, div

def l1(P, q):
    """
    Returns the solution u, w of the l1 approximation problem

        (primal)  minimize    ||P*u - q||_1

        (dual)    maximize    q'*w
                  subject to  P'*w = 0
                              ||w||_infty <= 1.
    """
    m, n = P.size

    # Solve the equivalent LP
    #
    #     minimize    [0; 1]' * [u; v]
    #     subject to  [P, -I; -P, -I] * [u; v] <= [q; -q]
    #
    #     maximize    -[q; -q]' * z
    #     subject to  [P', -P']*z = 0
    #                 [-I, -I]*z + 1 = 0
    #                 z >= 0.

    c = matrix(n*[0.0] + m*[1.0])

    def G(x, y, alpha = 1.0, beta = 0.0, trans = 'N'):
        if trans == 'N':
            # y := alpha * [P, -I; -P, -I] * x + beta*y
            u = P*x[:n]
            y[:m] = alpha * ( u - x[n:]) + beta * y[:m]
            y[m:] = alpha * (-u - x[n:]) + beta * y[m:]
        else:
            # y := alpha * [P', -P'; -I, -I] * x + beta*y
            y[:n] =  alpha * P.T * (x[:m] - x[m:]) + beta * y[:n]
            y[n:] = -alpha * (x[:m] + x[m:]) + beta * y[n:]

    h = matrix([q, -q])
    dims = {'l': 2*m, 'q': [], 's': []}

    def F(W):
        """
        Returns a function f(x, y, z) that solves

            [ 0  0  P'      -P'      ] [ x[:n] ]   [ bx[:n] ]
            [ 0  0 -I       -I       ] [ x[n:] ]   [ bx[n:] ]
            [ P -I -D1^{-1}  0       ] [ z[:m] ] = [ bz[:m] ]
            [-P -I  0       -D2^{-1} ] [ z[m:] ]   [ bz[m:] ]

        where D1 = diag(di[:m])^2, D2 = diag(di[m:])^2 and di = W['di'].
        """
        # Factor A = 4*P'*D*P where D = d1.*d2 ./ (d1+d2) and
        # d1 = di[:m].^2, d2 = di[m:].^2.
        di = W['di']
        d1, d2 = di[:m]**2, di[m:]**2
        D = div( mul(d1, d2), d1+d2 )
        A = P.T * spmatrix(4*D, range(m), range(m)) * P
        lapack.potrf(A)

        def f(x, y, z):
            """
            On entry bx, bz are stored in x, z.  On exit x, z contain the
            solution, with z scaled: z./di is returned instead of z.
            """
            # Solve for x[:n]:
            #
            #     A*x[:n] = bx[:n] + P' * ( ((D1-D2)*(D1+D2)^{-1})*bx[n:]
            #         + (2*D1*D2*(D1+D2)^{-1}) * (bz[:m] - bz[m:]) ).
            x[:n] += P.T * ( mul(div(d1-d2, d1+d2), x[n:]) +
                mul(2*D, z[:m] - z[m:]) )
            lapack.potrs(A, x)

            # x[n:] := (D1+D2)^{-1} * (bx[n:] - D1*bz[:m] - D2*bz[m:]
            #     + (D1-D2)*P*x[:n])
            u = P*x[:n]
            x[n:] = div(x[n:] - mul(d1, z[:m]) - mul(d2, z[m:]) +
                mul(d1-d2, u), d1+d2)

            # z[:m] := di[:m] .* ( P*x[:n] - x[n:] - bz[:m])
            # z[m:] := di[m:] .* (-P*x[:n] - x[n:] - bz[m:])
            z[:m] = mul(di[:m],  u - x[n:] - z[:m])
            z[m:] = mul(di[m:], -u - x[n:] - z[m:])

        return f

    sol = solvers.conelp(c, G, h, dims, kktsolver = F)
    return sol['x'][:n], sol['z'][m:] - sol['z'][:m]
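A minimal usage sketch, with random data for illustration only:

from cvxopt import normal, blas

P = normal(500, 100)
q = normal(500, 1)
u, w = l1(P, q)
print("1-norm of the residual:", blas.asum(P*u - q))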
Example: SDP with diagonal linear term

The SDP

    minimize    1'*x
    subject to  w + diag(x) >= 0

can be solved efficiently by exploiting properties of the diag operator.
from cvxopt import blas, lapack, solvers, matrix

def mcsdp(w):
    """
    Returns solution x, z to

        (primal)  minimize    sum(x)
                  subject to  w + diag(x) >= 0

        (dual)    maximize    -tr(w*z)
                  subject to  diag(z) = 1
                              z >= 0.
    """
    n = w.size[0]
    c = matrix(1.0, (n, 1))

    def G(x, y, alpha = 1.0, beta = 0.0, trans = 'N'):
        """
        y := alpha*(-diag(x)) + beta*y.
        """
        if trans == 'N':
            # x is a vector; y is a symmetric matrix in column major order.
            y *= beta
            y[::n+1] -= alpha * x
        else:
            # x is a symmetric matrix in column major order; y is a vector.
            y *= beta
            y -= alpha * x[::n+1]

    def cngrnc(r, x, alpha = 1.0):
        """
        Congruence transformation

            x := alpha * r'*x*r.

        r and x are square matrices.
        """
        # Scale diagonal of x by 1/2.
        x[::n+1] *= 0.5

        # a := tril(x)*r
        a = +r
        tx = matrix(x, (n, n))
        blas.trmm(tx, a, side = 'L')

        # x := alpha*(a*r' + r*a')
        blas.syr2k(r, a, tx, trans = 'T', alpha = alpha)
        x[:] = tx[:]

    dims = {'l': 0, 'q': [], 's': [n]}

    def F(W):
        """
        Returns a function f(x, y, z) that solves

            -diag(z)               = bx
            -diag(x) - r*r'*z*r*r' = bz

        where r = W['r'][0] = W['rti'][0]^{-T}.
        """
        rti = W['rti'][0]

        # t = rti*rti' as a nonsymmetric matrix.
        t = matrix(0.0, (n, n))
        blas.gemm(rti, rti, t, transB = 'T')

        # Cholesky factorization of tsq = t.*t.
        tsq = t**2
        lapack.potrf(tsq)

        def f(x, y, z):
            """
            On entry, x contains bx, y is empty, and z contains bz stored
            in column major order.  On exit, they contain the solution,
            with z scaled (vec(r'*z*r) is returned instead of z).

            We first solve

                ((rti*rti') .* (rti*rti')) * x = bx - diag(t*bz*t)

            and take z = -rti' * (diag(x) + bz) * rti.
            """
            # tbst := t * bz * t
            tbst = +z
            cngrnc(t, tbst)

            # x := x - diag(tbst) = bx - diag(rti*rti' * bz * rti*rti')
            x -= tbst[::n+1]

            # x := (t.*t)^{-1} * x = (t.*t)^{-1} * (bx - diag(t*bz*t))
            lapack.potrs(tsq, x)

            # z := z + diag(x) = bz + diag(x)
            z[::n+1] += x

            # z := -vec(rti' * z * rti)
            #    = -vec(rti' * (diag(x) + bz) * rti)
            cngrnc(rti, z, alpha = -1.0)

        return f

    sol = solvers.conelp(c, G, w[:], dims, kktsolver = F)
    return sol['x'], sol['z']
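A minimal usage sketch, with a random symmetric matrix for illustration:

from cvxopt import normal

n = 50
w = normal(n, n)
w = 0.5 * (w + w.T)   # symmetrize
x, z = mcsdp(w)
print("primal objective:", sum(x))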
Example: Minimizing 1-norm subject to a 2-norm constraint

In this example, we use a similar trick to solve the problem

    minimize    ||u||_1
    subject to  ||A*u - b||_2 <= 1.

The code below is efficient if we assume that the number of rows in A is
greater than or equal to the number of columns.
from cvxopt import blas, lapack, solvers, matrix, mul, div

def qcl1(A, b):
    """
    Returns the solution u, z of

        (primal)  minimize    || u ||_1
                  subject to  || A * u - b ||_2 <= 1

        (dual)    maximize    b'*z - ||z||_2
                  subject to  || A'*z ||_inf <= 1.

    Exploits structure, assuming A is m by n with m >= n.
    """
    m, n = A.size

    # Solve equivalent cone LP with variables x = [u; v].
    #
    #     minimize    [0; 1]' * x
    #     subject to  [ I  -I ] * x <= [  0 ]   (componentwise)
    #                 [-I  -I ] * x <= [  0 ]   (componentwise)
    #                 [ 0   0 ] * x <= [  1 ]   (SOC)
    #                 [-A   0 ]        [ -b ]
    #
    #     maximize    -t + b' * w
    #     subject to  z1 - z2 = A'*w
    #                 z1 + z2 = 1
    #                 z1 >= 0, z2 >= 0, ||w||_2 <= t.

    c = matrix(n*[0.0] + n*[1.0])
    h = matrix(0.0, (2*n + m + 1, 1))
    h[2*n] = 1.0
    h[2*n+1:] = -b

    def G(x, y, alpha = 1.0, beta = 0.0, trans = 'N'):
        y *= beta
        if trans == 'N':
            # y += alpha * G * x
            y[:n] += alpha * (x[:n] - x[n:2*n])
            y[n:2*n] += alpha * (-x[:n] - x[n:2*n])
            y[2*n+1:] -= alpha * A*x[:n]
        else:
            # y += alpha * G'*x
            y[:n] += alpha * (x[:n] - x[n:2*n] - A.T * x[-m:])
            y[n:] -= alpha * (x[:n] + x[n:2*n])

    def Fkkt(W):
        """
        Returns a function f(x, y, z) that solves

            [ 0   G'   ] [ x ]   [ bx ]
            [ G  -W'*W ] [ z ] = [ bz ].
        """
        # First factor
        #
        #     S = G' * W**-1 * W**-T * G
        #       = [0; -A]' * W3^-2 * [0; -A] + 4 * (W1**2 + W2**2)**-1
        #
        # where
        #
        #     W1 = diag(d1) with d1 = W['d'][:n] = 1 ./ W['di'][:n]
        #     W2 = diag(d2) with d2 = W['d'][n:] = 1 ./ W['di'][n:]
        #     W3 = beta * (2*v*v' - J), W3^-1 = 1/beta * (2*J*v*v'*J - J)
        #        with beta = W['beta'][0], v = W['v'][0], J = [1, 0; 0, -I].

        # As = W3^-1 * [ 0 ; -A ] = 1/beta * ( 2*J*v * v' - I ) * [0; A]
        beta, v = W['beta'][0], W['v'][0]
        As = 2 * v * (v[1:].T * A)
        As[1:,:] *= -1.0
        As[1:,:] -= A
        As /= beta

        # S = As'*As + 4 * (W1**2 + W2**2)**-1
        S = As.T * As
        d1, d2 = W['d'][:n], W['d'][n:]
        d = 4.0 * (d1**2 + d2**2)**-1
        S[::n+1] += d
        lapack.potrf(S)

        def f(x, y, z):
            # z := -W**-T * z
            z[:n] = -div( z[:n], d1 )
            z[n:2*n] = -div( z[n:2*n], d2 )
            z[2*n:] -= 2.0*v*( v[0]*z[2*n] - blas.dot(v[1:], z[2*n+1:]) )
            z[2*n+1:] *= -1.0
            z[2*n:] /= beta

            # x := x - G' * W**-1 * z
            x[:n] -= div(z[:n], d1) - div(z[n:2*n], d2) + As.T * z[-(m+1):]
            x[n:] += div(z[:n], d1) + div(z[n:2*n], d2)

            # Solve for x[:n]:
            #
            #     S*x[:n] = x[:n] - (W1**2 - W2**2)(W1**2 + W2**2)^-1 * x[n:]
            x[:n] -= mul( div(d1**2 - d2**2, d1**2 + d2**2), x[n:])
            lapack.potrs(S, x)

            # Solve for x[n:]:
            #
            #     (d1**-2 + d2**-2) * x[n:] = x[n:] + (d1**-2 - d2**-2)*x[:n]
            x[n:] += mul( d1**-2 - d2**-2, x[:n])
            x[n:] = div( x[n:], d1**-2 + d2**-2)

            # z := z + W^-T * G*x
            z[:n] += div( x[:n] - x[n:2*n], d1)
            z[n:2*n] += div( -x[:n] - x[n:2*n], d2)
            z[2*n:] += As*x[:n]

        return f

    dims = {'l': 2*n, 'q': [m+1], 's': []}
    sol = solvers.conelp(c, G, h, dims, kktsolver = Fkkt)
    if sol['status'] == 'optimal':
        return sol['x'][:n], sol['z'][-m:]
    else:
        return None, None
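A minimal usage sketch; here b is chosen close to the range of A so that
the 2-norm constraint is likely to be feasible:

from cvxopt import normal

m, n = 100, 20                                # m >= n, as assumed above
A = normal(m, n)
b = A * normal(n, 1) + 0.05 * normal(m, 1)    # keeps the problem feasible
u, z = qcl1(A, b)
if u is None:
    print("no optimal solution found")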
Example: 1-norm regularized least-squares

As an example that illustrates how structure can be exploited in
coneqp(), we consider the 1-norm regularized least-squares problem

    minimize  ||A*x - y||_2^2 + ||x||_1

with variable x.  The problem is equivalent to the quadratic program

    minimize    ||A*x - y||_2^2 + 1'*u
    subject to  -u <= x <= u

with variables x and u.  The implementation below is efficient when A has
many more columns than rows.
from cvxopt import matrix, spdiag, mul, div, blas, lapack, solvers, sqrt
import math

def l1regls(A, y):
    """
    Returns the solution of the l1-norm regularized least-squares problem

        minimize  || A*x - y ||_2^2 + || x ||_1.
    """
    m, n = A.size
    q = matrix(1.0, (2*n, 1))
    q[:n] = -2.0 * A.T * y

    def P(u, v, alpha = 1.0, beta = 0.0):
        """
        v := alpha * 2.0 * [ A'*A, 0; 0, 0 ] * u + beta * v
        """
        v *= beta
        v[:n] += alpha * 2.0 * A.T * (A * u[:n])

    def G(u, v, alpha = 1.0, beta = 0.0, trans = 'N'):
        """
        v := alpha*[I, -I; -I, -I] * u + beta * v  (trans = 'N' or 'T')
        """
        v *= beta
        v[:n] += alpha*(u[:n] - u[n:])
        v[n:] += alpha*(-u[:n] - u[n:])

    h = matrix(0.0, (2*n, 1))

    # Customized solver for the KKT system
    #
    #     [ 2.0*A'*A  0    I      -I     ] [ x[:n] ]   [ bx[:n] ]
    #     [ 0         0   -I      -I     ] [ x[n:] ] = [ bx[n:] ]
    #     [ I        -I   -D1^-1   0     ] [ zl[:n]]   [ bzl[:n]]
    #     [-I        -I    0      -D2^-1 ] [ zl[n:]]   [ bzl[n:]]
    #
    # where D1 = W['di'][:n]**2, D2 = W['di'][n:]**2.
    #
    # We first eliminate zl and x[n:]:
    #
    #     ( 2*A'*A + 4*D1*D2*(D1+D2)^-1 ) * x[:n] =
    #         bx[:n] - (D2-D1)*(D1+D2)^-1 * bx[n:] +
    #         D1 * ( I + (D2-D1)*(D1+D2)^-1 ) * bzl[:n] -
    #         D2 * ( I - (D2-D1)*(D1+D2)^-1 ) * bzl[n:]
    #
    #     x[n:] = (D1+D2)^-1 * ( bx[n:] - D1*bzl[:n] - D2*bzl[n:] )
    #         - (D2-D1)*(D1+D2)^-1 * x[:n]
    #
    #     zl[:n] = D1 * ( x[:n] - x[n:] - bzl[:n] )
    #     zl[n:] = D2 * (-x[:n] - x[n:] - bzl[n:] ).
    #
    # The first equation has the form
    #
    #     (A'*A + D)*x[:n] = rhs
    #
    # and is equivalent to
    #
    #     [ D   A' ] [ x[:n] ]   [ rhs ]
    #     [ A  -I  ] [ v     ] = [ 0   ].
    #
    # It can be solved as
    #
    #     ( A*D^-1*A' + I ) * v = A * D^-1 * rhs
    #     x[:n] = D^-1 * ( rhs - A'*v ).

    S = matrix(0.0, (m, m))
    Asc = matrix(0.0, (m, n))
    v = matrix(0.0, (m, 1))

    def Fkkt(W):

        # Factor
        #
        #     S = A*D^-1*A' + I
        #
        # where D = 2*D1*D2*(D1+D2)^-1, D1 = d[:n]**-2, D2 = d[n:]**-2.

        d1, d2 = W['di'][:n]**2, W['di'][n:]**2

        # ds is the square root of the diagonal of D.
        ds = math.sqrt(2.0) * div( mul( W['di'][:n], W['di'][n:]),
            sqrt(d1+d2) )
        d3 = div(d2 - d1, d1 + d2)

        # Asc = A*diag(ds)^-1
        Asc = A * spdiag(ds**-1)

        # S = I + A * D^-1 * A'
        blas.syrk(Asc, S)
        S[::m+1] += 1.0
        lapack.potrf(S)

        def g(x, y, z):

            x[:n] = 0.5 * ( x[:n] - mul(d3, x[n:]) +
                mul(d1, z[:n] + mul(d3, z[:n])) - mul(d2, z[n:] -
                mul(d3, z[n:])) )
            x[:n] = div( x[:n], ds)

            # Solve
            #
            #     S * v = 0.5 * A * D^-1 * ( bx[:n] -
            #         (D2-D1)*(D1+D2)^-1 * bx[n:] +
            #         D1 * ( I + (D2-D1)*(D1+D2)^-1 ) * bzl[:n] -
            #         D2 * ( I - (D2-D1)*(D1+D2)^-1 ) * bzl[n:] )
            blas.gemv(Asc, x, v)
            lapack.potrs(S, v)

            # x[:n] = D^-1 * ( rhs - A'*v ).
            blas.gemv(Asc, v, x, alpha = -1.0, beta = 1.0, trans = 'T')
            x[:n] = div(x[:n], ds)

            # x[n:] = (D1+D2)^-1 * ( bx[n:] - D1*bzl[:n] - D2*bzl[n:] )
            #     - (D2-D1)*(D1+D2)^-1 * x[:n]
            x[n:] = div( x[n:] - mul(d1, z[:n]) - mul(d2, z[n:]), d1+d2 ) \
                - mul( d3, x[:n] )

            # zl[:n] = D1^1/2 * ( x[:n] - x[n:] - bzl[:n] )
            # zl[n:] = D2^1/2 * (-x[:n] - x[n:] - bzl[n:] ).
            z[:n] = mul( W['di'][:n],  x[:n] - x[n:] - z[:n] )
            z[n:] = mul( W['di'][n:], -x[:n] - x[n:] - z[n:] )

        return g

    return solvers.coneqp(P, q, G, h, kktsolver = Fkkt)['x'][:n]
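A minimal usage sketch with a wide random matrix, the regime this
implementation targets:

from cvxopt import normal

m, n = 50, 1000      # many more columns than rows
A = normal(m, n)
y = normal(m, 1)
x = l1regls(A, y)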