Friday, May 22, 2015

Nonsymmetric semidefinite optimization problems.

In semidefinite optimization we optimize over a matrix variable that must be symmetric and positive semidefinite.

Assume we want to relax the assumption about symmetry. Is that an important generalization? The answer is no, because requiring y'Xy >= 0 for all y is the same as requiring that the symmetric part of X is positive semidefinite. To see this, observe

   X = (X+X')/2 + (X-X')/2

and note that (X-X')' = -(X-X'), i.e. X-X' is skew symmetric, so

   y'( (X-X')/2 ) y = 0

for every y.
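Spelled out, for any vector \(y\),
\[
y^T X y = y^T \frac{X+X^T}{2} y + y^T \frac{X-X^T}{2} y = y^T \frac{X+X^T}{2} y,
\]
so \(y^T X y \geq 0\) holds for all \(y\) exactly when \((X+X^T)/2\) is positive semidefinite.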

Hence, any nonsymmetric semidefinite optimization problem can easily be posed as a standard symmetric semidefinite optimization problem.

Wednesday, February 18, 2015

Multi threaded programming

I found the talk "Plain Threads are the GOTO of today's computing" by Hartmut Kaiser very interesting, because I have recently been working on improving the multithreaded code in MOSEK and am also thinking about how MOSEK should deal with all the cores in the CPUs of the future. I agree with Hartmut that something other than plain threads is needed.

Here are some potential replacements.
Previously I have used OpenMP, but I really dislike it; in my opinion it is somewhat ugly and you can feel the Fortran origins. Recently I have played with Cilk, which is very simple.

Checkedthreads seems like a good option if a simple C-only tool can do the job. I plan to try it at MOSEK.

Pfunc seems very powerful, and there is a Ph.D. thesis about its design. The project might be semi-dead though.

Wool also seems very interesting. It is plain C and the author claims the spawn overhead is very low. There is also an older comparison with other task libraries.

Btw, I am biased towards tools that do not depend on C++, because MOSEK is currently plain C code, and adding a dependency on a C++ runtime library adds headaches.


Wednesday, November 6, 2013

Complexity of solving conic quadratic problems

First a clarification: conic quadratic optimization and second-order cone optimization are the same thing. I prefer the name conic quadratic optimization, though.

Frequently it is asked on the internet what the computational complexity of solving conic quadratic problems is, or the related question of what the complexity of the algorithms implemented in MOSEK, SeDuMi or SDPT3 is.

To the best of my knowledge, almost all open source and commercial software employs a primal-dual interior-point algorithm using, for instance, the so-called Nesterov-Todd scaling.

A conic quadratic problem can be stated in the form
\[
\begin{array}{lccl}
\mbox{min} & \sum_{j=1}^d (c^j)^T x^j & \\
\mbox{st} & \sum_{j=1}^d A^j x^j & = & b \\
& x^j \in K^j & \\
\end{array}
\]
where \(K^j\) is an \(n^j\)-dimensional quadratic cone. Moreover, I will use \(A = [A^1,\ldots, A^d ]\) and \(n=\sum_j n^j\). Note that \(d \leq n\). First observe that the problem cannot be solved exactly on a computer using floating-point numbers, since the solution might be irrational. This is in contrast to linear problems, which always have a rational solution if the data is rational.
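As a tiny illustration of the irrationality (a made-up example, not one of the benchmark problems): the conic quadratic problem
\[
\begin{array}{lccl}
\mbox{min} & t & & \\
\mbox{st} & x_1 & = & 1 \\
 & x_2 & = & 1 \\
 & (t,x_1,x_2) \in K & & \\
\end{array}
\]
where \(K\) is the three-dimensional quadratic cone \(\{(t,x_1,x_2) : t \geq \sqrt{x_1^2+x_2^2}\}\), has purely rational data, yet the optimal objective value is \(t^* = \sqrt{2}\), which is irrational.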

Using, for instance, a primal-dual interior-point algorithm, the problem can be solved to \(\varepsilon\) accuracy in \(O(\sqrt{d} \ln(\varepsilon^{-1}))\) interior-point iterations, where \(\varepsilon\) is the accepted duality gap. The most famous variant having that iteration complexity is based on Nesterov and Todd's beautiful work on symmetric cones.

Each iteration requires the solution of a linear system with a coefficient matrix of the form
\[ \left [ \begin{array}{cc} H & A^T \\ A & 0 \\ \end{array} \right ] \mbox{ (*)} \]
This is the most expensive operation; it can be done in \(O(n^3)\) operations using Gaussian elimination, so combined with the \(O(\sqrt{d}) \leq O(\sqrt{n})\) iteration bound we end up with the overall complexity \(O(n^{3.5}\ln(\varepsilon^{-1}))\).
That is the theoretical result. In practice the algorithms usually work much better, because they normally finish in something like 10 to 100 iterations and rarely require more than 200 iterations. In fact, if an algorithm requires more than 200 iterations then typically numerical issues prevent the software from solving the problem.

Finally, a conic quadratic problem is typically sparse, which implies the linear system mentioned above can be solved much faster when the sparsity is exploited. Figuring out how to solve the linear equation system (*) with the lowest complexity when exploiting sparsity is NP-hard, and therefore optimization software only employs various heuristics, such as minimum degree ordering, that help cut the cost of each iteration. If you want to know more then read my Mathematical Programming publication mentioned below. One important fact is that it is impossible to predict the per-iteration cost without knowing the problem structure and then doing a complicated analysis of it. I.e. the per-iteration cost is not a simple function of the number of constraints and variables unless A is completely dense.
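To make the effect of sparsity and ordering concrete, here is a small sketch, not MOSEK code, using SciPy's SuperLU wrapper. The matrix is an artificial arrow-shaped stand-in for (*): with the natural ordering the factors fill in almost completely, while a fill-reducing ordering such as COLAMD (a relative of minimum degree) keeps them sparse.

import scipy.sparse as sp
import scipy.sparse.linalg as spla

# Illustration only: fill-in of an LU factorization under two column orderings.
n = 500
M = sp.lil_matrix((n, n))
M.setdiag(4.0)          # diagonal matrix ...
M[0, :] = 1.0           # ... with a dense first row
M[:, 0] = 1.0           # ... and a dense first column ("arrow" structure)
K = M.tocsc()

for order in ("NATURAL", "COLAMD"):
    lu = spla.splu(K, permc_spec=order)
    print(order, "nonzeros in L and U:", lu.L.nnz + lu.U.nnz)

The same matrix factorized in two orders gives wildly different fill-in, which is exactly why the per-iteration cost cannot be read off from the problem dimensions alone.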

To summarize, in practice primal-dual interior-point algorithms solve a conic quadratic problem at less than 200 times the cost of solving the linear equation system (*).

So can the best known polynomial complexity bound be proven for software like MOSEK? In general the answer is no, because the software employs a bunch of tricks that speed up the practical performance but unfortunately destroy the theoretical complexity proof. In fact, it is commonly accepted that if the algorithm is implemented strictly as theory suggests then it will be hopelessly slow.

I have spent a lot of time on implementing interior-point methods, as documented by the Mathematical Programming publication, and my view is that the practical implementations are very close to the theory.

Why I hate OpenMP

At MOSEK we use OpenMP to parallelize our code. Well, in fact we used to use OpenMP exclusively. However, we have now moved to using pthreads or native Windows threads.

This may seem like a very bad idea, because the normal recommendation is to use OpenMP, Cilk, Intel Parallel Building Blocks or the like. But I will now explain why I ignore this recommendation and in fact think it is a bit naive.

At MOSEK we build a shared library (a DLL on Windows) that we sell to our customers. This library is called from the user's program or from programs like MATLAB, R, etc. We have absolutely no control over the calling application. The calling application may be linked with OpenMP or another threading library. Our users expect our library to work almost independently of what they do.

However, OpenMP is built on the implicit assumptions that you are an academic who has all the source code when building the application, and that if an application is linked with OpenMP then the same version of OpenMP is used throughout the application. This definitely does not hold for us.

We have to ask ourselves what happens if the user builds their application, linking with MOSEK, using GCC's OpenMP while we build our library with Intel's OpenMP. I think the answer is that the final application crashes.

I recently reported a crash in the Intel MKL BLAS library when called from MATLAB that seems to be caused by an OpenMP issue. See the report. The answer from Intel was: are you sure you use the same OpenMP as MATLAB? This illustrates very well what I hate about OpenMP: it does not work unconditionally. If you link with pthreads then it just works, because pthreads is integrated into the operating system.

So my conclusion is: if you are a commercial library vendor, then do not use OpenMP unless you want to get into trouble.

Wednesday, June 26, 2013

Slow convergence in conic quadratic optimization

The recent paper by Gould, Orban, and Robinson in Mathematical Programming Computation studies the QP
\[
\begin{array}{lc}
\mbox{min}  & x_1^2    \\
\mbox{st}     &  x_1 \geq 0
\end{array}
\]
because the problem does not have a strictly complementary solution. This fact causes interior-point methods to converge slowly.

An interesting observation is that it can be formulated as the following conic quadratic optimization problem:
\[
\begin{array}{lc}
\mbox{min} &  x_2        \\
\mbox{st}    & x_2 \geq ||x_1|| \\
                    &  x_1  \geq 0 
\end{array}
\]
An optimal primal solution is \(x_1 = 0\) and \(x_2=0\). Observe that Vanderbei studies the same problem in his working paper (Section 2.2) on solving SOCPs with LOQO.

If I solve the problem with MOSEK I obtain

ITE PFEAS    DFEAS    GFEAS    PRSTATUS   POBJ              DOBJ              MU       TIME  
0   1.0e+000 1.0e+000 2.0e+000 0.00e+000  1.000000000e+000  0.000000000e+000  1.0e+000 0.00  
1   1.6e-001 1.6e-001 3.2e-001 1.00e+000  2.404340377e-001  0.000000000e+000  1.6e-001 0.00  
2   8.2e-003 8.2e-003 1.6e-002 1.12e+000  1.181687831e-002  0.000000000e+000  8.2e-003 0.00  
3   4.1e-004 4.1e-004 8.2e-004 1.01e+000  5.891864452e-004  0.000000000e+000  4.1e-004 0.00  
4   2.1e-005 2.1e-005 4.1e-005 1.00e+000  2.945495827e-005  0.000000000e+000  2.1e-005 0.00  
5   2.6e-010 2.6e-010 5.2e-010 1.00e+000  4.144019664e-010  0.000000000e+000  2.6e-010 0.00  

As can be seen from the MU column (the average complementarity gap), MOSEK converges quickly. In fact, there is quadratic convergence in the last iteration: the gap drops from 2.1e-005 to 2.6e-010, roughly its square. Now the dual problem is
\[
\begin{array}{lc}
\mbox{max} & 0        \\
\mbox{st}    & s_3  = 1 \\
                     & s_2 \geq ||s_1|| \\
                    &  s_1  \geq 0 
\end{array}
\]
and an optimal dual solution is \(s_1=0.0\), \(s_2=1.0\) and \(s_3=1\). In this case the dual solution is in the interior and hence the solutions are strictly complementary. This is also reflected in the interior-point optimizer in MOSEK converging quickly.

An equivalent conic quadratic problem using a rotated quadratic cone can be stated as follows
\[
\begin{array}{lc}
\mbox{min} &  x_2        \\
\mbox{st}    & 2 x_3 x_2 \geq ||x_1||^2 \\
                    & x_3  =1 \\
                    &  x_1  \geq 0 
\end{array}
\]
Now solving that problem with MOSEK produces
ITE PFEAS    DFEAS    GFEAS    PRSTATUS   POBJ              DOBJ              MU       TIME  
0   1.0e+000 1.0e+000 1.7e+000 0.00e+000  7.071067812e-001  0.000000000e+000  1.0e+000 0.00  
1   1.4e-001 1.4e-001 2.4e-001 6.09e-001  1.517397994e-001  -4.749764644e-002 1.4e-001 0.00  
2   2.2e-002 2.2e-002 3.7e-002 1.21e+000  2.120093451e-002  -6.971441263e-003 2.2e-002 0.00  
3   5.3e-003 5.3e-003 9.0e-003 1.14e+000  4.674957163e-003  -1.445168902e-003 5.3e-003 0.00  
4   1.5e-003 1.5e-003 2.6e-003 1.07e+000  1.234899144e-003  -3.403939914e-004 1.5e-003 0.00  
5   4.4e-004 4.4e-004 7.4e-004 1.04e+000  3.331026029e-004  -7.962820209e-005 4.4e-004 0.00  
6   1.2e-004 1.2e-004 2.1e-004 1.02e+000  8.925963682e-005  -1.860468131e-005 1.2e-004 0.00  
7   3.4e-005 3.4e-005 5.8e-005 1.01e+000  2.373850359e-005  -4.389268847e-006 3.4e-005 0.00  
8   9.2e-006 9.2e-006 1.6e-005 1.01e+000  6.273175074e-006  -1.049389687e-006 9.2e-006 0.00  
9   2.4e-006 2.4e-006 4.2e-006 1.00e+000  1.649538759e-006  -2.545236451e-007 2.4e-006 0.00  
10  6.5e-007 6.5e-007 1.1e-006 1.00e+000  4.321347177e-007  -6.259486109e-008 6.5e-007 0.00  
11  1.7e-007 1.7e-007 2.9e-007 1.00e+000  1.128991865e-007  -1.558439042e-008 1.7e-007 0.00  
12  4.5e-008 4.5e-008 7.7e-008 1.00e+000  2.943872208e-008  -3.919657806e-009 4.5e-008 0.00  
13  1.2e-008 1.2e-008 2.0e-008 1.00e+000  7.665414689e-009  -9.952574739e-010 1.2e-008 0.00  

It is seen that the interior-point optimizer now converges at a much slower, roughly linear rate and takes many more iterations. It is easy to verify that there is no strictly complementary solution in this case.

This gives rise to some comments. First, there may be several ways of representing a set, and they are not all equally effective. Secondly, maybe we should think about improving the code for problems without a strictly complementary solution, as Gould et al. suggest.
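For anyone who wants to reproduce the experiment, here is a minimal sketch using CVXPY with the MOSEK solver (assuming cvxpy and a MOSEK license are installed). Note that CVXPY reformulates the model before it reaches MOSEK, so the iteration counts need not match the logs above exactly.

import cvxpy as cp

# Formulation 1: min x2  s.t.  x2 >= ||x1||,  x1 >= 0
x1 = cp.Variable(1)
x2 = cp.Variable()
prob1 = cp.Problem(cp.Minimize(x2), [x2 >= cp.norm(x1, 2), x1 >= 0])
prob1.solve(solver=cp.MOSEK, verbose=True)   # verbose=True prints the interior-point log

# Formulation 2: min x2  s.t.  2*x3*x2 >= ||x1||^2,  x3 = 1,  x1 >= 0.
# With x3 fixed to 1 the rotated cone constraint reduces to x1^2 <= 2*x2.
y1 = cp.Variable()
y2 = cp.Variable()
prob2 = cp.Problem(cp.Minimize(y2), [cp.square(y1) <= 2 * y2, y1 >= 0])
prob2.solve(solver=cp.MOSEK, verbose=True)

print("optimal values:", prob1.value, prob2.value)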




Friday, May 25, 2012

An example of a badly scaled model with an easy fix

A scaled-down version of a problem we just received at MOSEK is

min s^2
st.   Ax - b = s   (P)
       x >= 0

Assume for simplicity that A is a matrix with one row. In this case the elements of A are very large, i.e. of the order of magnitude 10^7. This implies that the problem (P) is badly scaled, because the constraint row contains a lot of elements of size 10^7 and one element (the coefficient of s) of size 1.

Now an equally good model is


min s^2
st.   Ax - b = (||A||_inf) s   (P')
       x >= 0


which is nicely scaled. An optimal solution to (P') is easily converted into an optimal solution to (P): keep x and multiply s by ||A||_inf. The substitution only scales the objective by the positive constant ||A||_inf^2, so the two models are equivalent.
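As a small sketch of the rescaling (with made-up numbers), the scale factor is just the infinity norm of A, i.e. the largest absolute row sum:

import numpy as np

# Made-up data illustrating the rescaling trick: replace  Ax - b = s  by
# Ax - b = ||A||_inf * s, or equivalently divide the constraint row by ||A||_inf.
A = np.array([[3.2e7, -1.5e7, 7.8e7]])    # one badly scaled constraint row
b = np.array([5.0e7])

scale = np.abs(A).sum(axis=1).max()       # ||A||_inf = max absolute row sum
print("||A||_inf =", scale)

A_scaled, b_scaled = A / scale, b / scale # coefficients are now of order 1
print(A_scaled, b_scaled)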

Note that this line of thinking could be relevant anywhere penalty variables are employed, and many optimization problems contain penalty variables.



Friday, December 9, 2011

An observation about ns1687037


Hans Mittelmann tests various optimization software packages on his benchmark website. One of the test problems is ns1687037. An inspection of that problem reveals many constraint blocks of the form

R0002624: 5.015e+004 C0024008   ....+ C0025749 = 5.0307748e+007
R0002625: 0e+000 <= - C0025749 + C0025750
R0002626: 0e+000 <= C0025749 + C0025750
R0002627: 1.9877659e-008 C0025750 - C0025751 = 0e+000
                  C0025749 is free

These constraints imply

 -C0025750  <=  C0025749 <=  C0025750     (from R0002625 and R0002626)
1.9877659e-008 C0025750  = C0025751       (from R0002627)

and hence

1.9877659e-008   abs( C0025749)  <= C0025751

In other words, variable C0025749 is an elastic/penalty variable for constraint R0002624, and variable C0025751 is roughly 2.0e-008 times the absolute value of that elastic variable. Now the right-hand side of constraint R0002624 is of the order 10^7, and hence it is more or less the rounding error that is penalized. That makes me wonder whether the model provides reliable results or does what the author wants!