Wednesday, November 6, 2013

Why I hate OpenMP

At MOSEK we use OpenMP to parallelize our code. Well, in fact we used to use OpenMP exclusively. However, we have now moved to pthreads and native Windows threads.

This seems like a very bad idea, because the normal recommendation is to use OpenMP, Cilk, Intel Threading Building Blocks, or the like. But I will now explain why I ignore this recommendation and in fact think it is a bit naive.

At MOSEK we build a shared library (a DLL on Windows) that we sell to our customers. This library is called from the user's own program or from programs like MATLAB, R, etc. We have absolutely no control over the calling application, which may itself be linked with OpenMP or another threading library. Our users expect our library to work almost independently of what the calling application does.

However, OpenMP is built on the implicit assumption that you are an academic who has all the source code when building the application, and that if an application is linked with OpenMP then the same OpenMP runtime is used throughout the application. This definitely does not hold for us.

We have to ask ourselves what happens if the user builds their application with GCC's OpenMP and links it with a MOSEK library built with Intel's OpenMP. I think the answer is that the final application crashes.

I recently reported a crash in the Intel MKL BLAS library when called from MATLAB that seems to be caused by an OpenMP issue (see the report). The answer from Intel was: are you sure you use the same OpenMP as MATLAB? This illustrates very well what I hate about OpenMP: it does not work unconditionally. If you link with pthreads, it just works, because pthreads is integrated into the operating system.

So my conclusion is: if you are a commercial library vendor, then do not use OpenMP unless you want to get into trouble.

Wednesday, June 26, 2013

Slow convergence in conic quadratic optimization

The recent paper by Gould, Orban, and Robinson in Mathematical Programming Computation studies the QP
\[
\begin{array}{lc}
\mbox{min}  & x_1^2    \\
\mbox{st}     &  x_1 \geq 0
\end{array}
\]
because the problem does not have a strictly complementary solution. This fact leads interior-point methods to converge slowly.
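
Indeed, this is easy to verify from the first-order optimality conditions
\[
2 x_1 - \lambda = 0, \quad x_1 \geq 0, \quad \lambda \geq 0, \quad \lambda x_1 = 0,
\]
where \(\lambda\) is the multiplier of the constraint \(x_1 \geq 0\): the optimal solution is \(x_1 = 0\), which forces \(\lambda = 2 x_1 = 0\), so \(x_1\) and \(\lambda\) vanish simultaneously and strict complementarity (\(x_1 + \lambda > 0\)) fails.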

An interesting observation is that it can be reformulated as the following conic quadratic optimization problem:
\[
\begin{array}{lc}
\mbox{min} &  x_2        \\
\mbox{st}    & x_2 \geq ||x_1|| \\
                    &  x_1  \geq 0 
\end{array}
\]
An optimal primal solution is \(x_1 = 0\) and \(x_2=0\). Observe that Vanderbei studies the same problem in his working paper (Section 2.2) on solving SOCPs with LOQO.
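
For readers who want to reproduce the experiment, here is a minimal sketch of the conic model in CVXPY (my own illustration; the log below was produced with MOSEK's native interface):

import cvxpy as cp

x1 = cp.Variable()
x2 = cp.Variable()

# min x2  st  x2 >= ||x1||,  x1 >= 0   (CVXPY turns the norm bound into a quadratic cone)
prob = cp.Problem(cp.Minimize(x2),
                  [cp.norm(cp.hstack([x1])) <= x2, x1 >= 0])
prob.solve(solver=cp.MOSEK, verbose=True)  # or prob.solve() to use the default solver
print(x1.value, x2.value)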

If I solve the problem with MOSEK I obtain

ITE PFEAS    DFEAS    GFEAS    PRSTATUS   POBJ              DOBJ              MU       TIME  
0   1.0e+000 1.0e+000 2.0e+000 0.00e+000  1.000000000e+000  0.000000000e+000  1.0e+000 0.00  
1   1.6e-001 1.6e-001 3.2e-001 1.00e+000  2.404340377e-001  0.000000000e+000  1.6e-001 0.00  
2   8.2e-003 8.2e-003 1.6e-002 1.12e+000  1.181687831e-002  0.000000000e+000  8.2e-003 0.00  
3   4.1e-004 4.1e-004 8.2e-004 1.01e+000  5.891864452e-004  0.000000000e+000  4.1e-004 0.00  
4   2.1e-005 2.1e-005 4.1e-005 1.00e+000  2.945495827e-005  0.000000000e+000  2.1e-005 0.00  
5   2.6e-010 2.6e-010 5.2e-010 1.00e+000  4.144019664e-010  0.000000000e+000  2.6e-010 0.00  

As can be seen from the MU column (the average complementarity gap), MOSEK converges quickly. In fact, there is quadratic convergence in the last iteration. Now the dual problem is
\[
\begin{array}{lc}
\mbox{max} & 0        \\
\mbox{st}    & s_3  = 1 \\
                     & s_2 \geq ||s_1|| \\
                    &  s_1  \geq 0 
\end{array}
\]
and an optimal dual solution is \(s_1=0\), \(s_2=1\) and \(s_3=1\). In this case the dual solution is in the interior, and hence the solutions are strictly complementary. This is also indicated by the interior-point optimizer in MOSEK, which converges quickly.

An equivalent conic quadratic problem using a rotated quadratic cone can be stated as follows
\[
\begin{array}{lc}
\mbox{min} &  x_2        \\
\mbox{st}    & 2 x_3 x_2 \geq ||x_1||^2 \\
                    & x_3  =1 \\
                    &  x_1  \geq 0 
\end{array}
\]
Now solving that problem with MOSEK produces
ITE PFEAS    DFEAS    GFEAS    PRSTATUS   POBJ              DOBJ              MU       TIME  
0   1.0e+000 1.0e+000 1.7e+000 0.00e+000  7.071067812e-001  0.000000000e+000  1.0e+000 0.00  
1   1.4e-001 1.4e-001 2.4e-001 6.09e-001  1.517397994e-001  -4.749764644e-002 1.4e-001 0.00  
2   2.2e-002 2.2e-002 3.7e-002 1.21e+000  2.120093451e-002  -6.971441263e-003 2.2e-002 0.00  
3   5.3e-003 5.3e-003 9.0e-003 1.14e+000  4.674957163e-003  -1.445168902e-003 5.3e-003 0.00  
4   1.5e-003 1.5e-003 2.6e-003 1.07e+000  1.234899144e-003  -3.403939914e-004 1.5e-003 0.00  
5   4.4e-004 4.4e-004 7.4e-004 1.04e+000  3.331026029e-004  -7.962820209e-005 4.4e-004 0.00  
6   1.2e-004 1.2e-004 2.1e-004 1.02e+000  8.925963682e-005  -1.860468131e-005 1.2e-004 0.00  
7   3.4e-005 3.4e-005 5.8e-005 1.01e+000  2.373850359e-005  -4.389268847e-006 3.4e-005 0.00  
8   9.2e-006 9.2e-006 1.6e-005 1.01e+000  6.273175074e-006  -1.049389687e-006 9.2e-006 0.00  
9   2.4e-006 2.4e-006 4.2e-006 1.00e+000  1.649538759e-006  -2.545236451e-007 2.4e-006 0.00  
10  6.5e-007 6.5e-007 1.1e-006 1.00e+000  4.321347177e-007  -6.259486109e-008 6.5e-007 0.00  
11  1.7e-007 1.7e-007 2.9e-007 1.00e+000  1.128991865e-007  -1.558439042e-008 1.7e-007 0.00  
12  4.5e-008 4.5e-008 7.7e-008 1.00e+000  2.943872208e-008  -3.919657806e-009 4.5e-008 0.00  
13  1.2e-008 1.2e-008 2.0e-008 1.00e+000  7.665414689e-009  -9.952574739e-010 1.2e-008 0.00  

It is seen that the interior-point optimizer now converges at a much slower rate and takes many more iterations. It is easy to verify that there is no strictly complementary solution in this case.
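
One way to see this: since \(x_3\) is fixed at one, the rotated cone constraint reads \(2 x_2 \geq ||x_1||^2\), so minimizing \(x_2\) is the same as minimizing \(x_1^2/2\) over \(x_1 \geq 0\). That is just the original QP up to a constant factor, and the lack of a strictly complementary solution carries over to this formulation.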

This gives rise to a couple of comments. First, there may be several ways of representing a set, and they are not all equally effective. Secondly, maybe we should think about improving the code for problems without a strictly complementary solution, as Gould et al. suggest.




Friday, May 25, 2012

An example of a badly scaled model with an easy fix

A scaled-down version of a problem we just received at MOSEK is

min s^2
st.   Ax - b = s   (P)
       x >= 0

Assume for simplicity that A is a matrix with one row. In this case the elements of A are very large, i.e., of the order of magnitude 10^7. This implies the problem (P) is badly scaled, because the constraint row contains many coefficients of size 10^7 and one of size 1 (the coefficient of s).

Now an equally good model is


min s^2
st.   Ax - b = (||A||_inf) s   (P')
       x >= 0


which is nicely scaled. An optimal solution to (P') clearly yields an optimal solution to (P): keep x and rescale the penalty variable to ||A||_inf s.
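
A minimal numerical sketch of the fix (made-up data, just to show the effect on the coefficient magnitudes; assuming NumPy):

import numpy as np

# Made-up data mimicking the situation: a single constraint row with huge coefficients.
rng = np.random.default_rng(0)
A = rng.uniform(1e7, 5e7, size=(1, 10))
b = np.array([3.0e7])

scale = np.linalg.norm(A, np.inf)   # ||A||_inf = maximum absolute row sum

# (P) :  A x - b = s          -> coefficients ~1e7 next to a coefficient of 1 on s
# (P'):  A x - b = scale * s  -> dividing through by scale gives coefficients <= 1
A_scaled = A / scale
b_scaled = b / scale
print(np.abs(A).max(), np.abs(A_scaled).max())

# After solving (P'), the penalty variable of (P) is recovered as scale * s.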

Note this line of thinking could be relevant anywhere penalty variables are employed, and many optimization problems contain penalty variables.



Friday, December 9, 2011

An observation about ns1687037


Hans Mittelmann tests various optimization software packages on his benchmark website. One of the test problems is ns1687037. An inspection of that problem reveals many constraint blocks of the form

R0002624: 5.015e+004 C0024008   ....+ C0025749 = 5.0307748e+007
R0002625: 0e+000 <= - C0025749 + C0025750
R0002626: 0e+000 <= C0025749 + C0025750
R0002627: 1.9877659e-008 C0025750 - C0025751 = 0e+000
                  C0025749 is free

These constraints imply

 -C0025750  <=  C0025749 <=  C0025750
1.9877659e-008 C0025750  = C0025751

and hence

1.9877659e-008   abs( C0025749)  <= C0025751

In other words, variable C0025749 is an elastic/penalty variable for constraint R0002624. Moreover, we see that variable C0025751 is essentially the elastic variable of constraint R0002624 scaled by roughly 2.0e-8. Now the right-hand side of constraint R0002624 is of order 10^7, and hence it is more or less the rounding error that is penalized. That makes me wonder whether that model provides reliable results or does what the author wants!

Wednesday, November 23, 2011

A follow up on lower bounds on fill-in

This is a follow-up on my previous post about lower bounds on the fill-in when computing a sparse Cholesky factorization.

I posted my question to the NA-NET and below is a commented version of the replies I got.

Sivan Toledo pointed out the paper:

  • A Polynomial Approximation Algorithm for the Minimum Fill-In Problem
  • A. Natanzon, R. Shamir and R. Sharan
  • SIAM Journal on Computing, Vol. 30, No. 4, pp. 1067-1079 (2000)

and mentioned there might also be a paper by Phil Klein. Most likely it is the paper:
  • Cutting down on fill using nested dissection: provably good elimination orderings
  • Ajit Agrawal, Philip N. Klein, and R. Ravi
  • Graph Theory and Sparse Matrix Computation, edited by A. George, J. Gilbert, and J. W. H. Liu, volume 56 in the IMA Volumes in Mathematics and its Applications, Springer-Verlag (1993), pp. 31-55.
A related work is the Ph.D. thesis by Wee-Liang Heng with the title "Approximately Optimal Elimination Orderings for Sparse Matrices". I have a printed copy somewhere but cannot locate it now.

The above-mentioned works provide algorithms for computing a good symmetric permutation. However, to the best of my knowledge they are not used in practice, but they are definitely something that I should check out.

Esmond Ng replied that it is hard to come up with lower bounds for general matrices, but mentioned that bounds can be obtained for special matrices. The relevant papers are

  • Complexity Bounds for Regular Finite Difference and Finite Element Grids
  • Hoffman, Alan J.; Martin, Michael S.; Rose, Donald J.
  • SIAM Journal on Numerical Analysis, vol. 10, no. 2, pp. 364-369.
and
  • Nested Dissection of a Regular Finite Element Mesh 
  • A. George
  • SIAM J. Numer. Anal. 10, pp. 345-363.
I am aware of this work, but I was mainly looking for information about general matrices, since the matrices I encounter in MOSEK almost never have grid structure. MOSEK is an optimization package, and hence the underlying applications are mostly from economics and planning, which give rise to very different matrices than those coming from physics applications.
Jeff Ovall points to a different line of research in his reply:
If you are willing to relax your notion of fill-in a bit, then I may be able to point you in a helpful direction. Instead of thinking of fill-in in terms of putting non-zeros where there used to be zeros, one can also think of it as (slightly) increasing the rank of low-rank blocks. For example, the inverse of the tridiagonal matrix with stencil (-1,2,-1) has no zero entries, but the rank of any off-diagonal block is precisely one, so the "fill-in" in this sense is small (requiring only O(n log n) storage instead of n^2). Hierarchical matrices (Hackbusch, Grasedyck, Boerm, Bebendorf, ...) exploit this notion of low-rank fill-in not only to compress the "important" information in a matrix, but also to maintain a compressed format while performing factorizations (LU, Cholesky). From the algebraic point of view, it is the Rank-Nullity Theorem which implies that inverses of sparse matrices will have large blocks which are of low rank. If the LU factorization is thought of block-wise, then this theorem also has something to say about the ranks of the blocks which appear, though it is not obvious to me how to get sharp lower bounds. A relevant paper here is:
  • The interplay of ranks of submatrices
  • Strang, G. & Nguyen, T. 
  • SIAM Rev., 2004, 46, 637-646 (electronic)
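
Ovall's rank-one claim about the inverse of the (-1,2,-1) tridiagonal matrix is easy to check numerically. A small sketch (assuming NumPy is available):

import numpy as np

n = 50
# Tridiagonal matrix with stencil (-1, 2, -1).
T = 2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
Tinv = np.linalg.inv(T)
print(np.abs(Tinv).min() > 0)                            # True: the inverse has no zero entries
print(np.linalg.matrix_rank(Tinv[: n // 2, n // 2 :]))   # 1: the off-diagonal block has rank one
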
To summarize, there does not seem to be any good lower bound on the minimal amount of fill-in possible when computing a sparse Cholesky factorization of a symmetrically permuted matrix.




Thursday, November 17, 2011

Lower bounds on the minimal fill-in when factorizing a positive definite symmetric matrix. Any help out there?

When computing a Cholesky factorization of a positive definite symmetric matrix A, it is well known that a suitable symmetric reordering helps keep the number of fill-ins and the total number of flops down.

Finding the optimal ordering is NP-hard, but good orderings can be obtained with the minimum degree and nested dissection (i.e., graph partitioning based) algorithms. Those algorithms provide an upper bound on the minimal amount of fill-in. However, I wonder: is there any way to compute a nontrivial lower bound on the minimal amount of fill-in?
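
For concreteness, here is a minimal sketch (a rough Python/SciPy illustration of my own) of how such an upper bound is obtained from a given ordering, by counting the fill-in produced by symbolic elimination on the graph of A:

import numpy as np
import scipy.sparse as sp
from scipy.sparse.csgraph import reverse_cuthill_mckee

def fill_in(A, perm):
    """Structural fill-in of the Cholesky factor of A permuted by perm:
    the number of off-diagonal nonzero pairs created that are not in A."""
    n = A.shape[0]
    Ap = sp.csr_matrix(A)[perm, :][:, perm].tocsr()
    # Adjacency sets of the undirected graph of A (numerical values are ignored).
    adj = [set(Ap.indices[Ap.indptr[i]:Ap.indptr[i + 1]]) - {i} for i in range(n)]
    fill = 0
    for k in range(n):                             # eliminate vertex k
        nbrs = sorted(j for j in adj[k] if j > k)  # not-yet-eliminated neighbours
        for a, i in enumerate(nbrs):               # make the neighbours a clique
            for j in nbrs[a + 1:]:
                if j not in adj[i]:
                    adj[i].add(j)
                    adj[j].add(i)
                    fill += 1
    return fill

# Tiny usage example: compare the natural ordering with reverse Cuthill-McKee.
B = sp.random(300, 300, density=0.01, random_state=0)
A = (B + B.T + 300 * sp.identity(300)).tocsr()
print(fill_in(A, np.arange(300)),
      fill_in(A, reverse_cuthill_mckee(A, symmetric_mode=True)))

Any ordering plugged into such a routine gives an upper bound on the minimal fill-in; the question is how to obtain a lower bound.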

I have been searching the literature but have not found any good reference. Do you have any suggestions? The problem sounds hard, I know.

Why is a lower bound important? Well, it would help evaluate the quality of the ordering algorithms that we have implemented in MOSEK (www.mosek.com). Moreover, the minimum degree ordering is usually computationally cheap, whereas nested dissection is expensive. A lower bound could help me determine when the minimum degree ordering could potentially be improved by using nested dissection.




Thursday, September 8, 2011

Conference in Honor of Etienne Loute

Yesterday I was at a conference honoring Etienne Loute, where I presented the talk "Convex Optimization: Conic Versus Functional Form". The message of the talk is that if you can formulate an optimization problem as a conic quadratic optimization problem (aka SOCP), then there are many good reasons to prefer this form over other forms. Some reasons are:
  • Conic quadratic problems are convex by construction.
  • They are almost as simple as LPs to deal with in software.
  • Duality theory is almost as simple as the linear case.
  • The (Nesterov-Todd) primal-dual algorithm for conic quadratic problems is extremely good.

I had feared that the distinguished audience would consider my talk too simple. However, the speaker before me was Bob Fourer of AMPL. His talk was about checking convexity of general optimization problems formulated in AMPL and had two parts: the first was about checking convexity, and the second was about how, in some cases, optimization problems in functional form can automatically be converted to conic quadratic optimization problems. So the two talks complemented each other very well, and it made me feel better about my own talk.
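
As a standard example of such a conversion (my own illustration, not taken from the talk), the functional-form problem of minimizing \(||Ax - b||\) can be written in conic quadratic form by introducing an epigraph variable \(t\):
\[
\begin{array}{lc}
\mbox{min} & t \\
\mbox{st}  & t \geq ||Ax - b||
\end{array}
\]
where the constraint simply states that \((t, Ax - b)\) belongs to a quadratic cone.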

Finally, I would like to mention that Yurii Nesterov was one of the speakers. He must have enjoyed the day, since two speakers were talking about his baby, optimization over symmetric cones.