The complex matrix cube problem summer project – summary of results

by Orr Shalit

In the previous post I announced the project that I was going to supervise in the Summer Projects in Mathematics week at the Technion. In this post I wish to share what we did and what we found in that week.

I had the privilege to work with two very bright students who have recently finished their undergraduate studies: Mattya Ben-Efraim (from Bar-Ilan University) and Yuval Yifrach (from the Technion). It is remarkable how much material they learned for this one-week project (the basics of C*-algebras and operator spaces), and that they actually helped settle the question that I raised to them.

I learned a lot of things in this project. First, I learned that my conjecture was false! I also learned and re-learned some programming abilities, and I learned something about the subtleties and limitations of numerical experimentation (I also learned something about how to supervise an undergraduate research project, but that’s beside the point right now).

Statement of the problem

Following old advice of Halmos, the problem that I posed to Mattya and Yuval was in the form of a yes/no question. To state this question, we need to recall some definitions. If A is an m \times m matrix, another n \times n matrix B (with n \geq m) is said to be a dilation of A if

B = \begin{pmatrix} A & * \\ * & * \end{pmatrix}

In this case, A is said to be a compression of B. We then write A \prec B. If A = (A_1, \ldots, A_d) and B = (B_1, \ldots, B_d) are tuples of matrices, we say that B is a dilation of A, and that A is a compression of B, if A_i \prec B_i for all i=1, \ldots, d. We then write A \prec B.

A d-tuple B = (B_1, \ldots, B_d) is said to be normal if B_i is normal and B_i B_j = B_j B_i for all i,j=1, \ldots, d. Normal tuples of matrices (or operators) are the best understood ones, because – thanks to the spectral theorem – they are simultaneously unitarily diagonalizable.

If A is an n \times n matrix, we define its norm \|A\| to be the operator norm of A when considered as an operator on \mathbb{C}^n, that is: \|A\| = \sup\{\|Ax\| : \|x\|=1, x \in \mathbb{C}^n\} (here \|x\| denotes the Euclidean norm \|x\| = \sqrt{\sum_{i=1}^n |x_i|^2}).

The complex matrix cube problem: Given a tuple A = (A_1, \ldots, A_d) of n \times n matrices, can one find a normal dilation M = (M_1, \ldots, M_d) of A such that \|M_i\| \leq \sqrt{d} \|A_i\| for all i=1, \ldots, d?

I had some reasons to believe that the answer is yes, one of which was that it was proved that the answer is yes if A_1, \ldots, A_d are all selfadjoint; see this paper by Passer, Solel, and myself (I reported on this paper in this previous post). Passer later proved that if we replace \sqrt{d} with \sqrt{2d} then the answer is yes for arbitrary tuples. Passer’s proof did not look optimal to me. Also, I carried out some primitive numerical experimentation that seemed to support the conjecture that \sqrt{d} suffices.

Methods and results

Suppose we are given a d-tuple of n \times n contractions A = (A_1, \ldots, A_d). We wish to know whether it is true or false that A has a normal dilation M such that \|M_i\| \leq \sqrt{d} for all i=1, \ldots, d (this is not exactly the way we formulated the problem above, but it can be seen to be equivalent).

The first observation is that it is enough to consider only tuples of unitaries. Indeed, if T is a contraction (meaning that \|T\|\leq 1) then

\begin{pmatrix} T & \sqrt{I - TT^*} \\ \sqrt{I - T^* T} & -T^* \end{pmatrix}

is a unitary dilation of T. So given a d-tuple A of contractions, we can find a d-tuple U of unitaries such that A \prec U. Thus, we may as well assume that A is a tuple of unitaries, and ask whether we can dilate A.
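To make this concrete, here is a minimal sketch in Matlab (the language we worked in) that builds this unitary dilation for a random contraction and checks it; the matrix size and the way the contraction is generated are just illustrative choices.

% A sketch: build the 2n x 2n unitary dilation of a contraction T and verify it.
n = 3;
T = randn(n) + 1i*randn(n);
T = T / (norm(T) + 0.1);                          % make T a (strict) contraction
U = [T,                      sqrtm(eye(n) - T*T');
     sqrtm(eye(n) - T'*T),   -T'                 ];
norm(U'*U - eye(2*n))                             % should be zero, up to roundoff
norm(U(1:n, 1:n) - T)                             % the top-left n x n corner of U is T, so T \prec U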

We considered normal tuples N = (N_1, \ldots, N_d) with joint eigenvalues at the vertices of the polytope Q = P_k \times \cdots \times P_k, where P_k is a regular polygon with k vertices that circumscribes the unit disc \overline{\mathbb{D}}. When k is moderately large, the boundary of Q is very close to \mathbb{T} \times \cdots \times \mathbb{T}, and in this post I will ignore this difference (the reader can check that for the results we get, ignoring this difference actually puts us on the safe side of things).
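For concreteness, here is a sketch (again in Matlab) of one way to write down such a tuple N as d commuting diagonal matrices of size m = k^d. Placing the vertices of P_k at radius 1/\cos(\pi/k) on the rays through the k-th roots of unity is just one convenient convention, and the names d, k, lambda, N are illustrative.

% A sketch of the fixed normal tuple N: d commuting diagonal m x m matrices
% (m = k^d) whose joint eigenvalues run over all vertices of Q = P_k x ... x P_k.
d = 2; k = 8;                              % keep k <= 10 so that dec2base below returns digits 0-9
r = 1/cos(pi/k);                           % the vertices of P_k sit at this radius
v = r * exp(2i*pi*(0:k-1)/k);              % vertices of the circumscribing regular k-gon
m = k^d;
lambda = zeros(d, m);                      % lambda(i,j) = i-th coordinate of the j-th vertex of Q
for j = 1:m
    idx = dec2base(j-1, k, d) - '0';       % the digits of j-1 in base k pick one vertex of P_k per coordinate
    lambda(:, j) = v(idx + 1).';
end
N = arrayfun(@(i) diag(lambda(i,:)), 1:d, 'UniformOutput', false);   % N{i} = diag(lambda(i,:))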

Given a fixed tuple of unitaries A, it can be shown that A has a normal dilation M with \|M_i\| \leq \rho for all i=1, \ldots, d if and only if

(*)       \|p(A)\| \leq \|p(\rho N)\|

for every matrix valued polynomial p of degree one p(z) = X_0 + \sum_{j=1}^d z_j X_j, where N is the fixed normal tuple we constructed above. Let me emphasize: M here is a normal dilation whose existence is precisely what is in question, and N is the fixed tuple with joint eigenvalues at the vertices of the polytope Q = P_k \times \cdots \times P_k from above. We recall that a matrix valued polynomial is evaluated on a tuple A of n \times n matrices as follows:

p(A) = I_n \otimes X_0 + \sum_{j=1}^d A_j \otimes X_j.
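In Matlab this evaluation is a few lines with kron; here is a sketch (the function name evalpoly and the cell-array conventions are mine, chosen only for the sketches in this post).

% A sketch: evaluate p(z) = X_0 + sum_j z_j X_j on a tuple A, following the
% convention p(A) = I_n (x) X_0 + sum_j A_j (x) X_j.
function P = evalpoly(A, X)
% A : cell array {A_1, ..., A_d} of n x n matrices
% X : cell array {X_0, X_1, ..., X_d} of coefficient matrices, all of the same size
    n = size(A{1}, 1);
    P = kron(eye(n), X{1});                % the constant term I_n (x) X_0
    for j = 1:numel(A)
        P = P + kron(A{j}, X{j+1});        % the term A_j (x) X_j
    end
end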

So the first method of attack was the following: we randomly sampled a unitary tuple A, and then we tried to find a polynomial p such that (*) was violated, with \rho = \sqrt{d}. We thought of several ways to look for such a polynomial given A, one of which was naively trying to iterate over a mesh of all possible coefficients X_0, X_1, \ldots, X_d. As you can easily see, this method is so inefficient that even for moderately small d and n the search could take us a lifetime. Another idea was to try to run some numerical optimization such as gradient descent on the function p \mapsto \|p(A)\| / \|p(\sqrt{d}N)\|, but since this function is not convex this was also quite futile. And all this just for a given tuple A, which might happen to have a dilation.
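Just to fix ideas, here is the shape of such a test in Matlab, using the evalpoly helper and the tuple N sketched above (the number of trials, the size ell of the coefficients, and the Monte Carlo sampling are arbitrary illustrative choices; this variant is of course just as hopeless as the mesh search).

% A sketch of the naive search: sample a Haar-random unitary tuple A, then sample
% random degree-one coefficients and look for a violation of (*) with rho = sqrt(d).
n = 3; ell = 2;                                   % ell = size of the coefficient matrices X_j
A = cell(1, d); X = cell(1, d+1);
for i = 1:d                                       % Haar-random unitary: Q factor of a complex Gaussian, phases fixed
    [Q, R] = qr(randn(n) + 1i*randn(n));
    A{i} = Q * diag(sign(diag(R)));
end
sqrtdN = cellfun(@(Nj) sqrt(d)*Nj, N, 'UniformOutput', false);   % the tuple sqrt(d) N
for trial = 1:1000
    for j = 0:d
        X{j+1} = randn(ell) + 1i*randn(ell);      % random coefficients X_0, ..., X_d
    end
    if norm(evalpoly(A, X)) > norm(evalpoly(sqrtdN, X))
        disp('found a polynomial violating (*)'); break
    end
end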

The second general approach was still to randomly select a tuple of n \times n unitaries A and to check whether it has a normal dilation, but this time the test was somewhat more indirect. Basically, modulo some equivalences within the theory, we know that A has the required dilation of size at most \rho if and only if there exists a UCP map sending \rho N_i to A_i for i=1,\ldots, d, where N is the tuple of normals constructed above. This, modulo some more equivalences (and as has been noted in this paper of Helton, Klep and McCullough), is equivalent to the existence of positive semidefinite n \times n matrices C_1, \ldots, C_m such that

\sum_{j=1}^m \rho \lambda_{i}^j C_j = A_i for i=1, \ldots, d

where N_i = \mathrm{diag}(\lambda_i^1, \lambda_i^2, \ldots, \lambda_i^m) for i=1,\ldots, d, and

\sum_{j=1}^m C_j = I_n.

The existence of such positive semidefinite matrices C_1, \ldots, C_m can be interpreted as the feasibility of a certain semidefinite program (SDP). In fact, rather than just checking feasibility for \rho = \sqrt{d}, we decided to solve the full optimization problem:

minimize \rho > 0

such that

C_j \geq 0 , j = 1, \ldots, m

\sum_{j=1}^m \lambda_i^j C_j = \rho^{-1} A_i , \quad i=1,\ldots, d

\sum_{j=1}^m C_j = I_n.

Note that we moved \rho to the right hand side, to make the equality constraint affine in the variables C_1, \ldots, C_m and \sigma := \rho^{-1}. Recall that N_i = \mathrm{diag}(\lambda_i^1, \ldots, \lambda_i^m) and A_i are all fixed. In the implementation we actually defined this as a maximization problem

maximize \sigma > 0

such that

C_j \geq 0 , j = 1, \ldots, m

\sum_{j=1}^m \lambda_i^j C_j = \sigma A_i , \quad i=1,\ldots, d

\sum_{j=1}^m C_j = I_n.
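Here, for the record, is roughly how such a program can be specified in CVX. This is a sketch and not our actual code: it assumes the unitaries are stored in a cell array A and the joint eigenvalues in the d x m matrix lambda as in the sketches above, and the three-dimensional variable array is just one convenient way to declare the C_j.

% A sketch of the CVX formulation (not our actual code).
cvx_begin sdp
    variable C(n, n, m) hermitian semidefinite    % the matrices C_1, ..., C_m, each n x n and PSD
    variable sig nonnegative                      % sig plays the role of sigma = rho^{-1}
    maximize( sig )
    subject to
        sum(C, 3) == eye(n);                      % sum_j C_j = I_n
        for i = 1:d
            S = 0;
            for j = 1:m
                S = S + lambda(i, j) * C(:, :, j);
            end
            S == sig * A{i};                      % sum_j lambda_i^j C_j = sigma A_i
        end
cvx_end
rho = 1 / sig;                                    % the smallest dilation constant for this tuple A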

Now, there exists available software in Matlab that lets one solve the above SDP quite reliably, and we used the high level interface CVX, which invoked either of the solvers SDPT3 and SeDuMi (we used both solvers and played with precision parameters to increase our confidence that the results we got were correct). This approach had the great advantage that, besides being much faster, it could tell us the smallest \rho = \sigma^{-1} such that A has a normal dilation M with \|M_i\| \leq \rho.

We ran the tests for small values of n and d. You can see some histograms in the presentation (the value plotted in the histograms in the presentation is \rho/\sqrt{d}, in order to have a direct comparison with the conjecture). Interestingly, we see that for random tuples the required value of \rho is, with very high probability, significantly lower than \sqrt{d}. For d = 2 and n=3,4,5, we found a few random counterexamples, but the \rho they required was just 2% over \sqrt{d}.

Once we know that the average value of \rho is less than \sqrt{d}, it heuristically becomes reasonable that counterexamples are hard to come by, because of concentration of measure phenomena: roughly speaking, the probability of a Lipschitz function on the unitaries (say) being \epsilon away from the mean goes down exponentially, like C_1 \exp(-C_2 n \epsilon^2), with the dimension. For the same reason, once we found a counterexample A, it is very hard to find coefficients X_0, X_1, \ldots, X_d of a matrix valued polynomial p(z) = X_0 + \sum_{j=1}^d z_j X_j such that \|p(A)\| > \|p(\sqrt{d} N)\|. And indeed, we have not yet verified by an independent method that our counterexamples are indeed counterexamples.

The counterexamples we found are very unlikely to be caused by numerical error, since we tested the results with two different solvers, and the advertised precision of the solvers is several orders of magnitude smaller than 2%.

After we found the random counterexamples, it occurred to us that there was no reason to sample the d unitaries A_1, \ldots, A_d independently. We recalled that in the selfadjoint case, tightness of the constant \sqrt{d} was established using anti-commuting unitaries. Indeed, since counterexamples are rare, one would think that the matrices would have to conspire somehow in order to mess up the inequality. So we searched for things that are anti-commuting-like. And it did indeed turn out that the q-commuting matrices

A_1 = \begin{pmatrix} 1 & 0 & 0 \\ 0 & q & 0 \\ 0 & 0 & q^2 \end{pmatrix}    A_2 = \begin{pmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ 1 & 0 & 0  \end{pmatrix}

where q = \exp(\frac{2 \pi i}{3}), also form a counterexample (in the case d = 2, n = 3). We also still haven’t found a polynomial for which \|p(A)\| > \|p(\sqrt{2} N)\|. We will probably continue looking for one when the holiday is over, and then I will update.
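These are of course the familiar 3 x 3 clock and shift matrices. Here is a two-line Matlab check of the commutation relation they satisfy (with the convention A_2 A_1 = q A_1 A_2; the opposite convention just replaces q by its conjugate):

% The pair above: clock and shift matrices, and a check that A_2 A_1 = q A_1 A_2.
q  = exp(2i*pi/3);
A1 = diag([1, q, q^2]);
A2 = [0 1 0; 0 0 1; 1 0 0];
norm(A2*A1 - q*A1*A2)                             % should be zero, up to roundoff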

Materials and summary

Here are Mattya and Yuval’s slides which they presented in the talk they gave at the end of the week. I also plan to put the code and files with raw results online on my homepage at some point.

Acknowledgements and credits

The main method for computing the “inflation constant” required for a dilation, using a semidefinite program, is based on basic operator space theory, and in particular draws upon the algorithm described in this paper of Helton, Klep and McCullough.

We used Matlab. The numerical heavy lifting was done by others. We solved the semidefinite program using CVX – a high level Matlab software package for specifying and solving convex programs. We also used YALMIP – another high level Matlab software package for specifying and solving convex programs – to verify the results we obtained with CVX. Both CVX and YALMIP invoked the SDP solvers SDPT3 and SeDuMi.

This project came after several years of collaboration with colleagues, and in particular, I had many conversations on the subject with Ben Passer before and during the projects week.

I owe many thanks to the organizers of this projects week, Ram Band, Tali Pinsky, and Ron Rosenthal. Thanks to this opportunity I explored an avenue that I never walked through before.