The Moment-SOS hierarchy and the Christoffel-Darboux kernel

We consider the global minimization of a polynomial on a compact set B. We show that each step of the Moment-SOS hierarchy has a nice and simple interpretation that complements the usual one. Namely, it computes coefficients of a polynomial in an orthonormal basis of $L^2(B, \mu)$ where $\mu$ is an arbitrary reference measure whose support is exactly B. The resulting polynomial is a certain density (with respect to $\mu$) of some signed measure on B. When some relaxation is exact (which generically takes place) the coefficients of the optimal polynomial density are values of orthonormal polynomials at the global minimizer and the optimal (signed) density is simply related to the Christoffel-Darboux (CD) kernel and the Christoffel function associated with $\mu$. In contrast to the hierarchy of upper bounds which computes positive densities, the global optimum can be achieved exactly as integration against a polynomial (signed) density because the CD-kernel is a reproducing kernel, and so can mimic a Dirac measure (as long as finitely many moments are concerned).


Introduction
Consider the Polynomial Optimization Problem (POP):
$$ f^* = \min \{ f(x) : x \in B \}, \tag{1.1} $$
where $B = \{ x \in \mathbb{R}^n : g_j(x) \geq 0, \ j = 1, \ldots, m \}$ is a compact basic semi-algebraic set described by polynomials $g_1, \ldots, g_m$. For the hierarchy of upper bounds discussed below, B is restricted to be a "simple" set, e.g. a box, an ellipsoid, a simplex, a discrete hypercube, or their image under an affine transformation. Indeed, to define an SOS-hierarchy of upper bounds converging to the global minimum $f^*$ as described in e.g. [1,4,9], we use a measure $\mu$ whose support is exactly B, and for which all moments $\mu_\alpha := \int_B x^\alpha \, d\mu$, $\alpha \in \mathbb{N}^n$, can be obtained numerically or in closed form. For instance, if B is a box, an ellipsoid or a simplex, $\mu$ can be chosen to be the Lebesgue measure restricted to B. On the hypercube $\{-1, 1\}^n$ one may choose for $\mu$ the counting measure, etc.
Work partly funded by the AI Interdisciplinary Institute ANITI through the French "Investing for the Future PI3A" program under the Grant agreement ANR-19-PI3A-0004.
A hierarchy of lower bounds. To approximate $f^*$ from below, consider the hierarchy of semidefinite programs indexed by $t \in \mathbb{N}$:
$$ \rho_t = \sup_{\lambda, \sigma_j} \Big\{ \lambda : f - \lambda = \sum_{j=0}^{m} \sigma_j \, g_j ; \ \sigma_j \in \Sigma[x]_{t - d_j}, \ j = 0, \ldots, m \Big\}, \tag{1.2} $$
where $g_0 := 1$, $d_j := \lceil \deg(g_j)/2 \rceil$, and $\Sigma[x]_t$ denotes the space of sum-of-squares (SOS) polynomials of degree at most 2t. Under some Archimedean assumption on the $g_j$'s, $\rho_t \leq f^*$ for all t, and the sequence of lower bounds $(\rho_t)_{t \in \mathbb{N}}$ is monotone nondecreasing and converges to $f^*$ as t increases. Moreover, by a result of Nie [7], its convergence is finite generically, and global minimizers can be extracted from an optimal solution of the semidefinite program which is the dual of (1.2); see e.g. [5]. The sequence of semidefinite programs (1.2) and their duals, both indexed by t, forms what is called the Moment-SOS hierarchy, initiated in the early 2000s. For more details on the Moment-SOS hierarchy and its numerous applications in and outside optimization, the interested reader is referred to [3,5].
A hierarchy of upper bounds. Let $\mu$ be a finite Borel measure whose support is exactly B, where now B is a "simple" set as mentioned earlier. (Hence all moments of $\mu$ are available in closed form.) To approximate $f^*$ from above, consider the hierarchy of semidefinite programs
$$ u_t = \inf_{\sigma} \Big\{ \int_B f \, \sigma \, d\mu : \int_B \sigma \, d\mu = 1 ; \ \sigma \in \Sigma[x]_t \Big\}. \tag{1.3} $$
Note that $u_t \geq f^*$ for all t, because $f \geq f^*$ on B implies $\int_B f \, \sigma \, d\mu \geq f^* \int_B \sigma \, d\mu = f^*$ for any feasible SOS $\sigma$. In [4] it was proved that $u_t \downarrow f^*$ as t increases, and in fact solving the dual of (1.3) is solving a generalized eigenvalue problem for a certain pair of real symmetric matrices. In a series of papers, de Klerk, Laurent and co-workers have provided several rates of convergence of $u_t \downarrow f^*$ for several examples of sets B. For more details and results, the interested reader is referred to [1,9,10,11] and references therein.
The meaning of (1.3) is clear if one recalls that
$$ f^* = \inf_{\phi \in M(B)_+} \Big\{ \int_B f \, d\phi : \phi(B) = 1 \Big\}, \tag{1.4} $$
where $M(B)_+$ is the space of all finite Borel measures on B. Indeed, in (1.3) one only considers the (restricted) subset of probability measures on B that have a density (an SOS of degree at most 2t) with respect to $\mu$, whereas in (1.4) one considers all probability measures on B. In particular, the Dirac measure $\phi := \delta_\xi$ at any global minimizer $\xi \in B$ belongs to $M(B)_+$ but does not have a density with respect to $\mu$, which explains why the convergence $u_t \downarrow f^*$ as t increases can only be asymptotic and not finite; an exception is when B is a finite set (e.g. $B = \{-1, 1\}^n$ and $\mu$ is the counting measure).
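The eigenvalue formulation of the upper bounds can be sketched numerically in one variable. The snippet below is a minimal illustration under stated assumptions, not the implementation used in [4]: it takes B = [-1, 1], the uniform probability measure as reference measure, and the normalized Legendre polynomials as orthonormal basis, so that the moment matrix of the reference measure is the identity and the generalized eigenvalue problem reduces to an ordinary one. The function names and the test polynomial are hypothetical.

```python
import numpy as np
from numpy.polynomial import legendre as L


def orthonormal_legendre(k, x):
    """Degree-k Legendre polynomial, normalized to be orthonormal for
    the uniform probability measure dx/2 on [-1, 1] (assumed setting)."""
    c = np.zeros(k + 1)
    c[k] = 1.0
    return np.sqrt(2 * k + 1) * L.legval(x, c)


def upper_bound(f, t, n_nodes=60):
    """Smallest eigenvalue of A, where A[i, j] = int f * T_i * T_j dmu.

    In an orthonormal basis the moment matrix of mu is the identity,
    so minimizing int f*sigma dmu over SOS densities sigma of degree
    at most 2t with int sigma dmu = 1 gives this eigenvalue.
    """
    x, w = L.leggauss(n_nodes)  # Gauss-Legendre rule for dx on [-1, 1]
    w = w / 2.0                 # rescale weights to the probability measure
    T = np.array([orthonormal_legendre(k, x) for k in range(t + 1)])
    A = (T * (w * f(x))) @ T.T  # quadrature is exact for these degrees
    return np.linalg.eigvalsh(A)[0]


# Hypothetical test polynomial with global minimum f* = 0 at xi = 0.3.
f = lambda x: (x - 0.3) ** 2
bounds = [upper_bound(f, t) for t in range(1, 6)]
```

In this sketch every entry of `bounds` stays strictly above the minimum 0 and the entries decrease with t, in line with the asymptotic (non-finite) convergence discussed above.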
1.2. Contribution. Our contribution is to show that the dual of the semidefinite program (1.2) for computing the lower bound $\rho_t$ also has an interpretation of the same flavor as (1.3), where one now considers signed Borel measures $\phi_t$ with a distinguished polynomial density with respect to $\mu$. Namely, the dual of (1.2) minimizes $\int_B f \, d\phi_t$ over signed measures $\phi_t$ of the form:
$$ d\phi_t(x) = \sigma_t(x) \, d\mu(x) = \Big( \sum_{\alpha \in \mathbb{N}^n_{2t}} \sigma_\alpha \, T_\alpha(x) \Big) \, d\mu(x), \tag{1.5} $$
where:
- $(T_\alpha)_{\alpha \in \mathbb{N}^n}$ is a family of polynomials that are orthonormal with respect to $\mu$, and
- the coefficients $\sigma_t = (\sigma_\alpha)_{\alpha \in \mathbb{N}^n_{2t}}$ of the polynomial $\sigma_t \in \mathbb{R}[x]_{2t}$ satisfy the usual semidefinite constraints that are necessary for $\sigma_t$ to be moments of a measure on B.
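Eventually, for some $t \in \mathbb{N}$, $\sigma_t$ satisfies:
$$ \sigma_\alpha = \int_B T_\alpha \, d\delta_\xi = T_\alpha(\xi), \quad \alpha \in \mathbb{N}^n_{2t}, \tag{1.6} $$
where $\xi$ is an arbitrary global minimizer and $\delta_\xi$ is the Dirac measure at $\xi \in B$. Indeed, then $\sigma_t(x) = \sum_{\alpha \in \mathbb{N}^n_{2t}} T_\alpha(\xi) \, T_\alpha(x)$ is the Christoffel-Darboux kernel of degree 2t evaluated at $(\xi, x)$, and
$$ \int_B f \, \sigma_t \, d\mu = f(\xi) = f^*, $$
because this kernel reproduces the polynomials of $\mathbb{R}[x]_{2t}$, considered to be a finite-dimensional subspace of the Hilbert space $L^2(B, \mu)$. Moreover, $\sigma_t(\xi)^{-1}$ is nothing less than the Christoffel function evaluated at the global minimizer $\xi$ of f on B.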
As a take-home message and contribution of this paper, it turns out that the dual of the step-t semidefinite relaxation (1.2) is a semidefinite program that computes the coefficients $\sigma_t = (\sigma_\alpha)$ of the polynomial density $\sigma_t$ in (1.5). In addition, when the relaxation is exact, $\sigma_t(\xi)^{-1}$ is the Christoffel function of $\mu$, evaluated at a global minimizer $\xi$ of f on B.
Interestingly, in the dual of (1.2) there is no mention of the reference measure $\mu$. Only after we fix some arbitrary reference measure $\mu$ on B can we interpret an optimal solution as coefficients $\sigma_t$ of an appropriate polynomial density with respect to $\mu$.
So in both (1.3) and the dual of (1.2), one searches for a polynomial "density" with respect to $\mu$. In (1.3) one searches for an SOS density (hence a positive density), whereas in the dual of (1.2) one searches for a signed polynomial density whose coefficients (in the basis of orthonormal polynomials) are moments of a measure on B (ideally the Dirac at a global minimizer).
The advantage of the (signed) polynomial density in (1.5) over the (positive) SOS density in (1.3) is that the global optimum $f^*$ can be obtained exactly as the integral of f against this density, which is impossible with the SOS density of (1.3).
Last but not least, this interpretation establishes another (and rather surprising) simple link between polynomial optimization (here the Moment-SOS hierarchy), the Christoffel-Darboux kernel and the Christoffel function, fundamental tools in the theory of orthogonal polynomials and the theory of approximation. Previous contributions in this vein include [6] to characterize the upper bounds (1.3), [1,9,10] to analyze their rate of convergence to $f^*$, and the more recent [11] for the rate of convergence of both upper and lower bounds on $B = \{0, 1\}^n$.

Main result
2.1. Notation and definition. Let $\mathbb{R}[x] = \mathbb{R}[x_1, \ldots, x_n]$ be the ring of real polynomials in the variables $x_1, \ldots, x_n$ and let $\mathbb{R}[x]_t \subset \mathbb{R}[x]$ be its subspace of polynomials of degree at most t. Let $\mathbb{N}^n_t := \{ \alpha \in \mathbb{N}^n : |\alpha| \leq t \}$ where $|\alpha| = \sum_i \alpha_i$. For an arbitrary Borel subset $X$ of $\mathbb{R}^n$, denote by $M(X)_+$ the convex cone of finite Borel measures on $X$, and by $P(X)$ its subset of probability measures on $X$.

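2.2. Moment and localizing matrices. Given a sequence $y = (y_\alpha)_{\alpha \in \mathbb{N}^n}$ and a polynomial $g \in \mathbb{R}[x]$, $x \mapsto g(x) := \sum_\gamma g_\gamma \, x^\gamma$, the localizing matrix $M_t(g \, y)$ associated with g and y is the real symmetric matrix with rows and columns indexed by $\alpha \in \mathbb{N}^n_t$ and with entries
$$ M_t(g \, y)(\alpha, \beta) := \sum_\gamma g_\gamma \, y_{\alpha + \beta + \gamma}, \quad \alpha, \beta \in \mathbb{N}^n_t. $$
If $g(x) = 1$ for all x then $M_t(g \, y)$ ($= M_t(y)$) is called the moment matrix.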
A sequence $y = (y_\alpha)_{\alpha \in \mathbb{N}^n}$ has a representing measure if there exists a (positive) finite Borel measure $\phi$ on $\mathbb{R}^n$ such that $y_\alpha = \int x^\alpha \, d\phi$ for all $\alpha \in \mathbb{N}^n$.
If y has a representing measure supported on $\{ x : g(x) \geq 0 \}$ then $M_t(y) \succeq 0$ and $M_t(g \, y) \succeq 0$ for all $t \in \mathbb{N}$. The converse is not true in general; however, the following important result is at the core of the Moment-SOS hierarchy.
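Theorem 2.1. Let $G := \{ x \in \mathbb{R}^n : g_j(x) \geq 0, \ j = 1, \ldots, m \}$ be compact, let $g_0 := 1$, and let the $g_j$'s satisfy the Archimedean assumption. Then a sequence $y = (y_\alpha)_{\alpha \in \mathbb{N}^n}$ has a representing measure on G if and only if $M_t(g_j \, y) \succeq 0$ for all $t \in \mathbb{N}$ and all $j = 0, \ldots, m$.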
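To make the definition of the localizing matrix concrete, here is a minimal univariate sketch (n = 1), where multi-indices are plain integers. The helper name `localizing_matrix` and the chosen data (a Dirac measure at 0.5 and g(x) = 1 - x^2) are assumptions made for illustration only.

```python
import numpy as np


def localizing_matrix(y, g, t):
    """Univariate M_t(g y): entry (i, j) = sum_k g[k] * y[i + j + k].

    y : moments y_0, ..., y_{2t + deg g}
    g : coefficients of g in increasing degree (g = [1.0] gives M_t(y))
    """
    M = np.zeros((t + 1, t + 1))
    for i in range(t + 1):
        for j in range(t + 1):
            M[i, j] = sum(gk * y[i + j + k] for k, gk in enumerate(g))
    return M


xi = 0.5                      # support point of a Dirac measure (hypothetical)
t = 2
g = [1.0, 0.0, -1.0]          # g(x) = 1 - x^2, so {g >= 0} = [-1, 1]
y = np.array([xi ** k for k in range(2 * t + len(g))])  # moments of the Dirac

moment = localizing_matrix(y, [1.0], t)  # moment matrix M_t(y)
local = localizing_matrix(y, g, t)       # localizing matrix M_t(g y)
```

Because y is here the moment sequence of a measure supported in {g >= 0}, both matrices are positive semidefinite: with v = (1, xi, xi^2), `moment` equals the rank-one matrix v v' and `local` equals g(xi) v v'.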
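Orthonormal polynomials. Let $B \subset \mathbb{R}^n$ be the compact basic semi-algebraic set defined in (1.1), assumed to have a nonempty interior. Let $\mu$ be a finite Borel (reference) measure whose support is exactly B and with associated sequence of orthonormal polynomials $(T_\alpha)_{\alpha \in \mathbb{N}^n} \subset \mathbb{R}[x]$, that is,
$$ \int_B T_\alpha \, T_\beta \, d\mu = \delta_{\alpha = \beta}, \quad \alpha, \beta \in \mathbb{N}^n. $$
For instance, if $B = [-1, 1]^n$ and $\mu$ is the uniform probability distribution on B, one may choose for the family $(T_\alpha)$ the tensorized Legendre polynomials. Namely, if $(T_j)_{j \in \mathbb{N}} \subset \mathbb{R}[x]$ is the family of univariate Legendre polynomials (normalized to be orthonormal), then
$$ T_\alpha(x) := \prod_{j=1}^{n} T_{\alpha_j}(x_j), \quad \alpha \in \mathbb{N}^n. $$
For every $t \in \mathbb{N}$, the mapping
$$ (x, y) \mapsto K_t(x, y) := \sum_{\alpha \in \mathbb{N}^n_t} T_\alpha(x) \, T_\alpha(y), \quad x, y \in \mathbb{R}^n, $$
is called the Christoffel-Darboux kernel associated with $\mu$. An important property of $K_t$ is to reproduce polynomials of degree at most t, that is:
$$ p(x) = \int_B K_t(x, y) \, p(y) \, d\mu(y), \quad \forall p \in \mathbb{R}[x]_t, \ \forall x \in \mathbb{R}^n. $$
This is why $K_t$ is called a reproducing kernel, and $\mathbb{R}[x]_t$, viewed as a finite-dimensional vector subspace of the Hilbert space $L^2(B, \mu)$, is called a Reproducing Kernel Hilbert Space (RKHS). For more details on the theory of orthogonal polynomials, the interested reader is referred to e.g. [2] and the many references therein.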
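The reproducing property can be checked numerically in the simplest setting. The sketch below assumes n = 1, B = [-1, 1], the uniform probability measure and normalized Legendre polynomials; the function names, the test polynomial and the evaluation point are hypothetical choices, and the integral is computed by Gauss-Legendre quadrature.

```python
import numpy as np
from numpy.polynomial import legendre as L


def T(k, x):
    """Degree-k Legendre polynomial, orthonormal for dx/2 on [-1, 1]."""
    c = np.zeros(k + 1)
    c[k] = 1.0
    return np.sqrt(2 * k + 1) * L.legval(x, c)


def cd_kernel(t, x, y):
    """Christoffel-Darboux kernel K_t(x, y) = sum_{k <= t} T_k(x) T_k(y)."""
    return sum(T(k, x) * T(k, y) for k in range(t + 1))


nodes, weights = L.leggauss(40)   # quadrature rule for dx on [-1, 1]
weights = weights / 2.0           # rescale to the uniform probability measure

t = 3
p = lambda x: 1.0 - 2.0 * x + 3.0 * x ** 3   # any polynomial of degree <= t
x0 = 0.37

# int K_t(x0, y) p(y) dmu(y) should reproduce p(x0).
reproduced = np.sum(weights * cd_kernel(t, x0, nodes) * p(nodes))

# Christoffel function at x0: the reciprocal of the kernel on the diagonal.
christoffel = 1.0 / cd_kernel(t, x0, x0)
```

Here `reproduced` agrees with `p(x0)` up to rounding, since the quadrature rule is exact for the polynomial integrand.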

Main result.
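An observation. Let $f \in \mathbb{R}[x]$ and let $t \geq \deg(f) = d_f$ be fixed. Let $P(B) \subset M(B)_+$ be the space of probability measures on B. Then
$$ f^* = \inf_{\phi \in P(B)} \int_B f \, d\phi = \inf_{\phi \in P(B)} \int_B \Big( \int_B K_t(x, y) \, f(y) \, d\mu(y) \Big) d\phi(x) = \inf_{\phi \in P(B)} \int_B f(y) \Big( \int_B K_t(x, y) \, d\phi(x) \Big) d\mu(y), $$
where the interchange of the two integrals follows from the Fubini-Tonelli theorem, valid in this simple setting. In other words, we have proved the following:
Lemma 2.2. Let $B \subset \mathbb{R}^n$ be as in (1.1) and let $\mu$ be a finite Borel (reference) measure whose support is exactly B and with associated sequence of orthonormal polynomials $(T_\alpha)_{\alpha \in \mathbb{N}^n}$. Let $f^* = \min \{ f(x) : x \in B \}$. Then for every fixed $t \geq \deg(f)$:
$$ f^* = \inf_{\sigma} \int_B f \, \sigma \, d\mu, \tag{2.3} $$
where the infimum is over all polynomials $\sigma \in \mathbb{R}[x]_t$ of the form:
$$ \sigma(x) = \sum_{\alpha \in \mathbb{N}^n_t} \sigma_\alpha \, T_\alpha(x), \tag{2.4} $$
$$ \sigma_\alpha = \int_B T_\alpha \, d\phi, \quad \alpha \in \mathbb{N}^n_t, \quad \text{for some } \phi \in P(B). \tag{2.5} $$
So solving (2.3) is equivalent to searching for a signed measure $\sigma \, d\mu$ with polynomial (signed) density $\sigma \in \mathbb{R}[x]_t$ that satisfies (2.4)-(2.5).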

2.4. A hierarchy of relaxations of (2.3). In this section we show that the SOS-hierarchy defined in (1.2) is the dual semidefinite program of a natural SDP-relaxation of (2.3). In fact the only difficult constraint in (2.3) is (2.5), which demands that $\sigma$ admit a representing probability measure $\phi$ on B.
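Let $D_t$ be the lower triangular matrix for the change of basis of $\mathbb{R}[x]_{2t}$ from the monomial basis $(x^\alpha)$ to the orthonormal basis $(T_\alpha)$, i.e.,
$$ (T_\alpha)_{\alpha \in \mathbb{N}^n_{2t}} = D_t \cdot (x^\alpha)_{\alpha \in \mathbb{N}^n_{2t}}, \tag{2.6} $$
and denote by $D'_t$ the transpose of $D_t$. The matrix $D_t$ is nonsingular with positive diagonal. Then with $\sigma = (\sigma_\alpha)_{\alpha \in \mathbb{N}^n_{2t}}$, (2.5) reads
$$ \sigma = D_t \cdot y \quad \text{with} \quad y_\alpha = \int_B x^\alpha \, d\phi, \ \alpha \in \mathbb{N}^n_{2t}, \quad \text{for some } \phi \in P(B). \tag{2.7} $$
That is, $y = (y_\alpha)_{\alpha \in \mathbb{N}^n_{2t}}$ is required to be a moment sequence as it has a representing probability measure $\phi \in P(B)$. So in view of Theorem 2.1, the constraint (2.7) can be relaxed to $\sigma = D_t \cdot y$ with $y_0 = 1$ and $M_{t - d_j}(g_j \, y) \succeq 0$, $j = 0, \ldots, m$.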
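Therefore, consider the following relaxation of (2.3):
$$ \inf_{\sigma, y} \Big\{ \int_B f \, \sigma \, d\mu : \ \sigma(x) = \sum_{\alpha \in \mathbb{N}^n_{2t}} \sigma_\alpha \, T_\alpha(x) ; \ \sigma = D_t \cdot y ; \ y_0 = 1 ; \ M_{t - d_j}(g_j \, y) \succeq 0, \ j = 0, \ldots, m \Big\}. \tag{2.8} $$
Lemma 2.3. Let $B \subset \mathbb{R}^n$ be as in (1.1) and let $\mu$ be a finite Borel (reference) measure whose support is exactly B and with associated sequence of orthonormal polynomials $(T_\alpha)_{\alpha \in \mathbb{N}^n}$. The semidefinite relaxation (2.8) of (2.3) reads:
$$ \inf_{y} \{ \langle f, y \rangle : \ y_0 = 1 ; \ M_{t - d_j}(g_j \, y) \succeq 0, \ j = 0, \ldots, m \}, \tag{2.9} $$
which is the dual of (1.2).
Proof. As $D_t$ is nonsingular, $\sigma = D_t \cdot y$ is a one-to-one change of variable. Let $\hat{f}$ denote the vector of coefficients of f in the basis $(T_\alpha)$, so that by (2.6) the vector of coefficients of f in the monomial basis (again denoted f) is $D'_t \hat{f}$. As the $T_\alpha$'s form an orthonormal basis, the criterion to minimize in (2.8) reads:
$$ \int_B f \, \sigma \, d\mu = \langle \hat{f}, \sigma \rangle = \langle \hat{f}, D_t \, y \rangle = \langle D'_t \hat{f}, y \rangle = \langle f, y \rangle, $$
which yields that (2.8) is exactly (2.9). Next, that (2.9) is a dual of (1.2) is a standard result in polynomial optimization [3,5].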
Of course, by reversing the process of the above proof, the semidefinite program (2.9) can be transformed to (2.8) once a reference measure $\mu$ with support exactly B is defined with its associated orthonormal polynomials $(T_\alpha)$. Indeed, once $\mu$ and the $T_\alpha$'s are defined, one may use the change of basis matrix $D_t$ in (2.6) to pass from (2.9) to (2.8).
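The change of basis between monomials and orthonormal polynomials can be sketched numerically. The snippet below is an illustration under stated assumptions (n = 1, B = [-1, 1], uniform probability measure), not a construction taken from the paper: it obtains a lower triangular change-of-basis matrix `D` with positive diagonal as the inverse of the Cholesky factor of the moment matrix of the reference measure, then maps the moments of a Dirac measure to coefficients in the orthonormal basis. The variable names, the degree `s`, the point `xi` and the polynomial `f` are hypothetical.

```python
import numpy as np
from numpy.polynomial import legendre as L

s = 4  # work with polynomials of degree at most s

# Moments of the uniform probability measure on [-1, 1]:
# m_k = 1/(k+1) for even k, 0 for odd k.
m = np.array([1.0 / (k + 1) if k % 2 == 0 else 0.0 for k in range(2 * s + 1)])
M = np.array([[m[i + j] for j in range(s + 1)] for i in range(s + 1)])

C = np.linalg.cholesky(M)  # lower triangular factor, M = C C'
D = np.linalg.inv(C)       # rows of D: monomial coefficients of T_0, ..., T_s

# Moments of the Dirac measure at xi, then its coefficients sigma = D y.
xi = -0.2
y = xi ** np.arange(s + 1)
sigma = D @ y

# Reference values T_k(xi) from normalized Legendre polynomials.
legendre_vals = np.array(
    [np.sqrt(2 * k + 1) * L.legval(xi, np.eye(s + 1)[k]) for k in range(s + 1)]
)

# Criterion in both bases: <f_hat, sigma> versus <f, y>.
f = np.array([1.0, -2.0, 0.0, 0.0, 1.0])  # f(x) = 1 - 2x + x^4 (monomial basis)
f_hat = np.linalg.solve(D.T, f)           # coefficients of f in the basis (T_k)
```

In this sketch `sigma` coincides with `legendre_vals`, i.e. the coefficients of the density are the values of the orthonormal polynomials at `xi`, and `f_hat @ sigma` equals `f @ y`, which is f evaluated at `xi`.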
Corollary 2.4. Let $B \subset \mathbb{R}^n$ be as in (1.1) and let $\mu$ be a finite Borel (reference) measure whose support is exactly B and with associated sequence of orthonormal polynomials $(T_\alpha)_{\alpha \in \mathbb{N}^n}$. Let $f^*$ be the global minimum of f on B.
Discussion. Observe that the formulation (2.8) does not require that the set B be a "simple" set, as is required in (1.3). Indeed, the orthonormal polynomials $(T_\alpha)$ are only used to provide an interpretation of the hierarchy of lower bounds (2.9) (and its dual (1.2)). On the other hand, for the hierarchy of upper bounds (1.3), B indeed needs to be a "simple" set for computational purposes. This is because one needs the numerical value of the moments of $\mu$ for a practical implementation of (1.3).
Lemma 2.3 shows that the Moment-SOS hierarchy described in [3,5] amounts to computing a hierarchy of signed polynomial densities with respect to some reference measure $\mu$ with support exactly B. When the step-t relaxation is exact (which takes place generically [7]), the resulting optimal density $\sigma$ in (2.8) is nothing less than the polynomial $x \mapsto K_t(\xi, x)$, where $\xi$ is a global minimizer of f on B, $K_t(\xi, x)$ is the celebrated Christoffel-Darboux kernel in approximation theory, and $\sigma(\xi) = K_t(\xi, \xi)$ is the reciprocal of the Christoffel function evaluated at the global minimizer $\xi$.
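A small worked instance may help to see how a signed density built from the Christoffel-Darboux kernel recovers the minimum. The data below are a hand-picked illustration and are not taken from the paper: we assume $B = [-1, 1]$, $d\mu = dx/2$, $f(x) = x$ (so $f^* = -1$ at $\xi = -1$), and the kernel of degree 1 built from the normalized Legendre polynomials $T_0 = 1$, $T_1(x) = \sqrt{3}\, x$.

```latex
\begin{align*}
K_1(\xi, x) &= T_0(\xi) T_0(x) + T_1(\xi) T_1(x) = 1 + 3 \xi x,
  & \sigma(x) &:= K_1(-1, x) = 1 - 3x, \\
\int_B \sigma \, d\mu &= \frac{1}{2} \int_{-1}^{1} (1 - 3x) \, dx = 1,
  & \int_B f \, \sigma \, d\mu &= \frac{1}{2} \int_{-1}^{1} (x - 3x^2) \, dx = -1 = f^*, \\
\sigma(\xi) &= K_1(-1, -1) = 4,
  & \sigma(\xi)^{-1} &= \tfrac{1}{4} \quad \text{(Christoffel function at } \xi = -1\text{)}.
\end{align*}
```

Note that $\sigma(x) = 1 - 3x$ is negative for $x > 1/3$: the density is genuinely signed, which is what allows the integral to reach $f^*$ exactly, whereas a positive SOS density as in (1.3) could not.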

Conclusion
We have shown that the Moment-SOS hierarchy, which provides an increasing sequence of lower bounds on the global minimum of a polynomial f on a compact set B, has a simple interpretation related to orthogonal polynomials associated with an arbitrary reference measure whose support is exactly B. This interpretation strongly relates polynomial optimization (here the Moment-SOS hierarchy) with the Christoffel-Darboux kernel and the Christoffel function, fundamental tools in the theory of orthogonal polynomials and the theory of approximation.
It is another item in the list of previous contributions [6,1,9,10] that also link some issues in polynomial optimization with orthogonal polynomials associated with appropriate measures. We hope that such connections will stimulate even further investigations in this direction.