Provable Advantage in Quantum Phase Learning via Quantum Kernel Alphatron


The use of quantum computation to speed up machine learning algorithms is among the most exciting prospective applications in the NISQ era. Here, we focus on the quantum phase learning problem, which is crucially important in understanding many-particle quantum systems. We prove that, under widely believed complexity-theoretic assumptions, the quantum phase learning problem cannot be efficiently solved by machine learning algorithms using classical resources and classical data. With quantum data, in contrast, we prove the universality of the quantum kernel Alphatron in efficiently predicting quantum phases, indicating a clear quantum advantage in such learning problems. We numerically benchmark the algorithm on a variety of problems, including recognizing symmetry-protected topological phases and symmetry-broken phases. Our results highlight the capability of quantum machine learning in the efficient prediction of quantum phases of many-particle systems.

spaces, and give the definition of the quantum phase recognition learning problem. We give the hardness results for classical learning algorithms in Sec. 2. Sec. 3 gives a quantum learning algorithm for the quantum phase recognition problem, Sec. 4 gives the numerical results for it, and Sec. 5 provides the complexity class of the QPL problem. We give a discussion in Sec. 6.

In this section, we review the definitions of supervised learning and kernel methods, and introduce the quantum phase recognition problem.

Supervised learning with quantum feature space

Here, we denote by (a, b) (or (x, y)) a pair of a datum a (x) and its corresponding label b (y) in the training set S (testing set T). Generally, the task of supervised learning is to learn the label y of a testing datum x ∈ T ⊂ X, drawn from a distribution D(x) defined on the space X, according to some decision rule h. The decision rule h is assigned by a selected machine learning model from the training set S = {(a_i, b_i)}_{i=1}^N, where a_i ∈ X follows the distribution D(a_i), the label b_i = h(a_i), and N is the size of the training set. Given the training set S, an efficient learner needs to generate a classifier h in poly(N) time, with the goal of achieving a low error, or risk,

R(h) = E_{x∼D(x)}[(h(x) − y)^2].   (1)

Here, we assume that the datum x is sampled randomly according to D(x) in both the training and testing procedures, and that the size N of the training set is polynomial in the data dimension.
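The supervised-learning setup above can be sketched as follows. The squared-loss risk and the trivial constant classifier are illustrative choices for concreteness, not necessarily the paper's exact loss function.

```python
import numpy as np

# Minimal sketch of the supervised-learning setup: a decision rule h is
# evaluated by its empirical squared-loss risk on a test set T. All
# names here are illustrative.

def empirical_risk(h, X_test, y_test):
    """Empirical risk: mean of (h(x) - y)^2 over the test set."""
    preds = np.array([h(x) for x in X_test])
    return float(np.mean((preds - np.asarray(y_test)) ** 2))

# Toy example: a constant classifier on binary labels.
X_test = [0.1, 0.9, 0.4]
y_test = [0, 1, 0]
risk = empirical_risk(lambda x: 0.0, X_test, y_test)
```

An efficient learner must drive this quantity close to zero in poly(N) time.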

The kernel method has played a crucial role in the development of supervised learning [44,45,46], as it provides an approach to increase the expressivity and trainability beyond the original training set. We can describe a kernel function K : X × X → R as K(x, x′) = Ψ(x)^T Ψ(x′), where Ψ : X → H is the feature map which maps a datum x ∈ X to a higher-dimensional space H (the feature space). Here we leverage the quantum kernel as our kernel function, defined as Q(x, x′) = |⟨φ(x)|φ(x′)⟩|², where |φ(x)⟩ is a quantum state associated with x.

Quantum Phase Learning (QPL) problem

In this section, we introduce the quantum phase learning (QPL) problem, which is a key prerequisite for investigating a large number of behaviours in condensed-matter systems [18,19]. Given an n-qubit Hamiltonian H(a) with interaction parameters a and an order parameter M ∈ C^{2^n × 2^n}, the task is to recognize the corresponding quantum phases. In general, it is hard to recognize quantum phases of an arbitrary many-body quantum system, owing to the hardness of obtaining the ground state and the fact that the order parameter is generally unknown. Nevertheless, there may also exist cases where the problem is exactly and efficiently solvable for very specific choices of parameters. A natural question is then whether, based on the solvable or known phases, we can learn or predict quantum phases for other cases. It is therefore natural to consider the learning version of the quantum phase recognition problem.

Definition 1.1 (QPL Problem). Given training data S = {(a_i, b_i)}_{i=1}^N, where a_i and b_i indicate the classical coupling weight and the phase value observed from the i-th experiment associated with the Hamiltonian H(a_i), the target is to learn a prediction model h(a) that minimizes the risk for some fixed distribution D(a) defined on the datum space X.
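The quantum kernel Q(x, x′) = |⟨φ(x)|φ(x′)⟩|² can be computed directly from statevectors when the feature states are classically simulable. The single-qubit feature map below is a placeholder for illustration; any circuit-defined |φ(x)⟩ works the same way.

```python
import numpy as np

# Sketch of the quantum kernel Q(x, x') = |<phi(x)|phi(x')>|^2.
# The feature map below (encoding a scalar into one qubit) is an
# illustrative assumption, not the paper's feature map.

def feature_state(x):
    """|phi(x)> = cos(x/2)|0> + sin(x/2)|1>."""
    return np.array([np.cos(x / 2), np.sin(x / 2)], dtype=complex)

def quantum_kernel(x, xp):
    return abs(np.vdot(feature_state(x), feature_state(xp))) ** 2

def kernel_matrix(xs):
    """Gram matrix of quantum-kernel values over a list of data."""
    return np.array([[quantum_kernel(a, b) for b in xs] for a in xs])

Q = kernel_matrix([0.0, np.pi / 3, np.pi])
```

The resulting Gram matrix is symmetric with unit diagonal, as expected for a fidelity-based kernel.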

In the following, we consider solving the QPL problem using classical and quantum computing methods.

2 Classical hardness for the QPL problem

In this section, we show the hardness of quantum phase computation and of the QPL problem using classical computers.

Here we assume that the order parameter M is a general n-fold tensor product of local Pauli operators.

Therefore, quantum phase computation is an instance of the mean-value problem, which is central to the following hardness result.

Lemma 1. There exists an n-qubit quantum circuit U such that the following task is #P-hard: approximate |⟨0^n|U|0^n⟩|² to additive error ε_c/2^n with probability 3/4 + 1/poly(n).

Here, a candidate for the worst-case U ∈ C^{2^n × 2^n} is a circuit of size m ≤ poly(n). We provide a detailed proof in Appendix B. This lemma also serves for the hardness of the QPL problem. Following the "worst-to-average-case" reduction [63], we can construct a testing set with associated feature states {|φ(x_i)⟩}_{i=1}^M (ground states). Given classical training data S (obtained by a classical method), we prove that no classical learning algorithm can efficiently learn a hypothesis h* such that R(h*(x)) is close to zero for x ∈ T, as shown in the following theorem.
Theorem 1 (informal). Given training data S = {(a_i, b_i)}_{i=1}^N, where a_i and b_i indicate the classical coupling weight and phase value associated with the Hamiltonian H(a_i), there exists a testing set on which no efficient classical learning algorithm achieves risk close to zero.

Theorem 2 (Representer theorem). Let S = {(a_i, b_i)}_{i=1}^N be the training data set and Q : X × X → R a quantum kernel with kernel space H. Consider a strictly monotonically increasing regularisation function g : [0, ∞) → R and the regularised empirical risk R̂_L(h). Then any minimiser h* of the empirical risk R̂_L(h*) admits a representation of the form

h*(x) = Σ_{i=1}^N α_i Q(a_i, x),

where α_i ∈ R for all i ∈ {1, 2, ..., N}, and x and the a_i are drawn from the same distribution.

Here we give a learning approach for all of the α_i, which has a polynomial speedup over the above method.

In particular, we introduce the quantum kernel into the Alphatron algorithm [45], as depicted in Alg. 1. We find that the quantum kernel fits naturally into the Alphatron algorithm, and hence the QPL problem can be solved in O(N^{1.5}) time if the kernel matrix Q is provided, as stated in Theorem 3.

Theorem 3. Let Q(a_i, x) be the quantum kernel, let g be a strictly monotonically increasing regularisation function such that E[g²] ≤ ε_g, and suppose Σ_{ij} α_i α_j |⟨φ(a_i)|φ(a_j)⟩|² < B. Then, for failure probability δ ∈ (0, 1) and using O(N^{5/2}) copies of quantum states to estimate Q(a_i, x), Alg. 1 outputs a hypothesis ĥ* with bounded regularised empirical risk.

Algorithm 1: Quantum Kernel Alphatron. Input: training data S, Hamiltonian H(a) with coupling weight a, parameterized quantum circuit U(θ), quantum circuit approximation Q̃(a_i, x) of the quantum kernel Q(a_i, x), learning rate λ > 0, number of iterations T, and testing data T.
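The core iteration of Alg. 1 can be sketched with a precomputed kernel matrix. The sigmoid link function, the RBF-like toy kernel, and the fixed iteration count are illustrative assumptions; the paper's Alg. 1 additionally selects the best iterate on held-out data and uses the quantum kernel estimate Q̃.

```python
import numpy as np

# Sketch of the kernel Alphatron update: maintain coefficients alpha
# and move them by the residual between labels and the current
# link-function predictions. Names and the sigmoid link are
# illustrative choices.

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def kernel_alphatron(Q_train, y, lam=1.0, T=200):
    """Q_train: (N, N) kernel matrix; y: labels in [0, 1]."""
    N = len(y)
    alpha = np.zeros(N)
    for _ in range(T):
        # h^t(a_i) = u( sum_j alpha_j Q(a_j, a_i) )
        preds = sigmoid(Q_train @ alpha)
        # Alphatron update: alpha_i += (lam/N) * (y_i - h^t(a_i))
        alpha = alpha + (lam / N) * (y - preds)
    return alpha

# Toy run: two well-separated clusters under an RBF-like kernel.
xs = np.array([0.0, 0.1, 2.0, 2.1])
y = np.array([0.0, 0.0, 1.0, 1.0])
Q = np.exp(-(xs[:, None] - xs[None, :]) ** 2)
alpha = kernel_alphatron(Q, y)
preds = sigmoid(Q @ alpha)
```

With the quantum kernel substituted for the toy kernel, each entry of Q would instead be estimated from copies of the feature states.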

Figure 1: The procedure of the proposed quantum learning algorithm.
by the definition of w and |Ψ(a_i)⟩. Therefore, E[b_j | a_j] = ⟨w, Ψ(a_j)⟩ + g(a_j), and the theorem follows by substituting the quantum kernel Q and feature map Ψ into Theorem 1 of [45]. The main idea for bounding R̂_L(ĥ*) in Theorem 3 is to prove that, with polynomially many copies of the training states, we can bound the deviation from h^t, the ideal hypothesis with exact quantum kernel function Q(·,·). We defer the tedious calculations to Appendix B.2.

The regularisation function g is used to avoid overfitting, and a common choice bounds the norm of w, that is, g(·) = 2L‖w‖₂ − G, which is determined by all a_i ∈ S and a positive parameter.

Here we test the capability of the quantum kernel Alphatron algorithm on several instances of QPL tasks.

Firstly, we consider a warm-up case that detects the appearance of the staggered magnetization for the S = 1/2 XXZ spin chain in the Ising limit [69]. In the Hamiltonian, S_i^α is the α-component of the S = 1/2 spin operator at the i-th site, and g is the strength of the corresponding coupling.

Secondly, we consider a Z₂ × Z₂ symmetry-protected topological (SPT) phase P which contains the S = 1 Haldane chain. The ground states {|Φ(h₁/J, h₂/J)⟩} belong to a family of Hamiltonians whose phase value is known on a line in parameter space (see yellow points in Fig. 3(a)). Our target is to identify whether a given, unknown ground state |Φ(x)⟩ belongs to P. The simulation results for n = 16 are illustrated in Fig. 3(b), which shows that Alg. 1 can reproduce the phase diagram with high accuracy on M = 4096 testing points.
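A family of this kind can be built and diagonalized exactly for small chains. The specific form below, H = −J Σ Z_i X_{i+1} Z_{i+2} − h₁ Σ X_i − h₂ Σ X_i X_{i+1}, is a standard cluster-model choice for Z₂ × Z₂ SPT studies and is an assumption here, since the text's Hamiltonian did not survive extraction.

```python
import numpy as np
from functools import reduce

# Illustrative cluster-type Hamiltonian family (assumed form, open
# boundaries), diagonalized exactly for a small chain.

X = np.array([[0, 1], [1, 0]], dtype=float)
Z = np.array([[1, 0], [0, -1]], dtype=float)
I2 = np.eye(2)

def op_at(ops, sites, n):
    """Tensor product placing `ops` on `sites` of an n-qubit chain."""
    full = [I2] * n
    for o, s in zip(ops, sites):
        full[s] = o
    return reduce(np.kron, full)

def hamiltonian(n, J, h1, h2):
    H = np.zeros((2 ** n, 2 ** n))
    for i in range(n - 2):
        H -= J * op_at([Z, X, Z], [i, i + 1, i + 2], n)
    for i in range(n):
        H -= h1 * op_at([X], [i], n)
    for i in range(n - 1):
        H -= h2 * op_at([X, X], [i, i + 1], n)
    return H

# Ground state of a small chain; states like this play the role of the
# feature states |Phi(h1/J, h2/J)> fed to the quantum kernel.
evals, evecs = np.linalg.eigh(hamiltonian(n=4, J=1.0, h1=0.5, h2=0.0))
ground_state = evecs[:, 0]
```

For the n = 16 systems in the experiments, exact diagonalization of this kind is replaced by more scalable ground-state preparation.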

Remarkably, since the training data lie only on the line h₂ = 0, we cannot classically learn the quantum phase from the relationship between the phases and the h₁, h₂ parameters alone. However, the quantum kernel Alphatron algorithm works by using additional information from the quantum kernels. We thus demonstrate that even if the training data all come from classically solvable cases, the quantum kernel Alphatron still works. We also note that the quantum convolutional neural network (QCNN) method [20] has been proposed to solve this type of problem.

Finally, we consider the bond-alternating XXZ model. The colored shading background of the corresponding phase diagram represents the phase classification results on a 16-qubit system; the data in phase diagram P₃ are post-processed by the averaging scheme. Before concluding, we discuss the difference between "classical" and "quantum" learning in more detail.

Depending on whether the method that produces the training data and the learning algorithm are classical or quantum, we consider four categories, as shown in Table 1. We discuss the relationships between the four categories in turn.

• For a QPL problem that satisfies Lemma 1, Theorem 1 indicates that the problem is outside the "C-Learning Alg. + C-Data" class, while Theorem 3 shows that it belongs to the "Q-Learning Alg. + Q-Data" class. These results thus imply that "Q-Learning Alg. + Q-Data" is strictly stronger than "C-Learning Alg. + C-Data" under suitable complexity assumptions (see Table 1).

• Our simulation results also indicate that some QPL problems are classically hard, yet can be solved by a quantum learning algorithm with "C-Data". Therefore, "Q-Learning Alg. + C-Data" could also be strictly stronger than "C-Learning Alg. + C-Data".

• Another interesting class is "C-Learning Alg. + Q-Data". Whether it is stronger than "Q-Learning Alg.

We summarize the complexity relationships of these four categories for general problems in Fig. 5.

In this paper, we study the quantum phase learning (QPL) problem using classical and quantum approaches. We prove that, under a reasonable conjecture from Ref. [63] that approximating |⟨0^n|U|0^n⟩|² to additive error ε_c/2^n is #P-hard, together with the assumption that PH does not collapse, it is computationally hard to solve the QPL problem with a classically tractable training data set. On the other hand, we also prove that the QPL problem can be learned efficiently given a quantum computer, and we propose an effective algorithm to illustrate this quantum learning process. The quantum learning algorithm is a quantization of the Alphatron algorithm [45] that leverages the quantum kernel method.

Numerically, we apply the quantum learning algorithm to solving QPL problems, and the numerical experiments corroborate our theoretical results in a variety of scenarios, including symmetry-protected topological phases and symmetry-broken phases. The numerical results show that our quantum kernel Alphatron algorithm has good learning performance for quantum properties even with classical training data.

We leave open the problem of whether our quantum learning algorithm performs better with a more complicated quantum neural network instead of the quantum kernel. Since our numerical results hint at the possibility of efficiently solving the QPL problem, is there any rigorous proof showing which complexity class the QPL problem belongs to?

We first review several lemmas and assumptions which are closely related to our proof.
for the value if the function f can be computed efficiently given x.
Proof of Lemma 1. For a ground state |φ⟩ = U|0^n⟩ of H(a) that satisfies Conjecture 1, we can project it onto any computational basis state |j⟩ with probability p(j) = |⟨j|φ⟩|². The hiding argument shows that if one can approximate the probability p(j), then one can approximate p(0^n) = |⟨0^n|φ⟩|². Therefore, Conjecture 1 suggests that approximating p(j) to additive error 2^{−poly(n)} is #P-hard.

Here, we can construct a series of observables M for which Lemma 1 holds. Consider the observable set {M(s) | M(s) = Z₁^{s₁} ⊗ Z₂^{s₂} ⊗ ··· ⊗ Zₙ^{sₙ}}, where Z_k denotes the Pauli-Z operator acting on the k-th qubit and s = s₁s₂…sₙ ∈ {0,1}ⁿ. Then we have o_s := ⟨φ|M(s)|φ⟩ = Σ_j (−1)^{s·j} p(j), so o_s/2ⁿ is the Fourier transform of p(j), and the claimed inequality follows from the algebraic symmetry between p(j) and o_s/2ⁿ.

This inequality is obtained for some t* ≤ T = O(N/log(1/δ)). Nevertheless, if the quantum kernel Q is approximated by performing quantum circuits, Eq. (14) should be replaced with its approximate counterpart, where Q̃ is the approximation of Q.
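The Fourier relation between the Z-string expectation values o_s and the output distribution p(j) can be checked numerically. The random test state and naming are illustrative.

```python
import numpy as np

# Numerical check: o_s = <phi|M(s)|phi> = sum_j (-1)^{s.j} p(j), the
# (Walsh-Hadamard) Fourier transform of p(j) = |<j|phi>|^2, and its
# inversion recovers p(0^n) = 2^{-n} sum_s o_s.

def z_string_expectation(phi, s):
    """<phi| M(s) |phi> with M(s) the tensor product of Z^{s_k}."""
    n = len(s)
    p = np.abs(phi) ** 2
    o = 0.0
    for j in range(2 ** n):
        bits = [(j >> (n - 1 - k)) & 1 for k in range(n)]
        sign = (-1) ** sum(b * sk for b, sk in zip(bits, s))
        o += sign * p[j]
    return o

rng = np.random.default_rng(0)
n = 3
phi = rng.normal(size=2 ** n) + 1j * rng.normal(size=2 ** n)
phi /= np.linalg.norm(phi)

# s = 0...0 gives the identity observable, so o_s = 1.
o_zero = z_string_expectation(phi, [0, 0, 0])

# Inversion: p(0^n) = 2^{-n} * sum over all s of o_s.
p0 = sum(
    z_string_expectation(phi, [(s >> (n - 1 - k)) & 1 for k in range(n)])
    for s in range(2 ** n)
) / 2 ** n
```

This is exactly the hiding step in the proof: access to all o_s values pins down p(0^n).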

For convenience, in the later proof we assume that for all i the deviations δ_Q^i = Q̃(a_i, x) − Q(a_i, x) are the same, and that the deviations δ_{α_i}^t = α̃_i^t − α_i^t are also the same; we denote them by δ_Q and δ_α^t, respectively. Since the δ_Q^i are of the same order for all i ∈ [N] (and similarly for the δ_{α_i}^t), these assumptions are reasonable; the same upper bound for R̂(h*) holds without them, at the cost of a more tedious proof. Then, for any i, we can obtain the value of δ_α^{t−1} by leveraging Eq. (17) and the recurrence relationship. Subtracting, and using the facts that 0 ≤ Q̃_i ≤ 1 and A_k ≤ (k−1)/N, the absolute value of δ_α^t satisfies a recurrence inequality; iterating this recurrence gives the stated bound for large t, where the second inequality holds since |α_i^t| ≤ (t−1)/N.

Therefore, the bound follows, where the first inequality holds by the definitions of R̂(h^{t*}) and R̂(ĥ^{t*}). The additive error ε_Q for the quantum kernel Q(a_i, x) can be bounded by O(√(log(1/δ))/N^{5/4}) using O(N^{5/2}) copies of the quantum states.

Consider samples {(x_i, y_i)}, where x_i is sampled from D_n using D and y_i indicates the corresponding label. If x_i ∈ L, one has y_i = 1; otherwise y_i = 0. Specifically, one requires:

(1) The probabilistic Turing machine M processes all inputs x in polynomial time.
In the main text, the proof of Theorem 1 is established on the classical hardness of the random circuit sampling problem, and the feature states are thus generated by random circuit states. On the other hand, the QPL problem has a similar structure to the ground state problem.

In the field of quantum computation, the Variational Quantum Eigensolver (VQE) is a popular method for approximating the ground state of H(a) [73,74,75]. The key idea of VQE is that a parameterized quantum state |Ψ(θ)⟩ is prepared and measured on a quantum computer, and a classical optimizer updates the parameter θ according to the measurement information. The ground state can be obtained by minimizing the energy E(θ, a) = ⟨Ψ(θ)|H(a)|Ψ(θ)⟩, following the variational principle. The selection of the ansatz |Ψ(θ)⟩ is flexible, and includes the unitary coupled cluster ansatz [76], the alternating layered ansatz [77], and the hardware-efficient ansatz [78].

The hardware-efficient ansatz is composed of single- and two-qubit gates in each repeated layer and is experimentally friendly on near-term quantum devices. Following the notation of Ref. [78], a D-depth hardware-efficient ansatz is formalized as U_h(θ) = Π_{d=1}^D W_d U_d(θ_d), in which U_d(θ_d) is a tensor product of n single-qubit rotations and W_d is an entanglement gate. Generally, the construction of U_h(θ) ensures that it approaches a random unitary as D increases [79,77].
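A minimal sketch of such an ansatz, acting on a statevector, is shown below. The specific gate choices (RY rotations for U_d, a ladder of CZ gates for W_d) are illustrative assumptions; Refs. [77,78] allow other single- and two-qubit gate sets.

```python
import numpy as np
from functools import reduce

# Sketch of a D-depth hardware-efficient ansatz on n qubits: each layer
# applies single-qubit RY rotations U_d(theta_d) followed by a fixed
# ladder of CZ entangling gates W_d. Gate choices are illustrative.

def ry(theta):
    c, s = np.cos(theta / 2), np.sin(theta / 2)
    return np.array([[c, -s], [s, c]])

def cz_ladder(n):
    """Entangler W_d: CZ on every neighbouring pair, as one diagonal matrix."""
    dim = 2 ** n
    W = np.eye(dim)
    for i in range(n - 1):
        diag = np.ones(dim)
        for j in range(dim):
            bi = (j >> (n - 1 - i)) & 1
            bj = (j >> (n - 2 - i)) & 1
            if bi and bj:
                diag[j] = -1.0
        W = np.diag(diag) @ W
    return W

def hardware_efficient_state(n, thetas):
    """thetas has shape (D, n); returns U_h(theta)|0^n>."""
    state = np.zeros(2 ** n)
    state[0] = 1.0
    W = cz_ladder(n)
    for layer in thetas:
        U = reduce(np.kron, [ry(t) for t in layer])
        state = W @ (U @ state)
    return state

rng = np.random.default_rng(1)
psi = hardware_efficient_state(n=3, thetas=rng.uniform(0, 2 * np.pi, (2, 3)))
```

A VQE loop would evaluate ⟨Ψ(θ)|H(a)|Ψ(θ)⟩ on such states and update θ classically.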

From the above discussion, it is reasonable to assume that the QPL problem involves quantum random circuits, which is consistent with the condition in Conjecture 1.

D Implementation of the quantum kernel with the SWAP test

By the Chernoff bound, the quantum kernel can be approximated by independently performing the Destructive-Swap-Test [67] on O(log(1/δ)/ε_Q²) copies of the 2n-qubit state |φ(a_i)⟩ ⊗ |φ(x)⟩, with additive error ε_Q and failure probability δ. The expectation of the measurement results of the Destructive-Swap-Test is ⟨φ(a_i)| ⊗ ⟨φ(x)| SWAP |φ(a_i)⟩ ⊗ |φ(x)⟩ = |⟨φ(a_i)|φ(x)⟩|², where SWAP|φ(a_i)⟩ ⊗ |φ(x)⟩ = |φ(x)⟩ ⊗ |φ(a_i)⟩ denotes the 2n-qubit swap operator. For the QPL problem, |φ(a_i)⟩ and |φ(x)⟩ can all be generated with polynomial-size circuits, hence the Destructive-Swap-Test can be performed efficiently.
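The sample-complexity side of this estimate can be sketched by simulating the ±1 measurement statistics of a SWAP test directly: each run yields +1 with probability (1 + F)/2, where F = |⟨φ(a_i)|φ(x)⟩|², so the empirical mean of the outcomes estimates F. The constant-free copy count below is an illustrative reading of the Chernoff bound.

```python
import numpy as np

# Sketch: estimate a fidelity F via simulated SWAP-test outcomes.
# Each shot is +/-1 with P(+1) = (1 + F)/2, so E[outcome] = F; averaging
# m ~ log(1/delta)/eps^2 shots gives additive error ~eps with
# probability ~1 - delta (constants omitted).

def swap_test_estimate(fidelity, eps, delta, rng):
    m = int(np.ceil(np.log(1 / delta) / eps ** 2))
    outcomes = rng.choice(
        [1.0, -1.0], size=m,
        p=[(1 + fidelity) / 2, (1 - fidelity) / 2],
    )
    return outcomes.mean()

rng = np.random.default_rng(42)
est = swap_test_estimate(fidelity=0.7, eps=0.05, delta=0.01, rng=rng)
```

On hardware, the same statistics come from Bell-basis measurements on the two n-qubit registers, which is the destructive variant used here.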