Variational Quantum Optimization with Multi-Basis Encodings

Despite extensive research efforts, few quantum algorithms for classical optimization demonstrate realizable quantum advantage. The utility of many quantum algorithms is limited by high requisite circuit depth and nonconvex optimization landscapes. We tackle these challenges by introducing a new variational quantum algorithm that benefits from two innovations: multi-basis graph encodings and nonlinear activation functions. Our technique results in increased optimization performance, a factor of two increase in effective quantum resources, and a quadratic reduction in measurement complexity. While the classical simulation of many qubits with traditional quantum formalism is impossible due to its exponential scaling, we mitigate this limitation with exact circuit representations using factorized tensor rings. In particular, the shallow circuits permitted by our technique, combined with efficient factorized tensor-based simulation, enable us to successfully optimize the MaxCut of the nonlocally connected $512$-vertex DIMACS library graphs on a single GPU. By improving the performance of quantum optimization algorithms while requiring fewer quantum resources and utilizing shallower, more error-resistant circuits, we offer tangible progress for variational quantum optimization.


I. INTRODUCTION
NP-hard optimization problems, such as Traveling Salesman and MaxCut, are central to a wide array of fields, such as operational research, engineering, and network design [1]. Despite the classical nature of these problems, there is immense interest in identifying variational quantum algorithms (VQAs) which can solve them faster or more precisely than any classical method, a concept known as quantum advantage [2][3][4][5].
One common approach is the variational quantum eigensolver (VQE), where energy minimization yields the ground state of a problem-encoded Hamiltonian through gradient descent updates of the quantum circuit parameters [6][7][8]. The quantum approximate optimization algorithm (QAOA) is a related protocol in which unitary evolutions under an initial Hamiltonian and a problem-encoded Hamiltonian are alternated in order to find a solution-encoded ground state [9][10][11][12][13]. Novel VQA encoding strategies have also been considered in [14][15][16]. While the approximation ratio of VQE can surpass those of polynomial-complexity classical algorithms (e.g., Goemans-Williamson [17][18][19]) [8], this guarantee requires between polynomially and exponentially many gates in the number of qubits $n$. Such circuit depths limit the algorithms' potential to demonstrate quantum advantage, rendering them not only computationally inefficient, but also highly susceptible to quantum noise [10,11,20] and barren plateaus [21][22][23][24][25][26]. Moreover, local VQAs, where quantum state updates are limited to only explicitly connected degrees of freedom, have demonstrably poorer performance than classical methods on particularly challenging and large graph instances [27,28].
The difficulty of classically simulating large-scale quantum circuits is a central challenge to algorithm development. This is because the traditional mathematical formalism of quantum mechanics automatically represents the full Hilbert space and thus scales exponentially in the number of qubits $n$, with matrix operators of size $2^{2n}$ operating on state vectors of size $2^n$. When a quantum system does not occupy the full Hilbert space, these intractable dimensions for quantum network simulation can be remediated by employing a factorized tensor formalism [29].
* taylorpatti@g.harvard.edu
While many varieties of decomposed tensors exist, tensor rings have proven particularly popular in the quantum sciences due to their modularity and rank structure, which have close parallels to quantum entanglement. In the tensor ring formalism, both quantum states and quantum operators are represented in factorized form by matrix product states (MPS) and matrix product operators (MPOs), respectively [30][31][32]. However, tensor formalism is often unsuitable for high-depth and connectivity regimes, which are most commonly used in quantum optimization, since tensor rings quickly become prohibitively large (high-rank/bond-dimension) when simulating deep or complicated circuits [33]. Moreover, they are limited to only nearest-neighbor interactions.
Due in part to these limitations, no simulation of more than $\sim 100$ qubits has demonstrated quantum optimization rivaling that of classical methods for nonlocally connected graph instances, even in [34], where exact representations of general tensor architectures with optimal contraction schemes are used. Other large-scale implementations have focused on more restrictive problems. For instance, QAOA MaxCut optimization with up to 210 qubits has been achieved for 3-regular graphs with nonlocal edges [35]. QAOA MaxCut optimization has also been implemented with several thousand qubits when exploring only local edges of nonlocally connected graphs, a method which did not yield high average performance [36]. Moreover, large-scale optimization of NP-hard problems (e.g., MaxCut) has not been explored using VQE.
FIG. 1: (a) Multi-Basis Encoding (MBE) of a graph. An $n$-vertex graph (blue) is represented as an Ising model. We reassign $n/2$ vertices from $\sigma^z$ (blue) to $\sigma^x$ (red) operators, allowing us to map the graph to just $n/2$ qubits (here a nearest-neighbor connected, blue/red tensor ring). The MaxCut is obtained by optimizing this state via single-qubit measurements. Although only locally connected, tensor rings effectively solve MaxCut on graphs with highly nonlocal connections. (b) Multi-Basis Encoding (MBE) with two distinct $n$-vertex graphs. Each graph is mapped to the classical Ising model, with $G_0$ (blue) encoded along the $z$-basis (as is traditional) and $G_1$ (red) utilizing the $x$-basis, resulting in an $n$-qubit quantum state (blue/red). This encoding is similar to MBE with a single graph, except that the $x$ and $z$-bases independently encode two separate graphs and thus no cross-terms between the $z$ and $x$-bases are required.
Quantum Computing Contribution - This manuscript introduces a novel method for quantum algorithms that not only outperforms traditional VQAs, but also requires fewer quantum resources and has lower computational complexity. In particular:
• We propose Multi-Basis Encodings (MBEs), a new quantum optimization algorithm that introduces additional constraints (regularization) that are beneficial to the algorithm's performance, reducing its susceptibility to local minima in the training landscape.
• By doubling the number of optimization features encoded into a single qubit, MBEs halve the number of qubits required for a given optimization task, a valuable asset for a developing field which has invested millions of dollars and spent multiple decades to achieve $\sim 50$-qubit registers and where additional coherence limitations emerge at scale [37]. Moreover, by utilizing single-qubit measurements, these algorithms yield up to a quadratic reduction in runtime.
• By combining our MBEs with nonlinear activation functions and an exact factorized tensor network approach, we solve MaxCut graph optimization problems with nonlocal edges using shallow quantum circuits. Furthermore, sampling $\sim 5$ initializations of our MBE experiments on shallow circuits (depth $L = 7$ for 100-vertex graphs, such that $L$ is approximately logarithmic in the number of vertices) leads to optimal cut convergence with near unit probability. This shallow-circuit, multi-shot procedure is both more coherent and more time-efficient than deterministic convergence with deep circuits, which require up to an exponential number of parameters.
Large-Scale Simulation Contribution - This work utilizes tensor networks, developing new software in order to simulate practical quantum algorithms at unprecedented scale. Specifically:
• The strong performance of our MBE with relatively shallow circuits enables us to work with tensor networks of lower rank (bond dimension). As the rank of a tensor structure determines the time and memory complexity of its contraction, we can simulate high-accuracy implementations of MBE at large scales.
• We develop TensorLy-Quantum [38,39], a new software package for simulating efficient quantum circuits with decomposed tensors on CPU and GPU. TensorLy-Quantum is based on the TensorLy software family [40].
• Using TensorLy-Quantum on a single NVIDIA A100 GPU, we simulate solving a 512-vertex MaxCut problem using MBE, which outperforms comparable classical algorithms. This sets a new record for the large-scale simulation of a successful quantum optimization algorithm.
By introducing a new variety of algorithms that improve optimization performance, require fewer quantum resources, and operate on shallower, more error-resistant circuits, we offer tools to increase the utility of variational quantum algorithms.

A. MaxCut Optimization Problems
The Maximum Cut problem, most commonly referred to as MaxCut, is a partitioning problem on undirected graphs $G = (V, A)$, where $V$ is a set of vertices (blue orbs in Fig. 2, left) connected by edges $A$ (black lines connecting orbs) [41]. The objective is to assign each vertex a value $v_i \in \{-1, 1\}$ so as to maximize the total weight $w_{ij}$ of the edges in $A$ that join vertices with opposite values, where any such assignment is referred to as a "cut". In this work, we will consider a generalized form of the problem known as weighted MaxCut, in which the $w_{ij}$ take arbitrary real values.
FIG. 2: Overview of traditional MaxCut encoding and VQE using tensor ring factorizations, which are tensor train networks with periodic boundary conditions. (Left) A graph $G$ with $n$ vertices $v_i, v_j$ and weights $w_{ij}$ is mapped into an $n$-qubit Hamiltonian $H$ in MPO form. The MPS ground state $|\psi_g\rangle$ of $H$ encodes the solution to MaxCut($G$). (Right) To find MaxCut($G$) variationally, the null input state $|\mathbf{0}\rangle$ (an MPS) is evolved under a parameterized quantum circuit $U$ (an MPO), producing an output state $|\psi\rangle$. $U$ encodes a circuit of depth $L$ (here $L = 4$, red box) in this manuscript's layer (block) pattern: one layer (block) of single-qubit $y$-axis rotations $R_y$ followed by a layer of control-Z gates which alternate between even and odd qubits. The energy expectation value $\mathcal{L} = \langle E \rangle$ is minimized via gradient descent. The global minimum of $\mathcal{L}$ corresponds to $|\psi\rangle = |\psi_g\rangle$.
Two formulations of MaxCut exist: the NP-complete decision problem and the NP-hard optimization problem [42]. The former seeks to determine if a cut of size $c$ or greater exists for a given graph $G$, whereas the latter attempts to identify the largest possible cut of $G$. We here focus on the more general optimization problem formulation, the ground truth of which we denote MaxCut($G$). It is common practice to express the objective function in its binary quadratic form [41]:
$$\mathrm{MaxCut}(G) = \max_{v} \frac{1}{2} \sum_{j<i} w_{ij} \left(1 - v_i v_j\right), \qquad v_i \in \{-1, 1\}. \qquad (1)$$
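As a small illustration of the binary quadratic objective, the following sketch exhaustively evaluates $\frac{1}{2}\sum_{j<i} w_{ij}(1 - v_i v_j)$ over all assignments $v_i \in \{-1, 1\}$. It is only an illustrative brute-force reference of the kind that is feasible for very small $n$; the function names and the four-vertex example graph are hypothetical and not taken from the manuscript.

```python
from itertools import product


def cut_value(weights, assignment):
    """Cut size (1/2) * sum_{j<i} w_ij * (1 - v_i * v_j) for v_i in {-1, +1}."""
    return sum(w * (1 - assignment[i] * assignment[j]) / 2
               for (i, j), w in weights.items())


def brute_force_maxcut(n, weights):
    """Search all 2^n assignments; only tractable for small n."""
    best_value, best_assignment = float("-inf"), None
    for assignment in product((-1, 1), repeat=n):
        value = cut_value(weights, assignment)
        if value > best_value:
            best_value, best_assignment = value, assignment
    return best_value, best_assignment


# Hypothetical weighted 4-vertex graph, stored as {(i, j): w_ij}.
example_weights = {(0, 1): 1.0, (1, 2): 2.0, (2, 3): 1.0,
                   (0, 3): 2.0, (0, 2): -1.0}
best_value, best_assignment = brute_force_maxcut(4, example_weights)
print(best_value, best_assignment)
```

Because flipping every $v_i$ leaves each product $v_i v_j$ unchanged, every cut value is attained by at least two assignments.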

B. VQE Framework and Tensor Network Formalism
To find the MaxCut of a given graph on a quantum computer, it is convenient to minimize the equivalent summation, $\sum_{j<i} w_{ij} v_i v_j$. For a graph with $n$ vertices $v_i$, this reduces the problem to finding the $n$-qubit wavefunction $|\psi\rangle$ that minimizes the energy expectation value $\langle E \rangle = \langle\psi|H|\psi\rangle$ of the classical Ising model Hamiltonian
$$H = \sum_{j<i} w^{zz}_{ij}\, \sigma^z_i \sigma^z_j. \qquad (2)$$
$H$ is obtained by replacing the vertices $v_i$ with the Pauli-Z spin operators $\sigma^z_i$, as depicted in Fig. 2, and $w^{zz}_{ij} = w_{ij}$ is a relabeling to specify the $zz$-spin interactions. As $H$ contains only terms in the $z$-basis, its eigenvectors are classical (zero-entanglement product states), such that $|\psi_i\rangle = \bigotimes_s |s\rangle$, where $|s\rangle \in \{|0\rangle, |1\rangle\}$. We here denote the lowest-eigenvalue or "ground state" solution as $|\psi_g\rangle$, the qubits of which form a bijection with the optimal $v_i$ of MaxCut($G$). As Eq. 2 has $Z_2$ symmetry, $|\psi_g\rangle$ is degenerate with the state $X^{\otimes n}|\psi_g\rangle$.
Fig. 2 (right) depicts the VQE framework [6][7][8]. Eq. 1 is optimized by defining the loss function $\mathcal{L} = \langle E \rangle$ and varying the parameters $\vec{\theta}$ of a quantum circuit with unitary $U(\vec{\theta})$, which acts on the input quantum state (Fig. 2, right). Without loss of generality, we define the input state as the $n$-qubit zero state $|\mathbf{0}\rangle = \bigotimes^{n} |0\rangle$, such that
$$|\psi\rangle = U(\vec{\theta})\,|\mathbf{0}\rangle. \qquad (3)$$
We decompose this unitary matrix $U$ as $\Lambda$ subunitaries, $U(\vec{\theta}) = \prod_{k}^{\Lambda} U_k(\vec{\theta}_k)$, where $\vec{\theta}_k$ is the corresponding subset of $\vec{\theta}$ and $U_k(\vec{\theta}_k) = \prod_{j=1}^{n} \exp(-i\theta_j W_j)\, M_k$ for generic Hermitian operators $W_j$ and unitary matrices $M_k$. Thus, the gradient $g_l(\hat{O}) = \partial \langle \hat{O} \rangle / \partial \theta_l$ of operator $\hat{O}$ with respect to any parameter $\theta_l \in \vec{\theta}$ is
$$g_l(\hat{O}) = i\, \langle \mathbf{0}|\, U_R^{\dagger} \left[ W_l,\; U_L^{\dagger}\, \hat{O}\, U_L \right] U_R\, |\mathbf{0}\rangle, \qquad (4)$$
where $U_L$ and $U_R$ are the compositions of unitaries $U_k$ with $k \geq l$ and $k < l$, respectively.
Rather than using circuits with extensive connectivity, we instead focus on 1D tensor ring circuits of $n$ qubits. In particular, tensor rings have periodic boundary conditions such that qubit $n - 1$ is connected to qubit 0. Such nearest-neighbor connectivity makes the circuit amenable to both near-term quantum hardware [10,12] and simulation via decomposed tensors.
We accomplish this simulation with TensorLy-Quantum [38,39]. A nascent and expanding software package, TensorLy-Quantum strives to leverage the structure of decomposed tensors in order to simulate quantum machine learning in the most efficient, nonapproximate manner possible. While tensor ring-based tensor networks are typically used for approximate inference and obtained by applying tensor decomposition to dense state vectors and operators, we build a low-rank but exact factorized representation of the simulated quantum circuits. When judiciously constructed, tensor simulations yield a low-rank quantum formalism that permits enormous compression of state and operator spaces. Although in the quantum sciences tensor methods are most frequently associated with state approximations and truncations, like the density matrix renormalization group [43], we here advocate for their use in exact quantum simulation. Similarly, due to their nearest-neighbor connectivity, tensor ring factorizations in quantum computing have traditionally been employed for locally connected optimization problems, such as 3-regular MaxCut [44]; here, however, we emphasize their utility for general-purpose optimization tasks.
To analyze VQE with tensor formalism, the Hamiltonian of Eq. 2 is represented as an MPO $H_{\{\beta,\gamma\}}$, with physical indices $\beta$ and $\gamma$. The energy $\mathcal{L} = \langle E \rangle$ is then calculated with a single large contraction (Fig. 2),
$$\mathcal{L} = \langle E \rangle = \langle \mathbf{0}|\, U^{\dagger}(\vec{\theta})\, H_{\{\beta,\gamma\}}\, U(\vec{\theta})\, |\mathbf{0}\rangle, \qquad (5)$$
where $|\mathbf{0}\rangle$ is an $n$-qubit MPS of $m$ cores and $U(\vec{\theta})$ is the corresponding MPO unitary. As we work in the absence of quantum noise, states $|\psi\rangle$ display time-reversal symmetry and can be fully expressed with real numbers [45]. We thus restrict our rotations to those of the Pauli-Y generator $\sigma^y$ and implement a simple, repeating subunitary pattern of two layers, also known as blocks. The pattern is illustrated in Fig. 2 (right): a row of parameterized single-qubit rotations $R_y(\theta)$ ($W = \sigma^y$) is followed by a row of control-Z (CZ) gates, with the latter alternating control between even and odd qubits. As each single-qubit rotation is a $2 \times 2$ dense matrix and each two-qubit control-Z gate is a rank-2 MPO of two eight-element cores, the memory requirements of the uncontracted circuit representation scale only linearly in both $n$ and $L$, an exponential reduction in resources compared to circuits described in traditional quantum formalism. Likewise, a factorized representation of the input state $|\mathbf{0}\rangle$ in tensor ring form requires exponentially fewer terms, as it is represented by a rank-$\prod_{i=0}^{n} 1$ MPS (all bond dimensions equal to one) with just $n$ two-element cores.
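The layer pattern described above can be sketched for very small $n$ with a dense, real-valued state vector. This is not the factorized tensor ring simulation of TensorLy-Quantum and does not use its API; it is a minimal standard-library illustration under two assumptions that go beyond the text: layers are counted so that even-indexed layers are $R_y$ rotations and odd-indexed layers are CZ gates, and CZ layers alternate between pairs $(q, q+1 \bmod n)$ starting on even and on odd $q$. All function names are hypothetical.

```python
import math


def apply_ry(state, qubit, theta):
    """Apply Ry(theta) = [[c, -s], [s, c]], with c = cos(theta/2), s = sin(theta/2)."""
    c, s = math.cos(theta / 2), math.sin(theta / 2)
    mask = 1 << qubit
    new = state[:]
    for idx in range(len(state)):
        if not idx & mask:  # visit each (bit = 0, bit = 1) amplitude pair once
            a0, a1 = state[idx], state[idx | mask]
            new[idx] = c * a0 - s * a1
            new[idx | mask] = s * a0 + c * a1
    return new


def apply_cz(state, q0, q1):
    """Control-Z: flip the sign of amplitudes where both qubits are 1."""
    both = (1 << q0) | (1 << q1)
    return [-a if (idx & both) == both else a for idx, a in enumerate(state)]


def num_parameters(n, depth):
    """One angle per qubit in every Ry layer (assumed to be layers 0, 2, 4, ...)."""
    return n * ((depth + 1) // 2)


def ansatz_state(n, depth, thetas):
    """Evolve |0...0> through alternating Ry and CZ layers on a ring of n qubits."""
    state = [0.0] * (2 ** n)
    state[0] = 1.0
    angles = iter(thetas)
    cz_layers_done = 0
    for layer in range(depth):
        if layer % 2 == 0:
            for q in range(n):
                state = apply_ry(state, q, next(angles))
        else:
            start = cz_layers_done % 2  # alternate between even and odd qubits
            for q in range(start, n, 2):
                state = apply_cz(state, q, (q + 1) % n)  # periodic boundary
            cz_layers_done += 1
    return state


def ising_energy(state, weights):
    """<psi| sum_{j<i} w_ij sz_i sz_j |psi>; H is diagonal in the z-basis."""
    energy = 0.0
    for idx, amp in enumerate(state):
        z = lambda q: 1 - 2 * ((idx >> q) & 1)  # bit 0 -> +1, bit 1 -> -1
        energy += amp * amp * sum(w * z(i) * z(j) for (i, j), w in weights.items())
    return energy
```

Under this layer-counting assumption, a depth $L = 7$ circuit contains four $R_y$ layers and therefore $4n$ parameters. The dense vector stores $2^n$ amplitudes, which is precisely the exponential cost that the factorized representation avoids.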

II. MULTI-BASIS ENCODING (MBE)
Intuition - Our MBE protocol uses a loss function which is inspired by, but not equivalent to, the long-range ZX Hamiltonian
$$H_{zx} = \sum_{j<i} \left( w^{zz}_{ij}\, \sigma^z_i \sigma^z_j + w^{xx}_{ij}\, \sigma^x_i \sigma^x_j \right) + \sum_{i,j} w^{zx}_{ij}\, \sigma^z_i \sigma^x_j. \qquad (6)$$
The key difference between Eq. 6 and MBE is that MBE utilizes the product of single-qubit measurements and nonlinear activation functions to encode separate vertices into the $z$ and $x$-bases (further explained in Eqs. 7 and 9). The utilization of two, rather than a single, quantum basis has proven useful in other quantum machine learning algorithms [46].
Algorithm - MBE for weighted graphs is depicted in Fig. 1a. An $n$-vertex graph $G$ is expressed similarly to the Ising model Hamiltonian in Eq. 2, save that only the first $\lceil n/2 \rceil$ vertices are mapped to the $z$-axis (blue), while the second $\lfloor n/2 \rfloor$ vertices are mapped to the $x$-axis (red), thus enabling $n$ vertices to be encoded into only $\lceil n/2 \rceil$ qubits. If $n$ is odd, then the $x$-axis of the final qubit is unneeded. It is absent from the loss function and can go unmeasured. In future work, more sophisticated vertex partitionings can be explored, such as mappings that reflect graph topology. MBE halves the number of qubits required for a given optimization, providing a meaningful decrease in quantum hardware overhead.
In order to optimize both axes as independent vertices, we must make several alterations to standard VQE. To begin, $H_{zx}$ itself is an unsuitable loss function, as the quantum ground state it encodes does not correspond to the classical MaxCut of $G$. We instead focus on the products of single-qubit measurements $\langle\sigma^x_i\rangle$ and $\langle\sigma^z_i\rangle$, such that the $\sigma^x_i$ and $\sigma^z_i$ operators are simultaneously optimized. This yields the MBE loss function
$$\mathcal{L}_{\mathrm{MBE}} = \sum_{j<i} \left[ w^{zz}_{ij} \tanh\langle\sigma^z_i\rangle \tanh\langle\sigma^z_j\rangle + w^{xx}_{ij} \tanh\langle\sigma^x_i\rangle \tanh\langle\sigma^x_j\rangle \right] + \sum_{i,j} w^{zx}_{ij} \tanh\langle\sigma^z_i\rangle \tanh\langle\sigma^x_j\rangle, \qquad (7)$$
where $\tanh(x)$ is trivially implemented on the classical computer controlling gradient descent. For example, the four-vertex graph with four-qubit Ising model encoding $H = \omega_{12}\sigma^z_1\sigma^z_2 + \omega_{34}\sigma^z_3\sigma^z_4 + \omega_{13}\sigma^z_1\sigma^z_3$ would be optimized with the two-qubit MBE loss function
$$\mathcal{L}_{\mathrm{MBE}} = \omega_{12} \tanh\langle\sigma^z_1\rangle \tanh\langle\sigma^z_2\rangle + \omega_{34} \tanh\langle\sigma^x_1\rangle \tanh\langle\sigma^x_2\rangle + \omega_{13} \tanh\langle\sigma^z_1\rangle \tanh\langle\sigma^x_1\rangle.$$
We again emphasize that, as Eq. 7 is composed of distinct Pauli strings that are independently measured on separate circuit preparations, the uncertainty principle is not violated for $w^{zx}_{ij}$ with $j = i$. The projection of high-dimensional quantum data into a lower-dimensional representation has also been explored in [47,48].
The inclusion of the nonlinear activation function $\tanh(x)$ disincentivizes the extremization of one basis at the expense of another, which could otherwise occur because the optimal values of both $\langle\sigma^x_i\rangle$ and $\langle\sigma^z_i\rangle$ cannot be linearly encoded by a single quantum state due to the normalization condition of the Bloch sphere of each qubit $i$,
$$\langle\sigma^x_i\rangle^2 + \langle\sigma^z_i\rangle^2 \leq 1, \qquad (8)$$
where equality holds for real-valued pure states. As the gradient of $\tanh(x)$ reduces near the $\pm 1$ poles (inset Fig. 3a), full optimization of one axis at the expense of the other is discouraged and optimal cuts are deduced by a rounding procedure (detailed below), which assigns integer vertex values but does not affect parameter update or the normalization condition of Eq. 8. In this manner, MBE is a dual-axis quantum analog to linear programming relaxations [49]. Furthermore, the normalization constraint of Eq. 8 means that $\mathcal{L}_{\mathrm{MBE}}$ can only ever partially descend into local minima and is better equipped to escape their regions of attraction. The robustness of MBE against local minima can be understood through its use of global optimization [27,28], including the global optimization of single-qubit states and the dependence of the $x$-encoded vertex on a generally unconnected $z$-encoded vertex. Finally, we note that we have for simplicity neglected both external fields and $y$-basis interactions in Eq. 7. The addition of $y$-basis terms could immediately be used both to improve the algorithm's performance and to simultaneously optimize three (rather than two) graph vertices.
As minimizing Eq. 7 under the constraints of Eq. 8 cannot yield classical solutions to Eq. 1, we define a rounding procedure for the classification and scoring of a cut $C$ for a graph $G$:
$$C(G) = \frac{1}{2} \sum_{j<i} \left[ w^{zz}_{ij} \left( 1 - R(\langle\sigma^z_i\rangle) R(\langle\sigma^z_j\rangle) \right) + w^{xx}_{ij} \left( 1 - R(\langle\sigma^x_i\rangle) R(\langle\sigma^x_j\rangle) \right) \right] + \frac{1}{2} \sum_{i,j} w^{zx}_{ij} \left( 1 - R(\langle\sigma^z_i\rangle) R(\langle\sigma^x_j\rangle) \right),$$
where the classically implemented function $R$ rounds the measured expectation values to $\pm 1$. We note that this scoring is our true, or computational, MaxCut estimate, as it is the MaxCut assignment which results from projecting the qubit measurements of our quantum state onto $\pm 1$ vertex values.
FIG. 3: (a) Average performance of MBE and traditional VQE. We note the significantly increased performance for $n = 8, 100$ with MBE over VQE. While VQE with $n = 512$ was prohibitively memory inefficient to simulate for comparison, MBE with $n = 512$ outperforms VQE with $n = 8$, a system 1/64th of its size, as well as the leading single-shot classical algorithm (Table II). (b) Average cut $C$ convergence (left) and raw loss function $\mathcal{L}$ (right) with both two-graph MBE (solid lines) and traditional VQE (dashed) for $n = 8$, 20, and 100. MBE improves calculated MaxCut convergence $C$, although its ability to satisfy the two encoded Ising models is limited by the normalization condition of Eq. 8. This is remedied by the rounding procedure of Eq. 13.
TABLE II: Comparison of MBE with the leading single-shot classical algorithm [50], with comparable parameters, for $n = 512$ vertex graphs. MBE produces improved solutions (higher cut $C$), both on average and in the most successful run.
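The encoding, loss, and rounding steps of this section can be sketched on the four-vertex example, under the simplifying assumption of an entanglement-free circuit in which qubit $q$ is prepared as $R_y(\theta_q)|0\rangle$, so that $\langle\sigma^z_q\rangle = \cos\theta_q$ and $\langle\sigma^x_q\rangle = \sin\theta_q$. The function names are hypothetical, vertices are 0-indexed, unit weights are chosen for illustration, and rounding an expectation value of exactly zero to $+1$ is an arbitrary convention adopted here.

```python
import math


def vertex_to_observable(v, n):
    """First ceil(n/2) vertices -> z-axis; the remaining vertices -> x-axis."""
    half = math.ceil(n / 2)
    return (v, "z") if v < half else (v - half, "x")


def expectations_from_angles(thetas):
    """Single-qubit expectation values for the product state Ry(theta_q)|0>."""
    return [{"z": math.cos(t), "x": math.sin(t)} for t in thetas]


def mbe_loss(weights, n, expvals):
    """Sum of w_ij * tanh(<sigma_i>) * tanh(<sigma_j>) over the graph edges."""
    loss = 0.0
    for (i, j), w in weights.items():
        qi, bi = vertex_to_observable(i, n)
        qj, bj = vertex_to_observable(j, n)
        loss += w * math.tanh(expvals[qi][bi]) * math.tanh(expvals[qj][bj])
    return loss


def rounded_cut(weights, n, expvals):
    """Score the cut after rounding every expectation value to +1 or -1."""
    def r(x):
        return 1 if x >= 0 else -1  # convention: exactly zero rounds to +1
    cut = 0.0
    for (i, j), w in weights.items():
        qi, bi = vertex_to_observable(i, n)
        qj, bj = vertex_to_observable(j, n)
        cut += w * (1 - r(expvals[qi][bi]) * r(expvals[qj][bj])) / 2
    return cut


# Four-vertex example: edges (1,2), (3,4), (1,3) of the text, 0-indexed.
graph = {(0, 1): 1.0, (2, 3): 1.0, (0, 2): 1.0}
angles = [-math.pi / 4, 3 * math.pi / 4]  # hand-picked, not optimized
expvals = expectations_from_angles(angles)
print(mbe_loss(graph, 4, expvals), rounded_cut(graph, 4, expvals))
```

With these hand-picked angles every edge joins oppositely signed expectation values, so the rounded cut includes all three edges even though no expectation value reaches $\pm 1$, which is the role the rounding step plays under the normalization constraint.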

III. RESULTS
In this section, we empirically validate our approach's performance by solving the MaxCut problem on a diverse set of nonlocally connected graphs with up to 512 vertices. We first introduce the experimental settings and implementation details before presenting the results for two scenarios: i) using MBE to solve $n$-vertex MaxCut problems with only $n/2$ qubits, and ii) using MBE to encode two separate MaxCut graph instances in a single circuit. In addition to having an inherently lower quantum hardware overhead and measurement complexity, both implementations of MBE demonstrate superior optimization performance.
Fig. 3a illustrates the average performance (ratio of the cut obtained to the largest known solution) of both MBE and VQE circuits for graphs of $n = 8, 100$ vertices and the MBE circuit alone for $n = 512$. The $n = 512$ graph with traditional VQE was too memory inefficient for evaluation on a single NVIDIA A100 GPU. The simulations were completed using TensorLy-Quantum, which runs on a PyTorch [51] backend and implements tensor contractions with Opt-Einsum [52]. The $n = 8$ instances are complete (all-to-all, $n(n - 1)/2$-edge) graphs for which we calculated the exact ground truth through brute force computation, the $n = 100$ graphs are the first three 0.9 density weighted (4455-edge) MaxCut graphs (cataloged as the w09-100 instances) from the extensively studied Biq Mac library [53], and the $n = 512$ graph is the pm3-8-50 instance of the DIMACS library [54]. While the pm3-8-50 graph is relatively sparse (1536 edges), it is nonlocally connected. Like other recent works [22,55], we implement simple entanglement-based pre-training prior to the MBE algorithm (details in the Supplementary Information [56]). Shallow circuits of depth $L = 7$ ($n = 8$ and $n = 100$ graphs) and $L = 13$ ($n = 512$ graph) are selected in order to adopt a protocol suitable for near-term quantum devices, although the performance of the larger graphs ($n = 100, 512$) increases with moderately deeper circuits.
MBE consistently demonstrates a 5%-7% average performance increase across all $n$, as seen in Fig. 3a. We emphasize that not only is the MBE algorithm more accurate than traditional VQE, it simultaneously solves MaxCut($G$) with half the required qubits and parameters, as summarized in Table I. As quantum state space scales exponentially in $n$, this factor of two reduction in required qubits remains significant for quantum computing at scale. Even with very shallow circuit depth ($L$ increasing only sublogarithmically in $n$ compared to the 100-vertex BiqMac graphs), MBE outperforms the leading single-shot classical algorithm (Table II) for the 512-vertex DIMACS graph, achieving an average cut of $\sim 95\%$ of the largest known solution [57]. MBE also outperforms the classical algorithm in terms of the largest cut obtained for any given run, with $\sim 98\%$ accuracy from just thirty total runs compared to $\sim 97\%$ accuracy from one hundred total runs. These performance increases would be even greater for deeper circuits; however, our current contraction algorithm yields a maximum MBE circuit depth of $L = 13$ for 512-vertex graphs on a single GPU. As the simulation of these networks is ultimately memory-bound, with memory requirements growing exponentially with circuit depth, effective implementations of the algorithm are not classically tractable at scale. The simulation of deeper circuits could be provided by tensor contraction backends with improved memory management, such as the cuTensor library, while implementations of this scale on quantum hardware are consistent with the projections for moderate-term quantum devices. Although computational benchmarking for optimization problems has been demonstrated for thousands of qubits [36], to our knowledge, MBE with $n = 512$ is the largest simulation of successful quantum optimization algorithms on nonlocally connected graphs yet conducted.
MBE's improved performance on optimization problems is due to the two-axis constraint on each qubit, which only permits convergence to local minima that are bistable points for both the $z$ and $x$-axes. This is in contrast with the monostable condition of traditional VQE. Convergence to a local minimum with bistability requires the concurrence of a zero gradient for both independently parametrized axes at a single, non-optimal point in parameter space. As $\mathcal{L}_{\mathrm{MBE}}$ is best extremized by larger $|\langle\sigma^\zeta\rangle|$, the circuit will tend towards satisfying the equality in Eq. 8. As this corresponds to entanglement-free qubits, there is a systematic disentanglement of the circuit into product states throughout training (Fig. 3a, right). To understand this process, note that for the general wavefunction $|\phi\rangle = \alpha|0_i 0_r\rangle + \beta|0_i 1_r\rangle + \gamma|1_i 0_r\rangle + \delta|1_i 1_r\rangle$ describing any two qubits $i$ and $r$, the left-hand side of Eq. 8 for qubit $i$ can be written (for the real-valued amplitudes considered here) as
$$\langle\sigma^x_i\rangle^2 + \langle\sigma^z_i\rangle^2 = 1 - 4\left(\alpha\delta - \beta\gamma\right)^2.$$
In this form, we note that the left-hand side of Eq. 8 is maximized when the concurrence (entanglement [58,59]) is minimized and vice versa, driving the wavefunction towards product states as training progresses. Once disentanglement nears completion, the equality in Eq. 8 begins to hold and, for any $\theta_t$ and qubit $i$,
$$\frac{\partial}{\partial\theta_t}\left( \langle\sigma^x_i\rangle^2 + \langle\sigma^z_i\rangle^2 \right) = 0,$$
such that
$$\langle\sigma^z_i\rangle\, g_t(\sigma^z_i) = -\langle\sigma^x_i\rangle\, g_t(\sigma^x_i),$$
where $g_t$ are the gradients as given by Eq. 4. As $\langle\sigma^\zeta_q\rangle = 0$ is unfavorable for the optimization of $\mathcal{L}_{\mathrm{MBE}}$, both axes of each qubit $i$ must be bistable with respect to each angle $\theta_t$ in order for update of that parameter to halt.
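The relation between single-qubit normalization and entanglement can be checked numerically. The sketch below assumes real amplitudes $(\alpha, \beta, \gamma, \delta)$ for a normalized two-qubit state and uses the standard pure-state concurrence $\mathcal{C} = 2|\alpha\delta - \beta\gamma|$; under these assumptions the identity $\langle\sigma^x_i\rangle^2 + \langle\sigma^z_i\rangle^2 = 1 - \mathcal{C}^2$ follows from $\langle\sigma^z_i\rangle = \alpha^2 + \beta^2 - \gamma^2 - \delta^2$ and $\langle\sigma^x_i\rangle = 2(\alpha\gamma + \beta\delta)$. The function names are hypothetical.

```python
import math
import random


def bloch_norm_first_qubit(alpha, beta, gamma, delta):
    """<sx_i>^2 + <sz_i>^2 for qubit i of a|00> + b|01> + g|10> + d|11> (real)."""
    sz = alpha ** 2 + beta ** 2 - gamma ** 2 - delta ** 2
    sx = 2 * (alpha * gamma + beta * delta)
    return sx ** 2 + sz ** 2


def concurrence(alpha, beta, gamma, delta):
    """Pure two-qubit concurrence, 2|ad - bg|, for real amplitudes."""
    return 2 * abs(alpha * delta - beta * gamma)


def random_real_state(rng):
    """Normalized random real amplitudes (alpha, beta, gamma, delta)."""
    amps = [rng.gauss(0.0, 1.0) for _ in range(4)]
    norm = math.sqrt(sum(a * a for a in amps))
    return [a / norm for a in amps]


rng = random.Random(0)
for _ in range(5):
    amps = random_real_state(rng)
    lhs = bloch_norm_first_qubit(*amps)
    rhs = 1 - concurrence(*amps) ** 2
    print(f"{lhs:.6f}  {rhs:.6f}")  # the two columns agree
```

A product state gives a value of 1 (the equality in the normalization condition), while a maximally entangled Bell state gives 0, consistent with disentanglement being favored as the single-qubit expectation values grow.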
In this manner, MBE is a sort of quantum analog to alternating minimization in classical algorithms [60], but one which uses both quantum superposition and classical nonlinearity to minimize two cost functions simultaneously, rather than one sequentially. Alternating minimization has also proven useful in QAOA protocols [15,[61][62][63], as have other perturbations, such as filtered measurements [64]. Because $\mathcal{L}_{\mathrm{MBE}}$ is calculated from single-qubit measurements, it is a form of measurement-based quantum computation [65][66][67]. Moreover, as the number of possible single-qubit measurements scales linearly with circuit width, $\mathcal{L}_{\mathrm{MBE}}$ represents up to a quadratic reduction in the number of observables required to solve complete graphs from $\sim n^2$ (specifically $n(n - 1)/2$ two-operator Pauli strings) to $\sim 2n$ (two single-qubit measurements per qubit), lowering the measurement complexity and runtime of the algorithm on real quantum hardware [68,69].
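The observable counts quoted above can be made concrete with a short calculation. The helper names below are hypothetical; the counts follow directly from the $n(n-1)/2$ two-operator Pauli strings of a complete graph and the two single-qubit measurements ($\sigma^z$ and $\sigma^x$) per qubit.

```python
def pauli_strings_complete_graph(n):
    """Two-operator Pauli strings needed for a complete n-vertex graph."""
    return n * (n - 1) // 2


def single_qubit_observables(num_qubits):
    """Two single-qubit measurements (sigma^z and sigma^x) per qubit."""
    return 2 * num_qubits


for n in (8, 100, 512):
    edges = pauli_strings_complete_graph(n)
    one_graph = single_qubit_observables(n // 2)  # n vertices on n/2 qubits
    two_graphs = single_qubit_observables(n)      # two n-vertex graphs on n qubits
    print(n, edges, one_graph, two_graphs)
```

For $n = 100$ this compares 4950 two-operator strings with 100 single-qubit observables for the single-graph encoding, or 200 for the two-graph encoding.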
MBE can also encode two distinct $n$-vertex graphs into a single register of $n$ qubits and solve their two MaxCuts in parallel. This is equivalent to the simplified case of $w^{zx}_{ij} = 0 \;\forall\, i, j \leq n$ in Eqs. 7 and 9 using $n$ qubits, yielding
$$\mathcal{L}_{\mathrm{MBE}} = \sum_{j<i} \left[ w^{zz}_{ij} \tanh\langle\sigma^z_i\rangle \tanh\langle\sigma^z_j\rangle + w^{xx}_{ij} \tanh\langle\sigma^x_i\rangle \tanh\langle\sigma^x_j\rangle \right].$$
The average performance of MBE for solving two $n$-vertex graphs in parallel vs. that of traditional VQE with a single graph is displayed in Fig. 3b for graphs of $n = 8$, 20, and 100 vertices with $L = 7$. For $n = 8$ and $n = 20$, we generate exact solutions to complete (all-to-all) graphs through brute force computation, whereas the $n = 100$ graphs are again the first three 0.9 density weighted MaxCut graphs from the Biq Mac library [53]. While for this fixed $L$ both VQE and two-graph MBE suffer decreasing performance with increasing $n$, two-graph MBE consistently demonstrates a 5%-7% average performance increase across $n$. We again note that the performance for large-$n$ graphs increases with greater $L$. Finally, we emphasize that not only is the MBE algorithm more accurate than traditional VQE, it simultaneously solves MaxCut($G$) for two graphs $G$, rather than only one as with traditional VQE.
Although much emphasis is placed on the development of quantum algorithms that deterministically obtain optimal cuts, studies have indicated that this requires up to an exponential number of parameters with traditional VQE [8]. This is an infeasible quantity, reaching $\sim 2^{99}$ ($\sim 2^{511}$) parameters for the $n = 100$ ($n = 512$) graphs considered here. Conversely, the cumulative effects of probabilistic sampling (that is, running the randomly initialized circuit multiple times) lead to high-confidence convergence with markedly few repetitions $r$. In what follows, we reason that a probabilistic sampling of various shallow MBE circuit initializations is a more efficient alternative. As larger values of $C$ are a direct certificate of superior optimization, there should be no preference for less efficient single-shot techniques. Furthermore, shallow implementations are particularly important for near-term quantum devices, which are prohibitively susceptible to noise at even moderate circuit depth.
Fig. 4a displays the probability that an optimal cut, which we define as $C > T = 0.97 \times \mathrm{MaxCut}(G)$, will be found for $n = 100$ graphs with both MBE and VQE. For depth $L = 7$, MBE produces an optimal cut with upwards of 50% probability for both the single-graph ($n$ vertices in $n/2$ qubits, Fig. 4a left) and double-graph (two $n$-vertex graphs in $n$ qubits, Fig. 4a right) protocols. In contrast, traditional VQE with $L = 7$ produces optimal cuts with just 12.5% probability. Furthermore, the likelihood of obtaining an optimal cut with MBE increases considerably with moderate circuit depth, rising to approximately 80% for $L = 13$ (left). We note that $L = 1$ circuits (right) obtain optimal cuts with probability 0.36, tripling the convergence rate of standard VQE with 1/7th the resources. As circuits with $L = 1$ are composed of only local rotations without control gates, the totality of the performance is due to mutual constraints on multi-basis superpositions, and not due to quantum entanglement.
Like other entanglement-free formulations [70][71][72], this renders the circuit efficient for classical simulation and indicates that algorithms for simulated superposition with multi-basis constraints may hold promise as "quantum inspired" classical algorithms. However, we note that quantum implementations are still of interest, because other entanglement-free relaxations are known to suffer decreased performance with increasing circuit width n [8]. Furthermore, MBE with even modest entanglement and circuit-depth markedly increases the probability of optimal convergence. Fig. 4b (left) shows the probability of obtaining at least one optimal cut for n = 100 graphs with L = 7 and r = 5, which nears 97% in fewer than 100 training steps for two-graph MBE circuits. For r = 10, convergence is greater than 99.9% and the 4nr = 4000 parameters utilized for ten repetitions still pale in comparison to the exponentially many required by deep-circuit techniques. As traditional VQE with L = 7 and n = 100 produces optimal cuts only 12.5% of the time, MBE is four times more effective than VQE for probabilistic optimization.
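The repetition figures quoted above are consistent with a simple model in which each randomly initialized run succeeds independently with probability $p$, so that at least one of $r$ runs succeeds with probability $1 - (1 - p)^r$. Independence between runs is an assumption of this sketch, and the helper names are hypothetical; the values $p = 0.5$ (MBE, $L = 7$) and $p = 0.125$ (VQE, $L = 7$) are those reported in the text.

```python
import math


def prob_at_least_one_success(p, r):
    """Probability that at least one of r independent runs is optimal."""
    return 1 - (1 - p) ** r


def repetitions_needed(p, target):
    """Smallest r with 1 - (1 - p)^r >= target, for 0 < p < 1."""
    return math.ceil(math.log(1 - target) / math.log(1 - p))


print(prob_at_least_one_success(0.5, 5))     # MBE, r = 5
print(prob_at_least_one_success(0.5, 10))    # MBE, r = 10
print(prob_at_least_one_success(0.125, 5))   # VQE, r = 5
print(repetitions_needed(0.5, 0.999), repetitions_needed(0.125, 0.999))

# Parameter count 4nr for r = 10 repetitions of an n = 100, L = 7 circuit.
print(4 * 100 * 10)
```

Under this model, $p = 0.5$ gives roughly 97% for $r = 5$ and above 99.9% for $r = 10$, matching the quoted values, whereas $p = 0.125$ gives just under 49% for $r = 5$.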
MBE also offers superior performance over traditional VQE in terms of the diversity of tenable graphs (Fig.  4b, right). For r = 10, not only does two-graph MBE find optimal solutions for all of the complete n = 20 graphs tested (compared to 90% for VQE), its parallel implementation doubles the number of MaxCut instances optimized.
Simulation Considerations - Numerically, $\mathcal{L}_{\mathrm{MBE}}$ is more compact for large or dense graphs, where the MPO $H$ quickly becomes cumbersome. However, for the single-qubit measurements required for $\mathcal{L}_{\mathrm{MBE}}$, contraction with a simple, single-qubit operator needs to occur $n$ times. In order to efficiently compute $n$ single-qubit measurements on large, exact tensor networks without either reconstructing an exponentially large ($2^{n/2}$) space or contracting over the full network $\sim n$ times, we use an efficient partial trace-based contraction scheme in which we construct $k$ distinct reduced density matrix operators
$$\rho_k = \mathrm{Tr}_{\bar{K}}\, |\psi\rangle\langle\psi|,$$
where $K$ is the $k$th set of kept indices and $\bar{K}$ is its complement. $K$ should be sufficiently small so that the $2^{|K|}$ elements of $\rho_k$ remain numerically tractable. For each $\rho_k$, $|K|$ smaller partial traces are done to isolate single-qubit density matrices $\rho_q$, with which we take the single-qubit expectation values of Eq. 12,
$$\langle\sigma^\zeta_q\rangle = \mathrm{Tr}\left( \sigma^\zeta_q\, \rho_q \right),$$
where $\zeta = z, x$.
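The final step of this scheme, isolating a single-qubit density matrix $\rho_q$ and evaluating $\mathrm{Tr}(\sigma^\zeta_q \rho_q)$, can be illustrated on a small dense state vector. This sketch does not reproduce the tensor network contraction itself, which avoids ever forming the full state; it assumes a real-valued state stored as a list of $2^n$ amplitudes with qubit $q$ as bit $q$ of the index, and the function names are hypothetical.

```python
import math


def reduced_single_qubit_density(state, qubit):
    """Trace out all other qubits of a real state vector; returns a 2x2 matrix."""
    mask = 1 << qubit
    rho = [[0.0, 0.0], [0.0, 0.0]]
    for idx, a0 in enumerate(state):
        if idx & mask:
            continue  # each (bit = 0, bit = 1) amplitude pair is visited once
        a1 = state[idx | mask]
        rho[0][0] += a0 * a0
        rho[0][1] += a0 * a1
        rho[1][0] += a1 * a0
        rho[1][1] += a1 * a1
    return rho


def expectation_z(rho):
    """Tr(sigma^z rho)."""
    return rho[0][0] - rho[1][1]


def expectation_x(rho):
    """Tr(sigma^x rho)."""
    return rho[0][1] + rho[1][0]


s = 1 / math.sqrt(2)
bell = [s, 0.0, 0.0, s]          # (|00> + |11>) / sqrt(2)
plus_zero = [s, s, 0.0, 0.0]     # qubit 0 in |+>, qubit 1 in |0>
for name, psi in (("bell", bell), ("plus_zero", plus_zero)):
    for q in range(2):
        rho_q = reduced_single_qubit_density(psi, q)
        print(name, q, expectation_z(rho_q), expectation_x(rho_q))
```

For the Bell state both expectation values vanish on each qubit, while the product state saturates the single-qubit normalization condition on both qubits.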

IV. DISCUSSION
In this manuscript, we introduced Multi-Basis Encoding (MBE), a novel technique for quantum optimization algorithms. MBE's performance on a diverse set of graphs exceeds that of traditional VQAs. MBE also provides meaningful efficiency improvements over similar VQAs, potentially closing the gap between near-term implementations and quantum advantage by reducing the overhead of quantum algorithms. These efficiency improvements include up to a quadratic reduction in circuit measurements, as well as a factor of two decrease in required qubits, which can readily be extended to a factor of three with the inclusion of the $y$-basis. While simulated using classically tractable ansatze, the performance of our algorithm benefits from increased circuit depth. As the classical simulation complexity increases exponentially in circuit depth, this indicates that MBEs may enjoy meaningful quantum advantages at scale. Furthermore, when we extend our definition of accuracy to encompass probabilistic sampling of various circuit initializations, we find that remarkably few quantum resources are requisite for classical optimization problems.
MBE can be expanded to a broad framework of multi-axis qubit encodings, which would include any nonlinear quantum loss function that permits the optimization of multiple, mutually regularizing observables on a single qubit. These findings are likely to spur additional research in efficient qubit encodings and the application of our techniques to related algorithms. These include algorithms with high circuit depth or high circuit connectivity, which are intractable on classical hardware and thus represent clear opportunities for quantum advantage. Since deeper circuits are attainable with more efficient tensor contraction methods or distributed computing efforts, this work encourages further development of large-scale quantum simulation with tensor methods. Most critically, as these simulations are ultimately memory-bound, the implementation of MBE at scale constitutes a strong and novel candidate for quantum advantage.
We also leverage the powerful tensor techniques packaged in TensorLy-Quantum to complete large-scale simulations of effective optimization algorithms on a single, consumer-grade GPU. To our knowledge, we have produced the largest to-date simulation of a quantum algorithm for a nonlocally connected optimization problem that rivals classical performance.
Such a successful and large-scale implementation demonstrates that simple and low-rank tensor representations are sufficient to model various techniques in quantum machine learning, and to do so without truncation or approximation. Finally, through the use of large-scale nonlocally connected graphs, we demonstrate that the global qubit connectivity and high entanglement capacity lacked by both the MPS formalism and linearly connected near-term quantum devices do not preclude quantum optimization routines.