Improving quantum-to-classical data decoding using optimized quantum wavelet transform

One of the challenges facing current noisy-intermediate-scale-quantum devices is achieving efficient quantum circuit measurement or readout. The process of extracting classical data from the quantum domain, termed in this work as quantum-to-classical (Q2C) data decoding, generally incurs significant overhead, since the quantum circuit needs to be sampled repeatedly to obtain useful data readout. In this paper, we propose and evaluate time-efficient and depth-optimized Q2C methods based on the multidimensional, multilevel-decomposable, quantum wavelet transform (QWT) whose packet and pyramidal forms are leveraged and optimized. We also propose a zero-depth technique that uses selective placement of measurement gates to perform the QWT operation. To demonstrate their efficiency, the proposed techniques are quantitatively evaluated in terms of their temporal complexity (circuit depth and execution time), spatial complexity (total gate count), and accuracy (fidelity/similarity) in comparison to existing Q2C techniques. Experimental evaluations of the proposed Q2C methods are performed on a 27-qubit state-of-the-art quantum computing device from IBM Quantum using real high-resolution multispectral images. The proposed QHT-based Q2C method achieved up to 15× higher space efficiency than the QFT-based Q2C method, while the proposed zero-depth method achieved up to 14% and 78% improvements in execution time compared to conventional Q2C and QFT-based Q2C, respectively.


Introduction
Quantum computers can take advantage of unique quantum mechanical properties, i.e., superposition and entanglement, to achieve speedup in computation [1] over classical computers for specific problems such as large integer factorization and unstructured database search [2,3]. Nevertheless, existing noisy intermediate-scale quantum (NISQ) devices have limited practical applications [4] due to critical challenges [5], such as decoding meaningful classical data from the quantum domain. For example, in applications like quantum image processing, where information is usually encoded as quantum state amplitudes [6], repeated sampling of the quantum circuit is required to generate a probability distribution from which the processed image data can be recovered [7]. The process of obtaining data from the quantum domain, henceforth called quantum-to-classical (Q2C) data decoding, introduces significant overhead in the circuit execution time, necessitating further investigation of time-efficient data decoding methods.
The main contributions of this paper are summarized as follows:
• We propose and evaluate techniques for efficient Q2C data decoding based on the multidimensional, multilevel-decomposable quantum wavelet transform (QWT) [8][9][10][11].
• We investigate and optimize the quantum Haar transform (QHT) for performing multidimensional and multilevel decomposition in either packet or pyramidal form. Multilevel-decomposable QHT has been proven to be effective for reducing the dimensionality of high-resolution spatio-spectral data while maintaining spatial and temporal locality [12]. It is also reported that sampling a lower-dimensional space reduces execution time, thus improving the Q2C decoding process [8]. By applying QHT to the output of a quantum circuit, we show that the resulting quantum state can be represented with fewer qubits, reducing dimensionality from a higher-dimensional space to a lower-dimensional space.
• We also present the quantum circuits and accompanying circuit depth analysis corresponding to the proposed QHT-based approach, demonstrating its space and time efficiency.
• We introduce a highly depth-optimized technique, called 'measurement-based' QHT decomposition, which eliminates the need for a supplementary quantum circuit. In this approach, the measurement of a select subset of qubits allows us to sample the representative output data in a lower-dimensional space.
• We evaluate the proposed quantum methods and circuits for Q2C on the Qiskit SDK from IBM Quantum [13] using their general-purpose Aer simulator and ibmq_toronto quantum device. By experimentally determining circuit depth, calculating data correlation, and measuring execution time, a quantitative comparison of the proposed Q2C methods with state-of-the-art techniques is presented. Additionally, the proposed Q2C methods are compared with a reported Q2C readout technique based on the quantum Fourier transform (QFT) [14]. The experimental results show that our proposed methods are more time and space efficient compared to existing methods.
The rest of the paper is organized as follows: Section 2 discusses background concepts and related work. Section 3 presents the proposed method and quantum circuits. Section 4 shows the experimental work and results with accompanying analysis. Finally, Sect. 5 concludes our work and discusses potential future work.

Background and related work
In this section, we discuss basic quantum concepts in addition to the fundamental quantum gates used for Q2C data decoding. Related work will also be discussed.
In this paper, we will utilize the following mathematical notation to describe leveraged quantum concepts. An $n$-qubit quantum state $|\psi_n\rangle$ can be represented by a normalized statevector of $N = 2^n$ complex state amplitudes/coefficients $c_j \in \mathbb{C}$, where $0 \le j < N$, as shown in (1). When encoding $d$-dimensional data of a total size of $N$ data points, each dimension $i$ requires $n_i = \lceil \log_2 N_i \rceil$ qubits to encode the $N_i$ data points of dimension $i$, where $0 \le i < d$.

$$|\psi_n\rangle = \sum_{j=0}^{N-1} c_j\,|j\rangle, \quad \text{where} \quad \sum_{j=0}^{N-1} |c_j|^2 = 1 \tag{1}$$
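As a quick illustration of the qubit-count rule above, the following minimal Python sketch computes the number of qubits per dimension as the ceiling of the base-2 logarithm of the dimension size. The function names are hypothetical and introduced only for this example; the two image shapes are the smallest and largest sizes reported in the experimental section.

```python
def qubits_per_dimension(shape):
    """Return n_i = ceil(log2(N_i)) for every dimension size N_i >= 1."""
    # (N_i - 1).bit_length() equals ceil(log2(N_i)) for positive integers
    # and avoids floating-point rounding inside log2.
    return [(size - 1).bit_length() for size in shape]


def total_qubits(shape):
    """Total number of qubits n needed to amplitude-encode the data."""
    return sum(qubits_per_dimension(shape))


small = qubits_per_dimension((8, 8, 3))
large = qubits_per_dimension((4096, 4096, 3))
print(small, total_qubits((8, 8, 3)))
print(large, total_qubits((4096, 4096, 3)))
```

Under this rule the two shapes need 8 and 26 qubits, respectively, which matches the circuit sizes quoted for the experiments.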

Quantum gates
This subsection details the function, matrix representation, and gate representation for the various quantum gates that are used in our proposed circuits.

Hadamard gate
The Hadamard gate [14] is a single-qubit gate, as described by (2), that can be used to create a superposition of the $|0\rangle$ and $|1\rangle$ basis states.

$$H = \frac{1}{\sqrt{2}} \begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix} \tag{2}$$

SWAP gate
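The SWAP gate is a two-qubit quantum gate, as described by (3), that exchanges the states of the two input qubits, e.g., applying the SWAP operation on the $|q_1 q_0\rangle$ state would result in the state $|q_0 q_1\rangle$.

$$\text{SWAP} = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix} \tag{3}$$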

Quantum rotate-left (RoL) and rotate-right (RoR) operations
We define the Rotate-Left (RoL) and Rotate-Right (RoR) gates as specialized permutation operations that perform a cyclic rotation, i.e., perfect-shuffle, of the input qubits, as shown in Fig. 1. Each gate can be constructed from SWAP gates, where a perfect-shuffle operation over $n$ qubits requires $n - 1$ SWAP gates in series.
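The SWAP-chain construction can be checked numerically. The sketch below is an illustration under stated assumptions, not the circuit from Fig. 1: it assumes that qubit 0 is the least-significant bit of the basis-state index and that RoR moves the least-significant qubit to the most-significant position (the direction described later for the QHT circuit). The helper names `swap_qubits`, `rotate_right`, and `rotate_right_direct` are hypothetical.

```python
import numpy as np


def swap_qubits(state, a, b):
    """Apply a SWAP between qubits a and b to a statevector (qubit 0 = LSB)."""
    idx = np.arange(state.size)
    # Bits a and b only need to be flipped where they differ.
    differ = ((idx >> a) ^ (idx >> b)) & 1
    swapped = idx ^ (differ << a) ^ (differ << b)
    out = np.empty_like(state)
    out[swapped] = state
    return out


def rotate_right(state, n):
    """RoR built from n - 1 adjacent SWAP gates applied in series."""
    for q in range(n - 1):
        state = swap_qubits(state, q, q + 1)
    return state


def rotate_right_direct(state, n):
    """Reference permutation: the LSB of every index becomes its MSB."""
    idx = np.arange(state.size)
    target = ((idx & 1) << (n - 1)) | (idx >> 1)
    out = np.empty_like(state)
    out[target] = state
    return out


n = 4
state = np.arange(2 ** n, dtype=float)  # amplitudes labelled by their index
shuffled = rotate_right(state, n)
print(shuffled)
```

With this convention the even-indexed amplitudes end up in the first half of the statevector and the odd-indexed amplitudes in the second half, which is the perfect-shuffle behaviour the text refers to.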

Measurement gate
Measuring (observing) qubits is a non-unitary (irreversible) operation. A measurement (readout) gate is a single-qubit operation that observes a quantum state $|\psi_1\rangle$ with respect to a computational basis [14] and assigns the corresponding single value to a classical register. In other words, a measurement gate projects the quantum state to one of its basis states, i.e., $|0\rangle$ or $|1\rangle$ for a single-qubit state, with a probability equal to the square of the magnitude of the basis state coefficient, i.e., $p_0 = |c_0|^2$ and $p_1 = |c_1|^2$.
In general, when all qubits of a quantum state $|\psi_n\rangle$ in an $n$-qubit quantum circuit are fully measured, the probability of finding the qubits in a given state $|j\rangle$ is given by $|c_j|^2$, and the full-measurement probability distribution of finding the qubits in all possible states can be expressed as $P(\psi_n)$, see (5a) and Fig. 2a. When excluding a subset of $m$ qubits of the $n$ qubits from measurement, the partial-measurement probability distribution can be expressed as a conditional probability $P(\psi_{n-m} \mid q_{m-1} \ldots q_0)$, where each of the unmeasured $m$ qubits could arbitrarily be in either a 0 or 1 state. In other words, $|q_{m-1} \ldots q_0\rangle$ could be in any one of the $2^m$ possible states. Equation (5b) and Fig. 2b show an example of one qubit, i.e., the least-significant qubit $q_0$, being excluded from the partial measurement of the remaining $n - 1$ qubits. It is worth mentioning that for every $m$ qubits that are excluded from the partial measurements, the number of measured basis states, and consequently the size of the partial-measurement probability distribution, is reduced by a factor of $2^m$, i.e., it equals $N/2^m = 2^{n-m} = 2^k$, where $k = n - m$ is the number of measured qubits, see (5b) and Fig. 2b.
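The size reduction under partial measurement can be illustrated with a small NumPy sketch. It assumes that excluding the $m$ least-significant qubits amounts to summing the full-measurement probabilities over those qubits, and that qubit 0 is the least-significant bit of the basis-state index; the function names are hypothetical.

```python
import numpy as np


def full_distribution(state):
    """Full-measurement distribution: |c_j|^2 for every basis state j."""
    return np.abs(state) ** 2


def partial_distribution(state, m):
    """Distribution over the n - m measured qubits when the m
    least-significant qubits are excluded from measurement."""
    probs = full_distribution(state)
    # Each row groups the 2**m basis states that differ only in the
    # excluded qubits; summing a row marginalizes them out.
    return probs.reshape(-1, 2 ** m).sum(axis=1)


amplitudes = np.array([1, 2, 3, 4, 4, 3, 2, 1], dtype=float)
state = amplitudes / np.linalg.norm(amplitudes)   # n = 3 qubits, N = 8

p_full = full_distribution(state)          # 8 entries
p_part = partial_distribution(state, m=1)  # 4 entries: N / 2**m
print(p_full.size, p_part.size)
```

Excluding one qubit halves the number of outcomes that have to be resolved by sampling, while the distribution still sums to one.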

Circuit depth
The depth of a quantum circuit is calculated from the critical path that has the largest propagation delay accumulated from cascaded gates through the circuit [15]. Quantum circuits also accumulate gate errors throughout their runtime [16], which compound with deeper circuits. Therefore, circuit depth determines the total execution time of the quantum circuit on a physical device and is often used as a metric for quantitatively evaluating the speed and performance of quantum circuits. In addition, it could be utilized as a useful indication of the quality of results (fidelity) of quantum circuits. Therefore, minimizing/optimizing circuit depth would result in performance and fidelity improvements [16]. However, the magnitudes of gate delay and error vary depending on the type of gate operation, e.g., H, SWAP, etc. Thus, without considering those differences, depth alone can only provide a speculative analysis of a circuit's execution time and result fidelity. In a previous work [11], we described how to use circuit depth analysis to calculate the expected execution time on a physical quantum device. In this work, we extend our analysis to further optimize the depth of the proposed circuits where different operations are executed in parallel on the same circuit layer.

Wavelet transforms
In the classical domain, a wavelet transform decomposes signals/data into their spatiotemporal spectral components using non-sinusoidal functions called mother wavelets [17]. Unlike other transforms, the wavelet transform preserves the spatial locality of the data, i.e., the transformed data provides time and frequency information about the input data. Wavelet transforms in general perform computationally better than other transforms [17] and, therefore, are popular in image processing applications. The general expression for a continuous wavelet transform is given by (6), where $\Psi$ is the mother wavelet function in complex conjugate form, and $a$, $b$ are the time dilation and displacement factors.
Wavelet transforms can be of continuous and/or discrete forms. For the purposes of this paper, we will discuss the discrete wavelet transform (DWT), specifically the Haar transform, which is the first and simplest DWT. The Haar transform utilizes a mother wavelet that can be constructed using a basic unit step function $u(t)$ [11].
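For reference, one commonly used textbook form of the continuous wavelet transform referred to in (6) is sketched below. The symbols $x(t)$ for the input signal and $W(a, b)$ for the transform, as well as the $1/\sqrt{|a|}$ normalization, are assumptions made here for illustration and may differ from the notation of the original equation.

$$W(a, b) = \frac{1}{\sqrt{|a|}} \int_{-\infty}^{\infty} x(t)\, \Psi^{*}\!\left(\frac{t - b}{a}\right) dt$$

Here $\Psi^{*}$ denotes the complex conjugate of the mother wavelet, $a$ dilates it in time, and $b$ displaces it along the time axis.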
The Haar transform can be performed in either packet decomposition or pyramidal decomposition form, differentiated by how multiple levels of decomposition are performed. After the initial level of decomposition, packet decomposition performs subsequent levels of decomposition on both the low-frequency and high-frequency components, while pyramidal decomposition restricts further decomposition to only the low-frequency components [17,18]. The Haar transform can be applied to perform dimension reduction, as shown in Fig. 3, by separating multidimensional data into its low-frequency and high-frequency components [17,19]. The isolated low-frequency terms are usually used to represent a compressed/decomposed output where the size of each dimension of the output data is reduced by a factor of $2^{\ell}$, where $\ell$ is the number of decomposition levels [19]. If the high-frequency terms are preserved, a complete reconstruction of the original input can be accomplished via the inverse operation, see Fig. 3.
Similar to the classical Haar transform, quantum circuits can be developed to perform the so-called quantum Haar transform (QHT) [9][10][11]. For QHT circuits, the input data samples are generally encoded as the amplitudes of a superimposed input $n$-qubit quantum state $|\psi_n\rangle$, as shown in (7a). The Haar function is then applied on the state amplitudes, resulting in the state $|\psi_n\rangle_{QHT}$ represented by (7b), where $\Psi_D$ is the discrete Haar mother wavelet [11], $\Delta t$ is the sampling period, $K$ is the Haar window size in samples, and $N$ is the number of data samples. The specific quantum circuits are discussed in further detail in Sect. 3.2.
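The classical pyramidal decomposition and its inverse described above can be sketched in a few lines of NumPy. This is a minimal 1-D illustration, assuming the orthonormal $1/\sqrt{2}$ scaling (the same scaling as the Hadamard gate) and an input length that is a multiple of $2^{\ell}$; all function names are hypothetical.

```python
import numpy as np


def haar_level(x):
    """One Haar level: pairwise sums (low) and differences (high)."""
    even, odd = x[0::2], x[1::2]
    return (even + odd) / np.sqrt(2), (even - odd) / np.sqrt(2)


def inverse_haar_level(low, high):
    """Undo one Haar level by re-interleaving the samples."""
    x = np.empty(2 * low.size)
    x[0::2] = (low + high) / np.sqrt(2)
    x[1::2] = (low - high) / np.sqrt(2)
    return x


def pyramidal_haar(x, levels):
    """Pyramidal form: only the low-frequency part is decomposed again."""
    low, highs = np.asarray(x, dtype=float), []
    for _ in range(levels):
        low, high = haar_level(low)
        highs.append(high)
    return low, highs


def pyramidal_inverse(low, highs):
    """Full reconstruction, possible only if the high terms were kept."""
    for high in reversed(highs):
        low = inverse_haar_level(low, high)
    return low


x = np.array([4.0, 2.0, 6.0, 8.0, 1.0, 3.0, 5.0, 7.0])
low, highs = pyramidal_haar(x, levels=2)
restored = pyramidal_inverse(low, highs)
print(low.size, restored)
```

After two levels the low-frequency output holds $8 / 2^2 = 2$ samples, and keeping the high-frequency terms allows the input to be recovered exactly (up to floating-point rounding).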

Related work
Conventional quantum-to-classical (Q2C) data decoding for a given quantum circuit, as shown in Fig. 4, obtains the complete quantum state of a circuit by performing repeated circuit sampling, also known as 'shots'. The measurements are used to construct a probability distribution of the possible discrete basis states, where the normalized frequencies of measurements represent the squares of the magnitudes of the output quantum state coefficients. The number of repeated measurements correlates with the accuracy of the data relative to the expected output quantum state. Generally, a large number of repeated circuit samples is required to improve the accuracy of measurements and minimize the effects of statistical noise, which adds a significant overhead to the total circuit execution time.
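The dependence of readout accuracy on the number of shots can be imitated classically by drawing samples from a known distribution. The sketch below is a stand-in for repeated circuit sampling, not an execution on a quantum backend; the distributions, shot counts, and the function name `sample_distribution` are arbitrary choices made for illustration.

```python
import numpy as np


def sample_distribution(probs, shots, seed=0):
    """Estimate a distribution from `shots` samples (normalized counts)."""
    rng = np.random.default_rng(seed)
    counts = rng.multinomial(shots, probs)
    return counts / shots


# Few basis states, many shots: the estimate is close to the truth.
weights = np.random.default_rng(1).random(8)
probs_small = weights / weights.sum()
est_small = sample_distribution(probs_small, shots=100_000)
max_error = np.abs(est_small - probs_small).max()

# Many basis states, few shots: most states are never observed.
probs_large = np.full(2 ** 16, 1.0 / 2 ** 16)
est_large = sample_distribution(probs_large, shots=1_000)
observed_states = np.count_nonzero(est_large)

print(max_error, observed_states)
```

In the second case at most 1,000 of the 65,536 basis states can receive a nonzero count, so the reconstructed distribution is mostly zeros regardless of the true state.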
To minimize the overhead of repeated circuit sampling, algorithms can be appended to a circuit immediately prior to measurement, which typically will attempt to decrease either the number of measured qubits or the number of required shots to sample the quantum state. In [14], the authors proposed a Q2C data decoding technique leveraging the quantum Fourier transform (QFT) algorithm to sample the quantum circuit output in the Fourier basis and extract a collective property of the amplitude data, see Fig. 5. The QFT-based technique uses fewer circuit samples than the conventional approach, since a comprehensive probability distribution is not reconstructed but only the Fourier basis states are measured. Data decoding using QFT is particularly relevant for image or audio processing applications, where spectral bandwidth, as an example of a collective property, is useful for analyzing the output data [14]. However, a drawback of the technique is that it does not decode the actual data from its quantum state and only reveals the sought collective property or feature of the data. Moreover, the complexity and poor parallelism of the QFT algorithm also result in deep circuits and a large overall timing overhead in the circuit.
In our previous work [8], we introduced packet and pyramidal decomposable quantum Haar transform (QHT) circuits for performing quantum-to-classical (Q2C) data decoding. By applying multilevel-decomposable QHT, data represented by $n$ qubits can be transformed to a form represented by fewer qubits $k = n - (\ell \cdot d)$, where $0 \le k \le n$, $0 \le \ell \le (n/d)$ is the number of decomposition levels, and $d \ge 1$ is the dimensionality of the data. For example, $d = 1$ for 1-D data of $N_0$ data points, $d = 2$ for 2-D data of $(N_0 \times N_1)$ data points, and $d = 3$ for 3-D data of $(N_0 \times N_1 \times N_2)$ data points, etc. In this work, we extend and optimize the packet and pyramidal circuits and propose a new measurement-based decomposable QHT technique of zero gate depth. We also present comprehensive experimental evaluations of all proposed quantum circuits using real, high-resolution RGB images. In addition, we apply multilevel inverse QHT to reconstruct the decomposed data and evaluate the result fidelity of the Q2C methods in terms of similarity metrics such as data correlation.

Proposed methodology and circuits
This section outlines our proposed and optimized QHT-based methods and circuits for data decoding in the context of the general Q2C approach discussed previously. We first describe the basic QHT circuit for single-level, $d$-dimensional decomposition. Following that, we present three methods that extend the single-level operation over multiple decomposition levels and discuss their corresponding quantum circuits.


Methodology
The quantum Haar transform (QHT) provides a number of benefits for our quantum-to-classical (Q2C) data decoding method. More specifically, QHT preserves the spatial and temporal locality of data such that the decomposed data possesses a spatial and temporal resemblance to the original data [17]. Additionally, QHT is generalizable for multidimensional data, decomposable for multiple levels, and can be implemented with relatively shallow and parallel circuits.
By leveraging multidimensional multilevel-decomposable QHT, we can inherently perform dimension reduction (compression) of data while preserving its general spatial and temporal characteristics. In other words, QHT allows us to decode data at a decreased qubit cost/count from $n$ qubits to $k = n - (\ell \cdot d)$ qubits, where $0 \le k \le n$, $0 \le \ell \le (n/d)$ is the number of decomposition levels, and $d \ge 1$ is the dimensionality of the data. Reducing the number of qubits used in data representation will subsequently reduce the measurement and data decoding time. The proposed methodology for QHT-based Q2C data decoding is shown in Fig. 6.
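The qubit saving can be made concrete with a short sketch of the relation between the total qubit count, the number of decomposition levels, and the number of decomposed dimensions. The function name is hypothetical, and the example values (26 qubits, 7 levels, 2 decomposed dimensions) are chosen to mirror the largest configuration discussed in the experimental section.

```python
def measured_qubits(n, levels, d):
    """Qubits left to measure after `levels` QHT levels over d dimensions."""
    if d < 1:
        raise ValueError("dimensionality d must be at least 1")
    if not 0 <= levels <= n // d:
        raise ValueError("levels must satisfy 0 <= levels <= n / d")
    return n - levels * d


n, levels, d = 26, 7, 2
k = measured_qubits(n, levels, d)
# Every excluded qubit halves the number of basis states to be sampled.
reduction_factor = 2 ** (levels * d)
print(k, reduction_factor)
```

For these values only 12 of the 26 qubits are measured, so the sampled distribution is $2^{14}$ times smaller than under full measurement.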

Proposed quantum circuits
The QHT algorithm can be represented by a generalized $d$-dimensional operation denoted as $U_{d\text{-D-QHT}}$ henceforth, as depicted in Fig. 7. When encoding multidimensional data as the state amplitudes, a contiguous subset of $n_i$ qubits is used to represent the $i$th dimension of data, where $0 \le i < d$. As shown in Fig. 7, $U_{d\text{-D-QHT}}$ performs a single level of decomposition over all $d$ dimensions in parallel. It applies a Hadamard (H) gate at the least-significant qubit of every dimension to extract both the low-frequency (slow-changing) and high-frequency (fast-changing) components of the input data, followed by a RoR (perfect-shuffle) operation to spatially separate the low-frequency components from the high-frequency components [8]. It is worth mentioning that the low-frequency components constitute a compressed and approximate version of the original data represented at a lower resolution, i.e., using fewer data samples. To decode both the low-frequency and high-frequency components of data, all $n$ qubits must be fully measured. However, the low-frequency components are usually desired and it is sufficient to partially measure only the $n_i - 1$ least-significant qubits for each dimension, which now contain the low-frequency components after the perfect-shuffle operation, see Fig. 7.
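The effect of the single-level operation on the state amplitudes can be sketched for the 1-D case with NumPy. This is an illustrative statevector calculation rather than the circuit of Fig. 7; it assumes that qubit 0 is the least-significant bit of the basis-state index and that RoR moves that qubit to the most-significant position. The function names are hypothetical.

```python
import numpy as np


def hadamard_on_lsq(state):
    """Apply H to the least-significant qubit of a statevector."""
    pairs = state.reshape(-1, 2)  # (c_2m, c_2m+1) share all other qubits
    low = (pairs[:, 0] + pairs[:, 1]) / np.sqrt(2)
    high = (pairs[:, 0] - pairs[:, 1]) / np.sqrt(2)
    return np.stack([low, high], axis=1).reshape(-1)


def rotate_right(state):
    """Perfect shuffle: even-indexed amplitudes first, odd-indexed last."""
    return np.concatenate([state[0::2], state[1::2]])


def qht_1d(state):
    """Single-level 1-D QHT: H on the least-significant qubit, then RoR."""
    return rotate_right(hadamard_on_lsq(state))


data = np.array([1.0, 3.0, 5.0, 7.0, 2.0, 2.0, 4.0, 0.0])
state = data / np.linalg.norm(data)  # amplitude encoding on 3 qubits
out = qht_1d(state)
print(out[:4])  # low-frequency terms
print(out[4:])  # high-frequency terms
```

The first half of the output holds the scaled pairwise sums (low-frequency terms) and the second half the scaled pairwise differences (high-frequency terms), while the norm of the state is preserved.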
As shown in Fig. 7, every contiguous $n_i$ qubits that are used for encoding the $i$th data dimension contain one H gate followed in series by the $n_i - 1$ SWAP gates that perform the RoR operation. Therefore, the depth of the $U_{d\text{-D-QHT}}$ operation can be determined by the depth of the critical path across all dimensions, as shown in (8a). The execution time $t$ of the $U_{d\text{-D-QHT}}$ operation on physical quantum hardware can be estimated using the gate delays of the H and SWAP gates, as expressed by (8b). As a metric for the space complexity (cost) of our proposed circuits, the total gate count can be derived from Fig. 7 and expressed as shown in (9).
It is useful to determine the maximum number of levels $\ell_{\max}$ of lossless decomposition. Assuming that decomposition is symmetrically performed on all data dimensions, $\ell_{\max}$ is bounded by the number of qubits $n_{\min}$ that are used to encode the data dimension with the fewest data samples, see (10).
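The quantities described in the two preceding paragraphs can be written out as follows. This is a reconstruction from the prose (one H gate followed by $n_i - 1$ SWAP gates per dimension, all dimensions in parallel), not a verbatim copy of (8a) to (10); the symbols $\delta$ for depth, $\tau_H$ and $\tau_{SWAP}$ for gate delays, and $G$ for gate count are assumptions introduced here.

$$\delta = \max_{0 \le i < d} \big(1 + (n_i - 1)\big) = n_{\max}, \qquad t = \tau_H + (n_{\max} - 1)\,\tau_{SWAP}$$

$$G = \sum_{i=0}^{d-1} \big(1 + (n_i - 1)\big) = \sum_{i=0}^{d-1} n_i, \qquad \ell_{\max} \le n_{\min} = \min_{0 \le i < d} n_i$$

For example, for two spatial dimensions encoded with $n_0 = n_1 = 12$ qubits each, this reading gives a depth of 12 layers, 24 gates in total, and at most 12 levels of decomposition.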

Interleaved packet decomposition
The multilevel packet decomposition variant of QHT repeatedly applies the $U_{d\text{-D-QHT}}$ operation over all qubits for each level of decomposition, as shown in Fig. 8. Here, we leverage and extend our previous work [8,11], where we presented equations for deriving the circuit depth and the hardware execution time of the packet decomposition circuit when the $U_{d\text{-D-QHT}}$ operations are applied in series. However, it is possible to further minimize the circuit depth by interleaving the $U_{d\text{-D-QHT}}$ operations, i.e., overlapping the H and SWAP gates among multiple decomposition levels, which enables concurrent execution of these gates and results in a reduced overall circuit depth. The optimized circuit for packet decomposition incurs only two additional layers of SWAP gates for every additional interleaved level of decomposition, which is reflected in the expressions of (11a) and (11b) for the circuit depth and execution time, respectively. The total gate count for the multilevel packet decomposition QHT circuit is derived from (9) and Fig. 8 and is given by the expression in (12).
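The depth saving from interleaving can be sketched numerically. The functions below encode one reading of the prose (a single-level depth of $n_{\max}$ plus two SWAP layers per additional level, and one full single-level operation per level for the gate count); they are not taken from (11a) to (12), and the function names and gate-delay values are hypothetical.

```python
def packet_depth_interleaved(qubits_per_dim, levels):
    """Depth when levels overlap: n_max plus two SWAP layers per extra level."""
    return max(qubits_per_dim) + 2 * (levels - 1)


def packet_depth_serial(qubits_per_dim, levels):
    """Depth when the single-level operation is simply repeated in series."""
    return levels * max(qubits_per_dim)


def packet_time(qubits_per_dim, levels, tau_h, tau_swap):
    """Estimated time: one H layer followed by SWAP layers (assumed)."""
    swap_layers = packet_depth_interleaved(qubits_per_dim, levels) - 1
    return tau_h + swap_layers * tau_swap


def packet_gate_count(qubits_per_dim, levels):
    """Gate count assuming every level applies the full operation."""
    return levels * sum(qubits_per_dim)


dims = (12, 12)  # qubits per decomposed dimension
for levels in (1, 3):
    print(levels,
          packet_depth_serial(dims, levels),
          packet_depth_interleaved(dims, levels),
          packet_gate_count(dims, levels))

# Hypothetical gate delays in arbitrary time units.
print(packet_time(dims, 1, tau_h=1.0, tau_swap=3.0))
```

Under these assumptions, three levels over two 12-qubit dimensions need 36 layers in series but only 16 layers when interleaved, while the gate count is unchanged by interleaving.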

Interleaved pyramidal decomposition
In pyramidal decomposition, $U_{d\text{-D-QHT}}$ is applied on $d$ fewer data qubits (one qubit per dimension) for every level of decomposition, as shown in Fig. 9a. While reducing the size of $U_{d\text{-D-QHT}}$ would present tangible benefits to overall circuit size and depth compared to packet decomposition, additional interlevel permutations are required to preserve data locality among the different levels of decomposition, see Fig. 9b.
Similar to packet decomposition as discussed in Sect. 3.2.1, we could interleave (overlap) the operations of pyramidal decomposition. When interleaved, the second level of decomposition, i.e., $\ell = 2$, adds $n - n_{\max} - d + 2$ additional gate layers to the depth of the first level of decomposition, which is comprised of the $U_{d\text{-D-QHT}}$ operation and the first set of interlevel permutations. Each following level of decomposition, i.e., $\ell > 2$, adds an additional $d$ gate layers to the overall circuit depth. It is worth mentioning that for one level of decomposition, i.e., $\ell = 1$, the circuit for pyramidal decomposition is identical to the circuit for packet decomposition. In other words, the pyramidal and packet circuit depths are both equal to $n_{\max}$ when $\ell = 1$, see (11a). Accordingly, the total depth of the interleaved pyramidal QHT decomposition could be expressed by (13a), and consequently, the execution time is given by (13b).

Fig. 9 $\ell$-level, $d$-dimensional pyramidal decomposition
The total gate count for the multilevel pyramidal decomposition circuit is derived from Fig. 9 and is given by the expression in (14a), where $n_0$ is the number of qubits required to represent the first dimension. While the pyramidal structure reduces the gate count relative to the packet decomposition circuit, as shown in Fig. 9a, it requires additional gates for interlevel permutations, as shown in Fig. 9b and expressed by (14b).

Measurement-based decomposition
The packet and pyramidal circuits are well-optimized for performing a generalized QHT operation: decomposing and spatially separating low-frequency and high-frequency components of multidimensional data as an inherent quantum operation. In the broader context of QHT-based Q2C data decoding, however, additional optimizations are also feasible, and hence, we propose our measurement-based decomposition technique.
As discussed in Sect. 3.2, the RoR (perfect-shuffle) operation in $U_{d\text{-D-QHT}}$ is useful for spatially separating the low-frequency from the high-frequency components in the decomposed quantum state while preserving the data locality. After applying the Hadamard (H) gate in the $U_{d\text{-D-QHT}}$ operation, see Fig. 7, the state amplitudes alternate between low-frequency (even indices) and high-frequency (odd indices) terms. For every dimension, applying the RoR operation to the qubits, i.e., moving the least-significant qubit to the most-significant qubit as shown in Fig. 7, spatially combines/clusters similar frequency terms together during measurements. Therefore, optimizing out all perfect-shuffle gates would not affect the overall data transformation. It does, however, reduce the overall depth of the packet and pyramidal QHT circuits, resulting in the circuit shown in Fig. 10a. The resulting circuit is composed of $\ell \cdot d$ parallel H gates spanning the least-significant qubits in each dimension for an $\ell$-level, $d$-dimensional decomposition. The simplified circuit is noteworthy for having a constant circuit depth of one H gate, independent of the number of decomposition levels.
As shown in Fig. 11a, when an H gate is applied to the least-significant qubit of an $n$-qubit state $|\psi_n\rangle$ as described by (1), the resultant state could be represented by $|\psi_n^H\rangle$, whose full-measurement probability distribution $P(\psi_n^H)$ is given in (15b). Furthermore, (15c) and Fig. 11b display the partial-measurement conditional probability distribution of $|\psi_n^H\rangle$ when the least-significant qubit $q_0$ is excluded from measurements after applying the H gate.
It could be concluded based on (15c) and (5b) that the circuits shown in Fig. 11b and Fig. 2b are equivalent, i.e., the H gate is effectively non-existent. As such, when performing QHT-based Q2C data decoding and only measuring the low-frequency qubits, it is possible to ignore the H gates and create a circuit that can perform decomposition using only measurement gates, as shown in Fig. 10b. Therefore, such a zero-depth circuit allows us to perform dimensionally reduced Q2C data decoding using $\ell$-level, $d$-dimensional QHT by conducting partial measurements while excluding the least-significant qubits in each of the $d$ dimensions of the data, see Fig. 10b. Note, however, that the zero-depth circuit is restricted only to decomposition, i.e., partially measuring $k$ qubits from an $n$-qubit state, where $0 \le k \le n$. When performing reconstruction via inverse-QHT, the Hadamard gates will be necessary to restore the high-frequency components for accurate and full data reconstruction/recovery.
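The equivalence argued above can be checked numerically: applying H to a qubit that is then left unmeasured does not change the distribution over the remaining qubits, because $\tfrac{1}{2}|c_{2j} + c_{2j+1}|^2 + \tfrac{1}{2}|c_{2j} - c_{2j+1}|^2 = |c_{2j}|^2 + |c_{2j+1}|^2$. The NumPy sketch below uses a random complex state and hypothetical helper names; it assumes qubit 0 is the least-significant bit of the basis-state index.

```python
import numpy as np


def apply_h_on_lsq(state):
    """Apply H to the least-significant qubit of a statevector."""
    pairs = state.reshape(-1, 2)
    plus = (pairs[:, 0] + pairs[:, 1]) / np.sqrt(2)
    minus = (pairs[:, 0] - pairs[:, 1]) / np.sqrt(2)
    return np.stack([plus, minus], axis=1).reshape(-1)


def distribution_without_lsq(state):
    """Partial-measurement distribution with qubit q0 left unmeasured."""
    probs = np.abs(state) ** 2
    return probs.reshape(-1, 2).sum(axis=1)


rng = np.random.default_rng(7)
state = rng.normal(size=16) + 1j * rng.normal(size=16)  # 4 qubits
state /= np.linalg.norm(state)

with_h = distribution_without_lsq(apply_h_on_lsq(state))
without_h = distribution_without_lsq(state)
print(np.max(np.abs(with_h - without_h)))
```

The two distributions agree to floating-point precision, which is why the H gates can be dropped when only the low-frequency qubits are measured.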

Experimental results
The efficacy of our proposed QHT-based Q2C data decoding methods was verified by encoding various sizes of 3D data (RGB images) on both quantum simulators and actual quantum hardware, followed by applying QHT for various levels of decomposition. The circuits ranged in size from 8 qubits to 26 qubits to encode multispectral, high-resolution images of $(8 \times 8 \times 3)$ to $(4096 \times 4096 \times 3)$ pixels. The QHT operation was restricted to two dimensions (length and width) to facilitate the maximum possible number of decomposition levels, see (10). In other words, QHT was performed only on the spatial dimensions of the images, not the color bands. Note that with only three color bands (red, green, blue), the statevector was padded with zeroes to comprise a fourth color band, since 2 qubits were required to represent the color dimension, i.e., $n_2 = \lceil \log_2 3 \rceil = 2$. The QHT-based Q2C methods were evaluated for their circuit depth and execution time as reported by the Qiskit SDK from IBM Quantum [13,15]. The Pearson correlation coefficient [20] was then used to compare the original images $x$ with the reconstructed images $y$ (reconstructed from the decomposed images using inverse-QHT), see (16), where $\bar{x}$ and $\bar{y}$ are the mean values of $x$ and $y$, respectively.
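The similarity metric can be sketched as follows, using the standard definition of the Pearson correlation coefficient applied to flattened images. The function name and the synthetic test image are hypothetical and only serve to show the computation; they are not the data used in the experiments.

```python
import numpy as np


def pearson_correlation(x, y):
    """Pearson correlation between two arrays of equal size."""
    x = np.asarray(x, dtype=float).ravel()
    y = np.asarray(y, dtype=float).ravel()
    dx = x - x.mean()  # deviations from the mean of x
    dy = y - y.mean()  # deviations from the mean of y
    return float((dx * dy).sum() / np.sqrt((dx ** 2).sum() * (dy ** 2).sum()))


rng = np.random.default_rng(3)
original = rng.random((8, 8, 3))  # stand-in for an RGB image
reconstructed = original + 0.05 * rng.normal(size=original.shape)

r = pearson_correlation(original, reconstructed)
print(r)
```

A value close to 1 indicates that the reconstructed image closely follows the original, while values near 0 indicate that the structure of the image has been lost.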
In addition, experiments using conventional and QFT-based Q2C data decoding were performed on the same dataset for comparison against the QHT-based techniques. Conventional Q2C data decoding was implemented by measuring all qubits in each circuit and was evaluated in terms of Pearson correlation and hardware execution time, see Figs. 15, 16, 17, 18. Using the QFT implementation built into Qiskit [21], we were able to evaluate QFT-based Q2C in terms of circuit depth, see Tables 1 and 2, execution time, see Fig. 17, and total gate count, see Tables 3 and 4.
All Q2C methods were implemented on Qiskit version 0.39.4 [13]. Simulation results were collected using the Aer simulator on a dedicated node of a high-performance computing (HPC) cluster at the University of Kansas (KU). The cluster node used for our experiments is configured with two 12-core Intel Xeon E5-2680 v3 CPUs operating at a base clock of 2.50 GHz, PCIe Gen 3.0 connectivity, and 503 GB of available memory configured as 8 × 64 GB physical DDR4 DIMMs operating at 2,133 MHz. Experiments on actual quantum hardware were performed on ibmq_toronto, an IBM Quantum Falcon r4 processor equipped with 27 qubits [22]. The quantum device has a median CNOT error of $1.065 \times 10^{-2}$, median readout error of $2.360 \times 10^{-2}$, median T1 of 105.97 μs, and median T2 of 101.9 μs [22].

Accuracy of quantum Haar transform
During decomposition, information degradation arises from the loss of high-frequency components after each level of QHT, compounded by additional losses due to typical gate noise and statistical errors of quantum circuits. Experimental correlation results were gathered to quantify information loss for 32,000 shots (the maximum available on ibmq_toronto) and 1,000,000 shots (the maximum available for simulation), see Figs. 12 and 13, respectively. The decomposed images were reconstructed to calculate their correlation with the original images at the same resolution. Reconstruction was performed classically using a kernel-based method of inverse 2D-QHT to mitigate the introduction of further errors. As such, execution times are not considered for reconstruction.

Table 1 Packet circuit depth in terms of H, SWAP, and Controlled-Phase gates
Differences in correlation among the QHT-based techniques, i.e., packet, pyramidal, and measurement-based, were negligible and therefore they were represented as 'QHT-based Q2C' in Figs. 14 and 15. Two additional plots were included to distinguish between two sources of information loss in QHT-based Q2C: a) sampling/statistical errors and b) errors from data compression and/or partial measurement. First, we plotted the correlation from conventional Q2C, see Fig. 4, to represent the effect of sampling error alone. Next, we repeated these experiments on a classical computer using the classical Haar wavelet transform to represent the information loss due to algorithmic data compression. Pearson correlation, as a metric for similarity, could not be calculated for QFT-based Q2C data decoding due to the fact that QFT does not preserve the spatial and/or temporal locality of the data.
For conventional Q2C, the correlation coefficient monotonically decreases as the number of states increases for a fixed number of shots, see Figs. 14 and 15. For a given number of states, there exists a threshold where the number of shots is sufficient to characterize the quantum state and taking additional samples has a negligible impact on similarity, as shown in Figs. 14a and 15a when $N \le 2^{12}$, where increasing the number of shots from 32,000 to 1,000,000 only marginally affects the value of the correlation coefficient. 'Undersampling' is observed when the number of shots is insufficient to sample a quantum state, which can be visually seen in Figs. 12 and 13 when the measured image appears black. Note how at $\ell = 3$ in Figs. 12b and f (sampled with 32,000 shots), the state is undersampled, but increasing the number of shots to 1,000,000 in Figs. 13b and f is able to properly sample the quantum state.
The behavior of the classical Haar DWT, see Figs. 14 and 15, shows a monotonic increase with respect to the number of states and a monotonic decrease with the number of decomposition levels. From Figs. 14a and 15a, it is evident that performing decomposition on smaller images has a greater impact on the change of the correlation coefficient than performing decomposition on larger images. As can be seen in Figs. 14b and 15b, the information loss, represented by decreasing values of the correlation coefficient, becomes larger as the number of levels of decomposition increases, where each level of decomposition decreases the resultant image size.
From Figs. 14a and 15a, as the image/data size increases for a fixed number of decompositions, we observe that the correlation coefficient of QHT-based Q2C closely aligns with the classical wavelet plot, demonstrating how the algorithmic component of information loss dominates for small data sizes. The information loss from sampling a larger image/data dramatically outweighs the relative gain in correlation from performing the Haar transform on a larger image. Similar behavior extends to applying different levels of decomposition to fixed-size images, as shown in Figs. 14b and 15b. Beyond a certain number of decomposition levels, only a few qubits are measured, such that the correlation aligns with the expected behavior from the classical Haar transform. However, for lower levels of decomposition, the comparatively small information loss from the Haar transform helps ameliorate the dramatic information loss from measuring large images.
Given a large enough image for a fixed number of shots, the QHT-based Q2C method can outperform conventional Q2C data decoding in terms of the quality/similarity of results (fidelity), using the correlation coefficient as a metric. Figure 16 reflects the improvement in terms of the correlation difference Δ (in percentage) between the QHT-based and conventional Q2C methods, see (17). It is worth mentioning that as the number of shots increases, for any given image/data size and level of decomposition, conventional Q2C benefits more than QHT-based Q2C, resulting in decreased correlation improvement, see Figs. 16a and b. As discussed earlier, there exists a threshold where the number of shots is sufficient to characterize the quantum state and taking additional samples has a negligible impact on similarity for QHT-based Q2C.
$$\Delta = \mathrm{corr}_{\text{QHT-based}} - \mathrm{corr}_{\text{conventional}} \tag{17}$$
As the image size increases, the rate of correlation improvement increases until all levels of decomposition outperform the conventional decoding technique once n ≥ 18 for 32,000 shots and n ≥ 22 for 1,000,000 shots. The largest improvement is seen when n = 26 and ℓ = 7 for 32,000 shots, where the QHT-based Q2C method achieves a 91.18% correlation coefficient compared to a 4.12% correlation coefficient for conventional Q2C, see Figs. 14b and 16a.
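As a worked instance of (17), using the two correlation values reported above for n = 26 at 32,000 shots:

$$\Delta = 91.18\% - 4.12\% = 87.06\ \text{percentage points}$$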

Performance of quantum Haar transform on hardware
On quantum simulators, quantum circuits are often preset to their initial state, and accordingly, the overhead associated with state synthesis (preparation) is usually ignored. However, on actual quantum hardware, state synthesis requires a deep quantum operation to be applied to the ground state |0⟩^{⊗n}. IBM Qiskit uses the Initialize API [23] to implement state synthesis, leveraging a method of depth O(2^n) [24,25]. Including state preparation in hardware execution would introduce significant overhead to execution time, obfuscate performance differences between Q2C methods, and restrict experiments to at most 14-qubit states, i.e., images of size (64 × 64 × 3) pixels, due to constraints of the IBM Quantum platform. Therefore, excluding state preparation allows us to leverage the full capabilities of the 27-qubit ibmq_toronto processor from IBM Quantum [22] to compare the execution times for conventional Q2C, QFT-based Q2C, and QHT-based Q2C methods on 8- to 26-qubit circuits, see Fig. 17.
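The correspondence between a (64 × 64 × 3) image and a 14-qubit state can be checked with a few lines of arithmetic. The sketch below assumes amplitude encoding in which each image dimension is padded separately to a power of two; both helper names are hypothetical.

```python
def qubits_per_dimension(size):
    """Smallest k such that 2**k >= size (size must be at least 1)."""
    return (size - 1).bit_length()

def qubits_for_image(rows, cols, bands):
    """Qubits needed when each dimension is padded to a power of two
    (an assumption of this sketch)."""
    return sum(qubits_per_dimension(s) for s in (rows, cols, bands))

n = qubits_for_image(64, 64, 3)
print(n, 2**n)   # qubit count and number of amplitudes in the state

# State preparation of depth O(2^n) scales with the number of amplitudes,
# which grows by a factor of 2**12 between 14 and 26 qubits.
print(2**26 // 2**14)
```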
Taken together, our results demonstrate that QHT-based Q2C data decoding, particularly the measurement-based technique, exhibits significant speedup compared to contemporary Q2C techniques on hardware. Speedup is shown in Fig. 18, where it is calculated as the ratio between the execution time of a contemporary Q2C technique for a given image size and the execution time of the corresponding ℓ-level measurement-based decomposition.
Figure 18 shows near-universal speedup of the measurement-based QHT technique compared to conventional and QFT-based Q2C data decoding on hardware. In general, we observe higher speedup for larger circuits and more decomposition levels, as expected, since the measurement-based QHT technique measures ℓ ⋅ d fewer qubits than either the conventional or QFT-based Q2C techniques while being significantly more space-efficient, as it uses no additional quantum gates. Moreover, these results include circuit-independent overhead from resetting qubits to the ground state between shots. If executions were performed restlessly, we would expect to see even greater speedup from the proposed measurement-based Q2C technique over QFT-based Q2C.
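One way to see why leaving qubits unmeasured can stand in for explicit gates is that the outcome distribution on the measured qubits does not depend on any unitary applied only to the unmeasured qubit. The sketch below checks this numerically for a single level in one dimension, with a Hadamard gate on the least-significant qubit of a real 4-qubit state. It is an illustration under these assumptions, not the exact construction evaluated on hardware, and the helper names are hypothetical.

```python
import numpy as np

def marginal_over_lsb(probabilities):
    """Distribution observed when the least-significant qubit is not measured."""
    return np.asarray(probabilities, dtype=float).reshape(-1, 2).sum(axis=1)

def haar_on_lsb(state):
    """Apply a Hadamard gate to the least-significant qubit of a real state."""
    pairs = np.asarray(state, dtype=float).reshape(-1, 2)
    approx = (pairs[:, 0] + pairs[:, 1]) / np.sqrt(2.0)
    detail = (pairs[:, 0] - pairs[:, 1]) / np.sqrt(2.0)
    return np.stack([approx, detail], axis=1).ravel()

rng = np.random.default_rng(3)
state = rng.random(16)
state /= np.linalg.norm(state)      # toy 4-qubit amplitude-encoded data

without_gate = marginal_over_lsb(state**2)
with_gate = marginal_over_lsb(haar_on_lsb(state)**2)
print(np.allclose(without_gate, with_gate))
```

Each marginal entry equals the combined energy of one approximation/detail coefficient pair, so the reduced-size distribution is obtained without adding any gate to the circuit.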

Comparison of packet and pyramidal circuits
The depth analysis provided in [8,11] for the packet and pyramidal circuits assumes serial execution of each level of decomposition to provide a pessimistic prediction of execution time. While the new analysis in (11a) to (13b) is more physically accurate, quantum devices also possess unique qubit coupling restrictions requiring additional SWAP operations which are not considered in our analysis. Tables 1 and 2 present the circuit depths of the packet and pyramidal decomposition techniques, respectively, before and after interleaving in terms of H and SWAP gates. These values were collected from the QuantumCircuit.depth() [15] API in Qiskit and align with theoretical expectations from (11a) and (13a). Similarly, Tables 3 and 4 present the total gate count of the packet and pyramidal decomposition techniques, respectively, in terms of H and SWAP gates. These values were collected from the QuantumCircuit.count_ops() [26] API in Qiskit and align with theoretical expectations from (12) and (14).
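The difference between a serial depth estimate and an interleaved one can be sketched with a small scheduling model. The code below assumes unit-time gates and as-soon-as-possible scheduling, and the two-level gate lists are a hypothetical toy circuit (an H gate followed by a chain of SWAPs per level), not the packet or pyramidal circuits tabulated here.

```python
def circuit_depth(gates, num_qubits):
    """Depth under as-soon-as-possible scheduling.

    `gates` is a list of tuples of qubit indices; each gate occupies one
    time step on every qubit it touches.
    """
    busy_until = [0] * num_qubits
    for qubits in gates:
        start = max(busy_until[q] for q in qubits)
        for q in qubits:
            busy_until[q] = start + 1
    return max(busy_until, default=0)

# Hypothetical levels: a 1-qubit gate on qubit 0, then SWAPs down the register.
level_1 = [(0,), (0, 1), (1, 2), (2, 3)]
level_2 = [(0,), (0, 1), (1, 2)]

serial = circuit_depth(level_1, 4) + circuit_depth(level_2, 4)
interleaved = circuit_depth(level_1 + level_2, 4)
print(serial, interleaved)
```

In this toy case the second level starts on qubit 0 while the first level is still acting on qubits 1 to 3, so the interleaved depth is smaller than the sum of the per-level depths.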
Both the packet and pyramidal variants of our proposed QHT-based Q2C techniques have identical circuits at ℓ = 1 and only become distinct for higher levels of decomposition, i.e., when ℓ > 1. For the circuits from [8,11], the pyramidal circuit depth increases quadratically with increasing levels of decomposition, while the packet circuit depth increases linearly. As a result, the pyramidal circuit depth intersects with the packet circuit depth at ℓ_max, see (10), and would be expected to become shallower if further decomposition levels were possible. By contrast, the proposed packet circuits for multilevel decomposition are strictly shallower than the proposed pyramidal circuits. Overall, the overlapping optimization of the QHT circuits was critical to achieving shallower circuits than QFT for any image size and level of decomposition.
The circuit execution times as modelled by (11b) and (13b) do not include the overhead associated with the measurement operations (gates), resulting from repeated qubit resets among circuit samples (shots).Accordingly, we conducted experiments to determine that overhead and accounted for it in our results by reporting the per-shot execution times, as shown in Fig. 19.After accounting for measurement-gate overhead, the per-shot execution time on hardware for both packet and pyramidal decomposition was upper-bounded by the execution time predictions of the pessimistic sequential model from [8,12] and lower-bounded by the interleaved/overlapped model presented in this work, see Fig. 19.Such behavior should be expected, since additional SWAP gates from hardware transpilation were not considered.
The performance of the packet and pyramidal circuits in Figs. 17c and 19 reflects the behavior expected from (11b) and (13b) for ℓ < 10, due to how the interlevel permutations in pyramidal decomposition undermine the parallelism gained from overlapping levels of packet decomposition, in spite of reducing the size of the U_{d-D-QHT} operator at every level of decomposition. However, quantum devices have varying topologies and usually are not fully connected; therefore, additional SWAP gates are included during hardware transpilation to compensate for the mismatch between the algorithmic requirements and the target topology of the quantum device. As a result, at higher levels of decomposition, the packet and pyramidal circuits on actual hardware closely followed the model reported in [8,11], as shown in Figs. 17c and 19.
Finally, as discussed in Sect. 3.2, we used the total gate count as a metric for the spatial complexity of our proposed methods. Compared to existing methods (QFT-based Q2C), the space efficiency of our QHT-based Q2C (packet and pyramidal) in terms of the required total gate count is demonstrated through the results shown in Tables 3 and 4. For 26-qubit circuits and 12 levels of decomposition (ℓ = 12), our QHT-based Q2C requires 1.26× and 1.64× fewer gates for packet and pyramidal decomposition, respectively, compared to QFT-based Q2C. Moreover, our QHT-based Q2C demonstrates up to 15.17× higher space efficiency than QFT-based Q2C for 26-qubit circuits and 1 level of decomposition (ℓ = 1), see Tables 3 and 4.

Conclusions and future work
Contemporary methods of quantum-to-classical (Q2C) data decoding incur significant time overhead from repeated sampling of the quantum state, making it difficult to practically implement time-efficient quantum algorithms. This work proposed Q2C data decoding methods based on the multidimensional, multilevel-decomposable quantum Haar transform (QHT), including a 'measurement-based' method that requires no additional quantum gates. All methods were implemented using IBM Quantum's Qiskit SDK and executed both on a simulator and on actual quantum hardware. The experimental results demonstrate that the proposed techniques improve execution time by up to 14% and 78% compared to conventional Q2C and QFT-based Q2C, respectively, while simultaneously improving measurement accuracy. Moreover, the proposed QHT-based Q2C method achieved up to 15× higher space efficiency than the QFT-based Q2C method. In our future work, we will leverage our proposed QHT-based Q2C techniques for data-intensive applications such as quantum machine learning (QML). Comparisons with quantum mixed state measurement techniques, such as density matrix reconstruction [27] and quantum state tomography [28], along with quantum compression techniques [29], will be investigated. We will also investigate the effect of different topologies of quantum devices on the performance of our proposed quantum algorithms.

Fig. 2 Measurements of an n-qubit quantum state

Table 2
Pyramidal circuit depth in terms of H, SWAP, and Controlled-Phase gates

Table 3
Packet gate count in terms of H, SWAP, and Controlled-Phase gates

Table 4
Pyramidal gate count in terms of H, SWAP, and Controlled-Phase gates