Scalable image compression algorithms with small and fixed-size memory

The SPIHT image compression algorithm is characterized by low computational complexity, good performance, and the production of a quality scalable bitstream that can be decoded at several bit-rates, with the image quality improving as more bits are received. However, it consumes an enormous amount of computer memory because it uses linked lists whose total size is about 2–3 times the image size. In addition, it does not exploit the multi-resolution structure of the wavelet transform to produce a resolution scalable bitstream from which the image can be decoded at several resolutions (sizes). The Single List SPIHT (SLS) algorithm resolved the high memory problem of SPIHT by using only one list of fixed size equal to just 1/4 of the image size, together with state marker bits averaging 2.25 bits/pixel. This paper introduces two new algorithms based on SLS. Like SLS, the first algorithm produces a quality scalable bitstream; however, it has lower time complexity and better performance than SLS. The second algorithm, which is the major contribution of this work, upgrades the first one to produce a bitstream that is both quality and resolution scalable. As such, it is very well suited to the heterogeneous nature of today's Internet users, satisfying their different capabilities and preferences in terms of image quality and resolution.


Introduction
Conventional image compression permits recovering the image at just one quality (bit-rate) and resolution (size). As current users have diverse capabilities in terms of bandwidth, display resolution, processing power, and memory, this compression paradigm does not fit all users. With a scalable image compression system, on the other hand, the quality and/or the resolution of the recovered image can be controlled, permitting the end user to decode the image at a reduced resolution by reconstructing only the LL_{M−m} subband during the inverse 2D-DWT process [5]. Thus, a resolution scalable bitstream can easily be attained if the resolution levels are encoded successively. Figure 1 illustrates the 2D-DWT for three decomposition levels (M = 3) and depicts the corresponding resolution levels. See [3,6] for more details.
The set partitioning in hierarchical trees (SPIHT) algorithm [7] is one of the benchmarks of the QSIC algorithms [8,9]. It builds trees, denoted spatial orientation trees (SOTs), by exploiting the correlation between pixels across the different resolution levels of the dyadic 2D-DWT. SPIHT has relatively low computational complexity and good performance. Unfortunately, it needs an enormous amount of computer memory (about 2–3 times the DWT image) because it uses three linked lists: the list of insignificant pixels (LIP) and the list of significant pixels (LSP), which store the (i, j) coordinates of the coded image pixels, and the list of insignificant sets (LIS), which stores the (i, j) coordinates of the roots of the generated trees. In addition, the memory management of the linked lists is complex and time-consuming because they must be accessed randomly, as elements are continually added to and removed from them [10]. Lastly, the spanning of the SOTs across the different resolution levels, together with the random access of the lists, prevents exploiting the multi-resolution features of the dyadic 2D-DWT to produce a resolution scalable bitstream [3].
Most reduced-memory SPIHT algorithms in the literature adopt the linear indexing technique, which maps the DWT image into a 1D array of the same size [9,11]. However, this technique demands either storing the DWT image in main memory and then writing it into a 1D array, or keeping both the DWT image and the 1D array in RAM at the same time. Unfortunately, the former solution is time-consuming, while the latter demands extra memory equal to the DWT image [6,10]. These constraints prohibit using this approach on low-memory and/or low-power processing units such as wireless sensors [12].
Al-Janabi proposed a reduced-memory SPIHT, called the Single List SPIHT (SLS), that does not use the linear indexing technique [13]. It employs one list of fixed size equal to 1/4 of the DWT image, plus marker bits averaging 2.25 bits/pixel. It was demonstrated that SLS keeps nearly the same complexity as the original SPIHT while achieving better performance, with a memory saving of about 75%.
Monauwer et al. [14] mitigated the high memory requirements and the lack of resolution scalability of SPIHT in their listless highly scalable SPIHT (LHS-SPIHT) algorithm. It replaces the lists with state marker bits averaging 4 bits/pixel. Unfortunately, LHS-SPIHT must test all the pixels and all the roots of the SOTs twice in each bit-plane coding pass. Thus, the complexity of LHS-SPIHT rises significantly compared with the original SPIHT. Equally important, the algorithm also adopts the linear indexing technique, and hence has the same drawbacks mentioned above. Al-Janabi et al. proposed the highly scalable listless SPIHT (HSLS) algorithm [6]. HSLS has lower memory, lower complexity, and better performance than LHS-SPIHT. However, HSLS also suffers from some complexity increase and performance decrease due to removing all the lists.
The rest of the paper is organized as follows: Sect. 2 summarizes the SLS algorithm. Section 3 introduces the proposed MSLS and HS-MSLS algorithms. Section 4 provides the simulation results of our work and of other related works for the purpose of comparison. Finally, Sect. 5 concludes the paper.

Overview of the SLS algorithm
Like other wavelet-based scalable image compression algorithms, the image is first transformed to the wavelet domain using M levels (M = 3–6) of the dyadic 2D-DWT. The wavelet image is then quantized by rounding every wavelet coefficient to its nearest integer. In what follows, a wavelet coefficient is called a "pixel" for simplicity. SLS (like all the other algorithms) uses a threshold T = 2^b_max, where b_max is the maximum bit-plane in the quantized DWT image W, which is equal to:

b_max = ⌊log_2(max_(i,j) |c(i, j)|)⌋

SLS encodes the image in multiple bit-plane coding passes, halving T (T = T/2) at each pass until T = 1. At any coding pass, a pixel c(i, j) is considered insignificant (ISG) if |c(i, j)| < T, and it becomes significant (SG) when |c(i, j)| ≥ T. Similarly, an SOT is considered ISG if all of its pixels are ISG, and it becomes SG when one or more of its pixels become SG. SLS employs the same SOTs used by SPIHT. The SOTs are constructed by grouping the pixels in the LL_M subband (level R_0) into (2 × 2) blocks, with the top-left pixel of each block excluded. Each of the other three pixels is considered a root (r) of four pixels located in the LH_M, HL_M, or HH_M subband (level R_1) according to its orientation. Each such set of four pixels is referred to as the offspring (O) of r. Every offspring is in turn a root of four offspring in the LH_{M−1}, HL_{M−1}, and HH_{M−1} subbands (level R_2). This recursive linking of roots continues until the LH_2, HL_2, and HH_2 subbands (level R_{M−1}) are reached. That is, the pixels in LH_1, HL_1, and HH_1 (level R_M) cannot be roots, since they are the leaves of the trees.
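The threshold setup and the significance test above can be sketched as follows (a minimal illustration; the array W and the function names are ours, not from the SLS implementation):

```python
import numpy as np

def max_bitplane(W):
    """Return b_max, the highest nonzero bit-plane index in the
    quantized DWT image W (integer coefficients, peak >= 1)."""
    peak = int(np.max(np.abs(W)))
    return peak.bit_length() - 1          # floor(log2(peak))

def is_significant(c, T):
    """A pixel c is significant with respect to threshold T when |c| >= T."""
    return abs(c) >= T

# Toy quantized "DWT image": peak magnitude 37 gives b_max = 5, T = 32.
W = np.array([[37, -5], [2, 0]])
b_max = max_bitplane(W)
T = 2 ** b_max
```

Each coding pass then halves T, so pixel 37 is SG at T = 32 while pixel −5 only becomes SG once T has dropped to 4.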
The central idea of SLS is that the pixels stored in LSP and LIP are the four offspring of a root whose SOT is SG. These offspring can therefore be inferred from the SG root itself rather than being kept in these lists. Eliminating these lists reduces the memory greatly, as LIP and LSP occupy about 75% of the total memory. To distinguish between these pixels, a 2 bits/pixel state marker σ(i, j) specifies the pixel's significance type as follows: 0: c(i, j) is untested or still ISG; 1: c(i, j) became SG during the current coding pass; 2: c(i, j) is a visited SG (VSG) pixel that was found SG in one of the previous coding passes.
In addition to σ(i, j), SLS employs a single list labeled the list of root sets (LRS). Each LRS entry stores the (i, j) coordinates of an SOT root r(i, j) and a one-bit marker δ(i, j) that is initialized to 0 to indicate that r(i, j) is ISG and updated to 1 when r(i, j) becomes SG. The LRS is initialized with all the roots located in LL_M except the excluded pixels. As stated previously, the roots of the SOTs do not lie in the subbands HL_1, LH_1, and HH_1 (level R_M). So, the maximum size of LRS is 1/4 of the image size, and the average memory consumption of all the marker bits (σ and δ) is 2.25 bits/pixel. It is worth noting that once a root is added to LRS, it is never removed. This coding paradigm makes it possible to implement LRS as a simple ordered 1D array accessed sequentially in first-in first-out (FIFO) order, which is widely known to be the fastest and simplest access method [10]. In contrast, the LIS used by SPIHT must be implemented as a randomly accessed linked list, because roots are continually added to and removed from it. Clearly, this simpler memory management further reduces the algorithm's complexity.
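Because roots are only ever appended, the LRS can be a preallocated array with a single write cursor. The sketch below illustrates this design choice (class and field names are ours, for illustration only):

```python
import numpy as np

class LRS:
    """Append-only root list: roots are added, never removed, so a
    preallocated array scanned front-to-back (FIFO) suffices."""
    def __init__(self, n_pixels):
        cap = n_pixels // 4                              # at most 1/4 of the image
        self.roots = np.zeros((cap, 2), dtype=np.int32)  # (i, j) per entry
        self.delta = np.zeros(cap, dtype=np.uint8)       # delta: 0 = ISG, 1 = SG root
        self.count = 0                                   # write cursor

    def append(self, i, j):
        self.roots[self.count] = (i, j)                  # delta stays 0 (ISG)
        self.count += 1

lrs = LRS(512 * 512)
lrs.append(3, 7)
```

Scanning is then a plain loop over `roots[0:count]`, with none of the pointer chasing a linked list would require.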
Each bit-plane coding pass consists of a sorting and a refinement sub-pass (except the first pass, which performs the sorting sub-pass only). The sorting sub-pass starts by coding all pixels in the LL_M subband as follows. If c(i, j) is untested or still ISG (σ(i, j) = 0), it is tested for significance. If it becomes SG, a 1 and its sign bit are sent to the bitstream, and its marker σ(i, j) is updated to 1 to indicate that c(i, j) has become SG. If c(i, j) is still ISG, a 0 is sent to the bitstream. On the other hand, if c(i, j) is already SG (σ(i, j) = 1), it is marked as VSG by setting σ(i, j) = 2, so that it is refined later in the current coding pass. This step is necessary to distinguish these pixels from the pixels that will become SG during the current pass. Next, every root r(i, j) in LRS is tested and coded accordingly. If r(i, j) is still ISG (δ(i, j) = 0), its SOT is constructed and its significance is checked with respect to T. If the SOT is still ISG, a 0 is sent to the bitstream. If it becomes SG, a 1 is sent to the bitstream and δ(i, j) is updated to 1. Then, its four offspring O(i, j) are coded as pixels as given above, and if r(i, j) lies in levels R_0 to R_{M−2}, they are added to LRS as ISG roots to be coded in the same manner later during the current coding pass. On the other hand, if r(i, j) was found SG in one of the previous passes (δ(i, j) = 1), its O(i, j) are recomputed, and only the ISG and SG offspring (i.e., all except the VSG ones) are coded as pixels as given above.
In the refinement sub-pass, all pixels in the LL_M subband that are marked VSG are refined. Then, the LRS is scanned for the SG roots only. For each SG root, its O(i, j) are again recomputed, and only the VSG ones are refined. A VSG pixel c(i, j) is refined to one more bit of precision by sending its b-th bit to the bitstream, where b is the current bit-plane (T = 2^b). The coding pass terminates by updating T to T/2 to begin a new pass. The encoder keeps going until T = 1, whereas the decoder can stop as soon as the target bit-rate is attained.

The proposed algorithms
This section first introduces the modified SLS (MSLS) algorithm. Then, we present the HS-MSLS algorithm, which upgrades MSLS to produce a highly (quality and resolution) scalable bitstream.

The MSLS algorithm
The MSLS algorithm introduces the following two modifications to SLS that lower its computational complexity and improve its performance, especially at low bit-rates. The first modification eliminates the need to recompute the offspring of every SG root during the sorting and refinement sub-passes. It is based on the fact that when a root lying in levels R_0 to R_{M−2} becomes SG, its O(i, j) are added to LRS; at the next coding pass there is thus no need to recompute them, as they are already stored in LRS. That is, only the offspring of the SG roots lying in level R_{M−1} need to be recomputed, instead of the offspring of the SG roots lying in levels R_0 to R_{M−1} as done by SLS. Evidently, this reduces the algorithm's complexity.
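For the roots in level R_{M−1}, the offspring are still recomputed directly from the root coordinates, using the standard SPIHT parent-offspring relation (a well-known mapping; the function name is ours):

```python
def offspring(i, j):
    """Offspring of root (i, j) in SPIHT-style spatial orientation trees:
    the 2x2 block starting at (2i, 2j) in the next-finer level."""
    return [(2 * i, 2 * j), (2 * i, 2 * j + 1),
            (2 * i + 1, 2 * j), (2 * i + 1, 2 * j + 1)]
```

This is why the offspring never need to be stored for the leaf-parent level: four index doublings recover them on demand.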
The second modification is based on the observation that the maximum pixel value in the LL_M subband is 4–8 times the maximum pixel value in all the other subbands, because the LL_M subband concentrates most of the image energy. Hence, as the threshold T depends on the maximum bit-plane of LL_M, in the first 2 or 3 (or more) coding passes only the pixels in LL_M may be SG, while the pixels in all the other subbands remain ISG. Consequently, all the SOTs are also ISG, and processing power, processing time, and transmitted bits are wasted in attempting to test and code these ISG SOTs.
The proposed solution is to employ two thresholds, T_1 and T_2, where T_1 is derived from the maximum bit-plane of the LL_M subband and T_2 from the maximum bit-plane of the remaining subbands. The initial threshold T is set to T_1. The algorithm then performs several mini coding passes that encode the pixels in the LL_M subband only, until T = T_2. Starting from T = T_2, the algorithm implements the same coding passes as SLS, taking the first modification into account.
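The two-threshold setup can be sketched as follows (our reading of the scheme: T_1 from the LL_M peak, T_2 from the peak of all other subbands; the function name and the toy array are illustrative assumptions):

```python
import numpy as np

def two_thresholds(W, M):
    """Compute (T1, T2) for quantized DWT image W with M decomposition
    levels: T1 from the maximum bit-plane of LL_M (top-left corner),
    T2 from the maximum bit-plane of all remaining subbands."""
    n = W.shape[0] // (2 ** M)          # side length of LL_M
    ll = np.abs(W[:n, :n])
    rest = np.abs(W).copy()
    rest[:n, :n] = 0                    # mask out LL_M
    T1 = 2 ** (int(ll.max()).bit_length() - 1)
    T2 = 2 ** (int(rest.max()).bit_length() - 1)
    return T1, T2

# Toy 8x8 "DWT image" with M = 2, so LL_2 is the top-left 2x2 corner.
W = np.zeros((8, 8), dtype=int)
W[0, 0] = 100                           # LL_2 peak  -> T1 = 64
W[5, 5] = 9                             # peak elsewhere -> T2 = 8
```

With these values the mini passes would run at T = 64, 32, and 16, touching only the four LL_2 pixels, before normal SLS-style passes begin at T = 8.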
The decompression algorithm undoes the same steps as the encoder. However, it does not need to perform the significance tests for the pixels and the SOTs: when the decoder receives a 0/1, the corresponding pixel or SOT is ISG/SG, respectively. Hence, the decoder does not need to build the SOTs and check their significance, as the received bit determines this. This makes the decompression algorithm run faster than the compression one. This feature is very useful for scalable image compression schemes, since images are compressed once but may be decompressed several times [3,4].

The HS-MSLS algorithm
A QSIC algorithm like MSLS can be upgraded into an HSIC algorithm by encoding the resolution levels (R_0–R_M) successively in each coding pass. That is, each coding pass must encode all the data belonging to R_m before proceeding to the next level R_{m+1}.
It can be shown that the coding paradigm of the MSLS algorithm attains this objective partially. Consider the first coding pass. For every new SG root belonging to R_0, four roots (its offspring) belonging to R_1 are added to LRS, and each new SG root among them in turn adds four roots belonging to R_2, and so on. Hence, at the first pass the resolution levels are encoded successively. Let K_m be the number of roots stored in LRS that belong to resolution level R_m, 0 ≤ m ≤ M − 1. Since LRS is initialized with the roots of R_0, K_0 is fixed and equal to (3/4)|R_0| = (3/4)|LL_M|. However, for m ≥ 1, K_m is not fixed, as it depends on the number of roots that become SG. This problem can be solved simply by counting and storing K_m. Unfortunately, the problem becomes more complicated starting with the second coding pass. For every new SG root belonging to R_0, its offspring belonging to R_1 are added to the end of LRS (i.e., after the last root belonging to R_{M−1}). And for every new SG root belonging to R_1 that was added to LRS during the previous pass, its offspring belonging to R_2 are added after the last root belonging to R_1 added during the current pass, and so on. Hence, LRS is no longer arranged according to the resolution levels. The previous solution is not sufficient, as we must also know the (separated) portions of LRS where the data of the different resolution levels are located, in addition to their sizes.
The proposed solution to this problem is to reserve contiguous fixed-size portions within the LRS for the data of the different resolution levels, referred to as resolution-dependent portions. The resolution-dependent portion P_m is the portion of the LRS devoted to storing the data belonging to resolution level R_m. To guarantee that P_m can hold all the roots of R_m, its size must equal the maximum number of roots of R_m, denoted K_m^max. Since K_0 is fixed, K_0^max = K_0. It can be shown that for m ≥ 1, K_m^max equals the total number of pixels in the three subbands HL_{M−m+1}, LH_{M−m+1}, and HH_{M−m+1} that constitute R_m. As these subbands are of equal size, K_m^max = 3|HL_{M−m+1}|. To track these portions, two pointers are used: the portion start pointer PS_m, which stores the index of the first root in P_m, and the portion end pointer PE_m, which stores the index of the last root in P_m. Evidently, PS_0 = 0 and PE_0 = K_0^max − 1. For m ≥ 1, PS_m = PS_{m−1} + K_{m−1}^max; PE_m is initialized to PS_m and updated each time a root is added to P_m. The adopted structure of the LRS facilitates sorting and tracking the roots and their offspring according to the resolution level to which they belong. This is achieved simply: if a root r(i, j) stored in portion P_m, 0 ≤ m ≤ M − 2, becomes SG, its O(i, j) are added at the end of the next portion P_{m+1}. Notice that for the roots lying in portion P_{M−1}, the offspring are not added to LRS; instead, these offspring are deduced from their parent roots lying in portion P_{M−1} itself.
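The portion layout follows directly from the subband sizes and can be precomputed once per image. A sketch under the definitions above (the function name is ours):

```python
def portion_starts(N, M):
    """Start index PS_m and capacity K^max_m of each resolution-dependent
    portion P_m of the LRS, for an N x N image with M decomposition levels.
    K^max_0 = (3/4)|LL_M|; for m >= 1, K^max_m = 3 * |HL_{M-m+1}|."""
    ll = (N // 2 ** M) ** 2                              # |LL_M|
    sizes = [3 * ll // 4]                                # K^max_0
    for m in range(1, M):                                # portions P_1 .. P_{M-1}
        sizes.append(3 * (N // 2 ** (M - m + 1)) ** 2)   # 3 * |HL_{M-m+1}|
    starts = [0]
    for s in sizes[:-1]:
        starts.append(starts[-1] + s)                    # PS_m = PS_{m-1} + K^max_{m-1}
    return starts, sizes
```

For a 512 × 512 image with M = 5, the capacities sum to 65472 roots, confirming that the partitioned LRS still fits within the 1/4-image bound (512 × 512 / 4 = 65536).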
The HS-MSLS algorithm works exactly like MSLS, except that it makes use of resolution-dependent coding passes with the associated LRS resolution-dependent portions. Every mini or normal coding pass starts by coding the ISG and SG pixels, and then refining the VSG pixels, in the LL_M subband.
Every normal coding pass first performs the resolution-dependent sorting sub-pass, followed by the resolution-dependent refinement sub-pass. In each of these sub-passes, the corresponding pixels and LRS roots of every resolution portion P_m, 1 ≤ m ≤ M − 1, are encoded successively as done in MSLS.

Simulation results and discussion
The proposed MSLS and HS-MSLS algorithms were evaluated in MATLAB on a laptop equipped with an Intel Core i3 processor running at 1.8 GHz with 2 GB of RAM. We employed the conventional grayscale test images Lena, Barbara, Mandrill, and Goldhill, each of size (512 × 512) pixels. The images are first transformed using the dyadic (9, 7) 2D-DWT. The results comprise each algorithm's performance, computational complexity, and memory usage against the compression bit-rate, i.e., the average number of bits per pixel (bpp) of the compressed image. The performance is measured by the mean squared error (MSE) between the original image I_o and the reconstructed image I_r, each of size M × N pixels:

MSE = (1/(M N)) Σ_{i=1..M} Σ_{j=1..N} [I_o(i, j) − I_r(i, j)]^2

However, the peak signal-to-noise ratio (PSNR), which is derived from the MSE, is more commonly employed:

PSNR = 10 log_10 (P_max^2 / MSE) dB

where P_max is the maximum pixel value in I_o; for grayscale images, P_max = 255. Obviously, the lowest MSE (or highest PSNR) at a given bpp is the target. Table 1 gives the PSNR against the bit-rate for the QSIC SPIHT [7], SLS [13], and the proposed MSLS algorithms. The results for SPIHT were obtained by executing the SPIHT Public License MATLAB program of Mustafa and Pearlman [15]. Every image is decomposed into M = 6 levels, since SPIHT is evaluated using this value of M. At each bit-rate, the PSNR of MSLS is bolded if it is the highest. As clearly shown, the PSNR superiority of MSLS over SPIHT is apparent for all the test images and at all bit-rates. Moreover, MSLS is slightly better than SLS. This improvement is achieved mainly by the adopted two-threshold method, which reduces the number of transmitted bits, especially in the early bit-plane coding passes. Tables 2 and 3 depict the PSNR against the bit-rate for the HSIC LHS-SPIHT [14], HSLS [6], and the proposed HS-MSLS algorithms when the decoder recovers the image at full resolution (m = 5) and at 1/4 resolution (m = 4), respectively.
We used five decomposition levels, as in the other algorithms. At each bit-rate, the PSNR of HS-MSLS is bolded if it is the highest. It is worth noting that in Table 3, the comparison is made between the original and the recovered LL_{M−m} (LL_{5−4} = LL_1) subbands, which have the same size (see [16] for more details). The tables show that the proposed HS-MSLS algorithm has the highest PSNR among the compared algorithms in nearly all cases. In addition, HS-MSLS exhibits no noticeable PSNR deterioration compared to MSLS. Table 4 shows the complexity, represented by the encoding and decoding times against the bit-rate, for the QSIC and HSIC algorithms; the Lena image is selected for this purpose. The shortest coding and decoding times at each bit-rate for the QSIC and for the HSIC algorithms are bolded. As can be noticed, for all algorithms the encoding time is longer than the decoding time. This is expected, since the decoder requires neither building the SOTs nor testing their significance. It can also be seen that for MSLS, both the encoding and decoding times are far shorter than those of SPIHT at all bit-rates. This is mainly due to removing the lists, which in turn reduces the random-access read/write memory operations. Additionally, these times are also shorter than those of SLS. This speed improvement results from eliminating the need to recompute, twice in each coding pass, the offspring of every SG root lying in resolution levels R_0 to R_{M−2}. The table also shows that our HS-MSLS algorithm is again the fastest in encoding and decoding among the HSIC algorithms. Finally, a comparison between MSLS and HS-MSLS reveals that the latter incurs only a negligible increase in encoding and decoding times. The reason this increase is negligible is that the LRS is separated into the M resolution portions P_m, which are FIFO-accessed through the associated M pointers, instead of FIFO-accessing the entire LRS through one pointer.
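The PSNR metric used throughout these tables follows directly from the MSE definition above and can be computed as (a straightforward sketch; the function name is ours):

```python
import numpy as np

def psnr(original, reconstructed, p_max=255.0):
    """PSNR in dB between two equal-size images, derived from the MSE."""
    mse = np.mean((original.astype(np.float64) - reconstructed) ** 2)
    return 10.0 * np.log10(p_max ** 2 / mse)

# A reconstruction that is off by 1 everywhere has MSE = 1,
# giving PSNR = 10 * log10(255^2), about 48.13 dB.
```

A gain of even a few tenths of a dB at a fixed bit-rate, as reported for MSLS over SLS, corresponds to a visible reduction in reconstruction error.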

Memory requirements
The memory requirement is the amount of computer memory needed by the algorithm to compress/decompress an image of (N × N) pixels. As mentioned previously, the memory of SPIHT is variable and depends on the bit-rate; however, to guarantee that SPIHT works properly at all bit-rates, the memory at the full bit-rate must be provisioned. At this rate, the total number of LSP and LIP entries equals twice the number of pixels (N × N), and the number of LIS entries equals N × N/4. So, the total memory of SPIHT measured in bits is [9]:

Memory_SPIHT = 2(N × N)(2b) + (N × N/4)(2c)

where b is the number of bits needed to store each of the (i, j) pixel coordinates in LSP or LIP, and c is the number of bits needed to store each of the (i, j) coordinates in LIS; so b = log_2 N bits and c = log_2(N/2) bits. The LHS-SPIHT algorithm uses a fixed-size memory averaging 4 bits/pixel, for a total of 4(N × N) bits. The HSLS algorithm also utilizes a fixed-size memory. Table 5 depicts the memory requirements in kilobytes (KB) of these algorithms for different image sizes. The column (%) gives the percentage of the total memory relative to the memory required to store the DWT image, which equals 16(N × N) bits, as each wavelet coefficient is represented by 16 bits. As shown, the memory of our HS-MSLS is much lower than that of SPIHT, and slightly higher than that of the LHS-SPIHT and HSLS algorithms. However, as mentioned previously, the LHS-SPIHT algorithm utilizes the linear indexing technique that maps the DWT image to a 1D array, which demands storing both in memory; so the image size should be added to the actual memory consumption of LHS-SPIHT.
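The memory figures above can be reproduced numerically. The sketch below evaluates the SPIHT list-memory expression (reconstructed from the definitions of b and c; a sketch, not the paper's exact Table 5 values) against the fixed 4 bits/pixel of LHS-SPIHT:

```python
import math

def spiht_memory_bits(N):
    """Worst-case SPIHT list memory for an N x N image: 2(N*N) LSP/LIP
    entries of 2b bits each, plus (N*N)/4 LIS entries of 2c bits each."""
    b = int(math.log2(N))           # bits per coordinate in LSP/LIP
    c = int(math.log2(N // 2))      # bits per coordinate in LIS
    return 2 * N * N * 2 * b + (N * N // 4) * 2 * c

def lhs_spiht_memory_bits(N):
    """LHS-SPIHT fixed marker memory: 4 bits/pixel."""
    return 4 * N * N
```

For N = 512 this gives 10,485,760 bits (1280 KB) for SPIHT versus 1,048,576 bits (128 KB) for LHS-SPIHT, a factor of 10, which illustrates why eliminating the coordinate lists dominates the memory savings.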

Conclusion
This paper presented the MSLS and HS-MSLS algorithms. The MSLS algorithm produces a quality scalable bitstream; as the simulations demonstrate, it has better PSNR and lower complexity than its predecessor, the SLS algorithm. The proposed HS-MSLS algorithm upgrades MSLS to produce a highly scalable bitstream that possesses both quality and resolution scalability. As such, the image can easily be reconstructed at multiple qualities and resolutions using a simple bitstream parsing process. As the simulation results show, the HS-MSLS algorithm has improved PSNR and runs faster than the other HSIC algorithms. The proposed HS-MSLS is therefore very suitable for sending images over the Internet, where users are to be served according to their capabilities and preferences. Additionally, the high speed and reduced memory of HS-MSLS make it very appropriate as part of real-time scalable video transmission systems and for compressing super-resolution and 3D images.
Author contributions YJH prepared all the tables in the paper. MFH prepared all the figures in the paper. All authors revised the paper.

Funding Not applicable.
Availability of data and materials The datasets used are freely available on the Internet and can be accessed via the following link: https://ccia.ugr.es/cvg/CG/base.htm

Declarations
Conflict of interest Not applicable.