Hyperspectral image super-resolution reconstruction based on image partition and detail enhancement

Hyperspectral image (HSI) super-resolution reconstruction has attracted much attention and has been widely used in various fields because of its low hardware requirements in practice. However, most HSI super-resolution studies apply a single strategy to images with varying complexity of spatial information, which hinders both image processing efficiency and the extraction of complex details. Given the above, a new method named MSDESR (multilevel streams and detail enhancement) is proposed to reconstruct HSI by using partition reconstruction and detail enhancement. MSDESR consists of a sub-map shunt block, a high-low-frequency information extraction with detail enhancement block, and a partition image reconstruction block. Firstly, the sub-map shunt block pre-classifies hyperspectral images: they are divided into complex and simple parts according to the spatial information distribution of the reconstructed sub-maps. Secondly, a multiscale Retinex with detail enhancement algorithm is constructed to remove noise contamination from the high-frequency information and enhance the image details by separating the samples into high- and low-frequency information. Finally, branch networks of different complexities are designed to reconstruct images with high credibility and clear content. In this paper, the QUST-1, Pavia University, Chikusei, Washington DC Mall, and XiongAn datasets are used in the experiments. The results show that MSDESR outperforms state-of-the-art CNN-based methods in terms of quantitative metrics, visual quality, and computational effort, with 4.18% and 9.35% improvements in the SRE and MPSNR metrics and a 37% saving in FLOPs. Overall, MSDESR performs well in hyperspectral image super-resolution reconstruction: it saves time and preserves the details of spatial information.


Introduction
Hyperspectral images include dozens or even hundreds of narrow spectral channels, making them highly discriminative spectrally due to their abundance of spectral information. As a result, HSI has been used for natural disaster warning, land cover identification, and change detection (Akgun et al. 2005). However, there is a trade-off between spatial and spectral resolution in hyperspectral remote sensing images due to the limitations of the imaging mechanism within the hardware device. Because each spectral slice is narrow, only a very limited fraction of the radiation reaches the sensor, so spatial resolution is sacrificed to achieve the desired spectral resolution. This makes the identification and interpretation of fine features unreliable. Acquiring hyperspectral images with high spatial resolution is difficult, which limits the application of hyperspectral images in various fields. To overcome this difficulty, super-resolution reconstruction of hyperspectral images has come into view.
In the past years, many methods have been proposed to improve the spatial resolution of HSI, mainly divided into two categories: image fusion and single-image super-resolution (SR) (Gou et al. 2014). Methods based on image fusion require high-spatial-resolution images of the same scene as a prior. Such a priori data are not always available because of cloud cover and object motion in real situations, so most current studies use synthetic datasets to demonstrate the effectiveness of their methods. Moreover, image fusion is an ill-posed inverse problem: to obtain good performance, these methods usually require a lot of time and complex priors (Glasner et al. 2009). In contrast, single-image SR is more flexible and applicable because no additional data are needed; SR is a promising image processing technique designed to acquire high-resolution (HR) images from their low-resolution (LR) counterparts to overcome inherent resolution limitations. In recent years, various reconstruction algorithms based on convolutional neural networks (CNNs) (Cordero-Martínez et al. 2022; Varela-Santos and Melin 2021) have been proposed for HSISR and have improved its performance. For example, Dong et al. (2022) designed a context-aware guided attention based on DenseNet to reduce the information redundancy of high-dimensional hyperspectral images in the feature transfer process. Subsequently, Li et al. (2022) proposed a multi-domain feature learning network combining 2D and 3D convolution to explore spatial and spectral features by sharing spatial information. Dong et al. (2021) proposed a model-guided deep convolutional network on this basis, which significantly improves reconstruction performance by exploiting the domain knowledge likelihood and a deep image prior.
However, many problems and challenges remain in HSISR work. On the one hand, the spatial information of hyperspectral remote sensing images acquired by unmanned aerial vehicles, satellites, and other platforms is inherently unevenly distributed, so applying the same strategy to regions with different spatial information distributions not only consumes a lot of time but also limits the extraction of complex details and textures. Researchers have used image patch classification (Romano et al. 2016), division of images into specific regions (Kong et al. 2021), and feature map classification (Wertheimer et al. 2021) to coarsely classify image spatial information, capture spatial features differentially, and thus improve SR performance; however, these models do not accurately classify remote sensing images with uneven spatial information distribution, limiting the representation capability of CNNs in the reconstruction process. On the other hand, image edge detail information is not adequately modeled, so reconstructed edges may not correspond to the real image. To address this problem, recent studies have preserved edge information by adding edge filters, such as edge-aware refiners (Guo et al. 2021), the Rybak model (Qi et al. 2021), and semantic segmentation (Li et al. 2020). Although these methods can enhance the edge detail information of images, their single-scale edge modeling extracts only the edges of either coarse or fine structures, so the extracted edges are less comprehensive, and obvious artifacts and noise appear in each image region.
Given the above, we propose a super-resolution reconstruction framework for hyperspectral images called MSDESR, which provides a reliable solution for the spatial reconstruction of hyperspectral remote sensing images. Hyperspectral remote sensing images are fed into the sub-map shunt network, which shunts the spatial information according to the texture information of the image into two data streams: simple and complex. The spatial information of the two data streams is separated into high- and low-frequency information, and details are enhanced. Finally, the data are fed into the regional reconstruction module (Mao et al. 2018), respectively, to achieve super-resolution reconstruction of hyperspectral remote sensing images. This method addresses the problem that all regions of hyperspectral remote sensing images consume the same amount of computation despite the uneven distribution of spatial information. In addition, the method converts redundant computing operations into enhanced processing of complex detailed texture information, thereby improving reconstruction efficiency. The contributions of this article are summarized as follows:
• A novel method named MSDESR is designed for HSISR, containing two main modules: a sub-map shunt and a multiscale Retinex algorithm with detail enhancement. Compared with single-strategy algorithms, our method reduces the computational effort and enhances image details during reconstruction.
• A sub-map shunt network is proposed to reduce the computational effort. With the sub-map shunt network, MSDESR can extract image features with different strategies to improve reconstruction efficiency.
• A multiscale Retinex algorithm is proposed to enhance spatial details. It can simultaneously extract edges of different clarity and enhance the details of the high-frequency information.

Related work
In recent years, there has been extensive research on super-resolution reconstruction for hyperspectral images. This section introduces the latest research on hyperspectral remote sensing images and the latest work on partition reconstruction and detail enhancement.

Hyperspectral super-resolution methods for remote sensing images
Deep learning-based HSI SR is usually divided into two approaches. One is to improve the spatial resolution of each channel of the HSI separately. For example, Jiang et al. (2020) achieved efficient super-resolution reconstruction of single hyperspectral images by using a state-of-the-art single-channel SR method to reconstruct each channel of the HSI separately. Chiman et al. (Kwan et al. 2018) used a single-image super-resolution algorithm combined with panchromatic (PAN) band images to improve the spatial resolution of the HSI channel by channel. However, the channel-by-channel approach ignores the spectral correlation of the HSI, which inevitably causes spectral distortion. The other approach is to use all channels of the HSI together as input and fuse all features to learn an end-to-end mapping (Wan et al. 2020). Wang et al. (2021) introduced a recurrent feedback network to model spectral correlation from a sequence perspective and take full advantage of the complementary and continuous information between the spectra of hyperspectral data. Based on this, Wang et al. (2022) further removed spectral distortion by retaining the spectral information of the HSI through an expanded projection correction network with an auto-encoder. These extremely deep networks achieve good reconstruction performance. However, treating all areas of the whole image with only one strategy reduces the effectiveness and efficiency of reconstruction.

Partition image reconstruction and detail enhancement
For different image regions, researchers have begun to use different processing strategies. The key issue is how to allocate different image areas to different strategies. Using multipath CNN reconstruction with pathfinders, Ke et al. (Jiang et al. 2019) combined reinforcement learning and deep learning to find suitable paths to recover each region efficiently. RAISR (Romano et al. 2016) separates picture patches into clusters and creates a matching image enhancement strategy for each cluster; it also employs an efficient hashing approach to decrease the complexity of the clustering procedure. Inspired by the above methods, SFTGAN (Zhang et al. 2018) introduced a new spatial feature transformation layer that incorporates a high-level semantic prior to guide image reconstruction in different regions, an implicit way to handle regions with different parameters. These deeper networks achieve better image shunt performance, but they are difficult to train under limited computing power and memory resources. In addition, to obtain clear detail information, researchers have further introduced detail filters into their models. The EP-GAN model extracts detail information from high-quality images as prior labels to guide the network in inferring details. Kui et al. (Zhang et al. 2019) built the EESN network, which removes noise pollution through mask processing, extracts detail contours, and combines the extracted contours with the restored intermediate image to enhance detail information. Compared with natural images, hyperspectral images are more spatially contaminated; if their details are extracted and enhanced directly, noisy results and false image details appear instead. Therefore, detail enhancement methods designed for natural images are inapplicable to hyperspectral images.

Proposed methods
The framework of the MSDESR is shown in Fig. 1. The proposed framework mainly consists of three parts: a sub-map shunt block, a high-low-frequency information extractor with an enhancement block, and a regional image super-resolution reconstruction block. First, the input image is spatially decomposed into sub-maps, and the data are shunted by a lightweight sub-map shunt network (based on the DenseNet model) according to the complexity of the image information. Second, an improved multiscale Retinex (Land 1964) algorithm with detail enhancement separates the high- and low-frequency information of the image and enhances the details in the high-frequency information. Then, the processed high- and low-frequency data are fed into large, medium, and small networks of different complexity, which carry out super-resolution reconstruction of the high- and low-frequency information, respectively. Finally, the high- and low-frequency information of the reconstructed images is fused with linear weighting, and clear images are obtained by stitching the sub-maps into complete images.

Sub-map shunts
The sub-map shunt network classifies images into two categories based on detailed texture information: easy and difficult parts. According to statistics, approximately 36% of the LR sub-maps (32 × 32) in the Pavia University dataset fall in the smooth region, while this percentage rises to 56% in the QUST-1 hyperspectral satellite dataset of GaoMi City. Based on this observation, large-scale remote sensing images are spatially decomposed into sub-maps. Smaller networks are used to deal with smooth regions of the image, and deep networks are used to deal with complex regions (Veganzones et al. 2015). Following spatial decomposition, different networks perform super-resolution reconstruction of areas containing different morphological information. Sub-map decomposition is especially important for large remote sensing images because many areas are relatively easy to reconstruct. Sub-map shunting not only reduces computational effort and saves memory in practice but also prepares for the subsequent extraction of high- and low-frequency information.
As shown in Fig. 2, we create a lightweight sub-map shunt network inspired by the DenseNet model. The network has four convolutional layers, three LReLU layers, an average pooling layer, and a fully connected layer, all connected in a feed-forward manner. Specific parameters are given in Table 1. Feature extraction is handled by the convolutional layers, while the pooling and fully connected layers generate probability vectors. Specifically, this classification model generates a probability vector p(x_i) for a sub-map x_i decomposed from a large-scale image X. The sub-map is classified by selecting the class with the highest probability value. Experiments show that the structure is simple and achieves better performance than DenseNet.
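A minimal PyTorch sketch of such a classifier is given below; the channel width, LReLU slope, and input band count are illustrative assumptions (Table 1 is not reproduced here), so this is a sketch of the architecture shape, not the paper's exact configuration.

```python
import torch
import torch.nn as nn

class SubMapShunt(nn.Module):
    """Lightweight sub-map classifier sketch: four conv layers, three
    LReLU activations, average pooling, and one fully connected layer
    emitting a two-class probability vector (easy vs. difficult).
    in_bands and width are illustrative assumptions."""
    def __init__(self, in_bands=32, width=128):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(in_bands, width, 3, padding=1), nn.LeakyReLU(0.1),
            nn.Conv2d(width, width, 3, padding=1), nn.LeakyReLU(0.1),
            nn.Conv2d(width, width, 3, padding=1), nn.LeakyReLU(0.1),
            nn.Conv2d(width, width, 3, padding=1),
            nn.AdaptiveAvgPool2d(1),   # global average pooling
        )
        self.fc = nn.Linear(width, 2)

    def forward(self, x):
        h = self.features(x).flatten(1)
        # p(x_i): probabilities over [easy, difficult]
        return torch.softmax(self.fc(h), dim=1)
```

A sub-map is then routed by `probs.argmax(dim=1)`, i.e., the class with the highest probability.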
When trained with the baseline DenseNet network, the model converges to an extreme point, and all resulting images are classified into the complex texture branch. Among output vectors, a shunt result of [0.90, 0.10] is preferable to [0.54, 0.46]: the latter is close to random, so the sub-map classification network loses its functionality. To avoid this issue, we design a lightweight sub-map shunt network based on the DenseNet model to carry out the shunt task, ensuring that both branches have a fair chance of being selected. Some sub-map shunt results are depicted in Fig. 3a, b. Figure 3a depicts the result map derived from the easily reconstructed shunt data: there is less texture information in the plots, and little difference between textures in terms of thickness, sparsity, and other easily distinguishable properties. The results for the difficult reconstructed shunt data in Fig. 3b show that the textures are more complex, non-randomly arranged, and densely distributed. Comparing these two images shows that the sub-map shunt network can accurately classify images into two categories, easy and difficult parts, enhancing shunt effectiveness.

Separation of high-low-frequency information
The Retinex algorithm is an image enhancement algorithm based on the human visual system. The traditional multiscale Retinex algorithm can extract high- and low-frequency image information, but the local details of the high-frequency information in its results are very poor. Therefore, we propose a multiscale Retinex with detail enhancement algorithm to separate high- and low-frequency information in hyperspectral remote sensing images. It is assumed that the low-frequency information is a spatially smooth image. The improved multiscale Retinex with detail enhancement algorithm can be expressed as

R(x, y) = Σ_{i=1}^{K} W_i { log S(x, y) − log[F_i(x, y) ∗ S(x, y)] } + D(x, y),

where S(x, y) and r(x, y) represent the original image and the output image, and R(x, y) and L(x, y) represent the high-frequency and low-frequency information, respectively. ∗ denotes convolution. F_i(x, y) is the center-surround function, and K is the number of Gaussian center-surround functions, usually taken as K = 3. W_i is the weighting factor of each scale, with W_1 = W_2 = W_3 = 1/3. D(x, y) is the detail recovery term, which improves the detail portion of the high-frequency information. The low-frequency information is extracted as

L(x, y) = Σ_{i=1}^{K} W_i [F_i(x, y) ∗ S(x, y)].

Figure 4 shows some results of extracting the image's high- and low-frequency information from the original remote sensing data. The high-frequency information contains the remote sensing image's main features and detail information, whereas the low-frequency information comprises a large amount of smooth information. Therefore, during super-resolution reconstruction, different processing strategies should be used for the different frequency components.
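As a rough illustration of the separation step described above, here is a NumPy sketch of a plain multiscale Retinex split with K = 3 and W_i = 1/3. The detail-recovery term D(x, y) is omitted, and the Gaussian surround scales are illustrative assumptions, not values from the paper.

```python
import numpy as np

def gaussian_kernel(sigma):
    radius = int(3 * sigma)
    x = np.arange(-radius, radius + 1)
    k = np.exp(-x ** 2 / (2 * sigma ** 2))
    return k / k.sum()

def gaussian_blur(img, sigma):
    # Separable Gaussian: filter rows, then columns ('same' keeps the size;
    # the image must be larger than the kernel for this to hold).
    k = gaussian_kernel(sigma)
    rows = np.apply_along_axis(np.convolve, 1, img, k, mode="same")
    return np.apply_along_axis(np.convolve, 0, rows, k, mode="same")

def msr_separate(S, sigmas=(5, 10, 20), eps=1e-6):
    """Split a single-band image S into high-frequency (reflectance-like)
    and low-frequency (illumination-like) parts; K = 3, W_i = 1/3.
    The surround scales in `sigmas` are illustrative assumptions."""
    S = S.astype(np.float64)
    blurs = [gaussian_blur(S, s) for s in sigmas]
    R = np.mean([np.log(S + eps) - np.log(b + eps) for b in blurs], axis=0)
    L = np.mean(blurs, axis=0)
    return R, L
```

For a spatially constant image the interior of R is (numerically) zero, matching the intuition that a smooth image carries no high-frequency information.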
The detail recovery term extracts local details from the image by utilizing multiscale differences of Gaussians (DoG) (Kim et al. 2015). First, Gaussian kernels are applied to the high-frequency information R(x, y) to produce three distinct blurred images:

B_1 = G_1 ∗ R(x, y), B_2 = G_2 ∗ R(x, y), B_3 = G_3 ∗ R(x, y),

where G_1, G_2, and G_3 are Gaussian kernels with standard deviations σ_1 = 1.0, σ_2 = 2.0, and σ_3 = 4.0, respectively. Then, fine details D_R1, medium details D_R2, and coarse details D_R3 are extracted from R(x, y):

D_R1 = R(x, y) − B_1, D_R2 = B_1 − B_2, D_R3 = B_2 − B_3.

The detail information D of the high-frequency information R(x, y) is generated by merging the three detail images:

D = (1 − z_1 × sgn(D_R1)) × D_R1 + z_2 × D_R2 + z_3 × D_R3,

where z_1, z_2, and z_3 are 0.5, 0.5, and 0.25, respectively. When fine detail is added to an image, D extends the grayscale difference near edges, but excessive overshoot may cause grayscale saturation. To overcome this problem, the positive component of D_R1 is reduced and its negative component is enlarged by the sign-dependent term, so detail is increased while saturation is suppressed in the experiments. Figure 5 depicts the detail information extracted from the high-frequency information before and after the Retinex algorithm improvement; the majority of the detail information can be extracted from the high-frequency data. In Fig. 5, image1 and image2 are the high-frequency information extracted from the easy and difficult reconstruction datasets, respectively. The results show that the edges of features and background in the improved high-frequency information are clearer, which makes the regional image super-resolution reconstruction module more sensitive to detailed feature information and better at local feature extraction.
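The three-scale DoG detail extraction can be sketched as follows. The values σ = 1.0, 2.0, 4.0 and z = (0.5, 0.5, 0.25) follow the text; the sign-dependent merge of the fine details is our reading of the saturation-suppression description, so treat it as an assumption.

```python
import numpy as np

def _gauss1d(sigma):
    x = np.arange(-int(3 * sigma), int(3 * sigma) + 1)
    k = np.exp(-x ** 2 / (2 * sigma ** 2))
    return k / k.sum()

def _blur(img, sigma):
    # Separable Gaussian blur with size-preserving 'same' convolution.
    k = _gauss1d(sigma)
    rows = np.apply_along_axis(np.convolve, 1, img, k, mode="same")
    return np.apply_along_axis(np.convolve, 0, rows, k, mode="same")

def detail_map(R, sigmas=(1.0, 2.0, 4.0), z=(0.5, 0.5, 0.25)):
    """Three-scale DoG details of the high-frequency component R.
    Fine details D1 get a sign-dependent gain: positive overshoot is
    reduced, negative overshoot enlarged, to limit grayscale saturation."""
    B1, B2, B3 = (_blur(R, s) for s in sigmas)
    D1, D2, D3 = R - B1, B1 - B2, B2 - B3   # fine, medium, coarse details
    z1, z2, z3 = z
    return (1 - z1 * np.sign(D1)) * D1 + z2 * D2 + z3 * D3
```

A flat (texture-free) region yields a near-zero detail map in the interior, which is the desired behavior: only edges and texture contribute to D.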

Partition image reconstruction model
To conduct super-resolution reconstruction successfully, different networks are used to process sub-maps of various levels of complexity, which is a divide-and-conquer strategy. The deep residual channel attention network (RCAN) employs a channel attention mechanism within long and short skip residual connections to adaptively rescale channel features and retain more information. In this work, we develop an RCAN-based regionalized image reconstruction method. As shown in Table 2 (where '-O' denotes the original network), there is almost no difference in performance between RCAN(32) and the original network on the ''difficult low-frequency and easy low-frequency'' sub-maps, while RCAN(50) achieves roughly the same performance as the original network. This suggests that we can reduce computational cost by using lightweight networks for simple sub-maps. As shown in Fig. 6, three RCAN models with the same network structure but different numbers of channels are employed in this paper; details of the parameter settings are given in Sect. 4.2. In the first layer, three networks with 32, 50, and 64 channels are used to train on the sub-map high- and low-frequency information, specifically difficult low-frequency, easy low-frequency, easy high-frequency, and difficult high-frequency information. Because of the small difference in their spatial complexity, the sub-map shunt network places difficult low-frequency and easy low-frequency information in one group. As a result, we propose a regional image reconstruction method that can process different regional images differentially.
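The grouping described above can be expressed as a small dispatch table. The exact assignment of channel widths to groups is our assumption inferred from the trend in Table 2 (low-frequency sub-maps need the least capacity); it is not stated explicitly in the text.

```python
# Hypothetical routing of sub-map components to the three RCAN branches.
# Both low-frequency groups share the lightweight branch, per the text;
# the width assignments are assumptions, not settings from the paper.
BRANCH_WIDTH = {
    ("easy", "low"): 32,        # Simple branch
    ("difficult", "low"): 32,   # Simple branch (grouped with easy low)
    ("easy", "high"): 50,       # Medium branch
    ("difficult", "high"): 64,  # Complex branch
}

def route(difficulty: str, frequency: str) -> int:
    """Return the channel width of the RCAN branch that reconstructs a
    (difficulty, frequency) sub-map component."""
    return BRANCH_WIDTH[(difficulty, frequency)]
```

With such a table, adding or rebalancing branches only means editing the mapping, not the reconstruction code.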
Finally, the high- and low-frequency information of the completed super-resolution reconstructed image is fused with linear weighting, and the clear, complete image is obtained by stitching the sub-maps:
S(x, y) = μ R(x, y) + (1 − μ) L(x, y),

where R(x, y) is the high-frequency information, L(x, y) is the low-frequency information, and S is the final image after the fusion of the high- and low-frequency information using linear weighting; μ is the weighting factor and takes the value 0.5. To fully demonstrate the ability of the framework to encode spatial-spectral information, tests were performed with 2-, 4-, and 8-fold up-sampling factors. According to the data in Table 3, MSDESR achieves better performance and lower computational cost than the original network, realizing efficient encoding of spatial-spectral information, with FLOPs reduced to 52-70%. Furthermore, the reduction in computation does not decrease image quality, indicating that the performance improvement does not come at the expense of computational burden. The results certify the significance of diverting the input sub-maps to their appropriate branches.
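The fusion step described above reduces to a one-line operation; μ = 0.5 follows the text, and equal weighting of both components is the only assumption encoded here.

```python
import numpy as np

def fuse(R, L, mu=0.5):
    """Linearly weighted fusion of the reconstructed high-frequency (R)
    and low-frequency (L) information; mu = 0.5 as in the text."""
    return mu * np.asarray(R, dtype=float) + (1 - mu) * np.asarray(L, dtype=float)
```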

Experimental results and analysis
In this section, extensive experiments are conducted on different real-scenario datasets to evaluate the performance of the proposed MSDESR against other state-of-the-art methods.

Dataset information
To evaluate the effectiveness of our proposed method, experiments were conducted on five hyperspectral image datasets: the QUST-1 satellite dataset, the Pavia University dataset (Sun et al. 2021), the Chikusei dataset, the Washington DC Mall dataset, and the XiongAn dataset (Li et al. 2021). As shown in Fig. 7, the datasets contain areas with typical ground cover types such as vegetation, suburban areas, and various buildings.
The QUST-1 satellite data are hyperspectral remote sensing images taken in July 2019 over Weifang City, Shandong Province, China. The spectral range is 400-1000 nm, divided equally into 32 spectral bands with a spatial resolution of ten meters. The images are cropped to 128 × 128 pixels from approximately 5000 × 5000 pixels. The Pavia University data are part of the 2003 hyperspectral data from the city of Pavia, Italy, covering 103 spectral bands from 430 to 860 nm, with a spatial resolution of 1.3 m and 610 × 340 pixels. The Chikusei dataset was taken on July 29, 2014, by the Headwall hyperspectral imaging sensor over agricultural and urban areas of Chikusei, Ibaraki Prefecture, Japan. Its spectral range covers 128 bands from 363 to 1018 nm with a spatial resolution of 2.5 m. The scene consists of 2517 × 2335 pixels, and the images were cropped to 128 × 128 pixels with 19 categories. The XiongAn dataset, a hyperspectral remote sensing image dataset, was acquired in October 2017 in XiongAn New Area (Matewan Village), China. Its spectral range is 400-1000 nm with 250 bands, and the spatial resolution is 0.5 m. The image, with a spatial size of 3750 × 1580 pixels, is cropped to 128 × 128 pixels. The Washington DC Mall dataset was collected by the HYDICE sensor; its spectral range is from 400 to 2400 nm, with 191 bands. The images, with a spatial size of 1280 × 307 pixels, were cropped to 128 × 128 pixels.
For each image in each dataset, we randomly chose a 128 × 128 pixel region to test the performance of our proposed hyperspectral image super-resolution reconstruction framework MSDESR. Another 128 × 128 region was randomly selected for validation, and the remainder was used for training. To create the super-resolution training set, the hyperspectral remote sensing images are used as the base HR images. The LR images are created by simulating the blur of the remote sensing image, adding disk blur, and applying bicubic interpolation. To achieve varying blur levels, various disk blur kernels are applied to the LR images (Jiang et al. 2022).
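A minimal sketch of this LR simulation is shown below, assuming a normalized disk kernel; plain decimation stands in for the bicubic resampling actually used, and the blur radius is an illustrative assumption.

```python
import numpy as np

def disk_kernel(radius):
    """Normalized disk (defocus) blur kernel of the given pixel radius."""
    y, x = np.mgrid[-radius:radius + 1, -radius:radius + 1]
    k = (x ** 2 + y ** 2 <= radius ** 2).astype(float)
    return k / k.sum()

def degrade(hr, radius=2, scale=4):
    """Simulate an LR image from an HR single-band image: disk blur,
    then decimation by `scale`. (Plain decimation is a stand-in for the
    bicubic resampling used in the paper.)"""
    k = disk_kernel(radius)
    pad = radius
    padded = np.pad(hr, pad, mode="reflect")
    h, w = hr.shape
    blurred = np.empty((h, w), dtype=float)
    for i in range(h):
        for j in range(w):
            blurred[i, j] = np.sum(padded[i:i + 2 * pad + 1,
                                          j:j + 2 * pad + 1] * k)
    return blurred[::scale, ::scale]
```

Varying `radius` yields the different blur levels mentioned above.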

Experimental details
The framework was trained on an Ubuntu system using an NVIDIA GTX 1080Ti GPU with 28 GB of memory. The network was implemented in the open-source PyTorch deep learning framework. In our proposed network, zero padding is used to keep all feature maps the same size. The network parameters were optimized with the Adam method. Initial learning rates of 0.1, 0.01, and 0.001 were tried; the experiments show that an initial learning rate of 0.001 gives the best results, so it was adopted.

Sub-map shunt network parameter setting
The design of the lightweight sub-map shunt network is inspired by the DenseNet model. Because hyperspectral images cover a wide spectral range but have limited spatial resolution compared with natural images, a smaller number of network channels is usually adopted for HSI feature extraction. To ensure the accuracy of the sub-map shunt, we use 128 channels for feature extraction in the nonlinear mapping part of the sub-map shunt network, with a kernel size of 3 × 3 for the convolutional layers. However, too many nonlinear mapping layers cause overfitting, while too few lead to insufficient feature mapping and make the sub-map shunt less accurate. We therefore set the number of convolutional layers in the nonlinear mapping part of the sub-map shunt network to 0, 1, 2, 3, and 4 in the experiments. Table 4 shows the overall accuracy (OA) and kappa coefficients for different numbers of nonlinear mapping layers. The best performance of the sub-map shunt network was obtained with two nonlinear mapping layers; therefore, in the subsequent experiments, the number of nonlinear mapping layers was set to 2.

Parameter setting of partition image reconstruction
The partition image reconstruction part of MSDESR consists of three independent branches, Simple, Medium, and Complex; each branch is based on the cascaded residual channel attention super-resolution network (RCAN). The kernel size of the convolutional layers is 3 × 3. Increasing the number of channels per branch increases nonlinearity but also introduces too many weight parameters. By reducing the number of channels per convolutional layer, the minimum number of layers and convolutions that can complete all training tasks is found for each branch. The starting channel numbers of the Simple, Medium, and Complex branches were set to (30, 32, 34, 36), (48, 50, 52, 54), and (62, 64, 66, 68) in the experiments. Table 5 shows the PSNR for the different numbers of channels in the partition image reconstruction section. The best performance was obtained when the starting channel numbers of the Simple, Medium, and Complex branches were set to 32, 50, and 64, respectively; these values were used in the subsequent experiments.

Evaluation metric
To comprehensively evaluate the performance of the proposed method, a combination of evaluation metrics currently accepted for SR algorithms and visual comparisons is used for validation. The evaluation metrics include the signal-to-reconstruction error (SRE), mean peak signal-to-noise ratio (MPSNR), mean structural similarity (MSSIM), and mean root mean square error (MRMSE) (Xue et al. 2021). SRE is a global indicator that reflects image quality based on the signal error between the reconstructed image and the real image of the scene, and can be expressed as

SRE = (1/l) Σ_{k=1}^{l} 10 log10( μ_{I_SR,k}² / ( ||I_HR,k − I_SR,k||² / (w × h) ) ),

where, following the notation of this paper, I_SR is the real image, I_HR is the super-resolution reconstructed image, l is the number of bands of the input image, and μ_{I_SR,k} is the average value of the kth band of I_SR. A larger SRE means a smaller reconstruction error and higher reconstruction accuracy. MPSNR is an objective criterion that evaluates images by the mean square error between the super-resolved reconstructed image and the real image, expressed as

MPSNR = (1/l) Σ_{k=1}^{l} 10 log10( Max_k² / ( (1/(w × h)) Σ (I_HR,k − I_SR,k)² ) ),

where Max_k is the maximum intensity in the kth band and w × h is the number of pixels of the image. A higher MPSNR indicates higher reconstruction quality and a smaller difference from the original image. MSSIM emphasizes the structural consistency of the generated image with the ground-truth image. For each band,

SSIM = (2 μ_SR μ_HR + c_1)(2 σ_{SR,HR} + c_2) / ((μ_SR² + μ_HR² + c_1)(σ_SR² + σ_HR² + c_2)),

where μ_SR and μ_HR denote the mean values of the real image and the super-resolved reconstructed image, respectively, σ_SR and σ_HR their standard deviations, σ_{SR,HR} their covariance, and c_1 and c_2 are constants; MSSIM averages SSIM over the bands. A higher MSSIM value indicates a higher quality of the generated image. MRMSE is a global metric that reflects image quality based on the signal error between the generated image and the real image of the scene.
A smaller MRMSE value indicates a higher quality of the generated image.
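Assuming bands lie on the last axis, the band-averaged metrics can be sketched in NumPy as follows; the function names and the (reconstruction, reference) argument order are our own convention, not the paper's.

```python
import numpy as np

def mpsnr(rec, ref, max_val=1.0):
    """Mean PSNR over bands; rec is the reconstruction, ref the ground
    truth, both (h, w, l) arrays with bands on the last axis."""
    mse = np.mean((rec - ref) ** 2, axis=(0, 1))
    return float(np.mean(10 * np.log10(max_val ** 2 / mse)))

def mrmse(rec, ref):
    """Mean root-mean-square error over bands."""
    return float(np.mean(np.sqrt(np.mean((rec - ref) ** 2, axis=(0, 1)))))

def sre(rec, ref):
    """Band-averaged signal-to-reconstruction error in dB: per-band mean
    signal squared over per-band mean squared error."""
    signal = np.mean(ref, axis=(0, 1)) ** 2
    error = np.mean((rec - ref) ** 2, axis=(0, 1))
    return float(np.mean(10 * np.log10(signal / error)))
```

For a uniform offset of 0.1 on unit-valued data, both MPSNR and SRE evaluate to 20 dB and MRMSE to 0.1, which is a quick sanity check for the implementations.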

Results and discussion
In this section, a series of experiments is conducted on five benchmark datasets, namely the QUST-1 satellite dataset, the Pavia University dataset, the Chikusei dataset, the Washington DC Mall dataset, and the XiongAn dataset, to evaluate the performance of the proposed framework. Five state-of-the-art deep learning methods with public code were selected for comparison as baselines at a fourfold upsampling factor: FSRCNN (Dong et al. 2016), LapSRN (Lai et al. 2017), RCAN (Zhang et al. 2018), ClassSR (Kong et al. 2021), and SSPSR (Jiang et al. 2020). Among them, FSRCNN, LapSRN, and RCAN are CNN-based natural-image SR methods representing different depths and network structures; ClassSR is an advanced reconstruction method that applies different processing strategies to images; and SSPSR is a CNN-based HSISR method. The experimental results are shown in Table 6, and the corresponding visualizations in Figs. 8, 9, 10, 11, and 12. The QUST-1 satellite dataset is shown in band 1, the Pavia University dataset in band 49, the Chikusei dataset in band 93, the Washington DC Mall dataset in band 157, and the XiongAn dataset in band 188. The red rectangular area in the bottom right corner of each image has been magnified by a factor of three.
The experimental results on the QUST-1 satellite data in Table 6 show that the proposed hyperspectral image super-resolution reconstruction framework MSDESR achieves the best performance on the SRE, MPSNR, MSSIM, and MRMSE indexes, at 41.78, 33.67, 0.64, and 7.78, respectively; SRE, MPSNR, and MSSIM improve by 2.42, 5.80, and 0.06 over FSRCNN. The results on the Pavia University data show that MSDESR again reconstructs better than the other methods: compared with FSRCNN, the SRE increases by 2.41, the MPSNR by 3.16, and the MRMSE decreases by 1.35, while the MSSIM decreases by 0.07. On the Chikusei dataset, the proposed model also achieves the best reconstruction, with SRE, MPSNR, and MSSIM improved by 2.92, 2.40, and 0.05 over FSRCNN and MRMSE reduced by 1.61. On the Washington DC Mall dataset, MSDESR performs best on SRE, MPSNR, and MRMSE, trailing only on the MSSIM metric; its SRE and MPSNR exceed those of the second-best model, SSPSR, by 0.2 and 0.39 dB.
When dealing with relatively simple, irregular natural-pattern data, such as the QUST-1 satellite and XiongAn datasets, which are dominated by natural scenes such as plains, fields, and small towns, the sub-map shunt block of the proposed method routes most sub-maps into the easy-to-reconstruct category. The subtle texture and spectral information are properly preserved by the multiscale Retinex structure with detail enhancement and passed to the simple branch of the partition image reconstruction module, so the local image information is smoother and better results are achieved with less computation. The Pavia University and Washington DC Mall datasets consist of urban scenes across various frequency bands, dominated by regular artificial patterns. For such data, the sub-map shunt structure routes most sub-maps into the difficult-to-reconstruct category; through the multiscale Retinex structure with detail enhancement and the complex branch of the partition image reconstruction module, the edge and texture detail features of different frequency bands are extracted more comprehensively and involved in the reconstruction, avoiding jagged image boundaries and depicting more realistic shapes. For moderately complex data, such as the suburban-urban mixture of the Chikusei dataset, the proposed method adapts its processing flexibly and still achieves an optimal balance of computation and performance.
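The high/low-frequency split with detail enhancement referred to above can be illustrated with a minimal single-band sketch. This is a hedged approximation of the idea, not the paper's implementation: a box filter stands in for the Gaussian surround used by Retinex, and the scales and gain are assumed values.

```python
import numpy as np

def box_blur(band, k):
    """Separable box low-pass filter with edge padding (k odd)."""
    pad = k // 2
    p = np.pad(band, pad, mode="edge")
    ker = np.ones(k) / k
    rows = np.apply_along_axis(lambda r: np.convolve(r, ker, mode="valid"), 1, p)
    return np.apply_along_axis(lambda c: np.convolve(c, ker, mode="valid"), 0, rows)

def detail_enhance(band, scales=(3, 7, 15), gain=1.5):
    """Split a band into low-frequency content and high-frequency detail
    at several scales, amplify the detail, and recombine (averaged)."""
    band = band.astype(np.float64)
    out = np.zeros_like(band)
    for k in scales:
        low = box_blur(band, k)   # low-frequency (illumination-like) estimate
        high = band - low         # high-frequency detail (and noise)
        out += low + gain * high  # re-inject amplified detail at this scale
    return out / len(scales)
```

On a flat region the split leaves the signal unchanged (the detail term is zero), while at an edge the amplified residual sharpens the transition, which is the qualitative behavior the detail-enhancement block relies on.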
The visualization results are shown in Figs. 8, 9, 10, 11, and 12, and the MPSNR curves over all bands in Fig. 13; these results further demonstrate the superiority of the proposed framework. Figure 8 depicts the experimental results on the QUST-1 satellite dataset. FSRCNN fails to extract and recover the gaps between small buildings; although these gaps can be seen in the LapSRN and RCAN reconstructions, they are faint and difficult to detect. By comparison, the proposed framework preserves these details and makes them more visible. In the Pavia University results in Fig. 9, the ground-truth image contains a building with a long black outline that is difficult to distinguish in the results of the other methods; in contrast, this long black outline can still be observed in the result generated by the proposed method. Figure 10 shows the experimental results on the Chikusei dataset. The two intersecting field paths in the real image are difficult to distinguish in the results of the other methods, or only one of them can be recognized with difficulty; in our results, both paths remain visible, with clear details and a consistent structure. This means that the proposed MSDESR can exploit the detailed texture features of hyperspectral remote sensing images and reconstruct delicate textures more fully, and that our network better preserves spectral information and maintains detailed features. In Fig. 11, we additionally chose two building regions with relatively obvious boundary information to emphasize the spatial differences. The gap boundaries reconstructed by FSRCNN, LapSRN, RCAN, ClassSR, and SSPSR are fuzzy, while the gap reconstructed by the proposed MSDESR is the closest to the ground truth.
The proposed MSDESR method consistently recovers more detailed edge-structure information in the magnified images and obtains better spatial enhancement. As shown in the red rectangles in Fig. 12, MSDESR recovers irregular natural structures and their surrounding details better than the other methods. Figure 13 shows the per-band MPSNR at a randomly selected point in each dataset; the proposed method consistently outperforms the other methods. When local spatial details are considered, the results show that the proposed algorithm better learns and strengthens spatial features and improves the coherence of structural features. It can therefore be concluded that the proposed algorithm performs well on datasets acquired by both CMOS (complementary metal-oxide-semiconductor) and ROSIS (reflective optics system imaging spectrometer) sensors.
In practical applications of hyperspectral remote sensing data, corresponding spatial morphological information is needed in addition to the spectral data, yet the spatial resolution of hyperspectral data often cannot meet this demand. Although researchers have investigated cascaded residual methods to improve spatial resolution, treating data with an uneven information distribution uniformly cannot provide targeted reconstruction of spatial morphology with differing characteristics. The experimental results show that the proposed method not only preserves the true spatial morphology of ground features through targeted reconstruction without compromising the spectral information, but also makes effective use of the computational budget. It thus allows further improvements in spatial resolution and in the accuracy of feature recognition.
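The partition idea discussed above, routing each sub-map to a light or heavy branch according to its spatial complexity, can be sketched as follows. The patch size, the mean-gradient score, and the threshold are illustrative assumptions rather than the criterion actually learned by the sub-map shunt block.

```python
import numpy as np

def shunt_patches(band, patch=32, threshold=0.05):
    """Tile one band into non-overlapping patches and route each to a
    'simple' or 'complex' branch by its mean gradient magnitude."""
    simple, complex_ = [], []
    h, w = band.shape
    for i in range(0, h - patch + 1, patch):
        for j in range(0, w - patch + 1, patch):
            p = band[i:i + patch, j:j + patch].astype(np.float64)
            gy, gx = np.gradient(p)             # finite-difference gradients
            score = np.mean(np.hypot(gx, gy))   # proxy for spatial complexity
            (complex_ if score > threshold else simple).append(((i, j), p))
    return simple, complex_
```

The simple list can then be handled by a lightweight reconstruction branch while the heavy branch spends its capacity only on textured patches, which mirrors how a partition scheme of this kind saves computation.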

Discussion on the proposed framework: ablation study
The importance of each component of the proposed framework is validated through ablation experiments with three settings: without the sub-map shunt network branch, without the high-low-frequency information extraction branch, and without the improved high-low-frequency information extraction branch. Except for the ablated module, all three comparison models use the same settings as the proposed model. Using the improved detail enhancement algorithm raises the MPSNR values by approximately 6.4% compared with the unimproved algorithm, demonstrating the effectiveness of the detail enhancement algorithm. The super-resolution reconstruction metrics of the proposed framework outperform those of the ablated models, demonstrating the importance of each component.

Conclusion
In this study, a hyperspectral image super-resolution reconstruction framework, MSDESR, is proposed to improve the spatial resolution of hyperspectral remote sensing images. To address the uneven spatial-information distribution characteristic of such images, we combine a sub-map shunt network, a multiscale Retinex algorithm with detail enhancement, and partitioned image reconstruction blocks. This approach reduces the high computational effort of HSISR work and eases the challenge of feature extraction and reconstruction. To improve the efficiency of image super-resolution reconstruction, the proposed sub-map shunt network accurately classifies sub-maps of different spatial complexity according to the detailed texture information of the images. In addition, the high- and low-frequency information extraction and enhancement module fully preserves texture detail while eliminating unnecessary noise: it first separates the high- and low-frequency information and then enhances the details of the high-frequency information with a multi-scale approach. The effectiveness of each part of the proposed method for spatial resolution enhancement is verified by a series of ablation experiments, and the experimental results show that MSDESR outperforms other state-of-the-art methods in both quantitative metrics and visual effects. Overall, MSDESR provides a reliable solution for the spatial reconstruction of hyperspectral remote sensing images, reconstructing realistic textures while maintaining high computational efficiency. MSDESR excels at spatial resolution improvement and detail enhancement, but its handling of spectral information remains limited; during hyperspectral super-resolution reconstruction, it is also important to ensure the integrity and continuity of the spectral information.
Therefore, our future work will focus mainly on the correlation between spectral bands, which is also an important characteristic of hyperspectral images, to further improve the quality of image reconstruction.