Automated detection and statistical study of solar radio spikes

The most typical observational features of solar radio spikes are their short duration and narrow bandwidth. We have improved the YOLOv5s network model for these characteristics by adding inclined bounding frames together with attention and feature fusion modules. Decimeter- and meter-wavelength spikes observed by the Solar Broad-band Radio Spectrometer in Huairou and by the Chashan Solar Radio Observatory spectrograph, respectively, are used in the experiments. The results demonstrate that the AP value obtained by the improved network is 74%, which is almost 14% higher than that of the original network. The improved network detects 9709 (1379) decimeter- (meter-) wavelength spikes in two events and provides their durations, bandwidths, relative bandwidths, and frequency-drift rates. The spikes at decimeter and meter wavelengths are further categorized by their frequency-drift rates as having positive, negative, or no measurable frequency drift. We have carried out a statistical study of these categorized spikes. The statistical results constrain the formation of solar radio spikes.


Introduction
Solar radio spikes are most typically characterized by short duration and narrow bandwidth. They appear on the solar radio dynamic spectrogram as a large number of narrow-band type-III bursts, spikes, dots, sub-second clumps, groups, and chains, as well as other narrow-band structures from decimeter to decameter wavelengths (Benz and Simnett 1986; Fleishman and Mel'nikov 1998; Chernov et al. 2012; Bouratzis et al. 2016; Feng et al. 2018; Feng 2019; Tan 2013a, 2013b; Tan et al. 2019; Cliver et al. 2011; Clarkson et al. 2021). Solar radio spikes are generally considered to be radio emissions produced by energetic electrons at or near their acceleration region, and the corresponding emission mechanism may be plasma or Electron Cyclotron Maser (ECM) emission (Benz and Simnett 1986). Their observational parameters reveal the temporal/spatial structure of particle acceleration processes in solar flares and/or the plasma/magnetic field characteristics of the acceleration/radiation regions (Cliver et al. 2011). Therefore, solar radio spikes have been attracting researchers' attention, and a series of statistical studies of their observational parameters has followed (Chernov et al. 2001; Rozhansky et al. 2008; Antonov et al. 2014).
Solar radio spikes can be observed at wavelengths from decimeter to decameter, and statistical studies of their observational characteristics have been performed at all these wavelengths. At decimeter wavelengths, their durations, bandwidths, and relative bandwidths ($\Delta \ln f$) are 5-18 ms, 10-30 MHz, and <0.7-3.5%, respectively (Chernov et al. 2001; Dąbrowski et al. 2005; Tan 2013a, 2013b), the distribution of relative bandwidths shows significant asymmetry (Rozhansky et al. 2008), and positive and negative frequency-drift rates are nearly equally probable, not only in the rising flare phase but also in the peak and decay phases. At meter wavelengths, the durations, bandwidths, and relative bandwidths are 10-100 ms, 0.5-15 MHz, and ∼2%, respectively, and the frequency-drift rates can be positive, negative, or not measurable (Benz and Simnett 1986; Feng et al. 2018; Feng 2019). At decameter wavelengths, the durations, bandwidths, and relative bandwidths are 0.8-2.2 s, 40-100 kHz, and ∼0.2%, respectively, and the bandwidths increase linearly with frequency (Fleishman and Mel'nikov 1998). It can be seen from the above studies that as the observational wavelengths become longer, the durations increase, the bandwidths decrease, and the relative bandwidths remain basically unchanged. The relationship between duration and frequency is a power law of the form $\tau \sim f^{n}$, where n is reported as -1.34±0.14 (Güdel and Benz 1990) and 1.29±0.08 (Sirenko and Fleishman 2009).
S.W. Feng, winfeng@sdu.edu.cn
In the works mentioned above, the data cover only a small frequency range, for example only decimeter or only meter wavelengths, and the methods adopted to detect the spikes differ from one study to another. It is therefore difficult for them to provide spike parameters over a wide frequency range for a statistical investigation of how these parameters vary. These studies also did not analyze in detail the similarities and differences between spikes at decimeter and meter wavelengths, nor did they statistically study how the parameters of spikes with positive, negative, and no measurable frequency drift change at each of the two wavelengths.
The number of spikes is vast, and spikes occur densely within a short period, so processing the observational data of spikes manually is impractical. Automatic identification methods can be used to solve this problem. With the development of artificial intelligence, deep learning has been widely used in several research fields in solar physics (e.g., Abduallah et al. 2022). Recently, Hou et al. (2020) successfully identified and extracted solar radio spikes using the Faster Region-based Convolutional Neural Network (Faster R-CNN), obtaining an AP value of 91%.
The object detection method automatically detects and extracts multiple targets in complex scenarios. The YOLOv1 network divides the entire input image into n×n grids. Each grid predicts the location and category probability of its bounding frame, i.e., [x, y, w, h, conf], where x and y are the position coordinates of the target center point within the normalized bounding frame, w and h are the width and height of the bounding frame, and conf is the confidence score. The network enables end-to-end object detection and identification, but it gains detection speed at the cost of detection accuracy. To improve the detection accuracy of YOLOv1, researchers focused on improving its detection head and proposed the YOLOv2-YOLOv5 networks in turn. YOLOv5, which comes in several model sizes, is more accurate than Faster R-CNN, the most accurate of the two-stage models. Moreover, practice has shown that YOLOv5s is better suited to small-object detection than Faster R-CNN. We chose the YOLOv5s network to automatically detect solar radio spikes and improved its network structure according to their typical characteristics by adding inclined bounding frames together with attention and feature fusion modules.
The data and the constructed datasets are described in the next section. The third section presents the improved YOLOv5s network, the fourth section gives the statistical analysis of solar radio spikes, and the last section contains the summary and discussion.

Data and datasets
The solar radio spikes at decimeter wavelengths are observed by SBRS (Solar Broadband Radio Spectrometer)/Huairou. SBRS/Huairou provides microwave broadband dynamic spectra of bursts at frequencies of 1.10-1.34 GHz (cadence of 1.25 ms and spectral resolution of 4 MHz) in dual circular polarization (left and right circular polarization). The observation sensitivity is $\Delta S/S \leq 2\%$, where S is the quiet solar background emission (Fu et al. 2004). A total of six solar radio spike events were recorded between 2004 and 2005. Their occurrence dates and accompanying soft X-ray flares are listed in Table 1.
The solar radio spikes at meter wavelengths are from the solar dynamic radio spectrograph of the Chashan Solar Observatory (CSO). The station is located at 122.3°E and 36.8°N. The newly built solar radio observational system includes a 6-meter parabolic dish fed by a dual-polarized log-periodic antenna. The time and frequency resolutions are 10 ms and 160 kHz, respectively, the observing frequency range is 150-500 MHz, and the dynamic range is ≈50 dB (Du et al. 2017; Feng et al. 2018). The solar radio spike event was observed on 2016 July 18.
To uniformly detect and analyze the data from these two stations, they are segmented, thresholded, and normalized. The processed images contain 400×400 pixels, as shown in Fig. 1(a). After being input into the YOLOv5s network, they are automatically enlarged to 416×416 pixels. For CSO (SBRS/Huairou) data, each pixel in the images represents a time interval of 4.25 ms (1.25 ms) and a frequency range of 0.18 MHz (0.6 MHz). Finally, 479 images with solar radio spikes are obtained as the training and testing sets.
Fig. 1 The processed image of solar radio spikes (a) and the representation method of the inclined bounding frame in OpenCV (b). (x, y) represents the center coordinate of the inclined frame, w is the first side encountered by the X-axis when the X-axis rotates counterclockwise, h is the other side, θ is the angle that the X-axis rotates counterclockwise to the w side, and the upper left corner of the inclined frame is the origin of coordinates
To augment the datasets, the processed images are rotated. Inspection of the spike images shows that the vast majority of spikes are tilted at an angle of ≤10° away from the upward direction. Therefore, the images are rotated around their centers by 20° and -20°, respectively. Subsequently, the images are divided into training and testing sets in a ratio of 3:1. Note that the rotation increases the number of spikes with different tilt angles, which effectively prevents an uneven distribution of spike inclination angles in the datasets.
In the YOLOv5s network, the position and size of the vertical bounding frame are usually represented by the four parameters x, y, w, and h. Considering that the frequency of spikes drifts with time, we add a rotation angle to the network to characterize the magnitude of the frequency-drift rates and replace the vertical bounding frames with inclined bounding frames. In the process of establishing the testing sets, Labelme is used as the dataset labeling tool. By labeling polygons, a .json file recording each vertex of the polygons is obtained. Through the minAreaRect() function in the OpenCV library, these coordinates are converted into the OpenCV representation of inclined bounding frames, i.e., the [x, y, w, h, θ] form. Here, the horizontal direction is the X-axis, the vertical direction is the Y-axis, (x, y) represents the center coordinate of the inclined frame, w is the first side encountered by the X-axis when it rotates counterclockwise, h is the other side, θ is the angle that the X-axis rotates counterclockwise to the w side, and the upper left corner of the inclined frame is the origin of coordinates, as shown in Fig. 1(b). Note that if the long side is encountered first when the X-axis rotates counterclockwise, the values of w and h are exchanged. Finally, [x, y, w, h, θ] is saved as the .txt label file required by YOLOv5s.
In the process of labeling the datasets, the Circular Smooth Label (CSL) is used to set the angle label. The 180° angle range is treated as a multi-label classification problem, and a Gaussian window function with a size of 6° is used to smooth the labels. To reduce the number of parameters, the angular interval is set to 1° in the range of 71°-130° and to 10° elsewhere. In this way, the number of prediction frames is reduced from 180 to 72, and the total number of parameters of the YOLOv5s network is reduced by about 3 million.
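The label smoothing and the bin count described above can be illustrated with a minimal sketch. It is not the implementation used in this work: the function names, the 0-179 bin indexing, the Gaussian width `sigma`, and the choice to treat the 6° window as a radius are all assumptions made for illustration.

```python
import math


def circular_smooth_label(theta_bin, num_bins=180, radius=6, sigma=2.0):
    """Return a periodic, Gaussian-smoothed label vector peaking at theta_bin.

    radius and sigma are illustrative assumptions, not values from the paper.
    """
    label = []
    for k in range(num_bins):
        # Circular distance: bin 0 and bin num_bins - 1 are neighbours.
        d = abs(k - theta_bin) % num_bins
        d = min(d, num_bins - d)
        # Gaussian inside the window, zero outside it.
        label.append(math.exp(-d * d / (2.0 * sigma ** 2)) if d <= radius else 0.0)
    return label


def reduced_bin_count(total_deg=180, fine_start=71, fine_end=130, coarse_step=10):
    """Count angle bins with 1-degree steps in [fine_start, fine_end] and coarse steps elsewhere."""
    fine_bins = fine_end - fine_start + 1                  # 60 bins of 1 degree
    coarse_bins = (total_deg - fine_bins) // coarse_step   # 12 bins of 10 degrees
    return fine_bins + coarse_bins


label = circular_smooth_label(2)
print(label[2], label[179] > 0.0, label[90])
print(reduced_bin_count())  # 72
```

The periodic distance is what distinguishes this from an ordinary one-hot or linearly smoothed label: an angle near one end of the range still gives weight to bins at the other end.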
Based on the labeled parameters, the spike parameters can be represented and calculated. The duration is $\tau = k_1 w$, the bandwidth is $\Delta f = k_2 h$, the relative bandwidth is $\Delta \ln f = \Delta f/(f_0 + k_2 y)$, and the frequency-drift rate is $\tan(\theta)\,k_2/k_1$, where $f_0$ refers to the frequency at the upper side in Fig. 1(b), and $f_0$, $k_1$, and $k_2$ are 1.1 GHz, 1.25 ms, and 0.6 MHz (500 MHz, 4.25 ms, and 0.18 MHz) for SBRS/Huairou (CSO) data, respectively.
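As a worked illustration of these relations, the following sketch converts one inclined bounding frame into physical spike parameters. The function name, the dictionary names, and the example box values are hypothetical. Only the formulas and the constants $f_0$, $k_1$, and $k_2$ come from the text, with frequencies expressed in MHz and times in ms.

```python
import math


def spike_parameters(y, w, h, theta_deg, f0, k1, k2):
    """Convert an inclined bounding frame (pixel units) into spike parameters.

    f0 in MHz, k1 in ms per pixel, k2 in MHz per pixel.
    The x coordinate of the frame is not needed by these formulas.
    """
    duration = k1 * w                                # tau = k1 * w, in ms
    bandwidth = k2 * h                               # Delta f = k2 * h, in MHz
    relative_bandwidth = bandwidth / (f0 + k2 * y)   # Delta f / (f0 + k2 * y)
    drift_rate = math.tan(math.radians(theta_deg)) * k2 / k1  # MHz per ms
    return duration, bandwidth, relative_bandwidth, drift_rate


# Constants quoted in the text for the two instruments.
SBRS = {"f0": 1100.0, "k1": 1.25, "k2": 0.6}
CSO = {"f0": 500.0, "k1": 4.25, "k2": 0.18}

# Hypothetical box: 8 px wide, 40 px high, centred at y = 100 px, theta = 45 degrees.
params = spike_parameters(y=100, w=8, h=40, theta_deg=45.0, **SBRS)
print(params)
```

For this hypothetical box the duration is 10 ms and the bandwidth is 24 MHz. Note that the drift rate diverges as θ approaches 90°, so any practical use would need to handle that case separately.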

Improved YOLOv5s network
To increase the detection accuracy and speed, the YOLOv5s network is improved for the typical observational characteristics of spikes. In view of their various morphologies, short durations, and narrow bandwidths, attention and feature fusion mechanisms are added to the network.

Attention and feature fusion mechanisms
The morphology of spikes is not always square-like and often takes the shape of a "J," "U," or "V," so the attention mechanism is introduced. Feature maps with the attention mechanism are superimposed by channel and compared with those without the attention mechanism. Figure 2(a) shows one of the feature maps without the attention mechanism, and Fig. 2(b) is a feature map after adding the attention mechanism. It can be seen that after adding the attention mechanism, the spikes
Spikes have a short duration and narrow bandwidth, so they occupy only a small area in the images. Therefore, the feature fusion mechanism is introduced. In general, the shallow layers of a network retain more information about small targets. To extract small-target features as much as possible, the Neck layers are changed: the shallow and the deep structures belonging to the same level are merged together. The changed network structure is shown in Fig. 3.

Estimation
Once the inclined bounding frames of spikes have been detected, the effectiveness of the detection must be evaluated. The AP value is used as the evaluation indicator. It is determined from the Precision-Recall curve and is equal to the area enclosed by the curve. Precision is TP/(TP+FP), the proportion of the targets detected by the model that are real spikes. Recall is TP/(TP+FN), the proportion of the real targets that are detected by the model. Here, TP, FP, and FN are the numbers of spikes with positive predictions and positive labels, positive predictions but negative labels, and negative predictions but positive labels, respectively.
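The definitions above can be made concrete with a short sketch. It assumes that each detection has already been matched against the labels and marked as a true or false positive, and it computes the area under the Precision-Recall curve with the common all-point interpolation (a monotonically decreasing precision envelope). That interpolation rule and all names and example values below are assumptions for illustration; the text does not specify how the area is evaluated.

```python
def precision(tp, fp):
    """Fraction of detections that are real spikes: TP / (TP + FP)."""
    return tp / (tp + fp) if (tp + fp) else 0.0


def recall(tp, fn):
    """Fraction of real spikes that are detected: TP / (TP + FN)."""
    return tp / (tp + fn) if (tp + fn) else 0.0


def average_precision(detections, n_ground_truth):
    """Area under the Precision-Recall curve.

    detections: list of (confidence, is_true_positive) pairs.
    n_ground_truth: number of labeled spikes.
    """
    ranked = sorted(detections, key=lambda d: d[0], reverse=True)
    tp = fp = 0
    recalls, precisions = [0.0], [0.0]
    for _, is_tp in ranked:
        if is_tp:
            tp += 1
        else:
            fp += 1
        recalls.append(recall(tp, n_ground_truth - tp))
        precisions.append(precision(tp, fp))
    recalls.append(1.0)
    precisions.append(0.0)
    # Replace each precision by the maximum precision at any higher recall.
    for i in range(len(precisions) - 2, -1, -1):
        precisions[i] = max(precisions[i], precisions[i + 1])
    # Sum the rectangles where recall increases.
    return sum((recalls[i] - recalls[i - 1]) * precisions[i]
               for i in range(1, len(recalls)))


# Hypothetical example: three detections, two labeled spikes.
example = [(0.9, True), (0.8, False), (0.7, True)]
ap = average_precision(example, n_ground_truth=2)
print(round(ap, 4))
```

In this toy example the detector finds both labeled spikes but ranks one false detection between them, which lowers the AP below 1.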
The datasets are input into the original and the improved YOLOv5s networks. The models are trained from the official pre-training weights file YOLOv5s.pt with a training time of 1200. The detection results from these two models are compared. The results show that the AP value increases from 61% for the original network to 74% for the improved YOLOv5s network. The relationship between Precision and Recall is shown in Fig. 4(a). In the figure, when Recall is 0.64, Precision equals 0.85. Figure 4(b) shows a comparison of the detection results. The figure contains four sub-images; the two left columns show the labeled spikes in the datasets, and the two right columns show the spikes detected by the improved network. It can be seen that the improved network performs well in terms of both small-target detection accuracy and the regression precision of the inclined bounding frames. Finally, the trained improved YOLOv5s network has a size of 15 MB, so the detection is achieved with a lightweight network. In addition, when the improved network is used to detect radio spikes, the FPS (frames per second) reaches 10.6, which means that the average detection time per image is ∼0.094 seconds.

Statistical analyses of solar radio spikes
Based on the improved YOLOv5s network, we have carried out automatic detections of the two solar radio spike events that occurred on 2005 January 20 and 2016 July 18. The improved network detected 9709 (SBRS/Huairou) and 1379 (CSO) spikes in these two events. It should be noted that the observational data of SBRS/Huairou and CSO are at decimeter and meter wavelengths, respectively. These detections make comparative and statistical studies of the parameters of solar radio spikes at decimeter and meter wavelengths convenient. Table 2 lists the statistical results of spike parameters at decimeter and meter wavelengths for the two events, including the minimum, maximum, and mean values of the duration, bandwidth, and relative bandwidth. The average values of duration, bandwidth, and relative bandwidth at the decimeter wavelengths are 11.7 ms, 22.9 MHz, and 1.9%, respectively. The three parameters at the meter wavelengths are 48.9 ms, 4.3 MHz, and 1.2%. It can be seen that the durations at the decimeter wavelengths are about one-quarter of those at the meter wavelengths, the bandwidths at the decimeter wavelengths are 4-5 times as large as those at the meter wavelengths, and the relative bandwidths are almost the same at the two wavelengths. Table 3 shows that, for the duration, bandwidth, and relative bandwidth, the minimum, mean, and maximum values of spikes with positive frequency-drift rates are comparable to those with negative ones. The mean duration (11.2 ms) and bandwidth (21.8 MHz) of spikes with no measurable frequency drift are slightly smaller than those of spikes with positive (12.1 ms and 25.1 MHz) and negative (12.2 ms and 23.7 MHz) frequency-drift rates. This is the reason why their frequency-drift rates are not measurable.
In Table 4, all parameters of the same type are similar to each other.
Fig. 5 Histograms of spikes at decimeter (the left column) and meter (the right column) wavelengths. The first, second, and third rows are the duration, the bandwidth, and the relative bandwidth distributions, respectively. The curves are from the Gaussian fitting
To analyze the distribution of spikes, we plot histograms of their durations, bandwidths, and relative bandwidths, as shown in Fig. 5. The figure shows that at the decimeter wavelengths, the durations are symmetrically distributed and mainly lie in 11-12 ms, with similar numbers of spikes above and below this interval. The bandwidths and relative bandwidths are clearly asymmetric and are concentrated at about 20 MHz and 2%, respectively, with fewer spikes below these two values than above them.
At the meter wavelengths, the durations also show a good symmetrical distribution, and most of the spikes are within 40-50 ms. The bandwidths and the relative bandwidths are not symmetrically distributed. Figure 6 shows that the duration declines with increasing frequency. When the power-law relationship $\tau \sim f^{n}$ is used to fit the change of duration with frequency, n = -0.949±0.004 is obtained. The bandwidth increases with frequency and declines with duration, and the relative bandwidth remains basically unchanged with frequency and duration.
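One common way to estimate the index n of a power law $\tau \sim f^{n}$ is an ordinary least-squares fit of a straight line in log-log space, sketched below. This is an assumption about the procedure, since the text does not state the exact fitting method. The function name and the synthetic data are hypothetical and are not the measurements reported here.

```python
import math


def fit_power_law(freqs, durations):
    """Fit durations = amplitude * freqs**index by least squares in log-log space.

    Returns (index, amplitude). All inputs must be positive.
    """
    xs = [math.log(f) for f in freqs]
    ys = [math.log(t) for t in durations]
    n_pts = len(xs)
    mean_x = sum(xs) / n_pts
    mean_y = sum(ys) / n_pts
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    index = sxy / sxx                                # slope in log-log space
    amplitude = math.exp(mean_y - index * mean_x)    # intercept back in linear units
    return index, amplitude


# Synthetic, noise-free data following tau = 3000 * f**-0.95 (illustrative only).
freqs_mhz = [200.0, 400.0, 800.0, 1200.0]
durations_ms = [3000.0 * f ** -0.95 for f in freqs_mhz]
index, amplitude = fit_power_law(freqs_mhz, durations_ms)
print(round(index, 3), round(amplitude, 1))
```

With real measurements the scatter around the line determines the uncertainty on n, which this sketch does not estimate.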

Summary
The YOLOv5s network is improved with respect to the typical characteristics of spikes. In the improved network, inclined bounding frames are added to capture the drift of spike frequency with time, and attention and feature fusion modules are added in view of the various morphologies, short durations, and narrow bandwidths of spikes. The decimeter-wavelength and meter-wavelength spikes observed by SBRS/Huairou and the CSO spectrograph, respectively, are used in the experiments. The results demonstrate that the AP value obtained by the improved network is 74%, which is almost 14% higher than that of the original network.
Fig. 6 Changes of spike parameters with frequency and duration. The first, second, and third rows are the relationships of duration-frequency, bandwidth-frequency (duration), and relative bandwidth-frequency (duration). The lines are obtained from linear fitting
The improved network detects 9709 (1379) decimeter- (meter-) wavelength spikes in the event that occurred on 2005 January 20 (2016 July 18) and provides their durations, bandwidths, relative bandwidths, and frequency-drift rates. The spikes are classified according to their frequency ranges and frequency-drift directions. We carry out a statistical study on the categorized spikes and find the following main results:
(1) The duration declines with the increase of frequency, the bandwidth increases with frequency and declines with duration, and the relative bandwidth remains basically unchanged with frequency and duration. The duration (bandwidth) at the decimeter wavelengths is about one-quarter (4-5 times) of that at the meter wavelengths. The duration at the decimeter (meter) wavelengths has a good symmetrical distribution and mainly lies in 11-12 (40-50) ms. The bandwidth and relative bandwidth at the two wavelengths show asymmetrical distributions.
(2) At the decimeter wavelengths, spikes with no measurable frequency-drift rates are the most numerous, followed by spikes with negative and then positive frequency-drift rates. The mean duration (11.2 ms) and bandwidth (21.8 MHz) of spikes with no measurable frequency drift are slightly smaller than the means of spikes with positive (12.1 ms and 25.1 MHz) and negative (12.2 ms and 23.7 MHz) frequency drift. At the meter wavelengths, the numbers of spikes with positive and negative frequency-drift rates are nearly equal, and spikes with no measurable frequency-drift rates are the fewest.
The duration-frequency dependence is usually expressed by a phenomenological power law of the form $\tau \sim f^{n}$. Our calculation gives n = -0.949±0.004. The result is consistent with the empirical relations proposed by Güdel and Benz (1990), Mészárosová et al. (2003), and Rozhansky et al. (2008). It should be noted that this value is remarkably similar to the -0.97±0.03 found for metric and decametric type III radio bursts by Kontar et al. (2019) and Zhang et al. (2021). This implies that, as for type III radio bursts, the emission mechanism of the spikes is plasma emission.