In this paper, the causal relationship of PM2.5 between Wuhan and the other prefecture-level cities in Hubei Province is studied. Hubei Province is located in central China, between 29°02′ and 33°07′ north latitude and 108°22′ and 116°08′ east longitude. It extends about 740 kilometers from east to west and about 470 kilometers from north to south, with a total area of 185,900 square kilometers. Wuhan is located at the confluence of the Yangtze River and the Han River in eastern Hubei, between 29°58′ and 31°22′ north latitude and 113°41′ and 115°05′ east longitude.

Hubei Province lies in the transitional zone between the northern and southern climates and has a monsoon climate. The four seasons are distinct, rain and heat are abundant and arrive in the same season, and their distribution in time and space is uneven. The annual average temperature is 16.7 ℃, and the annual average precipitation is 1200.7 mm. Rainfall generally decreases from south to north. Most of southwestern Hubei reaches 1300 ~ 10 mm in southeastern Hubei, and the lowest, in northwestern Hubei, is 770 ~ 935 mm. Precipitation varies greatly from year to year. Western Hubei is dominated by rugged, undulating mountainous terrain. The area east of Xiangyang, Jingmen, and Jingzhou is dominated by plains and hills, the most important of which is the Jianghan Plain, an alluvial plain formed by the Yangtze River and the Han River in south-central Hubei. These climatic and topographic characteristics affect the distribution pattern of PM2.5 and other pollutants to a certain extent.

## Source of data

The PM2.5 concentration data used in this paper come from the real-time data published on the website of the China Meteorological Administration. For the monitoring stations in Hubei Province, the hourly station values are averaged over every 8 hours. Winter data from January 3, 2015 to February 28, 2023 are selected and grouped every three years, and invalid records are eliminated before the valid data are processed.
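The 8-hour averaging step can be sketched as follows. This is a minimal illustration, not the processing code used in the study: the function name `eight_hour_means`, the use of non-overlapping windows, and the convention that invalid readings are `None` or negative are all assumptions made for the example.

```python
from statistics import mean

def eight_hour_means(hourly, window=8):
    """Average hourly values over consecutive, non-overlapping windows.

    Invalid readings (None or negative, an assumed convention) are dropped
    before averaging; a window with no valid reading yields None.
    """
    averages = []
    for start in range(0, len(hourly), window):
        block = hourly[start:start + window]
        valid = [v for v in block if v is not None and v >= 0]
        averages.append(mean(valid) if valid else None)
    return averages

# Hypothetical hourly PM2.5 readings covering 24 hours (three windows).
hourly = [40, 42, None, 44, 46, -999, 48, 50] + [None] * 8 + [60] * 8
print(eight_hour_means(hourly))
```

Windows that return `None` would then be excluded from the valid data set before any further analysis.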

## Research method

In this paper, convergent cross mapping (CCM) is used to study the temporal and spatial variation characteristics of PM2.5 concentration and its influencing factors. CCM is a method for detecting causal relationships between weakly or moderately coupled variables with nonlinear characteristics [18]. The assumption is that if variable X is an influencing factor of variable Y, then Y will contain information about X, and there is a causal relationship between variable X and variable Y. This causal relationship can be measured through the relationship between the reconstructed shadow manifolds and the correlation coefficient. Specifically, two discrete time series of length L sampled from the same dynamic system are:

$$\{X\}=\{x(1),x(2),\cdots,x(L)\},\qquad \{Y\}=\{y(1),y(2),\cdots,y(L)\}$$

Assuming that the embedding dimension of the reconstructed space is E and the time lag is τ (default τ = 1) [20], the time-lag embedding vectors X(t) and Y(t) are:

$$X(t)=\left\{x(t),\,x(t-\tau),\,\cdots,\,x\left[t-(E-1)\tau\right]\right\} \tag{1}$$

$$Y(t)=\left\{y(t),\,y(t-\tau),\,\cdots,\,y\left[t-(E-1)\tau\right]\right\} \tag{2}$$

X(t) and Y(t) correspond to points on the shadow manifolds MX and MY of the reconstructed space at time t.
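The embedding in Eqs. (1) and (2) can be illustrated with a short sketch. The function name `delay_embed` and the 0-based indexing are choices made for this example only; the logic simply collects the lagged values x(t), x(t−τ), …, x[t−(E−1)τ] for every t at which all lags exist.

```python
def delay_embed(series, E, tau=1):
    """Return the lagged vectors [x(t), x(t-tau), ..., x(t-(E-1)*tau)].

    With 0-based indexing, vectors exist only for t >= (E-1)*tau, so the
    result has len(series) - (E-1)*tau rows.
    """
    if E < 1 or tau < 1:
        raise ValueError("E and tau must be positive integers")
    start = (E - 1) * tau
    return [[series[t - k * tau] for k in range(E)]
            for t in range(start, len(series))]

x = [1, 2, 3, 4, 5, 6]
for vector in delay_embed(x, E=3, tau=1):
    print(vector)
```

Each printed row is one reconstructed point; a series of length L therefore yields L − (E − 1)τ points in the reconstructed space.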

When variable X is an influencing factor of variable Y (X→Y), Y will contain information about X. If L neighboring points in the shadow manifold MX can be used to identify L points in the sequence Y(t) through cross mapping, then the accuracy of the estimates obtained from these L neighboring points reflects the degree of causal influence of variable X on variable Y. CCM uses the Pearson correlation coefficient ρ to indicate the accuracy of the estimated causal relationship between X and Y [21, 22] (the higher ρ, the more accurate the estimation).

$$\rho=\frac{\sum_{i=1}^{n}\left(x_{i}-\bar{x}\right)\left(y_{i}-\bar{y}\right)}{\sqrt{\sum_{i=1}^{n}\left(x_{i}-\bar{x}\right)^{2}}\sqrt{\sum_{i=1}^{n}\left(y_{i}-\bar{y}\right)^{2}}} \tag{3}$$
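Eq. (3) translates directly into code. The sketch below is a plain implementation of the Pearson correlation coefficient; the function name `pearson` and the sample values are illustrative and are not taken from the study.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences, Eq. (3)."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    s_xx = sum((a - mean_x) ** 2 for a in x)
    s_yy = sum((b - mean_y) ** 2 for b in y)
    if s_xx == 0 or s_yy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return s_xy / math.sqrt(s_xx * s_yy)

observed = [1.0, 2.0, 3.0, 4.0]    # hypothetical observed values
estimated = [1.1, 1.9, 3.2, 3.8]   # hypothetical cross-mapped estimates
print(pearson(observed, estimated))
```

In CCM, the two sequences passed to such a function would be the observed values of one variable and the estimates of that variable obtained by cross mapping.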

The value of ρ only indicates the estimation accuracy of the causal relationship between X and Y, while the existence of the causal relationship needs to be further judged by convergence. Convergence in CCM means that as the time series length L increases, the estimation accuracy improves continuously and stops changing after reaching a certain value. If there is an obvious causal relationship between variable X and variable Y, the value of ρ will theoretically converge as the length of the time series increases.
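The convergence check can be illustrated with a small self-contained sketch. It is not the implementation used in this paper: it assumes the common nearest-neighbor scheme with E + 1 neighbors and exponential distance weights, uses a pair of coupled logistic maps as synthetic test data, and all function names are hypothetical. `cross_map_skill(source, target, ...)` estimates `target` from the shadow manifold of `source` and returns ρ for a library of the first L reconstructed points.

```python
import math

def delay_embed(series, E, tau=1):
    start = (E - 1) * tau
    return [[series[t - k * tau] for k in range(E)]
            for t in range(start, len(series))]

def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def cross_map_skill(source, target, E=2, tau=1, L=None):
    """Estimate `target` from the shadow manifold of `source`; return rho."""
    offset = (E - 1) * tau
    manifold = delay_embed(source, E, tau)
    aligned = target[offset:]          # target value at the time of each vector
    L = len(manifold) if L is None else min(L, len(manifold))
    if L < E + 2:
        raise ValueError("library too short for E + 1 neighbors")
    lib, obs = manifold[:L], aligned[:L]
    estimates = []
    for i, point in enumerate(lib):
        # E + 1 nearest neighbors among the other library points (leave-one-out)
        nearest = sorted((math.dist(point, other), j)
                         for j, other in enumerate(lib) if j != i)[:E + 1]
        d1 = max(nearest[0][0], 1e-12)             # guard against zero distance
        weights = [math.exp(-d / d1) for d, _ in nearest]
        total = sum(weights)
        estimates.append(sum(w * obs[j] for w, (_, j) in zip(weights, nearest)) / total)
    return pearson(obs, estimates)

def coupled_logistic(n, x0=0.4, y0=0.2):
    """Synthetic test data: two coupled logistic maps (illustrative parameters)."""
    xs, ys = [x0], [y0]
    for _ in range(n - 1):
        x, y = xs[-1], ys[-1]
        xs.append(x * (3.8 - 3.8 * x - 0.02 * y))
        ys.append(y * (3.5 - 3.5 * y - 0.1 * x))
    return xs, ys

xs, ys = coupled_logistic(400)
for L in (25, 50, 100, 200, 400):
    print(L, round(cross_map_skill(ys, xs, L=L), 3))
```

Plotting or tabulating ρ against L in this way shows whether the estimation accuracy rises and then levels off, which is the convergence behavior described above.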

In addition, the correlation coefficient of CCM needs a significance test. If it passes the significance test, the converged value of the correlation coefficient can be taken as a measure of the strength of the causal relationship between X and Y. In 2015, Clark et al. [23] introduced bootstrap resampling and iteration into CCM for significance testing and proposed multispatial CCM. By repeatedly calculating the CCM correlation coefficient in each iteration and comparing the ρ values at the longest time series length Lmax and the shortest time series length Lmin, it is determined whether ρ increases with the time series length L, and the statistical significance probability of CCM is then calculated as:

$$Pro=\frac{N-M}{N} \tag{4}$$

Here N denotes the number of iterations, and M denotes the number of iterations in which the value of ρ at the longest time series length Lmax is greater than that at the shortest time series length Lmin. The calculated Pro value indicates the probability that Y is not the cause of X (Y⇏X); the higher the Pro value, the greater the probability that Y has no causal relationship with X. It is worth mentioning that different numbers of iterations produce different significance test results [24]. The more iterations, the more accurate the significance test of the CCM correlation coefficient and the more accurate the estimation. Clark et al. [23] proposed that more than 100 iterations are usually needed; this article therefore uses 750 iterations. The confidence interval can also be estimated from the normal distribution. In this paper, the significance level is 0.2, corresponding to a confidence level of 80%.
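As a worked illustration of Eq. (4), the sketch below counts the iterations in which ρ at Lmax exceeds ρ at Lmin and converts the count into Pro. The function name `ccm_significance` and the ρ values are hypothetical and serve only to show the arithmetic; in practice each pair would come from one bootstrap iteration.

```python
def ccm_significance(rho_lmin, rho_lmax):
    """Pro = (N - M) / N, Eq. (4).

    N is the number of iterations; M is the number of iterations in which
    rho at Lmax is greater than rho at Lmin.
    """
    if len(rho_lmin) != len(rho_lmax) or not rho_lmin:
        raise ValueError("need one rho(Lmin) and one rho(Lmax) per iteration")
    n = len(rho_lmin)
    m = sum(1 for low, high in zip(rho_lmin, rho_lmax) if high > low)
    return (n - m) / n

# Hypothetical rho values from five iterations (the study uses 750).
rho_lmin = [0.10, 0.15, 0.12, 0.20, 0.18]
rho_lmax = [0.60, 0.58, 0.11, 0.65, 0.62]

pro = ccm_significance(rho_lmin, rho_lmax)
print(pro)  # M = 4 of N = 5 iterations improve, so Pro = (5 - 4) / 5 = 0.2
```

The resulting Pro would then be compared with the chosen significance level (0.2 in this paper) to decide whether the causal relationship is accepted as significant.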