Heterogeneous selective ensemble learning model for mill load parameters forecasting by using multiscale mechanical frequency spectrum

A ball mill is a heavy mechanical device whose safe operation affects the entire grinding process. Mill load is a key index for the optimal operation of the grinding process, but it cannot be measured directly. In industrial practice, operational experts normally estimate its value from their experience and the mechanical signals produced by the ball mill. In this paper, we propose a heterogeneous selective ensemble method that uses a multiscale mechanical frequency spectrum. A multicomponent adaptive decomposition algorithm is first used to decompose the original shell vibration and acoustic signals into sub-signals with different timescales. Then, a selective ensemble (SEN) kernel projection to latent structure algorithm is used to model the spectral data of these sub-signals. Furthermore, the latent features of the multiscale spectra are extracted to construct SEN models based on fuzzy inference. Finally, the two types of heterogeneous SEN models are fused by using information entropy. The main contribution of this study is that the proposed soft-sensing model has a dual-layer ensemble structure that can fuse multi-source information in different mechanical sub-signals with physical meaning. Moreover, the proposed model can simulate the fuzzy cognitive behavior of domain experts in the mineral grinding process. The effectiveness of the method is verified with the shell vibration and acoustic data of a laboratory-scale ball mill.


Introduction
Mill load is directly related to the quality, efficiency, energy consumption, material consumption, and safe operation of the grinding process (Tang et al. 2018a). Overloading can cause splitting of ore, coarsening of the mill output, blocking of the mill, and a full "belly". Conversely, a low load can cause "running with only balls", which leads to unnecessary energy and steel consumption and may even damage the milling devices. All of these conditions can disrupt the safety of the production process. Thus, accurate forecasting of the mill load is one of the key factors in realizing the optimal operation of the grinding process (Zhou et al. 2009). Because a ball mill is a continuously rotating closed mechanical device, the mill load (i.e., the material, ball, and water loads inside the mill) cannot be measured directly. Moreover, the material and water loads change continuously with the loop, and the mass lost from the ball load through corrosion and wear is not known. Therefore, the mill load must be estimated with an indirect approach.
In the actual mineral process, the mechanical vibration and acoustic signals generated during mill grinding are typically used by domain experts to estimate the mill load. The reason is that there is a non-deterministic nonlinear mapping between the multi-source mill shell vibration/acoustic signals and the mill load. Operational experts can therefore make a fuzzy estimate of the mill load of a specific mill they are familiar with, based on a "human brain model" that draws on multi-source information from industrial practice and the experiential knowledge they have accumulated over the years. However, most of the former methods cannot simulate the fuzzy cognitive behavior of the domain experts. In nature, human ears function as a pair of adaptive band-pass filters (Rajaraman et al. 2013; Robinson et al. 2016). Normally, the reasoning recognition of experts through "hearing the sound" can be understood as a layer-by-layer cognition process that consists of frequency band selection, feature extraction, and knowledge rule-based reasoning (Tang et al. 2018a). However, this operating mode is easily affected by differences in the experts' experience, their limited energy, and other subjective factors. Consequently, in the practical milling process, the mill may operate under uneconomical conditions for a long time, thereby causing high energy consumption and low efficiency. Moreover, experts who rely on hearing the sound cannot use the mill shell vibration signals, which have high sensitivity and reliability. Thus, different modeling methods must be combined to integrate the fuzzy cognition of domain experts with the data-fitting ability of the former methods.
Early studies based on mechanical signals were conducted in the early 1990s (Zeng and Forssberg 1993) for soft-sensing models of pulp density (PD) and granularity inside the mill, in which frequency spectrum sub-band features were used. Previous research indicated that the mill acoustic spectrum contained more valuable information than the axis seat vibration spectrum. Then, soft-sensing models of three mill-load parameters (MLPs), namely, material-to-ball volume ratio (MBVR), PD, and ball charge volume ratio (BCVR), were established based on external signals such as vibration, axis pressure, and mill current (Wang and Chen 2002; Li and Shao 2006). In practice, the BCVR inside the ball mill changes little within a short period, whereas a grid ball mill may block within 60 s. Thus, the charge volume ratio (CVR) is used as an MLP to represent the volume of all loads inside the mill (Tang et al. 2012a). Furthermore, a soft-sensing MLP model based on feature selection and extraction, as well as combinational optimization of the model learning parameters, was proposed, which can overcome the high-dimensional collinearity of the shell vibration spectra (Tang et al. 2012b). However, the aforementioned soft-sensing models are all traditional single models, which cannot effectively solve pattern recognition or regression modeling problems with small samples and do not fully utilize the rich information contained in multi-source and multiscale signals.
Ensemble learning can improve modeling performance and stability by fusing multiple different single sub-models. The generalization ability of an ensemble model depends on a balance between the accuracy and diversity of the ensemble sub-models (Granitto et al. 2005; Wang et al. 2014). A selective ensemble (SEN) model based on branch-and-bound (BB) and adaptive weighted fusion (AWF) algorithms was proposed by Tang et al. (2013) to address problems such as the redundancy and complementarity of the shell vibration/acoustic frequency sub-band spectra and the limited information contained in a single-sensor signal. Thus, an MLP model based on the selective fusion of single-scale spectrum feature subsets from multi-source signals was established. However, the mill grinding mechanism shows that the shell vibration/acoustic signals have non-stationary and multicomponent characteristics. Applying the fast Fourier transform (FFT) to these original mechanical signals to obtain a single-scale spectrum for modeling is therefore unsuitable in theory (Lei et al. 2009), and the original signals must be decomposed into stationary sub-signals. Huang et al. (1998) and Wu and Huang (2009) proposed empirical mode decomposition (EMD) to effectively decompose an original non-stationary signal into stationary sub-signals with different timescales, i.e., intrinsic mode functions (IMFs). This method has also been extensively applied in the fault diagnosis of rotary machines (Rai and Mohanty 2007). To assess the mill load, the signal was first detrended by a complete ensemble EMD (Twa et al. 2021). The kernel projection to latent structure (KPLS) algorithm is suitable for addressing the collinearity and nonlinearity of high-dimensional data (Nicolaï et al. 2007; Bastien et al. 2015; Wang et al. 2015). Tang et al. introduced EMD, power spectral density (PSD), and the PLS algorithm to analyze shell vibration (Tang et al. 2011) and established the MLP model with multiscale shell vibration spectrum features (Tang et al. 2012c). The change in IMF spectra under different grinding conditions was investigated (Zhao et al. 2012). The Hilbert decomposition method can also decompose the vibration into multiscale sub-signals and can explain the mapping relation between MLP and shell vibration from another perspective (Tang et al. 2015). However, the aforementioned models are all based on linear/nonlinear PLS algorithms. Recently, an MLP modeling method based on the selective fusion of multi-condition samples and multi-source features was constructed by using shell vibration and acoustic frequency spectra (Tang et al. 2018b). A dual-layer optimization strategy was employed to select the ensemble sub-models and learning parameters optimally (Tang et al. 2020). A method based on complete ensemble EMD with adaptive noise (CEEMDAN) was proposed to identify the mill load (Lya and Jc 2021). A selective ensemble modeling approach based on multi-modal feature subsets was proposed for mill load parameter forecasting (Liu et al. 2021). However, the ensemble sub-models in these works are almost all constructed by using the KPLS algorithm, which fits the modeling data well but has poor inference performance. Moreover, the aforementioned models cannot simulate the intelligent cognition mechanism by which industry experts estimate MLPs based on auditory perception.
The fuzzy system provides an effective means of modeling complicated industrial processes characterized by mechanism complexity, strong coupling, uncertainty, and other comprehensive characteristics. For dry mill load detection, a patent was filed for a load detection method that integrates shell vibration and acoustic signals (Si et al. 2007). A cloud model based on the axis vibration signal of an experimental ball mill was built to infer and measure mill loads (Yan et al. 2014). For industrial wet mills in mineral grinding, rule-based reasoning was used to monitor and control the mill overload status based on the current and process variables of the mills (Zhou and Chai 2008). Furthermore, data fusion and a case-based reasoning algorithm were combined to estimate mill loads (Bai and Chai 2009). However, these methods cannot use mill shell vibration signals, which have high sensitivity and reliability. Thus, a SEN fuzzy model based on multiscale frequency spectrum features must be constructed. In terms of simulating expert cognition behavior, this model has good inference ability, but its ability to fit the modeling data is weak.
Therefore, we propose a novel heterogeneous SEN (HSEN) modeling strategy that uses SEN KPLS and SEN fuzzy models. The contributions of the proposed strategy are as follows: (1) combining two types of heterogeneous models through the error information entropy to obtain the new HSEN model, (2) compensating for the cognitive behavior of the domain experts in the practical process with a latent feature-based fuzzy model, and (3) selectively fusing the multiscale mechanical frequency spectra and the knowledge of multiple experts. To show the advantages of the novel soft-sensing method, we verify it with the shell vibration and acoustic data of a laboratory-scale ball mill.

Mill load in grinding process
The objective of the grinding process is to crush the broken ore in the ball mill into a qualified mineral pulp, which provides selected materials for subsequent sorting operations. A closed grinding circuit (GC) with two stages is widely used in mineral grinding in China. The MLP of Stage I is examined and shown in Fig. 1. Figure 1 shows that the fresh ore is fed to the conveyor belt by the feeder and then transported to the wet preconcentration, where it enters Stage I of the GC. The wet preconcentration selects the useful ore and discards the tailings. Then, the recycled slurry from the hydrocyclone is mixed with the useful ore. The steel balls and the mill water are added periodically. Thereafter, the mixture enters the ball mill. The rotating ball mill drives the steel balls to impact and crush the ore. The slurry drops into the sump, where it is mixed with sump water and pumped into the hydrocyclone. The hydrocyclone divides the slurry into an overflow with fine particle size and a coarse grit. The former is transported to Stage II of the GC. The latter is recycled to the wet preconcentration.
The grinding process is a "bottleneck" operation in mineral processing. The mill load cannot be measured directly. In real applications, the operation experts identify the load and internal parameters of a specific mill they are familiar with from their own listening experience. It has been shown that the human ear is essentially a set of adaptive band-pass filters. From a certain point of view, the process of expert "listening" reasoning and recognition can be understood as a layer-by-layer cognitive process that combines the band-pass filtering ability of the human ear, the feature extraction ability of the human brain, and the rule reasoning ability of expert experience.
The acoustic signals used by operation experts are adaptively decomposed into different sub-signals by the band-pass filters of the human ear. After feature extraction, the fuzzy values (such as high, low, and moderate) of a certain band signal are obtained. The fuzzy identification of the mill load is then realized based on the knowledge of different experts. This is a selective information fusion process that uses human expert experience for uncertain reasoning, but the differences in expert experience and the experts' limited energy make it difficult to ensure that the mill operates in the optimal load status for a long time. Thus, one focus of this paper is how to simulate the identification process of operational experts with existing technology.
The vibration signal has higher sensitivity and reliability, but the operation experts in the industrial field cannot use the signal directly and effectively with human ears. Therefore, with the modern signal analysis technology and modeling technology, the vibration and acoustic signals of the ball mill can be effectively fused to realize the simulation of operation expert cognitive process.

Heterogeneous SEN-based MLP forecasting
A new heterogeneous SEN (HSEN)-based modeling strategy is proposed by using multiscale mechanical frequency spectrum for soft-sensing MLP, which is shown in Fig. 2. This strategy includes two modules, namely, multiscale frequency spectral transformation and HSEN forecasting. The latter consists of a frequency spectrum-based SEN KPLS model, a latent feature-based SEN fuzzy model, and error information entropy-based combination sub-modules.
In Fig. 2, superscripts t and f represent the time domain and frequency domain, respectively; and subscripts V and A represent the shell vibration and acoustical signals. An explanation of the nomenclature used in this study is given in Table 1.
The functions of the different modules are described as follows:
(1) Multiscale frequency spectral transformation module: the shell vibration and acoustic signals are adaptively decomposed to simulate the band-pass filtering function of the human ear for multicomponent mechanical signals, and the resulting sub-signals are then transformed into multiscale frequency spectra. The ensemble EMD (EEMD) algorithm is used to adaptively decompose the original vibration and acoustic signals into stationary sub-signals (IMFs) with different timescales. Although FFT is not suitable for the non-stationary and multicomponent original signals, it is well suited to transforming the IMFs into frequency spectra.
(2) HSEN forecasting model module: the frequency spectrum-based SEN KPLS model sub-module establishes candidate sub-models on the different frequency spectra by using the KPLS algorithm, and then selects and combines sub-models to obtain SEN KPLS models. The latent feature-based SEN fuzzy model sub-module extracts the latent features of the multiscale spectra, establishes candidate fuzzy sub-models on these latent features, and selects and combines fuzzy sub-models to obtain SEN fuzzy models. Finally, the error information entropy-based combination sub-module fuses the two aforementioned SEN models based on the information entropy of their prediction errors.
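The transformation of a stationary sub-signal into a frequency spectrum in module (1) can be illustrated with a minimal sketch. It uses a direct discrete Fourier transform in pure Python rather than an FFT routine, which gives the same values at higher cost; the two sinusoids are hypothetical stand-ins for IMFs, and the function name `amplitude_spectrum` is an assumption made for illustration.

```python
import cmath
import math


def amplitude_spectrum(signal):
    """Single-sided amplitude spectrum computed with a direct O(n^2) DFT."""
    n = len(signal)
    spectrum = []
    for f in range(n // 2):
        acc = sum(signal[t] * cmath.exp(-2j * math.pi * f * t / n) for t in range(n))
        # The DC bin is not doubled; all other bins fold in the negative frequencies.
        scale = 1.0 / n if f == 0 else 2.0 / n
        spectrum.append(abs(acc) * scale)
    return spectrum


# Two hypothetical sub-signals, ordered from high to low frequency like IMFs.
n = 64
imf_high = [math.sin(2 * math.pi * 12 * t / n) for t in range(n)]
imf_low = [0.5 * math.sin(2 * math.pi * 3 * t / n) for t in range(n)]

spectra = [amplitude_spectrum(imf) for imf in (imf_high, imf_low)]
peaks = [max(range(len(s)), key=s.__getitem__) for s in spectra]
print(peaks)  # bin index of the dominant component of each sub-signal
```

Each sub-signal yields its own spectrum, so a set of IMFs becomes a set of spectra at different scales, which is the input format assumed by the candidate sub-models.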

Multiscale frequency spectral transformation
The EMD algorithm lacks a theoretical foundation, suffers from an end effect, and makes the decomposition termination criterion difficult to determine. A prominent problem is that the IMF sub-signals lose their physical meaning because of the mode-mixing effect. Ensemble EMD (EEMD) overcomes this problem through auxiliary noise analysis by using two parameters, namely, the amplitude of the added noise $A_{noise}$ and the ensemble number $M$. Their relationship can be described as follows (Tang et al. 2018b):

$$e_{EEMD} = \frac{A_{noise}}{\sqrt{M}}$$

where $e_{EEMD}$ is the error between the original signal and the corresponding IMFs.

Fig. 1 Grinding circuit I (GC I) and expert cognitive process of MLP

EEMD can be described as follows: (1) $M$ and $A_{noise}$ are initialized, (2) noise of amplitude $A_{noise}$ is added to the original signal, (3) EMD is applied to the noise-added signal, with steps (2) and (3) repeated $M$ times, and (4) the average of the $M$ EMD results is calculated as the final EEMD result.
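The EEMD result of the shell vibration signal can be expressed as

$$x_V^t = \sum_{j_V=1}^{J_V-1} X^t_{VEEMDj_V} + X^t_{VEEMDJ_V}$$

The relationship between EEMD and EMD can be expressed as

$$X^t_{VEEMDj_V} = \frac{1}{M}\sum_{m=1}^{M}\left(X^t_{VEMDj_V}\right)_m$$

where $(X^t_{VEMDj_V})_m$ is the $j_V$th IMF of the $m$th EMD, and $X^t_{VEEMDJ_V}$ is the residual after decomposition.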
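The effect of averaging over $M$ noise-added trials can be checked numerically. The sketch below does not implement EMD; it only illustrates, under the assumption of independent Gaussian white noise, that the noise remaining after averaging $M$ realizations has a standard deviation close to $A_{noise}/\sqrt{M}$, the relation reported for EEMD by Wu and Huang (2009). The function name and sample count are hypothetical.

```python
import math
import random


def residual_noise_std(a_noise, m, n_samples=2000, seed=0):
    """Std of the noise left after averaging m independent white-noise trials."""
    rng = random.Random(seed)
    mean_noise = [0.0] * n_samples
    for _ in range(m):
        for i in range(n_samples):
            mean_noise[i] += rng.gauss(0.0, a_noise) / m
    mu = sum(mean_noise) / n_samples
    var = sum((v - mu) ** 2 for v in mean_noise) / n_samples
    return math.sqrt(var)


a_noise, m = 0.1, 10  # the parameter values selected later in the paper
measured = residual_noise_std(a_noise, m)
expected = a_noise / math.sqrt(m)
print(f"measured={measured:.4f}, expected={expected:.4f}")
```

A larger $M$ lowers the residual noise but multiplies the number of EMD runs, which is the trade-off behind choosing the two parameters together.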
The decomposition of the mill shell vibration and acoustic signals can be expressed as

$$x_V^t \xrightarrow{EEMD} \{X^t_{VEEMD1}, \ldots, X^t_{VEEMDj_V}, \ldots, X^t_{VEEMDJ_V}\}$$

$$x_A^t \xrightarrow{EEMD} \{X^t_{AEEMD1}, \ldots, X^t_{AEEMDj_A}, \ldots, X^t_{AEEMDJ_A}\}$$

These decomposed signals are sorted according to their frequency, from high to low. Frequency domain analysis is required because extracting the valuable information in the time domain is difficult. Thus, each IMF is transformed into the frequency domain through FFT. The relationship between the time and frequency domain sub-signals can be expressed as

$$X^t_{VEEMDj_V} \xrightarrow{FFT} X^f_{VEEMDj_V}, \qquad X^t_{AEEMDj_A} \xrightarrow{FFT} X^f_{AEEMDj_A}$$

For convenience of description, the frequency spectra of the shell vibration and acoustic signals are renumbered and expressed uniformly as

$$\{X^f_{VEEMD1}, \ldots, X^f_{VEEMDj_V}, \ldots, X^f_{VEEMDJ_V}, X^f_{AEEMD1}, \ldots, X^f_{AEEMDj_A}, \ldots, X^f_{AEEMDJ_A}\} = \{X_1, \ldots, X_j, \ldots, X_J\}$$

where $J = J_V + J_A$ is the number of multiscale spectra of the vibration and acoustic signals after the combination.

HSEN forecasting model
[...] $\}_{j_{sel}=1}$) should be selected optimally, and the weights $w^{fuzzy}_{entropy}$ and $w^{latent}_{entropy}$ in the second layer should be determined.
Therefore, the solution process can be expressed as an optimization problem that maximizes the optimization objective. From the perspective of global optimization, the HSEN forecasting problem is transformed into optimizing the learning parameters of the latent structure model and the fuzzy reasoning model. To ensure better diversity and modeling performance for the SEN models, we select the same learning parameters for all candidate sub-models, which simplifies the selection of these model learning parameters.

Frequency spectrum-based SEN KPLS model
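Here, the $J$ multiscale frequency spectra acquired through the aforementioned method are used to establish $J$ candidate KPLS sub-models. Taking the $j$th spectrum $\{(X_j)_l\}_{l=1}^{k}$ as an example, we use the following to realize the nonlinear mapping:

$$(K_j^{Ker})_{lm} = \Phi((X_j)_l)^{T}\,\Phi((X_j)_m), \quad l, m = 1, \ldots, k$$

where $Ker$ is the kernel parameter. The matrix $\tilde{K}_j^{Ker}$ is obtained through the centralization of the kernel matrix $K_j^{Ker}$ by using the following formula:

$$\tilde{K}_j^{Ker} = \left(I - \frac{1}{k} 1_k 1_k^{T}\right) K_j^{Ker} \left(I - \frac{1}{k} 1_k 1_k^{T}\right)$$

where $I$ is a $k$-dimensional unit matrix, and $1_k$ is a vector with a value of 1 and length of $k$.
The output of a candidate KPLS sub-model based on the spectrum $X_j$ can be expressed on the basis of the KPLS algorithm as

$$\hat{y}_j = \tilde{K}_j^{Ker} U_j \left(T_j^{T} \tilde{K}_j^{Ker} U_j\right)^{-1} T_j^{T} y$$

where $T_j$ and $U_j$ are the latent score matrices of the input and output data obtained on the basis of the KPLS algorithm.
The test samples are calibrated by using the following formula:

$$\tilde{K}_{t,j} = \left(K_{t,j} - \frac{1}{k} 1_{k_t} 1_k^{T} K_j^{Ker}\right)\left(I - \frac{1}{k} 1_k 1_k^{T}\right)$$

where $K_{t,j}$ is the kernel matrix of the test samples, $K_{t,j} = K_j((X_{t,j})_l, (X_j)_m)$, $\{(X_j)_m\}_{m=1}^{k}$ are the training data, $k_t$ is the number of testing samples, and $1_{k_t}$ is a vector with a value of 1 and length of $k_t$.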
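The centering of the training and test kernel matrices can be sketched in a few lines. This is a minimal illustration, assuming the standard kernel-centering operation in which the training matrix is multiplied on both sides by $I - \frac{1}{k}1_k 1_k^T$; the Gaussian (RBF) kernel, the width value, and the toy data are assumptions made for the example, since the kernel type is not stated in this section.

```python
import math


def rbf_kernel(xs, zs, width):
    """Gaussian kernel matrix between two lists of samples (assumed kernel type)."""
    return [[math.exp(-sum((a - b) ** 2 for a, b in zip(x, z)) / (2.0 * width ** 2))
             for z in zs] for x in xs]


def center_train(k_mat):
    """(I - 11^T/k) K (I - 11^T/k), written element by element."""
    k = len(k_mat)
    row_mean = [sum(row) / k for row in k_mat]
    col_mean = [sum(k_mat[i][j] for i in range(k)) / k for j in range(k)]
    total = sum(row_mean) / k
    return [[k_mat[i][j] - row_mean[i] - col_mean[j] + total for j in range(k)]
            for i in range(k)]


def center_test(kt_mat, k_mat):
    """(K_t - 1_kt 1_k^T K / k)(I - 11^T/k), using training-set statistics."""
    k = len(k_mat)
    col_mean = [sum(k_mat[i][j] for i in range(k)) / k for j in range(k)]
    total = sum(col_mean) / k
    centered = []
    for row in kt_mat:
        row_mean = sum(row) / k
        centered.append([row[j] - col_mean[j] - row_mean + total for j in range(k)])
    return centered


# Toy one-dimensional samples standing in for spectral data.
x_train = [[0.0], [1.0], [3.0]]
x_test = [[2.0]]
k_train = rbf_kernel(x_train, x_train, 1.0)
k_centered = center_train(k_train)
kt_centered = center_test(rbf_kernel(x_test, x_train, 1.0), k_train)
print(k_centered)
print(kt_centered)
```

The test kernel is centered with the column means of the training kernel, so that test samples are mapped into the same centered feature space as the training samples.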
The candidate sub-model output for the testing samples $\{(X_{t,j})_l\}_{l=1}^{k_t}$ can be expressed as

$$\hat{y}_{t,j} = \tilde{K}_{t,j} U_j \left(T_j^{T} \tilde{K}_j^{Ker} U_j\right)^{-1} T_j^{T} y$$

The number of latent variables, that is, the number of layers of the KPLS model, denoted as $h$, should be determined.
The establishment of the $j$th candidate KPLS model can be expressed as

$$\{(X_j)_l, y_l\}_{l=1}^{k} \xrightarrow{\{Ker,\, h\}} f^{can}_{KPLS}(\cdot)_j$$

The set of all $J$ candidate KPLS sub-models can be expressed as

$$S^{Can}_{KPLS} = \{f^{can}_{KPLS}(\cdot)_j\}_{j=1}^{J}$$

where $S^{Can}_{KPLS}$ is the set of all candidate KPLS sub-models. The BB-based SEN (BBSEN) is used to select and combine the ensemble KPLS sub-models. The candidate KPLS sub-models and the weighting algorithm are given first. Then, BBSEN is run multiple times to obtain the optimal SEN models under different ensemble sizes. Finally, the SEN KPLS model is obtained by sorting these models. The set of selected KPLS ensemble sub-models is expressed as $\{f^{sel}_{KPLS}(\cdot)_{j_{sel}}\}_{j_{sel}=1}^{J^{latent}_{sel}}$, and the relationship between the ensemble KPLS sub-models and the candidate KPLS sub-models is expressed as

$$S^{Sel}_{KPLS} \subseteq S^{Can}_{KPLS}$$

where $S^{Sel}_{KPLS}$ is the set of ensemble KPLS sub-models, $j_{sel} = 1, 2, \ldots, J^{latent}_{sel}$, and $J^{latent}_{sel}$ is the ensemble size of the SEN KPLS model.
The AWF algorithm is used to calculate the weighting coefficients of the ensemble KPLS sub-models by using the following formula:

$$w^{latent}_{j_{sel}} = \frac{1}{\sigma_{j_{sel}}^{2} \sum_{j_{sel}=1}^{J^{latent}_{sel}} \dfrac{1}{\sigma_{j_{sel}}^{2}}}$$

where $w^{latent}_{j_{sel}}$ is the weighting coefficient of the ensemble KPLS sub-model established on the basis of the $j_{sel}$th spectrum, $\sigma_{j_{sel}}$ is the standard deviation of the sub-model's output values $\{(\hat{y}^{latent}_{j_{sel}})_l\}_{l=1}^{k}$, and $k$ is the number of modeling samples. The output value $\hat{y}_{latent}$ of the SEN KPLS model is calculated by using the following formula:

$$\hat{y}_{latent} = \sum_{j_{sel}=1}^{J^{latent}_{sel}} w^{latent}_{j_{sel}}\, \hat{y}^{latent}_{j_{sel}}$$

where $\hat{y}^{latent}_{j_{sel}}$ is the output of the $j_{sel}$th KPLS ensemble sub-model.
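The AWF weighting and fusion step can be sketched as follows, assuming the common inverse-variance form in which each weight is proportional to $1/\sigma^2$ of the corresponding sub-model's outputs and the weights are normalized to sum to 1. The use of the population variance and the toy output values are assumptions for illustration.

```python
import statistics


def awf_weights(outputs):
    """Inverse-variance weights; outputs holds one output sequence per sub-model."""
    inv_var = [1.0 / statistics.pvariance(o) for o in outputs]
    total = sum(inv_var)
    return [v / total for v in inv_var]


def fuse(outputs, weights):
    """Weighted sum of the sub-model outputs, sample by sample."""
    n = len(outputs[0])
    return [sum(w * o[l] for w, o in zip(weights, outputs)) for l in range(n)]


# Hypothetical outputs of two ensemble sub-models on three modeling samples.
outputs = [[1.0, 2.0, 3.0], [2.0, 4.0, 6.0]]
weights = awf_weights(outputs)
fused = fuse(outputs, weights)
print(weights, fused)
```

Because every sub-model is evaluated on the same number of samples, using the sample variance instead of the population variance would give the same weights.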
The establishment of the aforementioned SEN KPLS model can be expressed as

Latent feature-based SEN fuzzy model
The input of the SEN fuzzy model consists of the latent features of the multiscale frequency spectra. Here, the same number of latent variables is selected for each frequency spectrum, and this number is denoted as $h'$. The latent feature extracted from the $j$th spectrum $\{(x_j)_l\}_{l=1}^{k}$ is expressed on the basis of the latent feature extraction method as follows:

$$z_j = [z_{j1}, \ldots, z_{jh'}]$$

The latent feature subset extracted from all multiscale frequency spectra is denoted as $\{z_j\}_{j=1}^{J}$. The extracted latent features are used to construct the candidate fuzzy sub-models, and the $j$th candidate fuzzy sub-model can be established as

$$\{(z_{j1}, \ldots, z_{jh'}, y)_l\}_{l=1}^{k} \xrightarrow{L} f^{can}_{Fuzzy}(\cdot)_j$$

where $L$ is the clustering threshold set during the establishment of the fuzzy models. The set of all $J$ candidate fuzzy sub-models can be expressed as

$$S^{Can}_{Fuzzy} = \{f^{can}_{Fuzzy}(\cdot)_j\}_{j=1}^{J}$$

where $S^{Can}_{Fuzzy}$ is the set of all candidate fuzzy sub-models. Here, all the selected ensemble fuzzy sub-models are expressed as $\{f^{sel}_{Fuzzy}(\cdot)_{j_{sel}}\}_{j_{sel}=1}^{J^{fuzzy}_{sel}}$, and the relationship between the ensemble fuzzy sub-models and the candidate fuzzy sub-models can be expressed as

$$S^{Sel}_{Fuzzy} \subseteq S^{Can}_{Fuzzy}$$

where $S^{Sel}_{Fuzzy}$ is the set of ensemble fuzzy sub-models, $j_{sel} = 1, 2, \ldots, J^{fuzzy}_{sel}$, and $J^{fuzzy}_{sel}$ is the ensemble size of the SEN fuzzy model.
The AWF algorithm is used again to calculate the weighting coefficients of the ensemble fuzzy sub-models by using the following formula:

$$w^{fuzzy}_{j_{sel}} = \frac{1}{\sigma_{j_{sel}}^{2} \sum_{j_{sel}=1}^{J^{fuzzy}_{sel}} \dfrac{1}{\sigma_{j_{sel}}^{2}}}$$

where $\sum_{j_{sel}=1}^{J^{fuzzy}_{sel}} w^{fuzzy}_{j_{sel}} = 1$ and $0 \le w^{fuzzy}_{j_{sel}} \le 1$, $w^{fuzzy}_{j_{sel}}$ is the weighting coefficient that corresponds to the $j_{sel}$th ensemble fuzzy sub-model, $\sigma_{j_{sel}}$ is the standard deviation of the output values $\{\hat{y}^{l}_{j_{sel}}\}_{l=1}^{k}$ of the fuzzy inference ensemble sub-model, and $k$ is the number of modeling samples.
The BBSEN algorithm is used to select and combine the ensemble fuzzy sub-models. Its output value $\hat{y}_{Fuzzy}$ is calculated as

$$\hat{y}_{Fuzzy} = \sum_{j_{sel}=1}^{J^{fuzzy}_{sel}} w^{fuzzy}_{j_{sel}}\, \hat{y}^{fuzzy}_{j_{sel}}$$

where $\hat{y}^{fuzzy}_{j_{sel}}$ is the output of the $j_{sel}$th ensemble fuzzy sub-model.
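The select-then-combine logic can be illustrated with a small sketch. It is not the branch-and-bound procedure itself: for clarity it replaces BB with an exhaustive search over all subsets of each ensemble size, which finds the same optimum on small problems at a higher cost. The inverse-variance AWF weights, the RMSRE criterion (root mean square relative error), the range of ensemble sizes, and the toy candidate outputs are all assumptions made for the example.

```python
import itertools
import math
import statistics


def rmsre(y_true, y_pred):
    """Root mean square relative error (assumed selection criterion)."""
    k = len(y_true)
    return math.sqrt(sum(((t - p) / t) ** 2 for t, p in zip(y_true, y_pred)) / k)


def awf_weights(outputs):
    """Inverse-variance weights normalized to sum to 1."""
    inv_var = [1.0 / statistics.pvariance(o) for o in outputs]
    total = sum(inv_var)
    return [v / total for v in inv_var]


def best_subset(candidates, y_true, size):
    """Exhaustive stand-in for BB: best subset of a given ensemble size."""
    best = None
    for subset in itertools.combinations(range(len(candidates)), size):
        outs = [candidates[j] for j in subset]
        w = awf_weights(outs)
        fused = [sum(wi * o[l] for wi, o in zip(w, outs)) for l in range(len(y_true))]
        err = rmsre(y_true, fused)
        if best is None or err < best[0]:
            best = (err, subset, w)
    return best


def selective_ensemble(candidates, y_true):
    """Run the search for every ensemble size, then keep the best overall."""
    sizes = range(2, len(candidates))  # assumed range: 2 .. J-1
    return min((best_subset(candidates, y_true, s) for s in sizes), key=lambda r: r[0])


# Hypothetical training targets and candidate sub-model outputs.
y_true = [1.0, 2.0, 4.0]
candidates = [
    [1.0, 2.0, 4.0],
    [1.0, 2.0, 4.0],
    [2.0, 1.0, 3.0],
    [1.5, 2.5, 3.0],
]
error, selected, weights = selective_ensemble(candidates, y_true)
print(error, selected, weights)
```

The exhaustive search grows combinatorially with the number of candidates, which is the reason a pruning strategy such as BB is preferable when $J$ is large.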
The establishment of the aforementioned SEN fuzzy model can be expressed as

Error information entropy-based combination
The two types of heterogeneous models can be fused based on information entropy. The weighting coefficients of the two types of heterogeneous SEN models are determined on the basis of the output values of the training data (Tang et al. 2012a).
The weighting coefficients are calculated as follows.
First, the relative error of the predicted output of the $j_{Entropy}$th ensemble SEN sub-model at each time $l$ is calculated, where $\hat{y}^{l}_{j_{Entropy}}$ is the output value of the modeling sample at time $l$ using the $j_{Entropy}$th SEN sub-model. In this study, $j_{Entropy} = 1, 2$, which represent the latent structure SEN model and the fuzzy reasoning SEN model, respectively, i.e., $\hat{y}^{l}_{j_{Entropy}} = \hat{y}^{l}_{latent}$ or $\hat{y}^{l}_{fuzzy}$ for $j_{Entropy} = 1$ or 2. Next, the proportion $p^{l}_{j_{Entropy}}$ of the relative error of the predicted output of the $j_{Entropy}$th ensemble SEN sub-model is calculated. Then, the entropy $E_{j_{Entropy}}$ of the relative error of the predicted output of the $j_{Entropy}$th ensemble SEN sub-model is calculated. Finally, the weighting coefficient $W_{j_{Entropy}}$ of the $j_{Entropy}$th ensemble SEN sub-model is calculated, where $\sum_{j_{Entropy}=1}^{J_{Entropy}} W_{j_{Entropy}} = 1$, and $J_{Entropy}$ is the number of ensemble SEN sub-models.
In this study, $J_{Entropy} = 2$, so the weighting algorithm yields one weighting coefficient for the SEN KPLS model ($j_{Entropy} = 1$) and one for the SEN fuzzy model ($j_{Entropy} = 2$).
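The four steps (relative error, proportion, entropy, weight) can be sketched as below. This assumes one common form of the entropy weighting method: relative errors capped at 1, a normalized Shannon entropy $E = -\frac{1}{\ln k}\sum_l p_l \ln p_l$, and weights $W_j = \frac{1}{J-1}\left(1 - \frac{1-E_j}{\sum_j (1-E_j)}\right)$. The exact expressions used by the authors may differ, and all function names and data values are hypothetical.

```python
import math


def relative_errors(y_true, y_pred):
    """Relative error per sample, capped at 1 (assumed convention)."""
    return [min(abs((p - t) / t), 1.0) for t, p in zip(y_true, y_pred)]


def error_entropy(errors):
    """Normalized Shannon entropy of the relative-error proportions, in [0, 1]."""
    total = sum(errors)
    if total == 0.0:
        return 1.0  # no error at all: treat the distribution as uniform
    probs = [e / total for e in errors]
    return -sum(p * math.log(p) for p in probs if p > 0.0) / math.log(len(errors))


def entropy_weights(y_true, predictions):
    """Weights that sum to 1 across the sub-models being combined."""
    d = [max(0.0, 1.0 - error_entropy(relative_errors(y_true, p))) for p in predictions]
    n, d_sum = len(d), sum(d)
    if d_sum == 0.0:
        return [1.0 / n] * n
    return [(1.0 - di / d_sum) / (n - 1) for di in d]


# Hypothetical training targets and outputs of the two SEN models.
y_true = [1.0, 2.0, 3.0, 4.0]
y_latent = [1.05, 2.2, 3.3, 4.2]
y_fuzzy = [1.4, 2.02, 3.03, 4.04]

weights = entropy_weights(y_true, [y_latent, y_fuzzy])
y_hsen = [weights[0] * a + weights[1] * b for a, b in zip(y_latent, y_fuzzy)]
print(weights, y_hsen)
```

Under this form, a sub-model whose relative errors are spread evenly over the samples has an entropy close to 1 and receives the larger weight, whereas a sub-model whose error is concentrated on a few samples is weighted down.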

Data description
The experiments were conducted on an XMQL 420 × 450 ball mill, where the outer diameter and length of its shell were 460 mm. The mill was driven by a 2.12 kW three-phase motor, with 80 kg maximum steel ball loading capacity, 10 kg/h power grinding capacity, and 57 r/min revolving speed. An opening in the middle of the mill was used to add the steel ball, material, and water loads. The materials used in this experiment included copper ores with a diameter of less than 6 mm and a density of 4.2 t/m³. Steel balls with diameters of 30, 20, and 15 mm were used as grinding media, and the matching ratio was 3:4:3. The data acquisition system was installed on the mill shell and mainly consisted of an acceleration sensor and DSP equipment. In this study, the acquisition frequencies of the mill shell vibration and acoustic signals were 51,200 Hz and 8,000 Hz, respectively.
Because the modeling samples are limited, the data are partitioned into training and testing sets in the same way as reported by Tang et al. (2018b) and Tang et al. (2016). In their works, the bootstrap algorithm is used to address the sample partition problem, and the results are almost the same as those obtained with the training and testing partition.

Multiscale frequency spectrum transformation results
The parameters $A_{noise} = 0.1$ and $M = 10$ are selected based on the work of Tang et al. (2018b). The original shell vibration and acoustic signals of four rotating periods of the mill are decomposed into time-domain sub-signals with different timescales based on EEMD. These IMFs with different timescales are transformed into multiscale frequency spectra by FFT. The spectra of the first eight shell vibration sub-signals (VIMF) and acoustic sub-signals (AIMF) are shown in Fig. 3.
Figure 3 indicates that these multiscale sub-signals, which have high-dimensional features, are sorted from high to low frequency.

The learning parameters of the MBVR, PD, and CVR soft-sensing models with the SEN KPLS algorithm are determined based on the aforementioned results, and the corresponding ensemble sub-models are shown in Tables 2, 3 and 4. The multiscale spectra that correspond to the ensemble sub-models numbered 1-10 are VIMF1-VIMF10, and those that correspond to the ensemble sub-models numbered 11-20 are AIMF1-AIMF10.

The aforementioned results indicate that the ensemble sub-models selected by the SEN KPLS model mainly come from the shell vibration signals.

Latent feature-based SEN fuzzy model results
The relationships between the clustering threshold and the RMSRE of the SEN fuzzy models are shown in Figs. 10, 11 and 12.

The learning parameters of the MBVR, PD, and CVR soft-sensing models with the SEN fuzzy algorithm are determined based on the aforementioned results, and the corresponding ensemble fuzzy sub-models are shown in Tables 5, 6 and 7. Tables 5, 6 and 7 show that half of the ensemble sub-models selected by the SEN fuzzy model come from the shell vibration signals, and the other half come from the acoustic signals.

Error information entropy-based combination results
The weighting coefficients of the SEN KPLS and SEN fuzzy models for the MBVR model are 0.6148 and 0.3851, respectively. The prediction errors of the various models are shown in Fig. 16.
As shown in Fig. 16, the training error of the SEN KPLS model is much smaller than its testing error, indicating that overfitting exists during the training. The training error of the SEN fuzzy model is one-third of its testing error, indicating that overfitting is reduced and verifying the feasibility of modeling with the SEN fuzzy method. The training error of the HSEN model lies between that of the SEN KPLS model and that of the SEN fuzzy model. Meanwhile, the HSEN model has the minimum testing error, thereby verifying that the fusion of the SEN KPLS and SEN fuzzy models has improved the generalization performance.
For the PD model, the weighting coefficients of the SEN KPLS model and SEN fuzzy model are 0.4605 and 0.5395, respectively, indicating that the contribution rate of the SEN KPLS model is similar to that of the SEN fuzzy model. The prediction errors of various models are shown in Fig. 17.
In Fig. 17, the training error of the SEN KPLS model is approximately 1/1000 of its testing error, indicating that overfitting exists during the training of the SEN KPLS model. The training error of the SEN fuzzy model is only one-third of its testing error. This finding indicates that the overfitting is reduced and verifies the feasibility of the modeling process based on the SEN fuzzy method. The training error of the HSEN model is only one-third of its testing error, and its training and testing precisions are improved compared with the SEN fuzzy and SEN KPLS models, indicating that the fusion of these two models has improved the generalization performance.
For the CVR soft-sensing model, the weighting coefficients of the SEN KPLS and SEN fuzzy models are 0.5598 and 0.4401, respectively, indicating that the contribution rate of the SEN KPLS model is slightly stronger than that of the SEN fuzzy model. The prediction errors of the different models are shown in Fig. 18.
In Fig. 18, the training error of the SEN KPLS model is smaller than that of the testing data, indicating that overfitting exists during the training of the SEN KPLS model. The training error of the SEN fuzzy model is only one-third of the test error, indicating that overfitting declines to a certain degree and verifying the feasibility of the modeling process based on the SEN fuzzy method. The training data error of the HSEN model is only half of the testing data error, and its training and testing precisions are improved compared with the SEN fuzzy and SEN KPLS models. These results indicate that the fusion of the two models has improved the generalization performance.

HSEN model and its sub-models
Considering the difficulty of the trade-off between the diversity and precision of the sub-models, the proposed HSEN models select the model learning parameters through global optimization. In terms of ensemble models, the proposed MLP soft-sensing model is a double-layer ensemble with dual-layer sub-models. The first-layer sub-models refer to the SEN fuzzy and SEN KPLS models. The second-layer sub-models refer to the fuzzy and KPLS sub-models. The statistical results of the different MLP models obtained by the proposed method are shown in Table 8. Table 8 shows the following:
1) The HSEN model has satisfactory generalization performance and fuses two heterogeneous SEN sub-models effectively. The modeling performance of the SEN KPLS model is stronger than that of the SEN fuzzy model, which is based on a previous analysis.
2) In the first-layer sub-models, for MBVR, the contribution rate of the SEN KPLS model is higher than that of the SEN fuzzy model. However, for PD and CVR, the contribution rates of the two types of first-layer SEN sub-models are similar.
3) For the contribution rates of the second-layer sub-models, three different MLP models select the

HSEN model
The proposed soft-sensing method is compared with singlescale spectral feature extraction and selection method (Tang et al. 2012d), linear PLS model-based multiscale ensemble modeling method (Zhao et al. 2012), nonlinear KPLS model-based multiscale SEN modeling method , latent features and adaptive genetic algorithm-based single model method , selective fusion multi-condition samples and multi-source features method (Tang et al. 2018b), and dual-layer optimized selective information fusion using multi-source frequency spectrum features method (Tang et al. 2020).
The results of the comparison are reported in Table 9, which shows the following:
1) The proposed method has the best average modeling performance for MBVR, with an RMSRE of 0.08822. One reason is that EEMD improves the adaptive decomposition precision of the shell vibration and acoustic signals. Another reason is that the HSEN modeling strategy, which fuses the two heterogeneous SEN models, can simulate and compensate the MLP estimation mechanism of operational experts more effectively than the previous methods. Moreover, in practice, most operational experts can estimate the MBVR of a ball mill they are familiar with by listening to its acoustical signal. That is, the proposed method markedly improves the modeling performance for MBVR, which is in line with the fuzzy estimation of MBVR in industrial practice.
2) The traditional single model based on the shell vibration signal, which uses GA to jointly select the input latent features and learning parameters, has the best modeling performance for CVR, with an RMSRE of 0.08992. The traditional single model established by Tang et al. (2012d) provides poor physical explanation.
3) The dual-layer SEN method based on the selective fusion of multi-condition samples and multi-source features of the single-scale frequency spectrum has the best modeling performance for PD, with an RMSRE of 0.07004. By using three multi-component signal decomposition approaches, Tang et al. (2020) obtained the second-best prediction performance for PD. Thus, different MLPs require different soft-sensing modeling strategies, which are related to the mechanism by which the MLPs affect the shell vibration and acoustical signals; a deeper understanding of this mechanism remains to be developed.
4) This study and the methods presented by Tang et al. (2018b, 2020) have greater complexity than the single-scale frequency spectrum methods in terms of mechanical signal decomposition. Taking this study as an example, for the signal decomposition in the model training phase, the calculation quantity of EEMD is M times that of EMD and J*M times that of FFT, where M is the number of EMD executions and J is the number of multiscale sub-signals decomposed by EMD. For the multiscale spectral features, unlike the works of Tang et al. (2012d, 2014), the proposed method does not perform feature selection. In the establishment phase of the soft-sensing model, the proposed HSEN model has a double-layer structure that is more complex than those of the other single models and single-layer SEN modeling methods.
5) Among all the methods, only the proposed method uses the fuzzy inference mechanism. Thus, the proposed method has advantages in terms of the simulation and compensation of the cognition mechanism of operational experts.
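The RMSRE values compared above can be computed from a model's predictions with a short helper. The sketch below assumes the usual definition of the root mean square relative error, sqrt((1/k) * sum(((y_l - yhat_l) / y_l)^2)) over k test samples; this section does not restate the formula, and the function name `rmsre` is hypothetical.

```python
import math

def rmsre(y_true, y_pred):
    """Root mean square relative error (assumed definition, see lead-in).

    y_true: measured values of one mill load parameter (must be nonzero).
    y_pred: the corresponding model predictions.
    """
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    # squared relative error of each sample
    squared = [((t - p) / t) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))
```

Because the error is relative to the true value, the metric is comparable across MBVR, PD, and CVR even though these parameters have different ranges.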

Conclusions
In this study, we combine two types of heterogeneous models, namely, frequency spectrum-based KPLS models and latent feature-based fuzzy models, by using the information entropy of the prediction errors. This soft-sensing method can simulate and compensate the fuzzy cognition behavior of industrial experts in the mineral grinding process. The model uses a dual-layer ensemble construction strategy that can fuse multi-source information in various grinding conditions. The simulation results are based on a small sample of experimental data from a laboratory-scale ball mill. The experimental results show that the new model achieves the best modeling performance for MBVR among the compared methods.
Our future work will apply the method to industrial ball-mill data that are closer to actual working conditions. Moreover, intelligent optimization algorithms will be used to select the learning parameters of the HSEN strategy.