A selective hybrid system for state-of-charge forecasting of lithium–ion batteries

As research on lithium-ion batteries has expanded, the need for better management systems has also increased. An important part of such systems is the proper estimation of battery status over time through indirect metrics such as the state-of-charge (SoC). In the machine learning literature, several standalone techniques have been tested with good performance, only to be surpassed by hybrid systems. In this study, a static selection model is proposed that chooses, for a specific database, the best nonlinear predictor to work alongside an ARIMA model and the best combination function, based on their performance on a validation set. This architecture accounts for the linear and nonlinear components of the time series through residual forecasting, with a selection step that reduces the chances of choosing the wrong combination for each specific database. SoC values for five different databases were forecasted, allowing the proposed model to be compared against six models relevant in the literature. The results showed superior performance for the proposed model in four of the five databases, with gains of 5.27%, 13.51%, 56.67%, and 38.71%. There was only one database where the proposed model scored second place. To determine whether the obtained results were statistically significant, a Nemenyi test was conducted using MSE and MAE values to rank the performance of all models on all databases. The critical distance and rank achieved by the proposed model support the conclusion that it delivered the best performance amongst all models tested.


Introduction
Lithium-ion (Li-ion) batteries currently stand as one of the most vital energy storage strategies, given their applicability as power sources for mobile phones, laptops, everyday electronics, power backups, and electric vehicles (EVs) [1][2][3][4], amongst others. Furthermore, storage-focused systems use battery packs made of numerous lithium-ion batteries, which require close monitoring of the battery states to maintain safe and efficient operation [5]. Thus, battery management requires estimating the battery's states, such as the state-of-charge (SoC), state-of-health (SoH), state-of-power (SoP), and state-of-life (SoL).
The state-of-charge is defined as the remaining usable percentage of a battery's capacity. Therefore, a 100% SoC indicates that the maximum capacity is available for use, while a 0% SoC suggests the opposite. Furthermore, SoC also provides information on a system's reliability, efficiency, and safety [6]. Accurate SoC estimations can help battery management systems with cell balancing in packs, avoiding cell over-charging and over-discharging, which can result in battery damage [7]. However, the state-of-charge is not directly measurable, since it depends on other factors within the battery and requires separate calculation. The development of data-driven algorithms has taken a significant step in recent years to enhance SoC measurements by improving generalization performance, learning capabilities for high accuracy, and convergence [8]. In addition, the conventional standalone machine learning techniques of recent work suffer from an accuracy standpoint and thus have been replaced by high-fidelity hybrid machine learning techniques [9]. Furthermore, SoC time series are often composed of linear and nonlinear patterns, which may not be dealt with equally well by a single forecasting method [10]. In this sense, traditional forecasting methods such as the ARIMA have been explored in SoC forecasting [9,11] due to their well-established Box-Jenkins [12] methodology for model selection. However, despite the forecasting capabilities of the ARIMA model for nonstationary time series, the presence of nonlinear patterns can reduce its accuracy. On the other hand, nonlinear models such as artificial neural networks (ANNs) can learn nonlinear patterns. However, their accuracy is dependent on the selected hyperparameters, which lead to overfitted or under-fitted models when misspecified [10].
The ability to learn the data is often referred to as model bias, whereas the stability of the model when forecasting new samples is referred to as variance. Therefore, data-driven models such as ANNs may learn well from data and present low bias. However, they may rely excessively on some samples, producing high variance [13]. Thus, a suitable trade-off between model bias and variance is desirable in forecasting systems, since low-bias models can exhibit high variance.
Considering the limitations of stand-alone models in forecasting tasks, several hybrid systems that decompose the time series into its linear and nonlinear components have been proposed to overcome the limitations of linear and nonlinear models and improve forecasting accuracy by mitigating the risk of selecting an inappropriate model. In general, the architecture of such hybrid systems is composed of three main steps: I) time-series forecasting, II) residual forecasting, and III) combination. Usually, in the first step, a linear model such as ARIMA is employed to perform time-series forecasts. In the second step, the residual from the ARIMA forecasts is calculated and used as input for a nonlinear model. Residual forecasting is a challenging task due to the presence of heteroscedastic patterns and may be affected by noise. The third and last step combines the forecasts obtained by the linear and nonlinear models.
Traditional studies in the literature assume a linear relation between linear and nonlinear forecasts; however, in several scenarios, this relation is better represented by nonlinear functions [14,15]. Furthermore, other architectures have been proposed to improve prediction stability, reduce variance, and find the best model combinations. For example, Cruz and Oliveira [16] proposed a model with a weighted average combination of an ARIMA+SVR and an LSTM, using R² as a base metric to compute static weights for predicting current and voltage time series. Models like this have been shown to perform adequately in SoC prediction. Nonetheless, introducing a selection process that adapts the models and combinations in the architecture would greatly benefit overall performance, as it ensures the best fit to the characteristics of each time series.
This research proposes a novel hybrid system architecture to improve the forecasting accuracy of the hybrid systems presented in the SoC forecasting literature. The improvements are achieved through two levels of selection. The first level selects the nonlinear model to be combined with the linear forecasts, according to the best-performing option on the validation set. The second level selects the best combination function for the outputs of the linear and selected nonlinear forecasts. As a result, the proposed system has the following advantages:

• It reduces the risk of selecting an inappropriate model by combining different forecasting systems.
• It increases model performance for each predicted dataset, as the combination function also adapts to specific time-series characteristics.
• Its versatility allows different models to be used in each step.
The remainder of this work is organized as follows: Sect. 2 presents the related studies and the state-of-the-art in the literature. In Sect. 3, the proposed method is described. Sect. 4 details the experimental setup. Then, the results are given in Sect. 5. Finally, Sect. 6 presents the conclusion and future studies.

Related work
A relevant approach used to overcome the limitations of linear and nonlinear models is to decompose the time series into their linear and nonlinear components through a hybrid system. Zhang [13] considered a time series Z_t to be a linear relation between its linear L_t and nonlinear N_t components, as described in Eq. 1:

Z_t = L_t + N_t    (1)

The first stage in the model proposed in [13] is to forecast the linear component L_t using an ARIMA model. The residual (error) series is then computed as the difference between the time series Z_t and the linear forecasts L̂_t, as presented in Eq. 2.
E_t = Z_t − L̂_t    (2)

Second, the residual series generated in the last step is introduced into a nonlinear function f, which considers the k past residuals in order to produce the nonlinear forecasts N̂_t, as demonstrated in Eq. 3:

N̂_t = f(E_{t−1}, E_{t−2}, ..., E_{t−k})    (3)
In the nonlinear function f, the number of past residuals k represents the length of the time window. In the work of Zhang [13], a Multi-layer Perceptron (MLP) neural network was used as the nonlinear model to perform the residual forecasts. The last step combines the linear and nonlinear forecasts, producing the final forecast Ẑ_t, as described in Eq. 4:

Ẑ_t = L̂_t + N̂_t    (4)
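For concreteness, a minimal sketch of this three-step architecture is given below, assuming Python with statsmodels and scikit-learn (the original works do not prescribe these libraries); the ARIMA order and window length k are placeholder choices.

```python
import numpy as np
from statsmodels.tsa.arima.model import ARIMA
from sklearn.neural_network import MLPRegressor

def fit_zhang_hybrid(z_train, order=(2, 1, 1), k=4):
    # Step 1: fit ARIMA and obtain in-sample linear forecasts L_t.
    arima = ARIMA(z_train, order=order).fit()
    linear = arima.predict(start=0, end=len(z_train) - 1)
    # Step 2: residual series E_t = Z_t - L_t (Eq. 2).
    residuals = z_train - linear
    # Windows of k past residuals feed the nonlinear model (Eq. 3).
    X = np.array([residuals[i - k:i] for i in range(k, len(residuals))])
    y = residuals[k:]
    mlp = MLPRegressor(hidden_layer_sizes=(10,), max_iter=2000).fit(X, y)
    return arima, mlp

# Step 3 (Eq. 4): the final forecast is the summation L_hat + N_hat.
```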
For the sake of simplicity, the architecture described previously will henceforward be referred to as ARIMA+MLP, representing the summation of the forecasts produced by the ARIMA and MLP as linear and nonlinear models, respectively. Subsequent to Zhang's contribution, several models proposed in the literature have followed the same architecture. Panigrahi and Behera [17] proposed exponential smoothing (ETS) to forecast the time series, an MLP to forecast the residuals, and a summation operator for the combination (ETS+MLP). Pai and Lin [18] proposed an ARIMA and support vector regression hybrid (ARIMA+SVR) to forecast stock pricing, and Holanda and de Oliveira [19] employed a combination of models on the residuals using an ARIMA+PSO-SVR hybrid system. De Oliveira [20] proposed an alternative to Zhang's model using dynamic residual forecasting (DReF), aiming to reduce the uncertainty of model selection and to avoid deteriorating the time-series forecast by finding the most suitable machine learning model from a pool to predict a specific pattern of the residual series and deciding whether it is a fitting option to increase the accuracy of the linear combination stage. The model proved to perform satisfactorily on several databases, which opens the opportunity for it to be implemented in an application-specific approach.
Despite the importance of such architecture to time-series forecasting tasks, the assumption of linearity between linear and nonlinear forecasts can reduce the accuracy of the system, since the real combination function is unknown. In this sense, several other studies in the literature have investigated the nonlinear combination of the linear L_t and nonlinear N_t forecasts. Lucena [21] proposed an ARIMA model to perform the time-series forecasting, an SVR model to perform the residual forecasting, and a combination between them performed by either an SVR or an MLP. To simplify the notation, these architectures will be referred to as MLP(ARIMA,SVR) and SVR(ARIMA,SVR), representing the combination of ARIMA and SVR forecasts by an MLP or an SVR, respectively. Moreover, Santos et al. [15] used a similar architecture, but incorporated past forecast values as input for the combination function.
In the specific application of SoC forecasting, some models also follow a similar idea, as in the case of Khalid [9], who combined an ARIMA tuned by minimizing the Akaike Information Criterion with a unified Multi-layer Perceptron (MLP) and a Nonlinear Autoregressive Neural Network with external input (NARX). A similar architecture was also proposed by Cruz and Oliveira [16], using a weighted average combination of an ARIMA+SVR and an LSTM, with R² as the base metric to compute both static weights. In this model, the current and voltage time series are first predicted, and then the coulomb counting technique is used to compute the SoC values over time by integration. Moreover, the coulomb counting method could be implemented in a more sophisticated form, as it is presented as the simple integration of current values over time in Eq. 5:

SoC(t) = SoC_{t_0} + (1/C_p) ∫_{t_0}^{t_0+Δt} I(t) dt    (5)
where SoC_{t_0} is the state-of-charge at the beginning of the current measurement, Δt is the period of observation to be computed, C_p is the capacity of the battery, and I(t) is the current being measured from the battery over time.
The current measured in the battery is considered positive in the charging semi-cycle (I_c(t)) and negative in the discharging semi-cycle (I_d(t)). This allows I(t) to be expressed as presented in Eq. 6:

I(t) = a·I_c(t) + (1 − a)·I_d(t)    (6)

where a is a coefficient taking the values in Eq. 7:

a = 1 during charging, a = 0 during discharging    (7)

Meaning that the possible current values in Eq. 6, using a as in Eq. 7, can also be expressed piecewise as in Eq. 8:

I(t) = I_c(t) > 0 while charging; I(t) = I_d(t) < 0 while discharging    (8)

An alternative to improve Eq. 5 is proposed by Mohammadi [22] with an improved coulomb counting algorithm (referred to as iCC) for lithium-ion batteries. Here, starting from Eq. 5, uncertainties from the current measurement and the integration are considered, shaping Eq. 9 for a one-hour analysis:

SoC(t) = SoC_{t_0} + (1/C_p) ∫_{t_0}^{t_0+Δt} [I(t) + ε_M + ε_I] dt    (9)

where I(t) represents the current measurements without uncertainties, as shown in Eq. 8, ε_M represents the measurement uncertainty, and ε_I represents the integration uncertainty. Both ε_M and ε_I are functions of the intrinsic characteristics of the battery and of the integration process that add error to the SoC computation. These were studied by Mohammadi [22], and typical values for them were estimated so that they can be used in Eq. 9 as constants, expressed in Eqs. 10 and 11 as fractions of C, the constant value of the capacity of the battery.
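As an illustration, a minimal Python sketch of plain coulomb counting (Eq. 5) follows; the trapezoidal integration and all variable names are illustrative assumptions, not the iCC implementation of [22].

```python
import numpy as np

def coulomb_counting(current_a, dt_h, soc0, capacity_ah):
    """Plain coulomb counting (Eq. 5). current_a: current samples in amperes,
    positive while charging and negative while discharging (Eq. 8);
    dt_h: sampling period in hours; capacity_ah: battery capacity C_p in Ah."""
    # Trapezoidal integration of I(t) gives the accumulated charge in Ah.
    charge_ah = np.cumsum((current_a[:-1] + current_a[1:]) / 2.0) * dt_h
    # SoC trajectory: initial SoC plus the integrated fraction of C_p.
    return soc0 + charge_ah / capacity_ah
```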
The proposed model differs from previous approaches in the literature through the employment of two forecasting steps and the introduction of a selection approach for both the nonlinear model and the combination function between the linear and nonlinear forecasts. This step is important to mitigate problems related to uncertainty in model selection.

Proposed model
The proposed method follows the architecture of the studies in the literature and is composed of three main stages:

• Linear modelling stage
• Nonlinear modelling stage
• Combination stage.
These stages perform the modelling of the time series, the modelling of the residual series, and the combination of the forecasts. The architecture of the training process for the proposed model is presented in Fig. 1, whereas that of the testing process is given in Fig. 2. The main objective is to select an appropriate model for the forecasting of the residuals and for the combination of forecasts, both of which are fixed in several studies in the literature. Moreover, for each dataset under study, a new selection process is carried out, resulting in a different forecasting system.

Linear modelling stage
The first step in the pipeline of the architecture is to implement a linear model (LM) to produce a forecast L̂_t of the series. It is important to highlight that the linear forecasts L̂_t are achieved through the employment of a linear model on the time series Z_t, as expressed in Eq. 12, where f is a linear function used to map the p last observations of the time series:

L̂_t = f(Z_{t−1}, Z_{t−2}, ..., Z_{t−p})    (12)
In this stage, an ARIMA model is employed to perform the linear predictions of the time series; the training of the model is performed through a stepwise procedure [23] to select its most suitable parameters.
The residual calculation is then performed as presented in Eq. 2 to be used as input in the next stage.
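A minimal sketch of this stage is shown below, using pmdarima's auto_arima as a Python stand-in for the stepwise procedure (the paper itself uses R's forecast package; see Sect. 4.3). Variable names are assumptions.

```python
import pmdarima as pm

# Stepwise search for the most suitable (p, d, q) order, as in [23].
lm = pm.auto_arima(z_train, stepwise=True, seasonal=False)
linear_forecasts = lm.predict_in_sample()   # linear forecasts L_t
residuals = z_train - linear_forecasts      # residual series E_t (Eq. 2)
```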

Nonlinear modelling stage
The residual forecasting process presented in Eq. 3 uses a single nonlinear model to produce a forecast N̂_t of the residuals E_t of the linear stage. However, a single model may not be the best in every situation [24], and a proper selection procedure can be employed to enhance the forecasting accuracy of the system. In the proposed model, this step is expanded using a pool of nonlinear models: Multi-layer Perceptron (MLP), Decision Tree Regressor (DTR), and Support Vector Regressor (SVR). The training process of each model is performed individually, with the parameter selection performed through a grid search process.
The most suitable nonlinear model (NM) for each time-series prediction is then selected by evaluating all models in the pool on a validation set, as these data are contained in neither the training nor the testing sets.
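The selection can be sketched as follows, assuming scikit-learn models and precomputed residual windows (X_res_train, y_res_train, X_res_val, and y_res_val are hypothetical names):

```python
from sklearn.metrics import mean_squared_error
from sklearn.neural_network import MLPRegressor
from sklearn.svm import SVR
from sklearn.tree import DecisionTreeRegressor

pool = {"MLP": MLPRegressor(max_iter=2000),
        "DTR": DecisionTreeRegressor(),
        "SVR": SVR()}
val_errors = {}
for name, model in pool.items():
    model.fit(X_res_train, y_res_train)     # windows of past residuals
    val_errors[name] = mean_squared_error(y_res_val, model.predict(X_res_val))
# The nonlinear model NM with the lowest validation error is selected.
nm = pool[min(val_errors, key=val_errors.get)]
```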

Combination stage
The last stage performs the combination of the forecasts from the linear (L̂_t) and nonlinear (N̂_t) modelling stages. It is important to highlight that this stage performs a regression task in order to carry out the nonlinear combination; that is, there are two inputs (the linear and nonlinear forecasts) and one output, which is the target value at time t. Therefore, the combination model represents a function CM as described in Eq. 13:

Ẑ_t = CM(L̂_t, N̂_t)    (13)
Unlike other studies in the literature [14,15,20], the proposed model selects the most suitable model to perform the combination of forecasts for a given dataset. Furthermore, the pool of models in this stage is composed of the summation operator, representing a linear combination function, and three machine learning models (SVR, DTR, and MLP) to perform nonlinear combinations. It is important to mention that the selection process is performed on the validation set, which enables the selection of models with better generalization capacity.
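A sketch of this selection follows, where the candidates are the summation operator and three regressors over (L̂_t, N̂_t) pairs; linear_train, nonlinear_train, y_tr_actual, and the other variable names are illustrative assumptions.

```python
import numpy as np
from sklearn.metrics import mean_squared_error
from sklearn.neural_network import MLPRegressor
from sklearn.svm import SVR
from sklearn.tree import DecisionTreeRegressor

candidates = {"SUM": None, "MLP": MLPRegressor(max_iter=2000),
              "DTR": DecisionTreeRegressor(), "SVR": SVR()}
X_tr = np.column_stack([linear_train, nonlinear_train])  # (L_t, N_t) pairs
X_va = np.column_stack([linear_val, nonlinear_val])
errors = {}
for name, model in candidates.items():
    if model is None:                        # summation operator (Eq. 4)
        pred = linear_val + nonlinear_val
    else:                                    # nonlinear combination (Eq. 13)
        pred = model.fit(X_tr, y_tr_actual).predict(X_va)
    errors[name] = mean_squared_error(y_val_actual, pred)
cm_name = min(errors, key=errors.get)        # selected combination model CM
```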

Testing process
After training the ARIMA model LM and selecting the appropriate models to forecast the residuals and perform the combination (NM and CM, respectively), they are used to forecast and combine new patterns. Thus, the same stages performed in the training process are also required in testing.
The linear forecasting stage employs the linear model LM, which produces the linear forecasts L̂_t, from which the residuals E_t = Z_t − L̂_t are calculated. In the nonlinear forecasting stage, the residual series E_t is used by the nonlinear model NM to produce the nonlinear forecasts N̂_t. The combination stage employs the combination model CM, taking the previous forecasts L̂_t and N̂_t into consideration to produce the final forecast Ẑ_t. The testing procedure described in Fig. 2 shows the employment of the selected models. It is important to mention that the CM model in the combination can be represented by either a linear or a nonlinear function.
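Reusing the hypothetical objects from the earlier sketches (lm, nm, candidates, cm_name), the test-time composition reduces to a few lines:

```python
# One-step test-time forecast: LM -> residual window -> NM -> CM.
l_t = lm.predict(n_periods=1)[0]                       # linear forecast L_t
n_t = nm.predict(last_k_residuals.reshape(1, -1))[0]   # nonlinear forecast N_t
if cm_name == "SUM":
    z_hat = l_t + n_t                                  # linear combination
else:
    z_hat = candidates[cm_name].predict([[l_t, n_t]])[0]  # nonlinear CM
```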

Experiments
To compare the different algorithms presented in the literature, datasets concerning the SoC of batteries with different characteristics are employed. The models are compared using several error metrics, and finally, hypothesis testing is employed to perform the final analysis.

Datasets
In order to test the performance of the proposed model, tests were conducted on five time series representing charge and discharge cycles of lithium-ion batteries with different characteristics. The data were all accessed through the Battery Archive repository [25], which allows battery data shared by five different institutions to be visualized, downloaded, compared, and filtered. The specific databases used in these experiments are CALCE, HNEI, OX, SNL, and UL-PUR [28].
From each of these sets, one time series was selected to test the ability of the proposed model to forecast the state-of-charge of lithium-ion batteries with a graphite anode and different characteristics, as summarized in Table 1, where C is the capacity of the battery in ampere-hours, T is the temperature in degrees Celsius, Ch C-rate is the rate at which the battery is fully charged, and Dis C-rate is the rate at which the battery is fully discharged.
The cathode chemistry is specified for each battery recorded. The abbreviations correspond to:

1. NCA: lithium-nickel-cobalt-aluminium oxide, used to enhance battery effectiveness and efficiency, enabling high-energy-density batteries.
2. LCO: lithium-cobalt oxide, one of the first chemistries used in lithium-ion batteries and one of the most common.
3. NMC: lithium-nickel-cobalt-manganese oxide, with strong overall performance and the most used in automotive applications.
4. LFP: lithium iron phosphate, an environmentally safer option with low cost and good performance.
While chemistry is a battery characteristic to take into consideration when analysing the prediction of each time series, given that the proposed model is data-driven (and not a chemical/electrical model), the most important values in Table 1 for this research work, along with the number of cycles, are the charging and discharging C-rates, which determine the time steps necessary to repeat each recorded cycle.

Preprocessing
Data normalization was applied to the input features. All of them are transformed into the range [0,1] according to Eq. 14, where Z_Norm is the normalized series and the minimum and maximum values of the series are represented by min(Z_t) and max(Z_t), respectively:

Z_Norm = (Z_t − min(Z_t)) / (max(Z_t) − min(Z_t))    (14)
Each time series was split into three subsets: training (60%), validation (20%), and testing (20%), as specified for all databases in Table 2. The training set is used for the training processes of the linear and nonlinear models, the validation set is used to select the best parameters, nonlinear model, and combination function, and the testing set is used for the evaluation stage.
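A minimal sketch of the preprocessing, assuming a NumPy series z:

```python
import numpy as np

# Min-max normalization into [0, 1] (Eq. 14).
z_norm = (z - z.min()) / (z.max() - z.min())

# Chronological 60/20/20 split into training, validation, and testing.
n = len(z_norm)
train = z_norm[: int(0.6 * n)]
val = z_norm[int(0.6 * n): int(0.8 * n)]
test = z_norm[int(0.8 * n):]
```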

Parameter selection
ARIMA has three parameters to be adjusted, where p represents the autoregressive order, d is the number of differences applied to the series in order to make it stationary, and q represents the order of the moving average. The ARIMA model was generated using the "forecast" package in the R language [29], through the auto.arima function, which automatically searches for the best parameters according to either the AIC (Akaike Information Criterion), the AICc (second-order AIC), or the BIC (Bayesian Information Criterion). For the remaining models (nonlinear forecasting and combination), the parameter selection process was based on a grid search over a set of possible configurations, as described in Table 3. For the SVR, the hyperparameters were tuned under the following definitions:

1. ε: sparsity can be induced through the employment of an ε-insensitive loss function, which creates an ε-tube.
2. C: the regularization factor, which performs the trade-off between the complexity and the generalization capacity of the model.
3. Kernel: performs nonlinear mappings of the datasets into a higher-dimensional space.
4. γ: defines the influence of each training example.
Next, for the MLP:

1. Hidden layer sizes: the number of neurons in each hidden layer.

And, lastly, for the DTR:

1. Splitter: the strategy used to split the tree at each node.
2. Maximum depth of the tree.
3. Minimum number of samples at each leaf node.
4. Minimum weighted fraction of the sum total of weights at each leaf node.
5. Maximum number of features to consider when splitting.
6. Maximum number of leaf nodes to grow the tree.
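A grid search over a fixed validation set (rather than k-fold cross-validation) can be sketched with scikit-learn's PredefinedSplit; the grid values below are placeholders, not the exact ranges of Table 3.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV, PredefinedSplit
from sklearn.svm import SVR

grid = {"epsilon": [0.01, 0.1], "C": [0.1, 1, 10],
        "kernel": ["rbf", "linear"], "gamma": ["scale", "auto"]}
# -1 marks rows always kept in training; 0 marks the validation rows,
# so the search scores on the fixed validation set instead of k-fold CV.
fold = np.r_[np.full(len(X_res_train), -1), np.zeros(len(X_res_val))]
search = GridSearchCV(SVR(), grid, cv=PredefinedSplit(fold),
                      scoring="neg_mean_squared_error")
search.fit(np.vstack([X_res_train, X_res_val]),
           np.concatenate([y_res_train, y_res_val]))
best_svr = search.best_estimator_
```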

Evaluation metrics
All methods were evaluated using the mean squared error (MSE) and the mean absolute error (MAE), described in Eqs. 15 and 16, respectively. The MSE performs a quadratic evaluation of the errors, applying heavier penalties to larger forecasting errors; it is a scale-dependent metric, so it is not advisable for comparisons across multiple datasets. The MAE metric outputs the average absolute error, using the same scale as the time series being evaluated [30]:

MSE = (1/n) Σ_{t=1}^{n} (Z_t − Ẑ_t)²    (15)

MAE = (1/n) Σ_{t=1}^{n} |Z_t − Ẑ_t|    (16)
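Both metrics are straightforward to compute; a sketch with hypothetical test arrays:

```python
import numpy as np

mse = np.mean((z_test - z_hat) ** 2)    # Eq. 15: quadratic, scale-dependent
mae = np.mean(np.abs(z_test - z_hat))   # Eq. 16: same scale as the series
```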
Besides supporting conclusions on model performance, these measures also allow Friedman and Nemenyi tests to be conducted as a means to determine whether there is a statistically significant difference in the performance of the evaluated models across all datasets. This provides conclusive evidence to back up the idea that the proposed model performs better than the other models.

Results
Following the grid search process mentioned in Sect. 4 for each model in all databases, the experiments were conducted on the five aforementioned datasets. In the proposed model, different models and architectures may be selected for each dataset, as presented in Table 4. In the CALCE dataset, a hybrid system based on a summation of the ARIMA and DTR models was selected (ARIMA+DTR), whereas in the other datasets, nonlinear combinations achieved the lowest validation errors. For evaluation and comparison purposes, seven relevant methods found in the related literature were implemented on the five datasets discussed in Sect. 4.1: ARIMA [31], MLP [32], SVR [33], DTR [34], ARIMA+MLP [13], ARIMA+SVR [18], and SVR(ARIMA, SVR) [21]. Model performance was first evaluated through the MSE and MAE values presented in Tables 5 and 6, respectively. It is important to notice that the lowest error value achieved in each database is highlighted in bold, meaning the corresponding prediction model was the best performing amongst all evaluated in the experiment. In Table 5, for MSE, the worst performances across datasets were not all attributed to a single model: in some cases, DTR delivered the highest errors (as in CALCE); in others, it was ARIMA (as in SNL); and in others, it was SVR (as in UL-PUR). This points to the fact that the datasets have different characteristics that are not always well predicted by a specific type of model; in some of them, even ARIMA might perform better than a Multi-layer Perceptron (the series could be strongly influenced by linear patterns rather than nonlinear ones). This only adds to the idea that combination and model selection improve overall performance. Thus, in the CALCE dataset, the DTR model achieved the lowest performance; however, when used in a hybrid way (ARIMA+DTR), the proposed model was able to outperform both methods used individually.
The MSE and MAE results also support the statement that the proposed model is better than the ones found in the literature for the CALCE, OX, SNL, and HNEI datasets, achieving the lowest error values in both metrics against simple and complex models. This is best represented in Table 7, where the performance gains reached 5.27%, 13.51%, 56.67%, and 38.71%, respectively. For UL-PUR, this was not the case, as SVR(ARIMA, SVR) performed better (20.55% performance gain), leaving the proposed model in second place (16.98% gain). Still, this does not discard the proposed model as the best option amongst all options compared, as confirmed by the Nemenyi test later on (Fig. 3).
In the error analysis, the proposed model showed evidence of probably being the best method to predict the datasets. However, it is important to determine whether the values obtained allow the conclusion that the model is statistically better than the others. To achieve such certainty, the post hoc Nemenyi test [35] was implemented to reject the hypothesis that the model performances are similar. The test examines the ranking of several methods over several datasets, determining a minimal critical distance beyond which two performances are considered different. Using the error values, the models' average ranks were computed, with the best (rank 1.2) corresponding to the proposed model and the worst (rank 6.6) to DTR. All of the other models are placed in between these values, as represented in Fig. 4.
The test was conducted with a significance level (α) of 0.05, meaning that the probability of the test incorrectly rejecting the null hypothesis is 5% (a 95% confidence level). The critical distance calculated for the available models and datasets was 2.189250, meaning that if two models are separated by a greater distance, they are considered different. This is especially important for the proposed model, as it must be determined whether the performance observed in the error metrics allows favourable conclusions. The result of the Nemenyi test is represented in Fig. 4, with the models ordered by average rank; bold horizontal lines join models that lie within the critical distance of one another. ARIMA, DTR, and SVR are placed in the lowest ranks, as was to be expected from their error values. MLP, ARIMA+MLP, and SVR(ARIMA, SVR) are found in the middle range, being statistically similar to one another but different from the proposed model, as there is no horizontal line joining them to it. This confirms the idea of the proposed model being the best performing one in the dataset group, as it holds the highest rank and all of the other models are ranked farther from it than the critical distance.
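For reference, the critical distance of the Nemenyi test follows the standard formula below (a known result, not stated explicitly in this paper), where k is the number of compared models, N is the number of rankings, and q_α is the critical value of the Studentized range statistic:

$$CD = q_{\alpha}\sqrt{\frac{k(k+1)}{6N}}$$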

Conclusions
In this study, a hybrid system with model selection in the residual forecasting stage and combination selection is presented in order to approximate future SoC values of lithium-ion batteries. The proposed model is formed by an ARIMA model and a nonlinear predictor, which is selected from a pool of nonlinear models according to its performance on the validation set. It also selects the best combination function from a pool of options to combine the forecasts of the linear and nonlinear models. The experiments were conducted on five univariate time series containing lithium-ion battery records processed to predict their state-of-charge values.
The results of the experiments highlight that the models with the lowest performance were ARIMA, SVR, and DTR. This is supported by the error values and the Nemenyi test ranks. However, none of the three models was the worst performer on all datasets. This suggests a variety of linear and nonlinear components across the time series, which would not allow a single model to predict all of them efficiently.
The ARIMA+SVR and MLP models performed in the middle ranks for all datasets, implying that neural network-based and hybrid models ensure a performance improvement. ARIMA+MLP and SVR(ARIMA, SVR) also support this idea and add to the conclusion that combination functions other than aggregation can help the predictions adapt better to the characteristics of a specific time series and, therefore, achieve the best results.
The proposed model achieved the lowest error values for both MSE and MAE in four of the five datasets used. In the one case in which it did not attain the lowest value, it was displaced to second-best rank only by the SVR(ARIMA,SVR) model. To determine whether the error values were statistically significant enough to conclude that the proposed model performs better than the other seven methods, a Nemenyi test was conducted.