Data-driven Direct Diagnosis of PV Connected Batteries

Photovoltaic systems are providing a growing share of power to the electric grid worldwide. To mitigate resource intermittency, new systems are increasingly paired with battery energy storage, for which ensuring long and safe operation is critical. Unlike in more typical battery applications, these batteries will undergo sporadic usage, which prevents the application of traditional diagnosis methods. This work proposes a new methodology for opportunistic diagnosis using machine learning algorithms trained directly on photovoltaic battery charging data. Training was performed on synthetic voltage data under different degradations, calculated from clear-sky model irradiance data. Validation was performed on synthetic voltage responses calculated from plane-of-array irradiance observations for a photovoltaic system located in Maui, HI, USA. An average RMSE of 2.75% was obtained for more than 10,000 different degradation paths with 25% or less degradation on the cells.


Introduction
In recent years, solar photovoltaic (PV) technologies have provided the most additional generating capacity to the United States grid [1]. In 2021, a record 23.6 GW of solar capacity was installed and, over the next 10 years, 324 GW of new solar capacity is predicted to be added to the electric grid, quadrupling current levels [1]. New solar systems are increasingly being paired with battery energy storage systems at multiple grid levels. While some of the storage will be performed by grid-scale batteries, the percentage of residential storage installations has also been steadily increasing, reaching 8.1% in 2020 [2]. It is estimated that by 2025, one in three residential solar systems will be paired with small-scale energy storage [1], most likely provided by lithium-ion batteries [3].
To ensure long, safe, and continuous operation, batteries must be maintained and controlled properly, which includes regular estimation of their state of health (SOH). This is problematic for batteries paired with PV because of sporadic usage in both charge and discharge. As a result of this unpredictability, diagnosis might only be possible during lengthy maintenance cycles. An alternative that avoids downtime is to identify and take advantage of auspicious conditions to perform state estimation.
With batteries expected to last a decade or more, this opens the door to different approaches, such as using the battery response under clear-sky conditions, under which PV power production is predictable on time scales of up to twelve hours.
Even if the PV power output offered by clear-sky conditions is predictable, state estimation will still be complex and will require robust methodologies for Li-ion battery diagnosis. Because the duty cycles of batteries paired with PV will not be under constant current (CC), the features traditionally used to estimate SOH [4] might be difficult to interpret. This favors data-driven methods, and in particular machine learning (ML) methods. However, to be applicable, ML algorithms need to be trained on a wide variety of data covering the sporadicity of the application. Unfortunately, while some market data are beginning to become publicly available [5,6], actual PV-connected battery data is not, to the best of our knowledge. Few studies are available with battery testing associated with PV duty cycles [7,8], with most studies being modeling-centered and using constant-current testing [9][10][11][12][13][14][15]. For CC data, the lack-of-data problem was recently solved with the introduction of synthetic datasets that enabled the emulation of every possible battery degradation [16][17][18][19]. While the duty cycle for clear-sky irradiance is more complex, recent work suggests that the methodology used to generate the synthetic data could be applied outside of CC [20] and thus be applicable to irradiance.
This work proposes a new method for diagnosing PV-connected batteries using synthetic datasets, allowing SOH estimation during normal battery operation. The method uses periods of clear-sky conditions, where charging from PV generation is relatively stable and predictable, for diagnosis. This paper describes a framework for (1) generating synthetic datasets of the voltage response of cells charged by PV systems, (2) training state-of-the-art ML algorithms on the synthetic datasets, and (3) validating the algorithms using synthesized data. This framework is summarized in Fig. 1 with a branch for training and a branch for validation.
The training branch uses irradiance data from a clear-sky model, PV system information (longitude, tilt, orientation), and battery information (chemistry and W/Ah) to generate synthetic cycles consisting of the voltage response of the cells for specific duty cycles under tens of thousands of different degradations.
While the validation would ideally be performed on data from deployed batteries, no dataset from deployed PV-linked batteries is publicly available to the best of our knowledge. Even if such data were available, the actual degradation of each individual system would likely not be, making validation impossible. In anticipation of data becoming available, we examined the applicability of our approach to real systems by replacing the deployed data with synthetic datasets generated for various sky-clearness levels, Fig. 1. These synthetic datasets are used to validate the applicability of the clear-sky-irradiance-trained ML algorithms for diagnosis under cloudy conditions. To further emulate realistic conditions, each dataset was calculated on a cell with slightly different parameters to account for cell-to-cell variations and inhomogeneities [25]. A selection of the data generated from this work is available in [26,27].

Irradiance data selection
The output of a PV system depends on irradiance, the power of the solar radiation striking the panels. Irradiance variability is driven by extraterrestrial and atmospheric effects and also depends on panel orientation [28,29]. The latter has become more varied in recent years [3], but panels are nominally oriented toward the equator at a tilt angle near the latitude of the installation in order to maximize solar energy yield [30].
Clear-sky irradiance occurs during clear-sky conditions, defined as an absence of visible clouds across the entire sky dome [31]. Clear-sky irradiance is estimated using a clear-sky model (CSM), which calculates solar geometry and accounts for variations in air mass and optical depth [32]. In this work, we used the CSM proposed by Ineichen and Perez [32] for a horizontal surface, extended to estimate clear-sky irradiance on a tilted surface in the plane of array (POA) of a PV panel. The extended CSM recomputes the solar angle of incidence, accounts for the reduction of diffuse irradiance received [33], and adds a ground-reflected irradiance source [34]. At the test site, located just south of the Northern Tropic, clear-sky irradiance levels incident on a horizontal surface remained high from April to September but dropped during winter. Seasonal variations of clear-sky irradiance incident on the POA of PV panels located at the site are reduced, with peak levels found in spring and summer months. POA irradiance also peaks later in the day (by around 1 hour) relative to horizontal values, because the panels face slightly westward instead of due south. To further illustrate seasonal fluctuations, data at the solstices and equinoxes are presented in Fig. 2(b) and Fig. 2(c) for horizontal and POA irradiances, respectively.
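The extended Ineichen-Perez CSM itself is beyond the scope of this section, but the basic pipeline (solar geometry, then a clear-sky irradiance estimate) can be illustrated with a deliberately simpler stand-in. The sketch below uses an approximate declination/hour-angle zenith calculation and the one-line Haurwitz clear-sky model; the latitude and all constants are illustrative assumptions, not the paper's implementation:

```python
import numpy as np

def solar_zenith(lat_deg, day_of_year, solar_hour):
    """Approximate solar zenith angle (rad) from declination and hour angle."""
    decl = np.radians(23.45) * np.sin(2 * np.pi * (284 + day_of_year) / 365)
    lat = np.radians(lat_deg)
    hour_angle = np.radians(15.0 * (solar_hour - 12.0))  # solar time assumed
    cos_z = (np.sin(lat) * np.sin(decl)
             + np.cos(lat) * np.cos(decl) * np.cos(hour_angle))
    return np.arccos(np.clip(cos_z, -1.0, 1.0))

def haurwitz_ghi(zenith):
    """Haurwitz clear-sky GHI (W/m^2); zero when the sun is below the horizon."""
    cos_z = np.cos(zenith)
    ghi = 1098.0 * cos_z * np.exp(-0.057 / np.maximum(cos_z, 1e-6))
    return np.where(cos_z > 0.0, ghi, 0.0)

# Clear-sky GHI profile at a ~20.7 N latitude (Maui-like) on the spring equinox
hours = np.linspace(0.0, 24.0, 97)
ghi = haurwitz_ghi(solar_zenith(20.7, 80, hours))
```

A production implementation would instead use a validated library model (the paper uses Ineichen-Perez) followed by a transposition step to obtain POA irradiance from the horizontal components.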
The CSM fails to account for several factors driving real irradiance variability. While the most obvious is cloud cover, fluctuations in atmospheric turbidity, shading, soiling, and reflection losses also affect the amount of irradiance available to a PV system. In this work, clear-sky conditions were identified using an algorithm that applies a series of threshold criteria tests to compare the smoothness, shape, and magnitude of observed values within a moving window to corresponding clear-sky values from the CSM [35,36]. The algorithm assigns a daily "clearness" value, determined as the number of observations identified as clear-sky over the total number of observations during daytime conditions. The distribution of daily clearness per season for the whole dataset is shown in Fig. 3(b). For almost half of the 2-year dataset, clear-sky conditions were found in less than 20% of the daily observations; however, in nearly one in five days, more than 50% of observations were identified as clear-sky. Moreover, the distribution of daily clearness values indicates only slight seasonal variations at the test site.
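The detection algorithm of [35,36] applies several window-based criteria (smoothness, shape, magnitude); as a minimal sketch of the daily clearness bookkeeping only, the toy detector below flags an observation as clear when it sits within a fixed relative tolerance of the modelled value (the tolerance and all numbers are illustrative assumptions):

```python
import numpy as np

def daily_clearness(observed, clearsky, daytime, rel_tol=0.10):
    """Toy clear-sky detector: flag an observation as clear when it is within
    rel_tol of the modelled clear-sky value, then report the fraction of
    daytime observations flagged clear (the daily 'clearness' value)."""
    observed = np.asarray(observed, float)
    clearsky = np.asarray(clearsky, float)
    clear = np.abs(observed - clearsky) <= rel_tol * np.maximum(clearsky, 1.0)
    clear &= daytime
    n_day = daytime.sum()
    return clear.sum() / n_day if n_day else 0.0

# A mostly clear morning followed by a cloudy afternoon
cs = np.array([0, 200, 600, 900, 900, 600, 200, 0], float)
obs = np.array([0, 195, 590, 880, 400, 250, 60, 0], float)
day = cs > 0
print(daily_clearness(obs, cs, day))  # -> 0.5 (3 of 6 daytime points clear)
```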
To assess how the accuracy of the diagnosis was affected by irradiance variability, 18 days from the 2-year dataset were selected to encompass a range of irradiance conditions, Fig. 3(c). The conditions range from a minimum clearness of 4% to a maximum of 84%, with cloud cover occurring at different times.
Cloud effects range from small perturbations, likely due to high cirrus clouds, to significant attenuation and cloud enhancement due to more opaque cloud cover. Shading effects caused by the construction of a nearby building can also be seen in the afternoon hours of the two days in October 2017.

Cell emulation & Duty cycle emulation
As presented in Fig. 1, a digital twin was used to generate the battery data needed to assess the impact of the different duty cycles on battery performance. The battery model included in the twin was based on the 'alawa mechanistic model [37,38]. To parameterize the model and emulate the electrochemical response of the selected commercial Li-ion battery, the half-cell data for both the positive and negative electrodes (PE and NE, respectively) were imported into the 'alawa toolbox. The half-cell data was fitted to the full cell by scanning different values for the loading ratio (LR), offset (OFS), resistance (R), and rate degradation factor (RDF) for the PE and the NE. Because the duty cycles simulated in this work were not CC, the 'alawa model needed to be calibrated to correctly simulate rates within the range used by the duty cycles of batteries paired with PV. This calibration required emulations at different rates and verification of continuity between the rate-dependent emulation parameters to enable interpolation and extrapolation to other rates. Figure 4(a,b,c) presents the results of the full-cell emulation of the C/15, C/8.5, and C/4 cycles, respectively, based on the half-cell data gathered from the harvested electrodes. The best fit had a LR of 1.2 with a 4% offset and a -0.1 resistance correction for the rate-independent parameters. Looking at the rate-dependent parameters, as the simulated rate increased, the RDFs for both electrodes were found to decrease from 0.6 to 0.2 for the PE and from 0.8 to 0.6 for the NE. An additional resistance correction was needed to compensate for peak movements for the RDF PE (RDF corr PE). This correction ensured that the electrochemical response at different rates overlapped correctly when kinetics are adjusted, which cannot currently be done automatically by the model. The equation for this additional resistance correction is provided in Figure S1 with an explanatory schematic. No correction was needed for the RDF NE.
The evolution of the three varying rate-dependent parameters could be fitted with power laws y = a·rate^b with R² ≥ 0.997 (vs. R² ≈ 0.97 for linear regressions), Fig. 4(d).
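Such a power-law fit can be obtained by linear regression in log-log space. The sketch below uses hypothetical RDF values at the three tested rates (the numbers are illustrative, not the fitted parameters from Fig. 4(d)):

```python
import numpy as np

def fit_power_law(rate, y):
    """Fit y = a * rate**b by linear regression in log-log space."""
    b, log_a = np.polyfit(np.log(rate), np.log(y), 1)
    a = np.exp(log_a)
    yhat = a * rate**b
    ss_res = np.sum((y - yhat) ** 2)
    ss_tot = np.sum((y - np.mean(y)) ** 2)
    return a, b, 1.0 - ss_res / ss_tot  # a, b, R^2 (in linear space)

# Hypothetical RDF values at the three tested rates (C/15, C/8.5, C/4),
# decreasing with rate as described in the text
rate = np.array([1 / 15, 1 / 8.5, 1 / 4])
rdf_pe = np.array([0.60, 0.38, 0.20])
a, b, r2 = fit_power_law(rate, rdf_pe)
```

The fitted power law is then evaluated at any intermediate rate to interpolate the rate-dependent emulation parameters, as described in the text.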
Using the best-fit parameters and equations, the synthetic cell voltage response under the different duty cycles was generated using the method proposed in [20], with solar panel power output as a duty cycle instead of CC. An example of clear-sky solar panel power output is presented in Fig. 5(a). The power is 0 at sunrise, ramps up to its maximum around solar noon, then ramps down to 0 at sunset. As proposed in [20], and in order to simulate this duty cycle, a set of 100 voltage responses was simulated between the lowest rate (minimum power at maximum voltage) and the highest rate (maximum power at minimum voltage), Fig. 5(b). The correct [voltage, rate] couple matching the required power was calculated for each 0.1% state of charge until full charge. Overall, the maximum rate was chosen to be C/6 so that around 95% of the cell capacity is used through an average day (03/21). C/6 is below the highest rate for which the emulation parameters were deciphered (C/4). This allows high confidence in the simulation of high loss of active material (LAM) because, with at most 50% degradation, the local rate would at worst double from C/6 to C/3 [20,37], which is still close to the range of experimentally tested rates.
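The [voltage, rate] lookup can be sketched as a nearest-match search over a grid of precomputed rates. The toy cell below (linear OCV plus an ohmic term, all values hypothetical) stands in for the 'alawa voltage responses:

```python
import numpy as np

# Toy cell: open-circuit voltage rises linearly with state of charge, plus an
# ohmic overpotential proportional to current (capacity and resistance are
# illustrative assumptions, not the commercial cell's parameters).
CAP_AH, R_OHM = 3.0, 0.05

def cell_voltage(soc, current_a):
    ocv = 3.2 + 1.0 * soc            # illustrative OCV curve, 3.2-4.2 V
    return ocv + R_OHM * current_a   # charging: overpotential adds

def rate_for_power(soc, p_target, rates):
    """Pick, from a grid of precomputed rates, the [voltage, rate] couple
    whose electrical power best matches the PV power demanded at this SOC."""
    currents = rates * CAP_AH
    powers = currents * cell_voltage(soc, currents)
    i = int(np.argmin(np.abs(powers - p_target)))
    return rates[i], cell_voltage(soc, currents[i])

# 100 candidate rates between ~0 and the C/6 cap used in the paper
rates = np.linspace(1e-3, 1 / 6, 100)
rate, volt = rate_for_power(soc=0.5, p_target=1.5, rates=rates)
```

In the paper this lookup is repeated every 0.1% state of charge, with the PV power profile supplying `p_target` at each step.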
Thermodynamic battery degradation can be decomposed into three degradation modes, the loss of lithium inventory (LLI) and LAM on both the PE and the NE [20,37]: independent of which mechanism induces degradation, what changes is how much of each electrode is available to host lithium and how much lithium is able to go back and forth. Each combination of LLI, LAM PE, and LAM NE corresponds to a unique degradation and has a unique voltage signature. Diagnosing a battery then corresponds to quantifying the three degradation modes. As proposed in [17][18][19], the different degradations were simulated by scanning the entire range of possible combinations of LLI and the LAMs. Once generated, the data was used to train and validate ML algorithms. More details on the synthetic data generation and the training are provided in the method section.
All the selected ML algorithms used in this work were developed to use features from a derivative of the voltage response under CC as input. In order to determine whether these algorithms could be applied to irradiance duty cycles, it was necessary to verify that the associated derivative voltage response still showcased the expected features. Figure 6(a-c) presents simulations of the voltage response of a cell, plotted as incremental capacity curves (IC, dQ/dV = f(V)), for individual degradation modes as calculated using the 'alawa model from the clear-sky irradiance on March 21st. This degradation map is useful to assess the impact of degradation on the voltage response. The voltage evolutions in Fig. 6(a-c) closely resemble those observed for a traditional GIC/NMC cell tested under CC [17]. This provides confidence that the diagnosis algorithms developed under CC can be used on the data generated from PV irradiance.
Since the simulations were not performed under CC, the voltage response versus time differs from the voltage response versus capacity: capacity corresponds to time multiplied by current, so capacity and time are only directly correlated if the current is constant. The time vs. voltage data therefore offer a different dataset that could be available for training and validation if features are identifiable. Figure 6(d-f) presents the t-based equivalent to the IC degradation maps (IT, dt/dV = f(V)). Despite some deformations, the t-based curves showcase significant similarities to their capacity counterparts and are therefore also well suited for degradation mode quantification using the selected algorithms. In this work, both the capacity (Q) and time (t) based datasets will be generated and analyzed to determine if a t-based method could be as accurate as a Q-based one.
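A minimal sketch of how IC (dQ/dV) and IT (dt/dV) curves are computed from a sampled charge, and of why they decouple under varying current (the toy voltage model and current profile are illustrative):

```python
import numpy as np

def incremental_curves(voltage, capacity, time_s):
    """Numerical IC (dQ/dV) and IT (dt/dV) curves from a charging record,
    using finite differences on monotonically increasing voltage samples."""
    dV = np.diff(voltage)
    keep = dV > 1e-6                          # drop flat/noisy voltage steps
    ic = np.diff(capacity)[keep] / dV[keep]   # dQ/dV
    it = np.diff(time_s)[keep] / dV[keep]     # dt/dV
    v_mid = (voltage[:-1] + voltage[1:])[keep] / 2
    return v_mid, ic, it

# Synthetic charge at a varying current: with I(t) varying, dQ/dV and dt/dV
# are no longer proportional, which is why both datasets are studied.
t = np.linspace(0, 3600, 500)                      # s
current = 1.0 + 0.5 * np.sin(np.pi * t / 3600)     # A, varying duty cycle
q = np.cumsum(np.gradient(t) * current) / 3600     # Ah
v = 3.2 + 1.0 * (q / q[-1])                        # toy monotone voltage
v_mid, ic, it = incremental_curves(v, q, t)
```

With constant current the ratio dt/dV over dQ/dV would be the same everywhere; here it tracks 1/I(t), which is the source of the deformations visible in the IT maps.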

Diagnosability
Three sets of experiments were performed for this work. More details can be found in the method section.
Training for the first two sets of experiments was performed on synthetic data generated from clear-sky irradiance for the spring equinox (3/21). The spring equinox was selected because its POA clear-sky irradiance is close to the yearly average. For the initial set of experiments, representing an ideal case, validation was performed using the same data as the training. For the second set of experiments, aimed at quantifying the impact of seasonal variability on diagnosis accuracy, validation was performed using synthetic data generated from clear-sky irradiance for the 1st of each month. Finally, for the third set of experiments, to test the impact of cloud cover, training was performed on synthetic data generated from clear-sky irradiance for the 18 cloudy days detailed in section 2.1 and validation was performed using synthetic data generated from observed irradiance for those specific days.

First set, same day training & validation on clear-sky irradiance
In order to test whether the ML algorithms trained on clear-sky irradiance were able to diagnose different battery degradations, they were first validated using the same clear-sky irradiance. The only difference between the training and validation datasets was the cell parameters, which were slightly varied to take cell-to-cell variations into consideration (cells 1 and 2 in Table S1, see methods for more details). The first 4 rows of Table 1 present the average root-mean-square error (RMSE) between the real and predicted values for more than 100,000 different combinations of the three degradation modes. The algorithms were all able to quantify each degradation mode properly, with RMSEs of 2.1% at worst. Since smaller RMSEs were observed for lower degradations, Figure S2(a-e), Table 1 presents the average RMSEs for the Q- and t-diagnosis for degradations with at most 25% and 50% of each degradation mode. Statistics for the full dataset with additional metrics such as the mean absolute error (MAE) and Pearson's correlation coefficient (ρ) are provided in Table S2. Overall, RMSEs below 0.85% for 25% or less degradation and 1.66% for 50% or less degradation were observed. Looking at the Q-diagnosis (top two rows), all algorithms performed nearly identically, with average RMSEs around 0.70% for 25% or less degradation and 1.5% for 50% or less degradation. The average RMSEs of t-based data (rows 3 and 4) were similar, but individual algorithm performance varied: XGB, FNN, and 1DConv showed similar RMSEs, with 1DConv showing the lowest average RMSE at 0.37%, while the DTW-CNN RMSE nearly doubled compared to its Q-based counterpart. From the complete statistics in Table S2, it can be seen that LLI seems the easiest to diagnose, before LAM PE, for Q-based diagnosis, while the opposite holds true for t-based ones. In both cases LAM NE was the hardest to decipher.
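The table metrics can be reproduced with a few lines of NumPy; the noisy LLI estimates below are synthetic placeholders, not the paper's predictions:

```python
import numpy as np

def diagnosis_stats(true_pct, pred_pct):
    """Error statistics used in the paper's tables: RMSE, MAE and Pearson's
    correlation between true and estimated degradation-mode percentages."""
    err = pred_pct - true_pct
    rmse = float(np.sqrt(np.mean(err ** 2)))
    mae = float(np.mean(np.abs(err)))
    rho = float(np.corrcoef(true_pct, pred_pct)[0, 1])
    return rmse, mae, rho

# Illustrative LLI estimates with ~1% noise on 1000 degradation paths
rng = np.random.default_rng(0)
true_lli = rng.uniform(0, 25, 1000)
pred_lli = true_lli + rng.normal(0, 1.0, 1000)
rmse, mae, rho = diagnosis_stats(true_lli, pred_lli)
```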
In general, all the calculated RMSEs were small, below or near 2.1% at worst, for more than 100,000 tested degradations up to 50% degradation, demonstrating that data from irradiance duty cycles can be successfully diagnosed for the ideal case of a single day with no cloud coverage at all. The next sections will quantify the diagnosability cost associated with irradiance variability on time scales longer than one day and with cloud coverage.

Second set, impact of seasonal variability
To quantify the impact of seasonal variability, validation was performed using synthetic data generated from clear-sky irradiance for the 1st of each month, Fig. 2(a). This was done twice with two different sets of cell parameters (details are in Table S1, cells 5 to 28) to investigate the impact of cell-to-cell variations at the same time. This impact will be assessed by comparing the diagnosis statistics for the two different batches of cells (12x43,000 data points).
The impact of irradiance variations was significant, as the average RMSE increased by 1.6% for Q-diagnosis and by more than 2% for the t-diagnosis compared to the ideal scenario, Table 1 bottom 4 rows. The three NN methods were the best performing for Q-diagnosis, with RMSE below 1% for 25% or less degradation and below 2% for 50% or less degradation. For the t-diagnosis, all the algorithms but DTW-CNN performed similarly, with RMSE slightly over 3% for 25% or less degradation (around 5% for 50% or less degradation). LAM NE diagnosis still had the highest RMSE, with the LLI and LAM PE RMSEs being very close; LAM PE RMSEs were lower for Q-diagnosis and the LLI ones for t-diagnosis. Cell-to-cell variations were negligible, with on average a 0.4% MAE and a 0.7% standard deviation between the two sets, Figure S2(f).

Third set, impact of cloud coverage
The third set of experiments examined the impact of different cloud coverages. This corresponds to the validation using observations. To remove the effects of a time difference between training and validation data (see section 2.3.2), training and validation data correspond in time. For each of the 18 cloudy days detailed in section 2.1, validation of algorithms trained on clear-sky irradiance was performed using irradiance observations for that day, which included cloud effects. Detailed statistics are provided in Tables S3 and S4.
Overall, for degradation paths with less than 25% of each degradation mode, the RMSEs were in the 1.75 to 3.6% range for all algorithms for Q-diagnosis and in the 4.4 to 5.2% range for t-diagnosis. Focusing on clearer days reduced the RMSE significantly, to below 1% (FNN, 1DConv, DTW-CNN) for Q-diagnosis and 2.5% (XGB, RF, 1DConv) for t-diagnosis. This highlights the validity of diagnosis for days with cloud coverage.

Discussion
This study provides the first application of synthetic datasets to non-CC simulations. Because current was not constant, capacity and time were uncorrelated, which offered an opportunity to study two different datasets, V vs. Q and V vs. t. While using voltage versus capacity is more traditional, it might not be the best solution for deployed systems: the V vs. t dataset should be less error-prone than the V vs. Q one, as capacity is not directly measurable but derived from time and current [39]. The V vs. Q dataset is, however, expected to be easier to diagnose because the area under a dQ/dV peak corresponds to a capacity which, at low rates, has a finite value independent of the applied current.
Therefore, current variations should have a limited impact on the overall peak shape and intensity. This is why the voltage responses showcased in Fig. 6(a-c) are very similar to the signatures under CC [17] despite the varying current. This is not the case for dt/dV peaks because, while capacity stays the same with varying current, the time taken to complete the peak will differ. dt/dV peaks are therefore much more sensitive to changes of current than dQ/dV ones. This sensitivity explains the differences observed between Fig. 6(d-f) and Fig. 6(a-c) and why the t-diagnosis errors were on average more than double the Q-based ones. The increased error was especially visible when the validation was done on a duty cycle different from the one used for training. Figure 7(a) plots the RMSE variations as a function of the month of the year for algorithms trained on one day only (second set of experiments). The Q-based RMSE showcased little to no effect of the month of the year, whereas the t-based ones varied significantly, with a minimum close to the training day (March and April) and in fall (September and October) when irradiances are the most similar to the one used for training (03/21), Fig. 2(b). The difference was also much more pronounced for cloudy days and aged cells. Therefore, although t-diagnosis is more interesting on paper, it might not be the best solution where clear sky does not significantly dominate, at least for the tested algorithms. Figure S2, Table 1, and Table 2 showed the decline in performance with increasing degradation percentage. This decline can be explained by multiple factors. Although data imbalance during training could be a possible factor, as 2/3rds of the training data has a degradation below 25%, the main factor seems to be that small variations in one of the three degradation modes are hard to quantify when at least one of the other two modes has large variations. This is exemplified in Fig. 7(b,c), where the distribution of estimated vs. true LLI values for the DTW-CNN algorithm is plotted for 50% or less and 25% or less degradation. For 50% or less degradation, there is a haze around the 1:1 line below 20% LLI that disappears when the maximum degradation is set at 25%. This indicates that the error mostly comes from degradation paths with low LLI but at least one LAM above 25%. This will be investigated further in future work.
Looking at the detailed statistics in Tables 2, S3, and S4, although the algorithms' performance was very close, some differences were noticeable. Overall, the DTW-CNN algorithm offers the best performance for Q-diagnosis while 1DConv is significantly better for t-diagnosis for degradations below 25%. Moreover, the algorithms are not all affected the same way by the change of duty cycles. This is especially visible in Fig. 7(a), where the t-diagnosis performance of RF is much more affected for winter days than for summer days while the opposite is true for DTW-CNN; the other three algorithms are impacted similarly. Future work will investigate in detail the parameterization of the algorithms to improve performance. Looking in more detail, it appears that the largest errors were always observed for LAM NE estimation (Tables S2-S4). This could be explained by the fact that LAM NE cannot be directly inferred from any feature of interest of the IC or IT curves. For the other two degradation modes, and as showcased in [17], the intensity of the high-voltage shoulder is in most cases directly proportional to LAM PE and the intensity of the main peak to LLI. This is NMC-specific, and different results are expected for non-layered oxides such as batteries based on LiFePO4, where LAM PE should be much harder to quantify than the other two. A possible solution to improve the accuracy of LAM NE estimation for the current algorithms could be to train the algorithms on DV curves on top of the IC curves, as LAM NE is, in most cases, directly decipherable from the DV curves. Figure 8 presents the RMSE variation for all the cloudy days tested in this work, sorted by their clear-sky percentage, with the associated actual irradiance vs. time curve as inset. Overall, clear-sky percentage is a useful indicator of diagnosability, although some other parameters also come into play. In general, the RMSE increases as the clear-sky percentage decreases.
However, some duty cycles showed abnormally high (e.g., 34% clearness and 59% clearness) or low (e.g., 27% clearness) RMSE, indicating that the intensity and timing of the cloud coverage could also play a major role in diagnosability. Together with cloud coverage, the type of diagnosis, Q-based or t-based, also has a role. For example, for a day with 59% clear-sky, Q-diagnosis was better than normal and the t-based one far worse, while for a day with 34% clear-sky, the opposite was true. Finding the right set of parameters to identify which days are more auspicious for diagnosis will require more work but, from these results, it is clear that the use of synthetic data will be instrumental in testing the impact of different classification schemes.

Conclusions & Outlook
This paper proposes and validates a new approach for the diagnosis of batteries paired with PV using synthetic data. This approach allows the degradation of PV-connected batteries to be diagnosed without the need for maintenance cycles by using state-of-the-art machine learning algorithms. Diagnoses had an average RMSE of 2.75% for more than 10,000 different degradation paths with 25% or less of the three thermodynamic degradation modes. Because diagnosis was done outside of constant current, the capacity- and time-based information could be decorrelated and compared. Time-based diagnosis was shown to be less accurate than its capacity counterpart for the tested algorithms. However, the accuracy of both types of diagnosis is satisfactory for days where clear sky dominates. For days with lower clear-sky conditions, accuracy depends on the clear-sky percentage, but additional factors such as the time and duration of cloud coverage also come into play. These factors will be investigated further in future work.
The framework presented here proved that opportunistic diagnosis of PV-connected batteries is possible under auspicious cloud coverages. Based on our results, and for the studied system and location, diagnosis could be possible one out of every five days independent of the season, which is more than frequent enough for batteries supposed to last 3,500 days or more. This number might be different in other locations where shadowing or snow could play a significant role, but it could easily be assessed using the framework presented here. This highlights the significant benefits of using synthetic data to understand the expected variations of voltage response for different cloud coverages and to develop adapted diagnosis tools, particularly as real photovoltaic battery degradation data is not yet available.
Despite promising results, a significant amount of work remains before this technique can be applied to deployed systems. There is a need for training under a wide array of different conditions, as PV systems in the field will have varying orientation, tilt, location, cleanliness, etc. Moreover, this work was performed on single cells and without considering any additional usage of the cells. Real systems will be composed of battery packs, which will have varying voltage responses due to inhomogeneities and imbalance. Furthermore, these batteries will likely be used at the same time they are charged, which will further modify the duty cycles. The validation framework provided here can be applied to study these scenarios, and future work will address the complexity of different locations, modules, and possible additional loads on batteries. The proposed framework might even be applicable to other types of intermittent renewable power systems for which storage could be considered, such as wave or tidal energy.

PV data acquisition
The PV testbed used in this work includes instrumentation for high-frequency PV and solar resource monitoring, including a Kipp & Zonen SMP21-A secondary standard pyranometer, which is installed in the POA of the testbed PV panels. The data was collected at 1 second intervals and averaged to 1 minute for 2 years.

Battery testing
The commercial cells used in this work were provided by an industrial partner and are composed of a graphite intercalation compound (GIC) based negative electrode (NE) and a nickel manganese cobalt oxide (NMC) positive electrode (PE) with a 1:1:1 stoichiometry. The industrial partner also provided the full-cell cycling data with C/15, C/8.5, and C/4 cycles performed on a pristine cell. The half-cell data was harvested by opening a commercial cell from the same batch. The cell was slowly discharged to 2.0V before being opened in a glove box. The double-sided electrodes were rinsed with dimethyl carbonate and one side was scrubbed using N-methylpyrrolidone before 1.8mm diameter electrodes were cut using an EL-CUT punching tool (EL-CELL, Hamburg, Germany). Half-cells were assembled in PAT-CELLs (EL-CELL, Hamburg, Germany) using a standard polypropylene sleeve, a borosilicate glass fiber separator, a metallic Li NE, and an electrolyte composed of ethylene carbonate and propylene carbonate in a 1:1 ratio with 1M lithium hexafluorophosphate and 2% vinylene carbonate (all Sigma-Aldrich, USA). For the testing, the cell formation consisted of 8 cycles at C/10 followed by 1 cycle at C/25, between 3.2V and 4.3V for the PE and 0.02V and 1.2V for the NE. After the formation cycles, the cells were tested at C/50, C/25, C/15, C/8, C/4, C/2, and C/1 with residual capacity measurements at C/50 for each regime, following the protocol described in [40].

Synthetic data generation
The synthetic data used in this work, both for training and validation, was generated using the method described in [17][18][19] by scanning the entire range of possible combinations of loss of lithium inventory (LLI) and LAMs. Because the duty cycles have maximum currents below C/6, only the thermodynamic degradation parameters were considered in this work. The maximum value for the degradation modes was set at 50%. For the main training dataset, the composition resolution was set at 1% (5,000 triplets tested) with at most a simulation every 0.5% for each degradation mode (>125 simulations per triplet from 0 to 50%). This resulted in around 700,000 unique voltage responses for training. Additional training on different duty cycles was done with a 2.5% resolution with 1% steps to limit file sizes. This corresponds to more than 850 different triplets and 43,000 unique voltage curves. For the validation datasets, the resolution was decreased to 5% (225 triplets) with 1% steps (50 simulations per triplet), resulting in around 11,000 curves per condition.
Finally, to avoid any overfitting error from training and validating on the same data, each simulation was performed on a slightly different cell, i.e., a cell with emulation parameters (LR, OFS, R, and RDFs) randomly varied by ±1% to be in the same range as cell-to-cell variations observed in commercial cells [41]. The overall parameters for each simulation with the associated duty cycles are summarized in Table S1.
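A sketch of the two ingredients described above, degradation-triplet scanning and ±1% cell-parameter jitter. The exact enumeration used in the paper (e.g., its 225 validation triplets at 5% resolution) is not fully specified here, so the sketch simply enumerates the full (LLI, LAM PE, LAM NE) cube; parameter names and values are illustrative:

```python
import numpy as np
from itertools import product

def degradation_grid(step_pct, max_pct=50):
    """Enumerate (LLI, LAM_PE, LAM_NE) triplets at a given resolution (%)."""
    levels = np.arange(0, max_pct + step_pct, step_pct)
    return np.array(list(product(levels, repeat=3)), dtype=float)

def jitter_cell_params(params, rng, spread=0.01):
    """Randomly vary emulation parameters (LR, OFS, R, RDFs) by +/-1% to
    mimic cell-to-cell variation, one independent draw per simulated cell."""
    return {k: v * (1.0 + rng.uniform(-spread, spread)) for k, v in params.items()}

grid = degradation_grid(step_pct=5)            # full cube at 5% steps
rng = np.random.default_rng(42)
cell = jitter_cell_params({"LR": 1.2, "OFS": 4.0, "R": -0.1}, rng)
```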

Diagnosis algorithms
In this work, the leading machine learning algorithms for degradation mode quantification were used to validate our approach. A thermodynamic degradation mode diagnosis corresponds to the quantification of LLI and the LAMs for the positive and negative electrodes (PE and NE, respectively) [37,42]. Such quantification provides more information than a simple capacity estimation and enables prognosis [17].
The selected algorithms can be divided into two categories: decision tree ensemble methods and neural networks. Decision trees are deterministic models that rely on multiple conditionals, while neural networks follow a probabilistic approach in which they seek to learn by activating artificial neurons. For this work, Random Forest (RF) [21] and gradient boosted trees (XGB) [22] were selected as the decision tree methods, and the feed-forward neural network (FNN) [23], 1D-CNN (1DConv) [16], and the DTW-CNN approach [24] were selected as neural networks. It is important to note that in all cases the models use the raw derivative voltage curves as input, except for DTW-CNN, which uses images created from the DTW matrix between the pristine and the degraded derivative curves. This transforms voltage changes into images that reflect the degradation and enables the use of 2D CNNs, which are widely known to work remarkably well with images [24].
In terms of implementation, for the decision tree ensemble methods, the sklearn library [43] was used to implement the Random Forest, specifically the ensemble module with the RandomForestRegressor algorithm; the tuned hyperparameters were max_depth and n_estimators.
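A minimal multi-output regression sketch in that spirit; the feature matrix below is a random stand-in for the derivative voltage curves, and the linear target is purely illustrative:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Random stand-in for derivative voltage curves (rows) and the corresponding
# (LLI, LAM_PE, LAM_NE) degradation triplets (targets), purely illustrative.
rng = np.random.default_rng(1)
X = rng.uniform(0.0, 1.0, (500, 40))
W = rng.uniform(-1.0, 1.0, (40, 3))
y = np.clip(X @ W * 5.0 + 25.0, 0.0, 50.0)

# Multi-output regression: one forest predicts all three modes at once
model = RandomForestRegressor(n_estimators=100, max_depth=12, random_state=0)
model.fit(X[:400], y[:400])
pred = model.predict(X[400:])
rmse = float(np.sqrt(np.mean((pred - y[400:]) ** 2)))
```

RandomForestRegressor natively supports multi-output targets, so no per-mode wrapper is needed; the same pattern applies to the XGB variant with its own regressor class.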
For the neural networks, all the models were implemented in TensorFlow [45]. In all three cases, the hyperparameters to be set were the batch size and the learning rate.
The WandB framework [46] was used for hyperparameter tuning and callbacks were used during training to relegate the training stop condition to the validation error instead of the number of epochs.
In this work, validation comprised varying the initial conditions of the ML algorithms to produce model outputs that are compared against ground truth to generate the error statistics used to quantify the experiments.
Further details regarding the experimental setup and the source code to reproduce the experimental results are available in the following public git repository: https://github.com/NahuelCostaCortez/PVDiagnosis.
All the sharable data from this work is available here [26,27] with the actual PV data and the synthetic cycles for perfect irradiance and cloud coverage.