This study clusters solar inverters and wind turbines to help Enlitia’s clients optimize resource allocation and operational strategies by identifying similar assets based on historical power production, meteorological data, and power curve characteristics. The project applies data mining techniques following the CRISP-DM methodology, with an emphasis on data cleaning to handle null values, duplicates, and outliers. For wind turbines, outliers are handled using power curves; for solar inverters, I-V curves are used.
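As an illustration of power-curve-based outlier handling for wind turbines, the following is a minimal sketch, not the procedure used in this study. It assumes a simple rule: group readings into wind-speed bins and flag any reading whose power lies more than `k` median absolute deviations from the bin median. The function name `flag_power_curve_outliers`, the parameters `bin_width` and `k`, and the sample readings are all hypothetical.

```python
from statistics import median


def flag_power_curve_outliers(wind_speed, power, bin_width=1.0, k=3.0):
    """Return a list of booleans, True where a reading is far from the
    median power of its wind-speed bin (assumed rule, for illustration)."""
    # Group reading indices by wind-speed bin.
    bins = {}
    for i, v in enumerate(wind_speed):
        bins.setdefault(int(v // bin_width), []).append(i)

    flags = [False] * len(power)
    for indices in bins.values():
        values = [power[i] for i in indices]
        med = median(values)
        # Median absolute deviation: a spread estimate robust to outliers.
        mad = median(abs(p - med) for p in values)
        if mad == 0:
            continue  # no spread estimate in this bin; leave it unflagged
        for i in indices:
            flags[i] = abs(power[i] - med) > k * mad
    return flags


# Hypothetical readings: wind speed in m/s, power in kW.
wind_speed = [5.1, 5.3, 5.5, 5.7, 5.9, 8.1, 8.2, 8.4, 8.6, 8.8]
power = [300, 310, 320, 330, 0, 900, 910, 920, 930, 940]
flags = flag_power_curve_outliers(wind_speed, power)
```

In this toy data only the zero-power reading at 5.9 m/s is flagged, which is the kind of point (for example, a curtailed or stopped turbine) that sits well below the expected power curve.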
Clustering begins after data cleaning, using algorithms from three categories: classical, ensemble, and time series clustering. Principal Component Analysis is applied to reduce computational cost while preserving most of the variation in the data. Classical clustering comprises five hierarchical, two partitional, one soft, one model-based, and two density-based algorithms, evaluated using four distinct indices. The three best-performing classical algorithms proceed to ensemble clustering, where their results are combined via weighted majority voting. Lastly, two time series clustering algorithms are applied to the pre-processed datasets.
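The ensemble step can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation used in this study: the abstract does not say how the weights are derived or how cluster labels from different algorithms are matched, so the weights here are arbitrary and labels are aligned to the first partition by brute-force permutation search (practical only for a small number of clusters). The names `align_labels` and `weighted_majority_vote` are hypothetical.

```python
from itertools import permutations


def align_labels(reference, labels, k):
    """Relabel `labels` so that it agrees with `reference` on as many
    points as possible (cluster ids are arbitrary across algorithms)."""
    best_perm, best_agree = None, -1
    for perm in permutations(range(k)):
        agree = sum(1 for r, l in zip(reference, labels) if perm[l] == r)
        if agree > best_agree:
            best_perm, best_agree = perm, agree
    return [best_perm[l] for l in labels]


def weighted_majority_vote(partitions, weights, k):
    """Combine several partitions of the same assets into one consensus
    labeling; each partition votes with its weight."""
    reference = partitions[0]
    aligned = [reference] + [align_labels(reference, p, k) for p in partitions[1:]]
    consensus = []
    for i in range(len(reference)):
        votes = [0.0] * k
        for labels, w in zip(aligned, weights):
            votes[labels[i]] += w
        # Ties go to the lowest cluster id.
        consensus.append(max(range(k), key=lambda c: votes[c]))
    return consensus


# Hypothetical partitions of six assets from three algorithms.
partitions = [
    [0, 0, 0, 1, 1, 1],  # algorithm A
    [1, 1, 1, 0, 0, 1],  # algorithm B: labels swapped, disagrees on the last asset
    [0, 0, 1, 1, 1, 1],  # algorithm C: disagrees on the third asset
]
weights = [0.5, 0.3, 0.2]  # assumed weights, e.g. derived from a validity index
consensus = weighted_majority_vote(partitions, weights, k=2)
```

Label alignment matters because two algorithms can produce the same grouping with different cluster ids; without it, the vote would count agreeing partitions as disagreeing.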
Evaluation of the resulting segmentations indicates that time is a significant factor in the variation of the data: time series clustering consistently produces the best segmentations for both the solar and wind datasets.