OSeMOSYS Global is a workflow management system that executes a series of Python scripts and terminal commands in a specific order to create, solve, and visualize the results from an OSeMOSYS model. This section will describe how the workflow is constructed, how the scripts and commands behind the workflow function, and what data is used in the workflow.
Workflow
Creating a flexible system for policy makers and researchers to investigate decarbonization pathways at differing time and geographical scales is a key component of OSeMOSYS Global. This requires a workflow that dynamically updates with each scenario run. To generate scenarios, users modify a single configuration file and execute the workflow, providing a low barrier to entry that requires little to no prior Python or OSeMOSYS knowledge. Moreover, as OSeMOSYS Global grows and develops, a mechanism to easily add new user-defined assumptions, such as technology retirement requirements or emission limits, will be required. A popular open-source tool used to manage these types of flexible workflows is the Python package Snakemake[11] [35].
Snakemake provides a system to create scalable and flexible workflow pipelines using Python syntax. In Snakemake, rules are defined which have input files, output files, and an associated script or shell command. The input and output files can either have a specific filename or a generic filename, allowing for reusable rules. Based on the defined input and output files, Snakemake will determine how the rules are linked and execute them in the correct order, parallelizing the work where possible. Furthermore, Snakemake will search through the workflow before running to only execute rules downstream of where changes have been made. This can save significant processing time, as during scenario iteration the workflow will skip initial data processing steps if changes only occur to data at a later stage in the workflow.
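The way rules are linked through their input and output files can be illustrated with a minimal Python sketch. This is not Snakemake's implementation, and the rule and file names below are hypothetical rather than taken from the OSeMOSYS Global workflow; the sketch only shows how an execution order, and the set of rules downstream of a changed file, can be derived from declared inputs and outputs.

```python
from graphlib import TopologicalSorter

# Hypothetical rules: name -> (input files, output files).
# These are illustrative only, not the actual OSeMOSYS Global rules.
RULES = {
    "process_raw_data": (["raw/plants.csv"], ["scenario/capacity.csv"]),
    "build_datafile": (["scenario/capacity.csv"], ["model/data.txt"]),
    "build_lp": (["model/data.txt", "model/osemosys.txt"], ["model/model.lp"]),
    "solve": (["model/model.lp"], ["results/solution.sol"]),
}


def execution_order(rules):
    """Order rules so that each one runs after the rules producing its inputs."""
    # Map every output file to the rule that creates it.
    producer = {out: name for name, (_, outs) in rules.items() for out in outs}
    # For each rule, collect the rules it depends on (its predecessors).
    graph = {
        name: {producer[f] for f in inputs if f in producer}
        for name, (inputs, _) in rules.items()
    }
    return list(TopologicalSorter(graph).static_order())


def rules_to_rerun(rules, changed_file):
    """Return the rules downstream of a changed file, in execution order."""
    dirty_files = {changed_file}
    rerun = []
    for name in execution_order(rules):
        inputs, outputs = rules[name]
        if dirty_files.intersection(inputs):
            rerun.append(name)
            # Outputs of a rerun rule are themselves considered changed.
            dirty_files.update(outputs)
    return rerun


order = execution_order(RULES)
downstream = rules_to_rerun(RULES, "model/data.txt")
```

In this sketch, a change to the hypothetical `model/data.txt` marks only the LP-building and solving rules for re-execution, while the earlier data processing rule is skipped.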
A high-level overview of OSeMOSYS Global’s Snakemake workflow is shown in Figure 7 below and discussed in more detail later. The user starts by entering parameters into the configuration file and running the workflow, which then executes all of the steps shown automatically. First, Python scripts process the raw data into formatted scenario data. The Python package otoole is then used to create an OSeMOSYS compatible data file, which is combined with the OSeMOSYS model file to create a solver agnostic linear programming file. The model is solved and the results are processed and visualized. While this is an automated process, all files are exposed to the user for exploration.
Input Data
The input data for the scripts is largely based on work by Brinkerink et al. [8], which consists of an openly available global power system dataset developed for the ‘PLEXOS-World’ power system model [36]. The dataset is compiled from a wide range of public sources and includes, among others, global power plant capacities, plant-specific capacity factor time series for hydro, solar, and wind, historical electricity demand time series for all countries, and existing cross-border transmission capacities; refer to [8] for a full overview of the dataset and its sources. In addition to this input data, the OSeMOSYS Global scripts make use of tools for demand projections based on earlier work by Brinkerink et al. [37], as well as transmission-line-specific cost projections for all global transmission pathways [38]. The OSeMOSYS Global scripts are directly connected to the PLEXOS-World dataset repositories and automatically extract the relevant data.
Additionally, OSeMOSYS Global pulls data from a variety of other sources. Capital costs and fixed operating costs for power plants are extracted from the World Energy Outlook [39], while the variable operating costs are taken from the World Bank Commodity Market Outlooks [40]. Power plant operational lifespans are from the NREL Annual Technology Baseline [41], while emission factors for each fuel are from the United States Environmental Protection Agency [42]. Finally, renewable resource potentials that determine the availability for future investments within the model are based on [43] for hydro, [44] for solar, and [45] for wind.
Model Architecture
OSeMOSYS Global’s workflow can broadly be categorized into two processing steps. The first step is obtaining and formatting scenario data, while the second step is building, solving, and visualizing the scenario results. This section will describe these two steps in further detail.
Scenario Creation
The core of OSeMOSYS Global is a series of Python scripts that read in raw unformatted data and process it into an OSeMOSYS compatible format based on user-selected parameters. This is represented by the items in the first row of Figure 7.
First, the raw data is processed into formatted CSV files that hold data for the world (all 265 nodes). These processing steps include the creation of data related to electricity demand, timeslice structure, and generation technologies. Next, the user-defined geographic filter is applied to create the scenario-specific data.
The first major component of the data processing is demand forecasting. The “Demand Data” script projects electricity demand using a multivariate linear regression with GDP at purchasing power parity per capita and urbanization share as independent variables, and electricity consumption per capita as the dependent variable. Details on this approach can be found in [37], including its limitations; for example, the projection does not currently account for other factors that can affect demand, such as electrification. The demand projection script also creates a series of visualizations, with an example shown in Figure 8. These graphs allow the modeller to confirm that the demand projections are reasonable and to identify outliers produced by the projection algorithm. Note that by default countries are grouped at a continental level during the projection; however, users are free to group countries differently, for example by relative development status.
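As an illustration of the regression form described above, the following minimal sketch fits electricity consumption per capita against GDP per capita and urbanization share using ordinary least squares. It is not the “Demand Data” script itself: the function names, the units, and the data are all hypothetical, and the observations are synthetic values generated from known coefficients so that the fit can be checked.

```python
def fit_ols(rows, targets):
    """Fit y = b0 + b1*x1 + b2*x2 + ... by solving the normal equations."""
    # Prepend a constant column for the intercept.
    design = [[1.0, *row] for row in rows]
    n = len(design[0])
    # Augmented matrix [X^T X | X^T y].
    aug = [
        [sum(r[i] * r[j] for r in design) for j in range(n)]
        + [sum(r[i] * y for r, y in zip(design, targets))]
        for i in range(n)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(n):
            if row != col:
                factor = aug[row][col] / aug[col][col]
                aug[row] = [a - factor * b for a, b in zip(aug[row], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def predict(coeffs, row):
    """Evaluate the fitted linear model for one observation."""
    return coeffs[0] + sum(c * x for c, x in zip(coeffs[1:], row))


# Synthetic observations (hypothetical units):
# (GDP PPP per capita in thousand USD, urbanization share as a fraction).
observations = [(5.0, 0.30), (12.0, 0.45), (20.0, 0.60), (35.0, 0.75), (50.0, 0.85)]
# Synthetic consumption per capita generated from y = 200 + 150*gdp + 1000*urban.
consumption = [200.0 + 150.0 * gdp + 1000.0 * urban for gdp, urban in observations]

coeffs = fit_ols(observations, consumption)
projected = predict(coeffs, (40.0, 0.80))
```

Because the synthetic data is exactly linear, the fit recovers the generating coefficients up to floating-point rounding. Real data would include noise, and in the workflow described above the fit is performed per country grouping.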
The next major data processing component involves creating power plants and their associated parameters. The “Power Plant Data” script creates a structured output of mining and power plant technologies. A mining technology generates the raw resources that electricity generators use, such as natural gas, coal, wind, or water, which allows raw material usage to be tracked. Power plant generators are the technologies that produce electricity, and each operates on one of the mined fuels. This script also aggregates all plants of a given type within a single node. This is needed because the input data specifies plants at an individual station level, while OSeMOSYS Global represents all plants of a given type in a given region as one representative station to limit computational complexity. While the existing capacity of all technologies is included, resource limits on biomass, geothermal, and wave power were not implemented due to a lack of consistent global data on these resources. Instead, users are advised to consider the specific policy questions they wish to address and how specific resource constraints may impact their results.
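The aggregation of station-level plants into one representative station per node and technology can be sketched as follows. The node names, technology codes, and capacities are hypothetical, and the sketch covers only capacity summation, not the other parameters that the “Power Plant Data” script produces.

```python
from collections import defaultdict

# Hypothetical station-level records: (node, technology, capacity in MW).
plants = [
    ("NODE_A", "GAS", 400.0),
    ("NODE_A", "GAS", 250.0),
    ("NODE_A", "WIND", 120.0),
    ("NODE_B", "GAS", 300.0),
]


def aggregate_capacity(records):
    """Sum station capacities into one representative plant per (node, technology)."""
    totals = defaultdict(float)
    for node, technology, capacity_mw in records:
        totals[(node, technology)] += capacity_mw
    return dict(totals)


representative = aggregate_capacity(plants)
```

Here the two hypothetical gas stations in `NODE_A` collapse into a single entry, which reduces the number of technologies the optimization has to represent.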
Finally, the “Timeslice Data” script generates the time slice structure defined by the user in the configuration file and updates the parameters that depend on the time slice definition, such as the capacity factor and electricity demand profiles. OSeMOSYS Global uses representative days to timeslice the model [13], [29]. This approach represents a time period using average values for variable parameters, such as loads or renewable generation profiles, over that period. In OSeMOSYS Global, the representation can be as fine as 24 hours per month (288 total time periods per year) or as coarse as one representative period per year. The model horizon can be set to any interval between 2015 and 2100. In future model releases, the time slicing structure could be further enhanced by introducing more data or data clustering methods [46].
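The finest resolution mentioned above, one average 24-hour profile per month, can be sketched as a simple grouping of hourly values by month and hour of day. This is a simplified illustration with a synthetic input series, not the “Timeslice Data” script, which also supports coarser user-defined structures.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def representative_days(hourly):
    """Average (timestamp, value) pairs into one mean 24-hour profile per month."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for timestamp, value in hourly:
        key = (timestamp.month, timestamp.hour)
        sums[key] += value
        counts[key] += 1
    return {key: sums[key] / counts[key] for key in sums}


# Synthetic hourly series for a non-leap year (8760 hours).
# The value is simply the hour of day, so every average is easy to verify.
start = datetime(2015, 1, 1)
timestamps = [start + timedelta(hours=h) for h in range(8760)]
series = [(ts, float(ts.hour)) for ts in timestamps]

profile = representative_days(series)
```

The resulting dictionary has one entry per (month, hour) pair, which gives the 12 x 24 = 288 time periods per year referred to above.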
Model Processing
Once the scenario CSV data has been created, it is processed into an OSeMOSYS compatible datafile, the model is built and solved, and result data tables and graphics are generated. To transform the CSV data describing a specific scenario into an OSeMOSYS compatible datafile, the Python package OSeMOSYS Tools for Energy[12] (otoole) is used. The output datafile from otoole is fed into the open-source GLPK[13] tool together with the OSeMOSYS model file to create a solver-independent linear programming file. This file can then be read by whichever solver is specified in the configuration file (CBC[14], CPLEX[15], or Gurobi[16]). Once the model is solved, the solution file is processed by otoole to generate a complete set of result CSV files. The “Visualization” Python script then processes the CSV results to produce a series of summary result tables and graphs, some of which have been shown earlier in the Results section.
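The two middle steps of this chain, writing the linear programming file with GLPK and solving it with CBC, can be sketched as command lines assembled in Python. The file names are hypothetical and the exact invocations used inside the OSeMOSYS Global workflow may differ; the otoole conversion steps are left out because their command-line arguments depend on the otoole version. The sketch only builds the command strings and does not execute them.

```python
import shlex


def build_commands(model_file, data_file, lp_file, solution_file):
    """Return shell commands that write an LP file with GLPK and solve it with CBC."""
    # glpsol: read the model (-m) and data (-d), write an LP file (--wlp),
    # and stop after checking the model instead of solving it (--check).
    build_lp = ["glpsol", "-m", model_file, "-d", data_file, "--wlp", lp_file, "--check"]
    # cbc: read the LP file, solve it, and write the solution file (-solu).
    solve = ["cbc", lp_file, "solve", "-solu", solution_file]
    return [shlex.join(build_lp), shlex.join(solve)]


# Hypothetical file names, for illustration only.
commands = build_commands("osemosys.txt", "data.txt", "model.lp", "model.sol")
```

Keeping the LP file as an intermediate artifact is what makes the workflow solver independent, since CPLEX and Gurobi can read the same file in place of CBC.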
Data Availability
All input datasets for the OSeMOSYS Global model are available from openly licensed sources as identified in the OSeMOSYS Global code repository at [47]. Data generated for the figures and analysis in this article can be replicated using version 0.4.0 of OSeMOSYS Global and modifying the configuration file accordingly.
Code Availability
All code for OSeMOSYS Global is provided in a public GitHub repository at [47] under an MIT license. Instructions on how to run OSeMOSYS Global can be found in the repository. While a basic understanding of Python and OSeMOSYS is beneficial, it is not required to run the workflow. All users and readers are invited to contribute to OSeMOSYS Global following the contribution guidelines outlined in the repository.