Towards FAIR Data for Low Carbon Energy - Current State and Call for Action

With the continued digitization of the energy sector, the problem of sunk scholarly data investments and forgone opportunities to harvest existing data is worsening. Compounding this, the reproduction of knowledge is incomplete, impeding the transparency of the science-based evidence behind the choices made in the energy transition. We comprehensively test FAIR data practices in the energy domain with the help of automated and manual tests. We document the state of the art and provide insights on bottlenecks from the human and the machine perspective. We propose action items for overcoming the problems with FAIR and open energy data and suggest how to prioritize activities. This study is the first to assess and document FAIR and open data practices in the energy domain. We test 80 databases that are representative of data flows in the energy system with the help of manual and machine-based assessments. The comparison offers several novel insights, suggesting how to move forward in and with the community. We recommend

[Figure: The energy system with human and machine agents at the center. The top layer details human actors in the energy sector, engaged in the production, distribution, and/or consumption of energy services. Their decisions and behaviors define the objectives and constraints of the energy system. This information is delivered through bilateral, heterogeneous data bundles that are taken up by smart energy technologies to monitor and steer the energy system infrastructure (bottom layer).]

Smart energy technologies include data-driven regulatory systems with feedback and adaptive behavior, infrastructure control, the flagging of alerts, real-time monitoring, and data-driven compliance and regulation. The importance of data is expected to grow further with the continued digitization of the energy sector, in particular with the broad introduction of smart and AI-based technologies in support of decision-making and real-time system adaptation [10, 12]. Moreover, algorithm-based strategies to identify technological solutions are on the rise. An example is automated material selection and design without the need for expensive and/or risky experiments [13]. Consequently, market opportunities for sharing data are expected to grow tremendously, provided that the bottlenecks for doing so are removed.

The main task of machine agents is to support the infrastructure needed to deliver energy services (third layer). This includes the extraction and harvesting of energy resources, the conversion of different forms of energy into useful energy, the distribution of fuels, as well as the operation and maintenance of energy equipment across temporal and spatial scales. Data streams flowing between the top and the middle layer are input to machines in the form of signals, objective functions, and constraints. These include taxes on energy fuels, R&D programs, energy security targets, health and sustainable development goals, as well as data security and privacy requirements. The third layer exchanges data with smart energy technologies to provide the foundations for humans and machines to manage the necessary energy infrastructure.

In several cases, websites expose only representations of the data and not the data themselves. In these cases, the design is not suitable for machine access. Examples include drop-down menus or hover boxes for value selection. Accessibility to (meta-)data is mostly impeded because metadata are not persistent.
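To make the machine-access problem concrete, the following is a minimal sketch of how a machine agent might probe a database's landing page for machine-readable (meta-)data via HTTP content negotiation; the URL and the tested media types are illustrative assumptions, not part of the study's tooling.

```python
import requests

def probe_machine_readable_metadata(landing_page_url: str) -> bool:
    """Ask a landing page for machine-readable metadata via HTTP
    content negotiation. Returns True if the server answers with a
    structured format instead of an HTML page meant for humans."""
    machine_formats = ("application/ld+json", "text/turtle", "application/rdf+xml")
    response = requests.get(
        landing_page_url,
        headers={"Accept": ", ".join(machine_formats)},
        timeout=10,
    )
    content_type = response.headers.get("Content-Type", "")
    # A human-oriented design (e.g. data behind drop-down menus) typically
    # ignores the Accept header and returns text/html.
    return any(fmt in content_type for fmt in machine_formats)

# Hypothetical example URL; an actual assessment would iterate over the
# landing pages of the studied databases.
if __name__ == "__main__":
    print(probe_machine_readable_metadata("https://example.org/energy-dataset"))
```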

Among the bottlenecks for Findability are missing metadata pointers, a lack of long-term and stable access to data, and limited searchability of (meta-)data. At the same time, we do not observe lower scores in either the machine or the manual assessment. The same observation holds for the Accessibility criteria. The simple reason behind this is a selection bias: we study databases that are findable and accessible, at least through a website. This would change if a scalable, automated test existed, as suggested in [16] (or if websites were even crawled to find assessment candidates). Reusability is an issue because machines do not find license information, even when it is available to humans (hence the binary scoring results for machines). For Interoperability to work, data would need a much better standardized description of what they are about. A good example is the approach proposed for the smart grid in [9].
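One way to close this Reusability gap is to place license information where machines look for it. The snippet below is a minimal sketch, assuming a schema.org Dataset description embedded as JSON-LD in a landing page; the dataset name and URLs are placeholders, not entries from the assessed databases.

```python
import json

# A minimal schema.org Dataset description carrying an explicit license.
# Automated FAIR evaluators typically look for exactly this kind of
# embedded JSON-LD; name and URLs here are illustrative placeholders.
dataset_metadata = {
    "@context": "https://schema.org",
    "@type": "Dataset",
    "name": "Hourly load profiles (example)",
    "url": "https://example.org/energy-dataset",
    "license": "https://creativecommons.org/licenses/by/4.0/",
}

# Embedded in the landing page's HTML head, this makes the license
# machine-findable even though humans read it elsewhere on the page.
html_snippet = (
    '<script type="application/ld+json">\n'
    + json.dumps(dataset_metadata, indent=2)
    + "\n</script>"
)
print(html_snippet)
```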

However, we also find that Interoperability is the criterion assessed most differently from the human and the machine perspective (Fig. 3). A shortcoming observed from the human point of view is that only islands of standardized knowledge representation and terminology exist; even less often are they interlinked, which would allow navigating data and metadata from one field of expertise to the next. From the machine perspective, a standardization of vocabularies, along with pointers to their place of definition, is indispensable. For example, while "Kilowatt hour" bears meaning for users of energy data, machines need a semantic definition they can resolve.

The manual test allows scoring comparable to the machine test (12 tests) and was the only one available when this study was started.
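As an illustration of such a resolvable pointer, the sketch below replaces the bare unit string with a unit IRI from the QUDT vocabulary; the record structure is hypothetical, and QUDT is one possible choice of vocabulary rather than the one mandated by the study.

```python
# A bare string is meaningful to human readers only.
ambiguous_record = {"value": 42.0, "unit": "Kilowatt hour"}

# A pointer to a published vocabulary term gives machines a resolvable
# place of definition (QUDT's identifier for kilowatt hour).
semantic_record = {
    "value": 42.0,
    "unit": "http://qudt.org/vocab/unit/KiloW-HR",
}

# Dereferencing the IRI yields the unit's formal definition, conversion
# factors, and links to related units, which no bare string can offer.
print(semantic_record["unit"])
```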

Although both tools have their own set of test questions, a mapping between them is possible at the level of each of the FAIR principles. The supplementary material details the approach and the connected scores (Table 5). Thirty assessments were carried out manually, while 80 tests are machine-based. The rationale is that systematic patterns had already emerged from the manual assessments, and additional ones would not have yielded further insight. The number of assessed databases was chosen based on the emergence of generalizable patterns for the state of the art. Note also that the manual assessment was carried out before and after a briefing on how to assess, with the intention of detecting the degree of subjectivity in the tests. Fig. 3 shows the example for the Interoperability criteria; for the others, see the supplementary material.
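To sketch how such a mapping can be operationalized, the snippet below groups each tool's test questions under the FAIR principles and averages them into comparable per-principle scores; the question identifiers and scores are hypothetical, and the actual mapping is the one documented in Table 5 of the supplementary material.

```python
from statistics import mean

# Hypothetical mapping of each tool's question IDs onto the FAIR
# principles; the study's real mapping is given in Table 5.
MANUAL_TO_PRINCIPLE = {"M1": "F", "M2": "F", "M3": "A", "M4": "I", "M5": "R"}
MACHINE_TO_PRINCIPLE = {"T1": "F", "T2": "A", "T3": "A", "T4": "I", "T5": "R"}

def aggregate(scores: dict[str, float], mapping: dict[str, str]) -> dict[str, float]:
    """Average per-question scores (0..1) into one score per principle."""
    per_principle: dict[str, list[float]] = {}
    for question, score in scores.items():
        per_principle.setdefault(mapping[question], []).append(score)
    return {p: mean(vals) for p, vals in per_principle.items()}

# Illustrative scores for one database from both assessment routes.
manual_scores = {"M1": 1.0, "M2": 0.5, "M3": 1.0, "M4": 0.0, "M5": 0.5}
machine_scores = {"T1": 1.0, "T2": 1.0, "T3": 0.5, "T4": 0.0, "T5": 0.0}

print(aggregate(manual_scores, MANUAL_TO_PRINCIPLE))
print(aggregate(machine_scores, MACHINE_TO_PRINCIPLE))
```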

The supplementary material provides further methodological details (Section 2), results from the manual tests (Section 3.2), results from the machine tests (Section 3.3, spreadsheet), and aggregate scores for the comparison (Section 3.1).