To provide a complete and clear illustration of how RDF-based semantics are produced, a systematic framework was proposed, as shown in Fig. 1. It consisted of three layers: data modelling, data extraction, and RDF information management.
3.1. Data Modelling
In the first layer, data modelling was carried out by designing a database schema capable of storing all objects in the OAS document, using normalization techniques. An initial, informal database was first designed by translating each OAS object into a table, so that all OpenAPI document fields were captured without yet considering database constraints such as primary and foreign keys (PK and FK), and the relationships between these tables were identified. The final result was a relational database design with well-structured tables that avoid redundant data as well as insertion, update, and deletion anomalies, and that satisfy the normal form (NF) rules [29]. The normalization process evaluated each initial table structure (as a parent table) and moved repeating groups into separate tables (as child tables). The relationships between the parent and child tables were then defined by determining their PKs and FKs. Fig. 3 shows an example of applying the normalization technique to the OAS Info section (Fig. 2), producing a relational database schema whose tables are linked to one another.
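The normalized result for the Info section can be sketched as SQLite DDL: service is the parent (strong) table with serviceid as its PK, while servicecontact and servicelicense are child (weak) tables that carry serviceid as an FK. Column names follow Tables 4–6; the column types, the use of SQLite, and the function name `create_info_schema` are assumptions made for illustration, not the authors' actual schema.

```python
import sqlite3


def create_info_schema(conn: sqlite3.Connection) -> None:
    """Create a hypothetical normalized schema for the OAS Info section."""
    # SQLite only enforces foreign keys when this pragma is enabled.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(
        """
        -- Parent (strong) table: has a primary key.
        CREATE TABLE service (
            serviceid     TEXT PRIMARY KEY,
            title         TEXT NOT NULL,
            description   TEXT,
            termofservice TEXT,
            version       TEXT
        );
        -- Child (weak) tables: no primary key of their own; each row
        -- points to its parent through the serviceid foreign key.
        CREATE TABLE servicecontact (
            name      TEXT,
            url       TEXT,
            email     TEXT,
            serviceid TEXT NOT NULL REFERENCES service (serviceid)
        );
        CREATE TABLE servicelicense (
            name      TEXT,
            url       TEXT,
            serviceid TEXT NOT NULL REFERENCES service (serviceid)
        );
        """
    )


conn = sqlite3.connect(":memory:")
create_info_schema(conn)
```

With the pragma enabled, inserting a servicecontact row whose serviceid has no matching service row is rejected, which is the parent/child dependency the normalization step introduces.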
Applying the database normalization technique produced 26 tables that together cover all parts and objects of the OAS [30]. These tables were grouped into strong and weak types: strong tables have a PK and can act as parent tables, whereas weak tables have no PK and act as child tables. The tables in the database then formed a data model, which was enriched with RDF concepts to remove ambiguity, especially when information is exchanged [31]. RDF supports interoperable information exchange between applications because its statements are machine-interpretable and are represented as a data graph, in which nodes represent entities and arrows represent the relationships between them. Each RDF statement is a triple consisting of a subject, a predicate, and an object. Following the principles of linked data, the subject is a resource identified by a URI and typed by a class, so that its meaning can be explained. The predicate is a property that describes the relationship between the subject and either another class instance or a literal value, and the object is accordingly a resource or a literal value. Each subject is therefore connected to its object through the predicate. The classes and properties used in the triples were selected from vocabularies that describe service elements. Such terminology captures the meaning of, and the connections between, data, and recent trends toward formal forms of connectivity provide reusable semantics that support automated composition methods [16]. Describing a resource with a common vocabulary allows its meaning to be understood globally, regardless of where the resource originates [32]. Table 1 lists the vocabularies selected for the development of the SWS.
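To make the subject, predicate, and object structure concrete, the sketch below models a triple in Python and serializes it in N-Triples form, where IRIs are written in angle brackets and literals in double quotes. The class and function names are hypothetical; blank nodes, datatypes, and language tags are left out for brevity, and the example IRI follows the subject template used later in Mapping 1.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class IRI:
    value: str


@dataclass(frozen=True)
class Literal:
    value: str


Term = Union[IRI, Literal]


@dataclass(frozen=True)
class Triple:
    subject: IRI    # a resource identified by a URI
    predicate: IRI  # a property taken from a vocabulary
    object: Term    # another resource or a literal value


def term_to_nt(term: Term) -> str:
    """Serialize one term in N-Triples syntax."""
    if isinstance(term, IRI):
        return f"<{term.value}>"
    # Escape characters that would otherwise break the quoted literal.
    escaped = (term.value.replace("\\", "\\\\").replace('"', '\\"')
               .replace("\n", "\\n").replace("\r", "\\r"))
    return f'"{escaped}"'


def to_ntriples(t: Triple) -> str:
    return f"{term_to_nt(t.subject)} {term_to_nt(t.predicate)} {term_to_nt(t.object)} ."


service = IRI("http://sws.itbsmartcampus.id/service/5f5d97f0-b09f-11eb-8c83-4fa8fcf43f30")
print(to_ntriples(Triple(service, IRI("http://schema.org/brand"), Literal("Google Classroom API"))))
```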
Table 1
List of Adopted Vocabularies
# | Prefix | URI |
1 | schema | http://schema.org |
2 | http | http://www.w3.org/2011/http# |
3 | jsonsc | http://www.w3.org/2019/wot/json-schema# |
4 | cnt | http://www.w3.org/2011/content# |
Based on Table 1, the classes and properties of each vocabulary were mapped to the tables of the relational database. A vocabulary was attached to a table when it provided enough classes and properties to express the meaning of that table's data. For example, the schema vocabulary was attached to a group of interconnected tables (Fig. 4) that describe information about service owners.
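Because tables are annotated with prefixed names such as schema:Service, a small helper can expand these names into full IRIs using the namespaces in Table 1. This is a sketch: the function name is hypothetical, and the schema.org namespace is written with a trailing slash to match the @prefix declaration in Mapping 1.

```python
# Namespaces from Table 1. The schema.org entry uses the trailing-slash form
# declared with @prefix in Mapping 1 so that local names can be appended.
PREFIXES = {
    "schema": "http://schema.org/",
    "http": "http://www.w3.org/2011/http#",
    "jsonsc": "http://www.w3.org/2019/wot/json-schema#",
    "cnt": "http://www.w3.org/2011/content#",
}


def expand(prefixed_name: str, prefixes: dict = PREFIXES) -> str:
    """Expand a prefixed name such as 'schema:Service' into a full IRI.

    Only prefixed names are accepted; a full IRI like 'http://...' would be
    misread as using the 'http' prefix, so callers should not pass one.
    """
    prefix, sep, local = prefixed_name.partition(":")
    if not sep or prefix not in prefixes:
        raise ValueError(f"unknown prefix in {prefixed_name!r}")
    return prefixes[prefix] + local


print(expand("schema:Service"))  # http://schema.org/Service
print(expand("http:Request"))    # http://www.w3.org/2011/http#Request
```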
Fig. 4 shows the two classes selected from this vocabulary, namely Service and APIReference. The Service class describes a service provided by an organization, e.g., a delivery or printing service. It was adopted to give meaning to the service table, so that each of its records is interpreted as a web service. The properties of the Service class were selected to express the relationships between the service, "servicecontact", and "servicelicense" tables together with their supporting data. The APIReference class is defined as reference documentation describing an Application Programming Interface (API), which made it well suited to define the meaning of the "externaldocument" table. The relationship between the service and externaldocument tables was then expressed with the "isRelatedTo" property. After all data models were mapped onto classes and properties, an ontology describing the positions of, and relationships between, these groups was constructed. Based on open standards and data structures, ontology creation aims to identify information connectivity and develop common semantics [33]. Fig. 5 shows the ontology that serves as the basis for developing the SWS, built from the data model and the selected vocabularies. In this figure, each labelled arrow represents a relationship between ontology classes: the arrow starts at the class acting as the subject, points to the class acting as the object (its range), and is labelled with the predicate. For example, the http:resp predicate links the http:Request class, as the subject, to the http:Response class, as the object.
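The labelled arrows of Fig. 5 can be stored as entries that map a (subject class, predicate) pair to an object class, and then used to check whether a link between two tables agrees with the ontology. This is an illustrative sketch: the dictionary layout and function names are hypothetical, only the relationships named in the text are included, and the schema: prefix on isRelatedTo is an assumption.

```python
from typing import Optional

# (subject class, predicate) -> object class, one entry per labelled arrow.
# Only the two relationships named in the text are included.
ONTOLOGY_EDGES = {
    ("schema:Service", "schema:isRelatedTo"): "schema:APIReference",
    ("http:Request", "http:resp"): "http:Response",
}

# Classes attached to tables, as described for Fig. 4.
TABLE_CLASSES = {
    "service": "schema:Service",
    "externaldocument": "schema:APIReference",
}


def expected_range(subject_class: str, predicate: str) -> Optional[str]:
    """Return the class that an arrow labelled `predicate` points to, if any."""
    return ONTOLOGY_EDGES.get((subject_class, predicate))


def link_is_valid(subject_table: str, predicate: str, object_table: str) -> bool:
    """Check that linking two tables with `predicate` agrees with the ontology."""
    subject_class = TABLE_CLASSES.get(subject_table)
    object_class = TABLE_CLASSES.get(object_table)
    if subject_class is None or object_class is None:
        return False
    return expected_range(subject_class, predicate) == object_class


print(expected_range("http:Request", "http:resp"))                         # http:Response
print(link_is_valid("service", "schema:isRelatedTo", "externaldocument"))  # True
```

Since the arrows are directed, reversing subject and object (externaldocument to service) fails the check.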
As a formal specification of the concepts and of the database schema (data model), the ontology became the main basis for RDF creation. R2RML was used to translate the database model into RDF. R2RML is a W3C-recommended language for expressing customized mappings from relational databases to RDF datasets. In this study, the data model was mapped, using the selected vocabularies, into a set of RDF triples that represents resource information (in this semantic context, the web services whose capabilities are described in the OAS documents) as a graph of interrelated nodes. In R2RML, the input is a relational database that conforms to the schema, and the output is a set of RDF triples produced according to a predefined mapping. The data model mapped with R2RML can be a table, a view, or a Structured Query Language (SQL) query. When creating the R2RML mapping, the subject, predicate, and object were determined for each data model represented by a relational database table. When a table was treated as an RDF class, its PK column became the identifier of a reusable resource, and the other columns completed the meaning of that resource. For example, the schema in Fig. 3 contains three interconnected tables. The service table is the parent, with the "serviceid" column as its PK, whereas the servicecontact and servicelicense tables are its children and are related to the service table through the "serviceid" column. Mapping 1 shows the R2RML mapping for the data stored in the service, servicecontact, and servicelicense tables.
Mapping 1. R2RML mapping for the service, servicecontact, and servicelicense tables
@prefix rr: <http://www.w3.org/ns/r2rml#> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix schema: <http://schema.org/> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix : <http://sws.itbsmartcampus.id/ont#> .

<#Service>
    rr:logicalTable [ rr:sqlQuery "SELECT SERVICEID, TITLE, DESCRIPTION, TERMOFSERVICE, VERSION FROM SERVICE" ];
    rr:subjectMap [
        rr:template "http://sws.itbsmartcampus.id/service/{SERVICEID}" ;
        rr:class schema:Service ;
    ] ;
    rr:predicateObjectMap [
        rr:predicate schema:brand ;
        rr:objectMap [
            rr:column "TITLE" ;
        ] ;
    ] ;
    rr:predicateObjectMap [
        rr:predicate schema:description ;
        rr:objectMap [
            rr:column "DESCRIPTION" ;
        ] ;
    ] ;
    rr:predicateObjectMap [
        rr:predicate schema:termOfService ;
        rr:objectMap [
            rr:column "TERMOFSERVICE" ;
            rr:termType rr:IRI ;
        ] ;
    ] ;
    rr:predicateObjectMap [
        rr:predicate schema:version ;
        rr:objectMap [
            rr:column "VERSION" ;
        ] ;
    ] ;
    rr:predicateObjectMap [
        rr:predicate schema:license ;
        rr:objectMap [
            rr:parentTriplesMap <#ServiceLicense> ;
            rr:joinCondition [
                rr:child "SERVICEID" ;
                rr:parent "SERVICEID" ;
            ]
        ] ;
    ] ;
    rr:predicateObjectMap [
        rr:predicate schema:contactPoints ;
        rr:objectMap [
            rr:parentTriplesMap <#ServiceContact> ;
            rr:joinCondition [
                rr:child "SERVICEID" ;
                rr:parent "SERVICEID" ;
            ]
        ] ;
    ] .

<#ServiceLicense>
    rr:logicalTable [ rr:sqlQuery "SELECT NAME, URL, SERVICEID FROM SERVICELICENSE" ];
    rr:subjectMap [
        rr:template "{NAME}" ;
        rr:termType rr:BlankNode ;
    ] ;
    rr:predicateObjectMap [
        rr:predicate schema:name ;
        rr:objectMap [
            rr:column "NAME" ;
        ] ;
    ] ;
    rr:predicateObjectMap [
        rr:predicate schema:url ;
        rr:objectMap [
            rr:column "URL" ;
        ] ;
    ] .

<#ServiceContact>
    rr:logicalTable [ rr:sqlQuery "SELECT NAME, URL, EMAIL, SERVICEID FROM SERVICECONTACT" ];
    rr:subjectMap [
        rr:template "{EMAIL}" ;
        rr:termType rr:BlankNode ;
    ] ;
    rr:predicateObjectMap [
        rr:predicate schema:provider ;
        rr:objectMap [
            rr:column "NAME" ;
        ] ;
    ] ;
    rr:predicateObjectMap [
        rr:predicate schema:url ;
        rr:objectMap [
            rr:column "URL" ;
        ] ;
    ] ;
    rr:predicateObjectMap [
        rr:predicate schema:email ;
        rr:objectMap [
            rr:column "EMAIL" ;
        ] ;
    ] .
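Each rr:template in Mapping 1 is filled in with the column values of one row. The sketch below approximates that step: for IRI templates, R2RML inserts an IRI-safe, percent-encoded version of each value, and a NULL value means no term is generated. It is a simplified, hypothetical helper rather than an R2RML processor; it ignores escaped braces and also percent-encodes non-ASCII characters, which the specification allows to remain unencoded.

```python
import re
from typing import Mapping, Optional
from urllib.parse import quote

# Matches {COLUMN} placeholders; escaped braces are not handled here.
_PLACEHOLDER = re.compile(r"\{([^{}]+)\}")


def expand_template(template: str, row: Mapping[str, Optional[str]],
                    iri_safe: bool = True) -> Optional[str]:
    """Fill an R2RML-style template with values from one row.

    Returns None when any referenced column is NULL, because no RDF term is
    generated in that case. With iri_safe=True, inserted values are
    percent-encoded except for unreserved ASCII characters (A-Z a-z 0-9 - . _ ~).
    """
    values = {}
    for column in _PLACEHOLDER.findall(template):
        value = row.get(column)
        if value is None:
            return None
        values[column] = quote(str(value), safe="") if iri_safe else str(value)
    return _PLACEHOLDER.sub(lambda m: values[m.group(1)], template)


row = {"SERVICEID": "5f5d97f0-b09f-11eb-8c83-4fa8fcf43f30"}
print(expand_template("http://sws.itbsmartcampus.id/service/{SERVICEID}", row))
```

With iri_safe=False, the same helper fills the blank-node templates "{NAME}" and "{EMAIL}" of the <#ServiceLicense> and <#ServiceContact> maps.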
In R2RML, each table of the data model was mapped by selecting a vocabulary class that matched its characteristics. The R2RML vocabulary itself is referenced through the rr prefix, bound to the namespace IRI http://www.w3.org/ns/r2rml#. Table 2 describes the rules for mapping a table into RDF triples.
Table 2
Rules for Mapping a Table into RDF Triples
Table Type | Column Specification | RDF Form | Column Value Role | R2RML Mapping Template |
Strong | | | | <#_parentMapName_> rr:logicalTable [ rr:sqlQuery "_SQL statement_" ]; |
| Primary key | Subject | As an identified resource (URI) | rr:subjectMap [ rr:template "BaseURI/{_columnName_}" ; rr:class _vocabulary class_ ; ] ; |
Weak | | | | <#_childMapName_> rr:logicalTable [ rr:sqlQuery "_SQL statement_" ]; |
| Identifier | Subject | As a blank node | rr:subjectMap [ rr:template "{_columnName_}" ; rr:termType rr:BlankNode ; ] ; |
Strong or weak | Non-primary key or non-identifier | Predicate | As a string literal (object) | rr:predicateObjectMap [ rr:predicate _vocabulary properties_ ; rr:objectMap [ rr:column "_columnName_" ; ] ; ] ; |
| Foreign key | Predicate | As a resource whose type refers to the parent table | rr:predicateObjectMap [ rr:predicate _vocabulary properties_ ; rr:objectMap [ rr:parentTriplesMap <#_parentMapName_> ; rr:joinCondition [ rr:child "_columnName_" ; rr:parent "_parentColumnName_" ; ] ] ; ] . |
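The rules in Table 2 are regular enough to be applied mechanically. The following sketch renders one triples map in Turtle from a simple description of a table: a strong table receives an IRI subject built from its PK, a weak table receives a blank-node subject, ordinary columns become literal-valued predicate-object maps, and a link becomes a referencing object map with a join condition. The TableSpec structure, its field names, and `triples_map` are hypothetical; prefix declarations (rr:, schema:) must be added separately, and SQL containing double quotes would need escaping. As in Mapping 1, the referencing object map is placed in whichever map declares the link, e.g. <#Service> pointing to <#ServiceLicense>.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple


@dataclass
class TableSpec:
    map_name: str                     # name of the triples map, e.g. "Service"
    sql: str                          # query used as rr:logicalTable
    columns: Dict[str, str]           # column -> vocabulary property (literal object)
    pk: Optional[str] = None          # primary key column: set for strong tables
    identifier: Optional[str] = None  # identifier column: set for weak tables
    base_uri: str = ""                # IRI prefix for strong-table subjects
    cls: Optional[str] = None         # vocabulary class for strong tables
    # property -> (referenced triples map, child column, parent column)
    links: Dict[str, Tuple[str, str, str]] = field(default_factory=dict)


def triples_map(t: TableSpec) -> str:
    """Render one R2RML triples map in Turtle following the Table 2 templates."""
    lines = [f"<#{t.map_name}>", f'    rr:logicalTable [ rr:sqlQuery "{t.sql}" ] ;']
    if t.pk:  # strong table: IRI subject built from the primary key
        lines.append(f'    rr:subjectMap [ rr:template "{t.base_uri}{{{t.pk}}}" ; rr:class {t.cls} ] ;')
    elif t.identifier:  # weak table: blank-node subject
        lines.append(f'    rr:subjectMap [ rr:template "{{{t.identifier}}}" ; rr:termType rr:BlankNode ] ;')
    else:
        raise ValueError("a table needs either a primary key or an identifier column")
    for column, prop in t.columns.items():
        lines.append(f'    rr:predicateObjectMap [ rr:predicate {prop} ; rr:objectMap [ rr:column "{column}" ] ] ;')
    for prop, (ref_map, child, parent) in t.links.items():
        lines.append(
            f"    rr:predicateObjectMap [ rr:predicate {prop} ; rr:objectMap [ "
            f"rr:parentTriplesMap <#{ref_map}> ; "
            f'rr:joinCondition [ rr:child "{child}" ; rr:parent "{parent}" ] ] ] ;'
        )
    lines[-1] = lines[-1][:-1] + "."  # replace the final ';' to close the triples map
    return "\n".join(lines)


service = TableSpec(
    "Service", "SELECT SERVICEID, TITLE, VERSION FROM SERVICE",
    {"TITLE": "schema:brand", "VERSION": "schema:version"},
    pk="SERVICEID", base_uri="http://sws.itbsmartcampus.id/service/", cls="schema:Service",
    links={"schema:license": ("ServiceLicense", "SERVICEID", "SERVICEID")},
)
license_map = TableSpec(
    "ServiceLicense", "SELECT NAME, URL, SERVICEID FROM SERVICELICENSE",
    {"NAME": "schema:name", "URL": "schema:url"}, identifier="NAME",
)
print(triples_map(service))
print(triples_map(license_map))
```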
3.2. Data Extraction
The outputs of the data modelling layer became the artifacts of the data extraction layer: the relational database and the RDF data model, translated into R2RML mappings, were the key artifacts supporting the goals of this layer. The data extraction layer had two main objectives, namely (1) extracting the OAS data into records stored in a relational database, and (2) transforming those records into a collection of RDF triples, using the rules predefined in the mapping. Fig. 6 shows the sequence of activities carried out to achieve these objectives.
The extraction process began with the upload of an OAS document, the source of the information saved to the database. Each section and object in the document was parsed and saved to the appropriate tables. In this study, two databases with the same structure, a temporary database and a production database, were used to manage the extracted information: the production database stored the data extracted from all OAS documents, whereas the temporary database stored the data extracted from the document currently being processed. To maximize the reuse of production data, each extracted item was first checked against the production database before being added to the temporary database. When a matching record was found, it was copied from the production database to the temporary database; otherwise, a new record was added to the temporary database.
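The reuse step can be sketched with two SQLite databases of identical structure standing in for the production and temporary databases. Only the service table is shown, and matching an existing record by title and version is an assumption for illustration; the text does not state which fields were compared. The function name and the "Example API" record are hypothetical.

```python
import sqlite3
import uuid

DDL = """CREATE TABLE service (
    serviceid TEXT PRIMARY KEY, title TEXT, description TEXT,
    termofservice TEXT, version TEXT)"""


def stage_service(prod: sqlite3.Connection, temp: sqlite3.Connection, rec: dict) -> str:
    """Copy a matching production record into the temporary DB, or add a new one.

    Returns the serviceid used in the temporary database.
    """
    row = prod.execute(
        "SELECT serviceid, title, description, termofservice, version "
        "FROM service WHERE title = ? AND version = ?",
        (rec["title"], rec["version"]),
    ).fetchone()
    if row is None:
        # No equivalent record in production: create a new one with a fresh id.
        row = (str(uuid.uuid4()), rec["title"], rec.get("description"),
               rec.get("termofservice"), rec["version"])
    # INSERT OR IGNORE keeps the step idempotent if the same record is staged twice.
    temp.execute("INSERT OR IGNORE INTO service VALUES (?, ?, ?, ?, ?)", row)
    return row[0]


prod, temp = sqlite3.connect(":memory:"), sqlite3.connect(":memory:")
for db in (prod, temp):
    db.execute(DDL)
prod.execute("INSERT INTO service VALUES (?, ?, ?, ?, ?)",
             ("5f5d97f0-b09f-11eb-8c83-4fa8fcf43f30", "Google Classroom API", None, None, "v1"))

reused = stage_service(prod, temp, {"title": "Google Classroom API", "version": "v1"})
fresh = stage_service(prod, temp, {"title": "Example API", "version": "v2"})
```

Here `reused` keeps the production identifier, while `fresh` receives a newly generated one; the production database itself is left unchanged.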
When records are added to database tables, their dependencies and relationships must be taken into account: the tables with the fewest relationships and dependencies are populated first. Such dependencies arise in the OAS structure because an object defined in one section can be reused in another. A reusable object must therefore be parsed and stored in the database first, so that it does not cause dependency problems when other records refer to it. In Fig. 6, activities 3–8 show the order in which information is extracted from the OAS document into the database. Table 3 relates each section of the document to the tables that store its extracted data.
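Ordering inserts so that referenced tables are populated before the tables that depend on them amounts to a topological sort over the foreign-key graph. The sketch uses Python's standard graphlib module (Python 3.9 or later); the dependency edges below are illustrative assumptions based on the table names in Table 3, not the full 26-table schema.

```python
from graphlib import TopologicalSorter

# table -> tables it references through foreign keys.
# These edges are illustrative assumptions, not the full 26-table schema.
DEPENDS_ON = {
    "service": set(),
    "servicecontact": {"service"},
    "servicelicense": {"service"},
    "objectschema": set(),
    "objectproperties": {"objectschema"},
    "arrayschema": set(),
    "arrayitem": {"arrayschema"},
    "operation": set(),
    "parameter": {"operation"},
    "response": {"operation"},
}


def insert_order(depends_on: dict) -> list:
    """Order tables so that each one comes after every table it references.

    graphlib raises graphlib.CycleError if the dependencies form a cycle.
    """
    return list(TopologicalSorter(depends_on).static_order())


print(insert_order(DEPENDS_ON))
```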
Table 3
Extraction Mapping on OAS Document and Database Schema
Activity | OAS Section | Data Storage Table |
convertInfoSection() | info | service, servicecontact, and servicelicense |
convertServersSection() | servers | server |
convertTagsSection() | tags | tag |
convertExternalDocsSection() | externalDocs | externaldocument |
convertComponentsSection() | components | primitiveschema, objectschema, arrayschema, objectproperties, and arrayitem |
convertPathSection() | paths | operation, parameter, header, content, response, requestbody, operationrequestbody, primitiveschema, objectschema, arrayschema, objectproperties, and arrayitem |
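The activities in Table 3 can be read as a dispatch from OAS sections to converter functions, executed in the listed order, which corresponds to activities 3–8 in Fig. 6. Because components precedes paths, reusable schemas that operations reference through $ref are stored before those operations. The sketch below applies that order to a parsed OAS document held as a Python dict; the function and handler names are hypothetical, and the handlers only record what they receive instead of writing to the database.

```python
import json
from typing import Callable, Dict, List

# Section order taken from Table 3.
SECTION_ORDER = ["info", "servers", "tags", "externalDocs", "components", "paths"]


def convert_oas(document: dict, handlers: Dict[str, Callable[[object], None]]) -> List[str]:
    """Pass each section present in `document` to its handler, in Table 3 order."""
    processed = []
    for section in SECTION_ORDER:
        if section in document:
            handlers[section](document[section])
            processed.append(section)
    return processed


# Hypothetical handlers that only record what they received; the real
# converters (convertInfoSection() and so on) would write rows to the database.
received = {}
handlers = {name: (lambda value, name=name: received.update({name: value}))
            for name in SECTION_ORDER}

doc = json.loads('{"openapi": "3.0.0", "paths": {}, '
                 '"info": {"title": "Google Classroom API", "version": "v1"}}')
print(convert_oas(doc, handlers))  # ['info', 'paths']
```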
In this study, all sections and their data were successfully parsed and stored in the database, and the user was given the option to import the information held in the temporary database. Data extraction then continued with the generation of RDF triples, using the R2RML mapping produced in the data modelling layer. To illustrate the RDF mapping, Tables 4–6 show example records stored in the service, servicecontact, and servicelicense tables.
Table 4
Record in service Table
serviceid | title | description | termofservice | version |
5f5d97f0-b09f-11eb-8c83-4fa8fcf43f30 | Google Classroom API | Manages classes, rosters, and invitations in Google Classroom. | https://developers.google.com/v1/terms/ | v1 |
Table 5
Record in servicecontact Table
name | url | email | serviceid |
Google | https://google.com | | 5f5d97f0-b09f-11eb-8c83-4fa8fcf43f30 |
Table 6
Record in servicelicense Table
name | url | serviceid |
Creative Commons Attribution 3.0 | http://creativecommons.org/licenses/by/3.0/ | 5f5d97f0-b09f-11eb-8c83-4fa8fcf43f30 |
According to Mapping 1, each record in the service table is a resource of the schema:Service class, identified by its value in the serviceid column. The other columns of the service table become predicates that reinforce the interpretation of this schema:Service resource. For the record in Table 4, the RDF generation process produced the following triples in Turtle syntax:
RDF Triple 1. RDF generation from the service table
@prefix schema: <http://schema.org/> .
@prefix : <http://sws.itbsmartcampus.id/ont#> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .

:5f5d97f0-b09f-11eb-8c83-4fa8fcf43f30 a schema:Service ;
    schema:brand "Google Classroom API" ;
    schema:description "Manages classes, rosters, and invitations in Google Classroom." ;
    schema:termOfService <https://developers.google.com/terms/> ;
    schema:version "v1" .
In this study, the servicecontact and servicelicense tables were weak tables without a PK. Their records were therefore not identified as resources with their own URIs; instead, each record was represented as a blank node describing the corresponding service. Every column other than the FK became a predicate of that blank node, which was in turn attached to the schema:Service resource. RDF Triple 2 shows the resulting triples:
RDF Triple 2. RDF generation from the servicecontact and servicelicense tables
:5f5d97f0-b09f-11eb-8c83-4fa8fcf43f30 a schema:Service ;
    schema:license [
        schema:name "Creative Commons Attribution 3.0" ;
        schema:url "http://creativecommons.org/licenses/by/3.0/"
    ] ;
    schema:contactPoints [
        schema:name "Google" ;
        schema:url "https://google.com"
    ] .
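As a rough check of what Mapping 1 produces, the sketch below loads the sample records of Tables 4 and 6 into SQLite and emits N-Triples by hand, following the subject template, predicates, and SERVICEID join condition of Mapping 1. It is not an R2RML engine and makes several simplifying assumptions: only the service and servicelicense maps are covered (the servicecontact map would follow the same join pattern), blank-node labels are numbered per row rather than derived from the "{NAME}" template, the subject IRI follows the rr:template of <#Service>, and the termofservice value is the one printed in RDF Triple 1. NULL columns are skipped, as in R2RML.

```python
import itertools
import sqlite3

SCHEMA = "http://schema.org/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
BASE = "http://sws.itbsmartcampus.id/service/"  # from the <#Service> subject template


def lit(value: str) -> str:
    """Serialize a plain string literal in N-Triples syntax."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return f'"{escaped}"'


def generate(conn: sqlite3.Connection) -> list:
    """Emit N-Triples for the service and servicelicense tables as in Mapping 1."""
    triples, bnode_ids = [], itertools.count()
    services = conn.execute(
        "SELECT serviceid, title, description, termofservice, version FROM service"
    ).fetchall()
    for sid, title, desc, tos, version in services:
        s = f"<{BASE}{sid}>"
        triples.append(f"{s} <{RDF_TYPE}> <{SCHEMA}Service> .")
        for prop, value in (("brand", title), ("description", desc), ("version", version)):
            if value is not None:  # a NULL column produces no triple
                triples.append(f"{s} <{SCHEMA}{prop}> {lit(value)} .")
        if tos is not None:  # TERMOFSERVICE is mapped with rr:termType rr:IRI
            triples.append(f"{s} <{SCHEMA}termOfService> <{tos}> .")
        # Referencing object map: join servicelicense rows on SERVICEID,
        # each row becoming a blank node linked through schema:license.
        for name, url in conn.execute(
            "SELECT name, url FROM servicelicense WHERE serviceid = ?", (sid,)
        ).fetchall():
            b = f"_:license{next(bnode_ids)}"
            triples.append(f"{s} <{SCHEMA}license> {b} .")
            if name is not None:
                triples.append(f"{b} <{SCHEMA}name> {lit(name)} .")
            if url is not None:
                triples.append(f"{b} <{SCHEMA}url> {lit(url)} .")
    return triples


# Sample records from Tables 4 and 6.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE service (serviceid TEXT PRIMARY KEY, title TEXT, description TEXT,
                          termofservice TEXT, version TEXT);
    CREATE TABLE servicelicense (name TEXT, url TEXT, serviceid TEXT);
""")
SID = "5f5d97f0-b09f-11eb-8c83-4fa8fcf43f30"
conn.execute("INSERT INTO service VALUES (?, ?, ?, ?, ?)", (
    SID, "Google Classroom API",
    "Manages classes, rosters, and invitations in Google Classroom.",
    "https://developers.google.com/terms/", "v1"))
conn.execute("INSERT INTO servicelicense VALUES (?, ?, ?)", (
    "Creative Commons Attribution 3.0", "http://creativecommons.org/licenses/by/3.0/", SID))
for line in generate(conn):
    print(line)
```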