Research on extracting significant concepts from requirement specifications to advance conceptual modeling dates back to the early 1980s. A pivotal contribution came in 1983, when Peter Chen introduced eleven heuristic rules [9] that laid the groundwork for a large body of subsequent work in conceptual modeling.
Early investigations focused primarily on analyzing natural language requirements, a process that was largely manual and labor-intensive [10]. More recently, research has shifted toward automatically extracting object-oriented models directly from natural language requirements [11]. This transition aims to reduce the heavy reliance on manual work and improve modeling efficiency through automated extraction and analysis techniques.
Among the notable contributions in this domain is LOLITA, an NLP-based system introduced by Mich [12] that extracts objects from nouns and outlines their connections. However, the system cannot distinguish among classes, objects, and attributes. Another influential tool is LIDA, introduced by Overmyer and Rambow [13], which assists designers in producing class diagrams by following Chen's rules: nouns are associated with classes, verbs with relationships, and adjectives with attributes. Yet LIDA, while promising, is not fully automated and requires substantial user input, which constrains its usability.
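The noun/verb/adjective heuristics that LIDA inherits from Chen's rules can be sketched as follows. This is an illustrative toy, not LIDA's actual implementation: the hard-coded mini-lexicon stands in for a real part-of-speech tagger, and the word lists and sentence are invented for the example.

```python
# Hypothetical mini-lexicon mapping words to coarse POS tags;
# a real tool would use a trained POS tagger instead.
LEXICON = {
    "customer": "NOUN", "order": "NOUN", "product": "NOUN",
    "places": "VERB", "contains": "VERB",
    "registered": "ADJ", "pending": "ADJ",
}

def extract_candidates(sentence):
    """Apply Chen-style heuristics to one requirement sentence:
    nouns -> candidate classes, verbs -> candidate relationships,
    adjectives -> candidate attributes."""
    classes, relationships, attributes = [], [], []
    for word in sentence.lower().rstrip(".").split():
        tag = LEXICON.get(word)  # words missing from the lexicon are ignored
        if tag == "NOUN":
            classes.append(word.capitalize())
        elif tag == "VERB":
            relationships.append(word)
        elif tag == "ADJ":
            attributes.append(word)
    return {"classes": classes,
            "relationships": relationships,
            "attributes": attributes}

result = extract_candidates("A registered customer places an order.")
# e.g. classes ["Customer", "Order"], relationship "places", attribute "registered"
```

Even this toy exposes the limitation noted above: a bare POS mapping cannot tell whether a noun should become a class or an attribute, which is precisely where tools like LOLITA and LIDA fall short.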
The CM-Builder tool, introduced by Harmain and Gaizauskas [14], holds a significant position among tools employing NLP techniques. Although it offers both automatic and interactive modes, its linguistic analysis struggles with the inherent ambiguity and redundancy of natural language. Similarly, UMLG by Bajwa et al. [15] and UMGAR by Deeptimahanti and Sanyal [16] contribute valuable automated and semi-automated techniques, respectively, but each faces its own constraints, ranging from limited linguistic analysis to required user involvement.
Subsequently, the DC-Builder tool [17] offered a more automated approach, incorporating heuristic rules to extract classes from requirements; however, its ability to identify operations and relationships is limited. RAPID [18] attempted to overcome these limitations using NLP and domain ontology techniques, but it requires each sentence to adhere to a structure prescribed by the tool: sentences that do not conform must be rewritten by the user.
Sharma et al. [19] proposed a notable method for generating class diagrams from requirements. The technique relies on dependency analysis to convert requirements into class diagrams automatically. The procedure first turns requirement statements into an intermediate, frame-based structured representation, then leverages the information in this representation to produce class diagrams through a rule-based process. This approach marks substantial progress, but certain limitations exist. The method's success depends largely on the quality and clarity of the input natural language requirements; any ambiguity or intricacy could compromise the accurate generation of class diagrams. Additionally, while the technique proved superior to manual creation in the researchers' tests, how it will handle larger, more intricate systems or different subject areas is yet to be determined.
Abdelnabi et al. [20] proposed another method for generating class diagrams from natural language requirements, comprising three distinct phases: NLP, application of mapping rules, and class diagram generation. In the NLP phase, sentences are parsed and converted into a formal representation. During the mapping rules phase, a set of heuristic rules extracts class elements. Finally, the class diagram generation phase uses the extracted elements to produce a class diagram. Despite its utility, the method has notable limitations. It struggles with requirement statements that are ambiguous or incomplete, and its reliance on heuristic rules introduces a further constraint: those rules may not apply to all natural language requirements.
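The three-phase structure described above can be sketched end to end. This is a minimal illustration of the general parse–map–generate idea, not the authors' implementation: the regular-expression "parser", the subject–verb–object sentence pattern, and the example sentence are all simplifying assumptions, and the output uses PlantUML-style notation purely for concreteness.

```python
import re

# Phase 1 (toy stand-in for NLP parsing): reduce a simple
# "<Subject> <verb> <Object>" requirement to a formal representation.
def parse(requirement):
    m = re.match(r"(?:The|A|An)\s+(\w+)\s+(\w+)\s+(?:the|a|an)\s+(\w+)",
                 requirement, re.I)
    if not m:
        return None
    subj, verb, obj = m.groups()
    return {"subject": subj.capitalize(),
            "verb": verb.lower(),
            "object": obj.capitalize()}

# Phase 2: heuristic mapping rules -- subject and object nouns become
# classes; the verb becomes an association between them.
def apply_rules(parsed):
    classes = {parsed["subject"], parsed["object"]}
    association = (parsed["subject"], parsed["verb"], parsed["object"])
    return classes, association

# Phase 3: emit the extracted elements as a PlantUML-style class diagram.
def generate_diagram(requirements):
    classes, associations = set(), []
    for req in requirements:
        parsed = parse(req)
        if parsed is None:
            continue  # non-conforming sentences are skipped in this sketch
        cls, assoc = apply_rules(parsed)
        classes |= cls
        associations.append(assoc)
    lines = ["@startuml"]
    lines += [f"class {c}" for c in sorted(classes)]
    lines += [f"{a} --> {b} : {v}" for a, v, b in associations]
    lines.append("@enduml")
    return "\n".join(lines)

diagram = generate_diagram(["The Librarian registers a Member."])
```

The `continue` branch in phase 3 makes the limitation concrete: any sentence outside the expected pattern is silently dropped, mirroring how rule-based pipelines fail on requirements their heuristics do not cover.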
Most recently, Bashir et al. [21] presented READ, a Python-based system that leverages NLP and a domain ontology to generate class diagrams from informal natural language requirements. By parsing these requirements into a semantic representation, READ constructs a domain ontology that serves as a bridge to translate this representation into class diagrams. However, READ has several limitations. First, the system is only as reliable as the domain ontology it is built on; an incomplete or inaccurate ontology yields incorrect class diagrams. Second, the system supports only a small number of domains. Third, it cannot handle complex requirements, such as those involving multiple objects or relationships between objects.
In summary, automated class diagram generation from natural language requirements has seen sustained research and substantial development. Beginning with the heuristic rules introduced by Peter Chen, the field has evolved into sophisticated methodologies employing advanced NLP techniques, as evidenced by the recent Python-based READ system. Notwithstanding these strides, challenges remain: handling complex requirements, distinguishing among classes, objects, and attributes, and reducing the dependence on the quality of the input natural language requirements.
The advent of advanced artificial intelligence (AI) techniques, particularly GPT-3, offers a promising avenue for addressing these challenges. With robust contextual understanding and strong text generation capabilities, GPT-3 has the potential to improve the extraction of object-oriented modeling components, thereby augmenting the accuracy and quality of class diagrams. Its ability to parse complex language structures could help alleviate current difficulties in discerning different elements in requirements and enable systems to handle more complex requirements.
Future work should focus on exploiting the capabilities of GPT-3 to overcome these obstacles. Such efforts could yield markedly more efficient and precise tools for generating class diagrams from textual requirements, with significant implications for software engineering practice and underscoring the importance of this line of research.