Semantic Classification for Diagnostic Decision Support

Background: Clinical history and exam findings obtained during the initial evaluation of a patient presenting with neuromuscular complaints are often pivotal in the formulation of a diagnostic and therapeutic approach. However, disorders that have diverse overlapping manifestations may pose a diagnostic challenge for clinicians. Although case reports can be valuable sources for guiding clinical decisions, retrieval of applicable information from them is not straightforward. Methods: In this paper, we propose a feature-extraction-based method to improve the efficacy and effectiveness of this process. The accuracy of the method in linking clinical presentations to correct diagnoses is examined using 30 case reports of amyotrophic lateral sclerosis, monomelic amyotrophy, and inclusion body myositis, obtained from the PubMed Central Open Access Subset. Results: The results show that feature extraction not only augments semantic classification with explainability but also improves its performance. Conclusions: If developed further, the approach can be used to provide clinicians with decision support in challenging clinical situations.


Introduction
Medicine encompasses rare disorders with diverse and overlapping clinical manifestations that necessitate the use of a differential diagnostic approach. An example is amyotrophic lateral sclerosis (ALS), a heterogeneous disease characterized by a progressive degeneration of upper and lower motor neurons that can lead to a variety of symptoms [1]. There are a number of disorders with different treatment options and prognostic outcomes that can mimic amyotrophic lateral sclerosis [2,3]. Yet, in the absence of a specific biomarker for the disease, ALS remains a clinical diagnosis [4]. Although diagnostic errors may have disastrous consequences, over 40% of patients with amyotrophic lateral sclerosis are initially diagnosed incorrectly [5].
Case reports can provide clinicians with a rich representation of the spectrum of manifestations a disorder can show throughout its clinical course, and thereby serve as a valuable source of guidance in the face of challenging cases [6]. Clinicians may use case reports during the formulation of a differential diagnosis to identify and differentiate potential causes that can account for the patient's symptoms. However, retrieving pertinent information from the narrative text of case reports by manually inspecting individual reports can be prohibitively time-consuming for busy clinicians.
Advancements in natural language processing may greatly increase the clinical utility of case reports by facilitating information extraction from text. One of the recent breakthroughs in natural language processing was the introduction of transformer-based natural language models in 2017 [7]. Among the many language processing tasks with potential clinical utility that transformer-based models have excelled at is that of semantic classification [8]. The sentence embeddings in the form of fixed-length vectors innately facilitate the text comparison process, and similarities between texts can be trivially obtained by applying distance metrics directly to the embeddings [9].
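As an illustration of applying a distance metric directly to fixed-length embeddings, the following is a minimal sketch of cosine similarity in plain Python. The vectors are hypothetical 4-dimensional stand-ins chosen for readability and are not produced by any language model; the function and variable names are likewise illustrative.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)

# Hypothetical toy vectors standing in for sentence embeddings.
emb_a = [0.2, 0.1, 0.7, 0.0]
emb_b = [0.4, 0.2, 1.4, 0.0]  # same direction as emb_a, twice the length
emb_c = [0.0, 0.9, 0.0, 0.3]  # mostly different direction

sim_ab = cosine_similarity(emb_a, emb_b)
sim_ac = cosine_similarity(emb_a, emb_c)
print(f"sim(a, b) = {sim_ab:.3f}, sim(a, c) = {sim_ac:.3f}")
```

Because cosine similarity depends only on direction, the two vectors that point the same way score highest regardless of their lengths.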
However, mere vector similarity between the text embeddings has limited applicability in the clinical context. Clinical reasoning involves attending to key differentiating elements to draw possible explanations for a given clinical presentation in accordance with contemporary clinical criteria [10]. Text similarity methods that do not distinguish features of importance from the rest and do not offer any explanations are difficult to use in a clinical context.
Although language models can be fine-tuned for individual target tasks to overcome such limitations, it is not possible to develop models for every subfield of medicine. Medicine is a rapidly evolving field comprising distinct subfields, each with vastly different terminologies and focuses. At present the performance of transformer models can be measured only empirically, and studies show that fine-tuning of general language models such as BERT does not necessarily improve their performance on target tasks [11].
In this paper, we demonstrate that feature extraction using the vector difference between Universal Sentence Encoder (USE) embeddings can not only augment semantic classification with interpretability but also improve the classification accuracy.

Data
The data for the present study come from the PubMed Central Open Access Subset, which is a collection of articles that are available under Creative Commons or similar licenses [12]. The first step in obtaining the data was to search the subset for relevant case reports. Keywords and filters described in Table 1 were used to collect scientific articles containing clinical descriptions of amyotrophic lateral sclerosis (ALS), monomelic amyotrophy (MMA), and inclusion body myositis (IBM). The articles were manually inspected to identify case reports that contain adequate details about the clinical course. Once the case reports had been obtained, the clinical history and exam findings were extracted from the cases. A total of 30 cases were used for the analysis described in the following section.

Analysis
For analysis, USE [13] trained with a transformer model was used to obtain embeddings of the text data [14]. To examine whether informative features can be extracted from the embeddings using vector subtraction, cosine similarities were measured between the following two quantities:
1) Difference between a pair of sentence embeddings that contain a discriminatory feature of interest.
2) Average of embeddings for each clinical disorder.
The results are displayed in Table 2.
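The comparison described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the script used in the study: all vectors are hypothetical 3-dimensional toy values rather than USE outputs, the disorder labels are placeholders, and the helper names are introduced here for illustration only.

```python
import math

def subtract(u, v):
    """Element-wise difference u - v."""
    return [a - b for a, b in zip(u, v)]

def mean_vector(vectors):
    """Element-wise average of a list of equal-length vectors."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]

def cosine_similarity(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

# Hypothetical embeddings of two sentences that differ only in the
# feature of interest (toy values, not real model output).
emb_with_feature = [1.0, 0.5, 0.2]
emb_without_feature = [0.0, 0.5, 0.2]
feature_direction = subtract(emb_with_feature, emb_without_feature)

# Hypothetical case embeddings grouped by placeholder disorder label.
cases = {
    "disorder_A": [[0.9, 0.1, 0.3], [0.7, 0.2, 0.4]],
    "disorder_B": [[0.1, 0.2, 0.3], [-0.1, 0.1, 0.5]],
}

# Cosine similarity between the feature direction and each class average.
scores = {
    label: cosine_similarity(feature_direction, mean_vector(embeddings))
    for label, embeddings in cases.items()
}
for label, score in scores.items():
    print(f"{label}: {score:.3f}")
```

In this toy setup the class whose cases lie along the feature direction receives the higher score, which is the kind of contrast the similarity values are meant to expose.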

Evaluation
To examine whether feature extraction can be used to improve the accuracy of semantic classification, the USE embeddings of clinical cases were projected to a 4-dimensional feature vector subspace using the sentence pairs shown in Table 2. The performance of a nearest mean model using the feature vectors for classifying case reports by diagnosis was compared with that of the same model using the raw USE embeddings under k-fold cross-validation (k = 5), and the results are shown in Table 3. The results in Table 2 are informative. For instance, although the description of weakness was highly variable and irregular across the case reports, USE was able to capture the difference in the distribution of weakness between the disorders within the embeddings. The results indicate that differences between sets of documents can be extracted using vector subtraction.
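The evaluation procedure in this section can be sketched as follows. The sketch rests on several assumptions that the text does not specify: projection is taken as the dot product with unit-length feature directions, the nearest mean model uses Euclidean distance, and folds are formed by interleaving case indices. The data are hypothetical 3-dimensional toy vectors with two feature directions and three folds, chosen only to keep the example small; the study itself uses four directions and k = 5.

```python
import math

def unit(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(a * a for a in v))
    return [a / norm for a in v]

def project(embedding, directions):
    """Coordinates of an embedding along each unit feature direction."""
    return [sum(a * b for a, b in zip(embedding, d)) for d in directions]

def mean_vector(vectors):
    return [sum(column) / len(vectors) for column in zip(*vectors)]

def nearest_mean_predict(x, class_means):
    """Label of the class mean closest to x (Euclidean distance, an assumption)."""
    return min(class_means, key=lambda label: math.dist(x, class_means[label]))

def cross_validate(features, labels, k):
    """Accuracy of a nearest mean classifier under k-fold cross-validation."""
    n = len(features)
    correct = 0
    for fold in range(k):
        test_idx = set(range(fold, n, k))  # interleaved folds (an assumption)
        by_class = {}
        for i in range(n):
            if i not in test_idx:
                by_class.setdefault(labels[i], []).append(features[i])
        class_means = {lab: mean_vector(vs) for lab, vs in by_class.items()}
        correct += sum(
            nearest_mean_predict(features[i], class_means) == labels[i]
            for i in test_idx
        )
    return correct / n

# Hypothetical toy embeddings and labels (not real case-report data).
embeddings = [
    [1.0, 0.1, 0.5], [0.9, 0.2, 0.4], [1.1, 0.0, 0.6],
    [0.1, 1.0, 0.5], [0.2, 0.9, 0.4], [0.0, 1.1, 0.6],
]
labels = ["ALS", "ALS", "ALS", "IBM", "IBM", "IBM"]

# Hypothetical feature directions, e.g. differences of sentence-pair embeddings.
directions = [unit([2.0, 0.0, 0.0]), unit([0.0, 3.0, 0.0])]
features = [project(e, directions) for e in embeddings]

acc_features = cross_validate(features, labels, k=3)
acc_raw = cross_validate(embeddings, labels, k=3)
print(f"feature vectors: {acc_features:.2f}, raw embeddings: {acc_raw:.2f}")
```

The same `cross_validate` function is applied to both representations so that any difference in accuracy is attributable to the projection alone.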

Results
The results in Table 3 show that feature extraction can be used on the embeddings to improve the accuracy of semantic classification. The improvement may be attributed to the removal of redundant dimensions in the original embeddings that encode nondiscriminatory features.

Discussion
In this paper, we demonstrated that features of interest can be extracted from USE embeddings and used to improve the performance of semantic classification. The idea behind the proposed method is to extend and take advantage of the generalizability of transformer-based models that are pretrained on massive text corpora. The proposed method is simple and flexible. It can easily be applied to a range of clinical tasks other than the formulation of a differential diagnosis. As the feature vectors encode each clinical element with simple numerical values that are easily interpretable, the method may also be applied to clinical text summarization tasks. In the future, the method could potentially be applied to electronic health records by healthcare organizations to identify and track hidden trends and biases that are difficult to analyze using conventional data analysis methods.

Conclusion
Features of interest can be extracted from sentence embeddings with simple vector arithmetic operations to provide clinicians with decision support in challenging clinical situations.

Declarations
Ethics approval and consent to participate
Not applicable.

Consent for publication
Not applicable.

Availability of data and material
The Python script and the data used for this study will be made available by the author (https://github.com/pushkin05/bert).

Competing interests
The author has no competing interests as defined by BMC, or other interests that might be perceived to influence the results and/or discussion reported in this paper.

Funding
The author received no specific funding for this work.

Authors' contributions
J.J. conceived of the presented idea, designed the study, performed the experiments, analyzed the data, and wrote the manuscript.