Background: When patient distances are calculated based on phenotype, signs and symptoms are often converted to concepts from an ontological hierarchy. There is controversy as to whether patient distance metrics that consider the semantic similarity between concepts can outperform standard patient distance metrics that are agnostic to concept similarity. The choice of distance metric often dominates the performance of classification or clustering algorithms. Our objective was to determine if semantically augmented distance metrics would outperform standard metrics on machine learning tasks.
Methods: We converted the neurological signs and symptoms from 382 published neurology cases into sets of concepts with corresponding machine-readable codes. We calculated inter-patient distances by four different metrics (cosine distance, a semantically augmented cosine distance, Jaccard distance, and a semantically augmented bipartite distance). Semantic augmentation for two of the metrics depended on concept similarities from a hierarchical neuro-ontology. For machine learning algorithms, we used the patient diagnosis as the ground truth label and patient signs and symptoms as the machine learning features. We assessed classification accuracy for four classifiers and cluster quality for two clustering algorithms for each of the distance metrics.
Results: Inter-patient distances were smaller when the distance metric was semantically augmented. Classification accuracy and cluster quality were not significantly different by distance metric.
Conclusion: Using patient diagnoses as labels and patient signs and symptoms as features, we did not find improved classification accuracy or improved cluster quality with semantically augmented distance metrics. Semantic augmentation reduced inter-patient distances but did not improve machine learning performance.
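To make the difference between standard and semantically augmented metrics concrete, the sketch below computes Jaccard and cosine distances on two patients' concept sets, then a toy augmented distance that gives partial credit to related concepts. It is an illustration under stated assumptions, not the authors' implementation: the `PARENT` hierarchy, the concept names, the ancestor-overlap similarity, and the best-match averaging in `augmented_distance` are all hypothetical stand-ins for the neuro-ontology and the augmented metrics named in the Methods.

```python
from math import sqrt

def jaccard_distance(a, b):
    """Standard Jaccard distance: 1 - |A intersect B| / |A union B|."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)

def cosine_distance(a, b):
    """Cosine distance between the binary indicator vectors of two concept sets."""
    a, b = set(a), set(b)
    if not a or not b:
        return 1.0
    return 1.0 - len(a & b) / sqrt(len(a) * len(b))

# Hypothetical toy hierarchy (child -> parent); not the paper's neuro-ontology.
PARENT = {
    "tremor": "movement_disorder",
    "chorea": "movement_disorder",
    "movement_disorder": "neuro_finding",
    "aphasia": "language_disorder",
    "language_disorder": "neuro_finding",
}

def ancestors(concept):
    """Return the concept together with all of its ancestors up to the root."""
    out = {concept}
    while concept in PARENT:
        concept = PARENT[concept]
        out.add(concept)
    return out

def concept_similarity(c1, c2):
    """Assumed similarity: Jaccard overlap of the two ancestor sets (1.0 if identical)."""
    a1, a2 = ancestors(c1), ancestors(c2)
    return len(a1 & a2) / len(a1 | a2)

def augmented_distance(a, b):
    """Assumed augmented metric: 1 - symmetric average of best-match similarities."""
    a, b = set(a), set(b)
    if not a or not b:
        return 1.0
    a_to_b = sum(max(concept_similarity(x, y) for y in b) for x in a) / len(a)
    b_to_a = sum(max(concept_similarity(y, x) for x in a) for y in b) / len(b)
    return 1.0 - (a_to_b + b_to_a) / 2

patient_1 = {"tremor", "aphasia"}
patient_2 = {"chorea", "aphasia"}

print(round(jaccard_distance(patient_1, patient_2), 4))    # 0.6667
print(round(cosine_distance(patient_1, patient_2), 4))     # 0.5
print(round(augmented_distance(patient_1, patient_2), 4))  # 0.25
```

In this toy case the augmented distance is smaller because tremor and chorea share a parent concept, which the standard metrics treat as a complete mismatch. That mirrors the direction of the reported result on inter-patient distances, but says nothing about downstream classification or clustering performance.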
Figure 1
Figure 2
Figure 3
Figure 4
Figure 5
Figure 6
Figure 7
Figure 8
Figure 9
Figure 10
Figure 11
Figure 12
Figure 13
The full text of this article is available to read as a PDF.
On 12 Aug, 2020
On 10 Aug, 2020
On 09 Aug, 2020
On 09 Aug, 2020
Received 03 Aug, 2020
On 23 Jul, 2020
Received 21 Jul, 2020
On 21 Jul, 2020
On 21 Jul, 2020
On 21 Jul, 2020
Received 21 Jul, 2020
On 20 Jul, 2020
Invitations sent on 20 Jul, 2020
On 19 Jul, 2020
On 17 Jul, 2020
Posted 21 Jul, 2020
On 06 Aug, 2020
On 05 Aug, 2020
On 05 Aug, 2020
Posted 19 Jun, 2020
On 06 Jul, 2020
Received 03 Jul, 2020
Received 03 Jul, 2020
Received 03 Jul, 2020
Received 02 Jul, 2020
Received 30 Jun, 2020
On 22 Jun, 2020
On 22 Jun, 2020
On 22 Jun, 2020
On 21 Jun, 2020
On 21 Jun, 2020
On 21 Jun, 2020
On 20 Jun, 2020
Invitations sent on 19 Jun, 2020
On 18 Jun, 2020
On 17 Jun, 2020
On 17 Jun, 2020
On 17 May, 2020
Received 05 May, 2020
Received 05 May, 2020
Received 02 May, 2020
Received 01 May, 2020
On 14 Apr, 2020
On 13 Apr, 2020
On 12 Apr, 2020
On 11 Apr, 2020
Invitations sent on 10 Apr, 2020
On 30 Mar, 2020
On 29 Mar, 2020
On 29 Mar, 2020
On 27 Mar, 2020