Despite advances in cancer treatment, the five-year mortality rate for oral cancers (OC) is 40%, mainly due to the lack of early diagnostics. To advance early diagnostics for high-risk and average-risk populations, we developed and evaluated machine-learning (ML) classifiers using metatranscriptomic data from saliva samples (n=433) collected from patients with oral premalignant disorders (OPMD), OC patients (n=71) and normal controls (n=171). Our diagnostic classifiers yielded a receiver operating characteristic (ROC) area under the curve (AUC) of up to 0.9, sensitivity of up to 83% (92.3% for stage 1 cancer) and specificity of up to 97.9%. Our metatranscriptomic signature incorporates both taxonomic and functional microbiome features and reveals a number of previously known and novel taxa and functional pathways associated with OC. For the first time, we demonstrate the potential clinical utility of an AI/ML model for diagnosing OC early, opening a new era of non-invasive diagnostics that enables early intervention and improved patient outcomes.
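For readers less familiar with the reported metrics, the following minimal sketch shows how sensitivity, specificity and ROC AUC can be computed from true labels and classifier scores. It is not the classifier or evaluation code used in this study: the function names, the 0.5 decision threshold and the toy labels and scores are all assumptions chosen purely for illustration, and the numbers are synthetic rather than study data.

```python
from typing import Sequence, Tuple


def sensitivity_specificity(
    y_true: Sequence[int], y_pred: Sequence[int]
) -> Tuple[float, float]:
    """Return (sensitivity, specificity) for binary labels (1 = case, 0 = control)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC as the probability that a random case outscores a random control.

    Ties between a case and a control count as half a win.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Synthetic toy example (hypothetical values, not data from this study).
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.6, 0.3, 0.7, 0.4, 0.2, 0.1]

threshold = 0.5  # assumed decision threshold, for illustration only
y_pred = [1 if s >= threshold else 0 for s in scores]

sens, spec = sensitivity_specificity(y_true, y_pred)
auc = roc_auc(y_true, scores)
print(f"sensitivity={sens:.2f} specificity={spec:.2f} auc={auc:.4f}")
```

Sensitivity and specificity depend on the chosen threshold, whereas ROC AUC summarizes ranking performance across all thresholds, which is why the three values are reported separately.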

Figure 1

Figure 2

Figure 3
Yes, there is a potential competing interest. Several of the authors are employees of Viome Inc, a commercial for-profit company. For the other authors, there is no conflict of interest to the best of our knowledge.
Supplementary files associated with this preprint:
Details of the discovery cohort.
Saliva sample processing procedure for the NGS workflow.
Distribution of Spearman correlations for inter- and intra-donor saliva sample pairs processed concurrently and independently: (a) taxonomy data, (b) functional data.
Distribution of the number of reads.
ROC AUC of the classifier for Cohort C using (a) only taxa as features and (b) only KOs as features.
Posted 19 Aug, 2020