Background
Despite an epidemic increase in the prevalence of human papillomavirus (HPV)-related oropharyngeal squamous cell carcinomas (OPSCCs) in North America and parts of Europe, there is virtually no information about the natural history of these cancers. The lack of well-defined precursor lesions and the limited data on oral HPV persistence and clearance rates pose a challenge for disease modelling. We propose a novel mathematical modelling approach to estimate the conditional probability of developing HPV-related OPSCC given a prevalent HPV infection and other covariates.
Methods
We developed a double-Bayes method in which a Bayesian machine learning model first estimates the probability that an individual has an oral HPV infection, given OPSCC status and other covariate information. This model is then inverted using Bayes' theorem to obtain the reverse conditional probability, that of OPSCC given an oral HPV infection. The mathematical model was derived from two datasets representing the adult population of the United States (US): the Surveillance, Epidemiology, and End Results (SEER) Program Head and Neck with HPV Status Database and the National Health and Nutrition Examination Survey (NHANES) 2011–2014.
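For a covariate stratum X (for example, an age band combined with sex and race/ethnicity), the inversion step reads P(OPSCC | HPV+, X) = P(HPV+ | OPSCC, X) × P(OPSCC | X) / P(HPV+ | X). The Python sketch below is a minimal illustration of that step only, assuming the three right-hand-side terms have already been estimated for a stratum. It does not reproduce the Bayesian machine learning model itself, and the names `Stratum`, `invert_bayes` and `per_10k`, as well as every number used with them, are hypothetical placeholders rather than values from this study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stratum:
    """Inputs for one covariate stratum X (hypothetical field names)."""
    p_hpv_given_opscc: float  # P(oral HPV+ | OPSCC, X), e.g. first-stage model output
    p_opscc: float            # P(OPSCC | X), assumed baseline risk in the stratum
    p_hpv: float              # P(oral HPV+ | X), assumed population prevalence


def invert_bayes(s: Stratum) -> float:
    """Return P(OPSCC | oral HPV+, X) via Bayes' theorem.

    P(OPSCC | HPV+, X) = P(HPV+ | OPSCC, X) * P(OPSCC | X) / P(HPV+ | X)
    """
    # Every input must be a valid probability.
    for name, p in vars(s).items():
        if not 0.0 <= p <= 1.0:
            raise ValueError(f"{name} must lie in [0, 1], got {p}")
    # Conditioning on HPV+ is undefined if HPV+ has zero probability.
    if s.p_hpv == 0.0:
        raise ValueError("p_hpv must be positive to condition on HPV+")
    posterior = s.p_hpv_given_opscc * s.p_opscc / s.p_hpv
    # Independently estimated inputs can be mutually inconsistent.
    if posterior > 1.0:
        raise ValueError("inconsistent inputs: posterior exceeds 1")
    return posterior


def per_10k(p: float) -> float:
    """Express a probability as cases per 10,000."""
    return p * 10_000
```

For example, `invert_bayes(Stratum(p_hpv_given_opscc=0.6, p_opscc=0.0002, p_hpv=0.08))` gives approximately 0.0015, or about 15 cases per 10,000 after `per_10k`. Taking P(HPV+ | X) directly as a population prevalence is one design choice; it could instead be expanded with the law of total probability, which would require P(HPV+ | no OPSCC, X) as an additional input.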
Results
The model dataset contains 8,623 subjects, of whom 70.7% had a prevalent oral HPV infection. When stratified by age, sex, marital status and race/ethnicity, the model estimated a higher conditional probability of developing OPSCC following an oral HPV infection in non-Hispanic White males and females than in other racial/ethnic groups. Among individuals aged 50–60, non-Hispanic White males with an oral HPV infection had roughly twice the OPSCC risk of non-Hispanic White females (10.6 vs 5.05 cases per 10,000).
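As a quick check on the size of the sex difference among non-Hispanic White adults aged 50–60, the ratio and the absolute difference of the two reported rates, using only the figures stated in the results, are:

```latex
\frac{10.6 / 10\,000}{5.05 / 10\,000} = \frac{10.6}{5.05} \approx 2.10,
\qquad
10.6 - 5.05 = 5.55 \ \text{cases per } 10\,000
```

The male rate is therefore about twice the female rate in this age band.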
Conclusion
We have employed a novel statistical approach to estimate the conditional probability of developing OPSCC following an oral HPV infection, given the covariates age, sex, race/ethnicity and marital status, in the US population. We recognise that this is, at best, a first-guess estimate of a natural history model of HPV-driven OPSCC, within the existing limitations of the model.