The 2019 novel coronavirus (SARS-CoV-2) is the etiological agent of the COVID-19 pandemic and evolves to evade both host immune systems and intervention strategies. To help diminish the short-term and long-term impacts of coronavirus (CoV), we investigated CoV differences at the nucleotide and protein levels, as well as CoV genomic variation associated with epidemiological variation and geography. We divided the CoV genome into 29 constituent regions for this analysis. Our results highlight the variation among variants of CoV lineages and show that nonstructural protein 3 (nsp3) and Spike protein (S) have the highest variation and the greatest correlation with viral whole-genome variation, which makes these two proteins potential targets for treatments. S protein variation is highly correlated with variation in nsp3, nsp6, and the 3'-to-5' exonuclease. Country of origin and time since the start of the pandemic were the metadata most strongly associated with these differences, whereas host sex and age explained the least variation in the virus genome. We also quantified the variation explained by regions of the CoV genome across different coronaviruses, including SARS-CoV-2, Middle East respiratory syndrome coronavirus (MERS-CoV), other severe acute respiratory syndrome-related coronaviruses (SARS-CoV), and bat-derived severe acute respiratory syndrome (SARS)-like coronaviruses (Bat-SL-CoV). We found that Spike protein and nsp3 explain most of the variation among these viruses; they are also among the genomic regions with the highest number of sites under natural selection. Our results provide a direction for prioritizing genes associated with outcome predictors, including health, therapeutic, and vaccine outcomes, and for informing improved DNA tests for predicting disease status.