Background. Influenza A viruses (IAV) exhibit vast genetic mutability and have great zoonotic potential to infect avian and mammalian hosts and are known to be responsible for a number of pandemics. A key computational issue in influenza prevention and control is the identification of molecular signatures with cross-species transmission potential. We propose an adjusted entropy-based host-specific signature identification method that uses a similarity coefficient to incorporate the amino acid substitution information and improve the identification performance.
Results. Validation with a set of H5N1 PB2 sequences from 1996 to 2006 results in adjusted entropy having a 40% false negative discovery rate compared to an 80% false negative rate using unadjusted entropy. Simulations across a range of phenotypic divergence levels show a false negative rate of no higher than 10% while unadjusted entropy ranged from 9% to 100%. In addition, under all levels of divergence adjusted entropy never had a false positive rate higher than 9%. Adjusted entropy also identifies important mutations in H1N1pdm PB2 previously identified in the literature that explain changes in divergence between 2008 and 2009 which unadjusted entropy could not identify.
Conclusions. Based on these results, adjusted entropy provides a reliable and widely applicable host signature identification approach useful for IAV monitoring and vaccine development.