1. Benjamin EJ, Virani SS, Callaway CW, et al. Heart Disease and Stroke Statistics—2018 Update: A Report From the American Heart Association. Circulation. 2018;137:e67–e492.
2. Arch AE, Weisman DC, Coca S, et al. Missed Ischemic Stroke Diagnosis in the Emergency Department by Emergency Medicine and Neurology Services. Stroke. 2016;47:668–673.
3. Tirschwell DL, Longstreth Jr WT. Validating Administrative Data in Stroke Research. Stroke. 2002;33:2465–2470.
4. Benesch C, Witter D, Wilder A, et al. Inaccuracy of the International Classification of Diseases (ICD-9-CM) in identifying the diagnosis of ischemic cerebrovascular disease. Neurology. 1997;49:660–664.
5. Weiskopf NG, Hripcsak G, Swaminathan S, et al. Defining and measuring completeness of electronic health records for secondary use. Journal of Biomedical Informatics. 2013;46:830–836.
6. Mo H, Thompson WK, Rasmussen LV, et al. Desiderata for computable representations of electronic health records-driven phenotype algorithms. Journal of the American Medical Informatics Association. 2015;22:1220–1230.
7. Shivade C, Raghavan P, Fosler-Lussier E, et al. A review of approaches to identifying patient phenotype cohorts using electronic health records. J Am Med Inform Assn. 2014;21:221–230.
8. Hripcsak G, Albers DJ. Next-generation phenotyping of electronic health records. J Am Med Inform Assn. 2013;20:117–121.
9. Carroll RJ, Eyler AE, Denny JC. Naïve Electronic Health Record phenotype identification for Rheumatoid arthritis. Amia Annu Symposium Proc Amia Symposium Amia Symposium. 2011;2011:189–96.
10. Peissig, P, Costa, V, Caldwell, M, Rottscheit, C, Berg, R, Mendonca, E, and Page, D, “Relational machine learning for electronic health record-driven phenotyping,” Journal of Biomedical Informatics, vol. 52, 260–270, 2014.
11. Chen, Y, Carroll, R, Hinz, E, Shah, A, Eyler, A, Denny, J, and Xu, H, “Applying active learning to high-throughput phenotyping algorithms for electronic health records data,” Journal of the American Medical Informatics Association, vol. 20, no. e2, e253–e259, 2013.
12. Yu S, Chakrabortty A, Liao KP, et al. Surrogate-assisted feature extraction for high throughput phenotyping. J Am Medical Informatics Assoc Jamia. 2016;ocw135.
13. Ning W, Chan S, Beam A, et al. Feature Extraction for Phenotyping from Semantic and Knowledge Resources. Journal of biomedical informatics. 2019;103122.
14. Yu S, Ma Y, Gronsbell J, et al. Enabling phenotypic big data with PheNorm. Journal of the American Medical Informatics Association : JAMIA. 2017;
15. Agarwal V, Podchiyska T, Banda JM, et al. Learning statistical models of phenotypes using noisy labeled training data. Journal of the American Medical Informatics Association : JAMIA. 2016;23:1166–1173.
16. Halpern Y, Horng S, Choi Y, et al. Electronic medical record phenotyping using the anchor and learn framework. Journal of the American Medical Informatics Association : JAMIA. 2016;23:731–40.
17. Murray, SG, Avati, A, Schmajuk, G, and Yazdany, J. “Automated and flexible identification of complex disease: Building a model for systemic lupus erythematosus using noisy labeling.,” Journal of the American Medical Informatics Association: JAMIA, vol. 26, no. 1, 61–65, 2019.
18. B. K. Beaulieu-Jones, C. S. Greene, and Pooled Resource Open-Access ALS Clinical Trials Consortium, “Semi-supervised learning of the electronic health record for phenotype stratification,” Journal of Biomedical Informatics, vol. 64, pp. 168–178, 2016.
19. C. Walsh and G. Hripcsak, “The effects of data sources, cohort selection, and outcome definition on a predictive model of risk of thirty-day hospital readmissions,” Journal of Biomedical Informatics, vol. 52, pp. 418–426, Dec. 2014.
20. A. Perotte, R. Pivovarov, K. Natarajan, N. Weiskopf, F. Wood, and N. Elhadad, “Diagnosis code assignment: Models and evaluation metrics,” Journal of the American Medical Informatics Association, vol. 21, no. 2, 231–237, 2014.
21. Y. Zhang, “A hierarchical approach to encoding medical concepts for clinical notes,” Association for Computational Linguistics, 67–72, 2008.
22. C. G. Walsh, K. Sharman, and G. Hripcsak, “Beyond discrimination: A comparison of calibration methods and clinical usefulness of predictive models of readmission risk,” Journal of Biomedical Informatics, vol. 76, pp. 9–18, Dec. 2017.
23. Ni Y, Alwell K, Moomaw CJ, et al. Towards phenotyping stroke: Leveraging data from a large-scale epidemiological study to detect stroke diagnosis. Plos One. 2018;13:e0192586.
24. Imran TF, Posner D, Honerlaw J, et al. A phenotyping algorithm to identify acute ischemic stroke accurately from a national biobank: the Million Veteran Program. Clin Epidemiology. 2018;10:1509–1521.
25. V. Abedi, N. Goyal, G. Tsivgoulis, N. Hosseinichimeh, R. Hontecillas, J. Bassaganya-Riera, L. Elijovich, J. E. Metter, A. W. Alexandrov, D. S. Liebeskind, and et al., “Novel screening tool for stroke using artificial neural network.,” Stroke, vol. 48, no. 6, 1678–1681, 2017.
26. Z. Chen, R. Zhang, F. Xu, X. Gong, F. Shi, M. Zhang, and M. Lou, “Novel prehospital prediction model of large vessel occlusion using artificial neural network.,” Frontiers in aging neuroscience, vol. 10, p. 181, 2018.
27. W. Hersh, M.Weiner, P. Embi, J. Logan, P. Payne, E. Bernstam, H. Lehmann, G. Hripcsak, Hartzog, J. Cimino, and J. Saltz, “Caveats for the Use of Operational Electronic Health Record Data in Comparative Effectiveness Research,” Medical Care, vol. 51, Aug. 2013.
28. J. M. Overhage and L. M. Overhage, “Sensible use of observational clinical data,” Statistical Methods in Medical Research, vol. 22, no. 1, pp. 7–13, Feb. 2013.
29. R. M. Kaplan, D. A. Chambers, and R. E. Glasgow, “Big Data and Large Sample Size: A Cautionary Note on the Potential for Bias,” Clinical and Translational Science, vol. 74, pp. 342–346, 2014.
30. S. Schneeweiss and J. Avorn, “A review of uses of health care utilization databases for epidemiologic research on therapeutics,” Journal of Clinical Epidemiology, vol. 58, no. 4,323–337, Apr. 2005.
31. N. G. Weiskopf, G. Hripcsak, S. Swaminathan, and C. Weng, “Defining and measuring completeness of electronic health records for secondary use,” Journal of biomedical informatics, vol. 46, no. 5, 830–836, 2013.
32. Weiskopf NG, Hripcsak G, Swaminathan S, et al. Defining and measuring completeness of electronic health records for secondary use. J Biomed Inform. 2013;46:830–836.
33. Sinnott JA, Cai F, Yu S, et al. PheProb: probabilistic phenotyping using diagnosis codes to improve power for genetic association studies. Journal of the American Medical Informatics Association : JAMIA. 2018;
34. Sinnott JA, Dai W, Liao KP, et al. Improving the power of genetic association tests with imperfect phenotype derived from electronic medical records. Human genetics. 2014;133:1369–82.
35. Bastarache L, Hughey JJ, Hebbring S, et al. Phenotype risk scores identify patients with unrecognized Mendelian disease patterns. Science. 2018;359:1233–1239.
36. Son JH, Xie G, Yuan C, et al. Deep Phenotyping on Electronic Health Records Facilitates Genetic Diagnosis by Clinical Exomes. American journal of human genetics. 2018;103:58–73.
37. Hripcsak G, Albers DJ. High-fidelity phenotyping: richness and freedom from bias. Journal of the American Medical Informatics Association : JAMIA. 2017
38. Reich, C., and Ryan, P.B., and Belenkaya, R., Natarajan,K., and Blacketer, C. OMOP Common Data Model v6.0 Specifications. https://github.com/OHDSI/CommonDataModel/wiki Accessed September 2019
39. 2018 ICD-10 CM and GEMs. U.S. Centers for Medicare & Medicaid Services. https://www.cms.gov/medicare/coding/icd10/2018-icd-10-cm-and-gems.html. Accessed February 2018.
40. HCUP CCS-Services and Procedures. Healthcare Cost and Utilization Project (HCUP). March 2017. Agency for Healthcare Research and Quality. https://www.hcup-us.ahrq.gov/toolssoftware/ccs/ccs.jsp. Accessed March 2019.
41. Boehme AK, Esenwa C, Elkind M. Stroke Risk Factors, Genetics, and Prevention. Circ Res. 2017;120:472–495.
42. Benjamin EJ, Blaha MJ, Chiuve SE, et al. Heart Disease and Stroke Statistics-2017 Update: A Report From the American Heart Association. Circulation. 2017;135(10):e146. Epub 2017 Jan 25
43. Hripcsak G, Levine ME, Shang N, et al. OUP accepted manuscript. J Am Med Inform Assn. 2018;
44. Polubriaginof F, Vanguri R, Quinnies K, et al. Disease Heritability Inferred from Familial Relationships Reported in Medical Records. Cell. 2018;173:1692-1704.e11.
45. Sudlow C, Gallacher J, Allen N, et al. UK Biobank: An Open Access Resource for Identifying the Causes of a Wide Range of Complex Diseases of Middle and Old Age. Plos Med. 2015;12:e1001779.
46. Woodfield, R., Group, U. B. S. O., Group, U. B. F. and O. W. & Sudlow, C. L. M. Accuracy of Patient Self-Report of Stroke: A Systematic Review from the UK Biobank Stroke Outcomes Group. PLOS ONE 10, e0137538 (2015).