Prediction models are commonly used to estimate the risk of cardiovascular disease; however, their performance may vary substantially across relevant subgroups of the population. Here we investigated the variability of performance and fairness across a variety of subgroups for risk prediction of two common diseases, atherosclerotic cardiovascular disease (ASCVD) and atrial fibrillation (AF). We calculated the Cohorts for Heart and Aging Research in Genomic Epidemiology Atrial Fibrillation (CHARGE-AF) score for AF and the Pooled Cohort Equations (PCE) score for ASCVD in three large data sets: Explorys Life Sciences Dataset (Explorys, n = 21,809,334), Mass General Brigham (MGB, n = 520,868), and the UK Biobank (UKBB, n = 502,521). Our results demonstrate important performance heterogeneity of established cardiovascular risk scores across subpopulations defined by age, sex, and presence of preexisting disease. For example, for CHARGE-AF, discrimination declined with increasing age, with the concordance index falling from 0.72 [95% CI, 0.72–0.73] in the youngest (45–54y) subgroup to 0.57 [95% CI, 0.56–0.58] in the oldest (85–90y) subgroup in Explorys. The statistical parity difference (i.e., the difference in the likelihood of being classified as high risk) between males and females was considerable within the 65–74y subgroup, at -0.33 [95% CI, -0.33 to -0.33]. We also observed that large segments of the population suffered from both decreased discrimination (i.e., concordance index <0.7) and poor calibration (i.e., calibration slope outside of 0.7–1.3); for example, all individuals aged 75 or older in Explorys (17.4%). Our findings highlight the need to characterize and quantify how clinical risk models behave and perform within specific subpopulations so that they can be used appropriately to facilitate more accurate and equitable assessment of disease risk.
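
As an illustration only, the two subgroup criteria named above can be sketched in a few lines of Python. This is not the study's implementation: the function names, the toy inputs, the sign convention (first group minus second group), and the treatment of the 0.7 and 1.3 boundaries as inclusive are all assumptions made for the example.

```python
def statistical_parity_difference(high_risk_a, high_risk_b):
    """Difference in the share classified as high risk: group A minus group B.

    Each argument is a list of booleans (True = classified as high risk).
    Which group is subtracted from which is an assumption of this sketch.
    """
    rate_a = sum(high_risk_a) / len(high_risk_a)
    rate_b = sum(high_risk_b) / len(high_risk_b)
    return rate_a - rate_b


def has_poor_discrimination_and_calibration(c_index, calibration_slope):
    """Apply both thresholds quoted in the text.

    Decreased discrimination: concordance index below 0.7.
    Poor calibration: calibration slope outside 0.7-1.3 (bounds assumed inclusive).
    """
    poor_discrimination = c_index < 0.7
    poor_calibration = not (0.7 <= calibration_slope <= 1.3)
    return poor_discrimination and poor_calibration


# Hypothetical toy data, not drawn from any of the three cohorts.
group_a = [True, False, False, False]   # 25% classified as high risk
group_b = [True, True, True, False]     # 75% classified as high risk
print(statistical_parity_difference(group_a, group_b))

# Hypothetical subgroup metrics; the slope value is invented for illustration.
print(has_poor_discrimination_and_calibration(0.57, 0.5))
```

A value of zero from the first function would mean both groups are classified as high risk at the same rate; the second function flags a subgroup only when both criteria fail together, matching the "both ... and" wording above.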