UK Biobank has ethics approval from the North West Multi-centre Research Ethics Committee [25]. According to this approval, researchers do not require separate ethics applications. All participants provided written informed consent. This study was carried out in accordance with relevant guidelines and regulations.
Study population
We analysed data from the UK Biobank study, which included more than 500,000 middle-aged adults (38-73 years) recruited from 22 sites across England, Wales, and Scotland. We used baseline data collected between 2006 and 2010 [26, 27]. Socio-demographic, lifestyle (smoking, alcohol consumption, dietary intake, physical activity, sitting time and sleep duration) and medical history data were collected using a touchscreen questionnaire [27].
Disease categories
The UK Biobank collected self-reported medical information, including physician-diagnosed CVD. To define participants' CVD status, we used data on vascular/heart problems diagnosed by a doctor (Field ID = 6150). Under this Field ID, four conditions were reported: heart attack, angina, stroke, and high blood pressure. For this analysis, participants who reported at least one of these conditions were classified as having CVD; all others were classified as not having CVD. A total of 283,172 participants without missing data were included.
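The classification rule above can be sketched in Python as follows. This is an illustration only, not the code used in the study. The numeric codes (1 = heart attack, 2 = angina, 3 = stroke, 4 = high blood pressure, -7 = none of the above, -3 = prefer not to answer) are assumed from the UK Biobank coding of Field 6150, and the function name is hypothetical.

```python
# Illustrative sketch; response codes are assumed from the UK Biobank coding of Field 6150.
CVD_CODES = {1, 2, 3, 4}      # heart attack, angina, stroke, high blood pressure
NONE_OF_ABOVE = -7
PREFER_NOT_TO_ANSWER = -3

def classify_cvd(responses):
    """Return 1 (CVD), 0 (no CVD) or None (missing) from a participant's responses."""
    answered = {r for r in responses if r is not None}
    if answered & CVD_CODES:
        return 1              # at least one of the four conditions reported
    if NONE_OF_ABOVE in answered:
        return 0              # explicitly reported none of the conditions
    return None               # no usable answer: treated as missing
```

Participants for whom the function returns `None` would be excluded as having missing data.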
For the 174,030 participants without CVD, we computed a 10-year CVD risk score [28] using the Framingham risk score function from the CVrisk package [29] in R [30]. The variables included in the 10-year CVD risk calculation were age, gender, total cholesterol, HDL cholesterol, systolic blood pressure, blood pressure medication, smoking and diabetes status [28].
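For readers who want to see the form of the calculation, the sketch below reimplements a 10-year risk function in Python. It is not the CVrisk code used in the study. It assumes that [28] refers to the 2008 Framingham general CVD risk function (laboratory-based model) and uses the coefficients as published for that model; these should be checked against the original publication before any use. Cholesterol is assumed to be in mg/dL and systolic blood pressure in mmHg.

```python
import math

# Coefficients assumed from the 2008 Framingham general CVD risk function
# (lab-based model). Verify against the original publication before use.
COEF = {
    "male": dict(ln_age=3.06117, ln_tc=1.12370, ln_hdl=-0.93263,
                 ln_sbp_untreated=1.93303, ln_sbp_treated=1.99881,
                 smoker=0.65451, diabetes=0.57367,
                 mean_lp=23.9802, s0=0.88936),
    "female": dict(ln_age=2.32888, ln_tc=1.20904, ln_hdl=-0.70833,
                   ln_sbp_untreated=2.76157, ln_sbp_treated=2.82263,
                   smoker=0.52873, diabetes=0.69154,
                   mean_lp=26.1931, s0=0.95012),
}

def framingham_10y_risk(sex, age, total_chol, hdl, sbp, bp_treated, smoker, diabetes):
    """Return the 10-year CVD risk as a proportion between 0 and 1."""
    c = COEF[sex]
    sbp_coef = c["ln_sbp_treated"] if bp_treated else c["ln_sbp_untreated"]
    # Linear predictor of the Cox model on log-transformed continuous inputs.
    lp = (c["ln_age"] * math.log(age)
          + c["ln_tc"] * math.log(total_chol)
          + c["ln_hdl"] * math.log(hdl)
          + sbp_coef * math.log(sbp)
          + c["smoker"] * int(smoker)
          + c["diabetes"] * int(diabetes))
    # Risk = 1 - baseline survival raised to exp(centred linear predictor).
    return 1.0 - c["s0"] ** math.exp(lp - c["mean_lp"])
```

For example, `framingham_10y_risk("female", 61, 180, 47, 124, False, True, False)` gives a risk of about 10.5%.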
Lifestyle behaviours
This analysis used six lifestyle behaviours: smoking, physical activity, fruit and vegetable consumption, alcohol intake, screen and driving time, and sleep duration.
Physical activity
The UK Biobank collected data on physical activity using questions adapted from the short International Physical Activity Questionnaire (IPAQ) [31], which cover the frequency, intensity and duration of walking, moderate and vigorous activity. Time spent on moderate and vigorous activity was summed and converted to a metabolic equivalent of task (MET) score. Participants were classified as active (≥750 MET min/week) or inactive (<750 MET min/week), based on the 2019 AHA guideline [6].
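A minimal sketch of this scoring is given below. The MET weights (4.0 for moderate and 8.0 for vigorous activity) are an assumption taken from the standard IPAQ scoring protocol; the text does not state the weights actually used, and the function names are hypothetical.

```python
# MET weights assumed from the standard IPAQ scoring protocol (not stated in the text).
MET_MODERATE = 4.0
MET_VIGOROUS = 8.0
ACTIVE_THRESHOLD = 750  # MET min/week, the cut-off used in this study

def met_minutes_per_week(mod_days, mod_min_per_day, vig_days, vig_min_per_day):
    """Weekly MET minutes from moderate and vigorous activity."""
    return (MET_MODERATE * mod_days * mod_min_per_day
            + MET_VIGOROUS * vig_days * vig_min_per_day)

def is_active(met_min):
    """True if the participant meets the >=750 MET min/week threshold."""
    return met_min >= ACTIVE_THRESHOLD
```

Under these assumed weights, 30 minutes of moderate activity on 5 days gives 600 MET min/week, which falls below the threshold.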
Fruit and vegetable intake
The UK Biobank collected data on dietary intake using the Food Frequency Questionnaire [32]. The NHS guideline recommends that everyone eat at least 5 portions of a variety of fruit and vegetables every day [33, 34]. Data on fresh fruit (pieces), dried fruit (pieces), salad/raw vegetables (heaped tablespoons) and cooked vegetables (heaped tablespoons) were combined to calculate portions. Participants consuming at least 5 portions of fruit and vegetables per day were considered to have adequate fruit and vegetable intake.
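The portion calculation can be sketched as below. The conversion factors are assumptions for illustration only, since the text does not state them: one piece of fresh fruit is taken as one portion, three heaped tablespoons of vegetables as one portion, and the dried-fruit factor is a hypothetical placeholder.

```python
# Conversion factors are assumptions for illustration (the text does not state them).
PORTION_PER_FRESH_FRUIT_PIECE = 1.0
PORTION_PER_DRIED_FRUIT_PIECE = 0.2   # hypothetical placeholder: 5 pieces per portion
TABLESPOONS_PER_VEG_PORTION = 3.0     # assumed: 3 heaped tablespoons per portion

def fruit_veg_portions(fresh_fruit, dried_fruit, raw_veg_tbsp, cooked_veg_tbsp):
    """Daily portions of fruit and vegetables from the four questionnaire items."""
    fruit = (fresh_fruit * PORTION_PER_FRESH_FRUIT_PIECE
             + dried_fruit * PORTION_PER_DRIED_FRUIT_PIECE)
    veg = (raw_veg_tbsp + cooked_veg_tbsp) / TABLESPOONS_PER_VEG_PORTION
    return fruit + veg

def adequate_intake(portions):
    """True if intake meets the 5-a-day recommendation."""
    return portions >= 5
```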
Alcohol intake
Participants were asked for the number of pints of beer, glasses of wine, and measures of spirits consumed in the last week. Because alcoholic drinks differ in alcohol content, we converted each drink into standard units (1 unit contains 10 ml of ethyl alcohol) [35]. We calculated total weekly alcohol consumption by summing the units of beer, wine, and spirits. Based on the NHS guidelines [35], we grouped participants as low-risk drinkers (≤14 units per week) or high-risk drinkers (>14 units per week).
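The conversion follows from the definition of a unit as 10 ml of pure alcohol: units = volume (ml) × ABV (%) / 1000. The sketch below applies this formula; the serving sizes and strengths are assumptions for illustration, as the text does not give the values used.

```python
# Assumed serving sizes and strengths (not given in the text); adjust as needed.
SERVINGS = {
    "beer_pint": (568, 4.0),        # (volume in ml, ABV in %)
    "wine_glass": (175, 12.0),
    "spirit_measure": (25, 40.0),
}
LOW_RISK_LIMIT = 14  # units per week, the threshold used in the text

def units(volume_ml, abv_percent):
    """UK alcohol units: one unit is 10 ml of pure ethanol."""
    return volume_ml * abv_percent / 1000.0

def weekly_units(beer_pints, wine_glasses, spirit_measures):
    """Total weekly units across the three drink types."""
    counts = {"beer_pint": beer_pints, "wine_glass": wine_glasses,
              "spirit_measure": spirit_measures}
    return sum(n * units(*SERVINGS[k]) for k, n in counts.items())

def drinking_category(total_units):
    return "low-risk" if total_units <= LOW_RISK_LIMIT else "high-risk"
```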
Smoking
To measure smoking status, participants were asked, “Do you smoke tobacco now?”. Response options were “Yes, on most or all days”, “Only occasionally” and “No”. Those who responded “Yes, on most or all days” or “Only occasionally” were coded as 1 (current smoker), while those who responded “No” were coded as 0 (not a current smoker).
Prolonged sitting
Total sitting time was calculated as the sum of self-reported hours spent watching television, using the computer, and driving during a typical day. Based on the estimated total sitting time, participants were categorized as low-risk sitting (<8 hours/day) or prolonged sitting (≥8 hours/day) [36, 37]. This threshold was based on evidence of greater mortality risk in each higher sitting time category compared with <8 hours/day [36, 37].
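As a minimal sketch of this rule (function name hypothetical):

```python
def sitting_category(tv_hours, computer_hours, driving_hours):
    """Classify total daily sitting time using the 8 hours/day threshold."""
    total = tv_hours + computer_hours + driving_hours
    return "prolonged sitting" if total >= 8 else "low-risk sitting"
```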
Sleep duration
To measure sleep duration, the UK Biobank asked participants, ‘About how many hours sleep do you get in every 24 h? (please include naps)’. Sleep duration was split using predefined thresholds from the literature: <7 hours, 7–8 hours and >8 hours [13]. Based on these cut points, participants were grouped as having ‘poor sleep’ (<7 or >8 hours per 24 h) or ‘good sleep’ (7–8 hours per 24 h).
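Because both short and long sleep count as poor sleep, the grouping is a two-sided rule, sketched here with a hypothetical function name:

```python
def sleep_category(hours):
    """'good sleep' for 7 to 8 hours per 24 h, 'poor sleep' otherwise."""
    return "good sleep" if 7 <= hours <= 8 else "poor sleep"
```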
Socio-demographic variables
Socio-demographic characteristics (age and gender) and the Townsend deprivation index (TSDI) were included in the latent class analysis (LCA) model. The TSDI was used to measure participants' deprivation [38]. The index combines information on housing, employment, car availability and social class, with higher values indicating greater deprivation [38].
Statistical analysis
Descriptive statistics were computed for socio-demographic characteristics, lifestyle behaviours and medical conditions. Mplus version 8.8 [39] was used to estimate a distal outcome model that identified latent classes (LCs) of lifestyle risk behaviours and the association between LC membership and the distal outcomes (CVD and CVD risk) (Fig 1). To select the number of LCs that best fit the data, we first fitted a two-class model and then increased the number of LCs by one at a time, up to a six-class model. Models were evaluated using the Bayesian information criterion (BIC) and the Akaike information criterion (AIC) [40]. The final model was selected based on these statistical criteria (lower AIC and BIC indicate better fit) and the interpretability of the estimated LCs [40]. On this basis, four LCs were selected for CVD and three LCs for CVD risk (see Supplementary Tables).
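For reference, the two criteria are AIC = -2 log L + 2k and BIC = -2 log L + k ln(n), where log L is the model log-likelihood, k the number of free parameters and n the sample size. The sketch below shows how candidate models could be compared on either criterion; it is illustrative only (Mplus reports these values directly), and the function names are hypothetical.

```python
import math

def aic(log_lik, n_params):
    """Akaike information criterion."""
    return -2.0 * log_lik + 2.0 * n_params

def bic(log_lik, n_params, n_obs):
    """Bayesian information criterion."""
    return -2.0 * log_lik + n_params * math.log(n_obs)

def best_by(criterion_values):
    """Return the number of classes with the lowest criterion value.

    criterion_values maps number of classes -> AIC or BIC of that model.
    """
    return min(criterion_values, key=criterion_values.get)
```

The statistical ranking is only one input: as stated above, interpretability of the classes was also considered when choosing the final model.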
Latent class analysis with a distal outcome [22-24], covariates, and an LC mediator [41] was run to estimate: (1) the effect of LC membership on the distal outcomes (Fig 1, left-hand side), and (2) the direct, indirect (mediated through LC membership) and total effects of the covariate(s) on the distal outcomes (Fig 1, right-hand side). Gender, age, and smoking were not used in the CVD risk distal outcome model because they were already used to compute the CVD risk score.
We used the Bolck–Croon–Hagenaars (BCH) method [22, 42], which has been reported to outperform alternative approaches for distal outcome models [22, 42] and to give more accurate mediation estimates in LC mediation models [41]. The BCH method avoids shifts in the LCs in the final step and performs well when the variance of the auxiliary variable differs substantially across classes [22]. To estimate the model, it uses weights that reflect the measurement error of the LC variable [22]. Mplus implements two versions of the BCH method: an automatic version and a manual two-step version [22]. The automatic version is restricted in the auxiliary models it can estimate, so we used the manual two-step approach to estimate auxiliary models with continuous (CVD risk) and categorical (CVD) distal outcomes. In the first step, we estimated the LC measurement model and saved the BCH weights. In the second step, we estimated the general auxiliary model conditional on the LC variable using the BCH weights. Results for continuous outcomes are reported as mean differences (MD) and results for binary outcomes as odds ratios (OR), each with 95% confidence intervals (95% CI).
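To illustrate the idea behind the BCH weights, the sketch below computes them from posterior class probabilities under modal assignment and then uses them to obtain a weighted class-specific mean of a continuous distal outcome. It is a simplified illustration of the general technique, not the Mplus implementation and not the code used in this analysis; all function names are hypothetical. It assumes every class has at least one participant assigned to it, so that the classification matrix is invertible.

```python
def invert(matrix):
    """Invert a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(matrix)
    aug = [list(row) + [float(i == j) for j in range(n)] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def bch_weights(posteriors):
    """posteriors[i][s] = P(class s | responses of person i). Returns weights[i][s]."""
    k = len(posteriors[0])
    # Modal assignment: each person is assigned to their most likely class.
    modal = [max(range(k), key=lambda s: p[s]) for p in posteriors]
    # D[s][t] = P(assigned to class t | true class s), estimated from the posteriors.
    d = [[0.0] * k for _ in range(k)]
    for p, t in zip(posteriors, modal):
        for s in range(k):
            d[s][t] += p[s]
    d = [[v / sum(row) for v in row] for row in d]
    d_inv = invert(d)
    # A person assigned to class t receives row t of the inverse of D as weights.
    return [d_inv[t] for t in modal]

def weighted_class_means(weights, y):
    """Class-specific means of a continuous outcome y using the BCH weights."""
    k = len(weights[0])
    return [sum(w[s] * yi for w, yi in zip(weights, y)) / sum(w[s] for w in weights)
            for s in range(k)]
```

Some weights are negative by construction; this is how the method corrects for classification error rather than a sign of a computational problem.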