All analyses in this article used the second cohort of the Singapore Longitudinal Ageing Study (SLAS-2), a longitudinal study of aging and health in community-dwelling Singaporeans aged 55 or older at the start of the study, as previously described. The cohort comprises 3200 residents of the southwest and central south of Singapore, starting in 2010, and excludes individuals unable to participate because of severe physical or mental disabilities. The study received ethical approval from the National University of Singapore Institutional Review Board, and written consent was obtained from all participants (response rate of 78%). The study followed the Strengthening the Reporting of Observational Studies in Epidemiology (STROBE) reporting guidelines. Although the dataset has longitudinal components, the flow cytometry data needed for this study were available only cross-sectionally for all participants.
Health outcome metrics
We assessed the predictive power of our models on 20 health or health-status-related measures: (1) Age; (2) Mortality; (3) Self-assessed health measured on a five-point Likert scale, based on the question “Generally would you say your health is: Excellent, Very good, Good, Fair or Poor”; (4) Frailty evaluated on the 5 criteria of Fried’s phenotypic scale: weakness, slowness, weight loss, low physical activity and exhaustion; (5) Global cognitive function as quantified by the Mini-Mental State Examination (MMSE); (6) The number of comorbidities from a list of 23, based on self-report, medication and physical or laboratory tests; (7) High blood pressure; (8) High cholesterol; (9) Diabetes; (10) Stroke; (11) Heart attack; (12) Atrial fibrillation; (13) Eye problems; (14) Asthma; (15) Arthritis; (16) Osteoporosis; (17) Gastrointestinal problems; (18) Thyroid problems; (19) Cancer; (20) Depression. Most of these measures are dichotomous, but age, self-assessed health, frailty, MMSE, and number of comorbidities are discrete measures with multiple values. Religion was also included as a negative control.
Cell surface markers
The surface markers used in this article are 6-Sulfo LacNAc (Slan), CD19, Pan-GDT, TCRVg1, TCRVa7.2, CD45RO, CD127, CD56, HLADR, CCR6, CD45, CRTH2, CD34, CD38, CD57, CD25, CD16, CD123, CD27, CD3, CD8, CD14, CXCR3, TCRVg2, IgD, CD4 and CD161. During flow cytometry panel design, four pairs of markers were each assigned to a shared channel: CD19 and Pan-GDT, TCRVg1 and TCRVa7.2, CD8 and CD14, and TCRVg2 and IgD. This pairing was possible because the two markers in each pair are located on different cell types and are therefore mutually exclusive. The 27 markers were thus measured on 23 channels.
Preprocessing and statistical analysis
The flow cytometry data were first gated to exclude debris (FSC-A/SSC-A gate), to keep only single cells (FSC-A/FSC-H gate), and to exclude cells that took up the LIVE/DEAD™ Fixable Blue Stain (ThermoFisher Scientific). Finally, only CD45+ cells were kept. This gating restricted all subsequent analyses to single, living leukocytes.
For the non-gated model, the number of cells varied between individuals and often approached half a million, so we randomly sampled 5000 cells per individual to ensure equal representation and reduce computation time. Before this sampling, the first and last 10% of the events in each individual's file were removed to limit inconsistencies arising during flow cytometry acquisition. Because a few extreme negative outliers were observed for most markers, a threshold was set at -50 000 relative fluorescence units for all markers, and all cells with a marker value below that limit were removed. This prevented these outliers from weighing too heavily on the model, which takes the distribution of each marker as its input. Then, for each individual, the distribution of fluorescence intensity of each marker was divided into 102 sections. All values below the 2.5th percentile were grouped into a single lowest section and all values above the 97.5th percentile into a single highest section, so that outliers would not have too strong an impact on the results. The rest of the distribution was divided into 100 sections of equal width on the absolute scale. The number of cells in each section was then stored and used as input to the model. The individuals were split into a calibration group of 300 and a validation group of 267.
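The preprocessing steps above can be sketched in NumPy as follows. This is a minimal illustration rather than the study's code: the function names, the array layout (one row per cell in acquisition order, one column per channel), and the choice to apply the threshold before sampling are assumptions. The sketch also assumes that at least 5000 cells remain after trimming and filtering.

```python
import numpy as np

N_CELLS = 5000         # cells sampled per individual (from the text)
TRIM_FRACTION = 0.10   # share of events dropped at each end of the file
FLOOR = -50_000        # lower fluorescence threshold (from the text)
N_INNER = 100          # equal-width sections between the two percentiles


def preprocess_individual(events, rng):
    """Trim, filter, and subsample one individual's events.

    `events` is assumed to be an (n_events, n_channels) array in
    acquisition order (hypothetical layout).
    """
    n = events.shape[0]
    cut = int(n * TRIM_FRACTION)
    trimmed = events[cut:n - cut]
    # Drop every cell that has a channel below the threshold.
    kept = trimmed[(trimmed >= FLOOR).all(axis=1)]
    # Sample without replacement; raises if fewer than N_CELLS cells remain.
    idx = rng.choice(kept.shape[0], size=N_CELLS, replace=False)
    return kept[idx]


def bin_marker(values):
    """Return the 102 cell counts for one marker of one individual."""
    lo, hi = np.percentile(values, [2.5, 97.5])
    inner_edges = np.linspace(lo, hi, N_INNER + 1)
    inside = values[(values >= lo) & (values <= hi)]
    inner_counts, _ = np.histogram(inside, bins=inner_edges)
    below = np.count_nonzero(values < lo)   # lowest section
    above = np.count_nonzero(values > hi)   # highest section
    return np.concatenate(([below], inner_counts, [above]))


# Usage on synthetic data (placeholder values, not real fluorescence).
demo_rng = np.random.default_rng(42)
demo_events = demo_rng.normal(loc=1000.0, scale=500.0, size=(20_000, 23))
demo_sample = preprocess_individual(demo_events, demo_rng)
demo_inputs = np.stack([bin_marker(demo_sample[:, c]) for c in range(23)])
print(demo_inputs.shape)
```

Each row of `demo_inputs` is one set of 102 counts, so an individual is represented by a 23 by 102 array of counts.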
The non-gated model is therefore composed of 23 sets of 102 inputs (one set per marker channel, with paired markers sharing a channel). Each set is followed by a dense layer of 75 neurons, a dense layer of 50 neurons, a dense layer of 25 neurons, and then a dense layer of 1 neuron for the marker studied. The number of neurons in each layer was chosen to be lower than the size of the input layer and to decrease from layer to layer, so that later layers represent more generalized patterns. The outputs of the 23 single-neuron layers are then added and passed to a final dense layer of 1 neuron, which gives the final output (fig. 1A). The three layers of 75, 50, and 25 neurons use the exponential linear unit activation function, and the two single-neuron layers use a linear activation function. The non-gated model was trained for 25000 epochs with a batch size of 100.
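To make the shape of this architecture concrete, the following sketch implements its forward pass in plain NumPy with random, untrained weights. It is not the TensorFlow model used in the study. The helper names and the weight initialisation are hypothetical, and summing the 23 single-neuron outputs is one literal reading of "added" (concatenating them before the final layer would be another).

```python
import numpy as np

N_MARKERS = 23                       # marker channels
N_BINS = 102                         # inputs per channel
BRANCH_SIZES = [N_BINS, 75, 50, 25, 1]


def elu(x):
    """Exponential linear unit with alpha = 1."""
    return np.where(x > 0, x, np.expm1(np.minimum(x, 0.0)))


def init_dense(rng, n_in, n_out):
    """Random (untrained) weights and zero biases for one dense layer."""
    weights = rng.normal(scale=np.sqrt(1.0 / n_in), size=(n_in, n_out))
    return weights, np.zeros(n_out)


def init_model(rng):
    """One 102-75-50-25-1 branch per channel, plus a final 1-neuron layer."""
    pairs = list(zip(BRANCH_SIZES[:-1], BRANCH_SIZES[1:]))
    branches = [[init_dense(rng, a, b) for a, b in pairs]
                for _ in range(N_MARKERS)]
    head = init_dense(rng, 1, 1)
    return branches, head


def forward(model, x):
    """x: (n_individuals, N_MARKERS, N_BINS) -> (n_individuals,)."""
    branches, head = model
    total = 0.0
    for m, layers in enumerate(branches):
        h = x[:, m, :]
        for i, (w, b) in enumerate(layers):
            h = h @ w + b
            if i < len(layers) - 1:
                h = elu(h)           # ELU on the 75/50/25 layers only
        total = total + h            # assumption: "added" means summed
    w, b = head
    return (total @ w + b)[:, 0]     # final layer is linear


# Usage with placeholder counts for 4 individuals.
demo_rng = np.random.default_rng(0)
demo_model = init_model(demo_rng)
demo_x = demo_rng.poisson(lam=50, size=(4, N_MARKERS, N_BINS)).astype(float)
print(forward(demo_model, demo_x).shape)
```

Writing the branches as a list of identical layer stacks keeps the per-channel structure explicit: each channel's distribution is reduced to a single number before the channels are combined.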
For the gated model, 67 mutually exclusive cell types were obtained with the gating strategy shown in Supplemental figure X. The individuals were split into a calibration group of 300 and a validation group of 267. The model is composed of 67 inputs, followed by a dense layer of 50 neurons, a dense layer of 30 neurons, a dense layer of 15 neurons, and a dense layer of 1 neuron, which gives the final output (fig. 1B). The three layers of 50, 30, and 15 neurons use the exponential linear unit activation function, and the final single-neuron layer uses a linear activation function. The number of neurons in each layer was selected with the same reasoning as for the non-gated model. The gated model was trained for 10000 epochs, since it converged more easily, with a batch size of 100.
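The gated architecture is a plain stack of dense layers, which the following NumPy sketch illustrates with random, untrained weights. It is an illustration of the layer sizes and activations only, not the study's TensorFlow code. The helper names, the initialisation, and the placeholder inputs are assumptions, since the text does not state how the 67 cell-type values were scaled.

```python
import numpy as np

LAYER_SIZES = [67, 50, 30, 15, 1]    # inputs, three hidden layers, output


def elu(x):
    """Exponential linear unit with alpha = 1."""
    return np.where(x > 0, x, np.expm1(np.minimum(x, 0.0)))


def init_mlp(rng, sizes):
    """Random (untrained) weights and zero biases for each dense layer."""
    return [(rng.normal(scale=np.sqrt(1.0 / a), size=(a, b)), np.zeros(b))
            for a, b in zip(sizes[:-1], sizes[1:])]


def forward(params, x):
    """x: (n_individuals, 67) -> (n_individuals,)."""
    h = x
    for i, (w, b) in enumerate(params):
        h = h @ w + b
        if i < len(params) - 1:
            h = elu(h)               # ELU on the 50/30/15 layers only
    return h[:, 0]                   # final layer is linear


# Usage with placeholder inputs for a validation group of 267 individuals.
demo_rng = np.random.default_rng(0)
demo_params = init_mlp(demo_rng, LAYER_SIZES)
demo_x = demo_rng.random(size=(267, 67))
print(forward(demo_params, demo_x).shape)
```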
For both models, individuals with missing data in any of the measures were removed, so that the same number of people was used to calibrate and evaluate each model. Each model was generated 100 times with the same settings to create replicates that account for the random variation arising during model generation. Both models used the Adam algorithm for their optimisation. All analyses were conducted using R v3.6.3, Python 3.7.6 and TensorFlow 1.14.0.
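The complete-case filtering and the calibration/validation split can be sketched as below. The text does not say how individuals were assigned to the two groups, so the random permutation is an assumption, as are the function names and the use of NaN to mark missing values.

```python
import numpy as np

N_CALIBRATION = 300
N_VALIDATION = 267


def complete_cases(outcomes):
    """Boolean mask of individuals with no missing measure.

    `outcomes` is assumed to be an (n_individuals, n_measures) float
    array in which NaN marks a missing value (hypothetical encoding).
    """
    return ~np.isnan(outcomes).any(axis=1)


def split_indices(n, rng):
    """Randomly assign n individuals to calibration and validation groups."""
    if n < N_CALIBRATION + N_VALIDATION:
        raise ValueError("not enough individuals for a 300/267 split")
    order = rng.permutation(n)
    calibration = order[:N_CALIBRATION]
    validation = order[N_CALIBRATION:N_CALIBRATION + N_VALIDATION]
    return calibration, validation


# Usage on placeholder data: 600 individuals, 33 with a missing measure.
demo_outcomes = np.ones((600, 21))
demo_outcomes[:33, 0] = np.nan
demo_mask = complete_cases(demo_outcomes)
demo_cal, demo_val = split_indices(int(demo_mask.sum()), np.random.default_rng(0))
print(len(demo_cal), len(demo_val))
```

Under this sketch, each of the 100 replicates would reuse the same two groups and differ only in the random initialisation of the model, which is one possible reading of "the same settings".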
The success of predictions was assessed by comparing the root mean squared error (RMSE) of each health measure with its mean value. The RMSE for a health measure with no predictive capacity would be close to or higher than its mean value. Predictions were considered successful for health measures whose RMSE was less than one third of the mean value.
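This criterion can be written out as a short NumPy sketch. The function names are hypothetical, and taking "the mean value" to be the mean of the observed values being predicted is an assumption.

```python
import numpy as np


def rmse(y_true, y_pred):
    """Root mean squared error between observed and predicted values."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


def is_successful(y_true, y_pred, ratio=1.0 / 3.0):
    """True when the RMSE is below one third of the mean observed value."""
    return rmse(y_true, y_pred) < ratio * float(np.mean(y_true))


# Usage on placeholder values for a continuous measure such as age.
demo_observed = [60, 70, 80]
demo_predicted = [62, 69, 81]
print(rmse(demo_observed, demo_predicted),
      is_successful(demo_observed, demo_predicted))
```

For a dichotomous measure coded 0/1, a model that always predicts the prevalence has an RMSE above the mean whenever the prevalence is below one half, which matches the statement that an uninformative prediction gives an RMSE close to or higher than the mean.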