Nuclear morphology is a deep learning biomarker of senescence across tissues and species


Cellular senescence is a critical component of aging and many age-related diseases, but understanding its role in human health is challenging in part due to the lack of exclusive or universal markers. Using neural networks, we achieve high accuracy in predicting senescence state and type from the nuclear morphology of DAPI-stained human fibroblasts, murine astrocytes, murine neurons, and fibroblasts derived from premature aging diseases in culture. After generalizing this approach, the predictor recognizes an increasing rate of senescent cells with age in H&E-stained murine liver tissue and human dermal biopsies, suggesting that alterations in nuclear morphology are a universal feature of senescence. Evaluating corresponding medical records reveals that individuals with a higher rate of senescent cells have a significantly decreased rate of malignant neoplasms, lending support to the protective role of senescence in limiting cancer development. Additionally, we find a positive association, at lower significance, with other conditions, including osteoporosis, osteoarthritis, hypertension, cerebral infarction, hyperlipidemia, and hypercholesteremia. In sum, we introduce a predictor of cellular senescence based on nuclear morphology that is applicable across tissues and species and is associated with health outcomes in humans.


Figure 1 Nuclear morphology is an accurate senescence predictor in cultured cells. a Analysis workflow. b Sample nuclei for controls, replicative senescence (RS) and ionizing radiation (IR) induced senescent cells. c Area of identified nuclei (n=6,976-68,971, mean ± 95% CI). d Convexity of identified nuclei (n=6,976-68,971, mean ± 95% CI). e Aspect ratio of identified nuclei (n=6,976-68,971, mean ± 95% CI). f Scatter plot of individual nuclei, with overall distributions for each at the top and right margins. g Cell cycle analysis after exposure to several doses of IR; mn: multinucleated cells. h Accuracy of a deep neural network (DNN) predictor on validation data. i Receiver operating characteristics (ROC) curve of the DNN. j Percent of nuclei in each state classified as senescent for independent cell lines. k Distribution of prediction probabilities for several doses of IR for three fibroblast cell lines (n=38,284-106,132). l Distribution of p21 intensities for several doses of IR for three fibroblast cell lines (n=38,284-106,132). m Distribution of PCNA intensities for several doses of IR for three fibroblast cell lines (n=38,284-106,132). n Correlation between predicted senescence and nearby SA-β-gal regions, showing all and 90% confidence predictions only for RS and IR groups. o Correlation between predicted senescence and multiple markers, showing all, filtered for markers with strong signals, and filtered with 90% confidence predictions only. p Accuracy of DNNs trained and predicting after different normalization methods. q Correlation between morphological metrics and predicted senescence by class, BG: background.

the area of the IR senescent cells was bimodal, with the lower mode matching RS and a higher mode at almost twice the area of the RS, perhaps suggesting IR induced aneuploidy or stalling at the G2 checkpoint of the cell cycle (Fig. 1f, upper histogram distribution of joint scatter plot). To further explore this hypothesis, we induced senescence with multiple IR doses and utilized flow cytometry to study the cell cycle. Remarkably, we observed a dose-dependent increase in G2 and corresponding loss of G1 and S-phase cells 10 days after the IR treatment (Fig. 1g), indicating that IR induction leads to G2 stalled senescent cells as previously suggested 22,23. Simple nuclear morphological measures appear to be a viable method for assessing cellular senescence in culture.
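
As an illustration of the three morphological measures discussed here, the following minimal sketch computes area, convexity, and aspect ratio from a nuclear outline given as polygon vertices. The definitions are assumptions made for illustration, not taken from the text: convexity is taken as outline area divided by convex hull area, and aspect ratio as the longer over the shorter side of the axis-aligned bounding box. The function names are hypothetical.

```python
def polygon_area(pts):
    """Shoelace formula for a simple polygon given as (x, y) vertices."""
    total = 0.0
    n = len(pts)
    for i in range(n):
        x1, y1 = pts[i]
        x2, y2 = pts[(i + 1) % n]
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def convex_hull(pts):
    """Andrew's monotone chain; returns hull vertices in counter-clockwise order."""
    pts = sorted(set(pts))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


def nuclear_metrics(outline):
    """Area, convexity, and aspect ratio under the assumed definitions above.

    Assumes a non-degenerate outline (non-zero width, height, and hull area).
    """
    area = polygon_area(outline)
    hull_area = polygon_area(convex_hull(outline))
    xs = [x for x, _ in outline]
    ys = [y for _, y in outline]
    width, height = max(xs) - min(xs), max(ys) - min(ys)
    return {
        "area": area,
        "convexity": area / hull_area,
        "aspect_ratio": max(width, height) / min(width, height),
    }
```

Under these definitions a smooth, round outline has convexity close to 1, while an indented or irregular outline scores lower.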

Deep Learning Classifiers Accurately Predict Senescence Based on DAPI Staining
To better characterize the performance with senescent phenotypes induced by multiple levels of stress, we applied the DNN predictor to cells exposed to different doses of radiation. All levels were predicted to be senescent, and there was a 9.7% mean increase between 5 and 10 Gy, but 10 Gy and 20 Gy showed similar prediction scores (Fig. 1k). PCNA declines with increasing dose and p21 increases (Fig. 1l, m). Predicted senescence and the two markers align with experimental conditions, but the predictor appears to track p21 expression more closely than PCNA. This experiment indicates that the treatment dose influences the senescent phenotype up to 10 Gy, a dose commonly used for senescence induction.

In another experiment, a deep neural network was trained to distinguish controls from the two senescent types, IR and RS. Xception, trained like the dual state model above, produced a mean class accuracy of 78.6% in detection of the three states, with 83.3% for controls, 75.7% for RS, and 76.8% for IR (Fig. S2d, e). It achieved a relatively high AUC of 0.9 for RS and 0.95 for IR. In sum, nuclear morphology represents a strong predictor of both replicative and DNA damage induced senescence.

To confirm the accuracy of the DNN, we evaluated the correlation between predictions and traditional markers of senescence, including SA-β-gal, p16, p21, and p53. Training a deep neural network to recognize SA-β-gal regions, we found SA-β-gal near nuclei for 64.1% of IR and 65.8% of RS compared to 19.6% for control, which roughly matches published rates for RS and controls 26. A correlation analysis revealed a Pearson coefficient of 0.39 for IR and 0.31 for RS between predicted senescence and SA-β-gal detected nearby, but when restricting to the treated cells with nearby SA-β-gal and controls without it, the correlation rose to 0.83 for IR and 0.67 for RS (Fig. 1n). Applying a 90% confidence filter (see sections below on confidence and deep ensemble methods), correlation rose to 0.96 for IR and 0.90 for RS, indicating the predictor is highly effective at recognizing senescence with detected SA-β-gal. On a per cell basis, we found moderate correlation between p16, p21, and p53 stain intensities and predicted senescence, but after applying a threshold to classify nuclei as marker positive and filtering out nuclei near the threshold (due to the broad overlap in the distribution of intensities, Fig. S1g, h, i), the correlation rose significantly to 0.69 for p16, 0.59 for p21, and 0.63 for p53 (Fig. 1o).
We also applied confidence filtering, restricting nuclei to those with high predictive confidence, and found

unclear what the deep neural network is using as its basis for assessment. Nuclear area, staining intensity and even the image background itself could contain a signal that the neural network is picking up on. To provide some insight into how much these potential factors contribute to senescence classification, we trained several models based on reduced forms of the cutout library. Our base model already includes brightness standardization. First, the background of the nuclei was masked by excluding all areas outside of the U-Net detected nuclear region. Next, we applied size normalization, such that the greater of the width and height was set to a standard pixel size. Finally, we converted the interior of nuclei to a single-color value, essentially masking all internal structure. With each reduction, we observed a slight decrease in classification accuracy when applied to independent test lines (Fig. 1p). The background masking produced 86% for the f1-score and 88% for accuracy, a small reduction indicating limited reliance on the background. With background masked and size normalized, a trained model produced 87% for f1-score and 88% for accuracy, showing area and size played little role in senescent detection. This model was further reduced by completely masking the internal structure of the nuclei, which led to an f1-score of 80% and accuracy of 78% (Fig. S2f, g). While masking caused a significant reduction in accuracy, it is remarkable that so much information could be removed from nuclear images while still yielding a relatively accurate classification of senescence. These experiments suggest that classification is largely based on the overall shape of the nuclei.
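
The three reductions described above (background masking, size normalization, interior masking) can be sketched on a small grayscale cutout as follows. This is a minimal illustration on nested lists, not the pipeline used in the study: the nearest-neighbor resize, the fill values, the `target` size, and all function names are assumptions.

```python
def mask_background(image, mask, fill=0):
    """Set every pixel outside the nuclear mask to a constant fill value."""
    return [[px if m else fill for px, m in zip(row, mrow)]
            for row, mrow in zip(image, mask)]


def crop_to_mask(image, mask):
    """Crop image and mask to the bounding box of the mask (mask must be non-empty)."""
    rows = [r for r, mrow in enumerate(mask) if any(mrow)]
    cols = [c for c in range(len(mask[0])) if any(mrow[c] for mrow in mask)]
    r0, r1, c0, c1 = rows[0], rows[-1] + 1, cols[0], cols[-1] + 1
    return ([row[c0:c1] for row in image[r0:r1]],
            [row[c0:c1] for row in mask[r0:r1]])


def resize_nearest(grid, target):
    """Nearest-neighbor rescale so the greater of width and height equals target."""
    h, w = len(grid), len(grid[0])
    scale = target / max(h, w)
    new_h, new_w = max(1, round(h * scale)), max(1, round(w * scale))
    return [[grid[min(h - 1, int(r / scale))][min(w - 1, int(c / scale))]
             for c in range(new_w)]
            for r in range(new_h)]


def flatten_interior(mask, value=255):
    """Replace the nuclear interior with a single value, hiding internal structure."""
    return [[value if m else 0 for m in row] for row in mask]


def reduce_cutout(image, mask, target=8):
    """Apply all three reductions; returns (size-normalized image, shape-only image)."""
    masked = mask_background(image, mask)
    cropped_image, cropped_mask = crop_to_mask(masked, mask)
    sized_image = resize_nearest(cropped_image, target)
    sized_mask = resize_nearest(cropped_mask, target)
    return sized_image, flatten_interior(sized_mask)
```

The second return value keeps only the outline of the nucleus, which corresponds to the most reduced input form discussed in the text.
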
We explored this further by evaluating Pearson correlation between predicted senescence and several morphological metrics, finding that area was moderately correlated (despite being standardized by the predictor) but convexity and aspect ratio were weaker (Fig. 1q). The deep learning model appears to be picking up on the nuclear shape in a more sophisticated manner than simple morphometrics.

The final reduced model yields an overall accuracy of 78%, and it shows an imbalanced per class accuracy of 73.9% for control, 69.3% for RS, and 91.4% for IR. It maintains a good AUC of 0.88. With similar reductions, the three-state senescent type detector model shows overall accuracy of only 58% (Fig. S2h, i).

performed adequately for raw cutouts, but it would not train well for the masked/normalized nuclei. We partially converted Xception to utilize Flipout nodes 28, leaving the separable convolutions as point estimate nodes. We also fully converted InceptionV3 as an alternative model. Our partial BNN of Xception produced an f1-score of 84%, accuracy of 86%, and AUC of 0.92 (Fig. S2j, k). The full BNN for InceptionV3 gave an f1-score of 79%, accuracy of 80%, and AUC of 0.87 (Fig. S2l, m).

consulting a collection of experts (or interpreted as the "wisdom of the crowd"). To achieve this, we trained an ensemble with random initial weights, potentially allowing convergence to different local minima. We found that there is consistent agreement for the majority of samples; however, there is a significant percentage of edge cases with a high variance in predictions among the model instances (Fig. 2a).

We therefore speculate that using an ensemble of deep models for inference and aggregating the results provides predictions with less bias and higher confidence (Fig. 2b).
Evidently, some models balance the accuracy of each class in the middle of the range (75-80%), while other models skew toward one class at the expense of the other (for example, obtaining ~85% on one but ~70% on the other). While ensembles have benefits like a BNN, they can be less biased since each ensemble member can specialize around a solution, while a BNN is confined to a single local minimum in solution space. Accordingly, we obtained good results with the ensemble method, with an f1-score of 91%, accuracy of 94%, and AUC of 0.98 (Fig. 2c, d). More importantly, the ensemble provides a higher confidence and less biased approach by combining multiple models that specialize in predicting different classes.
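
A minimal sketch of the aggregation step: each ensemble member outputs a probability of senescence per nucleus, and the ensemble reports the mean together with the spread across members. Averaging is an assumption here, since the text does not state the exact aggregation rule, and `ensemble_predict` is a hypothetical name.

```python
from statistics import mean, pstdev


def ensemble_predict(member_probs, threshold=0.5):
    """Aggregate per-model probabilities of senescence.

    member_probs: one list of probabilities per model, aligned by sample.
    Returns, per sample, the mean probability, the spread across members
    (population standard deviation), and the thresholded call.
    """
    results = []
    for probs in zip(*member_probs):
        p = mean(probs)
        results.append({
            "prob": p,
            "spread": pstdev(probs),
            "senescent": p >= threshold,
        })
    return results
```

Samples on which the members disagree show a large spread, which is one way to flag the high-variance edge cases mentioned above.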

An ensemble of neural networks outperforms Bayesian neural networks
We also tried bagging, where bootstrapping with replacement selects a subset of the samples to use in training independent models. This method did not provide a significant improvement over the basic deep ensemble method (Fig. 2e). The BNN models can be used to improve confidence but sacrifice performance, while the ensemble models provide both (Fig. 2e). We therefore further evaluated the deep ensemble method with masked and normalized samples. This produced an f1-score of 80%, accuracy of 82%, and AUC of 0.89 (Fig. 2f, g), which improved upon the single model. The ensemble method was also applied to the tri-state model to distinguish senescent type, which achieved overall accuracy of 66% and AUC of 0.81 for RS and 0.92 for IR (Fig. 2h,

classified IR with high accuracy, but the RS-only model recognized RS with ~13% higher accuracy, while the IR-only model misclassified those as control (Fig. 2j, k). Ensembles of deep neural networks clearly allow for greater accuracy for senescence prediction.

should not be interpreted as model confidence, but by sampling from a BNN or deep ensemble, we can utilize the distribution to determine uncertainty 27. We evaluated the predictions for the BNN and deep ensemble (Fig. S3a, b). Correct predictions are indeed oriented toward the lower and higher range of the softmax output, representing greater certainty about a sample's state. In both cases, the incorrect predictions are clustered toward the center near the 0.5 threshold. Different models could be biased toward either state by shifting those ambiguous samples across the threshold.
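
Because the errors concentrate near 0.5, one simple way to act on this observation is to discard predictions in a band around the center and score only the rest. The sketch below assumes binary labels (1 = senescent) and a symmetric band; `accuracy_at_confidence` is a hypothetical helper, not a function from the study.

```python
def accuracy_at_confidence(probs, labels, confidence=0.5):
    """Score only predictions with p >= confidence or p <= 1 - confidence.

    Returns (accuracy on retained samples, fraction of samples retained).
    With confidence=0.5 every sample is retained.
    """
    kept = [(p, y) for p, y in zip(probs, labels)
            if p >= confidence or p <= 1 - confidence]
    if not kept:
        return None, 0.0
    correct = sum((p >= 0.5) == bool(y) for p, y in kept)
    return correct / len(kept), len(kept) / len(probs)
```

The second return value makes the cost explicit: a stricter band typically raises accuracy but leaves fewer samples for downstream analysis.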

We can assume higher confidence in a model's predictions by raising the classification threshold (of both one-hot states, thereby filtering the predictions in the middle). We therefore evaluated the accuracy using a range of thresholds from 0.5 up to 0.95 in the single model, the Xception BNN, the ensemble of models, and the ensemble of fully normalized models (Fig. S3c, d, e, f). In all cases, we see a significant increase in accuracy as the threshold is raised, due to the ambiguous samples being discarded. By raising the threshold, the Xception-based BNN goes from 85.6% to 96.0% accuracy, while the ensemble of normalized models goes from 81.6% to 97.2%. A similar approach was applied to other models, including the IR-only and RS-only models (Fig. S3g, h). Raising the threshold, these also showed a gain in accuracy of 10-15%. Unfortunately, this led to a significant reduction in the number of samples considered. There is a tradeoff between number of predictions and accuracy, which must be balanced for each application to ensure suitable power for analysis.

To better understand the development of the senescent phenotype and how nuclear morphology changes over time, we analyzed fibroblasts induced to senescence by 10 Gy of IR and imaged at several time points, including 10, 17, 24, and 31 days. The predictor identifies senescence at all four time points with probability that increases from days 10 to 17 but declines by day 31 (Fig. S4a). Interestingly, examining the probability distribution of the predictor, it was apparent that a growing peak of non-senescent cells appears after day 17, suggesting either that the predictor is unable to accurately predict those cells or that a small number of cells may have escaped senescence and are eventually overgrowing the senescent cells (Fig. S4b).
Indeed, when investigating markers of proliferation, we see that over the time course, PCNA declines until day 17, after which the expression starts to return (Fig. S4c). p21 follows an inverse pattern, with stain intensity increasing initially and then declining slightly by day 31 (Fig. S4d). We also saw a decrease in DAPI intensity for days 10 and 17, indicating senescence, but a reversion to control level by day 31 (Fig. S4e). To confirm that the predictor accurately determined senescence even 31 days after IR, we evaluated if markers of proliferation and senescence correlated with predicted senescence. Accordingly, cells with predicted senescence had higher p21 levels, lower PCNA and lower DAPI intensities and vice versa (Fig. S4f, g, h). Morphologically, area and aspect ratio are higher for predicted senescence while convexity is lower (Fig. S4i, j, k). Finally, a simple nuclei count confirms growth following IR treatment (Fig. S4l). Overall, the senescence predictor captures the state during development in agreement with multiple markers and morphological signs.

damage markers gH2AX and 53BP1 30,31. We characterized the DNA damage foci for our cell lines and investigated how these foci relate to predicted senescence. Our base data set, including control, RS, and IR lines, was examined for damage foci. Using high content microscopy, we counted DNA damage foci per nucleus and found the mean count of gH2AX and 53BP1 foci to be below 1 each (0.9 and 0.6, respectively) for controls, while RS had 4.0 gH2AX and 2.0 53BP1 foci and IR had 3.4 gH2AX and 3.0 53BP1 foci (Fig. 3a, b, S5a). To study how the presence of damage foci relates to predicted senescence, we calculated the Pearson correlation between predicted senescence and gH2AX and 53BP1 foci counts. We found that across all conditions there is a moderately strong correlation of around 0.5 (Fig. 3c).
This association is also visible when simply plotting foci counts and senescence prediction, which shows predicted senescence flipping from low to high along with shifts in foci counts (Fig. S5b). Within senescent subtypes RS and IR, the correlation is slightly weaker, perhaps indicating that the senescent probability score for each subtype has some correlation with foci count. Our feature reduction including masking means that internal nuclear structure was not used in assessment, but it is nonetheless notable that senescence prediction (overall and by subtype) correlates with foci count. We also compared the correlation between predicted senescence and area. Here too, we see a correlation of around 0.5, and slightly weaker for the subtypes. In sum, there is a considerable correlation between foci counts and senescence.
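
For reference, the Pearson coefficient used throughout this section is the covariance of two per-nucleus measurements divided by the product of their standard deviations. A minimal sketch, with a hypothetical `pearson` helper:

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences.

    Assumes neither sequence is constant (non-zero variance).
    """
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)
```

In this setting `xs` would hold the predicted probability of senescence per nucleus and `ys` the matching foci count or nuclear area.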

Progeria Cell Lines Display Increased Senescence
Patients suffering from premature aging, or progeria, represent genetically well-defined models to understand the molecular basis of aging 32,33. To test if cell lines from progeria patients display accelerated aging in culture, we applied the senescent classifier to primary fibroblasts isolated from patients with Hutchinson-Gilford progeria syndrome (HGPS), ataxia telangiectasia (AT) and Cockayne syndrome (CS) (Fig. 3d). Evaluating the area of the nuclei of progeria cells, we found that in general their mean is significantly larger than controls. Notably, ataxia telangiectasia cells have the largest nuclei at 25% higher than controls, while Hutchinson-Gilford progeria and Cockayne syndrome are both 15% higher (Fig. 3e). We also investigated DNA damage foci and observed that most prematurely aged lines have higher gH2AX and 53BP1 foci counts (Fig. 3f, g, S5c). Further, despite diverse mechanisms, the classifier recognized these cell lines as having significantly greater probability of senescence (Fig. 3h). All progeria lines have high mean probability of senescence at 0.7, indicating that the average cell in each group is considered senescent, while controls are below the standard threshold at 0.3. Evaluating SA-β-gal activity, we find that 35-60% of nuclei are positive, with an overall correlation of 0.5 between predicted senescence and having nearby SA-β-gal (Fig. 3i). When predictions are filtered to higher confidence levels, there is an increase in correlation up to 0.9 (Fig. 3j), indicating high confidence predictions are capturing the senescent state. DAPI intensity also suggests that all progeria lines have higher senescence compared to controls (Fig. 3k). These observations indicate that our classifier may be able to discriminate rates of aging in cultured cells.

To broaden the applicability of our classifier we speculated that it might apply to nuclei from other cell lines and species.
We therefore evaluated the model on mouse primary astrocytes and neurons treated with IR (Fig. 3l). While astrocytes are known to senesce with cell cycle arrest, post-mitotic neurons also exhibit a senescence-like state 34. We first compared the nuclei area and found that the IR-treated astrocytes had slightly but significantly larger nuclei than controls while IR-treated neurons had reduced area, unlike other cell types we studied (Fig. 3m, n). Evaluating DNA damage foci, we see that IR-treated astrocytes and neurons have substantially higher foci counts as expected (Fig. 3o, p, q, r). We next applied the ensemble of deep models and found that the IR-treated astrocytes had a 7.7% higher probability of senescence than controls while IR-treated neurons had a 6.3% higher probability (Fig. 3s, t).

We applied the predictor to H&E-stained liver tissue from C57Bl6 mice taken at 48, 58, and 78 weeks of age. After imaging the tissue sections at 20x, we used a deep learning segmentation model trained on 18 tiles to extract nuclei from 16,187 tiles (Fig. 4a). Our training set included samples of hepatocytes only, and this cell type was primarily selected during automated segmentation. We first analyzed morphological metrics, finding an insignificant increase in nuclear area (Fig. 4b). However, we saw a significant decrease in convexity and increase in aspect ratio, both indicating increased senescence with age (Fig. 4c, d). Nuclei were evaluated for senescence using the normalized RS-only and IR-only models, of which the RS model indicated increasing senescence with age while the IR model did not (Fig. 4e, f). Using the probability, we calculated the percent of senescent cells, finding ~36% for RS and ~99% for IR.
The predictor is trained on DAPI-stained cultured fibroblasts, representing a considerable difference in context; it is therefore likely that the algorithm should be tuned to evaluate other data sources. Applying thresholds of 0.6 and 0.9 for RS and IR, respectively, the percent was brought down to roughly 8-10% to match the percent reported, roughly adjusted for difference in age and split between IR and RS 35. With these thresholds, the percent of senescent cells per mouse increased with age (Fig. 4g, h). To determine the predictor's ability to identify senescent hepatocytes in liver tissue, we also stained tissue sections from the same specimens with DAPI and p21, identified hepatocytes with segmentation, and predicted senescence of those nuclei. We found that the mean predicted senescence per animal for the p21+ cells was significantly higher compared to p21- for both RS and IR models (Fig. 4i, j). Given the differences in human and mouse nuclei as well as between cell types, it is notable that the senescent state can be captured through the relative difference in assessed probability. It therefore appears that our predictor may be able to determine senescence across cell types and species.

staining confirmed that each drug treatment led to a senescent state (Fig. 4k). A morphometric analysis showed that all three drug treatments expanded nuclear area, decreased convexity, and increased aspect ratio (Fig. 4l, m, n). Additionally, DAPI intensity decreased significantly for all three treatments, indicating senescence (Fig. 4o). The predictor model (trained on IR and RS methods) recognized senescence in the nuclei treated with doxorubicin but did not detect senescence in treatment with antimycin A or ATV/r (Fig. 4p). We speculate that doxorubicin treatment more closely resembles the DNA damage caused by IR-induced and replicative senescence.
To address this limitation of our model, we trained new models for each new type, including doxorubicin-only, antimycin A-only, and ATV/r-only (Fig. 4q, r, s). In addition, we trained on a broader data set, including IR, RS, doxorubicin, antimycin A, and ATV/r. Tested on validation data held out from training, we find the expanded model can now recognize antimycin A with 66.0% accuracy, ATV/r with 64.3% accuracy, and doxorubicin with 62.3% accuracy, which exceeds performance for each individual predictor (Fig. 4t).

To determine if the predictor could be used with human dermis, we analyzed samples from an independent data set 39, stained with hematoxylin and DAB for p21 (Fig. S6a). Nuclei were detected using image segmentation with U-Net trained on the hematoxylin nuclei, and the predictor generated senescence probability scores for the extracted nuclei. After calibrating the p21 detection threshold to roughly match published rates, we found the mean predicted senescence of p21+ nuclei was 5.9% higher than those without p21 for the RS and IR models (with p=0.005 for both), while other models showed no difference (Fig. 5a, S6b, c, d, e). As the confidence threshold was raised above the standard 0.5, p21+ was clearly separated from p21-. With increasing confidence, the p21+ nuclei generally showed higher predicted probability for IR, while the p21- nuclei showed lower predicted probability for RS. The percent difference between mean p21+ and p21- probability also increased with higher confidence. Notably, all three other models (for doxorubicin, ATV/r, and antimycin A) showed no separation between p21+ and p21-, indicating that they are picking up on other type-specific aspects of senescence.

slide scanner at 20x. We used U-Net to detect nuclei, extracted nuclear regions, and converted the nuclei to the normalized and masked form (Fig. 5b).
We first evaluated several morphological metrics, including area, convexity, and aspect ratio. Across age, we see no change in area (Fig. 5c), an insignificant change in convexity (Fig. 5d), and a significant change in aspect ratio (Fig. 5e). We considered that different pathologies could be related to various forms of senescence (senescence caused by diverse mechanisms such as DNA damage, telomere attrition, mitochondrial dysfunction, and so on), so we evaluated multiple senescence predictor models developed here. We found the probability of senescence increases with age of patients for RS but is relatively flat for IR and declines for ATV/r, antimycin A and doxorubicin (Fig. S6f, g, h, i, j). We expect a percent of human dermal nuclei to be senescent, ranging from ~1% in young to ~15% in old 39, so we selected thresholds to calibrate the model with 0.7 for RS and 0.85 for IR, leading to an overall predicted percent of ~6% and showing an age-dependent increase in percent of senescence (Fig. 5f, g). Both IR and RS models predict a statistically significant increase with age, while doxorubicin, antimycin A, and ATV/r appear decoupled from age (Fig. S6k, l, m). We also evaluated the correlation between morphological metrics and predicted senescence and found moderate correlation for several metrics, but RS was more correlated with convexity while IR was more correlated with area and aspect ratio, perhaps indicating morphological aspects of each type of senescence in vivo (Fig. S6n, S7a). Interestingly, we found that area was anti-correlated with both predicted IR and RS, but predicted IR was inverse to aspect ratio. This indicates a difference between senescence in culture and in tissue sections and affirms that the IR and RS models are picking up on different aspects of senescence.
We considered whether the age-dependent increase in predicted senescence could be related to a change in proportions of detected cell types, so we compared the distribution of cell area and aspect ratio for broad age groups, individuals below 40 and those over 60. A shift in cell types should be reflected by a change in these metrics, but we found no noticeable difference in the distribution of these metrics for predicted senescent and non-senescent cells between age groups (Fig. S7b, c). Comparing each group by mean area and aspect ratio of individuals, a t-test shows non-significance (p=0.94 and p=0.51, respectively), indicating that each group has a similar proportion of cell types.

in the study. We looked for associations between individuals with diagnosed conditions grouped by ICD-10 chapters and predicted senescence above or below the age-dependent mean (those above or below the trendline in Fig. 5f, g and S6f-m, specifically using residuals from linear regression of predicted senescence versus age), using the chi-squared test and Fisher's exact test for the frequency of occurrence between the two groups (Fig. 5h-m). Remarkably, we found a significant correlation between a rate of senescence below the age-matched mean and the presence of ICD-10 Chapter II Neoplasm diagnosis codes for both RS and IR, with p-values of 0.002 and 0.005, respectively (Fig. 5n). Narrowing down the analysis, we determined the association was based on malignant (versus benign or unknown) codes within ICD-10 Chapter II Neoplasm, with IR p-value at 0.018 and RS at 0.058. Notably, RS better represents replicative senescence which occurs naturally with age, while IR better represents DNA damage, although there is considerable overlap in predictions between the two with this model.
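
The residual-based grouping and the 2x2 tests described in this paragraph can be sketched as follows. This is a minimal illustration under stated assumptions rather than the analysis code: ordinary least squares for the trendline, a chi-squared statistic without continuity correction, and a two-sided Fisher's exact test that sums all tables no more probable than the observed one. All function names are hypothetical.

```python
from math import comb


def linear_residuals(ages, scores):
    """Residuals of predicted senescence after an ordinary least-squares fit on age."""
    n = len(ages)
    mean_a, mean_s = sum(ages) / n, sum(scores) / n
    slope = (sum((a - mean_a) * (s - mean_s) for a, s in zip(ages, scores))
             / sum((a - mean_a) ** 2 for a in ages))
    intercept = mean_s - slope * mean_a
    return [s - (intercept + slope * a) for a, s in zip(ages, scores)]


def contingency(residuals, has_diagnosis):
    """2x2 table: rows = above / below trendline, columns = diagnosis yes / no."""
    table = [[0, 0], [0, 0]]
    for r, dx in zip(residuals, has_diagnosis):
        table[0 if r > 0 else 1][0 if dx else 1] += 1
    return table


def chi_squared(table):
    """Chi-squared statistic for a 2x2 table (no continuity correction).

    Assumes every row and column total is non-zero.
    """
    (a, b), (c, d) = table
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


def fisher_exact_two_sided(table):
    """Two-sided Fisher's exact p-value for a 2x2 table with fixed margins."""
    (a, b), (c, d) = table
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2

    def prob(x):
        # Hypergeometric probability of x counts in the top-left cell.
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    return sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-9))
```

Individuals with a positive residual sit above the age trendline, so the first row of the table corresponds to higher-than-expected predicted senescence for their age.
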
We also scanned individual ICD-10 clinical codes and found several other conditions associated with senescence, including osteoporosis, osteoarthritis, hypertension, cerebral infarction, hyperlipidemia, hypercholesteremia, and hearing loss, which were all significant when evaluated individually but non-significant when applying multiple test correction, such as the Bonferroni correction (Fig. 5n, S7d-o, S8a). All of these conditions were associated with higher levels of predicted dermal senescence except for cancer and hearing loss, which were associated with lower levels of predicted senescence. They draw from different models; for example, neoplasms are particularly significant with RS, while hypertension only appears in the antimycin A model. Overall, we found that high assessed senescence corresponds to fewer neoplasms and malignancies, while also indicating increased frequency of osteoporosis, osteoarthritis, hypertension, and other conditions.

Discussion

In this paper we present a neural network classifier that can predict cellular senescence based on nuclear morphology. Trained on fibroblasts maintained in cell culture, the classifier achieves high accuracy, which was confirmed by applying it to independent cell lines. We also trained models to correctly distinguish between senescence caused by radiation induced damage and replicative exhaustion. By training additional models on samples with reduced features, we infer that the shape of the nucleus alone provides a significant signal to indicate senescent state. DAPI-stained nuclei with background removed, size normalized, and internal structure masked are still classified with high accuracy.
These feature reduction methods serve a secondary purpose, making a model robust to technical variation: our neural network trained on reduced samples can make predictions on nuclei that were prepared in other experimental and imaging contexts. Indeed, the predictor distinguished senescent astrocytes and neurons, predicted an age-related increase in senescent liver cells, and confirmed senescence in cell lines from patients suffering from premature aging. Although it is still debated if universal markers of senescence exist, our findings suggest that at least morphological alterations in nuclei may be common across some tissues and species.

We present several predictor models, including those that combine IR, RS and other methods, and those that specialize on each for improved accuracy. The base model trained on IR and RS can identify either type along with senescence induced by doxorubicin, indicating that the predictor has identified features found in multiple types related to DNA damage. Our base model did not accurately identify ATV/r and antimycin A, but a new model trained on all five methods could accurately identify senescence induced by these diverse mechanisms. The unified model could be identifying a common signature or simply recognizing multiple phenotypes.

Our data show that individuals with a predicted higher rate of senescent cells have reduced neoplasms and malignant cancer, in comparison to those with a lower rate of senescence. This is highly consistent with the notion that senescence is a likely mechanism to control cancer development by limiting uncontrolled proliferation 18.
Further, premalignant tumors express markers of senescence, which are absent in malignancies, and malignant tumors can regress and undergo senescence by switching off oncogenes 17, supporting the protective role of senescence in blocking the progression of neoplasms to malignancies. In addition, loss of central senescence inducers such as p16 is very common in many cancer types 40. Of note, there is also evidence suggesting that cellular senescence promotes malignancy through the inflammatory senescence-associated secretory phenotype (SASP) 41, that senescent cells may appear in areas where tumors tend to subsequently develop 42, and that senescent cells and SASP induced by cancer treatment lead to worse survival and healthspan 43. While the role of senescence in cancer is highly complex, our results based on clinical data support an overall protective role for senescence in human health with regard to cancer. We also found several other conditions often associated with senescence, including osteoporosis, osteoarthritis, hypertension, cerebral infarction, hyperlipidemia, and hypercholesteremia, which appear more frequently in individuals with a higher predicted rate of senescence.

We also investigated how our deep learning predictor results correspond to other measures of senescence. Nuclear area is known to expand during senescence 13,44,45, and we confirmed this in our cell culture data set, with significant differences in IR and RS senescent cells. On a per-nucleus basis, we found a moderate correlation between area and predicted senescence. However, due to our size normalization, it is unlikely that this classic feature is the primary signal for our deep learning model (at least for the size-normalized version). We also identified convexity and aspect ratio as key morphological properties that differ between control and senescent cells in culture and found moderate correlation between each of these properties and predicted senescence. Interestingly, we found no increase in area with age in the human dermis, but a significant increase in aspect ratio and a significant decrease in convexity, indicating that nuclei become stretched and irregular with advancing age in humans. These observations confirm that size normalization is necessary to generalize our neural network classifier. They also demonstrate the value of our feature-neutral approach, where the neural network is trained to identify senescence from rich image data, and the data are later reduced through feature removal.

In sum, our deep neural network model is capable of accurately predicting the senescent state and type from nuclear morphology using several imaging techniques and has been demonstrated with several diverse applications. We applied the predictor to human skin samples and observed an age-dependent increase in senescence. Remarkably, individuals who appear to have higher rates of senescent cells show reduced incidence of malignant neoplasms. This supports the long-standing hypothesis that senescence is a mechanism to limit cancer. Further, we find an association between higher predicted senescent cell burden and other conditions, including osteoporosis, osteoarthritis, hypertension, cerebral infarction, hyperlipidemia, and hypercholesteremia.

Induction of cellular senescence by ionizing radiation, doxorubicin, antimycin A and atazanavir/ritonavir was performed according to 46. Briefly, control fibroblast cells at early passages were seeded in 96-well plates (Corning, 3340) at a density of 2,000 cells per well. The day after, cells were either exposed to 10 Gy of ionizing radiation or treated with 250 nM doxorubicin for 24 h and cultured for the next nine days. Medium was replaced every two days. Three days before irradiated or doxorubicin-treated cells reached the senescent state, fibroblast cells from the same stock were seeded (2,000 cells/well) as mock-irradiated or DMSO-treated controls. Mitochondrial dysfunction-induced senescence was achieved by treating control fibroblast cells with 250 nM antimycin A every two days over ten days. To develop a senescence phenotype, 25 μM atazanavir/ritonavir was given to control fibroblast cells every two days over fourteen days. Corresponding DMSO-treated controls were cultured in parallel and seeded in 96-well plates three days before terminating the experiment.
Several images were selected arbitrarily from each group for a total of ~20 samples, and all nuclei in the training samples were annotated by selecting the nuclear region. U-NET, a 23-layer fully convolutional network for image segmentation, was trained using the samples, learning to associate the DAPI images with annotation masks indicating nuclear regions. Our implementation of U-NET is largely based on the original U-NET 47, but includes a dropout layer after each of the convolutional and deconvolutional layers to reduce overfitting. After training for 1000 epochs, the U-NET model was used to detect nuclei for all 4796 tiles (1199 images x 4 tiles/image), producing output images of predicted nuclear regions. The images with predicted nuclei were scanned for recognition regions with an area between 500 and 15,000 pixels. Each detected nucleus was extracted along with its surrounding context as a centered 128x128 pixel region and used to assemble a base library of 95,152 nuclei. In addition, the recognition region itself was cut out, providing a two-color reduction of the detected nuclei, and assembled into a secondary library of nuclei masks.
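The region scan and centered extraction described above can be sketched in a few lines. This is a minimal illustration and not the authors' implementation: it assumes the U-NET output has already been thresholded to a binary mask, uses 4-connectivity, treats the area limits as inclusive, and ignores nuclei whose window would cross the tile border. The function names `find_regions`, `select_nuclei`, and `crop_window` are hypothetical.

```python
from collections import deque

MIN_AREA, MAX_AREA = 500, 15_000   # pixel-area limits quoted in the text
CROP = 128                         # side length of the extracted window


def find_regions(mask):
    """Collect 4-connected foreground regions of a binary mask (list of rows)."""
    h, w = len(mask), len(mask[0])
    seen = [[False] * w for _ in range(h)]
    regions = []
    for r0 in range(h):
        for c0 in range(w):
            if not mask[r0][c0] or seen[r0][c0]:
                continue
            seen[r0][c0] = True
            queue, pixels = deque([(r0, c0)]), []
            while queue:  # breadth-first flood fill from the seed pixel
                r, c = queue.popleft()
                pixels.append((r, c))
                for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
                    if 0 <= nr < h and 0 <= nc < w and mask[nr][nc] and not seen[nr][nc]:
                        seen[nr][nc] = True
                        queue.append((nr, nc))
            regions.append(pixels)
    return regions


def select_nuclei(mask, min_area=MIN_AREA, max_area=MAX_AREA):
    """Keep regions whose pixel area lies inside [min_area, max_area] (inclusive, assumed)."""
    return [p for p in find_regions(mask) if min_area <= len(p) <= max_area]


def crop_window(pixels, size=CROP):
    """(top, left, height, width) of a window centered on the region centroid."""
    cr = round(sum(r for r, _ in pixels) / len(pixels))
    cc = round(sum(c for _, c in pixels) / len(pixels))
    return cr - size // 2, cc - size // 2, size, size
```

The returned window may extend past the tile edge; how such border cases were handled is not stated in the text, so any padding or rejection rule would be a further assumption.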

Nuclear Morphology
An analysis of the nuclei was performed to assess morphological properties. The two-color mask library was used, since it provided a universal representation of the detected nuclei (with U-NET detector models that have good coverage of the nuclear region). Nuclear morphology was assessed using several metrics, including area, perimeter, moments, convexity, and aspect ratio. Convexity is the ratio of perimeter to convex hull perimeter, which provides a size-neutral measure of boundary regularity. The convex hull is a polygon that connects the outer edges of nuclei like an envelope.
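As an illustration of these metrics, the sketch below computes area, perimeter, convexity, and aspect ratio for a nucleus outline given as a list of (x, y) vertex tuples. Several choices here are assumptions rather than the authors' method: the outline is treated as a simple polygon, the convexity ratio is taken as hull perimeter over outline perimeter (so a convex outline scores 1 and an irregular one scores below 1, matching the direction in which convexity is reported to fall for irregular nuclei; the reciprocal gives the other convention), and aspect ratio is taken from the axis-aligned bounding box.

```python
import math


def perimeter(poly):
    """Length of the closed polygon through the given (x, y) vertices."""
    return sum(math.dist(poly[i], poly[(i + 1) % len(poly)]) for i in range(len(poly)))


def area(poly):
    """Polygon area by the shoelace formula."""
    s = sum(x1 * y2 - x2 * y1
            for (x1, y1), (x2, y2) in zip(poly, poly[1:] + poly[:1]))
    return abs(s) / 2


def convex_hull(points):
    """Andrew's monotone chain; hull vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    def half(seq):
        chain = []
        for p in seq:
            # Drop the last vertex while it would make a non-left turn.
            while len(chain) >= 2 and cross(chain[-2], chain[-1], p) <= 0:
                chain.pop()
            chain.append(p)
        return chain[:-1]

    return half(pts) + half(reversed(pts))


def convexity(poly):
    """Hull perimeter over outline perimeter (direction is an assumption)."""
    return perimeter(convex_hull(poly)) / perimeter(poly)


def aspect_ratio(poly):
    """Longer over shorter side of the axis-aligned bounding box (an assumption)."""
    xs, ys = [x for x, _ in poly], [y for _, y in poly]
    w, h = max(xs) - min(xs), max(ys) - min(ys)
    return max(w, h) / min(w, h)
```

Because both perimeters scale together when the outline is enlarged, the ratio is unaffected by nuclear size, which is the size-neutral property the text refers to.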

Senescent Classification
After assembling a library of senescent cells, a deep neural network was trained to classify DAPI-stained nuclei as senescent or non-senescent. The training set was based on several cell lines (GM22159, GM03349, and GM05757), while additional cell lines GM22222 and AG08498 were used for testing. Training samples were randomized and split into 80% for training and 20% for validation. Due to the experimental setup, the sample classes are unbalanced, with 75.2% control, 11.2% RS, and 13.6% IR. The samples were balanced during training by applying class weights in inverse proportion to the class abundance (for example, senescent samples composed of IR and RS were fewer in number and therefore valued 3x higher than controls). Image samples were normalized for brightness/intensity by adjusting each image's mean intensity to 0 and standard deviation to 1. Augmentation was also applied during training, randomly modifying samples: adjusting size from 80% to 120%, changing normalized brightness from 70% to 130%, flipping horizontally and vertically, and rotating up to 180 degrees. For each epoch, one augmentation cycle was performed. Training was done with Xception, a 48-layer model, initialized with ImageNet weights but set to allow weight adjustment of all layers during training. The top layer was replaced by a layer of one-hot nodes to indicate the state as controls or senescent (or as a tri-state model with controls, IR, or RS to indicate the type of senescence). With this minor adjustment, the model provided 37,640,234 trainable parameters. Training was done using Adam with the learning rate set to 1×10⁻⁴ for 10 epochs, in which time accuracy rapidly converged to a steady level. In addition, a simpler custom model was tested, with three convolutional layers with ReLU activation and two dense layers with L1/L2 regularization of 0.05/0.05 and 30% dropout. This model required 713,296 parameters.
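The class weighting and intensity normalization described here reduce to simple arithmetic, sketched below under stated assumptions. The class fractions are those quoted in the text; scaling the weights so that the most abundant class has weight 1 is our choice for readability, and the helper names `inverse_weights` and `standardize` are hypothetical. This is an illustration, not the training code.

```python
import math

# Class fractions quoted in the text.
fractions = {"control": 0.752, "RS": 0.112, "IR": 0.136}


def inverse_weights(fracs):
    """Weights proportional to 1/abundance, scaled so the most common class is 1."""
    top = max(fracs.values())
    return {name: top / f for name, f in fracs.items()}


def standardize(pixels):
    """Shift and scale a flat list of intensities to mean 0 and standard deviation 1."""
    n = len(pixels)
    mean = sum(pixels) / n
    std = math.sqrt(sum((p - mean) ** 2 for p in pixels) / n)
    return [(p - mean) / std for p in pixels]


# Two-class case: RS and IR pooled into a single senescent class.
binary = {"control": fractions["control"],
          "senescent": fractions["RS"] + fractions["IR"]}
binary_weights = inverse_weights(binary)

# Tri-state case: controls, RS, and IR weighted separately.
tri_weights = inverse_weights(fractions)
```

With RS and IR pooled, the senescent class makes up 24.8% of samples, so its weight is 0.752 / 0.248, or roughly 3, which agrees with the 3x figure given in the text.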
For both network designs, we trained with raw images along with several modified image sets, where the background was removed, the nuclei were size normalized, and the inner details of nuclei were entirely masked (Fig. 1a). All three techniques were based on the detected nuclei. To remove the background, the area outside of the nuclei was set to 0. Size was normalized by rescaling all nuclei so the larger of the two dimensions was a standard size of 80 pixels. Finally, the size-normalized detection region was used for the masked nuclei set.
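The background removal and size normalization steps can be illustrated as follows. The sketch assumes the "two dimensions" are the height and width of the bounding box of the detected region, and it only computes the scale factor and target shape, leaving the actual resampling to an image library. The helper names are hypothetical and the code is not the authors' implementation.

```python
TARGET = 80  # standard size in pixels for the larger dimension, as stated in the text


def bounding_box(mask):
    """Inclusive (top, left, bottom, right) of the nonzero pixels in a binary mask."""
    rows = [r for r, row in enumerate(mask) if any(row)]
    cols = [c for c in range(len(mask[0])) if any(row[c] for row in mask)]
    return rows[0], cols[0], rows[-1], cols[-1]


def normalized_size(mask, target=TARGET):
    """Scale factor and rescaled (height, width) so the larger side equals target."""
    top, left, bottom, right = bounding_box(mask)
    h, w = bottom - top + 1, right - left + 1
    scale = target / max(h, w)
    return scale, (round(h * scale), round(w * scale))


def remove_background(image, mask):
    """Set every pixel outside the detected nucleus to 0."""
    return [[v if m else 0 for v, m in zip(img_row, mask_row)]
            for img_row, mask_row in zip(image, mask)]
```

Scaling both sides by the same factor preserves the aspect ratio of each nucleus while discarding absolute size, so shape information survives the normalization.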

Bayesian Neural Network
We used TensorFlow Probability to create a Bayesian neural network (BNN). We first converted the simple custom model, replacing nodes with the comparable FlipOut version 28, which assumes that the kernel and bias are drawn from a normal distribution. During a forward pass, kernels and biases are sampled from the posterior distribution. Targets were encoded as above, and the loss function used was cross entropy plus KL divergence divided by the number of batches. We also partially converted Xception to a BNN by replacing all dense and convolutional layers with FlipOut nodes, leaving separable convolutions unconverted since a FlipOut version was not available. In addition, we fully converted InceptionV3 for evaluation. Inference was done by evaluating the model 20 times to produce a distribution of predictions, and then taking the mean probability for each sample.

All comparisons between groups of samples were made using one-way ANOVA F-tests to evaluate differences in the means, followed by pair-wise tests using Tukey's HSD (Honest Significant Difference) to calculate p-values between groups. Linear regression methods were evaluated with R and p-value statistics. Groups of patients were compared using the chi-squared test and Fisher test to detect significant differences between frequencies. Correlation was evaluated using the Pearson colocalization coefficient.

Pathology sample selection
The individuals were sampled from patients for whom samples of naevi on non-sun exposed skin had undergone pathology without malignant findings at a major pathology department in Copenhagen. The patient sample was selected to have a flat distribution of age. We