The study cohort
A total of 182 extremely preterm infants born before 28 weeks of gestation from the Mega Donna Mega study were selected based on the availability of longitudinal serum samples (Fig. 1a)19. Altogether, 105 boys and 77 girls were included. Serum samples were collected repeatedly at nine planned time points (visits) from birth to term-equivalent age. Forty (22%) infants had nine complete samples, and 165 (91%) infants had at least six samples. Among the 182 infants included in the study, 177 (97.3%) survived to 40 weeks PMA. The enrolled infants were classified into three groups depending on GA at birth: group 1, born at less than 25 + 0 (weeks + days) (N = 61); group 2, born at 25 + 0 to 26 + 6 (weeks + days) (N = 81); and group 3, born at 27 + 0 to 27 + 6 (weeks + days) of gestation (N = 40) (Fig. 1b). The birth weights varied from 425 to 1345 grams (Supplementary Fig. 1a). Almost all (98.4%) of the infants had a birth weight appropriate for gestational age, with a standard deviation score (SDS) > -2 (Supplementary Fig. 1b)20. Vaginal delivery was more common among infants born at lower gestational age (Mann–Whitney U-test, P = 0.017). We observed no significant differences in sex distribution or postnatal growth between GA groups (Fig. 1b, Supplementary Fig. 1b-d). The clinical characteristics of the three GA groups are summarized in Supplementary Data 1.
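The GA grouping rule described above can be sketched in a few lines of Python. This is only an illustration of the stated cut-offs; the function name `ga_group` and the example gestational ages are hypothetical and are not taken from the study data.

```python
from collections import Counter


def ga_group(weeks: int, days: int) -> int:
    """Return the GA group (1, 2, or 3) for a gestational age given as weeks + days."""
    if not 0 <= days <= 6:
        raise ValueError("days must be between 0 and 6")
    total_days = weeks * 7 + days
    if total_days >= 28 * 7:
        raise ValueError("the cohort only includes infants born before 28 + 0 weeks")
    if total_days < 25 * 7:
        return 1  # born at less than 25 + 0
    if total_days < 27 * 7:
        return 2  # born at 25 + 0 to 26 + 6
    return 3      # born at 27 + 0 to 27 + 6


# Hypothetical gestational ages (weeks, days), for illustration only.
example = [(23, 4), (24, 6), (25, 0), (26, 6), (27, 0), (27, 6)]
counts = Counter(ga_group(w, d) for w, d in example)
```

Working in total days avoids boundary mistakes at 24 + 6 versus 25 + 0 and at 26 + 6 versus 27 + 0.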
Proteome profiling demonstrated dynamic changes in blood proteins after birth
We analyzed 538 unique protein targets measured by six Olink PEA panels, including cardiometabolic, cardiovascular II and III, development, inflammation, and metabolism, for all 1335 collected blood serum samples (Supplementary Data 2). Protein levels measured as Normalized Protein Expression (NPX) were determined for each target and sample. An example of the protein expression determined by the Olink PEA technique can be seen in Fig. 2a, where the levels of fibroblast growth factor 21 (FGF-21) are shown from birth to full-term age (PMW 40). FGF-21 is a protein involved in metabolism and growth by regulating insulin sensitivity and glucose uptake21. We observed higher levels with increasing postnatal age of the infant (Fig. 2a), which is consistent with our previous report that FGF-21 serum levels were elevated after birth9. No differences in FGF-21 expression between sexes could be seen.
To explore the variance of protein levels, inter-individual and intra-individual variations were calculated for each protein across all 182 infants and nine visits (Fig. 2b, Supplementary Data 3). Most proteins showed considerable variability in both measures, and FGF-21 was the most variable protein in the analysis. To get a comprehensive overview of the postnatal changes in the blood proteins, we analyzed the time-course expression patterns of the variable blood proteins from birth to 40 weeks PMA. Proteins differentially expressed over time (451 out of 538) were identified using linear mixed-effect modeling with a Benjamini-Hochberg adjusted p-value < 0.01, including 196 up-regulated proteins and 255 down-regulated proteins across the study visits (Fig. 2c). The longitudinal changes in expression for each differentially expressed protein can be seen in the circular heatmap in Fig. 2d. Unsupervised hierarchical clustering analysis was further performed on the longitudinal expression profiles of the proteins based on Pearson correlation. A total of eight separate clusters, ranging in size from 34 to 84 proteins, were identified with variable time-course patterns (Fig. 2d and 2e, Supplementary Data 4). As seen in Fig. 2e and Supplementary Fig. 2a and 2b, five clusters display overall declining protein levels and three display increasing trends. The proteins most strongly changed over time included leptin (LEP), LDL-receptor, and several placenta-elevated proteins, including Fc fragment of IgG receptor IIa (FCGR2A) and CGA (Supplementary Fig. 2c). The effect of GA at birth on the clustering trends was further explored. As seen in Supplementary Fig. 3a and 3b, protein levels in the three GA groups were almost the same at birth, indicating similar protein expression patterns regardless of GA.
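As an illustration of the multiple-testing step mentioned above, the Benjamini-Hochberg adjustment can be sketched as follows. This is a minimal standard-library implementation applied to hypothetical raw p-values. It is not the authors' pipeline, and the linear mixed-effect models that would produce the raw p-values are not reproduced here.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original input order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical raw p-values for four proteins, for illustration only.
raw = [0.0005, 0.004, 0.03, 0.2]
adj = benjamini_hochberg(raw)
# Apply the 0.01 threshold on the adjusted p-values, as stated in the text.
significant = [q < 0.01 for q in adj]
```

Each raw p-value is scaled by the number of tests divided by its rank, and the running minimum ensures that a smaller raw p-value never receives a larger adjusted value than a bigger one.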
Functional analysis of the protein clusters
The functions of the proteins in each of the eight identified clusters were explored. Tissue specificity of the proteins in the clusters was analyzed based on the Human Protein Atlas (HPA) classification22,23. This classification, described elsewhere, considers the level of gene expression in each tissue to determine the degree of specificity. Of the 451 proteins in the eight clusters, 101 proteins (22%) were annotated as tissue-enriched according to the HPA classification (Fig. 3a and Supplementary Data 5). The analysis showed that most of the liver-, lymphoid tissue-, and salivary gland-enriched proteins increased after birth, indicating the development of hepatic functions and immune and metabolic shifts during the neonatal period. Two examples are carboxylesterase 1 (CES1), a primary liver enzyme that functions in liver drug clearance24, and Fc fragment of IgE receptor II (FCER2), which has essential roles in B cell growth and differentiation, as well as in the regulation of IgE production (Fig. 3b)25. Many proteins that decreased after preterm birth were associated with the placenta, pancreas, and bone marrow, consistent with our previous findings (Fig. 3a)9. For example, carboxypeptidase A1 (CPA1), a pancreas-enriched protein, is produced in the pancreas and preferentially cleaves C-terminal branched-chain and aromatic amino acids from dietary protein26. Hepatocyte growth factor (HGF) is an acidic protein with a strong mitogenic effect on hepatocytes; however, it is also enriched in the placenta, with strong expression in the villous syncytium, extravillous trophoblast, and amnionic epithelium (Fig. 3b)27,28.
To explore the postnatal development of the immune system in preterm infants, we investigated the immune cell specificity of the proteins with longitudinal dynamic changes. The cellular specificity was determined based on the gene expression levels in the 18 different immune cell types from the HPA (Fig. 3c and 3d)23. Here, 43 (9.5%) proteins were annotated as immune cell type enriched according to the HPA classification, meaning that their expression was at least four-fold higher in one cell type than in all other cell types (Supplementary Data 6). A large fraction of the proteins in clusters 6, 7, and 8, with increasing levels after birth, are enriched in plasmacytoid dendritic cells. In contrast, most proteins with decreasing trends (clusters 1, 2, and 5) are enriched in basophils or neutrophils. The proteins in cluster 3, with an increasing trend followed by decreased protein levels, have a mixed immune cell origin, including proteins enriched in T-cells, eosinophils, neutrophils, and basophils. This suggests that the expression activity of the dendritic cells increases after birth in these preterm infants, while the expression of proteins from basophils and neutrophils decreases.
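The four-fold rule described above can be illustrated with a short sketch. It implements only the criterion as stated in the text, without any further conditions that the full HPA classification may apply. The function name `enriched_cell_type` and the expression values are hypothetical.

```python
def enriched_cell_type(expression, fold=4.0):
    """Return the cell type whose expression is at least `fold` times that of
    every other cell type, or None if no cell type meets the criterion."""
    if len(expression) < 2:
        return None
    # Rank cell types from highest to lowest expression.
    ranked = sorted(expression.items(), key=lambda kv: kv[1], reverse=True)
    (top_type, top_value), (_, second_value) = ranked[0], ranked[1]
    # Beating the runner-up by `fold` implies beating all other cell types.
    if top_value > 0 and top_value >= fold * second_value:
        return top_type
    return None


# Hypothetical expression values per immune cell type, for illustration only.
protein_a = {"plasmacytoid DC": 40.0, "basophil": 5.0, "neutrophil": 2.0}
protein_b = {"basophil": 12.0, "neutrophil": 6.0, "T-cell": 1.0}
label_a = enriched_cell_type(protein_a)
label_b = enriched_cell_type(protein_b)
```

In this toy input, the first protein is called enriched in plasmacytoid dendritic cells (40 is at least four times 5), whereas the second is not enriched in any cell type because basophil expression is only twice that of neutrophils.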
In addition, functional enrichment analyses were performed to explore the modulated pathways for each identified cluster (Supplementary Fig. 4 and Supplementary Data 7a and 7b). As expected, multiple immune-related pathways were activated after preterm birth, including T cell activation, the inflammatory response pathway, the interleukin-17 signaling pathway, and the hematopoiesis pathway. Interestingly, the receptor for advanced glycation end products (RAGE) pathway, which plays a vital role in leukocyte recruitment and has relatively high blood levels in extremely premature infants29, was deactivated during postnatal development.
Variance analysis of blood protein profiling after birth
To assess how clinical aspects affect the blood protein expression levels, we established a linear regression model for each protein target and included four factors: PNA, GA at birth, sex, and mode of delivery (see Fig. 4a and 4b). The model revealed that the protein expression variations were primarily associated with PNA. This indicates that, with regard to protein variation, postnatal time is dominant compared to gestational age at birth. Nonetheless, GA at birth was the second most explanatory factor. Sex and delivery mode affected a few specific proteins; overall, however, they were substantially less influential than PNA. The contribution of the predictor variables is summarized in Supplementary Data 8. Moreover, to further evaluate the impact of PNA on the blood proteome, we investigated the potential of the blood proteome as a predictor of PNA. We employed generalized linear models with an elastic-net penalty and identified a ‘blood proteomic clock’ comprising 151 proteins (Supplementary Data 9). The predicted PNA was highly consistent with chronological age, as demonstrated by a Pearson correlation coefficient of 0.98 (Fig. 4c). This suggests that the blood proteins are reliable measurements for estimating the PNA of preterm infants. Interestingly, among the samples predicted to be younger than the actual PNA, weight gain tended to be lower than normal (Supplementary Fig. 5).
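The agreement between predicted and chronological PNA reported above is a Pearson correlation, which can be sketched as follows. The ages below are hypothetical values in days and are not study data. The elastic-net model itself is not reproduced, and the `predicted` list simply stands in for the output of such a model.

```python
from math import sqrt


def pearson_r(x, y):
    """Return the Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of cross-products and squared deviations from the means.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / sqrt(sxx * syy)


# Hypothetical chronological and model-predicted PNA (days), for illustration only.
chronological = [1, 7, 14, 28, 56, 84]
predicted = [3, 6, 16, 25, 58, 80]
r = pearson_r(chronological, predicted)
# Flag samples whose predicted age is younger than the chronological age.
predicted_younger = [p < c for p, c in zip(predicted, chronological)]
```

The boolean flags mirror the comparison made in the text, where samples predicted to be younger than their actual PNA were examined in relation to weight gain.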
The proteins with the highest effect within each analyzed factor are highlighted in Fig. 4d-g. The influence of PNA was most prominent for the delta-like non-canonical Notch ligand 1 (DLK-1), with 75.1% of the serum protein level variance explained by PNA (Fig. 4d). DLK-1, also known as preadipocyte factor 1 (PREF1), is a marker of preadipocytes and inhibits adipogenesis30. It has been proposed that its function is to shift metabolism from lipid storage to peripheral lipid oxidation and to act as a mediator of metabolic adaptation in early life31. DLK-1 levels were constant up to 30 weeks PMA, after which they decreased considerably, as shown in Fig. 4d.
As mentioned above, PNA seems more important than GA at birth in determining protein variance; however, for some proteins, variance is more strongly associated with GA at birth (seen to the right in Fig. 4a). One example was the tissue factor pathway inhibitor (TFPI), the primary inhibitor of the extrinsic coagulation pathway32. As illustrated in Fig. 4e, apparent differences in protein expression for TFPI were observed between the three GA groups. The infants born at younger gestational ages had persistently higher expression until full term, when the expression levels converged. Furthermore, we show that the glycoprotein hormones alpha chain (CGA) levels decreased rapidly during the first days after birth, with a more pronounced decline in males, resulting in longitudinally lower levels in the male infants (Fig. 4f). CGA is one of the subunits that form the hormones human chorionic gonadotropin (hCG), luteinizing hormone (LH), follicle-stimulating hormone (FSH), and thyroid-stimulating hormone (TSH)33. Several proteins related to the mode of delivery were identified (Fig. 4a). The strongest association was seen for surfactant protein D (PSP-D, also called SP-D). An elevated level of PSP-D was observed in infants delivered by cesarean section (Fig. 4g).
Distinct and coherent evolution of blood protein profiles over time
To investigate the global molecular dynamics of preterm infants, we performed several dimensional reduction analyses, including Uniform Manifold Approximation and Projection (UMAP), based on the longitudinal protein expression profiles of all 182 infants (Fig. 5a, Supplementary Fig. 6a and 6b). When all 1335 samples were visualized, the UMAP results revealed a distinct and stereotypic evolution of blood protein profiles from birth to term-equivalent age. The proteomes of the majority of the infants followed a predestined pathway, regardless of sex and mode of delivery (Supplementary Fig. 7a and 7b). Correspondingly, the results of principal component analysis (PCA) (Supplementary Fig. 8a) and the related diffusion map (Supplementary Fig. 8b) both demonstrated similar results, with the samples following a clear pattern based on time since birth.
Interestingly, the protein profile trajectory was most coherent right at birth and at full term (Fig. 5b). Moreover, the most pronounced diversity in protein expression was observed at one week postnatal age. This suggests that the infants start life with similar protein profiles, followed by an interval where internal or external factors might be more influential, before the protein expression of most infants converges again.
To examine the observed protein evolution in relation to neonatal immaturity, the UMAP result was investigated based on GA at birth as a proxy for fetal maturation. As seen in Fig. 5c, GA at delivery seemed to be of minor importance for protein expression at birth, as no distinction between GA groups can be seen on the first day of life. However, the degree of immaturity plays an increasing part in the differentiation, with a peak around one to two weeks after birth, where the samples are clearly separated depending on GA at birth. Further on, the profiles converge once again as the infants grow older. From 30 weeks PMA, and especially from 32 weeks through 40 weeks PMA, no separation can be seen between GA groups. In addition, a comparison of the growth trajectories of infants in the three GA groups showed that those with lower GA tended to have a slower rate of growth (Fig. 5d).
Gestational effects on blood protein profiling
The importance of GA for protein expression is additionally illustrated in Fig. 6a, where the numbers of proteins affected by the factors GA, sex, and delivery mode are presented per sampling time point. Consistent with Fig. 5c, GA at birth is most influential at one week of PNA, with a drastic decline at later postnatal ages. To further explore how the GA groups differ, the differentially expressed proteins (DEPs) between GA groups on PNA day 7 were analyzed by ANOVA (Supplementary Data 10) and visualized in a volcano plot (Fig. 6b). In total, 86 DEPs were identified, with some examples of proteins with decreased or increased levels in the infants with more advanced GA (Fig. 6c). The top 30 most significant DEPs were further analyzed in the radar plots (Fig. 6d and Supplementary Fig. 9). This analysis revealed the same pattern as seen in Fig. 5b and 6a. The three GA groups have similar protein profiles at birth, diverge into clearly varying trends at one week PNA, but converge at full term.