In this study, we propose an expanded concept of HKGs that goes beyond their traditional definition of ubiquitous expression, emphasising their vital role in maintaining phenotypes during tissue differentiation. Our findings indicate that the proximity of retroelements, especially Alu elements, to TSSs significantly influences gene expression stability. This finding underscores the importance of epigenetic mechanisms, such as DNA methylation and chromatin organisation, in stabilising HKG expression32,37,38. The observed variability in transcript quantification, especially in CpG-island genes associated with Alu elements, highlights the complexity of their regulation and the significant impact of retroelement proximity on gene expression stability. By incorporating these elements into the HKG definition, we offer a more comprehensive understanding of their regulatory roles, emphasising their contribution to maintaining transcriptome balance and ensuring consistent phenotypic outcomes in progeny cells during differentiation26,37.
Our findings indicate that the architecture surrounding TSSs, particularly within CpG islands, is crucial for modulating HKG expression patterns. The central regions of CpG islands are protected from methylation-induced deamination, thus preserving the essential cellular functions of HKGs39,40. Conversely, the margins of CpG islands are susceptible to methylation, largely due to the presence of densely methylated retroelements near genes26,37,41. In embryonic stem cells, the presence of Alu elements adjacent to CpG islands may help maintain an open chromatin state and hypomethylation, facilitating high gene expression15,42. During tissue differentiation, tissue-specific genes become strongly upregulated, leading to subtle downregulation of numerous HKGs, which is vital for maintaining transcriptome balance. Alu elements not only produce suppressive Alu RNAs that bind to downregulated genes but also influence nearby methylation processes35,36,43,44. Given the cell-division-dependent nature of methylation changes, retroelement-associated methylation patterns may spread into nearby downregulated HKGs, stabilising phenotypic traits in daughter cells, preventing dedifferentiation and promoting robust tissue-specific terminal differentiation26,28,37,45. This strategic arrangement of retroelements is essential for regulating genome-wide methylation and facilitating advanced tissue differentiation in multicellular organisms.
The HKG identification system shows potential as a marker for distinguishing various stages of differentiation in primary tissues. Although many markers have been suggested for identifying stem cells in primary tissues, their application in actual tissue differentiation has proven challenging46. The concentration of stem cell markers, which are produced early in differentiation, decreases due to asymmetric cell division, and stem cells exhibit various phenotypes46,47. However, epigenetic factors are inherited by progeny cells, maintaining epigenotypes and synchronising gene expression during differentiation27,48. Insufficient differentiation, often due to factors like inflammation or the recruitment of numerous stem cells, may result in an increased number of undifferentiated cells. In previous studies, cells with epigenetic instability were identified in the background mucosa of gastric cancer patients33,34. Recognising these unstable cells can help predict disease occurrence.
Our results demonstrate that CpG-island genes exhibit distinct expression patterns across different tissues, with particularly high expression levels in embryonic stem cells and significant downregulation in differentiated tissues such as the liver, whole blood and pancreas. In fibroblast cell lines, a marked increase in the expression of a particular CpG-island gene group, including fibronectin, suggests that these cells require destabilisation of CpG-island genes for in vitro expansion and trait transformation. Furthermore, compared to embryonic stem cells, germline tissues such as the ovary and testis, which undergo genome-wide hypomethylation, showed increased expression of Alu-adjacent CpG-island genes. These findings imply that the HKG identification system can be used to infer the stages of tissue differentiation. Moreover, our study addresses the issues with recent HKG identification approaches by proposing a unified framework that incorporates gene architecture and epigenetic regulation1. This approach minimises variations in interpretations among researchers and provides a deeper understanding of gene expression dynamics across different stages of cellular differentiation6–8. These insights have potential implications for identifying accurate biomarkers and therapeutic targets in developmental biology and disease treatment.
In this study, many of the 56 tissues, ranging from embryonic stem cells to differentiated tissues, showed a gradual decrease in CpG-island gene expression, with variations observed based on the distribution of Alu retroelements. However, there are inherent limitations in directly analysing these patterns for tissue classification. Tissues contain a mixture of cell types with differing levels of differentiation. Additionally, DNA methylation changes after the formation of active chromatin under the influence of the tissue microenvironment45. This means that CpG-island genes in active chromatin may exhibit strong gene expression and be less influenced by methylation49,50. These factors must be considered when analysing tissue-specific HKG stability. Future studies should employ refined experiments that consider tissue regions, single-cell levels, and differentiation stages to accurately infer HKG stability patterns in tissues.
While numerous studies have documented tissue-type-dependent methylation patterns, the specific effects of retroelements adjacent to genes, particularly at the margins of CpG islands, remain underexplored29,31,51. The variability at these margins, often compounded by low GC content, poses significant obstacles to experimental reproducibility33,34,52. Furthermore, the epigenetic heterogeneity across different cells necessitates the use of small tissue samples to accurately detect subtle methylation changes53,54. Prior research, which involved a meticulous selection of methylation sites and stringent experimental conditions to reduce PCR amplification bias, provided foundational insights into genome-wide methylation changes13,53–55. Notably, previous analyses revealed that age-related concurrent methylation changes in CpG-island genes are closely correlated with the type and proximity of adjacent retroelements32–34. This work provides a deeper understanding of the complex interplay between retroelements and gene expression. The current analysis further confirmed the significant influence of retroelement distribution on expression patterns and highlighted its implications for understanding the complexities of genomic regulation across different tissue types.
We observed weak associations between L1 and LTR elements and the properties of HKGs. Notably, L1 elements, predominantly found in genes lacking CpG islands, appear to play a specialised role in modulating gene expression, similar to their involvement in X-chromosome dosage compensation, where they contribute to the repression of neighbouring gene expression56. During early cell differentiation, tissue-specific genes without CpG islands exhibited strong expression, facilitated by chromatin remodelling, and they were unaffected by L1-related methylation due to their low CpG density57. Additionally, L1 methylation may downregulate other genes weakly expressed within heterochromatin regions. LTR elements, which are prevalent throughout the genome and especially common in facultative heterochromatin37, display distinct differences in activity between species; they remain largely fossilised and inactive in humans but are actively expressed in mice, suggesting that LTRs have evolved divergent roles in genomic regulation across different mammalian lineages18,37,58. Previous studies demonstrated that LTRs are associated with slow methylation changes in CpG-island genes32–34, emphasising their pivotal role in long-term gene expression regulation.
Our investigation faced several constraints, notably the presence of alternative splicing variants1, which introduce variability into TSSs and affect associated retroelement distributions. The RNA sequencing libraries we used lacked the detailed information needed to effectively differentiate these variants, complicating our analysis of gene expression patterns. Furthermore, we did not extensively explore the sequence characteristics at the TSSs used to identify HKGs in this study. HKGs are reported to exhibit a broader range of sequence variations than previously understood, including variations in some promoter binding sites16. Due to the complexity of the data and the multitude of variables present, we prioritised retroelements as our primary focus. Preliminary observations suggested that shorter CpG islands might correlate with HKG attributes, but we did not thoroughly analyse that relationship. The proximity of retroelements may reduce CpG-island length, potentially skewing results. Future research should focus on enhancing the resolution of variant data in RNA sequencing libraries and expanding the analysis of sequence variations and CpG island characteristics for HKG identification.
Our research extends beyond the traditional view of retroelements as merely parasitic DNA, underscoring their pivotal role in the evolution of eukaryotic genomes. Whereas prior studies focused on their roles in gene silencing and as alternative promoters59–61, our findings highlight the profound influence of these elements, particularly Alu, during cellular differentiation. Comparative genomics between humans and mice reveals remarkable differences: Alu elements constitute approximately 10.7% of the human genome and form suppressive RNAs that bind to RNA polymerase, whereas in mice, SINE B2 elements constitute only 2.4% of the genome37,58. In humans, Alu elements are densely positioned around the TSSs of CpG-island genes, enhancing their regulatory effects on gene expression13,24. The higher prevalence of Alu elements in humans may confer evolutionary advantages, facilitating adaptability during cell differentiation, crucial for effective stem cell replacement and longevity37,62,63. These insights highlight the specialised regulatory functions of Alu elements, distinguishing the human genomic architecture from that of other species and reinforcing their essential role in promoting longevity through sustained cellular differentiation.
Maintaining phenotypic stability is crucial for multicellular organisms, and our study expands the concept of HKGs beyond their traditional definition of ubiquitous expression. We demonstrate that Alu elements, which are dominant in the human genome, play an essential role in regulating HKG expression associated with CpG islands during cellular differentiation. Notably, retroelement distributions that exceed critical thresholds at specific gene locations can induce significant changes in gene expression. This suggests that repetitive sequences, often seen as parasitic DNA, actually function as symbiotic components essential for precise tissue differentiation. By re-evaluating retroelements as key regulatory mechanisms, our findings highlight their pivotal role in maintaining transcriptome balance, influencing phenotypic stability and advancing our understanding of genomic architecture in evolutionary biology and development.