Assessment of suHKG Candidates. This work had two main objectives: (i) to evaluate the suitability of six classic HKGs as HAT suHKGs and (ii) to identify genes with a stable, high expression profile as new HAT suHKG candidates. Our novel strategy reviewed the role of HKGs by considering sex, species, and platform as variables across the evaluated studies.
We performed our analysis on three sample groups defined by sex and species: female Hsa, male Hsa, and all Mmu samples. We did not analyze female and male Mmu samples separately because the selected studies reported too few female Mmu samples. HKGs displayed platform-dependent variability under all conditions, as each microarray platform has its own probe design and technical protocol. However, previous studies on technology dependence concluded that the platform has less influence than the differences in transcript expression levels caused by varying cell conditions.
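Because stability rankings differ between platforms, per-platform results must be combined before a single list of candidates can be drawn from them. A minimal sketch of one simple option, mean-rank aggregation, is shown below; it is not necessarily the meta-ranking procedure used in this study, and the gene names (GENE_A to GENE_F) and ranks are hypothetical placeholders.

```python
from statistics import mean

# Hypothetical per-platform stability rankings (1 = most stable).
# Gene names and ranks are illustrative placeholders, not study data.
platform_ranks = {
    "platform_A": {"GENE_A": 1, "GENE_B": 2, "GENE_C": 3, "GENE_D": 4, "GENE_E": 5, "GENE_F": 6},
    "platform_B": {"GENE_A": 2, "GENE_B": 1, "GENE_C": 4, "GENE_D": 3, "GENE_E": 5},
    "platform_C": {"GENE_B": 1, "GENE_A": 3, "GENE_C": 2, "GENE_E": 4, "GENE_D": 5},
}


def mean_rank_aggregate(ranks_by_platform):
    """Combine per-platform ranks into one meta-ranking by mean rank.

    Only genes measured on every platform are kept (GENE_F above is
    dropped), so each mean is taken over the same number of ranks.
    """
    common = set.intersection(*(set(r) for r in ranks_by_platform.values()))
    scores = {g: mean(r[g] for r in ranks_by_platform.values()) for g in common}
    # Lower mean rank = more consistently stable; ties broken alphabetically.
    return sorted(scores.items(), key=lambda kv: (kv[1], kv[0]))


for gene, score in mean_rank_aggregate(platform_ranks):
    print(f"{gene}\t{score:.2f}")
```

Mean rank is easy to interpret but treats every platform equally; weighting platforms by sample size, or using a dedicated rank-aggregation method, are alternatives.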
The results reveal considerable differences in gene stability, including differences in the stability of the six selected classic HKGs between female and male Hsa samples. PPIA, UBC, and RPL19 displayed high stability in samples from both sexes, while HPRT1 and 18S displayed low stability in both sexes. Interestingly, GAPDH was highly stable in male samples but showed low stability in female samples. In apparent contradiction, 18s shows high stability in Mmu; this may be explained by the overwhelming predominance of male samples in this group and by the significant sex bias of this gene in mouse (Figure S6). The common absence of female samples in studies (further evidenced by our systematic review) could explain why 18s is so often reported as a stable HKG.
We propose a list of 195 suHKG candidates suitable as internal controls in HAT gene expression studies that include male and female samples; these genes show high expression (TPM > 20), high stability, and minimal influence of sex on their expression patterns. Because the lack of female mouse samples prevented us from reproducing the human pipeline with mouse studies, we suggest the orthologs of the proposed human suHKGs as mouse suHKGs.
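The three selection criteria above (high expression, high stability, minimal sex influence) can be expressed as a simple filter. In the sketch below, only the TPM > 20 cut-off comes from the text; the stability proxy (coefficient of variation), the sex-effect measure (absolute log2 female/male ratio), their thresholds, and the gene records are hypothetical assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass
class GeneSummary:
    symbol: str
    mean_tpm: float    # mean expression in transcripts per million
    cv: float          # coefficient of variation across samples (assumed stability proxy)
    sex_log2fc: float  # log2(mean female / mean male) expression (assumed sex-effect measure)


MIN_TPM = 20.0              # from the text: TPM > 20
MAX_CV = 0.25               # hypothetical stability threshold
MAX_ABS_SEX_LOG2FC = 0.5    # hypothetical sex-effect threshold


def select_suhkg_candidates(genes):
    """Keep genes that pass all three criteria, preserving input order."""
    return [
        g.symbol
        for g in genes
        if g.mean_tpm > MIN_TPM
        and g.cv <= MAX_CV
        and abs(g.sex_log2fc) <= MAX_ABS_SEX_LOG2FC
    ]


# Hypothetical records; symbols are placeholders, values are made up.
genes = [
    GeneSummary("GENE_A", 150.0, 0.10, 0.05),   # passes all criteria
    GeneSummary("GENE_B", 12.0, 0.08, 0.02),    # expression too low
    GeneSummary("GENE_C", 300.0, 0.40, 0.10),   # too variable
    GeneSummary("GENE_D", 80.0, 0.12, 1.20),    # strong sex effect
    GeneSummary("GENE_E", 45.0, 0.20, -0.30),   # passes all criteria
]
print(select_suhkg_candidates(genes))
```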
We experimentally validated a selection of suHKG candidates to assess the robustness of our computational findings; overall, the gene expression analysis confirmed the in silico results (Table 3). PPIA, a widely used HAT HKG, and RPL19, used as an HKG in several cell types [30, 31, 47] and occasionally in HAT studies, were validated as HAT suHKGs. In contrast, experimental validation demonstrated that 18S, which is widely used as a HAT HKG [7, 14, 16, 39, 43–45], displays significant variability in both male and female samples as well as sex-specific expression patterns (Fig. 6). These results agree with the findings of other recently published studies and correlate with those found in mouse adipose tissue. Using 18s as an HKG induces apparent differences in the relative expression levels of several genes between males and females and between wild-type and Irs2-/- samples (Figure S6); we therefore suggest Rpl19 and Ppia as better suHKGs for mouse adipose tissue analysis.
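The apparent sex differences induced by 18s can be illustrated with relative quantification arithmetic. The section does not state the quantification model, so the sketch below assumes the widely used 2^-ΔΔCt (Livak) method, which presumes roughly 100% amplification efficiency; all Ct values are hypothetical. The target gene has identical Ct values in both sexes, so any fold change it reports comes from the reference gene.

```python
def fold_change_ddct(ct_target_test, ct_ref_test, ct_target_ctrl, ct_ref_ctrl):
    """Relative expression of a target gene by the 2^-ddCt (Livak) method.

    Assumes ~100% amplification efficiency for target and reference genes.
    """
    d_ct_test = ct_target_test - ct_ref_test    # normalize test group to reference
    d_ct_ctrl = ct_target_ctrl - ct_ref_ctrl    # normalize control group to reference
    dd_ct = d_ct_test - d_ct_ctrl
    return 2.0 ** (-dd_ct)


# Hypothetical Ct values: females = test group, males = control group.
# Target gene Ct is 25.0 in both sexes (no true difference).
stable_ref = fold_change_ddct(25.0, 20.0, 25.0, 20.0)   # reference Ct equal in both sexes
biased_ref = fold_change_ddct(25.0, 11.0, 25.0, 12.0)   # reference 1 cycle lower in females

print(stable_ref)  # 1.0: no apparent difference
print(biased_ref)  # 0.5: an artificial two-fold "decrease" in females
```

A one-cycle sex difference in the reference gene is enough to halve (or double) every normalized target value, which is why a sex-biased reference can create spurious sex effects.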
The computational analysis also identified several additional HAT suHKG candidates, including RPS18, RPS8, and UBB (Table 3), which show stable, high expression levels. We also suggest the mouse orthologs of these human suHKGs as mouse suHKGs. Finally, we designed a web tool to help select the best suHKG for human or mouse adipose tissue experimental designs.
Strengths and Limitations. Massive analysis of gene expression data is a pivotal tool for understanding different biological scenarios and may eventually help elucidate mechanisms relevant to basic and biomedical research. Such analyses must be validated in the laboratory by measuring relative gene expression normalized to an adequately chosen HKG. Selecting an ideal HKG remains challenging; the choice must consider all experimental conditions and biological variables, and a good choice helps ensure accurate results. Because sex influences experimental outcomes, it must be accounted for as a critical biological variable, and incorporating sex-based analyses into research will improve reproducibility and experimental efficiency. Sex should be considered to monitor sex-based differences and similarities in all diseases and biological processes that affect both sexes, which may help reduce bias, support social equality in scientific outcomes, and open new opportunities for discovery and innovation, as evidenced by several studies analyzing this issue [20, 22].
Numerous lines of evidence suggest that the current status quo does not address fundamental sex-based differences in gene expression. To date, many classic HKGs remain unevaluated with sex as a biological variable; these include HKGs commonly used in HAT studies (e.g., ACTB, GAPDH, and 18S) and others such as PPIA, HPRT, RPS18, and RPL19. Using an HKG to normalize samples without assessing its behavior under the specific experimental conditions of each study (including sex) may bias the outcome. HKGs may be stable in one sex but not in the other, as in the case of DDX39B and PLIN4 (stable in males) or NDUFB11 and RARA (stable in females), or may be stable yet expressed at distinct levels in the two sexes, as 18s is in mouse. Ignoring sex and choosing a non-optimal HKG may introduce confounding variables, making it impossible to determine whether differences in the data derive from the experimental design or from the normalization process. This additional variability would reduce statistical power, making significant results harder to detect. In this study, we analyzed the role of six conventional HAT HKGs with sex as a variable for the first time.
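The three situations described above (stable in one sex only, stable in both sexes at distinct levels, and stable and sex-unbiased) can be told apart by assessing stability and expression level separately in each sex. The sketch below uses the coefficient of variation of linear-scale expression as a simple stability proxy and the log2 female/male mean ratio as the level comparison; both measures, the thresholds, and the example values are hypothetical assumptions, not the stability metric used in this study.

```python
import math
from statistics import mean, stdev


def cv(values):
    """Coefficient of variation of linear-scale expression values."""
    return stdev(values) / mean(values)


def classify_reference_gene(female, male, max_cv=0.15, max_abs_log2_ratio=0.5):
    """Classify a candidate reference gene from per-sex expression values.

    Thresholds are hypothetical; stability is judged within each sex first,
    and mean levels are compared only when the gene is stable in both.
    """
    stable_f = cv(female) <= max_cv
    stable_m = cv(male) <= max_cv
    if stable_f and stable_m:
        log2_ratio = math.log2(mean(female) / mean(male))
        if abs(log2_ratio) <= max_abs_log2_ratio:
            return "stable, sex-unbiased"
        return "stable in both sexes, distinct levels"
    if stable_f:
        return "stable in females only"
    if stable_m:
        return "stable in males only"
    return "unstable"


# Hypothetical expression values for three placeholder genes.
print(classify_reference_gene([100, 102, 98, 101], [100, 99, 101, 100]))
print(classify_reference_gene([200, 205, 195, 200], [100, 101, 99, 100]))
print(classify_reference_gene([100, 101, 99], [50, 150, 100]))
```

A gene in the second category can still look stable in a pooled, mostly male dataset, which is the pattern described for 18s in mouse.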
Many published studies lack a sex-based perspective, either omitting the sex of the animals from their reports or using animals of only one sex (typically males). Our systematic review found that 51% of Hsa studies and 49% of Mmu studies failed to report the sex of their samples, and only 19% of Hsa studies and a striking 2% of Mmu studies included samples from both sexes. Notably, Mmu studies including only female samples represented just 5% of the total. The small number of Mmu studies reporting female samples was a significant limitation of this study, as it prevented us from building a Mmu meta-ranking to select highly expressed, stable Mmu suHKG candidates as we did for Hsa. To overcome this limitation, we experimentally evaluated the Mmu orthologs of the selected Hsa suHKG candidates, which confirmed their suitability as Mmu suHKGs.
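The percentages above come from assigning each study to a sex-reporting category. A minimal sketch of such a tally is given below, assuming sample labels have already been mapped to "female", "male", or "unknown"; the category rules (unlabeled samples are ignored when at least one sample is labeled) and the four placeholder studies are hypothetical.

```python
from collections import Counter


def classify_study(sample_sexes):
    """Assign a study to a sex-reporting category from normalized sample labels.

    Samples labeled 'unknown' are ignored when other samples carry a sex label.
    """
    reported = {s for s in sample_sexes if s in ("female", "male")}
    if not reported:
        return "not reported"
    if reported == {"female", "male"}:
        return "both sexes"
    return "female only" if reported == {"female"} else "male only"


def summarize(studies):
    """Percentage of studies in each category, rounded to whole numbers."""
    counts = Counter(classify_study(sexes) for sexes in studies.values())
    total = len(studies)
    return {category: round(100 * n / total) for category, n in counts.items()}


# Hypothetical studies with placeholder identifiers.
studies = {
    "study_1": ["male", "male", "male"],
    "study_2": ["unknown", "unknown"],
    "study_3": ["female", "male", "female"],
    "study_4": ["female", "female"],
}
print(summarize(studies))
```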
Despite the widespread use of 18S RNA as an HKG, its annotation is another limiting factor of this study: we could not find this gene in the GTEx platform under any of the aliases listed in GeneCards. We also noted that identifiers for this gene are unstable or absent from reference assemblies. In addition, the DNA sequence of the RNA18SN5 gene (accession number NR_003286.4) shares 99–100% identity with other ribosomal RNA genes such as RNA18SN1, RNA18SN2, RNA18SN3, RNA18SN4, and RNA18SP3 (accession numbers NR_145820.1, NR_146146.1, NR_146152.1, NR_146119.1, and NG_054871.1, respectively). Furthermore, 18S rRNA copy number differs among individuals and varies with age. Taking all these factors into account, together with experimental data showing sex-dependent differential expression, 18S is less suitable as a HAT suHKG than the other suHKGs proposed in this study.
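Percent identity between aligned sequences is simply the fraction of matching aligned positions; at 99–100% identity, a probe or primer has almost no positions left to discriminate between paralogous copies. The sketch below computes identity over a pre-computed alignment, excluding gapped columns (one of several conventions); the short sequences are toy examples, not the actual RNA18SN sequences.

```python
def percent_identity(seq_a, seq_b):
    """Percent identity between two pre-aligned sequences of equal length.

    Columns with a gap ('-') in either sequence are excluded, which is one
    of several common conventions for computing identity.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    pairs = [
        (a, b)
        for a, b in zip(seq_a.upper(), seq_b.upper())
        if a != "-" and b != "-"
    ]
    if not pairs:
        raise ValueError("alignment has no ungapped columns")
    matches = sum(a == b for a, b in pairs)
    return 100.0 * matches / len(pairs)


# Toy aligned sequences (hypothetical, for illustration only).
print(percent_identity("ACGTACGTAC", "ACGTACGTAT"))  # one mismatch in ten columns
print(percent_identity("ACGT-CGTAC", "ACGTACGTAC"))  # gapped column excluded
```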
Other limitations of the study included filtering and pre-processing the biological information in GEO to identify published studies with adipose tissue transcriptomic data, and classifying the samples by sex. A primary limiting factor was the absence of a standardized vocabulary to tag sex in the sample records of the studies. Although gene expression data in GEO are presented as standardized expression matrices, the metadata (including sample source, tissue type, and sample sex) are reported in free-text fields written by the researcher submitting the study. The absence of standardized vocabulary and structured information constrains data mining on large-scale data, and improvements in this regard could aid the processing of data in public repositories.
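Because sex is recorded as free text, each sample record has to be mapped onto a controlled vocabulary before it can be classified. The sketch below shows one way to do this with a regular expression that looks only at values following a "sex" or "gender" key, so stray tokens elsewhere (for example "m" in units) are not misread. The key pattern and the term dictionary are hypothetical assumptions; real GEO records contain many more variants that a production mapping would need to cover.

```python
import re

# Hypothetical mapping of free-text tokens to a controlled vocabulary.
SEX_TERMS = {
    "female": "female", "f": "female", "woman": "female", "women": "female",
    "male": "male", "m": "male", "man": "male", "men": "male",
}

# Captures the first word after a 'sex' or 'gender' key, e.g. "Sex: Female".
SEX_FIELD = re.compile(r"\b(?:sex|gender)\s*[:=]\s*([A-Za-z]+)", re.IGNORECASE)


def normalize_sex(characteristics: str) -> str:
    """Return 'female', 'male', or 'unknown' from a free-text characteristics field."""
    match = SEX_FIELD.search(characteristics)
    if match is None:
        return "unknown"
    return SEX_TERMS.get(match.group(1).lower(), "unknown")


for text in [
    "tissue: subcutaneous adipose; Sex: Female",
    "gender: M; age: 45",
    "bmi: 31.2 kg/m2",
    "sex: not recorded",
]:
    print(f"{text!r} -> {normalize_sex(text)}")
```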
For the first time, this study presents a computational strategy based on massive data analysis that can assess the sex bias in the expression levels of classic and novel HKGs across a large volume of studies and samples. This strategy revealed that an accurate experimental design for adipose tissue requires the adequate selection of a suHKG, such as PPIA or RPL19, or new options such as RPS18 or UBB. In this context, researchers could finally avoid the common practice of pooling males and females, or of ignoring the effect of studying males alone. This study presents the relative expression stability of six commonly used HKGs and the variability levels of other genes covered by the analyzed microarray platforms. The same workflow can be translated from adipose tissue to other tissues simply by modifying the sample source in the advanced search step used to collect data from GEO and in the SQL queries to GEOmetadb used to obtain sample information. This strategy also follows the FAIR principles (Findability, Accessibility, Interoperability, and Reusability) to ensure the further utility and reproducibility of the generated information.
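Adapting the workflow to another tissue mostly means changing the tissue term in the metadata query. GEOmetadb is distributed as a SQLite database, so such a query can be run with Python's built-in sqlite3 module. The sketch below builds a tiny in-memory stand-in for the GEOmetadb gsm table using a subset of columns (gsm, series_id, gpl, organism_ch1, source_name_ch1, characteristics_ch1) as we understand its schema; the rows and identifiers are made-up placeholders, and the query is an illustration rather than the query used in this study. Column names should be checked against the actual GEOmetadb schema before use.

```python
import sqlite3


def find_samples(conn, organism, tissue_term):
    """Return (gsm, series_id, gpl, characteristics_ch1) for matching samples.

    SQLite's LIKE is case-insensitive for ASCII, so 'Adipose' matches 'adipose'.
    """
    query = """
        SELECT gsm, series_id, gpl, characteristics_ch1
        FROM gsm
        WHERE organism_ch1 = ?
          AND (source_name_ch1 LIKE ? OR characteristics_ch1 LIKE ?)
        ORDER BY gsm
    """
    pattern = f"%{tissue_term}%"
    return conn.execute(query, (organism, pattern, pattern)).fetchall()


# In-memory stand-in for a subset of the GEOmetadb gsm table (placeholder rows).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE gsm (gsm TEXT, series_id TEXT, gpl TEXT, organism_ch1 TEXT, "
    "source_name_ch1 TEXT, characteristics_ch1 TEXT)"
)
conn.executemany(
    "INSERT INTO gsm VALUES (?, ?, ?, ?, ?, ?)",
    [
        ("GSM_EX1", "GSE_EX1", "GPL_EX1", "Homo sapiens", "subcutaneous adipose tissue", "sex: female"),
        ("GSM_EX2", "GSE_EX1", "GPL_EX1", "Homo sapiens", "Visceral Adipose", "sex: male"),
        ("GSM_EX3", "GSE_EX2", "GPL_EX2", "Mus musculus", "epididymal adipose tissue", "sex: male"),
        ("GSM_EX4", "GSE_EX3", "GPL_EX1", "Homo sapiens", "liver", "sex: female"),
    ],
)

# Switching tissues only requires a different tissue_term.
print(find_samples(conn, "Homo sapiens", "adipose"))
print(find_samples(conn, "Homo sapiens", "liver"))
```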
Although limited to adipose tissue, our findings suggest that sex bias in commonly used HKGs could also occur in other tissues, thereby affecting the normalization of gene expression analyses of any kind. Incorrect normalization may significantly alter gene expression data, as shown for 18S, and lead to erroneous conclusions. This study highlights the importance of considering sex as a variable in biomedical studies and provides evidence that HKGs used as internal controls should be promptly and thoroughly analyzed in all tissues.
Perspectives and Significance
Our results highlight the importance of considering sex as a biological variable when choosing the best reference HKG for HAT gene expression analysis. Our novel computational strategy uses massive data analysis to assess the sex bias in expression levels of classic and novel HKGs and to select sex-unbiased HKGs. Conventionally reported HKGs include several metabolic and ribosomal genes such as GAPDH, HPRT, PPIA, UBC, 18S, and RPL19. However, our meta-analysis-based strategy has shown that certain classic HKGs, such as 18S, one of the most widely used, may fail as reference genes because they are differentially expressed in males and females, while others, such as PPIA and RPL19, succeeded as reference genes. Furthermore, applying our selection criteria yielded several additional markers, such as RPS8 and UBB, and we offer an open web resource (https://bioinfo.cipf.es/metafun-HKG) for customized experimental design.
All these results provide useful new insight for evaluating gene expression in human adipose tissue under various experimental conditions and for biomedical purposes. Using an incorrect HKG may lead to inappropriate interpretation and application of results, whereas using a suHKG provides a sounder experimental approach, whether males and females are treated as separate groups or included in the same experimental group but properly analyzed. This study highlights the importance of considering sex as a variable in gene expression analyses in HAT and provides evidence that suHKG selection should be extended to other tissues, hopefully soon.