Sperm Retrieval Gold Standard Predictors: TXNDC2, PRM1 and PRM2, Joint Molecular Markers Associated With Testis Pathology

The necessity of PRM1/2 in male azoospermia has been approved, but the association of TXNDC2 deciency with the phenotype, sperm retrieval and pathology has not been approved. Here we identied a joint cooperation of TXNDC2 and protamines in evaluating testis pathology and sperm retrieval. Extensive meta-analysis on human arrays of idiopathic, untreated and with unknown cause of azoospermia was done. After several steps of data quality controls, QC passed data were pooled and batch effect corrected. As Redox imbalance has been shown as a two edge sword toward fertility, wet lab studies began with candidated protamination and thioredoxin genes. Logistic regression model of TXNDC2 alongside PRM1 and PRM2 genes was built and collective ROC analysis indicated a sensitivity of 96.8% and specicity of 95.5% with the ROC value of 0.993 (SE=0.0075, 95% CI: 0.978-1.000). In conclusion, TXNDC2, PRM1 and PRM2 in joint, have a robust power to predict sperm retrieval and to correlate with severe pathology of azoospermia.


Introduction
Pathology uses the terms obstructive (OA) and non-obstructive azoospermia (NOA) respectively for normal and abnormal spermatogenesis. Pathology also classi es aberrant spermatogenesis into ve main patterns 1 : seminiferous tubules hyalinezation (SH), Sertoli cell-only syndrome (SCOS), early\late maturation arrest (e\lMA), and hypospermatogenesis (Hypo). We must note that testicular imperfection and seminiferous tubules obstruction status are the ultimate achievement of pathological analysis; however, the prediction of sperm retrieval (SR) could not be concluded solely based on the current approach.
Budget expenses and loss of golden time are two critics to treat azoospermic men, wishing to have their own biological children. Trustworthy, precise molecular markers with strong background toward spermatogenesis could be a booty for the heartbroken parents. To reduce the stress of infertile couples and to create a spark of hope especially for NOA men, we previously introduced KDM3A to PRM1 expression ratio as a reliable molecular indicator of SR 2 . But we could not spectacle any association between aforementioned genes and the pathological features of the biopsies. By scouting gene(s) with dual propensities we can kill two birds with one stone, predicting the success of SR, and also con rming testicular pathology. Having this bonus in our pocket, we can rst join pathology and genetics to double check for SR possibility and foremost, surgeons could be persuaded to explore NOA tissues to extract any residual sperms at the rst round of surgery.
Thioredoxines are intracellular and extracellular scavengers of oxidative stress system, reactive oxygen species (ROS) being one of their main targets, regulation of redox signaling plays pivotal roles in sperm fertility 3 . Thioredoxin domain containing 2 (TXNDC2, ENSG00000168454), is transiently expressed in haploid phase of spermatogenesis and as a sperm-speci c oxidoreductase, is only detected in round and elongating spermatids 45 . TXNDC2/TXNDC3 double inactivation in animal models was performed and impaired chromatin protamination was the output 6 . Protamination as the DNA-safeguard, not only condensates sperm chromatin, but also replaces most of the histones during spermiogenesis and male infertility is conclusively associated with impaired protamination 7 . As we know, protamination is started by the expression of transition protein 1 (TNP1) and is followed by protamines (PRM1 and PRM2) replacement in the nucleus 8 ; Thereafter, mature spermatozoa would be released into the lumen of seminiferous tubules 9 . So it seems that protamines and sperm-speci c thioredoxins are all together important for spermatozoa to start capacitation, bringing these proteins so necessary for male fertility 10 .
In this study, TXNDC3 was not evaluated as it is ubiquitously expressed in all tissues and is no longer considered testis speci c 11 . Considering the localization of TXNDC2 in the nucleus versus extracellular distribution of TXNDC8, the latter was also removed. Therefore, the aim of this study was to evaluate the expression levels of TXNDC2 together with protamination genes in different pathological status of azoospermia. We showed that PRM1 and PRM2, but not TNP1, are excellent indicators of SR. We also showed that TXNDC2 expression levels were in complete agreement with the status of tissue pathologies. Moreover, logistic regression model analysis of TXNDC2, PRM1 and PRM2 genes together was a robust predictor of SR by sensitivity of 96.8% and speci city of 95.5%.

Patients and samples
Azoospermic men were double questioned before and after the operation and samples were ruled out after the operation, if they were not willing to continue their participation in the study. An approximately 50mg of fresh testicular tissues, were submerged immediately under the RNAlater stabilizing reagent (Ambion Life Science, Austin, TE, USA, AM7024) according to the manufacturer instruction. Micro-TESE open surgery team was fully informed as the rst piece of testicular tissue was used for RNA extraction and the next pieces for pathology and SR. Submerged samples were stored at 4 °C for 24 h and then processed for further RNA extraction. Out of 50 samples included in, 40 were diagnosed as nonobstructive and 10 as obstructive-control individuals according to the pathological results. Inclusion and exclusion criteria were as follows: samples with weak RNA integrity, the ones with variable Cqs even after multiple rounds of separate analysis, and without clear pathology were omitted from this study.
Unfortunately, we omitted 9 samples as they were reported with unknown pathology.

Ethics statement
Written informed consents were taken and full explanation was donated to azoospermic men before sampling. The experimentations and consent forms were approved by institutional review board of Isfahan University ethical committee. All procedures performed in the study involving human participants were in accordance with the 1964 Helsinki declaration and its later amendments or comparable ethical standards.

SR technique
Schlegel technique was adopted and an expert surgeon has done all the micro-TESE open surgeries under the microscope in order to lessen the obstruction of testicular vessels 12 . Meticulous sperm processing with initial mechanical dissection of seminiferous tubules was followed by extensive exercise to receive the maximum rate of retrieval 13 .

Histological analysis
Hematoxylin and eosin (H&E) staining of para n embedded tissues was performed according to the standard protocol 14 . Two microscopic slides containing at least 100 different tubular sections for each specimen were examined by a specialist pathologist and the results were reported as follows: (i) N: normal spermatogenesis with all types of spermatogenic cell lineages in sections, (ii) SH: seminiferous tubule hyalinization, (iii) SCOS: sertoli cell-only syndrome or germ cell aplasia, (iv) eMA: early maturation arrest, (v) lMA: late maturation arrest, (vi) Hypo: hypospermatogenesis. Individuals with normal spermatogenesis were considered as obstructive azoospermia (OA) and these were used as control individuals according to the previous reports 15 . Other pathologies with abnormal spermatogenesis were classi ed as non-obstructive azoospermia (NOA).

GEO Meta-Analysis
GEO database was explored with the keyword "azoospermia" for microarray datasets. A rigid inclusionexclusion criteria was applied as follow and a total of 9 datasets corresponding to homo sapiens species were found. Among these datasets those with any treatments and therapies were excluded. Samples with cryptorchidism phenotype and with detected mutations were also excluded. In this regard, GSE145467, GSE45885, GSE9194, GSE108886, GSE9210, GSE14310 were selected. All the candidated datasets were log2 scaled and quantile normalized if necessary. Hierarchical clustering of each dataset was illustrated using Euclidian distance. Principal component analysis (PCA) plot were drawn and outliers were detected and removed. GSE9194 and GSE9210 were excluded, respectively due to low quality and low feature intersection with other datasets. SVA 16 and Limma 17 packages were used to remove batch effects and subsequently, PCA and hierarchical clustering were used again to check the quality of the batch effect removal. Effect size of features was calculated using Limma package with Benjamini-Hochberg correction. We applied p values to determine the corresponding false discovery rates (FDR). Finally, the variation of testis-speci c thioredoxin gene2 (TXNDC2) alongside protamination genes (TNP1, PRM1, PRM2) were recorded. Testis-speci c thioredoxin gene 8 (TXNDC8) was not included in GSE14310 dataset, and meta-analysis was performed on the resting GSE45885 and GSE108886 datasets. R version 4.0.1 was used for meta-analysis. RNA isolation and cDNA synthesis RNA extraction was carried out as reported previously 2 . Nanodrop One (Thermo Scienti c, USA) was used for quanti cation and then, 1 μg of total RNA was treated with DnaseI (Thermo Scienti c, Lithuania; EN0522) according to the manufacturer instruction. TaKaRa PrimerScript II 1 st stand cDNA synthesis kit (TaKaRa, Otsu, Japan; 6210B) was used to random prime the rst strand of cDNA. The qualities of the extracted RNAs were checked by 2% conventional agarose gel electrophoresis stained with ethidium bromide (data not shown).

Reverse transcription quantitative real-time PCR (RT-qPCR)
Primers were adopted for RT-qPCR and their concentration was optimized according to our previous study. SYBR Premix Ex Taq II (TaKaRa; RR820L) was the quantifying dye in Corbett 6000 Rotor-Gene thermocycler (Corbett Life Science, Mortlake, Australia). Equal amounts of cDNA were ampli ed in triplicates and the average cycle of quanti cation (Cq) values were further analyzed.

Melting curve analysis
After the last run of ampli cation, melting curve analysis via green channel was performed according to the manual of Corbett 6000 Rotor-Gene machine. Gradual increase in temperature (1.0 °C/s) was applied from 65 to 95 degrees Celsius and the amount of uorescence emission was recorded continuously. The deviation of uorescence change over temperature on y axis was plotted against the temperature on x axis using Rotor-Gene embedded software 1.7.
Gene expression analysis GAPDH and RPL37 were used simultaneously as reference genes for RT-qPCR data normalization based on our previous nding 2 . REST2009 was used for statistical analysis.

Statistical analysis
Raw mean Cqs were exported to SPSS v.21.0 (IBM Corp., Armonk, NY, USA) and normalization of the data was carried out if necessary. Normalized mean Cqs of the genes were compared between individuals with positive and negative SR using t-Test. A one-way between subjects ANOVA-coupled with Scheffe Post hoc comparison was conducted to visualize the differences of mRNA expression levels between different pathological status. Multiple linear regression approaches were applied to model the relationship between the expression levels of PRM1,PRM2, and TXNDC2. A receiver operating characteristic curve (ROC) predictive model was obtained to demonstrate the predictive ability of the three expressed genes for SR. The area under the curve (AUC) was determined to assess the diagnostic accuracy. In all statistics, p values smaller than 0.05 were considered as signi cant.

Results
Data quality control and pre-processing: The assessment of data normalization revealed that parts of the data was log2 scaled, the resting were transformed and the second round of quality control was carried out to assess the quality of samples quantiles (supplementary Fig. 1). For each dataset, hierarchical cluster analysis of samples, based on Euclidian Distance of Pearson correlation coe cient, grouped similar objects into clusters. It was followed by the process of dimension reduction using the Eigen vector with the highest Eigen value (supplementary Fig. 2). The decision to remove 18 outliers out of 99 samples were based on sophisticated knowledge of biology, in combination with clustering and PCA (supplementary Fig. 3).
Consequently, a total of 71 samples were pooled for further analysis.
Limma and SVA algorithms were applied on the pool to correct their batch. Then after, hierarchical clustering and Principle component analysis were performed, and the outcome brought us the con dence about the correction (Fig. 1).

Meta-Analysis
The gene expression of pooled data with pathological phenotypes of SCOS, pre-meiotic arrest, meiotic arrest, and post-meiotic arrest were evaluated (Fig. 2). Based on the goal of this study, protamination genes (PRM1, PRM2, TNP1) with respect to testis-speci c thioredoxin genes (TXNDC2, TXNDC8) were analyzed and the data were presented in Table I

RT-qPCR data analysis
The mean expression level of GAPDH, RPL37, TXNDC2, PRM1, PRM2, and TNP1 were compared between positive and negative SR (sup. table I). Reference genes GAPDH and RPL37 showed the minimal mean differences between positive and negative SR individuals (0.59, 0.97 respectively). Considering positive SR as the control, high positive mean differences were recorded for TXNDC2, PRM1 and PRM2; but TNP1 showed a negative (-1.52) mean difference. Therefore, TXNDC2 was differentially expressed and in homology, protamination genes, PRM1 and PRM2. Unexpectedly, the expression of TNP1 was overlapping (Fig. 4). To test the signi cance of the observed differences, t-Test was performed on normalized data (TableII). A signi cant differential expression for TXNDC2, PRM1, and PRM2 (p=0.000) were observed between positive and negative SR, but not for TNP1 (p=0.558).
REST2009 relative expression analysis results were presented in Table III. Data analysis showed TXNDC2 signi cant downregulation with the expression ratio of 0.047 (p=0.000). PRM1 and PRM2 genes were also signi cantly (p=0.000) downregulated with the expression ration of 0.000. TNP1, on the other hand, was insigni cantly (p=0.301) upregulated with slight expression ratio of 4.078.

Discussion
Uninterrupted research interests toward introducing a suitable molecular marker to predict SR is a hot topic among researchers in the eld of andrology. In the rst attempt between different phenotype of azoospermia, only SCOS was successfully correlated with RBMY1, and DAZ genes and they suggested a signi cant positive association between these genes and successful SR 18 . BOll/GAPDH mRNA ratio was assessed in different pathological phenotypes of azoospermia, using the cut-off value of 0.5, sensitivity and speci city of 100% was achieved for SR 19 .
Technical improvements made the methodology of the mentioned studies challenging and therefore, demands have risen for accurate and precise methods with the ability to diminish the biases. In accompany with this urgency, RT-qPCR was introduced and applied by numerus upcoming researches.
ESX1 was the rst introduced reliable molecular marker of spermatogenesis with signi cant (p=0.04) concordance of 73.7% 20 . Their further test of seminal uid also con rmed the capacity of ESX1 as a molecular marker of SR with 84% sensitivity; albeit they declared discrepancies between molecular and clinical outputs 15 . In ful lment of the previous studies, we improved the sensitivity of SR up to 95.5% using KDM3A histone demethylase; However, we were also unable to bring concordance between our molecular markers and pathological phenotypes 2 .
In the present study, TXNDC2 was correlated with SH phenotype, while PRM1 and PRM2 showed additional association with GCA/SCOS (Table IV). It is worth noting that genome wide integration of transcriptomics and anti-body based proteomics declared TXNDC8 as a testis speci c protein as well, albeit as an extracellular equivalent of nuclear TXNDC2 2111 . It seems logical to consider TXNDC2 over TXNDC8, as protamine activation takes place in the nucleus. Furthermore, the association of PRM2 but not PRM1 with eMA was also interesting. In other words, these three genes could be altered at the very early stages of spermatogenesis; while when being expressed, it could be the sign of the existence of germ cells. As we know, protamine activation takes place before their DNA binding, the role that could be proposed for thioredoxin. After the release of protamine precursors, a round of sequential phosphorylation and dephosphorylation takes place to strengthen the binding power of protamines to wrap around the corresponding DNA. A key event after dephosphorylation, to complete the activation process, is the oxidation of protamine monomers to produce a head to tail dimer. Thioredoxins are oxidizing molecules acting on Cys residues, which are abundantly present in protamins. Therefore, synchronous downregulation of TXNDC2 and PRM1/PRM2 in SH and SCOS (the most severe phenotypes of sperm failure) could infer their importance for sperm production.
To future examine the observed synchronicity, linear regression model analysis was developed (Table V). TXNDC2 showed a strong correlation with both PRM1 (r=0.761) and PRM2 (r=0.767). The coe cient of determination, correlates up to 60% of PRM1 and PRM2 expression solely with TXNDC2 expression. Moreover, PRM1 was perfectly correlated (r=0.993, p=0.000) with PRM2; which means, the value of PRM2 could be anticipated from PRM1 by 98.6%. Mentioned observations proposed similar behaviors of two co-expressed protamines toward each other. In accompany, animal knockout models and our previous study, con rmed KDM3A as the transcription factor of PRM1 and PRM2, itself under the control of HIF1a 28 . It is also proved that the overexpression of thioredoxin could increase HIF1-a activity 22 .
To determine the clinical importance of our potential biomarkers of SR (TXNDC2 alongside PRM1 and PRM2), logistic regression model of the genes was built based on intact Cqs. Receiver Operator Characteristic (ROC) analysis was conducted afterward, to evaluate the predictive power of these 3 candidate biomarkers. Collective ROC analysis indicated a sensitivity of 96.8% and speci city of 95.5% with the ROC value (AUC) of 0.993(SE=0.0075, 95% CI: 0.978-1.000) (Fig. 5).

Conclusion
Taken together, it was cleared that TXNDC2 was differentially expressed between positive and negative SR. Moreover, TXNDC2 was correlated with severe phenotypes of azoospermia (SH and SCOS). A strong correlation of TXNDC2 with protamination genes was observed. ROC analysis applied on the multiple regression model of granted TXNDC2-PRM1-PRM2 as robust molecular markers of SR with sensitivity of 96.8% and speci city of 95.5%.

Data availability
The dataset (GSE145467, GSE45885, GSE9194, GSE108886, GSE9210, GSE14310) analyzed during the current study is available in the NCBI-Gene Expression Omnibus repository.