High-depth sequencing combined with the proposed pipeline allows the identification of different taxonomic groups in a single analysis, supporting inferences about the presence of several pathogenic microorganisms in a shorter time. This approach allowed us to observe a broader range of taxonomic groups, indicating the occurrence of both the most common pathogens and others that had not yet been described in human semen.
In addition to the diversity of pathogenic bacteria (Figure 3A), the number of reads for Plasmodium, Trypanosoma, and Trichinella caught our attention. Although we have exhausted in silico confirmations, additional in vitro analyses are usually required to confirm whether these eukaryotic pathogens are present in the samples. As seen in Figure 1, the reads of some eukaryotic groups showed redundancy with other taxonomic groups. This is common in this type of analysis, given the complexity of these eukaryotic genomes. In any case, we believe that our objective of using WGS as a prospective technique has been met, since obtaining many results from a few analyses speeds up diagnosis. Achieving this same number of taxa with other approaches would require combining techniques and more samples.
An unexpected finding was that the bacterial genera identified by WGS were better represented than with classical approaches for bacterial identification. As no other studies have used WGS in human seminal samples, we turned to the most recent meta-analysis of the human seminal microbiome, which identified four studies that used 16S rRNA techniques 10,20,23,24. Although these are not the same technique we used, they are culture-independent and seek to identify non-target organisms within the proposed group 9. Of these, only three presented data for comparison 20,23,24. Our pipeline identified a significant number of bacterial genera above 0.1% of the total genera identified (Figures 3B-C). The coverage and similarity of the genera identified by WGS relative to bacteria-targeted techniques indicate that the gain in identifying the other phyla outweighs the loss of target specificity. Thus, despite the predominance of host-derived nucleic acids in the background 21, WGS could be an interesting strategy for bacterial analysis.
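As an illustration of the 0.1% cutoff mentioned above, the following minimal sketch keeps only the genera whose share of the genus-level total exceeds a threshold. It assumes the cutoff is applied to read counts per genus, which the text does not state explicitly; the function name, the parameter name, and the example counts are hypothetical and are not data from this study.

```python
def genera_above_threshold(genus_counts, min_fraction=0.001):
    """Return {genus: fraction} for genera above min_fraction of the total.

    Assumption: genus_counts maps a genus name to its number of assigned
    reads, and the 0.1% cutoff (min_fraction=0.001) applies to that total.
    """
    total = sum(genus_counts.values())
    if total == 0:
        return {}
    return {
        genus: count / total
        for genus, count in genus_counts.items()
        if count / total > min_fraction
    }


# Hypothetical counts, for illustration only.
example_counts = {"GenusA": 9000, "GenusB": 995, "GenusC": 5}
kept = genera_above_threshold(example_counts)
```

With these hypothetical counts, GenusC (5 of 10,000 reads, or 0.05%) falls below the cutoff and is dropped, while the other two genera are retained.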
It is noteworthy that differences in results are expected even when participants and techniques are kept the same, owing to the natural dynamic balance of the population. Even allowing for this premise, the results are consistent with respect to technique, as prospecting with primers covering the V3-V4 region has confirmed more organisms than primers for the V1-V2 region 25. The disadvantage of lower coverage must be weighed against the advantage of avoiding false positives, depending on the objective of each study. To this end, we adopted rigorous read processing: a Phred score threshold of 20.0 and the use of reads assigned exclusively to one group, as shown in the upset plot.
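The two filtering criteria described above can be sketched as follows. This is an illustration under stated assumptions, not the pipeline actually used: it assumes Phred+33 encoded FASTQ quality strings, it assumes the threshold of 20 is applied to the mean quality of each read (the text does not say whether the cutoff is per read or per base), and all function names are hypothetical.

```python
def mean_phred(quality_string, offset=33):
    """Mean Phred score of one read, assuming Phred+33 ASCII encoding."""
    scores = [ord(ch) - offset for ch in quality_string]
    return sum(scores) / len(scores)


def passes_quality(quality_string, min_mean_phred=20.0):
    """True if the read is non-empty and its mean Phred score is >= threshold.

    Assumption: the cutoff is applied to the per-read mean quality.
    """
    return bool(quality_string) and mean_phred(quality_string) >= min_mean_phred


def exclusive_reads(read_to_groups):
    """Keep reads assigned to exactly one taxonomic group.

    read_to_groups maps a read ID to the set of groups it matched; reads
    matching several groups (the intersections in an upset plot) are dropped.
    """
    return {
        read_id: next(iter(groups))
        for read_id, groups in read_to_groups.items()
        if len(groups) == 1
    }
```

For example, a read whose quality string is "5555" has a mean Phred score of exactly 20 and passes, whereas a read matching both Bacteria and Eukarya would be excluded by `exclusive_reads`.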
Identification within the Eukarya group was particularly challenging because of homology between gene sequences. For this reason, we confirmed the findings for the Eukarya and virus groups by aligning them to reference sequences with BlastX, in order to decrease the risk of false positives due to sequence homology. This step is crucial, since matching a higher number of different regions of the microorganism's genome reduces the chance of a false positive.
Reports on eukaryotic pathogens in human semen are scarce, with only reviews available 26,27. Our results showed two dominant genera: Plasmodium (64.4%) and Trichinella (21.2%).
Plasmodium was the organism with the highest number of identified hits, and it has already been described as a potential agent in infertility 27. The first report of the destructive potential of Plasmodium for human male fertility was published in 1987 28 and, since then, studies have shown a reduction in testosterone levels, an increase in cortisol, a decrease in the ratio of T-helper to T-suppressor cytotoxic cells, decreased sperm motility 29, decreased sperm count, and adverse effects of antimalarial drugs 27,28,30. In mice, the impact of this genus on reproductive capacity is considered harmful and may include congenital transmission, lower pregnancy rates, reduced fertility, increased abortion, increased neonatal mortality, overproduction of inflammatory cytokines (tumor necrosis factor-α), and testicular degeneration 27,31,32.
Regarding Trichinella, although helminths belong to different taxonomic families, they share the ability to modulate the host's immune response directed at themselves and at bystander antigens, such as vaccines and allergens, with both advantageous and disadvantageous consequences 33,34. The coexistence of parasites affects the host organism uniquely 35, and, like the microbiome, this joint action requires further study. Trichinella is the only known genus of the family Trichinellidae; it is the smallest nematode parasite of humans and the largest intracellular parasite 36, with larvae already identified in body fluids and organs, including lymph nodes, urine, placenta, mammary gland tissue, milk, skin, and virtually every tissue 37,38. Pawlowski 39 claims that "many aspects of clinical trichinosis remain unknown or vague due partly to the limited possibilities for studying trichinosis in man."
Hanseniaspora represented 5.4% of the Eukarya identified; however, its pathogenesis and behavior remain unclear 40. Although findings are rare, H. uvarum has already been identified in nails 41, the oral cavity 42, and an epithelial lesion 40. Jankowski et al. 40 cite a finding in vaginal discharge, but we could not reliably retrieve the data. Batista et al. 43 identified Hanseniaspora valbyensis in a patient's appendectomy sample and mention a 1928 report of onychomycosis caused by Hanseniaspora in the German medical literature. With advanced detection and identification methods, the list of emerging opportunistic infections by unusual fungi is expanding rapidly worldwide 41.
The taxonomy of the genus Caenorhabditis is still under construction, with at least eight species described in the last decade alone 44; some are not even formally classified 45. We identified C. elegans, one of the most widely used models in human health research because 30-60% of its genes have orthologs or strong homology in mammals 46. However, as this nematode is free-living, we consider the finding an artifact because most of the hits for amniotic have been removed and allocated to this species.
We consider the identification of Trypanosoma relevant, even though it accounted for only 3.2% of the reads for Eukarya. Although few studies exist in humans, they indicate that during trypanosomiasis, sterility or infertility, menstrual disorders, loss of libido, impotence, amenorrhea, and degeneration of the seminiferous tubules and testis may occur 47,48. There is evidence that the pathogen also causes specific damage to the hypothalamic-pituitary axis 27,48; however, the mechanisms of action are unknown.
The considerable number of reads associated with retroviruses (5,051) draws attention. It leads us to reflect on the potential incorporation of the viral genome into the germ cell genome (and the consequent risk of transmission to offspring), which Dejucq & Jégou have already described as worrisome 6. We also consider the hypothesis that this incorporation may bring benefits in genetic variability. It is important to note that the classification was performed by sequence homology comparison against a public database whose data on virus-related genomes are incomplete and include a large number of "unknown" sequences: although viruses are the most abundant biological entities on the planet, it is estimated that only 1% of viral sequences are recorded in the reference databases 49,50.
About 8% of the human genome consists of remnants of viral infections known as human endogenous retroviruses (HERVs) 51. These elements were acquired during evolution by vertical inheritance, reaching ∼203,000 copies in the human genome, and about 30% of this material is assumed to consist of active, transcribed elements 52. It is common to infer that HERVs only cause harm to the host. However, they are expressed at low levels in all human tissues and can provide potential benefits to their hosts; one hypothesis is that their co-option by vertebrates prevents infection by related exogenous viruses 53. A relevant example is the co-option of an endogenous retrovirus envelope gene that came to form the syncytiotrophoblast during pregnancy 54, along with its active expression in embryogenesis 51,52. The testicles and the placenta in particular appear to be privileged tissues for HERV expression 55.
The sequence mapped to Y17832.2, HERV-K (C7), which is believed to be an allelic variant (YIDD-to-CIDD mutation) of a proviral sequence that carries all the open reading frames (ORFs) that supposedly express a non-functional RT 56. The HERV-K family is the most recently acquired 54 and the most transcriptionally active 51. Almost a third of the proviruses in this family represent human-specific insertions, of which 48% are polymorphic 52. The high expression of HERV-K envelope proteins in placental cytotrophoblast cells suggests their potential involvement in placentogenesis and pregnancy 54,57−59, with many copies carrying full ORFs that are transcribed and translated, especially in early embryogenesis 52 and in the differentiation of human fetal tissues 57. Unlike the HERV-K family, open reading frames encode functional proteins, including a fusogenic glycoprotein attributed to normal placental development 60–62. However, it is unknown whether it represents an exogenous retrovirus with closely related endogenous elements or an endogenous replication-competent, virion-producing provirus 63. Also, the mechanisms by which neuroinflammation occurs are still unclear 64. Gammaretrovirus accounted for 28% of the identified viral reads. Members of this genus are also known as onco-retroviruses for their leukemia-inducing properties and their transducing properties in stem and progenitor cells 65.
Our findings support considering prophylactic protocols not only for bacteria but also for eukaryotes. We consider that conclusions about favorable or unfavorable seminal microbiotas need more attention because a) the actual microbiota potentially present in semen is unknown, and b) there is no statistically solid evidence, nor methods reproduced in a multicenter way. We believe that our efforts, coupled with multicenter prospective studies and machine learning, will help elucidate the functional microbiome of the male reproductive system and bring a new view to fertility parameters and clinical practice.