Why we missed it? Computational analysis reveals distribution patterns of Malassezia furfur, the etiological agent of Pityriasis versicolor, in skin metagenomes

Malassezia furfur is the main causative species in pityriasis versicolor infections and is now widely believed to be a part of the skin microbiota, yet it was systematically missed in the early skin metagenomic studies. Here, we curated a specic set of M. furfur sequences and used them to reanalyze publicly available skin metagenomes to computationally investigate the distribution of M. furfur and its relative abundance at different skin sites. To this end, we used BLASTN to match and align these marker genes to the selected metagenomic datasets and estimated M. furfur relative abundance as the number of BLASTN hits per million metagenomic reads. We found a relative enrichment of M. furfur in the retro auricular crease, the antecubital fossa, and the forehead. Among skin categories, sebaceous areas were the most signicantly enriched in M. furfur, while in terms of exposure/occlusion, exposed areas had the highest abundance. This work will facilitate and allow the estimation and correction of past estimates of this important fungal species in shot gun metagenomic data sets.


Introduction
With a surface area of 1.8 m 2 , the skin is the largest organ in the human body and the rst barrier against invasion by pathogens [1].
The skin microbiota is the assemblage of commensal bacteria and fungi that reside on the skin. Like other components of the human microbiome, which are in a dynamic state of change-akin to a cloud of uncertainty [2], the skin microbiome composition differs according to the physiological nature of the skin site, which can be oily, moist or dry [3]. Skin diseases are correlated with the alteration of the normal microbiota, which starts with an imbalance in the microbial homeostasis that is termed dysbiosis [4].
Mycobiomics studies the fungal subpopulation of the microbiome. Mycobiome studies are key to determine the exact role of the human-associated fungal communities and their impact on human health state [5]. Fungal pathogenesis is an increasingly signi cant cause of human morbidity and mortality [6,7].
Mycobiome analysis for precisely reconstructing the fungal community during fungal infection is highly challenging, but with great clinical value for detection and curation of fungal pathogenesis [8].
While bacteriome studies largely rely on 16S rRNA gene pro ling to determine the bacterial and archaeal community composition in a given environment or body site [9], mycobiome analysis cannot rely on 18S rRNA amplicon sequencing with the same e ciency. Being eukaryotic microbes, fungi have similar ribosomes to those in human cells, and thus 18S rRNA amplicon data sets may be overwhelmed with human sequences outnumbering fungal sequences. Instead, the internal transcribed spacer areas (ITS1 and ITS2) of the rRNA-encoding genes are used as better markers for fungal pro ling [10].
Malassezia is a lipophilic dimorphic fungus known as pityrosporum. It is a commensal skin fungus, but in some conditions, it behaves as a pathogen. Malassezia species reportedly represent 50%-80% of the skin fungal microbiota, and are most common in the sebaceous areas of skin, such as the trunk, face, scalp and back. Malassezia furfur is the main causative species in pityriasis versicolor infections [11,12]. Surprisingly, M. furfur was systematically missed in the early skin metagenomic studies. Here, we curated a speci c list of M. furfur ITS and other ribosomal sequences, and used them to reanalyze publicly available skin metagenomic data sets to computationally investigate the distribution of M. furfur and its relative abundance at different skin sites.

Materials And Methods
Data sets: We retrieved publicly available skin metagenomic data sets from the National Center for Biotechnology Information (NCBI) Sequence Read Archive. Speci cally, we used the human skin metagenome BioProject PRJNA266117 [13], which includes 675 experiments, BioSamples and sequence runs (URL: https://www.ncbi.nlm.nih.gov/bioproject/PRJNA266117/). We selected a subset of 92 biosamples, representing negative controls (clean swabs) in addition to 11 different skin sites ( Figure 1) in the human body. The different biosamples can be categorized according to their physiological nature ( proposed by Jagielski et al. [14], as a query set ( Table 2).

Bioinformatics analysis:
We used the basic local alignment search tool (BLAST) to screen metagenomic sequences for M. furfur marker genes. The stand-alone BLAST+ application [15] was used for nucleotide searches and mapping, with the BLASTN program. The nine marker genes listed above were the query set and the metagenomic data sets were used as "subject sequences" in the BLASTN searches. An expect value (E-value) of 10 -5 was used as a threshold.  For optimization and comparative studies, we calculated different metrics, which we later aimed at comparing: (i) total number of hits to any marker gene; (ii) total number of hits to each marker gene; (iii) number of best hits to any marker gene; (iv) number of best hits to each marker gene. FPMs were computed for all the above metrics ( Figure 2) and optimized as indicated in the Results section.
Statistical analysis: Statistical analysis was performed in the R environment [16], installed from URL: https://www.Rproject.org, with the RStudio [17] interface (URL: http://www.rstudio.com), with speci c use of the data analysis packages: ggplot2, corrplot, and beanplot. In some instances, the GraphPad Prism program (GraphPad Software Inc., San Diego, CA) was used for con rmation.

Results
To investigate the distribution of M. furfur in different body sites, we considered the normalized hits FPM as an estimate of M. furfur relative abundance. We statistically analyzed the relative abundance of M. furfur in skin samples with different physiological conditions (Figure 1). We also explored the effect of sample occlusion, subject age and subject gender on the relative abundance of M. furfur in the samples.

Optimization of metrics:
We started by testing each of the nine biomarkers to assess their diagnostic ability, reproducibility, and thus reliability in detecting M. furfur sequences within metagenomes.
The results obtained from aligning each of the nine markers, and their combination, as BLAST query sequences were quite reproducible ( Figure 3). Pearson correlation analysis of number of BLAST hits, represented as normalized hits FPM, indicated a high correlation near 1 in most cases (Pearson correlation coe cients > 0.9, reaching 10.0 at instances). Two sequences with accession number KC141972.1, KC141971.1 gave exactly the same results ( Figure 3).
Since the results obtained from normalized hits (FPM) for all genes were quite correlated with results from those of single genes (Figure 2 and Figure 3), as well as with best hits for all genes, we estimated the abundance as all gene FPM, as these values were comparable for small or large data sets, since they were normalized for the number of sequence reads.
Effect of skin site on the distribution of M. furfur in representative skin metagenomes ( Figure 4A) The type/location of the skin site signi cantly affected the relative abundance of M. furfur among the samples (Kruskal-Wallis rank, p-value = 1.261x10 -06 ).
Among the 11 skin sites (Figure 1 and Table 1) included in our analysis, the retro auricular crease, which is a sebaceous area behind the ear, had the highest relative abundance of M. furfur, followed by the antecubital fossa, a moist joint area, then the forehead, a sebaceous and exposed area. Next came the palm, which is an intermittently moist and exposed area ( Figure 4A).
The back and the subclavius area, which are both sebaceous areas that are covered most of the time had intermediate abundance. Finally, the axilla, plantar heel, toe web space, umbilicus and volar forearm areas had the lowest M. furfur abundance. Interestingly, all these sites are neither sebaceous nor in a joint area and they are covered by clothes most of the time.
Expectedly and reassuringly, the control samples had the lowest number of hits, con rming the speci city of the methodology. These were included in the published study to normalize the reading errors in the results.
Effect of skin nature on M. furfur abundance: The relative abundance of M. furfur was signi cantly higher in the sebaceous skin sites, followed by the rarely intermittently moist sites, then the moist sites, and nally the dry sites (Kruskal-Wallis test, p-value = 1.523x10 -08 , Figure 4B).
From another perspective, whether the skin is occluded or exposed signi cantly affected the relative abundance of M. furfur (Kruskal-Wallis rank, p-value = 0.0004654). The highest abundance was observed in the exposed sites, followed by occluded then intermittently occluded ( Figure 4C).
Sex and age had no signi cant effect on the relative abundance of M. furfur among different skin sites (Kruskal-Wallis rank, p-value = 0.6458, 0.1498, respectively) ( Figure 5).

Discussion
Malassezia species are the causative agents of many skin diseases, and their dysbiosis has been linked with many skin disorders and complications of other infections, e.g., HIV [18]. Some researchers reported that Malassezia species may have a role in skin cancer [19] or to the progression of internal organs cancers (esophageal, gut, pancreatic and prostate); in particular, M. furfur was related to the prognosis of prostate cancer [20,21].
The metagenomic study we chose to analyze here (BioProject PRJNA266117) was mainly concerned with the double stranded-DNA virome and its relationship with the human skin microbiota, and focused to a lesser extent on the bacteriome and mycobiome [13]. In terms of fungi, that study only reported the presence of M. globosa but overlooked other Malassezia species, especially M. furfur, which has higher clinical value than M. globosa [13]. This lack of detection of M. furfur is not surprising and is most likely attributed to the absence of a well-developed molecular biomarker to differentiate between Malassezia species at the time of the study (before 2015), given that the earliest NCBI record of a reference ITS sequence for M. furfur (NCBI accession: NR_149347. Because the marker set is rather small in number (nine sequences, Table 2), which would strongly affect E-values, underestimating true positive results (i.e., the E-values would be exaggeratedly high if the BLAST database is as small as nine short sequences), we used the marker sequences as a BLASTN query to screen metagenomes (which were used as a search space or 'BLAST subject sequences'). Either way, we examined all alignments visually to make sure the results were valid and reliable.

Conclusion
Based on the above ndings, we conclude that the physiological nature and occlusion of skin site (as exposer to air and joint bending location) are signi cant factors affecting M. furfur distribution and its relative abundance. On the other hand, sex and age have a minor effect. We can also conclude that alteration of the skin condition may affect the transformation of M. furfur from commensal to pathogenic; we need more investigations to see if this could be applied for other Malassezia species and the whole fungal comminutes in the skin as well.

Declarations
Ethics statement: This study is merely computational and involves no patient records or experimental work on human subjects or laboratory animals. All raw data used and analyzed in this study are publicly available.  Scatter plots for all-genes hits vs. single-gene hits, all-genes FPM vs. single-gene FPM, all-genes FPM vs. all-genes hits vs. all-genes best hits FPM. FPM = fraction of hits per million metagenomic reads. Pearson correlation coe cients for the four panels, were 0.999, 0.998, 0.938, and 0.996, respectively.

Figure 3
Correlation matrix between number of best hits and best hits FPM obtained form each one the ninemarker genes as query and all of them as a one query (described as 'multi' in the matrix).