Gut microbiome is associated with thyroid nodule and functions


 Background: Thyroid nodule affects nearly half of the adult population. Accumulating evidence suggests that gut microbiota plays an important role in the thyroid disorders and functions, yet the association between gut microbiota capacity and thyroid nodule and function have not been studied comprehensively.Methods: We performed a gut microbiota whole-genome wide association study in 537 thyroid nodule participants and 725 controls from the general Chinese population. Thyroid nodules were with the Thyroid Imaging Reporting and Data System assessment.Results: Participants with the highest grade thyroid nodule had decreased gut microbial species and functional richness compared with controls and thyroid nodule with lower grade. Microbiota composition of the low-grade thyroid nodule was similar to that in controls rather than high-grade thyroid nodule. We found significant alternations of overall microbial functional composition in high-grade thyroid nodule but not the species composition, only a small portion of microbioal species were associated with high-grade thyroid nodule. The high-grade thyroid nodule gut microbiome was characterized by greater amino acid degradation and glycolysis, also with fewer Butyrivibrio and lower butyric acid production. Thyroid functions were positively associated with L-histidine and L-tyrosine related biosynthesis pathways, which are essential components of thyrotropin-releasing hormone and thyroxine, also corelated with broad nutrition metabolism pathways including fatty acids, B family vitamins and nucleotides. High-grade thyroid nodule discrimination models based on the gut microbiome and demography were validated using 294 independent samples and showed stable classification power.Conclusion: Our study comprehensively described the gut microbiome characteristics in thyroid nodule and suggested potiental role of the key gut microbiota-driven functional pathways in thyroid nodule and functions metabolism.


Background
Thyroid nodule (TN) is in increasing incidence worldwide, such that they are now present in 20-76% of the adult population. [1] Although the majority of TNs are asymptomatic and benign, 15% were found to exhibit significant increases in size during a 5-year follow-up [2] and 5-15% can be malignant [3] , making this the type of cancer that is most rapidly increasing in incidence globally [4] . Age, sex, the metabolic syndrome, and iodine intake are risk factors for the development of TNs [5,6] , and several human genome loci, including TRPM3, EPB41L3, and AP005059 have been found to be associated with TNs [7] .
The importance of thyroid function for appropriate metabolic regulation has been recognized for decades, but this is affected by numerous environmental factors [8] and is partially genetically determined in humans [9] . However, the aetiology of TNs and the role of genetics in thyroid function remain to be fully elucidated.
Evidence is accumulating that the gut microbiome plays an important role in thyroid disorders and function by regulating mineral metabolism and enterohepatic cycling [10] .
The importance of gut microbiota on human health has been underlined for more than one decade, however, our understanding of its role in common conditions such as thyroid nodules and thyroid functions are still very limited. A recent study showed gut microbial dysbiosis in 18 patients with TN and associations between the microbiome and thyroid function [11] . However, the significance of these findings was restricted by the small number of patients studied and a suspicion of batch bias [12] , which might have been caused by the use of 16S ribosomal RNA (rRNA) gene amplicon sequencing.
For the present study, we recruited 537 participants with TN and 725 controls as a primary analysis cohort from the Chinese Healthy Integrative-omics project (CHIP). 670 of these 6 participants underwent serum thyroid function testing, including the measurement of free triiodothyronine (FT3), free thyroxine (FT4), and thyroid stimulating hormone (TSH) concentrations. In addition, whole genome shotgun sequencing was performed on stool samples. Using these data, we aimed to define the gut microbial characteristics associated with TNs and thyroid function.

Characteristics of the primary cohort
The primary cohort included 1,262 participants (38.43% women) after all exclusions (see method for details). In 537 (42.55%) participants at least one thyroid nodule was detected. The mean ages of the control and TN groups were 42 and 47 years old. The participants were then classified into four groups according to the thyroid ultrasonographic findings: (1) 725 demonstrated no remarkable findings (controls); (2) 172 had Thyroid Imaging Reporting and Data System (TI-RADS) level 2 (TN2); (3) 328 had TI-RADS level 3 (TN3); and (4) 34 had TI-RADS level 4a and 3 had TI-RADS level 4b (TN4), of which six TI-RADS level 4a and 3 TI-RADS level 4b underwent fine-needle aspiration biopsy, which did not identify malignancy. TI-RADS 4b were not included in the downstream analysis, despite having no malignancy. Participants with more than one thyroid nodule were classified according to the highest TI-RADS level. The demography and laboratory test data are shown in Table S1.
Age and blood pressure were higher in the TN group. In addition, when compared with female controls, female TN had higher body mass index (BMI), waist and hip circumference(WC and HC), triglyceride (TC), total cholesterol (TG), high-density lipoprotein-cholesterol (HDL-C) concentrations, and red blood cell counts (RBC). No significant differences were found between the TN2 and control groups, or between the TN3 and TN4 groups, except for higher systolic blood pressure in female TN4 and higher 7 RBC in male TN4. Thyroid function and lifestyle habits, including smoking and drinking, were not related to TN incidence. Principal component analysis (PCA) based on demographics identified the principal components PC1 and PC2, which correlated with TI-RADS. Pulse pressure (PP) and age were with accordant contribution, together with BMI, WC and HC ( Figure S1).

Gut microbial characteristics of the primary cohort
A total of 928 species presented in at least five samples were used for downstream analysis. 343 species of these were viruses, six were Archaea, and 575 were bacteria. 61 species had a median relative abundance >0.01%, comprising 76.4% the total microbiome. 839,310 gene families and 459 metabolic pathways were represented in at least five samples ( Figure S2). The representation of metabolic pathways was more equal than that of species, demonstrated by mean Gini indexes of 0.768 and 0.983, respectively (Table S2). In addition, the intra-group and inter-group similarities and distances standard deviations were lower, with respect to the metabolic pathways represented, compared with the similarities based on species composition. (Figure S3). Principal coordinates analysis (PCOA) conducted on the basis of metabolic pathway composition showed that the principal coordinates PCO1 and PCO2 correlated with the presence of TNs and TI-RADS (Figure 1a), but not species composition.
TN2 were more similar to controls when inter-group distances were compared with TN3 and TN4 (Figure S4), except with respect to Spearman coefficient distance (SCD) based on species composition. When the distribution of PCOs was analysed, few significant differences were found between the control and TN2 groups (Table S3 and Table S4).
There were significant differences between TN2 and TN3 in PCO1 for metabolic pathways (Table S4). Because >99% of TNs with TI-RADS level 2 are benign and do not require treatment [13] , we then classified control and TN2 as having no or low grade TIRADS (Group 8 LTN) and those with TN3 or TN4 as having advanced grade TIRADS (Group HTN). 1259 individuals could be clustered into two enterotypes on the basis of species composition ( Figure S5). Enterotype 1 had higher microbial diversity, and contained a higher proportion of women (41.1%, compared with 30.9% in enterotype 2). None of TN incidence, TI-RADS level, or thyroid hormone concentrations (serum FT3 and FT4) were significantly associated with enterotype, except that men with enterotype 1 had higher serum TSH concentrations ( Figure S6).

Lower microbial richness in TI-RADS level 4a
The number of observed species (OBS) was significantly lower in TN4 than in the other groups (Wilcoxon's rank sum test, p = 0.005, p = 0.01, and p = 0.011, compared with control, TN2, and TN3, respectively). 56.91% (247 of 434) of the species not detected in TN4 were viruses, and these were less abundant than other species in all the participants: the median fold-differences (common species versus species not found in TN4), based on the relative abundance in all the participants (including TN4), were 121.52 and 2.67 for bacteria and viruses, respectively. The majority of the viruses not detected in TN4 were phages that might be hosted by Enterobacteriaceae. The number of gene families identified and viruses ratio were also lower in TN4 (Figure 1b-e).
The OBS was significantly higher in women in the control and TN2 groups, and was also associated with the age of the women and the BMI of the men in the control group ( Figure   S7). In considering female ratio, BMI and age were higher in the TN4 group than in the control and TN2 groups. In addition, the TN4 group was smaller; therefore, we used bootstrap sampling to generate sample number-, sex-, age-and BMI-matched subsets of controls and TN2, such that each subset contained 40 randomly chosen samples. However, the OBS in the TN4 group was still significantly lower after resampling (Wilcoxon's rank sum test, median p = 0.0365 and median p = 0.0165, compared with controls and TN2, 9 respectively) ( Figure S8). We also used analysis of variance (ANOVA) to confirm the lower microbial richness in the TN4 group, after adjusting for age, sex, and BMI (p = 0.005, p= 0.014, and p= 0.017, compared with the control, TN2, and TN3 groups; Table S5). No ultrasonographic parameters were associated with OBS, except that male TN without thyroid echo had higher OBS (p = 0.004) ( Figure S9).

Gut microbiota changes in thyroid nodules
We analysed 418 prokaryote species and 440 metabolic pathways that had been identified in ≥20 samples. Sixteen species and 24 pathways withstood adjustment for multiple covariates (sex, age, and BMI) and false discovery rate(FDR) controlling multiple testing to show distinct relative abundances between LTN and HTN, by using multivariate association with linear models(MaAsLin2, see method, q value < 0.25) (Figure 2a and Table S6), and these comprised 0.67% and 4.68% of the total numbers of microbial species and pathways, respectively ( Figure S10).
We found that butanoate formation pathways (P163-PWY and PWY-5676) and Butyrivibrio (unclassified) were over-represented in LTNs (Figure 2c), as was the essential transfer RNA charging pathway (TRNA-CHARGING-PWY), which incorporates amino acids into growing polypeptides, could promote butyrate formation by the intestinal microbiota [14,15] . L-isoleucine biosynthesis (PWY-3001) was enriched in HTNs, and in vitro studies of ruminal fermentation have shown that butyrate production increases with isoleucine addition [16] .
The metabolic pathways that were perturbed in HTNs were mainly involved in amino acid degradation and glycolysis. Several metabolomic studies have shown the accumulation of L-glutamate [17] and lactate [18,19] in benign cases of TN, which might be contributed to by the L-histidine and 4-amino butanoate (GABA) degradation (by PWY-5030, HISDEG-PWY and GLUDEG-I-PWY), glycolysis (Glycolysis-I, PWY-5484) and homolactic fermentation (ANAEROFRUCAT-PWY) pathways (Figure 2b). The human intestinal microbiome can ferment lactate to generate butyrate in the presence of polysaccharide [20] . However, the accumulation of lactate and the resulting low pH environment would inhibit butyrate formation [21,22] . In addition, inter-individual variations in short-chain fatty acid (SCFA) synthesis ability are typical [23] . Although L-glutamate degradation (P162-PWY) could also yield butyrate via butanoyl-CoA, P162-PWY was far less abundant (mean relative abundance 0.0077%) than PWY-5676 (0.108%, detected in 867 LTN and 341 HTN), as was another LTN-enriched butyrate synthesis pathway (P163-PWY; 0.00018%) and a neutral pathway CENTFERM-PWY (0.012%) that can also ferment pyruvate to produce butanoate ( Figure S11). The gut microbiota participating in PWY-5676, which was the main butanoate produce pathway via acetyl-CoA fermentation, could not be classified into any known species, but were positively associated with several important butyrate-producers (Table S7 and Figure S12). As a well-recognized histone deacetylase inhibitor [24,25] , sodium butyrate could induce growth arrest in a variety of normal cell types [26,27] , enhance radiosensitivity [28] , and regulate the expression of several tumour-suppressor genes [29,30,31] , as well as increasing I 125 uptake [32] , which has been shown in thyroid follicular carcinoma cells to occur by restoring human sodium-iodide symporter function (nHIS). It has also been shown to increase thyroid hormone receptor expression in GH3 [33] and glial C6 [34] cells.
Furthermore, we identified Bacteroidesplebeius, which is of marine origin and was found to be ubiquitous in a Japanese cohort [30] and with porphyran-degrading ability [31] , to be enriched in male LTNs after covariates (BMI and age) adjusting (multiple linear regression, adjusted p < 0.05). B. plebeius positively correlated with BMI [35,36] , despite LTNs having lower BMI. Numerous in vitro studies have demonstrated that red or brown seaweeds can be fermented to produce butyrate in the human gut, but porphyran or green seaweeds are not effective as substrates [37] . Increases in the populations of Bifidobacterium and Lactobacillus were accompanied by the loss of butyrate-producing bacteria in patients with inflammatory bowel disease [38] . We found that the populations of two Lactobacillus species (L. salivarius and L. sanfranciscensis) were higher in whole HTNs (see method, q < 0.25) and that Bifidobacterium longum was enriched in female TN4 (multiple linear regression, adjusted p < 0.05, Table S8 and Figure S13).

Association of gut microbiota and thyroid function
Data from 624 participants with normal thyroid function were used for association analysis. All the thyroid parameters measured correlated with age and sex (Table S9).
200 pathways and ten species were associated with at least one thyroid parameter after adjustment for covariates by using multivariate association with linear models(MaAsLin2, see method, q value < 0.25)( Table S10). TSH inversely correlated with FT3 and FT4 concentrations, and the microbial features that positively correlated with TSH also showed inverse associations with FT4 and FT3, but for the majority no significant associations were identified (Figure 3a).
Thyroid function was associated with a broad range of metabolic pathways (Figure 3b). Lhistidine biosynthesis (HISTSYN-PWY) was positively associated with serum TSH, perhaps because it is an essential component of thyrotropin releasing hormone (TRH) and essential amino acid that is not synthesized de novo in humans, and therefore could increase TSH synthesis and secretion from the anterior pituitary [39] . Tyrosine biosynthesis (PWY-6629, PWY-6630) and biosynthesis (PWY-6628) were also positively associated with FT3, tyrosine is composed of iodinated tyrosine residues (Figure 3c), phenylalanine is an essencial amimo acid which could be converted into tyrosine via hydroxylation in human. Previous studies have shown that low serum tyrosine is associated with reduced circulating FT3 concentration and thyroid dysfunction [40,41,42] . Otherwise, the majority of amino acid and adenosine biosynthetic pathways negatively correlated with FT3. Caloric restriction could reduce thyroid hormone concentrations by inhibiting 5'-deiodinase activity, which might be caused by lower production of NADPH (PWY-7269) [9] , accompanied by lower ATP content and synthase activity [43,44] . Conversely, administration of thyroid hormones could reduce plasma total free amino acid concentration [45] and inhibit ATP hydrolysis [46] .
Ubiquinol (Coenzyme Q, CoQ), menaquinone (Vitamin K2), various B vitamins (including B9, B12, B1, and B7) which are only synthesized by plants, yeasts and bacteria but not by human host, and some essential pathways involved in CoQ and vitamin biosynthesis were positively associated with FT3 (Table S10). We found Biotin (vitamin B7), which is a commonly used supplement in practice, might induce abnormal thyroid function test results [47] , via its direct effect to increase thyroid hormone and reduce TSH concentrations [48,49] .

Models to distinguish advanced TIRADS thyroid nodules
To evaluate the potential for the gut microbiome to be used to distinguish HTN, we constructed two types of classification models: a) Demographic models based on age, gender, BMI and SP. b) Mixed models based on the microbial species present and their metabolic pathways, together with the demographics for 1,259 samples. All the available 180 LTNs and 114 HTNs from the CHIP cohort were used as an independent validation cohort; the corresponding samples were still in sequencing while we were analyzing the primary cohort data. No obvious batch effect bias was detected between the train and the independent validation cohorts, so we did not filter out any samples (Figure 4a). Model performance was assessed by receiver operating characteristic (ROC) curve analysis using the area under the curve (AUC).
We used a combination of gradient boosting machine (GBM) feature importance ranking and random forest backward elimination to remove irrelevant features that might have hampered model performance in a 10-fold cross-validation in train samples (Figure S14

Discussion
TN is the most common watchful waiting endocrine disorder in clinical practice, normally it is not considered as "sick" but with complex and unknown aetiology. Here, we described the gut microbial characteristics of patients with TN and their relationships with thyroid function using a large number of samples collected from a Chinese cohort. Low-grade TN (TN2) were similar in their demographics and gut microbiota characteristics to controls, and they were therefore analysed together as the LTN group, which is appropriate considering that nearly 100% of TN2 nodules are benign and require no treatment. We 14 found that the presence of highest grade TN (TN4), which was with undetermined malignancy risk (5-10%) [50] was associated with lower gut microbiota species and function richness, though most of vanished species in TN4 might be passenger microbiota with low relative abundance. Overall microbiota function composition of HTN was alterted but not at the speices level, only a small proportion of the gut microbiota species were associated. Zhang [11] and colleagues reported increased microbiota species richness in TN and dramatic alternations in the overall microbial structure based on 16 s rRNA gene sequencing and operational taxonomic units (OTU) profiling. TI-RADS or other thyroid nodule assessments were not available and it is hard to compare these inconstant foundings based on different sequencing methods.
We identified the accumulation of key metabolites including glutamate and lactate by gut microbes in HTNs, which were constant with previous metabolomics studies [17,18] . Our findings indicated that lower butyrate production by gut microbes might be associated with HTN because of the resulting capability to increase iodine-125 uptake in thyroid follicular carcinoma cells [32] , iodine deficiency is one cause of thyroid nodules development [6] , however no similar evidence were showed in normal cell lines. In addition, we did not identify specific butyrate formation microbiota reduced in HTN but an unclassifed Butyrivibrio specy, and it was not significantly associated with detected butyrate production pathways, thus reduced butyrate formation capacity might be from non-specific population. Bifidobacterium longum had been recognized as a probiotic in the general population, was more common in female TN4, and therefore should be administered more cautiously, because it can reduce the population of butyrate-producing microbes 38 .
Various nutrient biosynthesis pathways were associated with thyroid function, indicating 15 the importance of gut microbiota for metabolism [51,52] . L-histidine and L-tyrosine biosynthesis pathways were positively associated with serum thyroid hormone concentrations, and are essential components of TRH and thyroxine. In addition, high-dose biotin (vitamin B7) were used to increase thyroid function in the clinic [48] . Hypothyroidism is associated with vitamin D [53] , B6, and B12 [54] deficiency and vitamin supplementation might ameliorate hypothyroidism by promoting levothyroxine uptake [55] , also gut commensals are highlighted as roles of reservoir producers for B family vitamins which could not be synthesized by human host [56,57] ,but no evidence showed these vitamin supplementary improved thyroid function in normal people. Finally, discriminative diagnostic models based the composition of the gut microbiome and demography showed stable discriminative power in an independent validation cohort, also were models based only on demographic features, although ultrasonic elastography should remain the first choice method for TN diagnostic screening [50] .
Limitations of this study should be mentioned. Those associations identified are not specific and causal, for example, butyrate metabolism has also been shown to be associated with several disorders including type 2 diabetes, inflammatory bowel disease, and obese [58] . In addition, foundings will still at least partially be confounded by demographic parameters even after strict adjustment.

Conclusion
In conclusion, the present study described gut microbiota characteristics in thyroid nodule and its relationship with human serum thyroid functions. Our findings indicated alternations in gut microbiota functions in high-grade TN but not species composition.
Notably, we found several gut microbiota driven metabolism pathways were associated with thyroid nodule and functions, which directly participate in metabolites biosynthesis, importantly, these metabolites are with capability of cell disturbance might be associated with thyroid nodule and are essential components of thyroid hormones, and the majority could not be synthesized de novo in humans. Although these associations were not causal and needed further evaluation, we have demonstrated that the gut microbiome is linked to high grade TN and thyroid functions, relationships that are likely to be mediated via microbial nutrition metabolism, which suggested its broad impact on human common conditions.

Study population
The participants in this study were recruited from the Chinese Healthy Integrative-omics project (CHIP), which was approved by the Ethics Committee from the First Affiliated

Sample collection
Stool samples were obtained from recruited participants. Samples were immediately stored at −20 °C and frozen at −80 °C within 30min before transport to the laboratory.
Blood and stool samples were all obtained on the same day at the hospital before 10 am.
All blood samples were drawn in the morning before 10:00. During blood drawing and fecal retention, participants maintained fasting.

Laboratory test
Lavender-Top tubes that contained EDTA as an anticoagulant were used to test the blood routine examination with 1-2ml of venous blood and the tubes were shaken up and down Coulter was used to test the blood samples.

Demographic Feature Measurement
Participants stood on the footprint of the scale with bare feet and face the scale to measure height and weight. After hearing the voice prompts, participants kept their eyes level, chest straight and their stomach in place. After measuring for about 5 seconds, the scale showed their height and weight, and then participants left the scale. BMI in Kg/m2 was calculated by dividing weight in kilograms by the square of height in meters.
Participants stood on a horizontal floor to measure waist and hip circumference with their feet joined together and their arms sagged naturally, spreading their weight evenly across their legs. When measuring waist circumference, participants were told to expose the abdominal skin and relax (belly without clothes), breathing normally, no sucking in their belly or holding their breath. Placing the tape around a horizontal surface about 1cm above the navel so that it closed to the skin but did not press against the skin. When measuring the hip circumference, the tape should be tightly against pants at the widest part of the buttocks and wrapped in the surface but should be avoided getting trapped.
The accurate of waist circumference and hip circumference were 0.1cm. During the measurement, participants straightened their back without pressing their abdomen and leaning forward, slightly to the left of their front, extended their arms into the test site and rest their elbows on the elbow pads. When the measurement was ready, press the switch and keep relaxed. Before displaying the results, participants were reminded not to speak, not to move their arms, and to wait for their blood pressure to appear. All the measurement was repeated three times according to the above operation, and the average value of three times was taken to be the final data.

Thyroid scan
The affiniti50 color Doppler ultrasound system from Philips was used to check the thyroid gland with a linear array probe and a frequency of 5-12MHz. Participants took a conventional supine position with their heads back, heads down, and necks high to allow full exposure of the anterior neck area. Continuous top-down cross-section, longitudinal section, and multi-section scans were performed to record the overall thyroid and intrathyroidal nodules in detail. The description of intrathyroidal nodules covered the site, echo, number, size, aspect ratio, margin, border, morphology, stun, internal structure, calcification, blood flow signal, etc. Thyroid nodules were classified as different levels according to Kwak-TIRADS [59] criteria.

Fine-needle aspiration (FNA) biopsy
Color Doppler ultrasound is used to examine the thyroid when performing the FNA. The first step was to observe the target nodule and locate it on the body. The patient was in a supine position with the shoulders raised so that the head was overextended and fully was applied to filter out low-quality using overall accuracy (OA≥0.8) control strategy as previously described [60] . The high-quality reads were aligned to hg19 using SOAPaligner/soap2 to filter out human reads (identity ≥ 0.9) (RRID:SCR_005503). On average 12.5 Gb (± 3.3 Gb) high-quality non-human data was generated per sample.
Totally we got 15.73 TB data for downstream analysis.

Microbiota composition profiling
Taxonomic annotation and quantification were performed based on MetaPhlAn2 with default settings [61] , generating gut microbial profiling including bacteria, archaea, eukaryotes and viruses. Taxon-specific community functional profiles were further generated using HUMAnN2 (the HMP Unified Metabolic Analysis Network 2). [62] Principal coordinates analysis (PCoA) PCoA were performed based on species or pathway relative abundance by using R program "ade4" package. Four distances were used including Hellinger distance, (JSD), spearman coefficient distance(SCD) and pearson coefficient distance (PCD).

Enterotype analysis
We used analysis flow introduced by Arumugam.M [63] to identify enterotype number.
Combination of four distances (Hellinger, JSD, SCD and PCD) and two clustering methods (complete and k-means) were used, with different clustering assessment indices from "Numpy" package, finally major-rule was used to identify enterotype number as two.

Microbiota diversity and gini index calculation
We used R program "vegan" package to calculate shannon, simpson index and observed species number (OBS) in each sample. Gini index was calculated with "ineq" package. Gini index of specy composition was calculated based on 459 randomly chosen species, for fair comparison in considering more species were detected than pathways.

OBS comparison based on resampling subsets
40 samples were randomly chosen from TN2 and TN3 respectively. Hypothesis tests of 21 gender ratio (fisher's exact test), BMI (student's t test) and age (student's t test) were performed .Only if no significant differences (p-value >0.4) were detected in all three features, then this subset was used to compare OBS between TN4 and TN2 or Control, finally we used median value of all p-value (Wilcoxon rank sum test) as adjusted p-value.

Association between microbiota and thyroid nodule or thyroid function
MaAsLin2 (Multivariate Association with Linear Models) [64] was used to associate microbiota and thyroid nodule or functions with default parameters except for "-p 0.2", while controlling for demographic covariates (gender, age, BMI). Multiple adjusting was built-in BH method. Positive coefficients of microbial species or pathways with HTN group (or TSH, FT4, FT4) were considered with positive associations. Covariates indicate BMI, age and gender unless with otherwise mention in Method section.
Univariate covariates adjusting: R program "glmnet" package and analysis of variance model(AOV) was used to assess correlation between microbiota (including OBS) and covariates with group(TI-RADS or HTN group).

Microbiota inner-interaction assessment
We used spearman correlation coefficient and microbiota relative abundance, to identify microbial inner-interactions, ie, interactions between pathway and specy, specy and specy, pathway and pathway. We randomly subsampling 80% of all samples to bypass possible outliners and repeat this process 100 times, finally used mean spearman correlation coefficient and mean p-value as inner-interaction measurements.

Statistic analysis
Participants' demographic, lab test and life style characteristics were summarized with a standardized statistical significance test method [65] , where the test pathway mainly adopts one way ANOVA and chi-square test (Figure S16), the results includes control vs 22 nodule, control vs TN2, TN3 vs TN4, LTN (control+TN2) vs HTN (TN3+TN4) for female and male (Table S1). Categorical variables were shown as counts and percentages (n(%), missing n(%)), and associations were tested using a chi-square test. Continuous variables were shown as mean (± standard deviation), and differences between groups were analyzed in detail using one way ANONA and normality test (Shapiro.test), then homogeneity test (Bartlett.test or Levene.test), and finally parameter test (aov.test or kruskal.test) and non-parametric test (oneway.test). Statistical analyses on cohort characteristics were performed on R version 3.6.1. A two-sided p value<0.05 was considered statistically significant. R program "lars" package was used to assess correlation between covariates with LTN/HTN group and thyroid functions by using LASSO (Least absolute shrinkage and election operator) regression.

Discriminative models construction
We designed a whole process to predict thyroid nodule by merging 1244 participants' demographic information, species profiles, and pathway ( Figure S14). This process mainly includes data preprocessing, feature selection, and disease prediction. Specifically, for data preprocessing, data standardization makes all values between 0 and 1. For feature selection, three methods were used to identify the representative features, (1) Filter method: constant variable and variance minimum variable elimination method reduced the original 1392 features to 596 features, strongly correlated variable elimination method further filtered 100 features, and 496 features were preserved. (2) Wrapper method: we trained a Random Forest using tenfold cross-validation to obtain prediction accuracy and adopted backward elimination method to obtain 104 features from the previous feature sets (Figure S15a). (3) Embedded method: we split 1244 participants into train data and test data in a ratio of seven to three, used a Gradient Boosting Machine (GBM) with tenfold cross-validation for train data to compute relative influence of each feature, and finally identify 53 features by ranking and accumulating the relative influence up to 85% model contribution (Fig. 4a). For disease prediction model, GBM was used to predict thyroid nodule [66] . (1) Model training: we trained a GBM using tenfold cross-validation for train data, set the distribution function as "Bernoulli" (logistic regression for 0-1 outcomes) and gridding parameters (e.g., interaction.depth, shrinkage, n.minobsinnode, and bag.fraction) to search the optimal parameter values for minimum error, and the loss function is defined , where is the true value and is predictive value.

Ethics approval and consent to participate
This study was approved by the Ethics Committee from the First Affiliated Hospital of Zhengzhou University and informed consent was obtained from all subjects.

Consent for publication
Not applicable.

Availability of data and materials
All shotgun whole genome sequences have been uploaded into The European Nucleotide Archive PRJEB36271. Source data including microbial species and functional composition, analysis and related code are deposited in https://github.com/xiaoshubaba/CHIP-thyroid.
Demographic features and serum thyroid functions used in this study are available for only acedemic usage, please contact Professor Suying Ding (fccdingsy@zzu.edu.cn) for metadata request.

Competing interests
The authors declare that they have no competing interests.