Loss of function variant in SMIM1 is associated with reduced energy expenditure and weight gain

Blood group antigens are the archetypal example of human genetic variation. Here, we characterised the functional metabolic consequences in individuals homozygous for a 17bp deletion in SMIM1 (rs566629828; minor allele frequency 0.0147) and thus lacking the protein defining the Vel blood group. Our analysis, in separate cohorts of SMIM1-/- individuals (UK Biobank, NHS Blood and Transplant, Danish Blood Donor Study, Copenhagen Hospital Biobank) and a mouse model, identified an increase in body weight accompanied by a range of metabolic differences, including dyslipidemia, changes in the leptin-adiponectin ratio, increased liver enzymes and lower total thyroid hormone levels. These changes in the metabolic state were at least in part due to a reduction in resting energy expenditure, as assessed during an in-depth clinical

the Meta-Analysis of Glucose and Insulin-related Traits Consortium (MAGIC)*

sample size, the significance threshold was not reached (Table S4). We replicated the associations between SMIM1-/- and increased average levels of ALT and AST, with the same order of magnitude as observed in UKB (Fig.2, Table S4). We also found associations between SMIM1-/- and an increased leptin to adiponectin ratio (LAR; β̂=0.53) and an increase in free fatty acids (FFA; β̂=1.18, FDR=1.42e-06), two indices of increased fat mass and insulin resistance (Fig.2A and 2B) 13,14. Moreover, we found that SMIM1-/- individuals have lower average levels of total triiodothyronine and thyroxine (T3; β̂=-0.86, FDR=9.87e-04; T4; β̂=-0.74, FDR=2.84e-03; Fig.2A and 2B) and that levels of thyroid stimulating hormone (TSH) seemed to be lower (Table S4).

The above findings prompted us to invite 12 SMIM1-/- individuals belonging to the NHSBT cohort for a 2-day metabolic assessment (Fig.S2). We estimated the effect of the absence of SMIM1 on resting energy expenditure (REE; a marker of whole body metabolic activity) by indirect calorimetry and on body mass composition by dual-energy X-ray absorptiometry (DXA), using a well-established protocol 15 (Methods). These studies showed that SMIM1-/- individuals had a lower REE adjusted for lean mass (Fig.2C, x-axis; Welch Two Sample t-test; P=7.62e-04, Table S5), whilst there were no differences in average lean mass compared to 310 unselected controls (Table S5). Average free T3, but not free T4, measurements were lower in the 12 SMIM1-/- individuals than in the control group (Table S3 and S4), and the changes in physique seen in the 90 SMIM1-/- UKB participants were reflected in abnormal body composition visualised by DXA scans (Fig.2D). Because of the effect on REE and on T3 and T4 levels, we explored the possible involvement of SMIM1 in the hypothalamic-pituitary-thyroid axis 16.
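The REE comparison reported above relies on a Welch two-sample t-test, which does not assume equal variances in the two groups. A minimal sketch of the test statistic and the Welch-Satterthwaite degrees of freedom is given below for illustration; it is not the code used in this study. The function name `welch_t` and the input values are hypothetical, and the P value is left out because it would need a Student t distribution function that the Python standard library does not provide.

```python
import math
from statistics import mean, variance


def welch_t(sample_a, sample_b):
    """Return (t, df) for Welch's unequal-variance two-sample t-test."""
    n_a, n_b = len(sample_a), len(sample_b)
    # Squared standard error of each group mean (sample variance / n).
    se2_a = variance(sample_a) / n_a
    se2_b = variance(sample_b) / n_b
    t = (mean(sample_a) - mean(sample_b)) / math.sqrt(se2_a + se2_b)
    # Welch-Satterthwaite approximation of the degrees of freedom.
    df = (se2_a + se2_b) ** 2 / (
        se2_a ** 2 / (n_a - 1) + se2_b ** 2 / (n_b - 1)
    )
    return t, df


# Hypothetical lean-mass-adjusted REE values (arbitrary units), not study data.
cases = [1.0, 2.0, 3.0, 4.0]
controls = [2.0, 4.0, 6.0, 8.0, 10.0]
t_stat, dof = welch_t(cases, controls)
```

A negative `t_stat` here indicates a lower mean in the first group; the unequal group sizes (12 cases against 310 controls in the study) are the typical setting in which the Welch variant is preferred over the pooled-variance test.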
We analysed single-cell RNA-sequencing data from studies that dissected the transcript levels of these tissues in multiple organisms. In the mouse hypothalamus 17 (GSE113576), Smim1 was expressed at low levels in mature oligodendrocytes and some, but not all, inhibitory neurons (Fig.S3A). Its expression was largely non-overlapping with that of the thyrotropin-releasing hormone (TRH; Fig.S3B). In the human anterior pituitary gland 18 (GSE142653), SMIM1 was found expressed in corticotropes, gonadotropes and somatotropes (Fig.S3C), whilst in human thyroid organoids and mouse thyroid 19 (GSE163818) low-level expression was detected mainly in thyrocytes and as yet uncharacterised Flt1-positive cells (Fig.S3C). These analyses indicate that SMIM1 could play one or more roles in the hypothalamic-pituitary-thyroid axis.

The associations between the genotype at rs566629828 and phenotypes observed in the UKB and NHSBT cohorts were orthogonally validated in 73 Danish SMIM1-/- individuals from the Danish Blood Donor Study 20 (DBDS; 25 female, 18 male, and 645 controls), and the Copenhagen Hospital Biobank

Interestingly, an exploratory analysis of hospital episode statistics revealed an increased risk for cerebral events, with 5 cerebral bleeds and 5 thrombotic strokes in the 65 SMIM1-/- UKB participants for whom data were available (OR=5.53 and 3.46, FDR=6.88e-04 and 2.32e-02, respectively; Table S7).

In summary, we identified a loss-of-function variant which is present in homozygosity in 1 in 5,000 individuals in Great Britain and with even higher frequency in the Nordic countries (this manuscript and reference 7). The MAF of rs566629828 is at the interface between common and rare variation, and the variant has one of the largest effects on weight (β̂=0.22) and BMI (β̂=0.27) reported so far, with the exception of extremely rare variants directly implicated in lipid metabolism 22.
Our findings show that SMIM1-/- individuals (Vel negative blood group) exhibit a combination of metabolic features including increased fat mass, inflammation, triglycerides and altered lipoprotein metabolism due, at least in part, to reduced energy expenditure, a major risk factor in obesity 23,24. Some of these associations, like urate and GGT, were driven by the effect being stronger in one sex; others, like SHBG and LDL, were found only in one of the two sexes, females and males respectively. In the more extreme cases, these effects could lead to insulin resistance and metabolic syndrome accompanied by increased susceptibility to cardiovascular disease, as is supported by analysis of electronic hospital records, which indicated that these individuals may be more prone to cerebral bleeds and thrombotic stroke. Altogether, the observed metabolic phenotype and increased risk for cardiovascular events are compatible with the notion that the absence of the SMIM1 protein results in a state of mild hypothyroidism.

The quantity of genomic data available, including blood donors typed by arrays 2, is growing rapidly, as is the number of individuals identified as SMIM1-/-.

presence of a specific prescription in the DPD dataset. The explanatory variables used were variant rs566629828 genotype, age of the individuals, genetically inferred sex of the individuals (unless the cohort was sex-stratified), and, in the case of mixed cohort analysis, the cohort of a given individual (DBDS/CHB). Since weight data do not follow a normal distribution, a Wilcoxon signed-rank test was used to assess differences in mean weight based on variant rs566629828 genotype after sex stratification. Bootstrapping was used to assess directionality in mean weights based on the rs566629828 genotype. For each SMIM1-/- DBDS case, 100 alternate age, sex and smoking status matched control groups were picked at random.
The mean weight of each of these 100 alternate control groups was compared to the case group's mean weight. Directionality of the difference in mean weights was then assessed for each sex separately. P values from statistical tests were corrected with the Benjamini-Hochberg procedure, with alpha set at 0.05.
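The two steps described above, comparing the case mean weight against randomly drawn matched control groups and correcting P values with the Benjamini-Hochberg procedure, can be sketched as follows. This is an illustrative outline under stated assumptions, not the code used for the study: the function names, the `matched_pools` input and the choice of one matched control per case in each alternate group are hypothetical, and matching on age, sex and smoking status is assumed to have been done upstream.

```python
import random
from statistics import mean


def fraction_cases_heavier(case_weights, matched_pools, n_groups=100, seed=0):
    """Fraction of random matched control groups with a lower mean weight
    than the case group.

    matched_pools[i] lists the weights of controls already matched to case i
    (hypothetical input; the matching itself is not shown here).
    """
    rng = random.Random(seed)
    case_mean = mean(case_weights)
    heavier = 0
    for _ in range(n_groups):
        # Assumption: one alternate control group = one random matched
        # control drawn for each case.
        control_group = [rng.choice(pool) for pool in matched_pools]
        if case_mean > mean(control_group):
            heavier += 1
    return heavier / n_groups


def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted P values in the input order."""
    m = len(p_values)
    # Indices sorted from the largest P value to the smallest.
    order = sorted(range(m), key=lambda i: p_values[i], reverse=True)
    adjusted = [0.0] * m
    running_min = 1.0
    for position, idx in enumerate(order):
        rank = m - position  # 1-based rank in ascending order
        # Scale by m / rank, then enforce monotonicity from the top down.
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical weights (kg) and P values, not study data.
frac = fraction_cases_heavier([90.0, 95.0], [[70.0, 75.0], [72.0, 78.0]])
adj = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
significant = [p <= 0.05 for p in adj]
```

A fraction close to 1 (or 0) would suggest a consistent direction of the weight difference across the alternate control groups, and an adjusted P value at or below 0.05 corresponds to the alpha stated in the text.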

Single-cell RNA-seq analyses
We analysed single-cell RNA-sequencing data from the following sources:

Code availability
The code used to analyse the cohorts is available at https://github.

Tables index

Table S1 | Linear regression outcomes in the UKB cohort for the traits of interest. This table contains the trait tested (feature) and the effect size (effect) of SMIM1-/- on each trait. Standard deviation (sd); confidence intervals (CI); p.value corrected with the Benjamini-Hochberg procedure (p.val.FDR). The tabs contain the values for the effect not corrected for BMI (no BMI correction) and corrected for the effect of BMI on the traits of interest (BMI corrected). Also, the cohort was stratified by sex and the results for the different strata are reported in different tabs. Tabs with the "sex" label refer to the analysis that uses the whole cohort (i.e. not sex-stratified), but corrected for the effect of sex. On the first tab of this table there is a short demographic description of the UKB cohort, SMIM1-/- and SMIM1+/+, used for this study.

Fig.1D.

Table S3 | Raw data collected in the NHSBT cohort. The first tab has the cohort characteristics and biochemistry raw data collected from the NHSBT cohort. Body mass index (BMI), leptin to adiponectin ratio (LAR), cholesterol (CHOL), triglycerides levels (TG), high-density lipoprotein (HDL), alanine aminotransferase (ALT), leptin (LEPT), adiponectin (ADPN), aspartate aminotransferase (AST), C reactive protein (hsCRP), free fatty acids (FFA), thyroid-stimulating hormone (TSH), low-density lipoproteins (LDL), triiodothyronine (T3), thyroxine (T4). The second tab has the free T3 and free T4 measurements for a subset of the cohort.

Table S4 | Linear regression outcomes in the NHSBT cohort for the traits presented in Table S3. It contains the trait tested (feature) and the effect size (effect) of SMIM1-/- on each trait. Standard deviation (sd); confidence intervals (CI); p.value corrected with the Benjamini-Hochberg procedure (p.val.FDR).
The tabs contain the values for the effect not corrected for BMI (no BMI correction) and corrected for the effect of BMI on the traits of interest (BMI corrected). Also, the cohort was stratified by sex and the results for the different strata are reported in different tabs. Tabs with the "sex" label refer to the analysis that uses the whole cohort (i.e. not gender stratified) but corrected for the effect of gender.

Table S9. The effect of SMIM1-/- has been corrected for BMI (tab "Corrected for BMI")

Table S8 | UKB fields and phenotype definitions used in the analysis of the UKB cohort. The tab "Fields extracted from UKB" contains the list of fields that have been used in the characterisation of the UKB cohort. The tab "matrix_ICD10_Phenotype" contains the information regarding the ICD-10 codes that have been used to define a disease.

variant. This is in keeping with the observation that lower levels of SMIM1 RNA are associated with higher RDW levels. It is also worthwhile noting that in red cells effects of the eQTL and the 17bp deletion variants are also observed in heterozygous individuals; this is in sharp contrast with the effect of the 17bp deletion on body weight (Fig.1A).

Fig.S2 panel labels: All: n=488,376; CEU, 460,186; SAS, EAS, 9,473; AFR, 7,649; CHINESE, 1,504; OTHER, 9,
Confirmation of the 17bp deletion at rs566629828 by PCR for those who agreed to take part in this study

Fig.S2 | Cohort information and the number of SMIM1-/- individuals per cohort
Genotype and phenotype data from four cohorts were used for the study. From left to right: UK Biobank (UKB), National Health Service Blood and Transplant (NHSBT), Danish Blood Donor Study (DBDS) and the Copenhagen Hospital Biobank (CHB). The top row provides the number of participants for whom genotype (Vel phenotype in the case of the NHSBT cohort) information was available and the female:male ratio. For the UKB the ethnicity of the participants is also provided (data taken from Bycroft et al., 2018) 10. The middle row provides the number of SMIM1-/- individuals per cohort who were included in the study, with the female/male split in brackets.