This review was conducted following the recommendations of the Meta-analysis of Observational Studies in Epidemiology (MOOSE) and reported following the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines.
Literature search and eligibility criteria
The literature search covered five databases (PubMed, Embase, Scopus, Web of Science, and Google Scholar) for studies published until December 15, 2020. We used a combination of MeSH terms and free texts referring to vitamin D status and clothing type. The terms referring to vitamin D status were ‘vitamin D’, ‘vitamin D status’, ‘vitamin D deficiency’, and ‘25(OH)D’. The terms referring to clothing type were ‘clothing type’, ‘clothing style’, ‘clothing’, ‘dressing style’, ‘lifestyle’, ‘concealing clothing’, ‘unconcealing clothing’, ‘hijab’, ‘niqab’, ‘chador’, and ‘veil’. The PubMed search strategy is presented as a sample (Additional file 1). The PubMed search strings were translated for the other databases. We hand-searched the reference lists of eligible articles and also followed the PubMed ‘Citing’ and ‘cited by’ links for each eligible article. The search included cross-sectional, longitudinal, cohort, case-control, clinical, and randomized controlled studies, irrespective of the year and the country of the study. In line with the objective of the study, the search was restricted by age and sex, so that only adult women aged 18 years and above were included. Articles were excluded on any one of the following conditions: (a) animal studies, (b) language other than English, (c) articles without full text, (d) qualitative studies, book chapters, symposium and conference proceedings, essays, commentaries, editorials, and case reports, (e) studies focused on VD insufficiency rather than VDD, and (f) studies done among patients.
Study screening and data extraction
First, the titles and abstracts of the studies retrieved from the database searches were screened by two evaluators (SHM, SA). Full-text review was done by the same two authors to determine whether the studies fulfilled the predefined inclusion criteria. The two reviewers worked independently during title, abstract, and full-text review. Any disagreement at any stage of the screening and selection process was resolved by discussion. The study screening and selection process, including the specific reasons for inclusion and exclusion, is shown in the PRISMA flow chart (Figure 1). Data extraction was done by SHM and checked by SA. From each study, the following information was abstracted: (a) study identification (first author, publication year, title), (b) study characteristics and population (country, sample size, study design, follow-up years for longitudinal studies, mean age of participants), (c) clothing type, (d) outcome assessment method, (e) measure of association and reported estimates, and (f) variables used for adjustment. The pre-specified measures of association were the standardized mean difference (SMD) in serum 25(OH)D and the RR or OR of VDD in women wearing concealing clothing such as hijab compared with those not wearing concealing clothing. Whenever a study reported both SMD and RR/OR, we extracted both estimates. For studies that reported both adjusted and unadjusted estimates, we extracted the one adjusted for more covariates. The predefined standardized unit of serum 25(OH)D was ng/ml. Other units, such as nmol/litre, were converted to ng/ml.
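The unit conversion mentioned above can be illustrated with a minimal sketch. The conversion factor is not stated in the text; the sketch assumes the commonly used factor for 25(OH)D of 1 ng/ml = 2.496 nmol/litre, and the function name `to_ng_per_ml` is hypothetical.

```python
# Assumed conversion factor for 25(OH)D (not stated in the text):
# 1 ng/ml corresponds to about 2.496 nmol/litre.
NMOL_PER_LITRE_PER_NG_PER_ML = 2.496


def to_ng_per_ml(value, unit):
    """Convert a serum 25(OH)D value to ng/ml (hypothetical helper)."""
    unit = unit.strip().lower()
    if unit == "ng/ml":
        return value
    if unit in ("nmol/l", "nmol/litre", "nmol/liter"):
        return value / NMOL_PER_LITRE_PER_NG_PER_ML
    raise ValueError(f"unsupported unit: {unit!r}")


# Example: a study reporting a mean of 50 nmol/litre
mean_ng_per_ml = to_ng_per_ml(50.0, "nmol/litre")
```

Standard deviations reported in nmol/litre would be divided by the same factor, so the SMD itself is unaffected by the choice of unit.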
Study quality assessment
The quality of the included studies was evaluated using the Newcastle-Ottawa Scale (NOS) for observational studies. The NOS assesses the methodological quality of studies in three main domains: selection, comparability, and exposure/outcome. NOS scores range from zero to nine stars. Scores of 0 to 3 stars represent low quality, 4 to 6 medium quality, and 7 to 9 high quality. The grading was done by two evaluators (SHM, SA), working independently and in duplicate, with disagreement resolved by consensus.
OR and SMD were used to pool the studies and calculate summary estimates. Thus, two separate meta-analyses were done: one for studies that reported OR and the other for studies that reported SMD. The OR represents the odds of VDD in women wearing concealing clothing compared with women not wearing concealing clothing. The SMD represents the difference in mean serum 25(OH)D between women wearing and not wearing concealing clothing. For studies that reported only OR or SMD without a 95% CI, we calculated the corresponding 95% CI and standard error (SE) from the reported P-values. When exact P-values were reported, they were used directly for the calculation of the 95% CI and SE. If P-values were reported as P<0.001, we assumed P=0.001 to calculate the corresponding 95% CI and SE, and if the study reported P>0.05, we assumed P=0.53. We aimed to calculate the summary estimates (pooled OR and SMD) with a fixed-effects model if there was no significant heterogeneity, or with the DerSimonian-Laird random-effects model if there was significant heterogeneity.
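One common way to recover an SE and 95% CI from a point estimate and a two-sided P-value is to back-calculate the z statistic from the P-value and divide the estimate by it, working on the log scale for ratios such as the OR. The sketch below illustrates this under a normal approximation; it is not necessarily the exact procedure used in this review, and the function name `se_ci_from_p` is hypothetical.

```python
import math
from statistics import NormalDist


def se_ci_from_p(estimate, p, ratio=False, level=0.95):
    """Back-calculate SE and CI from an estimate and a two-sided P-value.

    Assumes a normal approximation. For ratio measures (OR, RR) the
    calculation is done on the natural-log scale and the CI is
    back-transformed; the returned SE is then the SE of the log ratio.
    """
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    effect = math.log(estimate) if ratio else estimate
    z = NormalDist().inv_cdf(1.0 - p / 2.0)      # z statistic implied by p
    se = abs(effect) / z
    crit = NormalDist().inv_cdf(1.0 - (1.0 - level) / 2.0)
    lower, upper = effect - crit * se, effect + crit * se
    if ratio:
        lower, upper = math.exp(lower), math.exp(upper)
    return se, lower, upper


# Example: an OR of 2.0 reported only as "P<0.001", treated as P=0.001
se, lower, upper = se_ci_from_p(2.0, 0.001, ratio=True)
```

Because a threshold such as P<0.001 is replaced by its upper bound, the resulting SE is, if anything, larger than the true one, so the study receives a conservative weight in the pooled analysis.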
Heterogeneity between the included studies was evaluated using the I² statistic. I² refers to the proportion of total variance in the effect estimates that is attributable to between-study heterogeneity. I² values of <40%, 50–75%, and >75% refer to low, moderate, and high levels of heterogeneity, respectively. When heterogeneity was high, we conducted further subgroup analyses to identify the sources of heterogeneity and provided estimates by subgroup.
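As an illustration of how the pooled estimate and I² relate, the following is a minimal sketch of inverse-variance pooling with the DerSimonian-Laird estimate of between-study variance. It is not the Stata routine used for the analyses; the function name `pool_dl` and the input values are hypothetical. For ORs, the inputs would be log ORs and their SEs.

```python
import math


def pool_dl(effects, ses):
    """Inverse-variance pooling with DerSimonian-Laird random effects.

    effects: per-study estimates (SMD, or log OR for ratio measures)
    ses:     per-study standard errors on the same scale
    """
    k = len(effects)
    w = [1.0 / se ** 2 for se in ses]                  # fixed-effect weights
    sum_w = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sum_w
    # Cochran's Q: weighted squared deviations from the fixed-effect estimate
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))
    df = k - 1
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0
    # DerSimonian-Laird moment estimator of the between-study variance
    c = sum_w - sum(wi ** 2 for wi in w) / sum_w
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0
    w_re = [1.0 / (se ** 2 + tau2) for se in ses]      # random-effects weights
    pooled = sum(wi * yi for wi, yi in zip(w_re, effects)) / sum(w_re)
    return {
        "fixed": fixed,
        "random": pooled,
        "se_random": math.sqrt(1.0 / sum(w_re)),
        "q": q,
        "i2": i2,
        "tau2": tau2,
    }


# Hypothetical SMDs and SEs from four studies
result = pool_dl([-0.8, -0.3, -1.2, -0.5], [0.20, 0.15, 0.30, 0.25])
```

When the estimated between-study variance is zero, the random-effects weights equal the fixed-effect weights and the two pooled estimates coincide.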
Publication bias and sensitivity analysis
We aimed to evaluate publication bias graphically by visual inspection of funnel plot asymmetry and statistically by Egger's regression test. Sensitivity analysis was conducted using the leave-one-out method, in which each study is omitted in turn and the remaining studies are re-analyzed. We aimed to exclude any study whose omission moved the pooled estimate outside the 95% CI of the overall estimate. Statistical significance was set at P<0.05. All statistical analyses were done using Stata version 15.0.
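The leave-one-out rule described above can be sketched as follows. For brevity the sketch pools with simple inverse-variance (fixed-effect) weights, which is an assumption made for illustration; a random-effects pooling function with the same signature could be passed instead. The names `pool_fixed` and `leave_one_out` and the input values are hypothetical.

```python
import math


def pool_fixed(effects, ses):
    """Inverse-variance pooled estimate and its standard error."""
    w = [1.0 / se ** 2 for se in ses]
    estimate = sum(wi * yi for wi, yi in zip(w, effects)) / sum(w)
    return estimate, math.sqrt(1.0 / sum(w))


def leave_one_out(effects, ses, pool=pool_fixed, crit=1.96):
    """Omit each study in turn and re-pool the remaining studies.

    A study is flagged when the estimate obtained without it falls
    outside the 95% CI of the overall pooled estimate.
    """
    effects, ses = list(effects), list(ses)
    overall, se = pool(effects, ses)
    lower, upper = overall - crit * se, overall + crit * se
    rows = []
    for i in range(len(effects)):
        est, _ = pool(effects[:i] + effects[i + 1:], ses[:i] + ses[i + 1:])
        rows.append((i, est, not (lower <= est <= upper)))
    return (overall, lower, upper), rows


# Hypothetical estimates: the fourth study is far from the others
overall_ci, rows = leave_one_out([0.5, 0.6, 0.4, 1.5], [0.2, 0.2, 0.2, 0.2])
flagged = [i for i, _, outside in rows if outside]
```

Each row holds the index of the omitted study, the pooled estimate without it, and whether that estimate lies outside the overall 95% CI.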