Selection of studies
Three databases (PubMed, Google Scholar and Science Direct) were searched for genetic association articles as of September 24, 2019. Search terms included “interleukin”, “IL-18”, “cytokine”, “polymorphisms”, “allograft” and “renal transplantation”. When duplicate articles were found, the more recent one was selected. Inclusion criteria were (1) studies that associated IL-18 SNPs with KT outcomes; (2) IL-18 genotype frequencies comparing KT patients with healthy controls, or RJ with NRJ; and (3) genotype frequency data that allowed calculation of odds ratios (ORs) and 95% confidence intervals (CIs). Exclusion criteria were studies that (1) did not examine renal allografts or KT outcomes; (2) were reviews; (3) did not concern IL-18 SNPs; and (4) had unusable genotype or allele frequencies.
Linkage disequilibrium and data extraction
The included articles examined two IL-18 SNPs, -137G/C (rs187238) and -607A/C (rs1946518), each presented with genotype data (Table 1 and S1 Table). Observed phenotypic associations have been attributed to the proximity of the two SNPs [11, 12]. Results from NCI LDlink (https://ldlink.nci.nih.gov/) show that the two SNPs are in linkage disequilibrium (LD) in both European (CEU) and Han Chinese (CHB) reference genotypes. LD is the non-random association (correlation) between alleles at nearby loci; it is commonly measured as D′, where a value of 1 indicates complete LD. Because the IL-18 SNPs in this study had D′ values of 1.00 (S1 Table), they were considered to be in LD, and rs187238 and rs1946518 were therefore combined in the analysis (S2 and S3 Tables). This combination allowed analysis by GD and allograft as well as subgrouping by ethnicity (Tables 2 and 3).
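To make the D′ criterion concrete, the sketch below computes Lewontin's normalized D′ for two biallelic loci from one haplotype frequency and the two allele frequencies. LDlink reports D′ directly from its reference panels, so this is only an illustration of what that number means; the function name `d_prime` and the frequencies in the usage line are hypothetical and are not values for rs187238 or rs1946518.

```python
def d_prime(p_ab: float, p_a: float, p_b: float) -> float:
    """Lewontin's normalized linkage disequilibrium D' for two biallelic loci.

    p_ab: frequency of the haplotype carrying allele A (locus 1) and allele B (locus 2)
    p_a, p_b: population frequencies of alleles A and B
    """
    d = p_ab - p_a * p_b  # raw disequilibrium coefficient D
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    return abs(d) / d_max if d_max > 0 else 0.0


# Hypothetical frequencies: haplotype aB is absent, so D' reaches its maximum of 1.
print(round(d_prime(p_ab=0.3, p_a=0.7, p_b=0.3), 6))  # 1.0
```

When one of the four possible haplotypes is absent, as in this example, D′ equals 1 even though the allelic correlation r² can be well below 1 (here about 0.18).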
Two investigators (TE and NP) independently extracted the data and reached a consensus. The following information was obtained from each publication: first author’s name, year of the study, country of origin, ethnicity, age of the subjects, IL-18 SNPs (rs numbers) and the Clark-Baudouin (CB) score (Table 1). Sample sizes and genotype data for the RJ and NRJ groups were also extracted, and the minor allele frequency (maf) was calculated from them (S2 and S3 Tables).
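The minor allele frequency follows from simple allele counting. A minimal sketch, assuming genotype counts are supplied as (wt-wt, wt-var, var-var); the function name is hypothetical. Because the variant allele is not necessarily the rarer one in every population, the function returns whichever allele frequency is smaller.

```python
def minor_allele_frequency(n_wt_wt: int, n_wt_var: int, n_var_var: int) -> float:
    """Minor allele frequency of a biallelic SNP from genotype counts."""
    n_alleles = 2 * (n_wt_wt + n_wt_var + n_var_var)  # each subject carries two alleles
    var_freq = (2 * n_var_var + n_wt_var) / n_alleles  # homozygotes contribute 2, heterozygotes 1
    return min(var_freq, 1 - var_freq)


# Hypothetical counts: 50 wt-wt, 40 wt-var, 10 var-var -> (2*10 + 40) / 200
print(minor_allele_frequency(50, 40, 10))  # 0.3
```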
Power calculations and HWE assessment
Using the G*Power program, we evaluated statistical power, since adequate power strengthens the associative evidence. Assuming a genotypic risk (OR) of 1.5 and a two-sided significance level of α = 0.05, power of ≥ 80% was considered adequate. Hardy-Weinberg equilibrium (HWE) was assessed using the online calculator at https://ihg.gsf.de/cgi-bin/hw/hwa1.pl.
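As a rough stand-in for these two tools, the sketch below (a) approximates the power to detect an OR of 1.5 at two-sided α = 0.05 when comparing a carrier (exposure) frequency between cases and controls, using the normal approximation without continuity correction, and (b) runs a Pearson chi-square goodness-of-fit test for HWE with 1 degree of freedom. Neither is claimed to be the exact procedure used by G*Power or the IHG calculator; the control carrier frequency of 0.3, the sample sizes and the function names are hypothetical.

```python
import math
from statistics import NormalDist


def power_two_proportions(p0: float, odds_ratio: float, n_cases: int,
                          n_controls: int, alpha: float = 0.05) -> float:
    """Approximate power of a two-sided test comparing exposure frequency
    in cases vs controls (normal approximation, no continuity correction)."""
    odds0 = p0 / (1 - p0)
    p1 = odds_ratio * odds0 / (1 + odds_ratio * odds0)  # case frequency implied by the OR
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)
    p_bar = (p1 * n_cases + p0 * n_controls) / (n_cases + n_controls)
    se_null = math.sqrt(p_bar * (1 - p_bar) * (1 / n_cases + 1 / n_controls))
    se_alt = math.sqrt(p1 * (1 - p1) / n_cases + p0 * (1 - p0) / n_controls)
    # The opposite tail is ignored, as is usual for this approximation.
    return z.cdf((abs(p1 - p0) - z_alpha * se_null) / se_alt)


def hwe_chi_square(n_aa: int, n_ab: int, n_bb: int) -> tuple[float, float]:
    """Pearson chi-square test (1 df) of observed genotype counts against HWE."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele a
    if p in (0.0, 1.0):
        raise ValueError("monomorphic SNP: HWE test undefined")
    q = 1 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    p_value = math.erfc(math.sqrt(chi2 / 2))  # upper tail of chi-square with 1 df
    return chi2, p_value


print(power_two_proportions(0.3, 1.5, 100, 100), power_two_proportions(0.3, 1.5, 1000, 1000))
print(hwe_chi_square(25, 50, 25))  # (0.0, 1.0)
```

In this approximation power rises with sample size, so the ≥ 80% criterion effectively sets a minimum number of cases and controls for a given carrier frequency.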
Methodological quality of the studies
We used the CB scale to evaluate the methodological quality of the included studies. The CB criteria include P-values, statistical power, correction for multiplicity, comparative sample sizes between cases and controls, genotyping methods and HWE. On this scale, scores of < 5, 5-6 and ≥ 7 indicate low, moderate and high quality, respectively.
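The banding rule can be written as a small function. `cb_quality` is a hypothetical helper; the CB score itself still has to be assigned by the reviewers from the criteria listed above.

```python
def cb_quality(score: int) -> str:
    """Map a Clark-Baudouin (CB) score to the quality bands used here."""
    if score >= 7:
        return "high"      # >= 7
    if score >= 5:
        return "moderate"  # 5-6
    return "low"           # < 5


print([cb_quality(s) for s in (4, 5, 6, 7)])  # ['low', 'moderate', 'moderate', 'high']
```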
We estimated ORs and 95% CIs of association using two overall approaches: (i) genotype distribution (GD), comparing cases with healthy controls, and (ii) allograft, comparing RJ with NRJ. Pooled ORs for GD were classified as either higher in patients (hp) or higher in controls (hc); for allograft, they were classified as either increased (in) or decreased (de), indicating risk of rejection. We applied standard genetic models, comparing the homozygous variant (var) genotype with the homozygous wild-type (wt) genotype (homozygous model: Ho). To address the importance of the heterozygous genotype, we also evaluated recessive (Rc: wt-wt versus wt-var + var-var), dominant (Do: wt-wt + wt-var versus var-var) and codominant (Co: wt versus var) effects.
Heterogeneity between studies was estimated with the χ²-based Q test, with the significance threshold set at Pb < 0.10. Heterogeneity was also quantified with the I² statistic, which measures the proportion of variability between studies that is due to heterogeneity rather than chance. I² values > 50% indicate more heterogeneity than values ≤ 50%, and 0% indicates no observed heterogeneity. When the studies showed functional similarity in population features, the fixed-effects model was used; otherwise, the random-effects model was used. Sources of heterogeneity were identified with the Galbraith plot, followed by re-analysis without the outlying studies (outlier treatment). Outlier treatment thus divided the comparisons into pre-outlier (PRO) and post-outlier (PSO) analyses. Sensitivity analysis, in which each study is omitted in turn and the pooled OR recalculated, was used to test the robustness of the summary effects. The small number of studies precluded assessment of publication bias.
Data were analyzed using Review Manager 5.3 (Cochrane Collaboration, Oxford, England), SIGMASTAT 2.03 (Systat Software, San Jose, CA, USA) and SPSS 20.0 (IBM Corp., Armonk, NY, USA).
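In practice Review Manager performs these calculations. The plain-Python sketch below only illustrates the arithmetic under stated assumptions: genotype counts are (wt-wt, wt-var, var-var) tuples; in each genetic model the variant-carrying side is treated as the "exposed" group; log ORs are pooled by inverse-variance weighting with Woolf variances (Review Manager may default to Mantel-Haenszel weighting for ORs, so its numbers can differ slightly); the random-effects option uses the DerSimonian-Laird estimator; and Galbraith outliers are studies whose standardized residual lies outside ±2. All function names and example counts are hypothetical.

```python
import math
from statistics import NormalDist

Counts = tuple[int, int, int]  # (wt-wt, wt-var, var-var) genotype counts


def contrast_2x2(case: Counts, ctrl: Counts, model: str) -> tuple[int, int, int, int]:
    """Collapse genotype counts into a 2x2 table (a, b, c, d) for one genetic model.
    a, b = cases on the variant-carrying / reference side; c, d = same for controls.
    Treating the variant-carrying side as 'exposed' is an assumption of this sketch."""
    def split(g: Counts) -> tuple[int, int]:
        wt_wt, wt_var, var_var = g
        if model == "Ho":  # var-var versus wt-wt
            return var_var, wt_wt
        if model == "Rc":  # wt-wt versus wt-var + var-var
            return wt_var + var_var, wt_wt
        if model == "Do":  # wt-wt + wt-var versus var-var
            return var_var, wt_wt + wt_var
        if model == "Co":  # wt allele versus var allele
            return 2 * var_var + wt_var, 2 * wt_wt + wt_var
        raise ValueError(f"unknown model: {model}")
    return (*split(case), *split(ctrl))


def log_or(a, b, c, d):
    """Log OR and its Woolf variance; adds 0.5 to every cell if any cell is zero."""
    if 0 in (a, b, c, d):
        a, b, c, d = (x + 0.5 for x in (a, b, c, d))
    return math.log(a * d / (b * c)), 1 / a + 1 / b + 1 / c + 1 / d


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability of a chi-square statistic with integer df >= 1."""
    y = x / 2
    if df % 2 == 0:
        return math.exp(-y) * sum(y ** k / math.factorial(k) for k in range(df // 2))
    return math.erfc(math.sqrt(y)) + math.exp(-y) * sum(
        y ** (k - 0.5) / math.gamma(k + 0.5) for k in range(1, df // 2 + 1))


def _fixed(ys, vs):
    """Inverse-variance fixed-effect mean of log ORs and the weights used."""
    w = [1 / v for v in vs]
    return sum(wi * yi for wi, yi in zip(w, ys)) / sum(w), w


def pool(tables, random_effects=False):
    """Pooled OR with 95% CI, Cochran's Q, its P value (Pb) and I^2 (%)."""
    ys, vs = zip(*(log_or(*t) for t in tables))
    y_fe, w = _fixed(ys, vs)
    q = sum(wi * (yi - y_fe) ** 2 for wi, yi in zip(w, ys))
    df = len(ys) - 1
    p_het = chi2_sf(q, df) if df > 0 else 1.0
    i2 = 100 * max(0.0, (q - df) / q) if q > 0 else 0.0
    if random_effects:  # DerSimonian-Laird estimate of between-study variance tau^2
        c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
        tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0
        w = [1 / (v + tau2) for v in vs]
    y = sum(wi * yi for wi, yi in zip(w, ys)) / sum(w)
    se = math.sqrt(1 / sum(w))
    z = NormalDist().inv_cdf(0.975)
    return math.exp(y), math.exp(y - z * se), math.exp(y + z * se), q, p_het, i2


def galbraith_outliers(tables, limit=2.0):
    """Indices of studies whose standardized residual (y_i - y_FE) / se_i lies
    outside +/- limit, i.e. outside the usual bands of a Galbraith plot."""
    ys, vs = zip(*(log_or(*t) for t in tables))
    y_fe, _ = _fixed(ys, vs)
    return [i for i, (y, v) in enumerate(zip(ys, vs)) if abs(y - y_fe) / math.sqrt(v) > limit]


def leave_one_out(tables, random_effects=False):
    """Pooled OR recomputed with each study omitted in turn (sensitivity analysis)."""
    return [pool(tables[:i] + tables[i + 1:], random_effects)[0] for i in range(len(tables))]


# Hypothetical studies: ((case wt-wt, wt-var, var-var), (control wt-wt, wt-var, var-var))
studies = [((50, 40, 10), (60, 35, 5)), ((30, 50, 20), (45, 40, 15)), ((70, 25, 5), (65, 30, 5))]
tables = [contrast_2x2(case, ctrl, "Co") for case, ctrl in studies]
print(pool(tables))
print(pool(tables, random_effects=True))
print(galbraith_outliers(tables), leave_one_out(tables))
```

Setting `random_effects=True` changes only the study weights; when τ² is estimated as 0 (Q ≤ df), the fixed- and random-effects results coincide.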