1. Study design
This was a single-center retrospective, observational cohort study. The study protocol was reviewed and approved by the Tohoku University Hospital Ethics Committee (2020-1-608). All the patients provided written informed consent. The research followed the Japanese Ministry of Health, Labor, and Welfare ethical guidelines for medical and health studies in humans.
2. Subjects
From August 2002 to December 2020, we enrolled consecutive, self-reported Japanese CD patients who had a history of treatment with IFX (Remicade®, Mitsubishi-Tanabe Pharma, Tokyo, Japan) at Tohoku University Hospital. We excluded patients who did not receive the scheduled maintenance treatment within 8 weeks due to either primary non-response or intolerance to the agents. Primary non-response was defined as a case where IFX was stopped in the induction phase (first three administrations) due to a lack of or unsatisfactory agent response. Intolerance was defined as a case where IFX was stopped because of an adverse event.
CD was diagnosed based on endoscopic, radiological, and/or histological findings, in patients who presented with specific features as proposed by the Japanese Ministry of Health, Labor, and Welfare, such as longitudinal ulcer, a cobblestone appearance, and noncaseous epithelioid cell granuloma.
This study included 189 patients with CD. All the patients had IFX treatment for the first time (IFX naive patients). Clinical data of all enrolled patients were obtained from medical records. The time from the introduction to the discontinuation of IFX because of LOR (IFX persistence) was calculated. Patients were followed from the date of IFX treatment initiation to its discontinuation due to LOR or the end of their follow-up. The reason for IFX discontinuation as specified by the treating physician was recorded. The discontinuation of IFX due to LOR was defined as the withdrawal of IFX due to loss of efficiency as determined by biochemical, clinical, and endoscopic data or the need for abdominal surgery related to CD progression.
3. Protocol of IFX administration
IFX was administered to patients with moderate to severe CD, who had an active luminal or perianal disease. There was no indication for IFX in patients with CD who had severe stenosis or internal fistulas; these complications were first treated surgically. During the induction phase of IFX treatment, 5 mg/kg of the drug was administered at weeks 0, 2, and 6. Then, every 8 weeks, 5 or 10 mg/kg of IFX was administered as maintenance therapy.
4. Clinical factors investigated in this analysis
The clinical factors investigated were: gender, age at diagnosis (< 21 or ≥ 21 years), duration of the disease at the start of IFX therapy (< 7 or ≥ 7 years), BMI at the start of IFX therapy (< 19 or ≥ 19), disease location (ileal, ileocolonic, or colonic), disease behavior (inflammation, stenosis, or fistula), presence of perianal disease (perianal fistulas and abscess, anal ulcers and stenosis), history of intestinal resection, smoking at the start of IFX, concomitant elemental diet (< 900 or ≥ 900 kcal/day), concomitant thiopurine use, serum albumin levels at the start of IFX therapy (< 3.7 or ≥ 3.7 g/dl), and CRP levels at the start of IFX therapy (< 0.6 or ≥ 0.6 mg/dl). Median values in our cohort for disease duration, age at diagnosis, BMI, serum albumin, and CRP levels at the start of IFX therapy were employed as the cut-off values.
5. Genotyping and quality control
Standard phenol-chloroform extraction precipitation was used to isolate peripheral blood leukocyte genomic DNA using the PAX gene DNA Kit (BD Bioscience, Franklin Lakes, NJ, USA) or the NA1000 Automated Nucleic Acid Extraction Machine (Kurabo, Osaka, Japan). The Japonica Array V1 (Thermo Fisher, Tokyo, Japan), a single nucleotide polymorphism (SNP) array designed specifically for Japanese individuals [17], was employed to conduct GWAS genotyping. For genotype calling, the Affymetrix Power Tools (version 2.10.2.2; Thermo Fisher Scientific, Waltham, MA, USA) were used. The quality control criteria, as recommended by Affymetrix, were a sample call rate of > 97% and a dish quality control of > 0.82. The SNPs were categorized by cluster separation using the SNPolisher package (version 1.5.2; Thermo Fisher Scientific, Waltham, MA, USA). A subsequent analysis was conducted on 643,411 SNPs categorized as “recommended.” Identity by descent probabilities (PI_HAT) was estimated using PLINK 1.90 software [18], and cryptic relatives were detected by the maximum unrelated set identification (IMUS) method implemented in PRIMUS (version 1.8.0) using a minimum PI_HAT value of 0.1. As part of quality control, samples of cryptic relatives (PI_HAT > 0.5) and those with genotyping rates < 97% or call rates < 0.97 were excluded from further analysis.
The corresponding SNP and sample quality-controlled genotype data of 643,496 SNPs from 189 cases were employed for further investigation. The CrossMap program [19] was employed to transform the genomic coordinates of this data from hg19 to GRCh38 to match the imputation panel’s genomic coordinates.
6. Imputation
For quality control before imputation, SNPs with Hardy-Weinberg equilibrium P-value < 1E-5 were excluded, and 613,834 SNPs on autosomal chromosomes were included for further analysis. The imputation panel was an in-house constructed haplotype panel comprising the haplotypes of 12,343 individuals from diverse populations including 2,493 individuals from the International 1000 Genomes and 9,850 individuals from biobanks of National Center Biobank Network. BEAGLE’s comfort-gt program removed variants that did not match alleles in the reference panel. Subsequently, we used default parameters to run an imputation in BEAGLE 5.2. SNPs with call rate < 0.97, MAF < 0.05, Hardy-Weinberg equilibrium P < 1E-6, or information metric (INFO score) < 0.5 were excluded. After exclusion, the genotyped or imputed data of 5,700,569 SNPs including rs2097432 from 189 cases were used for the genetic analysis.
7. Principal component analysis
Outliers in the sample were detected by principal component analysis of linkage disequilibrium (LD)-independent SNPs using the PLINK 1.90 software. The following LD pruning was carried out using PLINK: -indep-pairwise 50 5 0.1. The samples were nearly homogeneous, as shown by the plot of the top two principal components (PC1 and PC2) for each sample (Supplementary Fig. 1).
8. Statistical analysis
The study design is summarized in Fig. 1. In the univariate analysis of the cumulative discontinuation-free rates of IFX for the clinical factors, the log-rank test was used. A Cox proportional hazards model was conducted in the multivariate analysis of the cumulative discontinuation-free rates for the clinical factors. P-value < 0.05 was considered statistically significant.
In the analysis for the genetic factors of IFX persistence, we first looked at the association between the HLA-DQA1*05 (rs2097432) and cumulative discontinuation-free rates of IFX. The cumulative discontinuation-free rate for rs2097432 variant (C allele) carriers was calculated by log-rank test and the Cox proportional hazards model adjusted by baseline serum albumin levels, concomitant thiopurine therapy, disease location, and the first two principal components. The correlations between the categorical data in Supplementary Table 1 were evaluated by a two-sided Fisher exact test. The association of quantitative data between two groups in Supplementary Table 1 was examined by an unpaired t-test.
We then performed unbiased GWAS for IFX persistence with 5,700,568 quality-controlled SNPs. Using the R package gwasurvivr (version 1.12.0, https://github.com/suchestoncampbelllab/gwasurvivr), a Cox proportional hazards model adjusted by baseline serum albumin levels, disease location, and the first two principal components as covariates were carried out in GWAS. The R software (version 4.1.3, http://www.r-project.org/) was used for all the statistical analyses.
In the analysis of rs2097432, P-value < 0.05 were considered as significant. In GWAS, SNPs with P-values < 5E-8 were considered as genome-wide significant and SNPs with P-values < 1E-6 were were considered as candidates. Among these candidate SNPs, we used the “clump” procedure in PLINK 1.90 software to summarize candidate variants into independent candidate loci considering the LD information. We used R2 > 0.1 (--clump-r2 0.1) and SNPs within 250 kb from the lead SNP (--clump-kb 250) as the LD parameters. The Locus Zoom application [20] was used to generate regional association plots around tag SNPs.
9. Pathway analysis
Pathway analysis with MAGMA [21] was performed using GWAS results. MAGMA first computed the gene-level P-values employing the weighted sum of the related statistics for SNP sites in the region (25 kbp upstream and downstream), considering the local LD structures. Thereafter, a pathway-level statistical inference was performed based on a multiple linear–principal components regression model using biologically functional databases, such as Reactome and KEGG. In total, 1,373 pathways were investigated and pathways with P-values < 3.64E-5 (0.05/1,373) were considered as significant and pathways with P-values < 0.01 were considered as candidates.