Arfid Genes and Environment (ARFID-GEN): Study Protocol

Background The Avoidant Restrictive Food Intake Disorder Genes and Environment (ARFID-GEN) study is a study of genetic and environmental factors that contribute to risk for developing ARFID in children and adults. Methods A total of 3,000 children and adults with ARFID from the United States will be included. Parents/guardians and their children with ARFID (ages 7 to 17) and adults with ARFID (ages 18+) will complete comprehensive online consent, parent verification of child assent (when applicable), and phenotyping. Enrolled participants with ARFID will submit a saliva sample for genotyping. A genome-wide association study of ARFID will be conducted. Discussion ARFID-GEN, a large-scale genetic study of ARFID, is designed to rapidly advance the study of the genetics of eating disorders. We will explicate the genetic architecture of ARFID relative to other eating disorders and to other psychiatric, neurodevelopmental, and metabolic disorders and traits. Our goal is for ARFID to deliver “actionable” findings that can be transformed into clinically meaningful insights. Trial registration: ARFID-GEN is a registered clinical trial: clinicaltrials.gov NCT05605067


Background
We describe the Avoidant Restrictive Food Intake Disorder Genes and Environment (ARFID-GEN) study, which is designed to expand the discovery of genetic and environmental contributions to ARFID risk.

ARFID-GEN builds on previous and ongoing genome-wide association studies (GWAS) by the Eating
Disorders Working Group of the Psychiatric Genomics Consortium (PGC-ED) as part of a global effort to fully characterize the genetic architecture of all eating disorders (EDs) and explore their relation to each other and to other psychiatric, neurodevelopmental, and metabolic/anthropometric traits.
ARFID-GEN will ascertain, phenotype, and genotype a large sample of children and adults with ARFID.We will apply advanced analytic strategies to test and re ne an etiological model of ARFID, explicate heterogeneity, and simultaneously document environmental risk factors for ARFID.
ARFID is associated with high personal and family emotional and nancial cost.ARFID, present in 2-5% of the population (1), is marked by the avoidance and/or restriction of food intake resulting in signi cant weight loss or nutritional de ciency, dependence on feeding supplements, and/or interference with psychosocial functioning.Unlike other EDs, food restriction is not driven by weight and shape concerns, and ARFID may be equally common in males and females (1).The Diagnostic and Statistical Manual of Mental Health Disorders 5th edition (DSM-5) (2) describes three predominant ARFID presentations that likely overlap (3): (1) sensory sensitivity (i.e., rejection of food based on sensory qualities such as texture), (2) phobic avoidance of food (i.e., concern about aversive consequences of eating, such as fear of choking), and (3) low interest/appetite.Psychiatric, neurodevelopmental, and somatic medical comorbidities are common, with ~ 50% of ARFID cases having a co-occurring diagnosis (4).
The three key dimensions of ARFID identi ed are viewed as symptom clusters rather than distinct presentations.The sensory dimension encompasses rejection of food based on sensory qualities (e.g., taste, smell, texture, temperature, appearance) and is the most common reason for referral (9).When phobic avoidance is the dominant presentation, conditioned aversion may play a role (10,11).A genetic predisposition to anxiety may increase the risk of children developing ARFID after an aversive feeding experience like gagging or witnessing someone choke or vomit (12).The low interest/appetite dimension likely captures symptoms previously described as infantile anorexia and food avoidance emotional disorder (13).

Speci c Aims.
Aim 1a.Ascertainment of 3,000 ARFID cases.Leveraging the existing Eating Disorders Genetics Initiative (EDGI) (14) infrastructure at the University of North Carolina at Chapel Hill (UNC), we will ascertain 3,000 children and adults with ARFID.Appropriate controls will be sourced from archived, genotyped repositories such as the National Institute of Mental Health (NIMH) Genomics Repository, database of Genotypes and Phenotypes (dbGaP).We will conduct e cient online phenotyping of children and adults with ARFID including environmental exposures and at-home saliva sampling for deoxyribose nucleic acid (DNA).We will genotype new samples using contemporary methodology.Aim 1b.Validation substudy.We will interview parents/guardians of 25 children with ARFID and 25 adults with ARFID with the Pica, ARFID and Rumination Disorder Interview (PARDI) (15) to further validate our online ARFID diagnostic battery.
Aim 2: Within-disorder ARFID GWAS.We will conduct comprehensive phenotypic and genomic analyses: single nucleotide polymorphism (SNP)-based heritability, GWAS (imputable to minor allele frequency [MAF] ≥ 0.005), genetic correlations (r g s), polygenic risk scores (PRS), standard post-GWAS analyses of the non-mutually exclusive ARFID presentations, and rare copy number variants (CNVs) and CNV burden.Hypotheses: We will identify genome-wide signi cant loci for ARFID, informative r g s, implicated CNVs, and environmental precipitants.Aim 3. Genetic relation of ARFID to other eating disorders.We will test if ARFID shares a core set of genetic factors with other EDs yet is differentiated by disorder-speci c genetic factors.We will conduct a set of cross-disorder genomic analyses to map genetic interrelations between ARFID and other EDs including: (a) cross-disorder GWAS meta-analysis to identify loci with pleiotropic effects, (b) if indicated, calculate r g s and conduct Mendelian Randomization (MR), multi-trait conditional and joint analysis (mtCOJO) (16), disorder-speci c SNP associations, Multi-PRS (17), and, (c) genomic structural equation modelling (GSEM) (18) to examine genome-wide architecture of ARFID relative to other EDs.Based on our preliminary data, we predict that ARFID will show the strongest genetic association with AN.Aim 4. Genetic relation of ARFID with psychiatric, metabolic/anthropometric, neurodevelopmental, and other relevant phenotypes.To test a conceptualization that ARFID has speci c genetic associations with psychiatric, metabolic/anthropometric, and neurodevelopmental phenotypes we will apply: (a) approaches as in Aim 3b, and (b) GSEM to examine genetic, psychiatric, neurodevelopmental, and metabolic/anthropometric factors associated with ARFID.We predict that high sensory sensitivity will be related to autism spectrum disorder (ASD), high levels of phobic avoidance to anxiety and obsessivecompulsive disorder (OCD), and high levels of low interest/appetite to AN. Deliverables: (a) dissection of converging and diverging relations among ARFID and other traits informing and re ning its etiology; (b) genetic assessment of ARFID's relation to other phenotypes, informing nosology.

Participants.
Objective.We will engage the infrastructure utilized in EDGI (14) to ascertain 3,000 children and adults with ARFID.Controls will be ascertained from data repositories from other genomic studies.
Case de nition (ARFID).Inclusion criteria.Individuals ages 7 + who meet DSM-5 criteria for ARFID.An age 7 + cut-off allows adequate phenotyping with a range of valid instruments available for relevant constructs for this age and older.We anticipate most cases will be pediatric as ARFID typically onsets in childhood.Following a brief online eligibility pre-screen, participants/parents/guardians will be screened for eligibility (for themselves or, in the case of parents/guardians, for their children) and for diagnostic purposes using the questionnaires listed in Supplement and Table 1.Exclusion criteria: current Eating Disorder Examination-Questionnaire (EDE-Q) global score greater than 4.0 or self-induced vomiting, laxative use, or more than four episodes of loss of control eating/binge eating in the past 28 days (DSM-5 ARFID diagnosis requires rule-out of other eating disorders).However, we will carefully monitor those screening eligible and ineligible for the study to evaluate this exclusion criterion, because adequate longitudinal data do not yet exist documenting the frequency with which ARFID may transition to other eating disorders over time.* Item 9: "Thoughts that you would be better off dead, or of hurting yourself in some way?" was not included † Items about major depressive disorder (MDD) were not included ‡ Modi ed for this study Recruitment.
We will use a multi-pronged recruitment approach including: outreach to ED clinicians and programs across the country, traditional media (press releases and newspaper announcements), and social media and other online platforms (websites, Facebook, Twitter, Instagram, and podcasts).This includes use of social media ads on Facebook and Instagram, which have proven successful for other similar studies (19).We enrolled a group of ARFID-GEN parent stakeholders who have consulted with us on study design prior to nalizing our methods. Procedure.
Self-report measures.(27).Table 1 contains the series of assessments and which version is used.If the child is determined eligible by parent/guardian-and self-report, they are asked to complete additional questionnaires and provide a saliva sample.A saliva sample collection kit for the child with return packaging is mailed to the parent/guardian with directions addressed to the parents/guardian to oversee the saliva collection from the child.Parents/guardians mail back the completed saliva sample.
Once the kit is received by the study team and all questionnaires are nished, participation in the study is complete.For a owchart of study procedures, see Fig. 1.
Adults (age 18+) with ARFID.Adults visit the website (ar dgen.org) and click the "Take Our Survey" link.
Participants complete a brief pre-screen, providing online informed consent for the entire study, have the option to consent to be recontacted for future studies, and provide contact information.Next, they complete the NIAS and the adult version of the PARDI-AR-Q to con rm the presence of ARFID, the EDE-Qv6 to rule out other current eating disorder symptoms, and the ED100Kv3.Surveys are presented in REDCap (27).Embedded algorithms determine if participants meet inclusion criteria (DSM-5 criteria for ARFID and no other current eating disorder symptoms that warrant exclusion).If they meet criteria, they are considered enrolled.Enrolled participants are asked to complete additional questionnaires (Table 1) and provide a saliva sample.A saliva collection kit is mailed to the participant's home and the completed sample is returned.Participation in the study is complete once the kit is received by the study team and all questionnaires are nished.For a owchart of study procedures, see Fig. DNA extraction and genotyping.DNA extraction and GWAS genotyping are standard.We will use the most contemporary chip appropriate for diverse ancestry populations when genotyping occurs.
Planned Data Analysis.
Aim 1a.Ascertainment of 3,000 ARFID cases.We will conduct descriptive analyses to characterize the sample strati ed by pediatric and adult cases.We will report demographics, symptom patterns, onset, course of illness, comorbid psychiatric conditions, neurodevelopmental characteristics, environmental exposures, and health-related quality of life in child (parent/guardian and child report) and adult ARFID cases.
Aim 1b.Validation study.Diagnoses obtained with the PARDI interview will be compared with those obtained with the online ARFID-GEN battery.We will calculate positive predictive value (PPV) to con rm diagnostic properties of our ARFID-GEN battery.Data on weight, height, body mass index (BMI), and BMI percentile (for those < 18) will be veri ed during remote interviews.
Aim 2: Within disorder ARFID GWAS.We will conduct a comprehensive set of genomic analyses including SNP-based heritability, GWAS, r g s, and PRS, and standard post-GWAS analyses of ARFID and the nonmutually exclusive ARFID presentations, and rare CNVs and CNV burden.The following sections brie y summarize the Aim 2 analyses; a more detailed description can be found in the Supplement.a) PGC Ricopili pipeline supports rapid analysis.As described in our previous PGC-ED publications (28), and the Supplement, we will use "Ricopili" software (29) for pre-imputation quality control, principal components analysis (PCA), imputation, and meta-analysis.Brie y, we will rst follow standard methods to retain high quality SNPs and subjects.Next, we will perform imputation using the largest available resources (currently Haplotype Reference Consortium [HRC]) (30) updating to Trans-omics for Precision Medicine (TOPMed) (31) (N = 65K 30x whole genome sequencing [WGS]).Ancestry will be assessed using PCA for each subject, mapped relative to reference samples of known ancestry.Consistent with our intention to include non-European ancestries, we will use established Psychiatric Genomics Consortium (PGC) cross-ancestry analytical approaches.
b) Post-GWAS analyses.The following outlines standard analytic strategy for post-GWAS analysis to maximize information yield and interpretability.The eld and methodology evolve rapidly.Below represents what we would do today; however, novel proven methods may emerge before analyses are conducted.Greater detail can be found in the Supplement.
Analysis of chrX.X chromosome (chrX) variants in the pseudo-autosomal regions will be handled separately.SNPs with MAF > 0.01 and INFO > 0.70 will be retained.
Females and Males.We will conduct secondary GWAS analysis separately on females and males to determine similarity of the results to the primary combined GWAS.
Clumping.GWAS results implicate genomic regions ("loci").To de ne a locus, SNPs with P < 5×10 − 8 will be identi ed and "clumping" will be used to convert signi cant SNPs to regions.
Conditional and joint analysis.Conditional and joint analysis will be conducted using genome-wide complex trait analysis-conditional and joint analysis (GCTA-COJO) (32).GCTA-COJO investigates every locus with a joint combination of independent markers via a genome-wide SNP selection procedure.
Functional genomic integration.We now routinely use functional genomic results from CommonMind, PsychENCODE, and other efforts to understand GWAS results (33).Much of this is automated in the Functional Mapping and Annotation of Genome-Wide Association Studies platform (FUMA) (34).We also integrate brain single cell ribonucleic acid sequencing (RNA-seq) data to identify the cell types implied by the GWAS results.See the PGC major depressive disorder (MDD) paper for examples (35).
Polygenic risk scores (PRS).PRS aggregate risk alleles across the genome weighted by effect sizes.We will use both the classical p-value thresholding method and a summary statistics version of Bayesian multiple regression (SBayesR) (38).
ARFID presentation analysis.ARFID presentations are overlapping.We will examine the association between ARFID PRS and: a) PARDI-AR-Q subscale scores (sensory sensitivity, phobic avoidance, and low interest/appetite) as continuous scores; and b) a variable categorizing individuals as having a predominant ARFID presentation (e.g., sensory vs. phobic vs. low interest/appetite) de ned as highest subscale score.
Gene-wise analysis.Multi-marker analysis of genomic annotation (MAGMA) (39) will be used to perform gene-wise tests of association with ARFID based on GWAS summary statistics.MAGMA generates genebased p-values by combining SNP-based p-values within a gene while accounting for linkage disequilibrium (LD).
Partitioned heritability.Partitioned heritability will be investigated using strati ed LDSC (40), which estimates the per-SNP contribution to overall SNP-heritability (SNP-h 2 ) across functional annotation categories.
Gene expression.We will investigate whether ARFID heritability is enriched in tissue/cell type speci cally expressed genes using publicly available gene expression data (e.g., the Genotype-Tissue Expression project [GTEx]).
Predicted tissue-speci c gene expression.We will predict differential gene expression using S-PrediXcan v1.0 (41) and genomic and transcriptomic reference data from the brain regions assayed in CommonMind, GTEx and other resources.
Pathway analyses.We will evaluate whether genes associated with ARFID are enriched in speci c pathways, tissues, or cell types.To do this, we will use FUMA to annotate SNPs, identify independent loci, perform pathway analysis, and integrate with a wide array of functional genomic data including gene expression, single cell gene expression, and all available brain epigenetic information.c) Discover structural variation associated with ARFID.The PGC CNV working group is actively optimizing CNV calling using global screening array (GSA) data and we will follow these developments closely.We will maximize comparability between cases and controls by applying rigorous quality control (QC) (42)(43)(44).Following QC, we will t a series of linear models with CNV burden (at different size thresholds) as the dependent variable to investigate both disease status and experimental biases that can potentially confound CNV detection.
Aim 3. Genetic relation of ARFID to other eating disorders.
Greater detail on Aim 3 analyses can be found in the Supplement.
Analytic plan.a) Disorder-speci c GWAS.We will conduct disorder-speci c GWAS for ARFID (and AN, BN, and BED as part of other projects) combining ARFID-GEN with EDGI and existing PGC-ED data using imputed variant dosages and an additive model.We anticipate aggregate sample sizes for the PGC-ED by ~ 2025 (when our collection would be complete) of: (Projected: ARFID = 3,000, AN = 50,000, BN = 20,000, BED = 20,000, ARFID-GEN controls [sourced from existing data repositories] = 42,439, PGC-ED controls = 543,967).GWAS meta-analysis will be conducted with ARFID-GEN and any other ARFID samples that may be introduced to the PGC-ED by time of analysis.
Combined eating disorder GWAS.Combining ARFID-GEN with PGC-ED GWAS of AN, BN, and BED (which will be available by the time our data are analysis-ready), conduct GWAS meta-analysis of all EDs and of component behaviors that cross-cut disorders (e.g., binge eating, restriction), increasing power to identify genetic risk factors that are common across the four disorders.
b) Genetic correlations.Common variant-based genetic correlations (SNP-r g ) measure the extent to which two traits or disorders share common genetic variation.We will calculate SNP-r g for ARFID, and selected traits using GWAS summary statistics via an analytical extension of LDSC (36, 37), as well as explore genome-based restricted maximum likelihood (GREML), which was recently shown to have higher accuracy than LDSC (45).Risk factors classically considered as environmental, such as exercise for AN, are now known to be complex traits underpinned by genetic and non-genetic factors.Large community cohorts like United Kingdom (UK) Biobank, generate GWAS summary statistics for a plethora of phenotypes (e.g., diet, medication, blood metabolites).It is very fast to estimate r g from GWAS summary statistics, and this screening process has the potential to identify unknown associations which will guide the hypotheses of down-stream analyses.
Generalized summary data-based Mendelian randomization (GSMR).Guided by the r g estimated above and our hypotheses, we will perform bidirectional MR analyses to investigate causal relationships between correlated traits and ARFID.For example, it is reasonable to test hypotheses of causality using anxiety as an exposure for ARFID.Signi cant MR results must be reviewed with caution with respect to strong conclusions about causality as unmeasured confounders may exist.Nonetheless, these analyses are potentially exceptionally informative.MR analyses take SNPs that are genome-wide signi cant for one trait (the exposure) and test the correlation in effect sizes in a second trait (the outcome).Under pleiotropy, there is an expectation that the mean effect size in the outcome trait is different from zero, but under causality a directional relationship in effect sizes is expected.Different versions of MR analysis are highly related; at the time of analysis, we will implement best practices for MR.
Multi-trait-based conditional and joint analysis.Results from a) and b) will inform these analyses.For example, we expect to detect genetic correlations of BMI with EDs, and MR analyses will aid interpretation of correlation by causality.We will conduct conditional GWAS analyses to determine if the detected SNP associations for EDs can be explained through their relationship with correlated traits.We will perform a multi-trait-based conditional and joint analysis (GCTA-mtCOJO) (16) using an extension of GCTA (46).
This method uses summary-level data to perform conditional analyses.Based on our previous work, we expect to condition the results of our ARFID GWAS on the best available GWAS results for relevant traits including intelligence quotient (IQ), education years, type 2 diabetes, high density lipid cholesterol, BMI, schizophrenia (SCZ), MDD, ASD, OCD, and neuroticism.Comparing these results to those generated from unconditional GWAS results (i.e., from a above) provides insights into the forces shaping the shared genetic relationships between disorders.
Disorder-speci c SNP associations.A key question in ED research is to understand the differences as well as the similarities among the EDs.This question is very similar to the one posed by PGC colleagues interested in understanding the similarity/differences between SCZ and bipolar disorder (BIP) (47).
Guided by their analyses, we will conduct case vs. case (e.g., ARFID-AN) GWAS.This approach is powerful if case samples can be grouped (e.g., if genotyped together so that technical confounding factors are not present), as sampling errors associated with control allele frequency estimates are avoided.
Multi-polygenic score (MPS).To gain insight into factors underlying ED heterogeneity, we will use the largest available GWAS summary statistics for psychiatric and somatic disorders/traits and combine derived PRS into MPS to predict target outcomes (ARFID, AN, BN, or BED diagnosis), and then more granular phenotypes (e.g., age of onset, severity, low weight).Combining ARFID-GEN with EDGI and existing AN, BN, and BED cohorts in the PGC-ED samples should yield a sample size of ~ 95,000 ED cases and > 500,000 controls.We will increase power by using MPS to combine the predictive power of several PRS in one regression model.Training data will combine the best available GWAS summary statistics on psychiatric, metabolic, BMI, anthropometric, personality, physical activity, and educational phenotypes.These PRS will be used as genetic predictors in models of ARFID, AN, BN, BED, and more granular phenotypes.This approach is ideal for our overarching intentions of predicting outcomes rather than discovering their etiology (i.e., using, not nding genes).MPS is valuable when trait prediction is a priority.c) GSEM.We will use lifetime ARFID, AN, BN, and BED GWAS summary statistics.We will employ GSEM (18) to identify genetic factors for ARFID and associated EDs.GSEM is a multivariate method for analyzing the joint genetic architecture of complex traits.By modeling covariance structure, GSEM synthesizes genetic correlations and SNP heritabilities inferred from GWAS summary statistics of individual traits from samples with varying and unknown degrees of overlap.GSEM analyses will include several steps, including a factor analysis of correlated traits, estimating SNP effects and computation of factor-level PRS.
Aim 4. Genetic relation of ARFID with psychiatric, metabolic/anthropometric, neurodevelopmental, and other relevant phenotypes.Analyses will parallel Aim 3 only with an outward focus on traits other than EDs.For r g s, we will follow methods applied in the Anorexia Nervosa Genetics Initiative-PGC-ED (ANGI-PGC-ED) Freeze 2 analysis (28), adding additional traits as GWAS summary statistics become available.
Only GWAS summary statistics are needed for GSEM, many of which are publicly available.Phenotypes of interest are not directly evaluated in the target sample.Related traits for GSEM may include: lifetime anxiety disorder (ANX), neuroticism, BMI, fat mass, and fat-free mass (available in the UK Biobank); ASD, attention-de cit/hyperactivity disorder (ADHD), post-traumatic stress disorder (PTSD), MDD, and OCD GWAS summary statistics (available from the PGC).

Discussion
In addition to the science, we will create rich data and sample resources for the pursuit of related research questions.Our analytic aims are dense, but results will inform follow-on research questions such as: 1) How do genetic and environmental exposures act and co-act to in uence risk for ARFID?

Declarations
Ethics approvals and consent to participate ARFID-GEN was approved by the University of North Carolina Biomedical Institutional Review Board (IRB) Protocol # 22-1524.Informed consent will be obtained from all participants and/or their legal guardian(s).Child assent will also be obtained.All methods were carried out in accordance with relevant guidelines and regulations.

Consent for publication
Not applicable Figures

1 .
Saliva sampling.Saliva samples are collected with Isohelix saliva collection kits and returned to the Center for Psychiatric Genetics (CPG) Biorepository at UNC. Gift cards.Parents and adult participants are sent a gift card ($25) once all required questionnaires are complete and their spit kit is received by the study team.

Table 1
* Item 9: "Thoughts that you would be better off dead, or of hurting yourself in some way?" was not included † Items about major depressive disorder (MDD) were not included ‡ Modi ed for this study * Item 9: "Thoughts that you would be better off dead, or of hurting yourself in some way?" was not included † Items about major depressive disorder (MDD) were not included ‡ Modi ed for this study

Table 1
Children ages 15-17 complete the NIAS, PARDI-AR-Q, EDE-Qv6, and the ED100Kv3 (26).Embedded algorithms determine if the child is eligible for the study.If they meet criteria, they are considered enrolled.All consents, assents, and study questionnaires are completed online using Research Electronic Data Capture (REDCap) (20)nts/Guardians of children ages 7-17 and their children.Parents/guardians of children with ARFID who are interested in participating in the study will visit the website (ar dgen.org) and select "Take Our Survey" with the child available.The rst step requires the parent/guardian to answer prescreen questions and to consent to the study which includes answering questionnaires and having their child provide a saliva sample for DNA extraction.Parents also have the option to consent to be recontacted for future research.To determine eligibility of the child, parents/guardians then complete a parent report version of the Nine Item ARFID Screen (NIAS-PR)(20); a parent version of the ARFID self-report version of the Pica, ARFID, and Rumination Disorder Interview ARFID questionnaire (PARDI-AR-Q)(21); and Version 2.0 of the Parent Version of the Eating Disorder Examination-Questionnaire (PEDE-Qv2.0)(22),developedbyKLLoebbasedontheEatingDisorderExamination-Questionnaire Version 6 (EDE-Qv6)(23).If the child is eligible based on parent/guardian report, the parent/guardian provides contact information, the child provides assent to participate, and the parent/guardian acknowledges that the child provided verbal assent as well.Children ages 7-13 then complete the ChEDE-Q8(24), a child version of the 8-item short form of the Eating Disorders Examination-Questionnaire(25).Children age 14 complete the NIAS(20), PARDI-AR-Q (21), and ChEDE-Q8.
uence course of illness (e.g., predict who is at risk for developing persistent vs. transient ARFID)?4) Can we answer precision-medicine questions regarding identi cation of optimal interventions informed by genotype and environmental exposures?Given the paucity of effective interventions for ARFID (and other EDs) (48), ultimately, we hope our work will yield information on critical biological pathways that may point toward drug discovery or repurposing that could aid in reversing the tenacity and lethality of these illnesses.
2) If detected, how will carriers of CNVs differ on PRS and environmental exposures?3) Can we identify genetic factors that in