Phonological Awareness Mediates the Relationship between DCDC2 and Reading Performance with the Influence of Home Environment

Proficient reading requires critical phonological processing skills that interact with both genetic and environmental factors. However, the precise nature of the relationships between phonological processing and genetic and environmental factors is poorly understood. We analyzed data from the Genes, Reading and Dyslexia (GRaD) Study on 1,419 children ages 8 to 14 years from African-American and Hispanic-American family backgrounds living in North America. The analyses showed that phonological awareness mediated the relationship between DCDC2-READ1 and reading outcomes when parental education and socioeconomic status were low. The association between READ1 and reading performance is complex, whereby mediation by phonological awareness was significantly moderated by both parental education and socioeconomic status. These results show the importance of home environment and phonological skills when determining associations between READ1 and reading outcomes. This will be an important consideration in the development of genetic screening for risk of reading disability.


Introduction
Proficient reading is critical for success in school as well as lifetime earning potential. Children with low reading ability are more likely to live in poverty and have higher rates of unemployment as adults 1 . A great deal of research has been devoted to investigating the predictors of reading outcomes, including measures of individual word reading and reading comprehension, but the general consensus among reading researchers is that phonological processing skill, particularly phonological awareness, is a significant determinant of reading outcome 2,3 . Phonological processing refers to the use of phonological information in decoding written language 4 . Children with advanced phonological skills tend to have successful reading outcomes, whereas lower phonological skill is associated with reading difficulties 5,6 .
In addition to phonological skills, reading is also influenced by genetic factors. In studies of twins, the heritability of reading is high, ranging from .46 to .72 7,8 . However, heritability estimates are not uniformly high in molecular genetic studies of unrelated individuals assessed with single nucleotide polymorphisms (SNPs). While the differences in heritability estimates between family studies and SNP studies in psychiatric disorders are well known (and frequently characterized as "missing heritability"), it is also likely that the relationship between genes and reading is indirect. Few studies rigorously address the connection between genes and reading outcomes. The prominent role of phonological skill in reading performance suggests that it may function as a mediator between specific genes and reading proficiency.
Approximately 18 genes have been associated with reading performance, but association with only 8 genes has been replicated three or more times: CMIP, ATP2C2, FOXP2, ROBO1, DYX1C1, KIAA0319, DCDC2, and CNTNAP2 9 . Among them, only KIAA0319 and DCDC2 are located in the most replicated reading locus (DYX2; chromosome 6p22), and within both genes the peaks of association lie within regulatory features. We previously showed that we could identify children with a specific phonological deficit with variants of READ1, a regulatory element encoded within a known RD risk gene called DCDC2 10,11 .
Home environment is another factor related to reading, and it can influence the relationship between genetics and reading. The proportion of variance attributable to genetic influence varies because home environment moderates genetic influences on reading outcomes 12 . Home environment may therefore function as a condition on the mediating effect of phonological skill in the gene-reading linkage. In the present study, we aimed to examine the mediating role of phonological awareness between genes and reading. We were also interested in the moderating role of home environment on this mediation effect. By simultaneously considering the roles of phonological awareness and home environment, we tested a moderated mediation model to provide guidance in understanding how genetics affects reading performance.

The Connection between Phonological Processing and Reading Outcomes
It is well established that developmental and individual differences in phonological processing are causally related to reading ability in both longitudinal and experimental research 2,13 . Furthermore, a deficit in phonological processing is a contributor to reading disability 14,15 .
Phonological awareness, a major component of phonological processing, refers to the sensitivity to and ability to manipulate sounds or sound structures of words. It is a powerful concurrent and longitudinal predictor of reading development 3,4,16 . According to the phonological deficit hypothesis, phonological awareness is a critical factor explaining difficulties in reading 17 . Phonological awareness training interventions have proved effective in improving the reading performance of children with reading disability 5 . Thus, the connections between phonological awareness and rapid automatized naming and reading outcomes are well established and evidenced.

The Connection between Genetics and Reading Outcomes
Research has shown significant and substantial genetic influences on reading performance 18 . For those who have difficulties in reading, research has shown that reading problems tend to run in families 19 . Twin studies have shown large genetic influences on both word reading and reading comprehension in samples from Colorado, Ohio, Florida, England, Australia, and Scandinavia 7,20,21,22,23 . Although behavioral genetics studies can tell us whether reading is affected by genetic influences, they do not tell us which risk gene(s) influence reading. This issue points to the need for molecular genetics research.
Heritability patterns pointing to a genetic contribution, together with molecular genetic studies, have identified risk genes that may cause reading difficulties 10,24,25,26,27 . Among the identified risk genes that are associated with reading difficulties, DCDC2, located on chromosome 6p22, is the most replicated risk gene 10,11 . Research has shown that READ1 (regulatory element associated with dyslexia 1) is a regulatory element encoded in intron 2 of DCDC2 and is a highly polymorphic complex tandem repeat with at least 40 alleles 10,11,28,29 . Among these alleles, RU2-Short, which includes 6 or fewer copies of repeat unit 2 (alleles 4, 10, and 16 etc.), is considered a high-risk genetic variant for reading difficulties 30 . Clinical studies showed READ1 allele-specific association with severe reading and language impairment 29 . Nevertheless, the identified genes account for only a small portion of variance in reading difficulties 31 . READ1 in the DYX2 locus should be further studied for its effects on reading performance.

The Connection between Genetics and Phonological Awareness
Previous studies have consistently found genetic influence on phonological awareness 8,32,33 . Furthermore, studies have shown that genes are responsible for the interrelations among phonological awareness and reading-related outcomes. For example, genetic influences were found to explain the comorbidity among phonological and orthographic skills 8 or covariance between phonological awareness, rapid naming, and reading outcomes 33 .

The Genetic x Environment Influence on Reading
Home environment is crucial in literacy development. Parental education and socioeconomic status (SES) are important indicators of home environment 34,35,36,37,38 . Education is considered one of the most stable variables as it is usually established early in life and does not change over time.
Parental education is highly correlated with children's reading achievement 39,40 . SES is typically the most direct measure of family wealth and meta-analyses have demonstrated that SES is highly correlated with student achievement 40,41 .
Two models have been proposed to understand the relationship between genetic and environmental influences (G x E) on reading. One is the bioecological model, which proposes that genetic influences should be greater in advantaged environments because genetic potential would be more fully realized in supportive environments than in poor environments 42 . The other is the diathesis-stress model, which suggests that heritability should be greater in poor environments because the effects of deleterious genes may not be observed in more supportive environments. Both models are reasonable accounts of G x E interactions on reading. For example, individuals who carry deleterious genes that put them at risk of reading disability may experience a disadvantaged environment, and such environmental triggers can activate deleterious genetic influences. Conversely, individuals who carry favorable genes may experience supportive environments in which this genetic potential can be realized.
G x E interactions have been examined in reading abilities and disabilities 1,12,20,43 but the findings are mixed. For example, Kremen et al. (2005) 43 found a shared environment x parental education interaction but not a genetic x parental education interaction in a sample of middle-aged men. This finding was confirmed by Taylor and Schatschneider's (2010) study 20 . In Taylor and Schatschneider (2010), greater shared environmental than genetic influences were observed for first grade reading in the low-income group but not in the middle and high income groups. In contrast, Friend, DeFries, and Olson (2008) 1 examined 545 identical and fraternal twin pairs in which at least one member of the pair had a reading disability. They reported a G x E interaction and found that genetic influence was higher and environmental influence was lower among children whose parents had a high level of education. The heritability of low reading ability was significantly higher among children whose parents had higher levels of education, indicating that parental education moderated genetic influences on reading disability. Friend et al. (2009) 12 further explored identical and fraternal twins with typically developing reading abilities from the US and UK and reported that the heritability of high reading ability increased significantly with lower levels of parental education in both samples. That is, children whose parents had lower levels of education tended to show a stronger genetic influence on high reading ability. However, in a similarly aged sample, no moderating effects of parental education on genetic influences were found 44 . In addition, brain-behavior relationships critical for reading development are more pronounced in low SES environments 45 .
Overall, the findings from G x E interactions on reading ability are mixed. Much of the research on these topics has focused on behavioral genetics rather than molecular genetics, and the G x E interaction research is mainly limited to twin studies. Further research is necessary to understand how genes interact with environment to affect reading ability. Ideally, a study testing this moderating effect should include molecular genetics with the identified genes that influence reading.

The Present Study
In the present study, we hypothesize that (1) phonological processing skill mediates the relationship between READ1 and reading outcomes including word reading and comprehension; and (2) that environmental factors moderate the mediation effect of phonological processing skill. To test these hypotheses, we analyzed data from the Genes, Reading and Dyslexia (GRaD) Study of 1,419 Hispanic-American and African-American participants, ages 8 years to 15 years. For phonological processing skills we used the Elision and Blending subtests of the Comprehensive Test of Phonological Processing (CTOPP) (Wagner et al., 1999). For reading outcomes we used the Woodcock-Johnson III Letter-Word Identification and Word Attack subtests (Woodcock et al., 2001) to assess word reading accuracy, the Test of Word Reading Efficiency (TOWRE) Sight Word Efficiency and Phonemic Decoding Efficiency subtests (Torgesen et al., 1999) to assess word reading fluency, and the Standardized Reading Inventory (SRI) Passage Comprehension (Newcomer, 1999) to assess reading comprehension. To assess the home environment, we used responses from the parental questionnaire. All subjects were genotyped for READ1 alleles, which were partitioned into three functional groups (see Methods). For the analysis, we tested a mediation model in which the relationship between READ1 genotype and reading outcomes was explained by phonological processing skills. Next, we investigated a moderation model in which the home environment factors moderated the relationship between READ1 and reading outcomes. Finally, we integrated the moderator into the mediation model and tested the moderated mediation model in which the strength of the indirect (mediation) effect was conditional on the value of the moderator (home environment factors).

Participants
There were 1,419 self-identified African-American and Hispanic-American children and adolescents who participated in this study. Their age range was from 8 to 15 years. This study was part of a larger, multi-site US and Canadian collaborative Genes, Reading, and Dyslexia (GRaD) project led by Yale University. The full set of sites included Albuquerque, NM; Baltimore, MD; Boston, MA; Boulder and Denver, CO; New Haven, CT; San Juan, PR; and Toronto, Canada.
Participants with significant cognitive delays, behavioral problems, emotional/psychiatric disturbances, chronic neurologic conditions, and documented vision or hearing impairment were excluded.

Woodcock-Johnson Tests of Achievement, Third Edition (WJ-III)
Measures from the WJ-III included the Letter-Word Identification and Word Attack subtests (Woodcock et al., 2001). These measures were used to assess word reading accuracy. The WJ-III Letter-Word Identification subtest asked the participant to read a list of increasingly complex single English words aloud. The Word Attack subtest required the participant to use knowledge of English phonology to decode a list of increasingly complex non-words or pseudowords in isolation. The total score for each subtest was the number of words read correctly. The raw score was then converted to a standard score based on age norms. A composite score of both subtests was used to assess word reading accuracy in the study.

Test of Word Reading Efficiency (TOWRE)
The TOWRE was a timed measure used to assess word reading fluency. It included subtests of single word reading (Sight Word Efficiency) and single pseudoword decoding (Phonemic Decoding Efficiency) (Torgesen et al., 1999). In the Sight Word Efficiency subtest, the participant was required to read as many words as possible within 45 seconds. In the Phonemic Decoding Efficiency subtest, the participant was required to read as many pseudowords as possible within 45 seconds. Standard scores for each subtest are based on the number of correctly read words or pseudowords within the time limit, relative to age norms. A composite score of both subtests was used to assess word reading fluency in the study.

Standardized Reading Inventory, Second Edition (SRI)
The SRI (Newcomer, 1999) was used to acquire measures of Comprehension and Word Recognition Accuracy. This individually-administered contextual reading test consisted of 10 passages of increasing difficulty, ranging from pre-primer to an eighth-grade level. Accuracy is assessed during oral reading, followed by a series of questions to determine comprehension. Scores were obtained for word recognition accuracy and comprehension on each passage and then converted to standard scores based on age norms.

Phonological Awareness
Phonological awareness was assessed using the Elision and Blending subtests of the Comprehensive Test of Phonological Processing (CTOPP) (Wagner et al., 1999). In the Blending subtest, phonological segments are synthesized to form a word. In the Elision task, a specified phonological segment is removed from a word, which forms a new word. The score for each subtest represents the number of correct items, converted to a standard score based on age norms. A composite score of both subtests was used to assess phonological awareness in the study.

Home Environment Measures
Following consent and assent procedures, parents or guardians completed a questionnaire that covered family background, household resources, and the child's education and health history. Parents or guardians reported years of formal education (ranging 6-18 years). Participation in a government assistance program was used to assess socioeconomic status (SES). Families that participated in a government assistance program were coded as 1, and families that did not were coded as 0.

Genotyping
Saliva was collected and DNA extracted using Oragene-DNA kits (DNA Genotek) following manufacturer protocols. SNP genotyping for rs2143340 was conducted as part of a larger Illumina HumanOmni2.5-8 bead chip, with genotyping calls screened for quality control measures. The call rate for rs2143340 in the GRaD sample was 0.983. READ1 genotyping was conducted using PCR amplification and Sanger sequencing at the Yale W.M. Keck DNA Sequencing Facility using standard protocols as previously described (Li et al., 2018, 2020). Primer sequences and amplification protocol were as previously described (Powers et al., 2013). READ1 alleles were called from chromatograms using a custom program written in C++ (Dr. Yong Kong, available upon request). If the calling program identified errors, chromatograms were manually examined and deconvoluted for allele calling. The call rate for READ1 allele genotyping was 0.987. The 2,445 bp microdeletion on 6p22, which encompasses the READ1 allele within breakpoints in intron 2 of DCDC2, was genotyped by allele-specific PCR and agarose-gel electrophoresis. Primer sequences, amplification protocol, and gel electrophoresis for genotyping were as previously described 11 . The genotyping call rate for the microdeletion was 0.972.

Data Analyses
We tested our hypotheses in three steps. First, we examined a mediation model in which the relationship between genetics and reading outcomes was explained by phonological processing skill. Second, we investigated a moderation model in which home environment factors moderated the relationship between genetics and reading outcomes. Finally, we integrated the moderator into the mediation model and tested the moderated mediation model in which the strength of the indirect (mediation) effect was conditional on the value of moderator (home environment) factors. Process software 46 was used to investigate the moderation, mediation, and moderated mediation effects among reading, environmental, and genetic components.
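The three-step analysis can be written as a pair of regression equations. The sketch below is an illustration under stated assumptions, not the authors' reported specification: X stands for READ1 (RU2-Short) status, M for phonological awareness, Y for a reading outcome, and W for a home environment factor; the moderator is assumed to act on the path from X to M; and the symbols i_M, i_Y, a_1, a_2, a_3, b, and c' are introduced here only for illustration.

```latex
\begin{align}
M &= i_M + a_1 X + a_2 W + a_3 XW + e_M \\
Y &= i_Y + c' X + b M + e_Y \\
\text{indirect effect of } X \text{ on } Y \text{ at } W &= (a_1 + a_3 W)\, b
\end{align}
```

Under this assumed specification the indirect effect is a linear function of W, and the product a_3 b describes how much it changes per unit of W. If the moderator instead acted on the path from M to Y, or on both paths, the equations would take a different form.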

Tests of Mediation
After showing correlation, we then tested for mediation with the three reading outcomes (see Table 2

Tests of Moderation
Next, we tested for moderation effects from parental education or SES (Table 3). The cross-product terms involving parental education were significant for word reading accuracy (b = .92, p < .01), word reading fluency (b = 1.00, p < .01), and reading comprehension (b = .22, p = .01). The cross-product terms involving SES were also significant for word reading accuracy (b = -3.52, p = .02) and reading comprehension (b = -1.05, p = .01).
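A significant cross-product term means that the effect of the predictor depends on the moderator. The following minimal Python sketch shows how such a term is usually probed, assuming a model of the form Y = b0 + b_x*X + b_w*W + b_xw*X*W. Only the cross-product coefficient (0.92, word reading accuracy with parental education) and the 6 to 18 year range of parental education come from the text; the main-effect coefficient B_X is a hypothetical value chosen for illustration, and whether the moderator was centered before analysis is not stated.

```python
# Sketch of probing a moderation effect ("simple slopes").
# Assumed model: Y = b0 + b_x*X + b_w*W + b_xw*X*W
# so the effect of X on Y at moderator value w is b_x + b_xw * w.

B_XW = 0.92   # cross-product coefficient reported in the text
B_X = -11.0   # HYPOTHETICAL main effect of X; not reported in this section


def conditional_effect(b_x: float, b_xw: float, w: float) -> float:
    """Effect of X on Y when the moderator W equals w."""
    return b_x + b_xw * w


def zero_crossing(b_x: float, b_xw: float) -> float:
    """Moderator value at which the conditional effect of X equals zero."""
    if b_xw == 0:
        raise ValueError("no interaction: the effect of X does not depend on W")
    return -b_x / b_xw


# Evaluate the effect at the low end, middle, and high end of the reported
# range of parental education (6 to 18 years).
for years in (6, 12, 18):
    effect = conditional_effect(B_X, B_XW, years)
    print(f"parental education = {years:2d} years: effect of X = {effect:.2f}")

print(f"effect changes sign near W = {zero_crossing(B_X, B_XW):.1f} years")
```

With a positive cross-product coefficient, the conditional effect becomes less negative as parental education increases; the exact values printed depend entirely on the hypothetical B_X.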

Tests of Moderated Mediation
To examine whether the strength of the indirect mediation effect of RU2-Short is conditional on the value of either the parental education or the SES moderator, we then tested a moderated mediation model (Table 4). When word reading accuracy was the outcome, the interaction terms between RU2-Short and

Discussion
In a study of mediation and moderation factors in 1,419 African-American and Hispanic-American children, we examined the influence of the genetic variant RU2-Short on word reading accuracy, word reading fluency, and reading comprehension. The results support a moderated mediation model, showing an indirect effect between RU2-Short and reading outcomes through phonological awareness, which was contingent on levels of parental education and SES.

Mediating Roles of Phonological Awareness
Although the heritability estimates for reading performance are not uniformly high, the variability in the results from previous studies may be partially explained by an indirect relationship between genetic factors and reading 47 . Longitudinal and intervention studies have shown that phonological awareness causally predicts reading outcomes 2,3,4,5 . Our results confirm the fully mediating role of phonological awareness in the connection between at least one risk gene (DCDC2) and the three reading outcomes that we tested. Other factors that likely contribute to the variability between studies include the generally small size of the cohorts, differences in assessments, and the study methods (for example, twins versus kinships) 1,7,12 .

Moderating Role of Home Environment
The nature of the interaction between genetic variants and environment on reading performance is also under-studied. Consistent with previous studies 20,43 , we show significant interactions between a well-known genetic risk variant (READ1) and home environment on reading outcomes, confirming that the relationship between genetics and phonological awareness can be modified by home environment factors. The strength of the indirect effect between risk genes and reading outcomes is conditional on the value of the home environment factors. When parental education and SES were low, there was a strong relationship between RU2-Short and reading performance. This supports the diathesis-stress model 48,49 , in which the heritability for reading is greater in a high-stress environment where stressors may lead to expression of risk genes. In contrast to the findings of Friend et al. 1,12 , we do not show that the genetic influence on RD is higher among children whose parents have a high level of education; this may be because Friend et al. conducted their studies in monozygotic and dizygotic twins, whereas we genotyped unrelated children.

The Moderated Mediation Model
A moderated mediation model could show that the effect of RU2-Short on reading outcomes is transmitted by phonological awareness, varying as a function of parental education and SES. In other words, phonological awareness mediates the relationship between genes and reading outcomes when parents have a low education level but not when parents have medium or high education levels, and when SES is low, but less or perhaps not at all when SES is high. The connection between genes and reading performance is indirect through phonological awareness and is adjusted by different values of parental education and SES. When examining the influence of risk genes on reading, both cognitive and environmental factors need to be considered.
The present study broadens the scope of gene effects and presents a complex picture of how genes influence reading performance by considering the mediating role of phonological awareness, varying by parental education and SES. The finding is important because it suggests that in spite of a strong relationship between genes and phonological awareness, which in turn affects reading performance, the linkage between genes and phonological awareness is diminished when home environment is positive, and only becomes strong in more stressful environments.
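The idea that an indirect effect is present at some levels of a moderator and absent at others can be made concrete with a small calculation. The Python sketch below assumes that the moderator W acts on the path from the gene variant (X) to phonological awareness (M), so that the indirect effect at a given W is (a1 + a3*W) * b. The class name, the coefficient names a1, a3, and b, and all numeric values are hypothetical and chosen only for illustration; they are not estimates from this study, and W is assumed to be centered at its mean.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FirstStageModeratedMediation:
    """Hypothetical coefficients of an assumed first-stage moderated mediation model.

    M = i_M + a1*X + a2*W + a3*X*W + error
    Y = i_Y + c'*X + b*M + error
    """
    a1: float  # effect of X on M when W = 0
    a3: float  # change in the X -> M effect per unit of W
    b: float   # effect of M on Y, controlling for X

    def conditional_indirect_effect(self, w: float) -> float:
        """Indirect effect of X on Y through M at moderator value w."""
        return (self.a1 + self.a3 * w) * self.b

    def index_of_moderated_mediation(self) -> float:
        """Change in the indirect effect per unit increase in W."""
        return self.a3 * self.b


# HYPOTHETICAL coefficients, for illustration only.
model = FirstStageModeratedMediation(a1=-3.0, a3=0.25, b=0.5)

# W is assumed to be mean-centered parental education (in years).
for label, w in (("low", -4.0), ("medium", 0.0), ("high", 4.0)):
    effect = model.conditional_indirect_effect(w)
    print(f"{label:>6} parental education (W = {w:+.1f}): indirect effect = {effect:.3f}")

print(f"index of moderated mediation = {model.index_of_moderated_mediation():.3f}")
```

In an actual analysis, whether the indirect effect differs from zero at each level of W would be judged from confidence intervals around these products, which this sketch does not compute.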

Educational Implications
Because reading ability continues to be a critical component of academic success, our results have several implications for education. The findings highlight the importance of phonological processing skill, particularly phonological awareness, as the main factor explaining the connection between genes and reading ability. In the classroom, teachers should still target phonics training to enhance students' reading performance. Results from the present study of gene-by-environment interactions support the idea that risk genes tend to affect reading ability among children whose parents have low education and who live in low SES families. Therefore, strategies to improve educational and home environments could be especially fruitful for children who carry risk genes for reading.

Limitations and Future Directions
Although our study is limited by its cross-sectional nature and a longitudinal design would have been more appropriate to test for mediation effects, it helps build the theoretical model of the moderated mediation. Furthermore, the mediator phonological awareness has been shown to be a causal factor of reading outcomes in both longitudinal and experimental designs 2,5,16 , making the mediation effect viable. Still, future research should examine and replicate our model with longitudinal and intervention data. In addition, we examined the contribution of only a single genetic risk variant, RU2-Short, to reading outcomes.
Although RU2-Short is a functional genetic variant in the READ1 regulatory element for DCDC2 and has been independently replicated, the correlations between it and all of the reading variables are small in magnitude though significant. Other genetic risk variants should also be investigated in future studies.
Furthermore, our study focuses on two demographic groups (African-Americans and Hispanic-Americans) that have been under-represented in genetics research. To generalize our findings, future studies of more diverse populations and larger cohorts will be needed.