Participants
A total of 2284 primary school students from Shaanxi, Gansu and Inner Mongolia were recruited. All were school-age students in grades 3 to 6 with no history of mental illness. Saliva samples were collected from the participants for DNA extraction. In addition, some of the participants completed six endophenotype tests and two behavioral phenotype tests, which covered Chinese character recognition, fluent reading and reading-related cognitive skills.
Genotyping and SNP Selection
The DNA samples were genotyped on an Illumina Asian Screening Array (ASA, 700K-750K) chip by Beijing Compass Biotechnology. PLINK was used for standard quality control (Anderson et al., 2010; Chang, 2015) with the following criteria: sample call rate > .90; SNP call rate > .95; minor allele frequency (MAF) > .01; and exclusion of SNPs deviating from Hardy-Weinberg equilibrium (\(p < 10^{-5}\)). Individuals with first-degree kinship (PI_HAT > .50) were removed. MACH 4.0 software was used for genome-wide imputation based on the Asian population data from the GenomeAsia Pilot (GAsP) project, and the imputed data were filtered with the same quality control criteria. Thirteen SNPs were then extracted using PLINK v1.90.
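As a rough illustration of how these thresholds act, the following Python sketch applies them to summary statistics that are assumed to have been computed already (for example, by PLINK). The field names (`call_rate`, `maf`, `hwe_p`) and all example values are hypothetical, and the sketch assumes that SNPs with a Hardy-Weinberg p value below 10^-5 are the ones excluded. The study itself performed this step in PLINK.

```python
# Hypothetical per-SNP and per-sample summary statistics (illustrative only).
snps = [
    {"id": "snp_a", "call_rate": 0.99, "maf": 0.12, "hwe_p": 0.40},
    {"id": "snp_b", "call_rate": 0.93, "maf": 0.20, "hwe_p": 0.75},   # fails SNP call rate
    {"id": "snp_c", "call_rate": 0.98, "maf": 0.004, "hwe_p": 0.60},  # fails MAF
    {"id": "snp_d", "call_rate": 0.99, "maf": 0.30, "hwe_p": 1e-7},   # fails HWE
]
samples = [
    {"id": "s1", "call_rate": 0.97},
    {"id": "s2", "call_rate": 0.88},  # fails sample call rate
]


def passes_snp_qc(snp, min_call_rate=0.95, min_maf=0.01, min_hwe_p=1e-5):
    """Keep a SNP only if it clears the call-rate, MAF and HWE thresholds."""
    return (
        snp["call_rate"] > min_call_rate
        and snp["maf"] > min_maf
        and snp["hwe_p"] >= min_hwe_p
    )


def passes_sample_qc(sample, min_call_rate=0.90):
    """Keep a sample only if its call rate exceeds the threshold."""
    return sample["call_rate"] > min_call_rate


kept_snps = [s["id"] for s in snps if passes_snp_qc(s)]
kept_samples = [s["id"] for s in samples if passes_sample_qc(s)]
print(kept_snps)     # ['snp_a']
print(kept_samples)  # ['s1']
```

Relatedness filtering on PI_HAT is omitted here because it operates on pairs of individuals rather than on single records.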
Phenotypes
Word list reading task. As a measure of fluent reading, 1588 participants read aloud a list of 180 simple, common words. They were asked to read the words in order as quickly as possible. Each two-character word was scored 1 if both characters were read correctly and 0 otherwise. The time taken to read the list was used as the index of fluent reading.
Rapid automatized naming task. The rapid automatized naming task comprises four subtasks: digit naming, picture naming, dice naming, and color naming. Each subtask sheet contains eight rows with five items in each row, and participants are required to name the items row by row as fast as possible. The items are five digits (2, 4, 6, 7, and 9), five dice faces (5, 4, 2, 3 and 1), five pictures (dog, flower, book, shoe, and window), and five colors (red, yellow, black, green, and blue). The average time of two rapid automatized naming tasks is taken as the standard measurement. The test-retest reliabilities of the rapid automatized digit, picture, dice, and color naming tasks are 0.87, 0.82, 0.74, and 0.74, respectively.
Phonological awareness task. In this task, a child is presented orally with a one-syllable word and asked to delete a given phoneme from the syllable. The task consists of 16 items of three types: initial phoneme deletion (e.g., /mei4/ (sister) without /m/), middle phoneme deletion (e.g., /tuan4/ (group) without /u/), and final phoneme deletion (e.g., /guan1/ (close) without /n/). An item is scored as correct only when both the tone and the pronunciation of the response are correct. This task has been widely used in language studies of Chinese children [35, 56, 57].
Morphological awareness task. Children are presented with a two-morpheme word, asked to identify the target morpheme in it, and asked to create two new words containing that morpheme: one in which the morpheme has the same meaning as in the presented word, and one in which it has a different meaning. For example, presented with the word bei1bao1 (which means backpack), children are asked to produce a new word in which [bao1] has the same meaning as in bei1bao1, such as shu1bao1 (which means bag), and a new word in which [bao1] has a different meaning, such as bao1zi1 (which means steamed stuffed bun).
Parental Education (PE) Levels
Information on parental educational level was available for 1620 participants. Educational level was coded from 1 (lowest) to 8 (highest) according to the structure of the Chinese education system: 1 = primary school education, 2 = junior high school education, 3 = senior high school education, 4 = junior college education, 5 = undergraduate degree, 6 = master’s degree, 7 = doctoral degree, and 8 = postdoc. When information was available for both parents, the average of their two levels was used to represent the parental education level.
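The scoring rule can be sketched in a few lines of Python. The function name is hypothetical, and using the single available value when only one parent's level is known is an assumption made for this sketch; the text specifies only the two-parent case.

```python
def parental_education(father=None, mother=None):
    """Mean of the available parental education levels (coded 1-8).

    Returns None when neither level is known. Falling back to a single
    parent's level is an assumption of this sketch, not stated in the text.
    """
    levels = [level for level in (father, mother) if level is not None]
    for level in levels:
        if not 1 <= level <= 8:
            raise ValueError("education level must be between 1 and 8")
    if not levels:
        return None
    return sum(levels) / len(levels)


print(parental_education(father=3, mother=5))  # 4.0
print(parental_education(mother=2))            # 2.0
```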
Data analysis
SNP coding and cumulative genetic scores (CGSs)
The SNPs on KIAA0319 were replicated in a GWAS of a Chinese sample [58]. We therefore adopted the β values for the phenotypes of fluent reading and character recognition in this study. Genotypes were coded according to the first allele and the sign of β. When β was positive, homozygotes for the first allele were coded as 2, heterozygotes as 1, and homozygotes for the minor allele as 0. When β was negative, the coding was reversed. Finally, the cumulative genetic score (CGS) of the KIAA0319 gene was calculated by summing the number of risk alleles across the 13 SNPs.
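The coding rule can be illustrated with a minimal Python sketch. It assumes that each genotype is stored as the count of first alleles (0, 1 or 2) and that the score is an unweighted sum of risk-allele counts. The SNP names and β values are placeholders, not values from the study.

```python
def risk_allele_count(first_allele_count, beta):
    """Count of risk alleles at one SNP, given the sign of beta.

    A positive beta keeps the first-allele count; a negative beta reverses
    the coding (2 -> 0, 1 -> 1, 0 -> 2). A beta of exactly zero is not
    covered by the text and is rejected here.
    """
    if first_allele_count not in (0, 1, 2):
        raise ValueError("genotype must be coded 0, 1 or 2")
    if beta == 0:
        raise ValueError("beta of zero gives no coding direction")
    return first_allele_count if beta > 0 else 2 - first_allele_count


def cumulative_genetic_score(genotypes, betas):
    """Unweighted sum of risk-allele counts over all SNPs in `betas`."""
    return sum(risk_allele_count(genotypes[snp], betas[snp]) for snp in betas)


# Placeholder SNP names and beta values, for illustration only.
betas = {"snp1": 0.08, "snp2": -0.05, "snp3": 0.02}
child = {"snp1": 2, "snp2": 2, "snp3": 1}
print(cumulative_genetic_score(child, betas))  # 2 + 0 + 1 = 3
```

With 13 SNPs coded this way, the score ranges from 0 to 26.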
Interaction Analysis
Stratified regression analysis was performed to explore the interactions of 13 SNPs and the CGS with the parental education level. The standard multiple regression equation is as follows:
\(Y = B_0 + B_1 X_1 + B_2 X_2 + B_3 (X_1 \times X_2) + B_4 \cdot \mathrm{Age} + B_5 \cdot \mathrm{Sex} + E \quad \text{(1)}\)
where Y is the dependent variable (i.e., fluent reading); X1 is the environmental variable (PE); X2 is the genetic variable; X1×X2 is the product term of the gene‒environment interaction; B1 and B2 are the regression slopes of the main effects of environment X1 and gene X2, respectively; B3 is the regression coefficient of their interaction term; B4 and B5 are the regression coefficients of the covariates age and sex; B0 is the intercept; and E is the random error.
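To make the structure of Equation 1 concrete, the sketch below fits it by ordinary least squares to simulated data using only the Python standard library. The simulated coefficients, the variable ranges and the helper names (`solve`, `ols`, `design_row`) are all assumptions chosen for illustration; they are not the study's data or software.

```python
import random


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [A[i][:] + [b[i]] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - tail) / M[i][i]
    return x


def ols(X, y):
    """Least-squares coefficients from the normal equations (X'X) b = X'y."""
    p = len(X[0])
    XtX = [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]
    Xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(p)]
    return solve(XtX, Xty)


def design_row(pe, g, age, sex):
    """Columns in the order of Equation 1: 1, X1, X2, X1*X2, Age, Sex."""
    return [1.0, pe, g, pe * g, age, sex]


# Simulated data with known (made-up) coefficients B0..B5.
true_b = [1.0, 0.5, 0.3, -0.2, 0.1, 0.4]
rng = random.Random(0)
X, y = [], []
for _ in range(500):
    row = design_row(rng.randint(1, 8), rng.randint(0, 6),
                     rng.uniform(8, 12), rng.randint(0, 1))
    X.append(row)
    y.append(sum(b * v for b, v in zip(true_b, row)) + rng.gauss(0, 0.1))

coef = ols(X, y)
print("estimated B3 (interaction):", round(coef[3], 3))
```

The fourth coefficient, `coef[3]`, plays the role of B3 and should land close to the simulated value of -0.2.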
Reparameterized Regression Model Tests
The reparameterized regression equation [59] is as follows:
\(Y = B_0 + B_1 (X_1 - C) + B_2 ((X_1 - C) \times X_2) + B_3 \cdot \mathrm{Age} + B_4 \cdot \mathrm{Sex} + E \quad \text{(2)}\)
Equation 2 has six parameters (B0, B1, B2, B3, B4, and C). C is the crossover point, that is, the value of the environmental variable at which the predicted regression lines of the genotype groups intersect. If the estimate of C and its confidence interval (CI) fall within the observed range of the environmental variable, the interaction is disordinal and reflects the differential susceptibility model; if C falls outside this range, the interaction is ordinal and conforms to the diathesis-stress model [59].
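The link between the crossover point and the standard model can be made explicit by expanding Equation 2 and matching terms with Equation 1. The lowercase symbols \(b_0, \dots, b_3\) denote the first four coefficients of Equation 1; this notation is introduced here only to avoid a clash with the B's of Equation 2. The identity is a worked illustration, not an additional analysis from the study.

\(Y = (B_0 - B_1 C) + B_1 X_1 - B_2 C \, X_2 + B_2 (X_1 \times X_2) + B_3 \cdot \mathrm{Age} + B_4 \cdot \mathrm{Sex} + E\)

\(b_0 = B_0 - B_1 C, \quad b_1 = B_1, \quad b_2 = -B_2 C, \quad b_3 = B_2 \quad \Rightarrow \quad C = -\dfrac{b_2}{b_3} \quad (b_3 \neq 0)\)

In other words, when C is freely estimated, Equation 2 describes the same fitted lines as Equation 1, with the genetic main effect re-expressed as the location at which the lines cross.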
In this study, B1 is the slope of PE, and B2 is the slope of the interaction term. Two versions of the model were compared. In the first, Point C was freely estimated; if the estimated Point C fell within the range of the parental education level, the G×E interaction conformed to the differential susceptibility model. In the second, Point C was fixed at the maximum value of the environmental variable (C = Max(PE)); in this case, the G×E interaction is ordinal and conforms to the diathesis-stress model. ANOVA, the Akaike information criterion (AIC) and the Bayesian information criterion (BIC) were used to compare the two models. For the AIC and BIC, lower values indicate a better-fitting model.
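A minimal sketch of the information-criterion comparison is given below. It uses the common least-squares form of the AIC and BIC under Gaussian errors, which omits an additive constant shared by models fitted to the same data. The residual sums of squares, the sample size and the parameter counts are hypothetical values chosen for illustration.

```python
import math


def aic_bic(rss, n, k):
    """AIC and BIC of a least-squares model, up to a shared constant.

    rss: residual sum of squares, n: sample size, k: number of estimated
    parameters. Lower values indicate a better trade-off of fit and size.
    """
    if rss <= 0 or n <= 0:
        raise ValueError("rss and n must be positive")
    fit_term = n * math.log(rss / n)
    return fit_term + 2 * k, fit_term + k * math.log(n)


# Hypothetical values: the model with C estimated has one more parameter.
n = 1000
aic_free, bic_free = aic_bic(rss=950.0, n=n, k=6)    # C estimated
aic_fixed, bic_fixed = aic_bic(rss=951.0, n=n, k=5)  # C fixed at Max(PE)

preferred = "fixed C" if aic_fixed < aic_free else "free C"
print("AIC prefers:", preferred)
```

In this made-up case the extra parameter reduces the residual sum of squares only slightly, so both criteria favor the model with C fixed.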
Indirect Effects Analysis
A series of structural equation modeling (SEM) analyses were conducted using Mplus 17.0 to explore whether rapid automatized naming (RAN) mediates the effect of the cumulative genetic score on fluent reading after controlling for age and sex. All the predictor measures were allowed to correlate with each other. Indirect effects were tested by bootstrapping with 5000 resamples [60], and 95% confidence intervals (CIs) that did not contain zero indicated significant indirect effects [61]. We also report several indices of model fit [62].
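The logic of a bootstrapped indirect effect can be sketched with simple regressions in place of SEM. In the sketch below, `a` is the slope of the mediator on the predictor, `b` is the slope of the outcome on the mediator controlling for the predictor, and the indirect effect is their product. The data are simulated, the covariates age and sex are omitted for brevity, and a percentile interval is used; none of this reproduces the Mplus analysis itself.

```python
import random


def indirect_effect(x, m, y):
    """Product a*b: a = slope of M on X, b = slope of Y on M given X."""
    n = len(x)
    mx, mm, my = sum(x) / n, sum(m) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    smm = sum((mi - mm) ** 2 for mi in m)
    sxm = sum((xi - mx) * (mi - mm) for xi, mi in zip(x, m))
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    smy = sum((mi - mm) * (yi - my) for mi, yi in zip(m, y))
    a = sxm / sxx
    b = (smy * sxx - sxm * sxy) / (smm * sxx - sxm ** 2)
    return a * b


def bootstrap_ci(x, m, y, n_boot=5000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the indirect effect."""
    rng = random.Random(seed)
    n = len(x)
    estimates = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]  # resample cases
        estimates.append(indirect_effect([x[i] for i in idx],
                                         [m[i] for i in idx],
                                         [y[i] for i in idx]))
    estimates.sort()
    lower = estimates[int((alpha / 2) * n_boot)]
    upper = estimates[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


# Simulated data: X -> M -> Y with a made-up indirect effect of 0.6 * 0.5.
rng = random.Random(1)
x = [rng.gauss(0, 1) for _ in range(200)]
m = [0.6 * xi + rng.gauss(0, 1) for xi in x]
y = [0.5 * mi + 0.2 * xi + rng.gauss(0, 1) for xi, mi in zip(x, m)]

point = indirect_effect(x, m, y)
lower, upper = bootstrap_ci(x, m, y, n_boot=1000)
print("indirect effect:", round(point, 3), "95% CI:", (round(lower, 3), round(upper, 3)))
```

An interval that excludes zero is read as evidence of an indirect effect. The example uses 1000 resamples only to keep the run time short.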