2.1 Study design
The study population consisted of n = 585 participants enrolled in the LAINBIO project. Enrollment took place at the Preventive Medicine Service of the University of Padua, Italy, from October 2002 to July 2005, as previously described. All participants were informed of the purpose and procedures of the study and were asked to sign a consent form. The Ethics Panel of the School of Medicine approved the study (practice number 3843/AO/16), in accordance with the principles of the Helsinki Declaration. The eligibility criteria for participants were as follows: (1) older than eighteen years at enrollment, (2) not occupationally exposed to PAHs, (3) resident of the Veneto region at the time of enrollment, and (4) willing to sign the consent form and provide blood and urine samples. Exclusion criteria included a previous diagnosis of cancer, cardiovascular disease, or stroke within the last year, as well as other chronic conditions such as multiple sclerosis, Alzheimer’s disease, Parkinson’s disease, depression, bipolar disorder, schizophrenia, and epilepsy. Information on possible non-occupational PAH exposure (i.e., diet and indoor and outdoor exposure) as well as intake of fruit and vegetables was gathered by means of a structured questionnaire, as previously described. When subjects filled out the questionnaire, blood samples were drawn and stored at −80°C until DNA was extracted with a genomic DNA purification kit (Wizard, Promega, Italy), following the manufacturer’s instructions. DNA was used for subsequent analyses of leukocyte DNA adducts, LTL and LmtDNAcn. All participant data were anonymized after sample collection.
2.2 Estimation of PAH exposure from the questionnaire
Using a self-administered questionnaire, we collected data on environmental exposure to PAHs, focusing on the following categories.
Diet. Dietary exposure was the number of times per year that PAH-rich meals were consumed, including grilled meat or pizza baked in wood-burning ovens. In the statistical analysis, this was treated as a continuous variable.
Indoor. Indoor exposure combined several sources: presence of a coal- or wood-heater in the residence (used less than or more than 5 times per year = a score of 1 or 2, respectively), leisure activity with exposure to PAHs (work at home or hobbies involving exposure to mineral oils, soot, fumes from combustion of wood, leaves or other combustible materials, or engine exhaust = a score of 1), and exposure to passive tobacco smoke (a score of 1). Participants were categorized as having no exposure (total score of 0), or low (total score of 1), intermediate (total score of 2) or elevated (total score of 3) indoor exposure, the latter including one individual who had a score of 4. In the statistical analysis, this was treated as a continuous variable.
Home. Residential exposure was classified as urban/peripheral or country area according to the residential address of each participant, and was used in the statistical analysis as a categorical variable with two levels: urban/peripheral (score = 1) or country area (score = 0).
Traffic. Exposure to traffic-related air pollution near the residence was assessed from the response to the question “How do you estimate the traffic in the area where your home is located?”, with three possible answers: continuous heavy traffic for most of the day (2); intense intermittent traffic, e.g. only during rush hour (1); scarce or no traffic (0). Traffic was entered in the analysis as a categorical variable with two levels: score = 1 for continuous/intense traffic and score = 0 for scarce or no traffic.
Outdoor. Subjects with outdoor exposure to traffic pollution were individuals such as traffic police officers and gardeners. The variable was categorical with two levels: ≥4 hours/day (score = 1) or <4 hours/day (score = 0).
Smoking. Current smokers (including individuals who had quit smoking up to 4 weeks before participation in the study) were given a score of 1, while nonsmokers and former smokers were scored as 0.
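The scoring rules above can be sketched in code. The following is a minimal illustration only, not the coding actually used in the study: the function names, the heater category labels and the input representation are hypothetical, and the handling of a heater used exactly 5 times per year is left out because the text does not specify it.

```python
from typing import Optional

# Hypothetical labels for the heater question; the text only distinguishes
# "less than" and "more than" 5 uses per year.
HEATER_POINTS = {"none": 0, "under_5_per_year": 1, "over_5_per_year": 2}


def indoor_score(heater: str, pah_hobby: bool, passive_smoke: bool) -> int:
    """Sum of heater points (0-2), PAH-related leisure activity (0/1)
    and passive tobacco smoke (0/1)."""
    return HEATER_POINTS[heater] + int(pah_hobby) + int(passive_smoke)


def indoor_category(score: int) -> str:
    """Map the total score to the descriptive category; a score of 4 is
    grouped with 3 as 'elevated', as described in the text."""
    labels = {0: "none", 1: "low", 2: "intermediate"}
    return labels.get(score, "elevated")


def traffic_binary(answer: int) -> int:
    """Collapse the three questionnaire answers (2, 1, 0) to two levels:
    1 for continuous or intense traffic, 0 for scarce or no traffic."""
    return 1 if answer >= 1 else 0


def smoking_score(current_smoker: bool,
                  weeks_since_quitting: Optional[float] = None) -> int:
    """1 for current smokers and for those who quit up to 4 weeks before
    participation; 0 for nonsmokers and other former smokers."""
    if current_smoker:
        return 1
    if weeks_since_quitting is not None and weeks_since_quitting <= 4:
        return 1
    return 0
```

For example, a participant with a frequently used wood heater, a PAH-related hobby and passive smoke exposure would receive an indoor score of 4 and fall in the elevated category.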
2.3 Analysis of the anti-B[a]PDE–DNA adduct
The anti-B[a]PDE–DNA adduct was measured by high-performance liquid chromatography (HPLC) with fluorescence detection. The procedure was as previously described, with some minor changes, primarily concerning the automation of the HPLC assay, which reduced the batch effect (see the complete description of the anti-B[a]PDE–DNA adduct analysis in the Supplementary Material). In short, samples with non-measurable DNA adducts were assigned a value of one-half the limit of detection of the assay (LOD/2 = 0.125). Adduct levels were considered in the analyses both continuously and categorically (present or non-measurable). Individuals classified as having adducts present were those with a level of ≥0.5 adducts/10^8 nucleotides.
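The substitution and categorization rules can be illustrated with a short sketch. It assumes that a non-measurable sample is represented as None and that the limit of detection is 0.25 (implied by LOD/2 = 0.125); the function names are hypothetical.

```python
from typing import Optional

LOD = 0.25            # assumption: implied by LOD/2 = 0.125 in the text
PRESENT_CUTOFF = 0.5  # adducts/10^8 nucleotides, as stated in the text


def adduct_level(measured: Optional[float]) -> float:
    """Continuous adduct level; non-measurable samples (None) get LOD/2."""
    return LOD / 2 if measured is None else measured


def adduct_present(level: float) -> int:
    """Categorical coding: 1 if the level is at or above the cutoff, else 0."""
    return int(level >= PRESENT_CUTOFF)
```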
2.4 Leukocyte Telomere Length (LTL)
LTL was measured by quantitative real-time PCR as previously described. This assay measures LTL in genomic DNA by determining the ratio of telomere repeat copy number (T) to the copy number of a single-copy nuclear gene (S) in a given sample relative to a reference DNA sample, i.e., the so-called Telomere/Single-copy gene (T/S) ratio. The single-copy gene was human β-globin (hbg). As reference DNA, we pooled DNA from 50 subjects randomly selected from the study population (500 ng for each sample). A fresh standard curve prepared from this pool, ranging from 30 to 0.23 ng/µl (1:2 serial dilutions), was included in every “T” and “S” PCR run, together with a negative control (water). In total, 9 ng of sample DNA was used in each reaction. Each sample was analyzed in triplicate as reported in Pavanello et al. LTL was treated in the analyses either as the categorical variable tl50 (higher or lower than the median: 0 = below 0.896; 1 = equal to or above 0.896) or as a continuous variable. See the complete description of the LTL analysis in the Supplementary Materials and Methods.
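As a rough illustration of the quantities involved, the sketch below generates the 1:2 dilution series of the standard curve, forms a T/S ratio and applies the median split. It assumes that the T and S quantities have already been read off the reference standard curve; the function names are hypothetical and the interpolation step itself is not shown.

```python
def dilution_series(start: float = 30.0, factor: float = 2.0,
                    floor: float = 0.23) -> list:
    """Concentrations (ng/µl) of a serial dilution from `start` down to `floor`."""
    points = []
    conc = start
    while conc >= floor:
        points.append(conc)
        conc /= factor
    return points


def ts_ratio(t_quantity: float, s_quantity: float) -> float:
    """T/S ratio from quantities expressed relative to the reference DNA
    (assumed to come from the standard curves of the same run)."""
    return t_quantity / s_quantity


def tl50(ratio: float, median: float = 0.896) -> int:
    """Median split used in the analysis: 0 = below, 1 = equal to or above."""
    return 1 if ratio >= median else 0
```

With the default arguments the series has eight points, from 30 down to about 0.23 ng/µl, which is consistent with the range given in the text.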
2.5 Leukocyte mtDNAcn (LmtDNAcn)
LmtDNAcn was determined in the same DNA used for LTL testing by real-time quantitative PCR (qRT-PCR) as previously described. This assay measures mtDNAcn in experimental samples by determining the ratio of mitochondrial (MT) DNA copy number to the copy number of a single-copy gene (S), relative to the MT/S ratio of a reference pooled DNA sample. All samples were run in triplicate. The average of the three MT measurements was divided by the average of the three S measurements to calculate the MT/S ratio for each sample. The coefficient of variation for the MT/S ratio in samples examined on two distinct days was 6%. LmtDNAcn was treated as a continuous variable in the statistical analysis. See the complete description of the LmtDNAcn analysis in the Supplementary Materials and Methods.
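The MT/S calculation described above, together with a coefficient of variation for repeated measurements, can be written as a minimal sketch. The function names are hypothetical, and the sample standard deviation is assumed for the coefficient of variation because the text does not say which estimator was used.

```python
from statistics import mean, stdev


def mt_s_ratio(mt_replicates, s_replicates) -> float:
    """Mean of the MT replicates divided by the mean of the S replicates."""
    return mean(mt_replicates) / mean(s_replicates)


def cv_percent(values) -> float:
    """Coefficient of variation in percent (sample SD divided by the mean)."""
    return 100.0 * stdev(values) / mean(values)
```

For instance, cv_percent could be applied to the MT/S ratios of the same sample measured on two distinct days.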
2.6 GSTM1 and GSTT1
A multiplex PCR technique was applied to detect the presence or absence of the GSTM1 and GSTT1 genes, following the procedure previously described. Briefly, the same amplification mix contained both GSTM1- and GSTT1-specific primer pairs and a third primer pair for β-globin, the internal positive PCR control. The GSTT1 (480 bp), β-globin (285 bp), and GSTM1 (215 bp) amplified products were separated in a 2% agarose gel. The absence of the GSTM1- or GSTT1-specific fragment indicated the corresponding null genotype (*0/*0), whereas the β-globin-specific fragment confirmed that amplification had occurred in the reaction mix.
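The genotype call from the gel can be expressed as a small decision rule. The sketch below is illustrative only: it assumes that the observed bands have already been matched to their nominal sizes, and it treats a missing β-globin band as a failed reaction, which is an assumption about how such samples would be handled rather than something stated in the text.

```python
GSTT1_BP, BETA_GLOBIN_BP, GSTM1_BP = 480, 285, 215


def call_genotypes(bands_bp: set) -> dict:
    """Return 'present' or 'null' for GSTM1 and GSTT1 from the set of
    band sizes (bp) observed in one lane."""
    if BETA_GLOBIN_BP not in bands_bp:
        # Without the internal control, absence of a GST band is uninformative.
        raise ValueError("internal control band missing: amplification failed")
    return {
        "GSTM1": "present" if GSTM1_BP in bands_bp else "null",
        "GSTT1": "present" if GSTT1_BP in bands_bp else "null",
    }
```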
2.7 Statistical analysis
We used Spearman's rank correlation coefficient to calculate the pairwise correlations among the five variables of environmental exposure to PAHs (diet, indoor, home, traffic, outdoor), as well as age and sex. Through a mathematical model (see below: SEM), the five exposure variables were aggregated into the latent variable “PAH”, which summarizes the overall underlying exposure in a single quantity and makes the data easier to interpret and handle.
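For readers unfamiliar with the statistic, Spearman's coefficient is the Pearson correlation computed on ranks, with tied values sharing the average of their ranks. The following is a minimal standard-library sketch of that definition, not the routine used in the analysis; the function names are hypothetical.

```python
import math
from statistics import mean


def average_ranks(values):
    """Ranks starting at 1; tied values receive the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman(x, y):
    """Spearman's rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))
```

Tie handling matters here because several of the exposure variables are binary scores and therefore contain many tied values.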
2.7.1 Analytic strategy
We used a conceptual framework describing the hierarchical relationships between risk factors, based on knowledge of the relevant literature and temporal considerations. As shown in Figure 1, the latent variable “exposure to PAH” derived from the self-administered questionnaire was considered the distal determinant, acting through the proximate determinant “anti-B[a]PDE–DNA” (intermediate variable or mechanism) to affect the final outcomes “tl50” (LTL median) or, alternatively, LmtDNAcn. Although it is uncommon, the distinction between proximate and distal determinants is important because, in an approach based entirely on statistical associations, distal factors are often improperly adjusted for proximate factors, with a consequent reduction or elimination of the effects of the former.
All the above assumptions were translated into two structural equation models (SEM), one for each final outcome (either tl50 or LmtDNAcn). The STATA command syntax for each model was:
- sem (PAH -> diet indoor outdoor home traffic) (anti-B[a]PDE–DNA <- PAH sex smoking gstm1) (tl50 <- anti-B[a]PDE–DNA age sex smoking), stand vce(oim)
- sem (PAH -> diet indoor home traffic outdoor) (anti-B[a]PDE–DNA <- PAH sex smoking gstm1) (LmtDNAcn <- anti-B[a]PDE–DNA sex), stand vce(oim)
The SEM command syntax treats a variable as latent if the first letter of its name is capitalized; the variable “PAH” is capitalized because it is our latent variable. In the first, second and third sets of parentheses we specified, respectively, the measurement model for the latent variable “PAH”, the model for the mediator variable “anti-B[a]PDE–DNA” and the model for the final outcome (either tl50 or LmtDNAcn). Note that “anti-B[a]PDE–DNA” is a dependent variable in the second set and an explanatory variable in the third set of parentheses.
Furthermore, the correlation plot between the anti-B[a]PDE–DNA tetrol biomarker and the self-reported PAH proxy was obtained using appropriate STATA commands. We generated the numerical values of the latent variable “PAH” from each of the two SEM models: PAH1 for the outcome tl50 and PAH2 for the outcome LmtDNAcn. The individual values of both PAH1 and PAH2 were plotted against the logarithm of tetrol (anti-B[a]PDE–DNA).
The latent variable PAH is estimated by the SEM program, not observed. As stated in the STATA 14 documentation for SEM analysis, “a variable is latent if it is not in your dataset but you wish it were. You wish you had a variable recording the propensity to commit violent crime, or socioeconomic status, or happiness, or true ability, or even income. Sometimes, latent variables are imagined variants of real variables, variables that are somehow better, such as being measured without error. At the other end of the spectrum are latent variables that are not even conceptually measurable”.
All the predictors shown in Figure 1 were used in preliminary analyses (data not shown), but only the statistically significant terms were included in the final SEM models. In the STATA commands, “stand” specifies that the effects are expressed as standardized (or beta) coefficients, which make comparisons easier by removing the dependence on each independent variable's scale of units, while “vce(oim)” specifies how the standard errors are calculated. “VCE” stands for variance–covariance matrix of the estimators, and “oim” stands for observed information matrix (OIM). The OIM estimator of the VCE is the default and is based on asymptotic maximum-likelihood theory. The VCE obtained in this way is valid if the errors are independent and identically normally distributed, although the estimated VCE is known to be reasonably robust to violations of the normality assumption.
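The unit-free property of standardized coefficients can be seen in the simplest case of one predictor, where the standardized coefficient is the raw slope multiplied by the ratio of the standard deviations of predictor and outcome. The sketch below illustrates this textbook relationship for ordinary least squares only; it is not the SEM estimator, and the function names are hypothetical.

```python
from statistics import mean, stdev


def ols_slope(x, y) -> float:
    """Raw (unstandardized) slope of the least-squares line of y on x."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def standardized_beta(x, y) -> float:
    """Slope in standard-deviation units: b * sd(x) / sd(y)."""
    return ols_slope(x, y) * stdev(x) / stdev(y)
```

Rescaling the predictor (for example, expressing it in different units) changes the raw slope but leaves the standardized coefficient unchanged, which is why standardized effects of variables on different scales can be compared directly.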
We used three SEM goodness-of-fit statistics: (1) the chi square test for “model versus saturated” (the saturated model is the model that fits the covariances perfectly), (2) the standardized root mean squared residual (SRMR), and (3) the coefficient of determination (CD).
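To make the second statistic concrete, the sketch below computes the SRMR using one common covariance-only definition: the square root of the mean squared difference between observed and model-implied covariances, each scaled by the observed standard deviations, taken over the lower triangle including the diagonal. This is an assumption for illustration; the computation performed by the statistical package may differ in detail (for example, by also including means). The function name and the matrices in the usage note are hypothetical.

```python
import math


def srmr(observed, implied) -> float:
    """SRMR from two p x p covariance matrices given as nested lists."""
    p = len(observed)
    total = 0.0
    for i in range(p):
        for j in range(i + 1):  # lower triangle, diagonal included
            scale = math.sqrt(observed[i][i] * observed[j][j])
            total += ((observed[i][j] - implied[i][j]) / scale) ** 2
    return math.sqrt(total / (p * (p + 1) / 2))
```

A model that reproduces the observed covariances exactly gives an SRMR of 0, and smaller values indicate a closer fit.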
SEM results were both tabulated and presented graphically.
The sample size required for SEM depends on model complexity, the estimation method used, and the distributional characteristics of the observed variables. The best option is to consider the model complexity (i.e., the number of exogenous variables) and the following rules of thumb for the ratio of subjects to variables: a minimum ratio of 5:1, a recommended ratio of 10:1, or a recommended ratio of 15:1 for non-normally distributed data. With ten exogenous variables used in the SEM model, we should have a minimum of 150 (= 15 × 10) subjects; in total we reached 585 (537 with complete data), thus fulfilling these requirements. The analysis was conducted with the statistical package STATA 14.
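The rule-of-thumb arithmetic is simple enough to write out. The sketch below uses only the figures given in the text; the function and dictionary names are hypothetical.

```python
# Subjects-per-variable ratios from the rules of thumb quoted in the text.
RATIOS = {"minimum": 5, "recommended": 10, "non_normal": 15}


def required_sample_size(n_exogenous: int, ratio: int) -> int:
    """Minimum number of subjects for a given subjects-to-variables ratio."""
    return ratio * n_exogenous


needed = required_sample_size(10, RATIOS["non_normal"])  # 15 x 10 = 150
complete_cases = 537
print(needed, complete_cases >= needed)  # 150 True
```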