This report presents the most comprehensive interrogation of microRNAs in exhaled breath, here uniquely performed to distinguish subjects with and without primary lung cancers [33]. Starting with a lung tissue microRNA-seq discovery effort combined with microRNAs suggested by the published literature, we interrogated a panel of 25 microRNAs in exhaled breath condensate (EBC) using our RNA-specific qualitative RT-PCR. We found that: (i) microRNAs are detectable in exhaled breath condensate; (ii) individual exhaled microRNAs (microRNAs 21, 33b, 212) offer case-control discrimination by logistic regression; and (iii) random forest (RF) models using the entire microRNA panel suggest some modest additional case-control discrimination, particularly in the subsets of former smokers and early-stage subjects, over and above that demonstrated by comprehensive clinical models.
Technical challenges abound in examining nucleic acids in EBC. Although EBC can be collected non-invasively, it contains only trace levels of microRNA template. This is perhaps because the templates are, by definition, higher in molecular weight (22 nucleotides in length) than is typical of exhaled airstream-suspended molecules such as H2O2, 8-isoprostane, and others [22]. Nonetheless, PCR confers the capacity to detect microRNAs at low template copy number, as is suggested here. The trace concentrations inherent to EBC specimens for most analytes, including nucleic acids, have to date precluded performing discovery efforts, such as microRNA next-generation sequencing, directly from this matrix.
The choice of microRNA interrogation panel was therefore based on: (i) a previously unpublished microRNA-seq effort (GEO#: GSE33858), an inter-tissue comparison of 32 resected bronchogenic lung carcinomas versus remote lung tissue (stratified for adenocarcinoma and squamous cell carcinoma histologies), with 10 representative overexpressed microRNAs included from each of those two histologies. The remainder came from (ii) TCGA [15, 16] and (iii) several literature-identified microRNA markers of lung cancer.
We used our previously published microRNA-PCR, which is micro/mRNA-specific: it excludes false priming from gDNA fragments by employing a uniquely tagged RT-primer strategy [23, 25], and its primer design precludes false amplification of messenger RNA fragments. We chose to treat the data as qualitative (individual miR, present/absent) because we were not confident that quantitative RT-PCR data could be reliably scaled to a robustly quantifiable internal housekeeper at these trace levels. Running the fluorescent intercalating dye (SYBR®) detection strategy coupled to URT-PCR on the realtime PCR platform allowed quality assurance using the quantitation curve, melt curve, and melt temperature for each PCR reaction. This was superimposed on a series of other analyses performed during primer design, using multiple positive and negative controls, described in the Methods and Additional Studies sections. We additionally piloted a commercial qPCR platform (Invitrogen/TaqMan®), which showed no readily apparent gain in precision or microRNA-PCR sensitivity.
This cross-sectional case-control design was chosen as representing a typical initial step in the early development of potential risk biomarkers [34, 35]. Clinical-demographic differences between cases and controls were observed for age, smoking, pack-years, quit-years, a composite index of pack-years minus quit-years, and underlying lung disease (COPD, inflammation/fibrosis, asthma, sarcoidosis, bronchiectasis). However, these differences were modelled identically in the clinical-only models and in the combined clinical + microRNA models, so they should not have biased the incremental microRNA-attributable risk prediction. We predominantly enrolled current and former smokers, as they are at elevated risk for lung cancer and therefore commonly come to clinical attention for surveillance and biopsy/resection; they were thus considered appropriately efficient for enrollment in this initial study. Our case and control ascertainment was crisp, minimizing misclassification: all subjects were confirmed histologically by virtue of their bronchoscopic/surgical procedures, and case and control status was further verified over an additional 3–6 month period of clinical follow-up, facilitated by electronically retrieved clinical assessments from the engaged clinical pathologists, radiologists, surgeons, and pulmonologists on each subject. Recruited subjects with disputed case-control ascertainment (< 1% of enrolled) were excluded from the study.
In this moderate-size case-control subject set, with an already selected candidate 24-microRNA panel, we first performed logistic regression with case-control status as the main outcome variable, testing a clinical model with and without each individual miR on the panel. Separately, rather than using separate discovery and validation sets, we employed iterative cross-validation by random forests (RF) to assure the stability of our results. The RF approach iteratively and randomly splits the data, substantively cross-validating in random fashion and minimizing over-fit.
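As an illustration of the random splitting described above, the following is a minimal sketch of the bootstrap resampling that standard random forest implementations use: each tree is fit to subjects drawn with replacement, and the subjects not drawn ("out-of-bag") act as an internal hold-out set for that tree. The function names, subject counts, and seed are hypothetical choices for illustration; the exact RF configuration used in this study is not restated here, and no tree fitting is performed.

```python
import random


def bootstrap_split(n_subjects, rng):
    """Draw one bootstrap sample of subject indices (with replacement).

    Returns (in_bag, out_of_bag): in_bag may contain repeats; out_of_bag
    lists every subject index that was never drawn for this tree.
    """
    in_bag = [rng.randrange(n_subjects) for _ in range(n_subjects)]
    drawn = set(in_bag)
    out_of_bag = [i for i in range(n_subjects) if i not in drawn]
    return in_bag, out_of_bag


def mean_oob_fraction(n_subjects, n_trees, seed=0):
    """Average fraction of subjects held out per tree across many resamples."""
    rng = random.Random(seed)  # fixed seed only so the sketch is repeatable
    total = 0.0
    for _ in range(n_trees):
        _, out_of_bag = bootstrap_split(n_subjects, rng)
        total += len(out_of_bag) / n_subjects
    return total / n_trees
```

In expectation, roughly a third of subjects (about 1/e) are held out from any given tree, so every subject is repeatedly scored by trees that never saw it during fitting. This is the sense in which the procedure cross-validates internally without a separate validation set.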
The incremental differences between the clinical and clinical + microRNA models are admittedly modest (~0.0–3.0%). We surmise that this is due, in part, to the strength of the clinical model alone, which displayed ROC AUCs of ~0.75–0.80. We believe these clinical models were unusually robust for two reasons. The first is the comprehensiveness of the clinical model, which included all major known substantive risk factors for lung cancer (including quit-years, underlying lung disease, and others). The second is the positive selection inherent in enrolling clinical bronchoscopy and surgical subjects such as these (above): both cases and controls are drawn from the same procedure-destined base population, which is itself selected on clinical criteria to be at high risk for lung cancer. By definition, the clinician perceives that risk as sufficient to warrant an invasive diagnostic/therapeutic procedure, which was the enrollment point for a majority of our subjects. Both factors (clinical model comprehensiveness and clinical series enrollment bias) contribute to high risk in this clinical series and imply that clinical risk model performance will be elevated. Thus, the difference between the comprehensive clinical model alone and the clinical model plus microRNA could be artificially narrowed (compared with that obtained using conventional sparse clinical models) by virtue of the comprehensiveness of the clinical model. We believe the negative impact of such bias on the estimate of the actual contribution of exhaled microRNAs to case-control discrimination is counterbalanced by the strength inherent in using the same (robust) clinical model when comparing clinical-only models with combined clinical + microRNA models. The definitive diagnoses inherent in recruiting subjects destined for lung sampling and pathologic readout were an additional strength.
Overall, then, the above considerations suggest ours is a conservative estimate of the exhaled biomarker contribution in real clinical conditions.
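To make the clinical versus clinical + microRNA comparison concrete, the sketch below computes the area under the ROC curve as the probability that a randomly chosen case receives a higher model score than a randomly chosen control (ties counted as one half), and then takes the difference between two models scored on the same subjects. The function names are hypothetical, and the scores are invented toy values, not data from this study.

```python
def auc(case_scores, control_scores):
    """ROC AUC via pairwise comparison of every case with every control."""
    wins = 0.0
    for case in case_scores:
        for control in control_scores:
            if case > control:
                wins += 1.0
            elif case == control:
                wins += 0.5  # ties contribute half a win
    return wins / (len(case_scores) * len(control_scores))


def incremental_auc(clin_cases, clin_controls, comb_cases, comb_controls):
    """AUC of the combined model minus AUC of the clinical-only model."""
    return auc(comb_cases, comb_controls) - auc(clin_cases, clin_controls)


# Invented toy scores (NOT study data): predicted risk for 4 cases, 4 controls.
CLIN_CASES = [0.80, 0.60, 0.55, 0.40]
CLIN_CONTROLS = [0.70, 0.50, 0.30, 0.20]
COMB_CASES = [0.85, 0.65, 0.55, 0.52]  # one case is re-ranked above a control
COMB_CONTROLS = [0.70, 0.50, 0.30, 0.20]

TOY_INCREMENT = incremental_auc(CLIN_CASES, CLIN_CONTROLS,
                                COMB_CASES, COMB_CONTROLS)
```

With these toy values the clinical-only AUC is 0.75 and the combined AUC is 0.8125. The sketch also shows why a strong baseline model leaves little room for improvement: the combined model can only gain AUC by correcting case-control pairs that the clinical model ranks incorrectly, and a comprehensive clinical model leaves few such pairs.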
Among study limitations, for technical reasons we were forced to use a dichotomous (present/absent) signal for a given microRNA in a given EBC sample, even though the assays were run on a realtime instrument. The realtime CT values obtained on the chosen platform were not robust enough to generate reliable quantitative data; this is worth re-addressing in optimization studies, which are ongoing.
Additionally, the discriminant microRNA signal may in fact be small in magnitude, as our data suggest. A similarly small magnitude of microRNA change in the "field" of bronchial epithelium itself has been suggested by a comprehensive RNA-seq study of bronchial brushings in a similar case-control setting [14]. Notably, of the four case-control discriminant bronchial epithelial microRNAs in that report, only one (146-5p) was interrogated in our study. While 146-5p was not individually case-control discriminant in LR models, it was contributory in the RF models for former smokers and early-stage subjects. A very recent pilot report suggests that EBC miRNAs might allow the identification, stratification, and monitoring of lung cancer [31].
We set out to survey the "state of the epithelium" rather than to detect a small peripheral tumor itself. This broad interrogation of the epithelial "field" is appropriate to risk assessment, rather than to a diagnostic tool for a suspect tumor. That the signal likely came from the field of normal cell material, rather than from spillage by a tumor, is supported by the observation that the early-stage subset showed more case-control discrimination than the late-stage cases, which would not be expected if the tumor itself were spilling microRNA material.