High-Throughput Screening of Alternative Micro-Metastasis-Specic Gene Predictors of Circulating Osteosarcoma Cells

Current techniques to identify circulating-tumor cells (CTCs) in osteosarcoma (OS), which are an indication of a poor prognosis in cases of intermediate levels of metastasis, are complicated and time-consuming. This study investigated the ecacy of quantitative reverse transcription PCR (qRT-PCR), a molecular technique that is available in most laboratories, for detection of CTCs in buffy coat samples of OS patients and healthy donors. Previously published reports on data-reviewing and retrieval of data by calculation of differential gene expression from the Gene Expression Omnibus (GEO) database repository were reviewed identify candidate genes. Following analysis of the expression of the candidate genes identied a diagnostic model for detection of specic gene expression was derived using binary logistic regression with a multivariable fractional polynomial (MFP) algorithm. the candidate genes was demonstrated in the buffy coat of 73 OS patients and 79 healthy donors. The analytical model in this study used VIM (vimentin-encoding gene), ezrin, COL1A2, and PLS3 for OS diagnosis and for metastasis prediction markers using a simple CTC detection method which provides both high ecacy and reliability. To identify OS-specic genes which express in blood circulating cells, publicly available microarray gene expression datasets from GEO of OS cell lines (n = 29), primary OS cells (n = 3) and healthy whole blood samples (n = 36) representing OS circulating cells and blood cells in buffy coat samples were selected for the bioinformatic analysis. Datasets of samples which had been administered any agent vectors were excluded. Quantile normalization background adjustment and summarization were calculated using a robust multi-array average (RMA) algorithm and a custom brainarray chip description le (CDF, ENTREZG, V19). As described earlier, probe sets of genes and adjusted P values were calculated using a limma package availably in R to compare gene expression. novel genes


Background
Osteosarcoma (OS), although relatively rare, is the most common primary malignancy of bone, with a worldwide incidence of 3.4 per million people per year [1]. OS is found to occur predominantly in the second decade of life and in the [1,2]. Forty percent of OS patients whose tumor is found to have spread from the primary site to secondary sites during or after diagnosis have a poor response to treatment and poor recovery even when combination therapies are employed [3].
For metastatic OS, magnetic resonance imaging (MRI), computed tomography (CT) scans and positron emission tomography (PET) scans are the standard methods for bone metastasis lesion diagnosis and follow-up. PET-CT is more sensitive for bone metastasis detection than scintigraphy which is currently the standard method [4,5]. Even so, pulmonary nodules smaller than 5-9 mm are still undetectable by PET/PET-CT [1,6,7]. The sensitivity for evaluation of bone metastasis is increased when scintigraphy is combined with PET-CT [5].
Liquid biopsy is an alternative technique for predicting metastasis which represents a promising approach for diagnostic, prognostic, and personalized therapeutic purposes. Among liquid biopsy biomarkers, circulating tumor cells (CTCs) represent a promising avenue for identifying cancer metastasis. The challenge for CTC detection is the low incidence in circulating blood, about 1-10 cells per 10 6 white blood cells [8]. There are several clinically signi cant CTC separation techniques which not only enrich the CTCs but are also able to purify and identify CTCs, e.g., immunomagnetic enrichment and micro uidic immunocapture. Owing to their novelty and the fact that the methods are complicated, these techniques are not widely used [9]. Cancer-speci c mRNA analysis is one promising approach for tracing cancer cells in blood, but speci c mRNA markers for OS are not widely established [10,11]. Candidate gene tumor markers have generally been obtained from and identi ed in the by-products of previous studies, thus they might not represent robust markers for some cancers. Comparative expression analysis using information from biodata resources is a new pregenital approach for identi cation tumor speci c makers. Among sources of bioinformatic data, the Gene Expression Omnibus (GEO) has been widely adopted for identifying tumor-speci c genes [12,13].
To predict the micro-metastasis of OS, we proposed a simple and inexpensive method to identify novel OS-speci c genes using comparative gene expression analyses from a gene expression database and to determine the candidate gene expression in OS cell lines by quantitative reverse transcription PCR (qRT-PCR). Expression of the candidate genes was demonstrated in the buffy coat of 73 OS patients and 79 healthy donors. The analytical model in this study used VIM (vimentin-encoding gene), ezrin, COL1A2, and PLS3 for OS diagnosis and for metastasis prediction markers using a simple CTC detection method which provides both high e cacy and reliability.

Patients
Ethylenediamine tetraacetic acid (EDTA) whole blood and tumor tissue samples had were retrospectively collected from 62 stage IIB and 11 stage III patients obtained during diagnostic procedures conducted between 2012 and 2020 at Maharaj Nakorn Chiang Mai Hospital. Residual anonymous EDTA buffy coat (500 µl) was also drawn from 79 healthy individuals during donor screening procedures and the residual buffy coat (120 ml) was obtained during blood component preparation by the Blood Bank Section of Maharaj Nakorn Chiang Mai Hospital. All blood and tissue samples were collected after receipt of approval by the Research Ethics Committee Faculty of Medicine Chiang Mai University (ORT-2557-02717 and ORT-2562-06549). The overview of the study and the work ow of methods is shown in Fig. 1.
Accession codes are given in Table S3. The gene expression analysis to identify candidate genes compared 1) OS cell lines with healthy whole blood samples and 2) primary OS cells with healthy whole blood samples.
The robust multi-array average (RMA) algorithm through a custom brainarray chip description le (CDF, ENTREZG, V19) was used to calculate the quantile normalization background adjustment and summarized as previously described [12]. For investigation of differential gene expression, P values were calculated with Linear Models for Microarray (limma) data in R. P values with a log 2 Expression Ratio (ER) greater than 2 was a set as the cut-off for initial selection of candidate genes [14].
Sample preparation.

Healthy peripheral blood mononuclear cells (PBMCs):
Buffy coat samples from the blood component separation process were diluted with phosphate buffered saline (1:1). Gradient centrifugation using Lymphoprep™ (STEMCELL Technologies, Canada) was employed for PBMCs isolation. The PBMCs were collected and counted under a light microscope with a hemocytometer.
Buffy coat: EDTA whole blood specimens were centrifuged at 1,600g for 15 minutes, the buffy coat layer between packed red blood cells and plasma was collected and stored at -80 o C. The cryopreserved buffy coat samples from OS patients and healthy donors were lysed with lysis buffer RA1, (Macherey Nagel, Düren, Germany).

Cell lines and primary cells
Human OS cell lines HOS-MNNG and U2OS were obtained from the Cell Line Service (Eppelheim, Baden-Württemberg, Germany). HOS-143B, MG63 and Saos-2 were obtained from the American Type Culture Collection (ATCC, Manassas, VA). Primary OS cells were generated and extracted following previously described protocol [15,16]. The MNNG-HOS cells were grown in the Roswell Park Memorial Institute (RPMI). The MG63, and primary cells were grown in Dulbecco's modi ed Eagle's medium (DMEM). U2OS and Saos-2 cells were cultured in Dulbecco's Modi ed Eagle Medium/Nutrient Mixture F-12 medium (DMEM-F12). HOS-143B was cultured in DMEM-Brdu. All cell lines and primary cells were cultured in 10% (v/v) fetal bovine serum and maintained in a humidi ed atmosphere of 37°C with 5% CO 2 [15].

Spiking assay
Spiking assay was conducted to select potential genes which can distinguish candidate gene expression between the spiked samples and non-spiked samples in a minimum number of OS cells. The spiking assay was performed by spiking various numbers of each cell line (0-10 4 cells) into normal PBMCs (1.25x10 5 -2x10 6 cells) and assessing candidate gene expression by qRT-PCR.
qRT-PCR Total RNA was extracted with an illustra RNAspin Mini Kit (GE Healthcare Europe GmbH, Freiburg, Germany) and 20 µg of cDNAs were generated by iScriptTM (Bio-Rad, Hercules, CA, USA). The PCR reactions were performed with an Applied Biosystems 7500/7500 Fast Real-Time PCR system using SensiFAST™ SYBR® Lo-ROX (Bio-Rad, Hercules, CA, USA) for 45 cycles. Each cycle was performed as follows: 5 seconds at 95°C, 10 seconds at 60°C and then 30 seconds at 72°C. The RNA levels of CD45 and candidate genes (COL1A2, GJA1, PLS3, COL5A2, COL3A1, CDR1, COL2A1, EGR1, Ezrin and VIM) from both bioinformatic analysis and previous publications were normalized with beta-actin (ACTB) as a housekeeping gene using the 2 (-ΔΔC (T)) method. Primer sequences are listed in Table S1.

Statistical analysis
Statistical analysis of candidate genes relative to expression was performed using SPSS 22.0 (IBM Corp., Armonk, NY, USA), Stata 16 (StataCorp, College station, Texas, USA), and Prism 8.4.3 (GraphPad, La Jolla, CA, USA). The data are shown as mean ± standard deviation (SD). The signi cance of difference between means of OS and healthy donors was determined using the Mann-Whitney U-Test for ordinal or continuous data which is not normally distributed. P values less than 0.05 were considered statistically signi cant. Candidate genes were selected by a retrospective study. Non-parametric regression and multivariable modeling were constructed with both OS patients and healthy donors using fractional polynomials. We explored the shape of the association between relative gene expression and log odds of osteosarcoma using locally weighted scatter plot smoothing (LOWESS) and fractional polynomial plots. A diagnostic model for prediction of OS and OS metastasis was derived using binary logistic regression with a multivariable fractional polynomial (MFP) algorithm for tting of continuous determinants based on the actual shape of their association with the predicted endpoints [17,18]. The p-value cut-off was set at 0.2 to exclude gene expression with non-signi cant contribution from the equation model. Model discriminative ability was measured as the area under the receiver operating characteristic curve (ROC). Predicted probabilities of OS and OS metastasis were calculated using the model. Cut-off points for diagnosis of OS and OS metastasis were established based on sensitivity and speci city.

Results
Identi cation of circulating osteosarcoma cell speci c candidate genes using bioinformatic data analysis.
To identify OS-speci c genes which express in blood circulating cells, publicly available microarray gene expression datasets from GEO of OS cell lines (n = 29), primary OS cells (n = 3) and healthy whole blood samples (n = 36) representing OS circulating cells and blood cells in buffy coat samples were selected for the bioinformatic analysis. Datasets of samples which had been administered any agent vectors were excluded. Quantile normalization background adjustment and summarization were calculated using a robust multi-array average (RMA) algorithm and a custom brainarray chip description le (CDF, ENTREZG, V19). As described earlier, probe sets of genes and adjusted P values were calculated using a limma package availably in R to compare gene expression.
From the GEO data, there were 20,188 and 20,186 genes which had been reported in OS cell lines and primary OS cells, respectively. After calculation, we found signi cant up-regulation of 1,426 and 1,899 genes in OS cell lines and primary OS cells, respectively (p value < 0.001) when compared to the healthy whole blood cells with log 2 expression ratio (ER) > 2 ( Fig. 2A and 2B, respectively). These sets of genes were considered to be upregulating genes. Among the upregulating genes, only the eight genes, COL1A2, GJA1, PLS3, COL5A2, COL3A1, CDR1, COL2A1, and EGR1, which presented a 500-fold change of expression in OS cell lines or primary OS cells compared to healthy whole blood samples were considered as novel OS-speci c genes (Fig. S1).

Candidate gene selection
To select potential candidate genes which could distinguish the difference between the spiked sample and non-spiked sample in a minimum number of OS cells, the OS cells were spiked into normal PBMCs. The qRT-PCR was performed to evaluate the expression of the novel OS-speci c genes in all spiked samples. To determine the cycle threshold (Ct) value of each candidate (Y axis) and CD45, a common leukocyte antigen, (X axis) were converted to log 2 -Ct and plotted on quanti cation curves. The genes which were able to distinguish between samples with and without OS were designated as potential candidate genes.
The 50 curve patterns were categorized into three groups: 1) completely separated, 2) partially separated, and 3) non-separated patterns. The genes which exhibited a completely or partially separated pattern were considered candidate genes. The genes which exhibited a non-separation pattern were considered to be unacceptable genes. As shown in Table S2, the candidate genes were EGR1, PLS3, VIM, COL1A2, COL3A1, COL5A2 and Ezrin and the unacceptable genes were COL2A1, CDR1 and GJA1.
Evaluation of the OS diagnostic potential of the genes The samples from OS patients (n = 73) and healthy donors (n = 79) (clinical characteristics shown in Table 1) were evaluated for expression of the candidate genes using qRT-PCR; the Ct values of each of the candidate genes were normalized with CD45 gene expression. The analysis demonstrated that the relative expression of EGR1 (P = 0.0010), PLS3 (P = 0.0038), and VIM (P < 0.0001) in OS patients was signi cantly higher than in healthy donors (Fig. 3A). Even though COL1A2, COL3A1, COL5A2 and Ezrin showed no signi cant difference between OS patients and healthy donors overall, these four genes did present a stronger difference of expression in patients and donors who were younger than 25 years old with borderline statistical signi cance (P values of COL3A1, COL5A2 and Ezrin were 0.0926, 0.0669 and 0.0628, respectively) with the exception of COL1A2 (P = 0.1707) (Fig. 3A-B).  The association between the relative expression of each candidate gene and the log odds of OS was nonlinear (Fig. S7). In the MFP algorithm, COL1A2, PLS3, Ezrin, and VIM, which showed signi cant contribution to the model, were included in the diagnostic model, while COL3A1, COL5A2 and EGR1 were excluded as their P values were less than 0.2 (data not shown). Following to the diagnostic model, the probability of OS was calculated as follows: where lp is the linear predictor yielded from the formula: Each candidate gene term is referenced in Fig. 2.
The ROC curve analysis was performed on the expression of COL1A2, PLS3, Ezrin and VIM to examine the diagnostic performance of the model for identifying OS in samples from healthy donors. All OS samples were positive with two false positives in the healthy donor samples at the probability cutoff of 0.2943. The sensitivity was 100% (95%CI 94.8, 100.0) and the speci city was 96.49% (95%CI 87.9, 99.6), with an area under the ROC curve of 0.9896 (95%CI: 0.9695, 1.0000) ( Table 3).  Table 3).
Expression of candidate genes from bioinformatic data and previous studies in OS cell lines and primary cells compared to heathy donor PBMCs.
In the bioinformatic analysis, the obtained gene expression data was from OS cell lines and primary OS cells which did not originate directly from circulating OS cells. To further explore whether the four candidate genes (COL1A2, PLS3, Ezrin and VIM) were highly speci c to clinical OS tumor origin, we analyzed mRNA expression of candidate genes in primary osteosarcomas from OS patients (n = 24) and compared them to PBMC from healthy donors (n = 3) by qRT-PCR. The results showed that the expression of COL1A2, PLS3 and VIM in the primary cells was signi cantly higher than in normal PBMCs while Ezrin expression was non-signi cantly different (P < 0.05) (Fig. S8).

Discussion
The CTCs are invasive circulating metastatic cells migrating from primary or metastatic sites of tumors. These cells, which survive in circulating blood, function as metastasis precursors and may also colonize, forming secondary tumors and causing refractory and recurrent cancer [19][20][21]. Thus, any molecular predictors which demonstrate a high speci city to OS and a strong correlation to metastasis might be applied as CTC detection tools.
In this study, the differences in gene expression between OS cell lines or primary OS cells vs healthy donor cells identi ed signi cant upregulation of 8 novel OS-speci c genes (COL1A2, GJA1, PLS3, COL5A2, COL3A1, CDR1, COL2A1, and EGR1). Most of them, including COL1A2, GJA1, COL5A2, EGR1 and COL3A1, have been previously reported as upregulated genes or the translated proteins of those genes associated with OS progression in OS tissues when compared to normal tissues [22][23][24][25][26]. In addition, high expression of COL2A1, CDR1 and PLS3 has also been found to be related to tumor progression in several types of tumors [27][28][29][30][31].
The previously reported OS markers Ezrin and VIM were also analyzed for gene expression in retrospective samples. Ezrin, a cross-linker protein, plays an essential role in many metastatic phenotypes of cancers including pediatric sarcomas, OS and rhabdomyosarcoma [32]. The expression level of Ezrin was high in OS circulating cells, especially in OS metastatic stage III in the Enneking staging system [11]. Vimentin, a mesenchymal marker, has recently been reported to be an indicator of epithelial-tomesenchymal transition (EMT) associated with migration and metastasis in various cancers as well [33,34]. Vimentin has also been reported to be highly expressed in human OS tumor tissue [35].
The CTC enrichment process is necessary to discard blood components which might [36]. There are several techniques to enrich CTCs from fresh whole blood including Ficoll-Hypaque density gradient centrifugation, lter-based methods, magnetic bead based CD45 negative and vimentin positive selection [10,11,37] which need fresh whole blood or an a nity column for isolating CTCs. The samples for gene expression analysis in our study were enriched by buffy coat preparation. The total RNA was extracted from the − 80 o C frozen buffy coats without preservatives since the frozen cells would otherwise be lysed by ice crystallization after thawing leading to injury [38]. To avoid losing CTC-total RNAs and to reduce the number of enrichment steps, total RNA from all buffy coats was extracted immediately after thawing. Other genetic components which are not released by CTCs, especially leukocyte RNAs, were main interfering factors which were not able to be discarded in this study. Previous publications have demonstrated that leucocyte common antigen (CD45) expression level is related to a number of leukocytes [39,40]. Accordingly, the gene expression level of each OS cell line spiked PBMC sample was presented as a relative quanti cation curve between OS-speci c gene and CD45.
We expected a correlative relationship between the expression of OS-speci c genes and the number of OS cells, i.e., an expression pattern of the relative quanti cation curve of the speci c OS genes (n = 8) and the two evaluated metastatic genes. The results demonstrated that only seven genes (COL1A2, COL3A1, COL5A2, EGR1, PLS3, Ezrin and VIM) out of the 10 genes tested showed a completely or partially separated pattern for at least one of the 5 OS cell lines (Table S1 and Fig. S2-6). CDR1, COL2A1 and GJA1 were not highly expressed among the ve OS cells (SaOS-2, MNNG, MG63, U2OS and 143B) resulting in interference when spiked OS cell lines had high numbers of PBMCs, causing the level of expression of those genes to be an unreliable indicator.
Measurement of candidate genes expression was also performed in clinical samples, including OS and healthy buffy coats. The expression of EGR1, PLS3 and VIM showed statistically signi cant differences between the two groups (P < 0.05) but not COL1A2, COL3A1, Ezrin and COL5A2 (Fig. 3A). Not surprisingly, OS exhibits a high heterogeneity and complexity of genomic and expression level between patients [41].
We further narrowed the samples to patients and donors age younger than 25, then reanalyzed the expression of seven candidate genes. The results indicated that for COL1A2, COL3A1, Ezrin and COL5A2, all gene expression differences between OS patients and donors were noticeably increased while the differences for COL3A1, Ezrin and COL5A2 bordered on statistical signi cance (P < 0.1).
The e ciency of each candidate gene in the prediction of OS was evaluated with binary logistic regression with the MFP algorithm using relative expression data. The model that included VIM, Ezrin, COL1A2 and PLS3 performed the best in terms of ability to discriminate OS samples from healthy donor samples. We identi ed the OS probability cut-off point at 0.2943. In further clinical studies, patients with a higher possibility of OS than the cut-off point will require con rmation of results with standard diagnostic tests. Patients who present with an OS probability higher than 0.8243 are suspect for metastasis occurrence; in those cases, follow-up with high sensitivity micro-metastasis tests such as bone scans or PET scans is appropriate. In this study, the model identi ed all OS patients as positive; there were two false positive samples from normal buffy coats at a probability cut-off value of 0.2943. The same model also exhibited the ability to predict OS metastasis at a probability cut-off value of 0.8243 with all positive results in metastatic OS (III) samples and some false positive results in nonmetastatic OS and normal samples. We suggest that patients with positive results should be informed and that metastasis should be con rmed using other clinical tools.
The origin of OS is known to be mesenchymal stem cells. Most OS is malignancy developing from osteoblast cells with genetic and epigenetic mutation accumulation [42]. Due to the limited number of samples, it was not possible to isolate circulating OS cells from frozen buffy coat. To con rm whether candidate genes were speci c to OS cells, the expression of COL1A2, EGR1, PLS3, Ezrin and VIM was evaluated in primary OS cells with qRT-PCR and compared to PBMCs from healthy donors. Among the four candidate genes, the expression of COL1A2, PLS3 and VIM in OS cells was signi cantly higher than PBMC (P < 0.05) but not Ezrin [ Fig. S8].
In agreement with previous studies, our qRT-PCR results demonstrated that VIM normally expresses in OS during diagnosis and consistently expresses in both buffy coats and primary cells [35,42,43]. Expression of COL1A2 has been found in human osteoblast lineages [44] and has been shown to be related to migration, invasion, proliferation, and metastasis promotion in various cancers including OS, with signi cantly higher expression in buffy coats and primary cells [23,45,46].
The roles of PLS3 on actin (F-actin) formation in normal bone have been previously described. Oncology studies have demonstrated that PLS3 is a novel CTC marker for prognosis in breast and colorectal cancer [31,47]. In our study, a signi cantly higher expression level of PLS3 was found in OS patients when compared to healthy donors. Thus, PLS3 could be a novel liquid biopsy marker for prognostic prediction in OS.
On the other hand, ezrin, which has been reported to be a typical EMT marker, was borderline signi cantly highly expressed in frozen OS buffy coat but not in primary OS cells. Due to the fact that there was no difference in Ezrin expression between OS primary cells and PBMCs, we suggest that the population of clusters of tumor cells with the potential to become CTCs in the tumor population was low. However, Ezrin expression in primary cells might not be an indication of CTC clusters. High expression of Ezrin, both in RNA and in protein levels in OS patients, was positively correlated with metastatic stage and OS recurrence [48,49].
Comparative expression analysis in this study did not nd signi cant differences in VIM or Ezrin between OS primary cells and PBMCs, although they have previously been reported as OS candidate genes [11,50].
These gene expression unrepresented in whole tumor population, this might be de nitive for OS cells in circulating blood. Comparative expression analysis of genes between single circulating tumor cells and other circulating blood cells should be further investigated. This could potentially improve the detection of OS circulating cells by identifying more precise predictors.

Conclusions
This study demonstrated the feasibility of using VIM, Ezrin, COL1A2 and PLS3 as potential candidate biomarkers to detect CTCs in OS diagnostic and metastatic monitoring using simple molecular