Patients and Treatments
We prospectively recruited 12 patients (7 males and 5 females, mean age 14.6 ± 4.8 years) with primary osteosarcoma who were admitted to the First Affiliated Hospital of Sun Yat-Sen University from August 2011 to March 2012. Eight patients had primary osteosarcoma of the distal femur and 4 had primary osteosarcoma of the proximal tibia. The histological types included osteoblastic (n = 7), chondroblastic (n = 4), and fibroblastic (n = 1) osteosarcoma.
All patients received four cycles of NACT including high-dose methotrexate, pirarubicin, and ifosfamide with or without cisplatinum. Limb-salvage surgery was performed 3 weeks after chemotherapy. Routine MRI examination was performed within 3 days before surgery.
MRI Protocols
MRI data for all patients were acquired with an extremity coil on a 3.0 T whole-body magnetic resonance scanner (Magnetom Trio, Syngo MR 2006T, Siemens Medical Solutions, Forchheim, Germany) at the Department of Radiology of the First Affiliated Hospital of Sun Yat-Sen University.
T1-weighted imaging (T1WI) used an axial spin-echo sequence with repetition time (TR) = 659 ms and echo time (TE) = 11 ms. T2-weighted imaging (T2WI) used coronal, sagittal, and axial fast spin-echo sequences with or without fat suppression (TR = 4660 ms, TE = 96 ms). Axial DWI was performed using a single-shot spin-echo echo-planar imaging (EPI) sequence with the following scan parameters: TR = 3200 ms, TE = 82 ms, EPI factor = 3, and b-values = 0 and 800 s/mm². Finally, we performed a delayed enhanced scan with the same parameters as the axial non-enhanced T1WI sequence, and subtraction images were generated automatically.
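The text does not state how the apparent diffusion coefficient (ADC) maps used later in the analysis were computed. For a two-b-value acquisition such as the one above, the conventional mono-exponential estimate is ADC = ln(S_low / S_high) / (b_high - b_low). The following is a minimal sketch of that standard formula, not the scanner's actual implementation. The function name `adc_two_point` and the example signal intensities are hypothetical.

```python
import math


def adc_two_point(s_low, s_high, b_low=0.0, b_high=800.0):
    """Two-point mono-exponential ADC estimate, in mm^2/s.

    s_low, s_high: signal intensities at b_low and b_high (s/mm^2).
    Assumes the model S(b) = S(0) * exp(-b * ADC).
    """
    if s_low <= 0 or s_high <= 0:
        raise ValueError("signal intensities must be positive")
    if b_high <= b_low:
        raise ValueError("b_high must be greater than b_low")
    return math.log(s_low / s_high) / (b_high - b_low)


# Hypothetical voxel intensities at b = 0 and b = 800 s/mm^2 (not study data).
adc = adc_two_point(1000.0, 400.0)
print(f"ADC = {adc:.3e} mm^2/s")
```

In practice the same expression is applied voxel by voxel to produce the ADC map.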
All axial scans were perpendicular to the longitudinal axis of the body and parallel to the tibial plateau. The field of view and the slice centre were kept consistent (centred on the largest cross-section of the tumour), with slice thickness = 5 mm and interslice gap = 1 mm.
Sampling of Gross Specimen Sections and Grouping of the Sampling Areas
The gross specimens resected during limb-salvage surgery were fixed in 10% buffered formaldehyde solution and sectioned into axial slices 5 mm thick, corresponding to the axial MRI slices. A radiologist and an experienced musculoskeletal pathologist coregistered the MRI and the specimens section by section and selected 6–10 well-matched specimen sections from each patient. Rectangular tumour tissue samples ranging from 10 × 15 mm to 15 × 20 mm were marked on these specimen sections in areas corresponding to homogeneous signal intensity on T1WI, T2WI, and DWI. Depending on tumour size, 9 to 24 sampling areas were selected from each patient, and a total of 127 tissue samples were obtained from the 12 resected specimens. These tissue samples were fixed, decalcified, dehydrated, embedded in paraffin, sectioned, and stained with haematoxylin and eosin (H&E).
In this study, pathologists blinded to the MRI findings recorded the following microscopic components for every tissue sample: viable sarcomatous cells, tumour osteoid, tumour bone, viable chondrosarcomatous cells with cartilaginous matrix, sarcomatous cell necrosis, post-necrotic collagen, liquefactive necrosis, blood spaces, and secondary aneurysmal bone cysts (ABC). Areas with less than 10% tumour cell necrosis were defined as tumour viable areas, and areas with 90% or more tumour cell necrosis were defined as tumour necrotic areas. Areas with more than 50% tumour cartilage were defined as cartilaginous tumour, and areas with 50% or less tumour cartilage were defined as non-cartilaginous tumour. All tissue samples were thus classified into five groups: non-cartilaginous tumour viable areas, cartilaginous tumour viable areas, non-cartilaginous tumour necrotic areas, tumour necrosis cystic/haemorrhagic and secondary ABC areas, and tumour post-necrotic collagenised areas. Of these, the non-cartilaginous and cartilaginous tumour viable areas constitute the tumour survival areas, whereas the non-cartilaginous tumour necrotic areas, the post-necrotic collagenised areas, and the tumour necrosis cystic/haemorrhagic and secondary ABC areas constitute the tumour nonviable areas (17) (Fig. 1).
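The two percentage thresholds above can be written as a small rule. This is only an illustrative sketch: the function name `classify_sample` and the "indeterminate" label for samples with 10% to just under 90% necrosis are assumptions, since the text does not say how such samples were handled. The cystic/haemorrhagic, secondary ABC, and collagenised groups depend on histological judgement and are not modelled here.

```python
def classify_sample(necrosis_pct, cartilage_pct):
    """Apply the stated necrosis and cartilage thresholds to one tissue sample.

    necrosis_pct:  percentage of tumour cell necrosis (0-100).
    cartilage_pct: percentage of tumour cartilage (0-100).
    Returns a (matrix_type, viability) pair of strings.
    """
    for value in (necrosis_pct, cartilage_pct):
        if not 0 <= value <= 100:
            raise ValueError("percentages must lie between 0 and 100")

    # More than 50% tumour cartilage -> cartilaginous tumour.
    matrix_type = "cartilaginous" if cartilage_pct > 50 else "non-cartilaginous"

    if necrosis_pct < 10:
        viability = "viable"
    elif necrosis_pct >= 90:
        viability = "necrotic"
    else:
        # Assumption: the intermediate range is not defined in the text.
        viability = "indeterminate"
    return matrix_type, viability


# Hypothetical samples (not study data).
print(classify_sample(5, 20))
print(classify_sample(95, 10))
```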
Classification With Machine Learning
With axial T1WI and T2WI as reference, circular or oval regions of interest (ROIs) were placed on T2WI, subtract-enhanced T1WI (ST1WI), and the ADC maps, coregistered to the histological sampling areas; this was done jointly by two experienced radiologists (MLS and ZHG). ROI size ranged from 50 to 250 mm² (Fig. 2). Three MRI parameters were measured in each ROI: ADC and the signal intensities on T2WI and ST1WI. We divided each parameter by the corresponding value measured in normal muscle to obtain the standardised values rADC, rT2WI, and rST1WI.
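The standardisation step is a simple ratio to normal muscle. The sketch below illustrates it with hypothetical measurements; the function name `normalise_roi`, the dictionary layout, and all numeric values are assumptions made for illustration only.

```python
PARAMETERS = ("ADC", "T2WI", "ST1WI")


def normalise_roi(roi, muscle):
    """Divide each ROI measurement by the matching normal-muscle measurement.

    roi, muscle: dicts keyed by "ADC", "T2WI", and "ST1WI".
    Returns a dict with keys "rADC", "rT2WI", and "rST1WI".
    """
    ratios = {}
    for name in PARAMETERS:
        if muscle[name] == 0:
            raise ValueError(f"muscle reference for {name} is zero")
        ratios["r" + name] = roi[name] / muscle[name]
    return ratios


# Hypothetical values (not study data): ADC in 10^-3 mm^2/s,
# signal intensities in arbitrary units.
tumour_roi = {"ADC": 1.8, "T2WI": 600.0, "ST1WI": 150.0}
muscle_roi = {"ADC": 1.5, "T2WI": 200.0, "ST1WI": 50.0}
print(normalise_roi(tumour_roi, muscle_roi))
```

Because each ratio is dimensionless, the three features are comparable across patients even when absolute signal intensities differ between scans.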
Our previous studies have shown statistically that ADC values did not differ significantly among the cartilaginous tumour survival areas, the post-necrotic collagenised areas, and the tumour necrosis cystic/haemorrhagic and secondary ABC areas (17). Therefore, in this study we performed three classification tasks using a supervised machine learning method based on the random forest (RF) algorithm: (1) distinguishing tumour survival areas from tumour nonviable areas; (2) distinguishing non-cartilaginous tumour survival areas from tumour nonviable areas; and (3) distinguishing cartilaginous tumour survival areas from tumour nonviable areas. Training and testing were performed with the scikit-learn library for Python (https://scikit-learn.org/stable/). Models were built either with rADC values alone or with all of the normalised parameters above, so that different feature inputs could be compared. Model performance was evaluated by 5-fold cross-validation. We calculated the receiver operating characteristic (ROC) curve and the area under the ROC curve (AUC) to evaluate classification performance (19). Using the optimal threshold determined from the ROC curve, we also calculated sensitivity, specificity, and accuracy as

$$\text{Sensitivity} = \frac{TP}{TP + FN}, \qquad \text{Specificity} = \frac{TN}{TN + FP}, \qquad \text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN},$$

where TP (true positive) is the number of samples correctly predicted as positive, TN (true negative) is the number of samples correctly predicted as negative, FN (false negative) is the number of samples incorrectly predicted as negative, and FP (false positive) is the number of samples incorrectly predicted as positive.
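The evaluation procedure described above can be sketched with scikit-learn as follows. This is a minimal illustration under stated assumptions, not the study's actual code. The data are synthetic, the forest uses an arbitrary 100 trees, out-of-fold predictions are pooled across the five folds, and the "optimal threshold" is taken as the point that maximises the Youden index (sensitivity + specificity - 1). The text specifies none of these choices. The function name `evaluate_rf` is hypothetical.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score, roc_curve
from sklearn.model_selection import StratifiedKFold, cross_val_predict


def evaluate_rf(X, y, n_splits=5, seed=0):
    """Cross-validated random forest evaluation on pooled out-of-fold scores."""
    clf = RandomForestClassifier(n_estimators=100, random_state=seed)
    cv = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    # Probability of the positive class (label 1), predicted for each sample
    # by a model that did not see that sample during training.
    scores = cross_val_predict(clf, X, y, cv=cv, method="predict_proba")[:, 1]

    auc = roc_auc_score(y, scores)
    fpr, tpr, thresholds = roc_curve(y, scores)
    # Assumption: optimal threshold = maximum Youden index (TPR - FPR).
    best = int(np.argmax(tpr - fpr))
    pred = scores >= thresholds[best]

    tp = int(np.sum(pred & (y == 1)))
    tn = int(np.sum(~pred & (y == 0)))
    fp = int(np.sum(pred & (y == 0)))
    fn = int(np.sum(~pred & (y == 1)))
    return {
        "auc": float(auc),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
    }


# Synthetic stand-in data (NOT study data): 127 samples and three features
# playing the roles of rADC, rT2WI, and rST1WI. Label 1 is the positive class.
rng = np.random.default_rng(0)
y = rng.integers(0, 2, size=127)
X = rng.normal(1.0, 0.3, size=(127, 3)) + 0.45 * y[:, None]

adc_only = evaluate_rf(X[:, :1], y)   # model built with rADC alone
all_features = evaluate_rf(X, y)      # model built with all three features
print("rADC only:   ", adc_only)
print("all features:", all_features)
```

Calling `evaluate_rf` once per feature set mirrors the comparison between the rADC-only model and the model built with all normalised parameters. Each of the three classification tasks would use its own subset of samples and labels.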