RadImageNet: A Large-scale Radiologic Dataset for Enhancing Deep Learning Transfer Learning Research

Most current medical imaging Artificial Intelligence (AI) relies upon transfer learning using convolutional neural networks (CNNs) created using ImageNet, a large database of natural world images, including cats, dogs, and vehicles. Size, diversity, and similarity of the source data determine the success of transfer learning on the target data. ImageNet is large and diverse, but there is a significant dissimilarity between its natural world images and medical images, leading Cheplygina to pose the question, "Why do we still use images of cats to help Artificial Intelligence interpret CAT scans?". We present an equally large and diversified database, RadImageNet, comprising 5 million annotated CT, MRI, and ultrasound images of musculoskeletal, neurologic, oncologic, gastrointestinal, endocrine, and pulmonary pathologies from over 450,000 patients. The database is unprecedented in scale and breadth in the medical imaging field, constituting a more appropriate basis for medical imaging transfer learning applications. We found that RadImageNet transfer learning outperformed ImageNet in multiple independent applications, including improvements for bone age prediction from hand and wrist x-rays by 1.75 months (p<0.0001), pneumonia detection in ICU chest x-rays by 0.85% (p<0.0001), ACL tear detection on MRI by 10.72% (p<0.0001), SARS-CoV-2 detection on chest CT by 0.25% (p<0.0001), and hemorrhage detection on head CT by 0.13% (p<0.0001). The results indicate that our pre-trained models, which are open-sourced and publicly available, will be a better starting point for transfer learning in radiologic imaging AI applications, including applications involving medical imaging modalities or anatomies not included in the RadImageNet database.

Introduction

ImageNet 2,3 is a dataset comprising millions of images of the natural world. As an open-sourced dataset, ImageNet has been a central resource for deriving sophisticated models in computer vision. Transfer learning 1 is a common deep learning approach whereby a model designed for one problem is reused to initiate a different but related task. Because annotated images and the computing power needed to train new models from scratch are often lacking, transfer learning has become a popular method for transferring the knowledge gained by pre-trained models to a related problem, speeding up training, reducing the amount of input data required, and improving the performance and generalizability of a deep learning model 11. Transfer learning with models trained using ImageNet has been extensively explored in medical imaging AI applications. The architectures of VGG 12, ResNet 13-15, Inception networks 16-19, MobileNet 20, and DenseNet 21 pre-trained with ImageNet have been widely adopted in medical imaging applications such as COVID-19 diagnosis on chest CT 22, classification of fibrotic lung disease 23, and classification of skin cancer 24. Despite the high performance of many medical imaging models pre-trained with ImageNet, successful transfer learning requires a reasonably large sample size, diversity of images, and similarity between the training image database and the target application images. While ImageNet meets the size and diversity criteria, there remains a significant dissimilarity between the training images in ImageNet and the medical images in the new task that represents an important limitation. The development of transfer learning strategies to bridge that gap is an active medical imaging machine learning research area.
RadImageNet provides millions of annotated advanced medical images from various modalities demonstrating numerous different pathologies and can be used to develop pre-trained models for predictive tasks in image-based medicine. We propose that a pre-trained model based solely on medical radiographic features from a vast medical imaging database will provide more appropriate feature representation for image-based predictive problems in medicine than pre-trained models derived from the natural images in ImageNet.
In this study, we describe a large-scale, diverse medical imaging dataset, RadImageNet, used to generate pre-trained convolutional neural networks trained solely on medical imaging to serve as the basis of transfer learning for medical imaging applications. We compare the pre-trained weights derived from RadImageNet and ImageNet on multiple medical imaging use cases, including target image modalities and anatomies not included in the RadImageNet training database. We show that the pre-trained networks generated from RadImageNet exceed the performance of pre-trained models developed from ImageNet. Furthermore, we show how a medical image recognition problem with a small medical image dataset can benefit from pre-trained weights derived from RadImageNet. This provides evidence that the pre-trained weights from RadImageNet are transferable across multiple modalities, anatomies, and pathologies. Figure 1 illustrates an overview of this study.

The RadImageNet Database
The RadImageNet dataset includes 5 million annotated CT, MRI, and ultrasound images of musculoskeletal, neurologic, oncologic, gastrointestinal, endocrine, and pulmonary pathology. For direct comparison with ImageNet (the initial ImageNet challenge comprised 1.4 million images), we collected the most frequent modalities and anatomies at the same scale. The RadImageNet dataset was collected between January 2005 and January 2020 from 131,872 patients at an outpatient radiology facility in New York City. Each study was annotated by a board-certified, fellowship-trained radiologist. As part of the interpretation of each study, the reading radiologist chose images representative of the pathology shown in each exam. The pathology demonstrated on each of these "key images" was annotated, and a region of interest was created to identify the imaging findings. These annotations were extracted from the key images and provided the basis for the RadImageNet classes. The portions of the RadImageNet database used for comparison to ImageNet consist of three radiologic modalities, eleven anatomies, and 165 pathologic labels (Fig. 2a and Extended Data Table 1). Inception-Res-Net-v2 16-19, ResNet50 13-15, DenseNet121 21, and InceptionV3 16 convolutional neural network architectures were trained on the data in RadImageNet. We stratified the RadImageNet dataset by patient ID, allowing no overlap among the training, validation, and test sets. The dataset was split into a 75% training set, 10% validation set, and 15% test set. The performance of the models was reported on the test set.
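As a minimal sketch of this patient-level split, the following Python snippet (assuming pandas and scikit-learn, with hypothetical columns patient_id and label) groups images by patient ID so that no patient appears in more than one set:

import pandas as pd
from sklearn.model_selection import GroupShuffleSplit

def split_by_patient(df, train=0.75, val=0.10, seed=0):
    # Carve off the 75% training portion, grouping by patient ID.
    gss = GroupShuffleSplit(n_splits=1, train_size=train, random_state=seed)
    train_idx, rest_idx = next(gss.split(df, groups=df["patient_id"]))
    rest = df.iloc[rest_idx]
    # Split the remaining 25% into 10% validation and 15% test.
    gss2 = GroupShuffleSplit(n_splits=1, train_size=val / (1 - train), random_state=seed)
    val_idx, test_idx = next(gss2.split(rest, groups=rest["patient_id"]))
    return df.iloc[train_idx], rest.iloc[val_idx], rest.iloc[test_idx]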
Furthermore, we randomly sampled 2,016 images from the test set and compared the model performance to three senior sub-specialized fellowship-trained radiologists who were uninvolved with the labeling of the images in RadImageNet (Extended Data Fig. 1).

Comparison of RadImageNet and ImageNet Pre-trained Models
The four aforementioned RadImageNet models were used as the pre-trained models for five medical imaging applications to compare their performance to the ImageNet pre-trained models. We applied the pre-trained models to transfer learning problems using publicly available datasets, including bone age prediction on hand and wrist x-rays 6; pneumonia detection in ICU patients on chest radiographs 7; ACL tear detection on MRI 8; SARS-CoV-2 detection on chest CT 9; and hemorrhage detection on head CT 10. These applications were selected to evaluate the capabilities of RadImageNet models on multiple applications that included modalities, anatomies, and labels that were and were not contained in the RadImageNet database.

SARS-CoV-2 Detection
For SARS-CoV-2 detection on chest CT, the RadImageNet-trained models showed significant improvement in AUROC compared to the ImageNet-trained models. For the RadImageNet-trained Inception-Res-Net-v2 and ResNet50 networks, the AUROC was 99.88% (95% CI 99.84%, 99.92%; P < 0.0001) and 99.89% (95% CI 99.85%, 99.92%; P < 0.0001), respectively, exceeding the corresponding ImageNet-trained networks.

The gradient class activation map 27 (Grad-CAM) was used to demonstrate the features learned by the algorithms. We present the Grad-CAMs (Fig. 4) for the paired algorithms on each of the five applications to visualize the distinguishing features captured by the models. Grad-CAM images of successful predictions of the RadImageNet and ImageNet models were used to compare the learned features.

Potential Clinical Applications
The pre-trained models trained from RadImageNet can improve predictive performance and generalizability for medical imaging applications. We simulated multiple scenarios of different medical imaging applications spanning multiple modalities, anatomies, and pathologies, in which the RadImageNet models inherited some or none of the knowledge relevant to that application. In these applications, the RadImageNet pre-trained models demonstrated significant improvement over the ImageNet pre-trained models. These outcomes suggest that the RadImageNet pre-trained models can improve medical imaging applications where transfer learning is needed. Moreover, gradient class activation maps suggest that the interpretation of the RadImageNet models more closely conforms to the regions of interest as defined by radiologists.

Discussion
The key determinants of successfully developing models using transfer learning are source data with a certain level of similarity to the target data, diversity of image type, and a large training sample size. Studies 22-24 have shown that ImageNet pre-trained models, with their large number of classes and large scale, demonstrate a high recognition rate in medical imaging analysis despite the low similarity of ImageNet to medical data. If a large sample size or diverse source data is missing, source data with higher similarity to the target data can also lead to success 28-32. Models developed from the RadImageNet dataset combine the positive attributes of both approaches. RadImageNet consists of 5 million annotated images. In this study, a subset of 1.4 million images was used to match the size of the ImageNet database. Moreover, RadImageNet data are more similar to the target medical imaging data and include 165 classes spanning multiple modalities and anatomies.
The bone age hand and wrist x-ray dataset is relatively small, consisting of 12,611 images, and demonstrates the lowest similarity to the RadImageNet dataset, since no hand or wrist x-ray studies were contained in that dataset. Despite the mismatch between the target modality and the source modalities, all four RadImageNet models resulted in a smaller mean absolute error (P < 0.05) than the ImageNet models. This indicates that while the modalities differed, the underlying features in RadImageNet were more useful than those in ImageNet, suggesting broader applicability to medical imaging of models derived from RadImageNet than of those derived from ImageNet.
Three out of four RadImageNet models showed significant improvement in AUROC (P < 0.05) over the ImageNet models for pneumonia detection on chest radiographs in ICU patients. This dataset contains 26,684 images, which suggests that a larger target dataset may compensate for the lack of similarity in source data, as pneumonia chest radiographs were not present in the RadImageNet database. This is in contrast to the ACL tear dataset, which is extremely small, with only 1,021 images. Three out of four RadImageNet models demonstrated a higher mean AUROC (P < 0.0001) and a smaller standard deviation on 5-fold cross-validation. RadImageNet contained both the modality (MRI) and a similar class (ACL injury) to the target data, indicating that source data similarity can contribute to strong performance even with a small dataset.
Two out of four RadImageNet models outperformed ImageNet models (P < 0.05) on SARS-CoV-2 detection on chest CT. RadImageNet contained the same modality (chest CT) and a similar label (pulmonary consolidation), which likely helped the RadImageNet models outperform the ImageNet models. However, the large size of the target SARS-CoV-2 dataset (58,766 images) likely compensated for the ImageNet models' lower source similarity, accounting for the similar performance of the remaining models, which showed a non-significant difference between RadImageNet and ImageNet. The performance on the intracranial hemorrhage data on head CT was likely due to a similar phenomenon. Three out of four RadImageNet models showed significant improvement (P < 0.05). These results are likely due to a combination of a similar label (acute intra-axial hemorrhage) and anatomy (brain) being included in the RadImageNet dataset, whereas the ImageNet models were able to compensate through the large target dataset (573,614 images). This again suggests that the underlying image features in RadImageNet are transferable to medical pathologies other than those included in the database.
These five clinical applications show that RadImageNet pre-trained models, despite varying levels of similarity and diversity relative to the target medical imaging data, demonstrate superiority to ImageNet pre-trained models and hold promise to aid the development and clinical translation of medical imaging artificial intelligence. Immediate adoption of these models is enabled by open-sourcing them publicly (https://github.com/BMEII-AI/RadImageNet).
Our proposed RadImageNet models do have limitations. First, one major limitation of this study is that only a single sequence per patient was provided for assessment. Many pathologies require additional sequences and/or adjacent images for accurate diagnosis. Second, the images presented may contain multiple pathologies but only one label. The annotating radiologists may label only the major findings of the key diagnosis while not exhaustively annotating all other pathologies demonstrated on the image. Third, we provided the radiologists with full-resolution images, while the RadImageNet models utilized lower-resolution images in algorithm development due to processing limitations. These lower-resolution images may obscure small areas of pathology. Finally, the number of classes in the limited RadImageNet dataset used for comparison to ImageNet was smaller than the number in ImageNet.
In future studies, higher spatial-resolution images could yield higher performance in recognizing smaller foci of pathology. The number of classes of pathology in RadImageNet can be further expanded to match the number of classes in ImageNet. The success of ImageNet is in part due to the number of classes available to discriminate between objects; for example, "dog" is expanded to encompass "Husky", "Golden Retriever", and so on. Moreover, performance could be improved by introducing the regions of interest, as defined by radiologists, to highlight pathological appearances in the images, as well as by providing additional sequences and/or adjacent images. In addition, pre-trained models for CT only or MRI only could be derived from the RadImageNet dataset for CT or MRI applications, complementing the comprehensive RadImageNet models. Finally, future work will further analyze strategies for fine-tuning the pre-trained models compared with using the standard pre-trained models.
In conclusion, RadImageNet (5 million annotated CT, MRI, and ultrasound images) and the associated pre-trained models illustrate the important role of a database with a higher degree of similarity between the source images and the target application as a starting point for transfer learning approaches in medical imaging analysis. We believe the proposed RadImageNet pre-trained models, based on a large-scale, diverse, and high-quality annotated dataset of medical images with a high degree of similarity to the target applications, could improve the recognition rate and visualizations of other medical imaging CNN-based transfer learning applications.

Declarations
The study was approved by the institutional review boards of East River Medical Imaging (data provider) and the Icahn School of Medicine at Mount Sinai in New York (data receiver). The institutional review boards waived the requirement for written informed consent for this retrospective study, which evaluated de-identified data and involved no potential risk to patients. To avert any potential breach of confidentiality, no link between the patients, the data provider, and the data receiver was made available. A third party issued a certification of de-identified data transfer from the data provider to the data receiver.
Online Methods

Study participants

We collected a total of 203,341 CT (52,691), MRI (142,422), and ultrasound (8,228) studies from 131,872 patients who underwent diagnostic scans at East River Medical Imaging in New York between 1 January 2005 and 31 January 2020.
Multimodal and multi-anatomy components

CT exams of the chest, abdomen, and pelvis; MRI exams of the shoulder, knee, ankle, foot, spine, hip, and brain; and ultrasound studies of the thyroid, abdomen, and pelvis were collected in the curation of the RadImageNet database. For each study, the radiologist annotated key images with a corresponding diagnosis. To create a reasonable number of classes, the annotations were grouped by pathology and imaging appearance, resulting in a total of 11 anatomies from 3 modalities and 165 diagnostic labels. For example, ACL tears and ACL sprains on MRI were combined into a single class: MRI, Knee, ACL injury (see Extended Data Table 3).

Normal studies
To better investigate the characteristics of abnormal key images for model development, we queried 8,528 normal studies based on radiology reports for the aforementioned modalities and anatomies. Each normal study was further confirmed by a board-certified radiologist (T.D.). All associated diagnostic sequences and images were included.

RadImageNet model development
To develop the pre-trained models from RadImageNet, we trained 4 different convolutional neural networks without importing weights from existing models, namely Inception-Res-Net-v2, ResNet50, DenseNet121, and InceptionV3. The dataset was split into a 75% training set, 10% validation set, and 15% test set, stratified by patient ID. All images from the same patient were placed in the same set.
Rather than importing the weights from existing models, we randomly initialized the weights to train the individual models. A global average pooling layer, a dropout layer at a rate of 0.4, and an output layer activated by the softmax function were added after the convolutional neural networks. For each image, the models returned a probability for each of the 165 labels.
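A minimal sketch of this architecture, assuming Keras/TensorFlow (the named architectures match keras.applications); the 224x224 input size and Adam optimizer are illustrative assumptions not stated in the text:

from tensorflow.keras import layers, models
from tensorflow.keras.applications import InceptionResNetV2

# Backbone with randomly initialized weights (weights=None), per the text.
backbone = InceptionResNetV2(weights=None, include_top=False, input_shape=(224, 224, 3))
model = models.Sequential([
    backbone,
    layers.GlobalAveragePooling2D(),
    layers.Dropout(0.4),
    layers.Dense(165, activation="softmax"),  # one probability per RadImageNet label
])
model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])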

RadImageNet model performance
To investigate the performance of the AI models trained on RadImageNet, top-1 and top-5 accuracy were calculated. The class with the highest predicted probability was used to calculate top-1 accuracy, whereas the five classes with the highest probabilities were used to evaluate top-5 accuracy.
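Both metrics can be sketched as follows, assuming a probability matrix probs of shape (n_images, 165) and integer ground-truth labels:

import numpy as np

def top_k_accuracy(probs, labels, k):
    # Indices of the k highest-probability classes for each image.
    top_k = np.argsort(probs, axis=1)[:, -k:]
    # Fraction of images whose true label is among those k classes.
    return np.mean([labels[i] in top_k[i] for i in range(len(labels))])

# top_k_accuracy(probs, labels, 1) gives top-1; top_k_accuracy(probs, labels, 5) gives top-5.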
Image preprocessing (window leveling)

If window and level parameters were set by the annotating radiologist, these were used when transforming the data from DICOM to PNG. Otherwise, a recommended window was used for CT based on the study and reconstruction kernel (for example, a chest CT reconstructed with a lung kernel had a standard lung window applied), or an automatic window was generated for MRI images 33-36. Detailed CT window and level data can be found in Extended Data Table 4.
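A sketch of this windowing step for CT, assuming pydicom and Pillow; the lung window values here (level -600 HU, width 1500 HU) are illustrative rather than taken from Extended Data Table 4, and the MRI auto-window is not shown:

import numpy as np
import pydicom
from PIL import Image

def dicom_to_png(dicom_path, png_path, level=-600.0, width=1500.0):
    ds = pydicom.dcmread(dicom_path)
    # Convert stored pixel values to Hounsfield units via the DICOM rescale tags.
    hu = ds.pixel_array * float(ds.RescaleSlope) + float(ds.RescaleIntercept)
    lo, hi = level - width / 2, level + width / 2
    # Clip to the window and scale to 8-bit grayscale before saving.
    img = np.clip(hu, lo, hi)
    img = ((img - lo) / (hi - lo) * 255.0).astype(np.uint8)
    Image.fromarray(img).save(png_path)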

Reader Studies
To further evaluate the RadImageNet models' performance, we compared model performance to that of three senior subspecialized radiologists: a neuroradiologist (A.D., 12 years of clinical experience), a cardiothoracic radiologist (A.J., 10 years), and a musculoskeletal radiologist (M.H., 10 years). Studies were randomly sampled from the test set to create reader studies of approximately 2,000 images. If a category contained more than 13 images, 14 images were randomly selected from that category for the reader studies; otherwise, all images within that category were presented. Radiologist performance was compared to that of the aforementioned four neural networks trained on RadImageNet according to each expert's specialty. Accuracy for each individual anatomy was reported for the comparison of radiologists and RadImageNet models.

Transfer learning applications
To compare the performance of convolutional neural networks created from the RadImageNet and ImageNet datasets, we used Inception-Res-Net-v2, ResNet50, DenseNet121, and InceptionV3 models on five applications. All of the paired models were structured with the same parameters and layers for direct comparison with respect to the pre-trained weights from RadImageNet and ImageNet.
Bone age prediction on hand and wrist x-rays: this dataset was obtained from the RSNA Pediatric Bone Age Challenge 6. The dataset was split into a 75% training set, 10% validation set, and 15% test set. The mean absolute error was selected as the loss function. A global average pooling layer, a dropout layer, and an output layer activated by the linear function were introduced after the last layer of the pre-trained models. A total of 50 epochs were trained, and the models with the lowest mean absolute error across those epochs were saved for further evaluation and comparison. A modified Bland-Altman plot of the absolute difference between the ground truth and the prediction was used to assess the consistency of the mean absolute error and evaluate model performance.
Pneumonia detection on chest radiographs: the dataset was acquired from the RSNA Pneumonia Detection Challenge 7 to identify pneumonia on chest radiographs (CXRs) from ICU patients. Instead of building deep learning models to localize pneumonia on CXRs, we created a classification model for the detection of pneumonia. Cases provided with bounding box information were considered pneumonia cases, while subjects with no bounding box information were considered non-pneumonia cases. The dataset was split into a 75% training set, 10% validation set, and 15% test set. Binary cross-entropy was selected as the loss function. A global average pooling layer, a dropout layer, and an output layer activated by the softmax function were introduced after the last layer of the pre-trained models. A total of 40 epochs were trained, and the models with the lowest validation loss across those epochs were saved for further evaluation and comparison.
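The binary detection tasks share essentially this fine-tuning recipe; the sketch below assumes Keras, an illustrative input size and dropout rate, and a hypothetical RadImageNet weights file name (the bone age task instead uses a single linear output unit with a mean absolute error loss):

from tensorflow.keras import layers, models
from tensorflow.keras.applications import InceptionResNetV2

def build_binary_classifier(radimagenet_weights=None):
    # ImageNet weights when no RadImageNet weights file is given; otherwise
    # random initialization followed by loading the RadImageNet backbone.
    backbone = InceptionResNetV2(
        weights=None if radimagenet_weights else "imagenet",
        include_top=False, input_shape=(224, 224, 3))
    if radimagenet_weights:
        backbone.load_weights(radimagenet_weights)
    model = models.Sequential([
        backbone,
        layers.GlobalAveragePooling2D(),
        layers.Dropout(0.4),                    # dropout rate is an assumption here
        layers.Dense(2, activation="softmax"),  # softmax output, per the text
    ])
    model.compile(optimizer="adam", loss="binary_crossentropy")  # optimizer assumed
    return model

model = build_binary_classifier("RadImageNet-IRV2_notop.h5")  # hypothetical file name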
Knee ACL tear detection on MRI: this dataset was requested from the Stanford MRNet dataset for ACL and meniscus tear detection 8. The original dataset contained all images from knee MRIs with and without ACL tears. To more easily compare results between the RadImageNet- and ImageNet-derived models, we manually selected the sagittal images containing either a normal or a torn ACL (images were selected by T.D., a musculoskeletal radiologist with 12 years of clinical experience). The labels of "tear" or "no-tear" were maintained from the original dataset. A total of 4 studies were excluded due to poor image quality from excessive susceptibility and/or motion artifact. Model performance on the modified dataset was evaluated using 5-fold cross-validation due to the small size of the dataset. In each fold, the dataset was split into a 75% training set, 10% validation set, and 15% test set. Binary cross-entropy was selected as the loss function. A global average pooling layer, a dropout layer, and an output layer were introduced after the last layer of the pre-trained models. A total of 50 epochs were trained. In each fold, the model with the lowest validation loss across epochs was saved for further evaluation and comparison.
SARS-CoV-2 detection on chest CT: the dataset was obtained from the China National Center for Bioinformation 9. Zhang et al. provided key images demonstrating both proven COVID-19 and community-acquired pneumonia; normal chest CT scans were not included. These labels were used to develop the models. We stratified the dataset by patient ID. The dataset was split into a 75% training set, 10% validation set, and 15% test set. Binary cross-entropy was selected as the loss function. A global average pooling layer, a dropout layer, and an output layer activated by the softmax function were introduced after the last layer of the pre-trained models. A total of 40 epochs were trained. The models with the lowest validation loss across those epochs were saved for further evaluation and comparison.
Hemorrhage detection on head CT: this dataset was obtained from the RSNA Intracranial Hemorrhage Detection challenge 10. Slice-level labels (intracranial hemorrhage or no intracranial hemorrhage) were provided for each study. Due to the imbalance of the dataset, the non-hemorrhage images from positive studies were excluded from model development. The dataset was stratified by patient ID and split into a 75% training set, 10% validation set, and 15% test set. Binary cross-entropy was selected as the loss function. A global average pooling layer, a dropout layer, and an output layer activated by the softmax function were introduced after the last layer of the pre-trained models. A total of 30 epochs were trained. The models with the lowest validation loss across those epochs were saved for further evaluation and comparison.

Gradient weighted class activation mapping
To understand model interpretability, we used gradient-weighted class activation mapping 32 (Grad-CAM) to visualize which regions of an image drive the models' predictions. Grad-CAM highlights the important regions in an image by using the gradients of the target layer that flow into the final convolutional layer to generate a localization map. For both the RadImageNet and ImageNet models, the output layer was the target layer, whereas conv_7b_ac, conv5_block3_out, relu, and mixed10 were selected as the final convolutional layers to generate the Grad-CAMs for the Inception-Res-Net-v2, ResNet50, DenseNet121, and InceptionV3 networks, respectively.
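A compact Grad-CAM sketch, assuming a Keras model on which the named layer (e.g. conv_7b_ac for Inception-Res-Net-v2) is directly addressable via get_layer:

import numpy as np
import tensorflow as tf

def grad_cam(model, image, last_conv_name, class_idx):
    # Map the input to the last convolutional feature map and the predictions.
    grad_model = tf.keras.Model(model.inputs,
                                [model.get_layer(last_conv_name).output, model.output])
    with tf.GradientTape() as tape:
        conv_out, preds = grad_model(image[np.newaxis])
        score = preds[:, class_idx]
    grads = tape.gradient(score, conv_out)
    # Channel importance weights: global average of the gradients.
    weights = tf.reduce_mean(grads, axis=(1, 2))
    cam = tf.reduce_sum(conv_out * weights[:, tf.newaxis, tf.newaxis, :], axis=-1)
    cam = tf.nn.relu(cam)[0]
    return (cam / (tf.reduce_max(cam) + 1e-8)).numpy()  # normalized heatmap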

Statistical analysis
The normality of the distribution of predicted bone age on hand and wrist x-rays was confirmed by the Shapiro test 37,38. The paired t-test 39 was used to calculate the two-sided P-value for the mean absolute error in bone age prediction. The DeLong 40 method was used to evaluate the 95% confidence interval of the AUROC and to calculate the two-sided P-value for the comparison of the RadImageNet and ImageNet models. Statistical significance was defined as a P-value < 0.05. The statistics of the AUC comparisons were computed with the pROC 41 package in R. The Shapiro test and paired t-test were performed with the statsmodels package in Python.
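As an illustrative equivalent of the normality check and paired t-test (the text reports using statsmodels; scipy is substituted here as an assumption, and the DeLong AUROC comparison from R's pROC has no direct scipy counterpart, so it is omitted):

import numpy as np
from scipy import stats

def compare_absolute_errors(errors_a, errors_b):
    # Shapiro-Wilk normality check on the paired differences.
    _, p_normal = stats.shapiro(np.asarray(errors_a) - np.asarray(errors_b))
    # Two-sided paired t-test on the per-image absolute errors.
    t_stat, p_value = stats.ttest_rel(errors_a, errors_b)
    return p_normal, t_stat, p_value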

Figures

Figure 1. Curation of the medical imaging database RadImageNet, development of pre-trained convolutional neural networks on RadImageNet, and comparison of RadImageNet pre-trained models and ImageNet pre-trained models on multiple medical imaging applications. This study was designed in five pathways. First, key images and associated diagnoses were annotated by radiologists. Second, the images and diagnoses were further grouped by modality, anatomy, and label according to their imaging patterns to construct the medical-imaging-only database RadImageNet. Third, four neural networks serving as pre-trained models were trained from scratch on the basis of RadImageNet. Fourth, the pre-trained models from RadImageNet and ImageNet were utilized on five medical imaging applications. Finally, comparisons of the pre-trained models and the visualizations of gradient class activation maps were evaluated.

Extended Data Figure 1 demonstrates the comparison between these four networks and senior radiologists on a dataset randomly sampled from the test set.