## Data Collection:

The data for this study were collected from Omid Hospital, a referral cancer center affiliated with Isfahan University of Medical Sciences, and from the Isfahan COVID-19 Registry (I-CORE) (13). The study included all patients with active or previous cancer whose severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) infection was confirmed by RT-PCR between February 2020 and February 2021. Patients with a radiological or clinical diagnosis of COVID-19 but no positive RT-PCR test were excluded from this analysis.

Twenty-eight features were collected for each patient: demographic data (age and sex); cancer characteristics (type of cancer and history of recent chemotherapy); COVID-19 symptoms (fever, cough, dyspnea, weakness, diarrhea, nausea, and vomiting); and comorbidities such as hypertension and diabetes. All laboratory data at the time of admission were also collected, including White Blood Cell count (WBC), Absolute Neutrophil Count (ANC), Absolute Lymphocyte Count (ALC), Neutrophil to Lymphocyte Ratio (NLR), Hemoglobin (HB), Hematocrit (HCT), Mean Corpuscular Volume (MCV), Mean Corpuscular Hemoglobin Concentration (MCHC), Red Cell Distribution Width (RDW), Platelets (PLT), Blood Urea Nitrogen (BUN), Creatinine (CR), Aspartate Transaminase (AST), Alanine Transaminase (ALT), and C-Reactive Protein (CRP).

## Defining patient outcomes:

ICU admission during hospitalization and patient status at discharge (alive or dead) were defined as the patient outcomes. The machine learning models were used to predict these outcomes from the collected features and to identify the features that most strongly affect them.

## Data Pre-processing:

The dataset was recorded carefully; consequently, fewer than 4 percent of all feature values in the dataset are missing, and all missing values come from laboratory tests. Each missing value was replaced with the mean of the corresponding feature. Next, the dataset was randomly partitioned into a training set of 305 patients and a test set of 34 patients. Third, a feature scaling method called standardization was applied to the training set, so that no feature dominates the others merely because of its numeric range. Standardization transforms each feature \(X\) as

\[X_{\text{stand}} = \frac{X - \operatorname{Mean}(X)}{\sqrt{\operatorname{Var}(X)}},\]

where \(\operatorname{Var}(X)\) denotes the variance of \(X\).
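The three pre-processing steps can be sketched in plain Python on a toy table. This is a minimal illustration, not the authors' code: the helper names (`impute_mean`, `random_split`, `fit_standardizer`, `standardize`) are hypothetical, \(\sqrt{\operatorname{Var}(X)}\) is assumed to be the population standard deviation, and the test set is assumed to be scaled with the training-set mean and standard deviation, which the text does not state explicitly.

```python
import random
import statistics


def impute_mean(rows):
    """Replace each None with the mean of the observed values in its column."""
    columns = list(zip(*rows))
    means = [statistics.fmean(v for v in col if v is not None) for col in columns]
    return [[means[j] if v is None else v for j, v in enumerate(row)] for row in rows]


def random_split(rows, labels, n_test, seed=0):
    """Shuffle indices and hold out n_test samples (34 in the study)."""
    idx = list(range(len(rows)))
    random.Random(seed).shuffle(idx)
    test, train = idx[:n_test], idx[n_test:]
    return ([rows[i] for i in train], [rows[i] for i in test],
            [labels[i] for i in train], [labels[i] for i in test])


def fit_standardizer(train_rows):
    """Per-feature mean and population standard deviation, from the training set only."""
    columns = list(zip(*train_rows))
    return ([statistics.fmean(c) for c in columns],
            [statistics.pstdev(c) for c in columns])


def standardize(rows, means, stds):
    """Apply (x - mean) / std; constant features (std == 0) are mapped to 0."""
    return [[(v - m) / s if s > 0 else 0.0 for v, m, s in zip(row, means, stds)]
            for row in rows]


# Toy example: 6 patients, 2 lab features, two missing values (not study data).
X = [[1.0, 10.0], [2.0, None], [3.0, 30.0], [None, 40.0], [5.0, 50.0], [6.0, 60.0]]
y = [0, 1, 0, 1, 0, 1]
X_filled = impute_mean(X)
X_train, X_test, y_train, y_test = random_split(X_filled, y, n_test=2)
means, stds = fit_standardizer(X_train)
X_train_std = standardize(X_train, means, stds)
X_test_std = standardize(X_test, means, stds)  # reuses training statistics
```

In scikit-learn, `SimpleImputer(strategy="mean")` and `StandardScaler` play the same roles as the imputation and scaling helpers here.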

## Prediction Algorithms:

We apply several machine learning prediction algorithms, namely Logistic Regression (LR), Naïve Bayes (NB), k-Nearest Neighbors (kNN), Random Forest (RF), and Support Vector Machine (SVM). The experimental settings of these prognostic models are described below.

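The first method is binary logistic regression, which applies to datasets with "0" and "1" class labels. It models the probability that an input belongs to class "1" as the logistic (sigmoid) function of a weighted sum of its features.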
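A minimal sketch of binary logistic regression in plain Python, assuming a single standardized feature and toy labels. The function names are hypothetical, and the fit uses unregularized batch gradient descent on the mean log-loss purely for illustration; the solver and regularization used in the study are not stated (scikit-learn's `LogisticRegression`, for instance, applies an L2 penalty by default).

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(w, b, x):
    """Pr(y = 1 | x) under the logistic model sigmoid(w . x + b)."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


def fit_logistic(X, y, lr=0.5, epochs=2000):
    """Unregularized batch gradient descent on the mean log-loss (illustrative only)."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = predict_proba(w, b, xi) - yi  # d(log-loss)/dz for one sample
            for j in range(d):
                grad_w[j] += err * xi[j] / n
            grad_b += err / n
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def predict(w, b, x, threshold=0.5):
    """Assign label 1 when the predicted probability reaches the threshold."""
    return 1 if predict_proba(w, b, x) >= threshold else 0


# Toy data: one feature; labels overlap, so the fit has a finite optimum.
X = [[-2.0], [-1.0], [-0.5], [0.5], [1.0], [2.0]]
y = [0, 0, 1, 0, 1, 1]
w, b = fit_logistic(X, y)
print(predict(w, b, [-2.0]), predict(w, b, [2.0]))  # 0 1
```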

The second technique is naïve Bayes, a conditional probability model. Let \(x\) be the feature vector of an input, and let \(C_1\) and \(C_2\) be the two possible classes. Using Bayes' theorem, together with the naïve assumption that the features are conditionally independent given the class, the model computes the conditional probabilities \(\Pr(C_1 \mid x)\) and \(\Pr(C_2 \mid x)\) and assigns \(x\) the class with the larger value. Since our dataset contains binary-valued variables, we use the Bernoulli version of naïve Bayes.
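The Bernoulli naïve Bayes decision rule can be sketched in plain Python as follows, assuming every feature is already coded as 0/1 and using Laplace smoothing with `alpha = 1`. The function names and toy features are hypothetical. If scikit-learn's `BernoulliNB` was used (the text does not say), note that it thresholds every feature at `binarize=0.0` by default, so standardized continuous features would be reduced to "above or below the mean".

```python
import math


def fit_bernoulli_nb(X, y, alpha=1.0):
    """Estimate class priors and per-feature Pr(x_j = 1 | C) with Laplace smoothing."""
    priors, probs = {}, {}
    for c in sorted(set(y)):
        rows = [x for x, label in zip(X, y) if label == c]
        priors[c] = len(rows) / len(X)
        probs[c] = [(sum(r[j] for r in rows) + alpha) / (len(rows) + 2 * alpha)
                    for j in range(len(X[0]))]
    return priors, probs


def log_posteriors(priors, probs, x):
    """Unnormalized log Pr(C | x) = log Pr(C) + sum_j log Pr(x_j | C)."""
    scores = {}
    for c, p in probs.items():
        s = math.log(priors[c])
        for xj, pj in zip(x, p):
            s += math.log(pj if xj == 1 else 1.0 - pj)
        scores[c] = s
    return scores


def predict_nb(priors, probs, x):
    """Pick the class with the larger (log) posterior."""
    scores = log_posteriors(priors, probs, x)
    return max(scores, key=scores.get)


# Toy binary features, e.g. (fever, dyspnea, recent chemotherapy); label 1 = outcome.
X = [[1, 1, 1], [1, 1, 0], [0, 1, 1], [0, 0, 0], [1, 0, 0], [0, 0, 1]]
y = [1, 1, 1, 0, 0, 0]
priors, probs = fit_bernoulli_nb(X, y)
print(predict_nb(priors, probs, [1, 1, 1]), predict_nb(priors, probs, [0, 0, 0]))  # 1 0
```

Working in log space avoids underflow when many small per-feature probabilities are multiplied.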

The k-Nearest Neighbors (kNN) classifier is also applied to this dataset. It determines the label of a new input by a majority vote among the labels of its k nearest samples in the training set. The parameter k is specified by the user and is typically small; after trying several values, we set k to seven.
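A minimal kNN sketch in plain Python with k = 7, assuming Euclidean distance on standardized features (the distance metric used in the study is not stated). The function name and toy points are hypothetical.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, x, k=7):
    """Majority vote among the k training samples closest to x (Euclidean distance)."""
    dists = sorted((math.dist(xi, x), yi) for xi, yi in zip(train_X, train_y))
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]


# Two toy clusters of five points each.
train_X = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5),
           (5, 5), (5, 6), (6, 5), (6, 6), (5.5, 5.5)]
train_y = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
print(knn_predict(train_X, train_y, (1, 1)))  # 0
print(knn_predict(train_X, train_y, (5, 5)))  # 1
```

With two classes, an odd k such as 7 guarantees that the majority vote cannot tie.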

The Random Forest (RF) method applies bootstrap aggregating (bagging) to decision trees: each tree is trained on a bootstrap sample of the training set, and the forest aggregates their predictions. This helps because the prediction of a single tree is very sensitive to noise in its training set, whereas the average of many trees is not, provided the trees are not strongly correlated; consequently, the forest reduces the variance of the model. One tuning parameter is the maximum depth of each tree. We chose its value based on our training set and set it to 13; if it is left unspecified, nodes are expanded until all leaves are pure or until all leaves contain fewer than the minimum number of samples required to split an internal node.
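A sketch of this setting with scikit-learn's `RandomForestClassifier`, assuming the model was built with that library (the text does not say). The data are a synthetic stand-in generated with `make_classification` to match the 305/34 split and 28 features, not the study data; only `max_depth=13` comes from the text, and every other argument is a library default or an illustrative choice. In scikit-learn, each split also considers only a random subset of the features, which further decorrelates the trees.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic stand-in for a 339-patient, 28-feature table (NOT the study data).
X, y = make_classification(n_samples=339, n_features=28, n_informative=5,
                           random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=34,
                                                    random_state=0)

# max_depth=13 mirrors the reported setting; each tree sees a bootstrap sample.
rf = RandomForestClassifier(max_depth=13, random_state=0)
rf.fit(X_train, y_train)
y_pred = rf.predict(X_test)
```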

Finally, we use SVM. Given N-dimensional feature vectors, the SVM finds the hyperplane in N-dimensional space that separates the two classes with the maximum margin, that is, with the largest distance to the nearest data points of either class.
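For reference, the maximum-margin hyperplane can be written as the following optimization problem, assuming labels \(y_i \in \{-1, +1\}\) and \(n\) training vectors \(\mathbf{x}_i \in \mathbb{R}^N\). This is the textbook hard-margin formulation; the kernel, soft-margin penalty, and other SVM settings used in the study are not stated in the text.

\[
\min_{\mathbf{w},\,b}\ \frac{1}{2}\lVert\mathbf{w}\rVert^2
\quad\text{subject to}\quad
y_i\left(\mathbf{w}^\top\mathbf{x}_i + b\right) \ge 1,\qquad i = 1,\dots,n.
\]

The margin width is \(2/\lVert\mathbf{w}\rVert\), so minimizing \(\lVert\mathbf{w}\rVert\) maximizes the margin, and a new point \(\mathbf{x}\) is classified by the sign of \(\mathbf{w}^\top\mathbf{x} + b\). When the classes are not linearly separable, the soft-margin variant adds slack variables \(\xi_i \ge 0\) to the constraints and a penalty \(C\sum_i \xi_i\) to the objective.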

## Prediction Performance Evaluation Metrics:

To explain these measures, we first need some preliminary definitions. Since we have a binary classification problem, each label is either positive or negative. A true positive (TP) is a correctly predicted positive label, and a true negative (TN) is a correctly predicted negative label. A false positive (FP) is a data point predicted as positive whose actual label is negative. Finally, a false negative (FN) is a data point whose actual class is positive but which the algorithm predicts as negative.
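The four counts can be tallied directly from true and predicted labels. A minimal sketch in plain Python, assuming labels are coded 1 (positive) and 0 (negative); the function name and toy labels are hypothetical.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (TP, TN, FP, FN), treating `positive` as the positive label."""
    tp = tn = fp = fn = 0
    for t, p in zip(y_true, y_pred):
        if p == positive:
            if t == positive:
                tp += 1  # predicted positive, actually positive
            else:
                fp += 1  # predicted positive, actually negative
        else:
            if t == positive:
                fn += 1  # predicted negative, actually positive
            else:
                tn += 1  # predicted negative, actually negative
    return tp, tn, fp, fn


y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 1, 0, 0, 1]
print(confusion_counts(y_true, y_pred))  # (3, 3, 1, 1)
```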

One of the most common evaluation measures is accuracy, defined as the ratio of correctly predicted items to the total number of items, i.e., \(\text{Accuracy} = \frac{TP+TN}{TP+TN+FP+FN}\). When the label classes are imbalanced, accuracy is not a reliable evaluation criterion. For example, if 98 percent of the labels in a dataset are positive, the trivial algorithm that assigns positive to every input reaches an accuracy of 98 percent. In such cases, to understand the algorithm's performance, we need to know what fraction of each label class is predicted correctly.
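The 98 percent example can be reproduced in a few lines of plain Python: the trivial all-positive predictor scores high accuracy while recovering none of the negative class. The helper names are hypothetical.

```python
def accuracy(y_true, y_pred):
    """Fraction of items whose predicted label equals the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def class_recall(y_true, y_pred, label):
    """Fraction of samples whose true class is `label` that were predicted as `label`."""
    hits = [p == label for t, p in zip(y_true, y_pred) if t == label]
    return sum(hits) / len(hits)


# 98 positive and 2 negative labels; a trivial model predicts positive for everyone.
y_true = [1] * 98 + [0] * 2
y_pred = [1] * 100
print(accuracy(y_true, y_pred))         # 0.98
print(class_recall(y_true, y_pred, 1))  # 1.0
print(class_recall(y_true, y_pred, 0))  # 0.0
```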

The second measure, precision, is the ratio of correctly predicted positive labels to the total number of predicted positive labels in the test set, i.e., \(\text{Precision} = \frac{TP}{TP+FP}\).

Recall is the number of correctly predicted positive labels divided by the total number of actual positive labels, i.e., \(\text{Recall} = \frac{TP}{TP+FN}\). In binary classification, the recall of the positive class is called sensitivity and the recall of the negative class is called specificity. Finally, the F1-score combines the two as the harmonic mean of precision and recall:

\[\text{F1-score} = \frac{2 \cdot \text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}} = \frac{TP}{TP + \frac{1}{2}(FP + FN)}.\]

This score is more informative than accuracy when the classes are imbalanced. Accuracy is more appropriate when FPs and FNs carry the same weight, and if the costs of FPs and FNs differ greatly, both precision and recall should be taken into account.
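These formulas can be checked numerically with a short plain-Python sketch. The counts are hypothetical, and returning 0.0 when a denominator is zero is an assumed convention (scikit-learn exposes a similar choice through the `zero_division` argument of `precision_score`, `recall_score`, and `f1_score`).

```python
def safe_div(num, den):
    """Return num / den, or 0.0 when the denominator is zero."""
    return num / den if den else 0.0


def metrics(tp, tn, fp, fn):
    precision = safe_div(tp, tp + fp)
    recall = safe_div(tp, tp + fn)        # sensitivity: recall of the positive class
    specificity = safe_div(tn, tn + fp)   # recall of the negative class
    f1 = safe_div(2 * precision * recall, precision + recall)  # harmonic mean
    return precision, recall, specificity, f1


# Hypothetical confusion counts.
tp, tn, fp, fn = 8, 20, 2, 4
precision, recall, specificity, f1 = metrics(tp, tn, fp, fn)
# The harmonic-mean form agrees with TP / (TP + (FP + FN) / 2) = 8 / 11.
print(round(precision, 4), round(recall, 4), round(specificity, 4), round(f1, 4))
```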

## Feature Selection:

To identify the features most important for predicting each outcome, we obtained a list of the five most important features per outcome. We applied SelectKBest, a univariate feature selection method that scores each feature individually against the outcome and retains the k highest-scoring features (here, k = 5).
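A sketch of this step with scikit-learn's `SelectKBest`, assuming the ANOVA F-test `f_classif` as the scoring function (the library default; the text does not name the score function). The data come from `make_classification` and the feature names are hypothetical placeholders, not the study variables.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif

# Synthetic stand-in for the training set (NOT the study data).
X, y = make_classification(n_samples=305, n_features=28, n_informative=5,
                           random_state=0)
feature_names = [f"feature_{i}" for i in range(X.shape[1])]  # hypothetical names

selector = SelectKBest(score_func=f_classif, k=5)
selector.fit(X, y)

# Rank the retained features by their univariate score, highest first.
top5 = [feature_names[i] for i in np.argsort(selector.scores_)[::-1][:5]]
print(top5)
```

Running the selector once per outcome (ICU admission, discharge status) yields a separate top-five list for each.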

## Ethics Statement:

This study was approved by the Research Ethics Committees of the Vice-Chancellor in Research, Medical University of Isfahan (Approval ID: IR.MUI.RESEARCH.REC.1399.004).