Analysis and prediction on postoperative indicators
Our first question is whether the range of postoperative indicators can be predicted at the moment the surgery is completed. As the number of predictors is relatively high compared to the number of samples, and only a small subset of the predictors should be important for each postoperative indicator, we randomly chose 10 predictors as inputs for each model, as described in the Methods section (‘Machine learning models construction’).
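As a concrete illustration of this subsampling scheme, the sketch below draws a random 10-predictor subset from a table of preoperative parameters and surgery information. It is a minimal sketch only: the DataFrame name, the use of pandas, and the fixed random seed are our assumptions, not part of the original pipeline.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # fixed seed for reproducibility (assumption)

def sample_predictor_subset(X: pd.DataFrame, n_predictors: int = 10) -> pd.DataFrame:
    """Randomly choose n_predictors columns of X to serve as model inputs."""
    chosen = rng.choice(X.columns, size=n_predictors, replace=False)
    return X[list(chosen)]

# Example: generate one candidate input set per model to be trained.
# X_pre is assumed to hold preoperative parameters and surgery information.
# subsets = [sample_predictor_subset(X_pre) for _ in range(1000)]
```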
The performance of prediction varies across algorithms. As the training and verification data differ for each model, we evaluated the overall performance of each algorithm by the proportion of ‘good’ models (defined in the Methods section, ‘Machine learning parameter analysis’). The rationale for this criterion is that a ‘good’ ML model should at least predict better than a non-medical person, who might simply guess the median. In general, Support Vector Regression shows the best prediction performance, followed by Neural Network Regression, while Linear Regression shows the worst (Table 3).
Table 3
Comparison of prediction performance across all regression algorithms.
Algorithm | Linear Regression | Regression Tree | Support Vector Regression | Neural Network Regression |
Number of responses for which the proportion of ‘good’ models is highest | 1 | 2 | 19 | 6 |
Number of responses for which the proportion of ‘good’ models is lowest | 19 | 7 | 0 | 2 |
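For the regression tasks, the ‘good’ label is what the comparison in Table 3 counts. The exact criterion is defined in the Methods; purely as an illustration of the idea that a useful model must beat a median-guessing baseline, such a check might look like the sketch below, where the use of mean absolute error and of the training-set median are our assumptions.

```python
import numpy as np

def beats_median_baseline(y_train, y_test, y_pred) -> bool:
    """Illustrative check: does a fitted model beat a constant median guess?

    The exact 'good'-model criterion is defined in the Methods; the use of
    mean absolute error and of the training-set median are assumptions here.
    """
    y_test = np.asarray(y_test, dtype=float)
    baseline = np.full_like(y_test, np.median(y_train))
    mae_model = np.mean(np.abs(y_test - np.asarray(y_pred, dtype=float)))
    mae_baseline = np.mean(np.abs(y_test - baseline))
    return mae_model < mae_baseline
```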
The performance of the predictions also depends on the responses. A higher proportion of ‘good’ models is obtained for the predictions of platelet level (Hplt, Lplt), hemoglobin level (Hhgb, Lhgb), red blood cell count (Hrbc, Lrbc), and systemic inflammation index level (Hsii, Lsii) in the first three days after surgery, indicating that these postoperative indicators may be easier to predict. This is reasonable, as TEVAR is a minimally invasive surgery without significant blood loss, and hence has little influence on hemoglobin, platelet and red blood cell levels. On the other hand, postoperative body temperature (Htemp, Ltemp) and lymphocyte percentage (Hlympprop, Llympprop) appear difficult to predict. Many other postoperative indicators, such as white blood cell count (Hwbc, Lwbc), neutrophil count (Hneutcount, Lneutcount), lymphocyte level (Hlymp), platelet-lymphocyte ratio (Hplr, Lplr) and systemic inflammation response index (Hsiri, Lsiri), show large differences between algorithms. This may indicate that predicting these indicators is more complicated and may require modeling the interactions between inputs (linear and tree regression here treat each input largely individually, while Support Vector and Neural Network Regression capture more complex interactions) (Fig. 3, Table S3). We also examined the performance of individual models by selecting the ‘best’ model (described in the Methods section, ‘Digital twin prediction’) among all ‘good’ models under each algorithm (Table S4). The results align with this finding: predictions of the easier responses are relatively more accurate than those of the harder ones (Fig. S1-S4, Table S3).
Next, we focused on finding important parameters for the postoperative indicators. We expect the prediction of a response to be more accurate when important parameters are selected as inputs. Consequently, the importance of a parameter for a given postoperative indicator can be assessed by the frequency with which it appears in ‘good’ models (details described in the Methods section, ‘Machine learning parameter analysis’). The frequencies of each predictor under different algorithms are shown in Tables S5-S8.
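A minimal sketch of this frequency-based importance measure is given below, assuming each trained model records the 10 predictors it was given and whether it met the ‘good’ criterion; the record structure and field names are illustrative.

```python
from collections import Counter

def predictor_frequencies(models):
    """Count how often each predictor appears among 'good' models.

    `models` is assumed to be a list of records such as
    {"predictors": ["preplt", "age", ...], "is_good": True},
    one per trained model for a single response.  Higher counts are read
    as higher importance of that predictor for the response.
    """
    counts = Counter()
    for m in models:
        if m["is_good"]:
            counts.update(m["predictors"])
    return counts.most_common()
```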
There are some interesting findings from the parameter analysis. From the data perspective, some responses have similar important parameters identified across different algorithms (Hplt, Lplt, Hrbc, Lrbc, Hhgb, Lhgb, Lplr, Hsii, Lsii), while others show high divergence in the important parameters identified (Htemp, Ltemp, Hneut, Lneut, Hneutcount, Llympprop, Hmonprop, Lmonprop) (Fig. 4, Table S9-S10). The distribution of selected important parameters is related to the prediction performance on a given response (Fig. 3, Table S3), which suggests that some postoperative indicators are hard to predict because effective important parameters are hard to identify. We can further infer that either the effects of parameters on these postoperative indicators are highly complex and interactive, or the truly important parameters are missing from our current measurements. In future studies, we may collect data from larger matched patient cohorts under a controlled study setting to reduce the impact of inherent data deviation.
From the medical perspective, the majority of single postoperative indicators are strongly correlated with their corresponding or highly related preoperative parameters (Table S9), and the constituent parameters are identified for the postoperative composite inflammation indicators (“preplt” for “Hsii” and “Lsii”, “monocyteco” for “Hsiri” and “Lsiri”). These results are in line with common medical knowledge, which validates our ML-based analysis. Moreover, there are some interesting findings. Coverage of the left subclavian artery during TEVAR (“lsacoverag”) is found to be an important indicator for postoperative lymphocyte proportion (“Hlympprop”) and neutrophil count (“Lneut”) under several algorithms. Length of intensive care unit (ICU) stay (“lohicu”) is found to have an important impact on postoperative neutrophil count, lymphocyte proportion and lymphocyte count (“Hneut”, “Llympprop” and “Llymp”). These relationships have not yet been studied or reported, but patients requiring LSA coverage and a longer ICU stay may have more advanced disease or present with more comorbidities, including organ ischemia and pleural effusion, which may affect the indicated postoperative indicators. Conversely, the changes in postoperative parameters may result from differences in operative design and ICU stay time. Considering the complexity of the context and the various confounding factors, the clinical relevance of our findings should be explored and evaluated carefully from the medical perspective, which could be a future study direction.
Analysis and prediction on EL and long-term outcome
Our prediction models of postoperative indicators serve as a proof of concept for our method. More clinically relevant problems are the analysis and prediction of EL and the long-term outcome of the patients. We again focused on the proportion of ‘good’ models. The rationale behind this criterion (defined in the Methods section, ‘Machine learning parameter analysis’) is that a ‘good’ ML model should provide meaningful clinical information, meaning the chance of an actual ‘positive’ should be higher when the model predicts ‘positive’ than when it does not. The results indicate that, among all algorithms, Neural Network Classification and Classification Tree perform relatively better than Logistic and Support Vector Classification, as more ‘good’ models are trained under these two algorithms (Fig. 5, Table S11). This differs from the prediction of postoperative indicators, suggesting that the choice of algorithm could be case-specific. Although the long-term outcome requires prediction over a much longer time span than EL, it is surprising that the former appears much easier to predict than the latter (Fig. 5, Table S11). This is also the case when we focus on individual models, as the accuracies of the ‘best’ models in predicting long-term outcome are higher than those for EL under all algorithms (Table S12). However, comparing prediction performance among different algorithms may not be meaningful here, as the ‘positive’ sample size in the testing dataset is too small (Fig. S5-S8).
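Reading P as the positive predictive value and FOR as the false omission rate (the column labels used in Table 4), this criterion amounts to requiring P > FOR, i.e. P/FOR > 1: a ‘positive’ call must raise the probability of a true positive relative to a ‘negative’ call. The sketch below computes this ratio from confusion-matrix counts; the interpretation of the two quantities is our reading of the Methods, not a quotation of them.

```python
def p_over_for(tp: int, fp: int, tn: int, fn: int) -> float:
    """P/FOR ratio from confusion-matrix counts (illustrative).

    P   = tp / (tp + fp)  -- chance of a true positive given a 'positive' call
    FOR = fn / (fn + tn)  -- chance of a true positive given a 'negative' call
    A ratio above 1 means a 'positive' prediction carries clinical information.
    """
    p = tp / (tp + fp) if (tp + fp) else float("nan")
    f_or = fn / (fn + tn) if (fn + tn) else float("nan")
    return p / f_or if f_or else float("nan")

# Example: tp=8, fp=7, tn=40, fn=6  ->  P ~ 0.53, FOR ~ 0.13, P/FOR ~ 4.1
```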
Our final clinical interest lies in exploring important influencing parameters for both EL and long-term outcome. Again, we focused on the frequencies with which predictors appeared in ‘good’ models, as described in the previous section. The results are shown in Tables S13-S16. The distribution of important parameters identified by different algorithms follows the same pattern as our finding for the postoperative indicators: a more concentrated distribution of important parameters reflects better overall prediction (Fig. 5–6, Table S11, S17-S18).
As the important parameters found for EL vary widely across algorithms, it is hard to conclude which parameters are truly important for EL (Table S17). From the clinical perspective, however, there are several important findings. Height is found to be a significant indicator for EL under Classification Tree. This impact has not been considered in the past. Although further medical study should be conducted to analyze and validate this finding, we propose that the reason may lie in the effect of height on aortic diameter, aortic arch tortuosity and arch curvature angulation, which may influence EL together with pathology type. Another important indicator is the time after TBAD onset (“onsettime”) under Logistic classification. While currently no conclusive connection can be made between TBAD onset time and postoperative outcomes, including complications and long-term outcome [16], this could be a direction for further studies. As for long-term outcome, “dproximal1”, “age” and “weight” could be the most important impacting factors. The presence of “age” as an indicator for long-term outcome is expected, which adds credibility to the method applied in this study. The importance of proximal endograft diameter (“dproximal1”) may be related to patient age, since older patients have a more dilated aorta and thus require larger aortic endografts, although this relevance remains uncertain.
Parameter analysis through traditional statistical methods
While we have developed a new parameter analysis method for the indicated responses, one that depends purely on data and algorithms without prior medical knowledge, we need to examine how it performs compared to traditional statistical analysis. To this end, we performed linear regression for the postoperative indicators and logistic regression for EL and long-term outcome, as described in the Methods section (‘Statistical analysis’). Results are shown in Table S19. First, our findings on “lsacoverag” and “lohicu” are also confirmed by the statistical linear regression analysis, so our previous conclusions drawn from these two parameters should be robust. Next, we notice that the overall patterns of important parameters found for individual postoperative indicators differ substantially between the ML-based method and the statistical method (Table S9, Table S19). While the ML-based method identified the corresponding preoperative indicators as important parameters for many postoperative indicators, this proportion is much lower with the statistical method. Judged against common medical knowledge, the results from our ML-based method therefore fit current medical findings better. Moreover, the statistical regressions on “Lneut” and “isoutcome” failed to find any important parameters. Taken together, we think our method could outperform the traditional statistical method while requiring no medically based assumptions prior to data analysis.
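For reference, a minimal sketch of the kind of conventional analysis used in this comparison is shown below, assuming the data sit in a pandas DataFrame and using statsmodels; the p < 0.05 threshold and the handling of the intercept are illustrative assumptions rather than the exact procedure of the Methods.

```python
import statsmodels.api as sm

def significant_parameters(X, y, binary_response=False, alpha=0.05):
    """Fit a conventional regression and return predictors with p < alpha.

    Ordinary least squares for continuous postoperative indicators and
    logistic regression for binary responses such as EL or long-term outcome.
    The alpha threshold of 0.05 is an assumption for illustration.
    """
    X_const = sm.add_constant(X)
    if binary_response:
        result = sm.Logit(y, X_const).fit(disp=0)
    else:
        result = sm.OLS(y, X_const).fit()
    pvalues = result.pvalues.drop("const", errors="ignore")
    return pvalues[pvalues < alpha].sort_values()
```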
Digital twin iterative prediction on EL and long-term outcome
Clinically, it would be helpful to know EL and the long-term outcome before surgery, given only the preoperative examinations, and to design the surgical approach accordingly. To represent this scenario, we built digital twin (DT) models that combine the postoperative indicator models with the EL and long-term outcome models. These models predict EL and long-term outcome from preoperative indicators and surgery information through the selected models (details described in the Methods section, ‘Digital twin prediction’; Fig. 2). As the accuracies vary little across algorithms, we focus more on the value of P/FOR (for the reason given in the Results section, ‘Analysis and prediction on EL and long-term outcome’).
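A schematic sketch of this two-stage chain is given below. It assumes that each selected regression model and the selected classifier keep a record of the predictor subset they were trained on; the dictionary layout, field names and pandas interface are our illustrative assumptions, not the actual implementation.

```python
import pandas as pd

def digital_twin_predict(x_pre: pd.DataFrame, reg_models: dict, clf: dict):
    """Two-stage DT prediction (schematic sketch).

    Stage 1: predict each postoperative indicator from preoperative
             parameters and surgery information with its selected model.
    Stage 2: feed the preoperative data plus the predicted postoperative
             indicators into the selected classifier for EL / outcome.
    """
    x_aug = x_pre.copy()
    for indicator, model in reg_models.items():
        # Each regression model was trained on its own 10-predictor subset;
        # "predictors" and "fit" are assumed bookkeeping fields.
        cols = model["predictors"]
        x_aug[indicator] = model["fit"].predict(x_pre[cols])
    clf_cols = clf["predictors"]  # predictor subset used by the classifier
    return clf["fit"].predict(x_aug[clf_cols])
```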
The results align with our previous finding that the prediction of ‘isoutcome’ is generally easier (more accurate) than that of ‘iseltype’ (Table 4, Fig. S9-S16). Apart from a few groups of DT models that failed to find any ‘true positive’ results (Table 4, Fig. S9, S12), all other groups show meaningful P/FOR values (greater than 1). Some predictions under the same classification algorithm give identical results across different regression algorithms (Table 4, Fig. S10-S11, S14), because the inputs of those classification models contain no postoperative indicators. The other groups successfully achieve meaningful predictions of ‘iseltype’ and ‘isoutcome’ from preoperative indicators, surgery information and predicted postoperative indicators (Table 4, Fig. S13, S15-S16). Based on these results, we conclude that our attempt to predict a patient's EL and long-term outcome prior to surgery is successful.
Table 4
Summary of DT prediction performance.
Classification | Regression | iseltype: Accuracy | iseltype: FOR | iseltype: P | iseltype: P/FOR | isoutcome: Accuracy | isoutcome: FOR | isoutcome: P | isoutcome: P/FOR |
Logistic | Linear | 76.82% | 0.23 | NaN | NaN | 86.89% | 0.13 | 0.78 | 6.08 |
Logistic | Tree | 76.82% | 0.23 | NaN | NaN | 86.89% | 0.13 | 0.78 | 6.08 |
Logistic | Support Vector | 76.82% | 0.23 | NaN | NaN | 86.89% | 0.13 | 0.78 | 6.08 |
Logistic | Neural Network | 76.47% | 0.23 | 0.00 | 0.00 | 86.89% | 0.13 | 0.78 | 6.08 |
Tree | Linear | 77.16% | 0.21 | 0.53 | 2.48 | 83.90% | 0.15 | 0.29 | 1.95 |
Tree | Tree | 77.16% | 0.21 | 0.53 | 2.48 | 85.02% | 0.15 | NaN | NaN |
Tree | Support Vector | 77.16% | 0.21 | 0.53 | 2.48 | 83.90% | 0.14 | 0.36 | 2.59 |
Tree | Neural Network | 77.16% | 0.21 | 0.53 | 2.48 | 85.02% | 0.15 | NaN | NaN |
Support Vector | Linear | 77.16% | 0.20 | 0.52 | 2.59 | 85.02% | 0.14 | 0.50 | 3.67 |
Support Vector | Tree | 79.24% | 0.20 | 0.67 | 3.37 | 85.02% | 0.14 | 0.50 | 3.67 |
Support Vector | Support Vector | 77.16% | 0.21 | 0.53 | 2.49 | 85.02% | 0.14 | 0.50 | 3.67 |
Support Vector | Neural Network | 76.82% | 0.20 | 0.50 | 2.46 | 85.02% | 0.14 | 0.50 | 3.67 |
Neural Network | Linear | 78.55% | 0.20 | 0.62 | 3.07 | 85.77% | 0.13 | 0.60 | 4.54 |
Neural Network | Tree | 78.20% | 0.20 | 0.60 | 2.93 | 84.64% | 0.14 | 0.44 | 3.19 |
Neural Network | Support Vector | 77.85% | 0.21 | 0.58 | 2.79 | 86.14% | 0.13 | 0.64 | 4.94 |
Neural Network | Neural Network | 76.82% | 0.21 | 0.50 | 2.34 | 85.77% | 0.13 | 0.60 | 4.54 |