Users’ Responses to a Machine-Learning Decision Support Model: A Randomized Controlled Trial for Prostate-Specific Antigen Screening


Background: Although a shared decision-making (SDM) process integrates patient values with evidence-based medicine, patients' anxiety and decisional conflict remain. We therefore propose a new decision-making model that integrates a machine-learning algorithm and investigate its feasibility for reducing anxiety, reducing decisional conflict, and increasing satisfaction after a decision is made.

Methods: We enrolled participants willing to undergo the SDM process for a prostate-specific antigen (PSA) blood test and collected data on age; PSA knowledge; whether they had a friend with prostate cancer; perceived risk of prostate cancer; International Prostate Symptom Score and Importance for Physiological and Psychological Impact in PSA Testing scores; personal values; and their final decision ("Accept" the PSA blood test or "Not now"). This dataset was used to train the following machine-learning models: multilayer perceptron neural network, random forest (RF), extreme gradient boosting, support vector machine, and deep learning neural network. Uniform parameter tuning and model comparison were implemented. The best model was used in a randomized controlled trial (RCT), in which we measured the effects of personalized suggestions generated by the machine-learning model on anxiety, decision satisfaction, and decisional conflict.

Results: RF was the best algorithm for our dataset of 507 subjects (mean AUC: 0.8801; mean ACC: 0.8313; maximum ACC: 0.8933). We therefore used the RF model in the RCT, with 185 and 182 subjects in the machine-learning suggestion group (MLSG) and control group (CG), respectively. MLSG participants were calmer, more content, and less worried than those in the CG. They also reported higher decision satisfaction and less decisional conflict, including more decision support, advice, assurance about the decision, ease of decision-making, and adherence to the decision. Moreover, participants who received an "Accept" suggestion from the model were more likely to make "Accept" their final decision than CG participants (50.75% vs 24.18%, χ2 = 16.07, p < 0.001). The "Not now" suggestion followed a similar trend.

Conclusions: A highly accurate machine-learning model was constructed using our methods. Personalized suggestions generated by this model increased satisfaction and reduced anxiety and decisional conflict. Participants tended to adopt the machine-learning suggestion as their final decision.

Trial name: Shared Decision Making: Decision Tree and Artificial Neural Network Assisted Decision Aid for PSA Screening

Trial registration: ChiCTR, ChiCTR2000034126. Registered 25 June 2020, retrospectively registered, http://www.chictr.org.cn/ChiCTR2000034126

Emotional stress can sometimes compromise logical thinking during the SDM process, possibly leading to an unwise choice [18]. Furthermore, the complexity of considerations and the numerous priorities involved in decision-making for certain diseases can become a major challenge for a patient trying to optimize their choice. In certain circumstances, such as a newly diagnosed cancer, the patient encounters a very complex decision context regarding treatment options.
With no prior experience, decision support is in high demand [19]. Similarly, Brenda et al. proposed the following two major barriers to implementing SDM: too many problems to consider when making a decision and patients' lack of trust in their physicians [20]. Therefore, a novel method must be devised that can offer patients suggestions and select the best choice based entirely on the experiences of peers who have confronted similarly complex situations and difficult decisions.
Additional challenges have also been encountered during SDM in prostate-specific antigen (PSA) screening, although several studies have described the success of SDM in prostate cancer screening with PSA [11,[21][22][23][24]. First, the usefulness of PSA screening varies considerably among trials. The Prostate, Lung, Colorectal, and Ovarian (PLCO) Cancer Screening Trial demonstrated that prostate cancer screening did not lower cancer-specific mortality [25,26]. In that study, 76,693 men aged 55-74 years were randomly assigned to screening or usual care groups [25]. No significant differences were observed in the primary outcome of prostate cancer mortality (relative risk [RR]: 1.04, confidence interval [CI]: 0.87-1.24) or overall mortality (RR: 0.98, CI: 0.95-1.00) [25]. However, a subsequent modeling study focusing on differences in screening intensity between PLCO and the ERSPC (European Randomized Study of Screening for Prostate Cancer) showed that screening intensity affected the absolute reduction in prostate cancer mortality [27]. In 2013, a Cochrane meta-analysis of five RCTs with 341,342 participants revealed that although estimated prostate cancer-specific mortality was not reduced among men in the screening group (RR: 0.95, 95% CI: 0.86-1.07), cancer was diagnosed more often in this group (RR: 1.30, 95% CI: 1.02-1.65) [28]. Because of these controversies, participants have to make their own decision about whether to undergo screening according to their personal preferences and values. Furthermore, the decision is not simple. A qualitative study demonstrated that participants need to work out their priorities by weighing the following: the benefit of early cancer detection, harm from biopsy, false-negative results, false-positive results, overdiagnosis, overtreatment, and the emotional impact of all subsequent interventions [29].
These factors make the decision context quite complex. Phyllis Ahiagba et al. demonstrated the conflict between the selected decision and emotional distress among patients who had completed the entire standard course of SDM for PSA blood testing [16]. Therefore, decision support generated by algorithms might have a role in helping patients make a confident decision consistent with their personal values.
Researchers have constructed machine-learning models that effectively support clinical decision-making among health care professionals [30][31][32][33][34], including the diagnosis of diabetes mellitus and glaucoma, predicting the prognosis of patients with colorectal cancer, diagnosing fatty liver and optimizing liver segmentation, and clinical decisions for breast cancer treatment. However, the availability and accessibility of such models for non-health care professionals, and evidence of the effects, safety, and consequences of applying machine-learning decisions, which are essential for promoting artificial intelligence in clinical practice, are not yet well established. Few papers have focused on machine-learning suggestions or advice for patients, who are usually not health care professionals [35], and only a few others have described the effects and consequences of using decisions generated by machine-learning algorithms [36]. Therefore, research surveying non-health care professional users and their responses is urgently needed to promote machine-learning methods in clinical practice.
Several characteristics make the SDM of the PSA screening blood test very good material for developing a machine-learning mediated decision aid. First, the decision depends heavily on personal values rather than firm evidence. As mentioned earlier, the effect of PSA screening is controversial. If a firm recommendation for or against screening were already documented in clinical practice guidelines, personal values would play no role in the decision, as in the case of a Pap smear for cervical cancer [37]. Second, PSA screening has a complex decision context. Machine-learning models and logistic regression are quite good at handling cases with many variables and have been implemented successfully in several medical fields [38]. Third, the choice is binary ("Accept" to receive it or "Not now"). It is quite difficult to build a high-accuracy machine-learning model on a dataset with more than two categories; the binary nature of the PSA screening classification is conducive to establishing a model with higher accuracy. Fourth, enrolling participants for PSA screening is easy. A sufficient number of cases is essential for modeling machine-learning algorithms, and a large target population makes enrollment easier. Consequently, we hypothesized that a decision suggestion generated by a machine-learning algorithm is implementable and offers a solution to these complex concerns in the SDM of PSA screening. In this study, we used the SDM of PSA blood testing as our material. We attempted to construct a highly reliable and valid tool to detect personal preferences, and we gathered data to establish machine-learning models with adequately high accuracy. To observe users' responses, we recorded how participants reacted to suggestions yielded by a machine-learning model and whether these suggestions positively affected emotion, satisfaction, and decisional conflict. We also wanted to know how the suggestions from the machine-learning model influenced participants' final decisions.

Methods
Establishment of a preference detection tool and validation of questionnaires

We designed a questionnaire, "The Importance for Physiological and Psychological Impact" (IPPI), to capture participants' values and preferences regarding their concerns about PSA screening, based on studies by Eila et al., Lucie Rychetnik et al., and Ferrante et al. [24,29,39]. This 10-item questionnaire (Supplement 1; annotations A-J of Table 1) consists of two dimensions: four items evaluating physiological impact and six items evaluating psychological impact. It was validated by 10 urological specialists in Taiwan, each with more than 7 years of experience in urological clinical practice. We demonstrated strong reliability, with Cronbach's alpha of 0.838 and 0.900 for the physiological and psychological impact items, respectively, in the initial 300 participants. Factor analysis of the IPPI was performed using the maximum likelihood method, followed by varimax rotation, which extracted the following two factors: personal values focusing on physiological impacts (items 1-4) and personal values focusing on psychological impacts (items 5-10). We also validated the Chinese versions of the Decision Conflict Scale (DCS) [40] and the International Prostate Symptom Score (IPSS) [41]. Moreover, the short version of the Spielberger State-Trait Anxiety Inventory (SSTI) [42] and the Decision Satisfaction Questionnaire [43] were translated by two people who were fluent in both Chinese and English. The research tools are given in Supplements 1, 1a, and 2.
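For reference, Cronbach's alpha for a set of questionnaire items is k/(k-1) × (1 − Σ item variances / variance of the total score). A minimal Python sketch of this computation (the toy response matrix is hypothetical, not our data):

```python
import numpy as np

def cronbach_alpha(items):
    """Cronbach's alpha; items is an (n_respondents, k_items) score array."""
    items = np.asarray(items, dtype=float)
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1)        # variance of each item
    total_var = items.sum(axis=1).var(ddof=1)    # variance of the summed score
    return k / (k - 1) * (1 - item_vars.sum() / total_var)

# Perfectly consistent toy responses (every respondent shifts all items
# by the same amount), so alpha comes out as exactly 1.0
X = np.array([[1, 2, 3, 4],
              [2, 3, 4, 5],
              [3, 4, 5, 6],
              [4, 5, 6, 7]])
alpha = cronbach_alpha(X)
```

In practice the function would be applied separately to the four physiological items and the six psychological items.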

Participant enrollment
This study received a full ethical review and was approved by the Institutional Review Board (IRB) of Fu Jen Catholic University (C105142). The Committee for Medical Research Ethics approved the study, and the requirement to obtain informed consent was waived. Men who visited St. Joseph Hospital between October 2017 and April 2018 were recruited randomly by YT Lin.
Participants had to meet the inclusion criteria of age ≥ 50 years and interest in information on SDM in PSA testing. The exclusion criteria were any psychological disorder, previous PSA testing, history of prostate surgery, or preexisting prostate cancer. We planned to recruit 900 subjects in total and rebuilt the machine-learning models after every 20 newly enrolled subjects. Once the study sample reached 520 participants, we found that the performance of the machine-learning models had reached a plateau, so the remaining 380 subjects were assigned to the RCT. Thirteen of the 520 subjects were excluded from the dataset because of poor data quality (i.e., 10 consecutive items with the same score, or a recognizable rhythm among the scores). Consequently, we used the data of 507 participants (the model establishing group, MEG) to train and verify the models. The 380 RCT participants were randomly assigned to the machine-learning suggestion group (MLSG) or the control group (CG) to evaluate the effect of suggestions from a machine-learning model (Fig. 1).

Procedure of SDM and data collection
Steps 1-7 were performed for the patients in the MEG, whereas steps 1-8 were performed for the patients in the RCT (MLSG and CG). These steps are outlined as follows.
Step 1. At the beginning of each interview, the participant was asked to provide data on age, marital status, education level, knowledge of PSA, whether they themselves had been diagnosed with, or had a friend diagnosed with, prostate cancer, and their perceived risk of prostate cancer; these were recognized as factors that influenced the willingness to undergo PSA testing in a previous study [24].
Step 2. The International Prostate Symptom Score (IPSS) questionnaire was then filled out by participants.
Step 3. Participants then watched a video-mediated decision aid detailing the impacts of PSA screening on their body and mind. The tutors offered additional explanations to answer participants' questions and discussed with participants which impacts of screening were important according to their personal values.
Step 4. A three-item brief test was administered to evaluate whether participants had retained the core information from the decision aid.
Step 5. Participants filled out the IPPI questionnaire and ranked the impacts by importance according to their values.
Step 6. During the RCT, we entered the scores from all previous questionnaires into the machine-learning interface, which was based on the best model established with the 507 MEG participants. The interface then returned the predicted choice for each participant. A return of "control group, no suggestions offered" meant the participant was assigned to the CG, whereas a return of "Accept and do it immediately" or "under consideration or refuse" meant assignment to the MLSG. The assignment was generated on the R platform from a random number table, with a 1:1 allocation ratio between the MLSG and CG.
Step 7. Participants made their own decisions.
Step 8. In the RCT, participants filled out the self-report questionnaires, including the SSTI, the Decision Satisfaction Questionnaire, and the DCS, without any interference.
It took 20-25 minutes to complete the entire procedure for one participant.
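The 1:1 random allocation in Step 6 was generated on the R platform from a random number table. A minimal Python stand-in for that allocation logic (the function name and seed are illustrative, not from the study):

```python
import random

def allocate_1to1(n_participants, seed=2020):
    """Shuffle an equal number of MLSG and CG labels (hypothetical seed)."""
    rng = random.Random(seed)
    labels = ["MLSG", "CG"] * (n_participants // 2)
    rng.shuffle(labels)  # random order, but exactly balanced 1:1
    return labels

labels = allocate_1to1(380)  # the 380 RCT participants
```

A block-randomized or random-number-table scheme as used in the study would differ in detail, but the 1:1 balance is the essential property.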
The datasets generated during the current study are available in the supplementary files named "raw data of MEG" and "raw data of MLSG vs CG".

Method of establishing a machine-learning model

Variables for modeling and data preprocessing

The models were developed using subjects' features as input variables, including age; education; marital status; knowledge about PSA; whether they themselves had, or had a friend with, prostate cancer; the IPSS evaluation; the IPPI and the first, second, and third concerns among the IPPI items; and their decision about PSA testing as the dependent variable. Logistic regression was used to explore the significance of each variable. We then used five machine-learning methods to build the models: the multilayer perceptron neural network (MLP), random forest (RF), extreme gradient boosting (XGB), support vector machine (SVM), and deep learning neural network (DNN). We used the MEG dataset to train and verify the models. For MLP, DNN, and SVM, we standardized the continuous variables and created dummy variables for the categorical variables; these steps were not necessary for RF and XGB.
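The preprocessing used for MLP, DNN, and SVM (standardizing continuous variables and dummy-coding categorical ones) can be sketched as follows; the column names and toy values are hypothetical, not our dataset:

```python
import pandas as pd

# Toy feature table (hypothetical values; the real features include age,
# education, marital status, KnowPSA, IPSS, and IPPI scores)
df = pd.DataFrame({
    "age": [52, 60, 67, 55],
    "ipss_total": [8, 20, 5, 12],
    "marriage": ["married", "widowed", "married", "single"],
})

continuous = ["age", "ipss_total"]
# z-score standardization of the continuous variables
z = (df[continuous] - df[continuous].mean()) / df[continuous].std(ddof=0)
# dummy (one-hot) coding of the categorical variables
dummies = pd.get_dummies(df[["marriage"]], prefix="marriage")
X = pd.concat([z, dummies], axis=1)  # model-ready design matrix
```

Tree-based learners such as RF and XGB can consume the raw columns directly, which is why these steps were skipped for them.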

Data splitting and parameter tuning
The sample size of 507 in our model-establishing dataset was relatively small. Therefore, we used the bootstrap method to eliminate bias during data splitting [44,45]. For each machine-learning model, 507 bootstrapping iterations were performed to form 507 pairs of training and test sets. We then calculated the accuracy and area under the ROC curve (AUC) for every pair of training and test sets (Supplement 3a).
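This bootstrap splitting scheme (each iteration draws n cases with replacement as the training set and uses the out-of-bag cases as the test set, then records ACC and AUC) can be sketched in Python with scikit-learn; the synthetic data and iteration count are illustrative, not our 507-subject dataset:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, roc_auc_score

rng = np.random.default_rng(0)
n = 120  # stand-in sample size
X = rng.normal(size=(n, 6))
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.5, size=n) > 0).astype(int)

def bootstrap_evaluate(X, y, n_iter=20, seed=42):
    """Mean ACC and AUC over bootstrap train / out-of-bag test splits."""
    rng = np.random.default_rng(seed)
    accs, aucs = [], []
    n = len(y)
    for _ in range(n_iter):
        train_idx = rng.integers(0, n, size=n)            # sample with replacement
        test_idx = np.setdiff1d(np.arange(n), train_idx)  # out-of-bag cases
        if len(np.unique(y[test_idx])) < 2:
            continue  # AUC is undefined if the test set has one class
        clf = RandomForestClassifier(n_estimators=100, random_state=0)
        clf.fit(X[train_idx], y[train_idx])
        accs.append(accuracy_score(y[test_idx], clf.predict(X[test_idx])))
        aucs.append(roc_auc_score(y[test_idx], clf.predict_proba(X[test_idx])[:, 1]))
    return np.mean(accs), np.mean(aucs)

mean_acc, mean_auc = bootstrap_evaluate(X, y)
```

In the study this loop ran for 507 iterations per algorithm; 20 iterations here keep the sketch fast.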
We used the mean AUC of the 507 training sets as the indicator for parameter tuning. The antlion optimizer (ALO) was used to find the parameters with the maximal mean AUC for every model. ALO has been used for parameter tuning in artificial neural networks and SVMs in previous studies [46] and has been shown to outperform the particle swarm optimizer and the ant colony optimizer [47]. The entire algorithm is well described in Seyedali's work [48]. We also used the design of experiments (DoE) method to define and narrow the search space for ALO (Supplement 3b). The parameters yielding the best mean AUC are listed in Supplement 4.
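ALO itself is a population-based metaheuristic and too long to reproduce here; as a minimal stand-in that shows the same objective (maximizing mean AUC over a defined search space), a random search over random-forest hyperparameters might look like this (the search space and data are hypothetical):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(1)
X = rng.normal(size=(150, 6))
y = (X[:, 0] - X[:, 2] + rng.normal(scale=0.5, size=150) > 0).astype(int)

# Hypothetical, DoE-style narrowed search space
search_space = {"n_estimators": [50, 100, 200], "max_depth": [2, 4, 8, None]}

best_auc, best_params = -np.inf, None
for _ in range(8):  # random draws from the search space
    params = {k: v[rng.integers(len(v))] for k, v in search_space.items()}
    clf = RandomForestClassifier(random_state=0, **params)
    # mean AUC is the tuning objective, as in the study
    auc = cross_val_score(clf, X, y, cv=5, scoring="roc_auc").mean()
    if auc > best_auc:
        best_auc, best_params = auc, params
```

ALO replaces the random draws with an adaptive search of the same objective surface; the loop structure is otherwise the same.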
Comparison among models and building a website-based user interface for the RCT

After finding the best parameters for these five machine-learning algorithms, we compared the ACCs and AUCs of the 507 models based on the 507 bootstrapped training sets across the five algorithms, using the nonparametric Kruskal-Wallis test for pairwise comparisons. The entire process of model comparison is documented in the study of Hui et al. [45]. The best-performing algorithm was used to establish a website-based interface. We trained and verified 2000 models using 2000 pairs of bootstrapped training and test sets, applying the best algorithm with the best parameters. The model with the best AUC among the 2000 was used as the classifier and uploaded onto the Shiny server. The user interface was also constructed and uploaded onto the Shiny server to establish a decision-predictive website (http://psachoice.shinyapps.io/psapsa/) [49].
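The pairwise performance comparison can be reproduced in outline with SciPy's Kruskal-Wallis test applied to the per-bootstrap AUC distributions of two algorithms (the AUC samples below are simulated for illustration, not the study's values):

```python
import numpy as np
from scipy.stats import kruskal

rng = np.random.default_rng(7)
# Simulated stand-ins for the 507 bootstrap AUCs of two algorithms
auc_rf  = rng.normal(loc=0.88, scale=0.02, size=507)
auc_svm = rng.normal(loc=0.84, scale=0.02, size=507)

# Nonparametric Kruskal-Wallis test on the two AUC distributions
stat, p = kruskal(auc_rf, auc_svm)
```

With five algorithms, the same call is repeated for each pair (with an appropriate multiple-comparison adjustment if desired).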

Randomized control trial
In total, 380 participants were randomized into the MLSG and CG. The MLSG received the decision suggestion generated by the decision-predictive website. The tutor explained the meaning of the prediction/suggestion to the participant, who could then take it into consideration in making their own decision about undergoing the PSA blood test. The CG received no prediction/suggestion. After making their final decision, participants in both groups were asked to fill out the self-report questionnaires, including the SSTI, the Decision Satisfaction Questionnaire, and the DCS, without any interference.
Software

RStudio (version 3.4.2) was used as the platform to implement the machine learning and ALO. The R packages used are listed in Supplement 5. SPSS (version 21) was used for the nonparametric statistics and chi-square tests.

Results
We compared the features of participants who chose "Accept" with those who chose "Not now" as their final decision in the MEG. Participants who chose "Accept" had higher KnowPSA scores than those who chose "Not now", and those who cared more about the physiological and psychological impacts were more likely to choose "Accept". By contrast, subjects who did not care about the positive and negative impacts of PSA screening were less likely to receive PSA screening. Age, RiskUThink, IPSS6, and item G of the IPPI were similar in both groups (Table 1a).
Participants who chose "Not now" were more likely to be widowed or to have a low education level. The most important priorities differed significantly between participants who chose "Accept" and those who chose "Not now" (Table 1b). In logistic regression, certain IPPI items (OR: 0.686, CI: 0.473-0.996, p = 0.048) were negative predictors of the "Not now" decision, whereas item G (OR: 1.452, CI: 1.028-2.049, p = 0.034) was a positive predictor. Regarding the first concern, the answers "A", "B", "D", and "J" were positively associated with "Not now", with "I" as the reference category. For the second concern, the answer "A" was positively associated with "Not now", with "C" as the reference category (Tables 1a, 1b).
The accuracy and AUC of the models constructed using the MEG dataset were calculated to find the model with the best performance. Initially, we performed logistic regression using the same unbiased data-splitting method [45], obtaining a mean accuracy of 0.8140, a highest accuracy among LR models of 0.8763, a mean AUC of 0.7947, and a highest AUC among LR models of 0.8939. The DoE-ALO parameter tuning method is clearly not applicable to logistic regression. For the machine-learning models, we found the best parameters for all five algorithms (Supplement 4). The DNN and RF models had the highest mean accuracies after parameter tuning (0.8429 and 0.8313, respectively); pairwise comparison showed no significant difference in accuracy between the DNN and MLP models. Moreover, the RF models had the highest mean AUC (0.8801) after parameter tuning. Because our MEG dataset is relatively imbalanced, the AUC is a better performance measure than accuracy, according to a study published by Charles et al. [50]. Accordingly, we chose the model with the best mean AUC, that is, the RF model, as the decision-suggesting tool in our study (Fig. 2). The accuracy of the best model with the best parameters among the models constructed using 2000 iterations of bootstrapping was 0.9000.
Thus, the RF model was used to build the user interface for the RCT.
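The point from Charles et al. [50], that AUC is more informative than accuracy on an imbalanced dataset, can be illustrated with a degenerate classifier that always predicts the majority class (the 85/15 class split here is hypothetical):

```python
import numpy as np
from sklearn.metrics import roc_auc_score

# Imbalanced labels: 85 negatives, 15 positives
y_true = np.array([0] * 85 + [1] * 15)
# Degenerate model: a constant score, i.e., it always predicts the majority class
y_score = np.zeros(100)

acc = np.mean((y_score > 0.5).astype(int) == y_true)  # looks good: 0.85
auc = roc_auc_score(y_true, y_score)                  # reveals no discrimination: 0.5
```

Accuracy rewards the majority-class guess, while the AUC of 0.5 exposes that the model has no discriminative value; hence the choice of mean AUC for model selection.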
We randomized participants into the MLSG and CG. In total, 380 participants completed all steps of the experiment. Five participants in the MLSG and eight in the CG were excluded because of the poor answer quality described earlier in the Methods section.
There were no important harms or unintended effects in either group. Participants in both groups were similar in age. The DCS is a five-point questionnaire with a reverse scoring system, that is, strongly agree: score 0 and strongly disagree: score 5. We found that participants in the MLSG perceived that they had more decision support. We also examined the influence of machine-learning suggestions on participants' final decisions. In the CG, 24.18% of participants chose "Accept" and 75.82% chose "Not now" as their final decision. Participants who received an "Accept" suggestion from the machine-learning model had a higher chance of making "Accept" their final decision than CG participants (50.75% vs 24.18%, χ2 = 16.07, p < 0.001). Similarly, participants who received a "Not now" suggestion tended to make it their final decision, although this difference was not statistically significant (75.82% vs 82.20%, χ2 = 1.72, p = 0.190; Table 3). Table 3b compares the final decisions of subjects who received the "Not now" suggestion with those of CG subjects who received no suggestion.
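The chi-square comparison of final decisions can be reproduced from a 2×2 contingency table. The counts below are reconstructed so as to be consistent with the reported percentages (50.75% and 24.18%) and are not taken directly from the paper's tables:

```python
import math

# rows: MLSG participants who received an "Accept" suggestion; CG participants
# cols: final decision "Accept"; final decision "Not now"
obs = [[34, 33],    # 34/67  = 50.75% chose "Accept"
       [44, 138]]   # 44/182 = 24.18% chose "Accept"

row = [sum(r) for r in obs]
col = [sum(c) for c in zip(*obs)]
n = sum(row)

# Pearson chi-square statistic (no continuity correction)
chi2 = sum((obs[i][j] - row[i] * col[j] / n) ** 2 / (row[i] * col[j] / n)
           for i in range(2) for j in range(2))
# p-value: survival function of the chi-square distribution with df = 1
p = math.erfc(math.sqrt(chi2 / 2))
```

With these reconstructed counts the statistic matches the reported χ2 = 16.07, and the p-value is well below 0.001.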

Discussion
Our study demonstrated that scores on the IPSS items differed significantly between MEG participants whose final decision was "Accept" and those whose decision was "Not now." This indicates that IPSS items can serve as input data for building machine-learning models. Jeanne et al. explored the factors that influence men's decisions to undergo prostate cancer screening [39] and found that men were prompted to undergo PSA screening by urinary symptoms, which is consistent with our univariate findings on the IPSS items.
The IPPI is a questionnaire we designed to detect participants' attitudes toward the psychological and physiological impacts of PSA screening. We found that the more seriously participants took the psychological and physiological impacts, the more likely they were to accept PSA screening. A prior study demonstrated that a perception of high risk is an important factor in undergoing PSA screening [39]. Our findings might result from our decision aid elevating risk perception in some subjects, making them willing to undergo PSA screening; these participants also tended to take the impacts seriously. In logistic regression, we observed items A, D, F, G, and J to be predictors of the final decision, as were the first and second concerns. It is difficult to find quantitative studies that have measured the effect of these psychological and physiological impacts on decision-making, and the causal relationships between these concerns and the final decision remain ambiguous; further studies are needed to clarify these interactions. In factor analysis, the physiological and psychological impact items grouped separately, rather than splitting into positive versus negative impacts. This means participants tended to assign similar importance levels to items within either the physiological or the psychological dimension. We therefore compared the scores of these two dimensions within each decision group. Participants in the "Accept" group tended to score the physiological impacts higher than the psychological impacts (3.71 ± 1.38 vs 3.49 ± 1.22, p = 0.004), as did participants in the "Not now" group (3.27 ± 1.38 vs 2.97 ± 1.57, p < 0.001). The univariate analysis also identified education level as a predictor of the final decision, even though the same trend was not shown in the logistic regression.
Participants who had heard more about PSA previously were more likely to make "Accept" their final decision in both the univariate and multivariate analyses. This is consistent with the findings of Eila et al., in whose study participants who knew someone who had undergone a PSA test, or who had already discussed the PSA test with their doctors, were more intent on undergoing one [24]. Participants with a higher education level were more likely to choose "Accept" as their final decision. Mehdi et al. reported a positive association between prostate cancer early-detection behavior and high education level in Iran [51]. However, a study by RE Myers et al. demonstrated contradictory results in African-American men: the higher their education level, the less intention they had to undergo PSA screening [52]. Likewise, Viet-Thi Tran found that a decision aid changed decisions favoring PSA screening into disfavoring it among subjects with a high education level in France [53]. However, previous models did not focus on prediction and suggestion. In our study, we successfully constructed a highly accurate model to provide suggestions for the subsequent RCT. To ensure the quality of the suggestions, we used the unbiased data-splitting, model validation, and model performance comparison methods from Chen's work on differentiating lung nodules (a performance comparison of an artificial neural network and a logistic regression model for differentiating lung nodules on computed tomography) [45]. We used the uniform ALO process to tune parameters for every algorithm, instead of using a different process for each algorithm as done conventionally. We chose the RF model because of its highest mean AUC (0.8801), together with its maximum AUC of 0.9329 and high mean ACC (0.8313); the logistic regression models yielded a mean ACC, mean AUC, and maximum AUC of 0.8140, 0.7947, and 0.8939, respectively.
Thus, our results clearly demonstrated that machine-learning algorithms outperformed logistic regression, even though a systematic review could not show the superiority of machine learning over logistic regression [38]. More recently, Glenn Salkeld et al. applied a personalized decision support tool for prostate cancer screening developed on a software platform known as Annalisa, an interactive decision aid template based on multicriteria decision analysis [57]. The personalized suggestions yielded by that decision aid were found to be of slightly higher quality, but the participants' responses were not surveyed comprehensively. To clarify the changes in intention and psychological variables after computer-generated suggestions, we obtained the anxiety score, satisfaction scale, DCS, and post-suggestion decision changes.
The task of our study was to discover the effects of machine-learning suggestions on our participants. Without a doubt, most decision aids have positive effects on the decision-making process. Nahara et al. performed a meta-analysis reviewing four RCTs on SDM and prostate cancer screening [58]; their results demonstrated a reduction in decisional conflict with decision aids.
Andrew W Stamm et al. found that participants in the decision aid (DA) + SDM arm were significantly more likely to report that they always felt encouraged to discuss all health concerns (78% DA + SDM vs 72% DA, p = 0.0285) [59]. Heidi et al. reported that the percentage of men with "high anxiety" decreased from 12% to 7%, and that decisional conflict also decreased with the use of decision aids.
In total, 85% of men experienced more ease in making the decision [60]. Warlick et al. demonstrated that participants reported high decision satisfaction and low decisional conflict [61]. In our research, in addition to the basic decision aid with papers and videos and SDM guided by tutors, we gave participants highly personalized machine-learning suggestions generated from their own features and values. Participants who received machine-learning suggestions were calmer, more content, and less worried than those who did not. They were also more satisfied after receiving the suggestions and experienced more decision support, assurance about the decision, ease in decision-making, and adherence to the decision than those who received no suggestions. Thus, our study demonstrated the positive effects of machine-learning suggestions.
Traditional decision aids typically reduce the likelihood of being screened. Barr et al. reported an example of this: they recruited 1041 predominantly white, well-educated men and recorded their responses to pre- and post-viewing questionnaires. After viewing, the proportion of patients leaning away from PSA screening increased significantly [62]. However, machine-learning suggestions had an entirely different effect on the final decision. In our study, participants tended to follow the machine-learning suggestions: those receiving the "Accept" suggestion tended to make "Accept" their final decision, and the same trend was observed among those receiving the "Not now" suggestion, even though it was not statistically significant. We tried to explain this new finding through the degree of trust. Zachary Klaassen's work demonstrated that the degree of trust an individual had in his physician for cancer information was strongly associated with the likelihood of his undergoing PSA screening [63].
This implies that the more a participant trusts his or her doctor, the greater the participant's willingness to follow the doctor's advice.
Machine learning gained recognition during the 2010s, and machine-learning methods have since created benefits in many aspects of daily life. Our participants therefore had a very positive attitude toward machine learning and trusted the suggestions they received. Another explanation is the principle of authority proposed by Robert Cialdini [64]: when participants regarded the machine-learning model as an authority, they tended to follow its suggestion, even when it opposed their own decision. We therefore still do not know whether this phenomenon is useful or harmful.
Further studies aimed at elucidating the physiological and psychological safety of this phenomenon should be conducted before clinical use.
In this study, we provided a method of machine-learning mediated shared decision making for the PSA blood test. It should be possible to generalize these methods to other fields of shared medical decision making once their safety has been examined. This study has some limitations. First, our study population was relatively small for modeling; moreover, we used simple binary classification to improve the model's performance. Second, this study was not double-blinded. A double-blinded randomized trial, with IRB permission, is needed to eliminate tutor bias and to observe the effect of reverse suggestions. Third, how machine-learning suggestions affect a participant's psychological status remains unknown. Future qualitative or quantitative research should focus on exploring the psychological impact of machine-learning suggestions and on investigating why participants tend to follow them. The psychological impact of the suggestions should also be examined after participants act on them.

Conclusions
We showed that a highly accurate machine-learning model could be constructed successfully. Personalized suggestions generated from this model yielded additional positive effects, increasing satisfaction and decreasing anxiety and decisional conflict compared with the traditional decision aid alone. Our participants tended to adopt the machine-learning suggestion as their final decision. The influence and safety of this phenomenon deserve further investigation.

Availability of data and materials
All data generated or analyzed during this study are included in this published article and its supplementary information files.

Competing Interests
We have no financial or non-financial competing interests that could directly undermine, or be perceived to undermine, the objectivity, integrity, and value of this publication through a potential influence on the authors' judgements and actions with regard to objective data presentation, analysis, and interpretation.