Study design and sample size estimation
A cross-sectional survey was conducted from November to December 2020 among medical graduate students. Informed consent was obtained from all subjects, who were asked to complete an online questionnaire voluntarily and without financial compensation. All valid information, including the associated device, IP address, and answers to each question, was collected anonymously, and a basic database on mental distress among medical graduate students was then constructed by automatic collation and graphical representation of the answers to each question.
We enrolled all respondents except those who were postdoctoral fellows and those who reported a diagnosis of depression or anxiety. The subjects were then randomly divided into a training group and a validation group. The training group was used to develop a formula to calculate the prevalence of mental distress among medical graduate students in China, and internal validation of the formula was performed in the validation group.
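The random division into two groups can be illustrated with a minimal sketch. The split fraction and the random seed below are hypothetical, since the text does not report them, and `split_subjects` is an illustrative helper rather than the procedure actually used.

```python
import random


def split_subjects(subject_ids, train_fraction=0.7, seed=2020):
    """Randomly split subject IDs into training and validation groups.

    train_fraction and seed are hypothetical values chosen for
    illustration; neither is stated in the text.
    """
    ids = list(subject_ids)
    rng = random.Random(seed)  # fixed seed makes the split reproducible
    rng.shuffle(ids)
    cut = round(len(ids) * train_fraction)
    return ids[:cut], ids[cut:]


# Ten made-up subject IDs, split 7 / 3 under the assumed fraction.
training, validation = split_subjects(range(1, 11))
```

Fixing the seed is a design choice that lets the same split be regenerated when the analysis is rerun.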
To estimate the sample size, we assumed a prevalence of mental distress of 28%, taken from a study of Chinese graduate students, a 95% confidence level, and a ±5% margin of error, and applied the population correction formula. Allowing for a 10% non-response rate, the sample size was estimated to be 344.
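The arithmetic behind such an estimate can be sketched as follows, assuming the standard single-proportion formula n0 = z²p(1 − p)/d² with z = 1.96, an optional finite population correction, and inflation for non-response by dividing by (1 − non-response rate). The population size is not stated in the text, so the value used below is purely hypothetical, and the function names are illustrative.

```python
def cochran_sample_size(p, margin, z=1.96):
    """Sample size for estimating a proportion p within +/- margin."""
    return z ** 2 * p * (1 - p) / margin ** 2


def finite_population_correction(n0, population):
    """Shrink n0 for a finite source population of the given size."""
    return n0 / (1 + (n0 - 1) / population)


def inflate_for_nonresponse(n, nonresponse):
    """Enlarge n so that the expected number of responders is still n."""
    return n / (1 - nonresponse)


n0 = cochran_sample_size(p=0.28, margin=0.05)

# Without any population correction.
n_uncorrected = inflate_for_nonresponse(n0, 0.10)

# With a correction for a HYPOTHETICAL population of 100,000 students.
n_corrected = inflate_for_nonresponse(
    finite_population_correction(n0, population=100_000), 0.10
)
```

Under these assumptions the uncorrected calculation gives about 310 before and about 344 after allowing for non-response, which is in line with the reported figure; a finite population correction can only lower the result.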
Ethics approval and study registration
The aims and procedures of the study were reviewed and approved by the Research Ethics Committee of Plastic Surgery Hospital of Chinese Academy of Medical Science (Approval number: 2020157). The study was registered at the Chinese Clinical Trial Registry (Registration number: ChiCTR2000039574). All procedures used complied with the ethical principles on human experimentation and with the Helsinki Declaration of 1975 as revised in 2008.
Data on potential risk factors and the main outcome (severe mental distress) were collected with a questionnaire covering sociodemographic characteristics, academic performance, incumbency of tutor, and psychological evaluation. Sociodemographic characteristics included age, year of study, major, school location (provincial capital or other cities), marital status, and monthly income. Academic performance included the degree pursued, university, type of student, type of research, daily research time, scientific learning style, number of research projects and published papers, and perceived time stress (rated from 1 to 7, where 1 means none, 2-3 mild, 4-5 moderate, and 6-7 severe). Incumbency of tutor referred to whether the tutor held a position in the department, the Chinese Academy of Sciences, the Chinese Academy of Engineering, or a national academic organization. Whether the tutor had won funding from the National Natural Science Foundation of China (NSFC) within the past 5 years was also recorded. Psychological evaluation was based on the Generalized Anxiety Disorder Scale-7 (GAD-7) and the Patient Health Questionnaire-9 (PHQ-9).
GAD-7, developed by , is a 7-item self-report scale that measures anxiety symptoms. Each item assesses the frequency of an anxiety symptom and is scored from 0 (never) to 3 (daily). The total score is the sum of the item scores and ranges from 0 to 21. The reported Cronbach’s α coefficient of the GAD-7 among Chinese subjects is 0.92.
PHQ-9, developed by , is a 9-item scale, based on the criteria for depressive disorders in the Diagnostic and Statistical Manual of Mental Disorders (DSM-IV), that measures depression symptoms. Each item is scored from 0 to 3 according to increasing intensity of symptoms. The PHQ-9 had a Cronbach’s α of 0.86 in this study. Severe mental distress was defined in this study as a sum of the GAD-7 and PHQ-9 scores ≥ 30.
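The scoring rule can be sketched in a few lines. With 9 items scored 0 to 3, the PHQ-9 total ranges from 0 to 27, so the combined score ranges from 0 to 48. The function names and the input validation below are illustrative assumptions, not part of the study's software.

```python
SEVERE_CUTOFF = 30  # combined GAD-7 + PHQ-9 threshold stated in the text


def total_score(item_scores, n_items):
    """Sum the item scores after checking their count and 0-3 range."""
    if len(item_scores) != n_items:
        raise ValueError(f"expected {n_items} items, got {len(item_scores)}")
    if any(score not in (0, 1, 2, 3) for score in item_scores):
        raise ValueError("each item must be scored 0, 1, 2, or 3")
    return sum(item_scores)


def has_severe_distress(gad7_items, phq9_items):
    """Return True when the combined score reaches the cutoff."""
    combined = total_score(gad7_items, 7) + total_score(phq9_items, 9)
    return combined >= SEVERE_CUTOFF


# Made-up respondent: GAD-7 total 14, PHQ-9 total 18, combined 32.
example = has_severe_distress([2] * 7, [2] * 9)
```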
Descriptive statistics were tabulated for the overall sample and stratified by the type of answers received. Continuous variables are presented as mean ± standard deviation (SD), and categorical variables as frequency and percentage. Potential risk factors were screened with the Least Absolute Shrinkage and Selection Operator (LASSO) method. Variables with a coefficient value > 0.01 were then entered into a multinomial logistic regression model to obtain the estimates used in the formula. Statistical significance was set at P < 0.05 with two-sided tests. Statistical analyses were performed using SAS 9.2 (SAS Institute Inc., Cary, NC) and R version 3.5.3 for Windows XP.
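For reference, the LASSO screening step can be written as a penalized likelihood problem. The notation below is an assumption made for illustration: a binary outcome y_i (severe mental distress or not), a predictor vector x_i, n subjects, p candidate predictors, and a penalty weight λ. The text does not give the exact objective or software settings that were used.

```latex
(\hat{\beta}_0, \hat{\beta}) = \arg\min_{\beta_0,\,\beta}
\left\{
  -\frac{1}{n} \sum_{i=1}^{n}
  \Big[ y_i \,(\beta_0 + x_i^{\top}\beta)
        - \log\!\big(1 + e^{\beta_0 + x_i^{\top}\beta}\big) \Big]
  + \lambda \sum_{j=1}^{p} |\beta_j|
\right\}
```

Under this formulation, the L1 penalty shrinks the coefficients of weak predictors toward zero, and λ would typically be chosen as the value that minimizes the cross-validated error.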
In this study, LASSO, a technique for computing efficient model descriptions of nonlinear systems, was combined with 10-fold cross-validation to identify potential predictors. Variables with a coefficient value of more than 0.01 were included in the formula. The estimates used to develop the formula were obtained by re-entering the included variables into the multiple logistic regression analysis. Finally, a formula was developed:
Validation of the formula
Internal validation assessed the discrimination and calibration of the formula in the training and validation groups. Discrimination is the ability of the formula to separate students who developed mental distress from those who did not. Calibration is the agreement between the observed and predicted prevalence of severe mental distress. The area under the receiver operating characteristic curve (AUROC), which is the probability of concordance between the predicted and observed mental distress status among medical graduate students, was calculated to measure the formula’s discrimination ability. An AUROC of more than 0.7 indicates good predictive performance, and 0.8 or above indicates excellent predictive performance.
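The concordance interpretation of the AUROC can be made concrete with a small sketch that compares every case with every non-case. This is an illustrative pairwise implementation on made-up data, not the routine used in the study; ties are counted as half-concordant, which is a common convention assumed here.

```python
def auroc(y_true, y_score):
    """Probability that a random case scores higher than a random non-case.

    y_true holds 1 for students with the outcome and 0 otherwise;
    tied scores contribute 0.5 to the concordance count.
    """
    cases = [s for y, s in zip(y_true, y_score) if y == 1]
    controls = [s for y, s in zip(y_true, y_score) if y == 0]
    if not cases or not controls:
        raise ValueError("both outcome classes must be present")
    concordant = 0.0
    for case_score in cases:
        for control_score in controls:
            if case_score > control_score:
                concordant += 1.0
            elif case_score == control_score:
                concordant += 0.5
    return concordant / (len(cases) * len(controls))


# Made-up data: 3 of the 4 case/non-case pairs are ranked correctly.
demo_auc = auroc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])  # 0.75
```

The pairwise loop is quadratic in the sample size, which is acceptable for a sketch; a rank-based calculation gives the same value more efficiently.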
Furthermore, the discrimination ability of the formula was evaluated by the discrimination slope, defined as the difference in mean predicted risk between medical students with and without mental distress. To assess calibration, we plotted the predicted probability of severe mental distress against the observed risk in each decile of predicted probability and fitted a smooth line. Ideally, the slope of the fitted line is close to 1 and the intercept is close to 0. In addition, the Hosmer-Lemeshow goodness-of-fit test was used to evaluate the formula’s calibration ability. A P-value of more than 0.05 from this test indicates good agreement between the predicted and observed values.
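The discrimination slope and the Hosmer-Lemeshow statistic can be sketched as below. This is an illustration on made-up data under stated assumptions: groups are formed by sorting on predicted risk and cutting into near-equal parts, the degrees of freedom are taken as the number of groups minus 2, and the P-value helper uses a closed form that holds only for even degrees of freedom (as with 10 groups). Group mean predictions of exactly 0 or 1 are not handled.

```python
import math


def discrimination_slope(y_true, y_prob):
    """Mean predicted risk in cases minus mean predicted risk in non-cases."""
    cases = [p for y, p in zip(y_true, y_prob) if y == 1]
    controls = [p for y, p in zip(y_true, y_prob) if y == 0]
    return sum(cases) / len(cases) - sum(controls) / len(controls)


def risk_groups(y_true, y_prob, n_groups=10):
    """Return (mean predicted risk, observed rate, size) per risk group."""
    pairs = sorted(zip(y_prob, y_true))
    n = len(pairs)
    groups = []
    for g in range(n_groups):
        chunk = pairs[g * n // n_groups:(g + 1) * n // n_groups]
        if chunk:
            size = len(chunk)
            mean_pred = sum(p for p, _ in chunk) / size
            obs_rate = sum(y for _, y in chunk) / size
            groups.append((mean_pred, obs_rate, size))
    return groups


def hosmer_lemeshow(y_true, y_prob, n_groups=10):
    """Return the Hosmer-Lemeshow statistic and its degrees of freedom."""
    groups = risk_groups(y_true, y_prob, n_groups)
    stat = 0.0
    for mean_pred, obs_rate, size in groups:
        observed = obs_rate * size
        expected = mean_pred * size
        stat += (observed - expected) ** 2 / (size * mean_pred * (1 - mean_pred))
    return stat, len(groups) - 2


def chi2_sf_even_df(x, df):
    """Upper-tail chi-square probability; valid for even df only."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("closed form requires a positive even df")
    half = x / 2.0
    term = 1.0
    total = 1.0
    for k in range(1, df // 2):
        term *= half / k
        total += term
    return math.exp(-half) * total


# Made-up, perfectly calibrated data in four risk groups of five subjects.
p_demo = [0.2] * 5 + [0.4] * 5 + [0.6] * 5 + [0.8] * 5
y_demo = [1, 0, 0, 0, 0] + [1, 1, 0, 0, 0] + [1, 1, 1, 0, 0] + [1, 1, 1, 1, 0]

slope = discrimination_slope(y_demo, p_demo)
hl_stat, hl_df = hosmer_lemeshow(y_demo, p_demo, n_groups=4)
hl_p = chi2_sf_even_df(hl_stat, hl_df)
```

The tuples returned by `risk_groups` are also the points that would be plotted in a calibration plot of observed against predicted risk.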