Reliability, validity and responsiveness of the Postoperative Clinical Evaluation Criteria for Gastrointestinal Motility

Background and purpose Abdominal surgery is the main type of gynecological operation. Accelerating the recovery of gastrointestinal function after surgery not only allows patients to resume their diet early but also reduces common complications such as pelvic adhesion, which is important for the rehabilitation and prognosis of patients. Chinese medicine has been shown to accelerate the recovery of bowel function after gynecological abdominal surgery. However, most of the indicators used to evaluate the recovery of gastrointestinal motility are scattered, and there is no standardized evaluation criterion so far: after reviewing a large body of literature, we found no scale for evaluating gastrointestinal motility in the perioperative period. The Postoperative Clinical Evaluation Criteria for Gastrointestinal Motility (PCECGM) became the local standard of Guangdong province (DB44/t 1581-2015), issued by the quality supervision bureau of Guangdong province in 2015, and a clinical evaluation of it is now necessary. It is an effective tool for the evaluation of gastrointestinal motility after gynecological abdominal surgery.

Key words perioperative gastrointestinal dysfunction; clinical evaluation criteria; reliability; validity; responsiveness

Purpose The purpose of this study was to evaluate the reliability, validity and responsiveness of the PCECGM, so as to provide scientific evaluation criteria.

Materials and methods

Participants
This prospective study was performed at the Guangdong Province Traditional Medical Hospital from March 2015 to June 2017. Patients who met the following criteria were included: 1) surgery for gastric or colorectal cancer, or hysterectomy for hysteromyoma; 2) age 40-75 years; 3) duration of surgery 1-5 h; 4) time under anesthesia 1.5-4.5 h; 5) provision of signed informed consent.

Evaluation of measurement properties
The PCECGM contains 6 items: time to first flatus, time to first defecation, time to first bowel sounds, time to consuming a liquid/semiliquid/general diet, abdominal distention and pain, and nausea and vomiting. The score ranges from 0 to 100, and higher scores indicate better recovery of gastrointestinal function. The PCECGM also reflects the patient's subjective judgment of the meaning of change (improvement) following treatment. It is answered on a 3-point scale: 0 = not recovered; 10 = partial recovery; 20 = full recovery.

Measurement properties
We evaluated the acceptability, reliability, validity and responsiveness of the PCECGM. In addition, the quality of the PCECGM was evaluated against the current updated criteria for good measurement properties.

Acceptability Acceptability was evaluated by the qualified rate, the recovery rate and the average time required to fill in the form. Generally, a scale shows good acceptability when the qualified rate is >70% and the recovery rate is >90%. The more concise the content, the more understandable the wording of the items, and the shorter the time needed to fill in the form, the higher the acceptability to subjects.

Reliability This domain contains three measurement properties: internal consistency, test-retest reliability and measurement error. Internal consistency is a measure of scale reliability and evaluates how closely related a set of items are as a group. Test-retest reliability is the closeness of agreement between the results of successive measurements of the same measurand carried out under the same conditions of measurement. To avoid testing patients in an unstable condition and to limit recall bias, the re-test was performed 4 weeks after the primary test. Finally, the measurement error was calculated using the standard error of measurement (SEM).

Validity This domain also contains three measurement properties: content validity, criterion validity and construct validity. Content validity mainly examines the measurement aim, the target population and the concepts of the questionnaire. Construct validity was assessed by testing predefined specific hypotheses, that is, by determining how many results are in accordance with those hypotheses.
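As a hedged illustration of the measurement-error step described under reliability, the SEM is commonly computed as SD × sqrt(1 − r), where SD is the standard deviation of the observed scores and r is a reliability coefficient. The text does not state which reliability estimate was used, so the function name and the input values below are assumptions for illustration only.

```python
import math


def standard_error_of_measurement(sd: float, reliability: float) -> float:
    """Common textbook definition: SEM = SD * sqrt(1 - reliability)."""
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must lie in [0, 1]")
    return sd * math.sqrt(1.0 - reliability)


# Hypothetical inputs, not values reported in the study.
sem = standard_error_of_measurement(sd=10.0, reliability=0.75)
print(f"SEM = {sem:.2f}")
```

A smaller SEM means that an individual observed score lies closer to the underlying true score.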

Responsiveness
Responsiveness has been defined as the ability of a questionnaire to detect clinically important change over time in the construct to be measured. We calculated the change in PCECGM scores between baseline (pre-surgery) and post-surgery, with a time interval of 3 days.

Statistical analysis
All analyses were performed with EpiData 3.0 and SPSS 20.0.

Age
The patients included in this survey were between 40 and 67 years old. Patients aged 45 to 49 years formed the largest group, with 97 patients accounting for 48.5% (see Table 1).

Postoperative hospital stay
The shortest postoperative hospital stay was 3 days and the longest was 14 days. Most patients stayed 5 or 6 days after surgery, accounting for 25.5% and 30.5% of cases, respectively. Cases with a postoperative hospital stay of less than 7 days accounted for 87% (see Table 4 for details).

Comparison between the traditional gastrointestinal motility evaluation indices and the total score of the scale
The three traditional indicators, namely postoperative flatus, postoperative defecation and postoperative recovery of bowel sounds, were compared with the total score of the scale.

Table 5 shows that 29 patients had not passed flatus on the first day after surgery. Their scale scores were "normal" in 14 cases and "poor" in 15 cases, suggesting that clinical intervention was required for those who had not passed flatus on the first day after surgery. A total of 171 patients passed flatus on the first day after surgery, and 159 of them were rated "good", suggesting that these patients did not need intervention for the time being, regardless of postoperative bowel movements, although observation was still required. All patients included in the observation had passed flatus by the second day: 199 cases were rated "good", suggesting that they did not require clinical intervention with or without defecation, and only 1 case was rated "normal", suggesting that only this patient should be given appropriate intervention to promote the recovery of gastrointestinal motility.

In Table 6, the scale scores of patients who had defecated after surgery were all "good", suggesting that patients with postoperative bowel movements can be considered to have good recovery of gastrointestinal function. Two patients still had no bowel movement on the third day after surgery, but their scale scores were both "good", suggesting that these two patients did not require clinical intervention; all patients had bowel movements within 4 days after surgery.

In Table 7, bowel sounds had recovered to normal on the first day after surgery in 97 cases; 9 of these were rated "normal", suggesting that some patients should receive appropriate intervention to promote the recovery of gastrointestinal motility. There were also 3 cases rated "poor", suggesting that although bowel sounds had returned to normal after surgery in a minority of patients, certain interventions were still required to promote the recovery of their gastrointestinal motility.

... specialist and one patient. The questionnaire was issued in 202 copies and 202 were recovered. The recovery rate was 100%.

Completion rate
Because the scale is filled out by gynecological specialists after an on-site interview and physical examination, only 2 copies had a completion rate of less than 95%, giving a total completion rate of about 99.01%. These two copies were deleted, and the remaining 200 scales were used in the analysis.

Completion time
The shortest completion time of this scale was 61 seconds, the longest was 120 seconds, and the mean completion time was (95.49±16.69) seconds. In summary, the recovery rate of the scale was 100%, the completion rate was 99.01%, and the mean completion time was about 95 seconds, indicating that the scale has good acceptability.
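The acceptability figures can be checked with simple arithmetic. The sketch below uses the counts reported in the text (202 copies issued, 202 recovered, 200 retained) and the thresholds quoted in the Methods; treating the completion rate as the "qualified rate" is an assumption, and the helper name is hypothetical.

```python
def rate_percent(numerator: int, denominator: int) -> float:
    """Return numerator / denominator as a percentage."""
    return 100.0 * numerator / denominator


# Counts reported in the text.
issued, recovered, retained = 202, 202, 200

recovery_rate = rate_percent(recovered, issued)
completion_rate = rate_percent(retained, recovered)

# Thresholds from the Methods: qualified rate > 70%, recovery rate > 90%.
# Assumption: the completion rate plays the role of the qualified rate.
acceptable = completion_rate > 70.0 and recovery_rate > 90.0

print(f"recovery {recovery_rate:.2f}%, completion {completion_rate:.2f}%, "
      f"acceptable: {acceptable}")
```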

Reliability

Cronbach's Alpha
The total Cronbach's Alpha coefficient of this scale is 0.519, and the Alpha coefficient calculated after standardization is 0.620, which is between 0.60 and 0.65, suggesting that the reliability of this scale is within the acceptable range.
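For readers who want to see how the two coefficients differ, the following is a minimal sketch of the usual formulas: the raw alpha is based on item and total-score variances, and the standardized alpha on the mean inter-item correlation. The study values were obtained with SPSS; the data and function names below are hypothetical and only illustrate the calculation.

```python
from itertools import combinations
from statistics import mean, variance


def pearson(x, y):
    """Pearson correlation between two equally long score lists."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def cronbach_alpha(items):
    """Raw alpha; `items` holds one list of patient scores per item."""
    k = len(items)
    totals = [sum(scores) for scores in zip(*items)]
    item_variance_sum = sum(variance(scores) for scores in items)
    return k / (k - 1) * (1 - item_variance_sum / variance(totals))


def standardized_alpha(items):
    """Alpha computed from the mean inter-item correlation."""
    k = len(items)
    r_bar = mean([pearson(a, b) for a, b in combinations(items, 2)])
    return k * r_bar / (1 + (k - 1) * r_bar)


# Hypothetical scores (0 / 10 / 20) for 3 items and 5 patients.
example_items = [
    [20, 10, 20, 0, 10],
    [20, 10, 10, 0, 20],
    [10, 10, 20, 0, 20],
]
print(cronbach_alpha(example_items), standardized_alpha(example_items))
```

The two coefficients coincide when all items have equal variances, as in this toy data, and diverge when item variances differ, which is one plausible reason for the gap between 0.519 and 0.620 reported above.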

Split-half reliability coefficient
The Spearman-Brown formula was used to calculate the split-half reliability coefficient of the scale. The items were divided into two halves according to odd and even order. The unequal-length Spearman-Brown coefficient is 0.706; a split-half reliability between 0.7 and 0.8 indicates that the split-half reliability of the scale is good (see Table 9).

The sensitivity analysis of the items removes one item from the scale at a time and then recalculates the Cronbach's Alpha coefficient. The larger the resulting coefficient, the greater the influence of that item on the relevant statistics, and the item becomes the first candidate for adjustment [96]. The sensitivity of the item "vomiting" in this scale is low, so it is an item that can be considered for optimization in the perioperative gastrointestinal motility scale (see Table 10).

Western medical literature was searched: a total of 1640 articles were retrieved and read, of which 687 were related to evaluation indices of postoperative gastrointestinal motility. Relevant evaluation indicators were extracted and entered into the database, yielding 9 symptoms and signs.

This evaluation specification was gradually formed through eight rounds of expert consultation. After the first round, bowel sounds, flatus, defecation, abdominal distension, nausea and vomiting were selected as the main clinical evaluation indicators. After the second round, the subjective and objective indicators were scored on a 20-point scale; the score for "abdominal distension" was greater than those for "nausea" and "vomiting", and the score for "flatus" was greater than those for "bowel sounds" and "defecation". After the third round, the grading of each indicator and the corresponding scores were determined, and the comprehensive evaluation criteria were set as "good" (80-100 points), "normal" (60-79 points) and "poor" (0-59 points). The fourth and fifth rounds produced the evaluation criteria indicators (i.e., the preliminary evaluation specification), and the sixth to eighth rounds produced the draft evaluation standard.

The draft was tested in a small-sample clinical evaluation in related disciplines such as surgery, gynecology and orthopedics. The data show that the evaluation results of the standard are consistent with the results of the postoperative flatus and defecation indicators. At the same time, the standard makes it possible to evaluate scientifically the overall recovery of gastrointestinal function after surgery, covering not only objective indicators (postoperative flatus, defecation and the time for bowel sounds to return to normal) but also subjective indicators such as nausea, vomiting and abdominal pain. In this way, the recovery of gastrointestinal function is reflected more comprehensively. The form was submitted to the Provincial Quality Supervision Bureau after being approved by the expert group. Finally, it was approved by the Provincial Quality Supervision Bureau on April 16, 2015 and officially implemented on July 16, 2015.
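The grading bands fixed in the third round of consultation ("good" 80-100, "normal" 60-79, "poor" 0-59) can be expressed as a small lookup. This is only a sketch of the thresholds as stated in the text; the function name is hypothetical.

```python
def classify_total_score(total: float) -> str:
    """Map a PCECGM total score (0-100) to the grading band in the text."""
    if not 0 <= total <= 100:
        raise ValueError("total score must be between 0 and 100")
    if total >= 80:
        return "good"
    if total >= 60:
        return "normal"
    return "poor"


for score in (100, 80, 79, 60, 59, 0):
    print(score, classify_total_score(score))
```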

Structural validity

KMO test and Bartlett's test of sphericity
As shown in Table 11, the KMO (Kaiser-Meyer-Olkin) value of the scale is 0.581, which is greater than 0.5; according to Kaiser (1974), the scale is therefore suitable for factor analysis. In Bartlett's test of sphericity, the test statistic is 181.005 with 15 degrees of freedom (P<0.001), indicating that the correlation matrix is not an identity matrix, so that a small number of factors can be extracted to explain most of the variance, which supports the use of factor analysis (see Table 11 for details).
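As a sketch of where the 15 degrees of freedom come from, Bartlett's statistic is usually written as χ² = −[(n − 1) − (2p + 5)/6] · ln|R| with p(p − 1)/2 degrees of freedom, where R is the p × p correlation matrix and n the sample size. With p = 6 items this gives 15, in line with the reported value. The code below is a plain-Python illustration under that standard formula; the correlation matrix is hypothetical and the P value is not computed.

```python
import math


def determinant(matrix):
    """Determinant via Gaussian elimination with partial pivoting."""
    a = [list(map(float, row)) for row in matrix]
    n = len(a)
    det = 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-12:
            return 0.0
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            det = -det
        det *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
    return det


def bartlett_sphericity(corr, n_obs):
    """Return (chi-square statistic, degrees of freedom)."""
    p = len(corr)
    chi2 = -((n_obs - 1) - (2 * p + 5) / 6) * math.log(determinant(corr))
    df = p * (p - 1) // 2
    return chi2, df


# Hypothetical 3 x 3 correlation matrix; n_obs = 200 as in the study.
example_corr = [
    [1.0, 0.3, 0.3],
    [0.3, 1.0, 0.3],
    [0.3, 0.3, 1.0],
]
print(bartlett_sphericity(example_corr, n_obs=200))
```

For an identity matrix the determinant is 1 and the statistic is 0, which is the null hypothesis that the test rejects here.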

Factor analysis
Principal component analysis with varimax (maximum variance) rotation was used for the factor analysis. Factors were extracted according to the criterion eigenvalue > 1; two common factors were extracted, with a cumulative contribution rate of 56.331% (see Table 12 for details). After rotation, the items were grouped into two factors, a subjective symptom factor and an objective symptom factor, which cover the main contents of the scale, suggesting that the structure of the scale is good (see Table 13). As can be seen from Table 16, the first common factor is the subjective symptom factor (abdominal pain, nausea and vomiting), and the second common factor is the objective symptom factor (flatus, defecation and bowel sounds).
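The extraction rule can be illustrated with a short sketch. For a correlation matrix the eigenvalues sum to the number of items, so the cumulative contribution rate of the retained factors is their eigenvalue sum divided by the item count. The eigenvalues below are hypothetical (the individual values from Table 12 are not reproduced here) and the function name is an assumption.

```python
def retained_factors(eigenvalues):
    """Apply the eigenvalue > 1 rule.

    Returns the number of retained factors and their cumulative
    contribution rate in percent of total variance.
    """
    kept = [ev for ev in sorted(eigenvalues, reverse=True) if ev > 1.0]
    cumulative_pct = 100.0 * sum(kept) / len(eigenvalues)
    return len(kept), cumulative_pct


# Hypothetical eigenvalues for a 6-item scale (they sum to 6).
hypothetical_eigenvalues = [2.0, 1.4, 0.9, 0.7, 0.6, 0.4]
n_kept, pct = retained_factors(hypothetical_eigenvalues)
print(n_kept, round(pct, 3))
```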

Analysis of responsiveness
The paired rank sum test was performed on the pre-treatment and post-treatment total scale scores of the postoperative patients. The null hypothesis states that the median of the differences between the pre-treatment score (i.e., the first total score on the first postoperative day) and the post-treatment scores (i.e., on the first, second and third postoperative days) is equal to 0. The test gave P < 0.001, so the null hypothesis was rejected, indicating statistically significant differences between the total score before treatment and the total scores on the first, second and third days after treatment (see Tables 14-18 for details).
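Assuming that the paired rank sum test refers to the Wilcoxon signed-rank test, the calculation can be sketched as follows. This is a plain-Python version using the normal approximation with a tie correction and no continuity correction, so it will not necessarily reproduce SPSS output exactly; the scores are hypothetical.

```python
import math


def wilcoxon_signed_rank(before, after):
    """Two-sided Wilcoxon signed-rank test (normal approximation).

    Zero differences are dropped; tied absolute differences share
    their average rank. Returns (W+, z, p_value).
    """
    diffs = [b - a for a, b in zip(before, after) if b != a]
    n = len(diffs)
    if n == 0:
        raise ValueError("all paired differences are zero")
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    tie_term = 0.0
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        average_rank = (i + j) / 2 + 1  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        group_size = j - i + 1
        tie_term += group_size ** 3 - group_size
        i = j + 1
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    mean_w = n * (n + 1) / 4
    var_w = n * (n + 1) * (2 * n + 1) / 24 - tie_term / 48
    z = (w_plus - mean_w) / math.sqrt(var_w)
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail
    return w_plus, z, p_value


# Hypothetical total scores before and after treatment for 8 patients.
before = [40, 50, 60, 50, 40, 60, 50, 70]
after = [70, 80, 80, 90, 60, 100, 80, 90]
w_plus, z, p_value = wilcoxon_signed_rank(before, after)
print(w_plus, round(z, 3), round(p_value, 4))
```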

Standardized effect size (ES)
The ES is the ratio of the absolute value of the mean difference between before and after treatment to the standard deviation before treatment. The standardized effect size of the total score of this scale was calculated to be 1.32, indicating that the scale has a high degree of responsiveness.

Standardized response mean (SRM)
The SRM is the ratio of the mean of the differences between before and after treatment to the standard deviation of those differences. The standardized response mean of the main symptom scores of this scale was calculated to be 1.32, which also indicates that the scale has a high degree of responsiveness.
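The two responsiveness indices defined above differ only in the denominator: the ES divides by the standard deviation of the baseline scores, whereas the SRM, in its usual definition, divides by the standard deviation of the change scores. A minimal sketch with hypothetical scores (the function names are illustrative):

```python
from statistics import mean, stdev


def effect_size(before, after):
    """ES = |mean(after) - mean(before)| / SD(before)."""
    return abs(mean(after) - mean(before)) / stdev(before)


def standardized_response_mean(before, after):
    """SRM = mean(change) / SD(change), using paired change scores."""
    changes = [b - a for a, b in zip(before, after)]
    return mean(changes) / stdev(changes)


# Hypothetical total scores for 4 patients, not study data.
before = [40, 50, 60, 50]
after = [60, 80, 70, 90]
es = effect_size(before, after)
srm = standardized_response_mean(before, after)
print(round(es, 3), round(srm, 3))
```

Values above roughly 0.8 are conventionally read as large, which is the sense in which 1.32 is interpreted as high responsiveness.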

Discussion

Feasibility
In this study, the scale survey was conducted by gynecological clinicians during postoperative hospitalization, which could be conducted simultaneously with the daily ward rounds. The compliance of investigators and survey subjects was good, and the quality of filling in the form was controllable and the recovery rate was high. The scale is concise and easy to understand. The average time spent on answering questions is 95.49 seconds, which does not take up too much time of investigators and survey subjects in clinical practice. All the above advantages indicate that the feasibility of this scale is good.

Reliability
The standardized Cronbach's Alpha of this scale was 0.620, which, according to Robert F. DeVellis (2004), is within the acceptable range. A low alpha coefficient can result from the following factors: an off-center average score, negative correlations between items, low item-scale correlations, weak correlations between items, and small variability in scores. Split-half reliability tests the consistency between the two halves of the scale. The unequal-length Spearman-Brown coefficient is 0.706; a split-half reliability between 0.7 and 0.8 indicates that the split-half reliability of the scale is good. In addition, owing to improvements in surgical technique, few patients have obvious gastrointestinal disorders after gynecological laparoscopic surgery, which may be one of the reasons for the low reliability coefficient in the sensitivity analysis of this test. In the future, the scale can be evaluated further in gynecological surgery with longer operation times and a larger operative scope, such as malignant tumor surgery, and in gastrointestinal surgery, so as to improve reliability by increasing the variability of scores. With a standardized Cronbach's Alpha coefficient of 0.620 and an unequal-length Spearman-Brown coefficient of 0.706, the scale can be considered stable and reliable. Further studies are needed on the low sensitivity found in the sensitivity analysis, and additional gastrointestinal surgery samples can be tested in the future.
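The odd/even split-half procedure discussed above can be sketched as follows. The code applies the equal-length Spearman-Brown correction, 2r / (1 + r); the study reports the unequal-length variant from SPSS, which should give the same value when both halves contain the same number of items, as is the case for six items split three and three. The data and function names are hypothetical.

```python
from statistics import mean


def pearson(x, y):
    """Pearson correlation between two equally long score lists."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def split_half_reliability(items):
    """Odd/even split-half reliability with Spearman-Brown correction.

    `items` holds one list of patient scores per item, in scale order.
    """
    odd_half = [sum(vals) for vals in zip(*items[0::2])]   # items 1, 3, 5
    even_half = [sum(vals) for vals in zip(*items[1::2])]  # items 2, 4, 6
    r = pearson(odd_half, even_half)
    return 2 * r / (1 + r)


# Hypothetical scores (0 / 10 / 20) for 6 items and 5 patients.
example_items = [
    [20, 10, 20, 0, 10],
    [20, 10, 10, 0, 20],
    [10, 10, 20, 0, 20],
    [20, 0, 20, 10, 10],
    [10, 10, 20, 0, 10],
    [20, 10, 10, 10, 20],
]
print(round(split_half_reliability(example_items), 3))
```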

Validity
In strict accordance with the requirements for scale development, this evaluation standard has good face validity after preliminary positioning, expert consultation and selection of items by the clinical and scientific teams. Principal component analysis with varimax (maximum variance) rotation extracted two common factors, with a cumulative contribution rate of 56.331%. The scale comprised a subjective symptom factor (abdominal pain, nausea and vomiting) and an objective symptom factor (flatus, defecation and bowel sounds), which covered all contents of the scale, suggesting that the scale was well structured.

Responsiveness
In this study, eligible patients received an oral Chinese medicine intervention. Oral Chinese medicine granules were given at 09:00 and 16:00 on the first day after surgery. Before the intervention, the patients were evaluated for the first time (08:00-08:30), and this score was taken as the total score before treatment. They were then evaluated at the scheduled times, and these scores were taken as the scores after treatment. The paired rank sum test, standardized effect size and standardized response mean were calculated after repeated (longitudinal) retesting, and the results indicated that the scale had a high degree of responsiveness.

Availability of data and materials
The datasets used and/or analysed during the current study are available from the corresponding author on reasonable request.