Data Sources and Search Strategy
One researcher searched PubMed, Cochrane, Embase, and Web of Science on October 7, 2022, with no date restrictions and with results limited to English-language publications. The search strategy for each database is shown in Supplementary Material 1. In addition, the researchers screened the reference lists of all articles read in full for further relevant citations (Fig. 1).
Inclusion and exclusion criteria
Studies were eligible if they were original studies that developed a PROM or evaluated its psychometric properties, involved adult patients with a confirmed diagnosis of FD, and were published in English in academic journals. Studies were excluded if they were reviews, commentaries, or responses to original studies, or conference abstracts without a full text. In addition, studies of instruments that did not assess symptoms (e.g., scales used to diagnose FD) or that assessed only quality of life, perceived knowledge, or psychological aspects were excluded (Fig. 1). Two researchers independently screened the titles and abstracts retrieved by the search. A full-text article was retrieved if at least one researcher considered the study relevant on the basis of its abstract, or if its relevance was uncertain. Two reviewers then independently assessed the full-text articles for eligibility. Articles were included when both reviewers agreed; any disagreement was resolved by consulting a third researcher.
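The two-stage screening rule above can be expressed as a small decision procedure. The sketch below is illustrative only: the names `Vote`, `retrieve_full_text`, and `include_study` are hypothetical, and modelling the third researcher's consultation as a single deciding judgement is a simplifying assumption rather than a description of the authors' process.

```python
from enum import Enum
from typing import Optional


class Vote(Enum):
    """Hypothetical title/abstract screening decisions."""
    INCLUDE = "include"
    EXCLUDE = "exclude"
    UNSURE = "unsure"


def retrieve_full_text(vote_a: Vote, vote_b: Vote) -> bool:
    """Title/abstract stage: retrieve the full text if either reviewer
    judged the study relevant or was unsure (a deliberately liberal rule)."""
    return any(v is not Vote.EXCLUDE for v in (vote_a, vote_b))


def include_study(eligible_a: bool, eligible_b: bool,
                  third_reviewer: Optional[bool] = None) -> bool:
    """Full-text stage: include when both reviewers agree; otherwise the
    third researcher's judgement settles it (simplifying assumption)."""
    if eligible_a == eligible_b:
        return eligible_a
    if third_reviewer is None:
        raise ValueError("reviewers disagree: a third reviewer's judgement is needed")
    return third_reviewer


if __name__ == "__main__":
    print(retrieve_full_text(Vote.EXCLUDE, Vote.UNSURE))   # doubt is enough to retrieve
    print(include_study(True, False, third_reviewer=True))  # disagreement settled by third reviewer
```

The asymmetry is intentional: the first stage errs toward retrieval so that relevant studies are not lost on the basis of an abstract alone, while the second stage requires agreement.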
Data analysis
After duplicate records retrieved from the four databases were removed using EndNote, three researchers extracted data from the included studies using a standardized data collection tool. Two researchers independently evaluated the psychometric properties and methodological quality of the included PROMs according to the COSMIN guideline for systematic reviews of PROMs [16] and cross-checked their results. The GRADE approach was then used to rate the quality of the evidence for each included PROM and to formulate a final recommendation for each scale. Disagreements were resolved through discussion with a third researcher.
(1) COSMIN Risk of Bias checklist [18]: the COSMIN Risk of Bias checklist is part of the COSMIN guideline. It was used to assess the methodological quality of the studies evaluating the included scales, with each standard rated as very good (V), adequate (A), doubtful (D), inadequate (I), or not applicable (NA). When the ratings differ across the standards for a measurement property, the overall risk of bias rating for that property is determined by the "worst score counts" principle, i.e., it equals the lowest rating given to any standard.
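A minimal sketch of the "worst score counts" principle, assuming each standard's rating is stored as an ordered value and standards marked NA are stored as `None`. The names `RoB` and `overall_rating` are hypothetical and not part of any COSMIN software.

```python
from enum import IntEnum
from typing import Iterable, Optional


class RoB(IntEnum):
    """Ratings for a single COSMIN standard, ordered so that a lower
    value means a worse rating (hypothetical encoding)."""
    INADEQUATE = 1  # I
    DOUBTFUL = 2    # D
    ADEQUATE = 3    # A
    VERY_GOOD = 4   # V


def overall_rating(standard_ratings: Iterable[Optional[RoB]]) -> RoB:
    """'Worst score counts': the overall rating for a measurement property
    is the lowest rating among the standards that apply (None = NA)."""
    applicable = [r for r in standard_ratings if r is not None]
    if not applicable:
        raise ValueError("no applicable standards were rated")
    return min(applicable)


if __name__ == "__main__":
    # One doubtful standard pulls the whole property down to 'doubtful'.
    print(overall_rating([RoB.VERY_GOOD, RoB.ADEQUATE, None, RoB.DOUBTFUL]).name)
```

Encoding the ratings as an `IntEnum` makes the principle a plain minimum, which keeps the rule explicit and easy to audit.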
(2) COSMIN quality criteria [19]: the COSMIN quality criteria are another part of the COSMIN guideline. They were used to evaluate the psychometric properties of the included scales; on the basis of the results reported in the individual studies, each property was rated as sufficient (+), insufficient (−), inconsistent (±), or indeterminate (?). If only one study assessed a given psychometric property of a scale, its rating was taken as the overall rating; if several studies assessed the same property, their individual ratings were pooled to give an overall rating for that property of the instrument.
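The pooling step can be sketched as follows for one measurement property of one PROM. This is a simplified illustration under stated assumptions: `summarise_ratings` is a hypothetical helper, indeterminate (?) results are set aside whenever at least one determinate result exists, and mixed + and − results are reported as inconsistent (±) without modelling the exploration of reasons for inconsistency (e.g., subgroups) that COSMIN also recommends.

```python
from typing import Sequence

VALID = {"+", "-", "?"}  # per-study ratings: sufficient, insufficient, indeterminate


def summarise_ratings(ratings: Sequence[str]) -> str:
    """Pool per-study ratings for one measurement property (hypothetical helper)."""
    if not ratings:
        raise ValueError("no studies rated this property")
    if any(r not in VALID for r in ratings):
        raise ValueError("ratings must be '+', '-', or '?'")
    if len(ratings) == 1:
        return ratings[0]  # a single study's rating is the overall rating
    # Assumption: '?' results are set aside when determinate results exist.
    determinate = {r for r in ratings if r != "?"}
    if not determinate:
        return "?"
    if len(determinate) == 1:
        return determinate.pop()  # all determinate results agree
    return "±"  # both '+' and '-' present: inconsistent


if __name__ == "__main__":
    print(summarise_ratings(["+", "+", "?"]))  # consistent sufficient results
    print(summarise_ratings(["+", "-"]))       # conflicting results
```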
(3) GRADE approach [17]: the COSMIN-modified GRADE approach was used to grade the quality of the evidence for the included scales as high, moderate, low, or very low. The evidence for each measurement property is initially assumed to be of high quality and is then downgraded according to four factors: (i) risk of bias (downgraded by one, two, or three levels for serious, very serious, or extremely serious risk of bias, respectively); (ii) inconsistency (downgraded by one or two levels for serious or very serious inconsistency, respectively); (iii) imprecision (downgraded by one level if the total sample size is 50–100 and by two levels if it is below 50); and (iv) indirectness (downgraded by one or two levels for serious or very serious indirectness, respectively).
Recommendations were then classified into three categories: A (evidence of sufficient content validity, at any level, and at least low-quality evidence of sufficient internal consistency), B (meeting the criteria for neither category A nor category C), and C (high-quality evidence of an insufficient measurement property). These categories were used to identify the best available PROMs for FD symptom assessment according to the content each instrument assesses.
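The downgrading arithmetic and the A/B/C categories can be sketched as below. All function names are hypothetical. Two assumptions are made that the text does not settle: the imprecision band 50–100 is treated as inclusive at both ends, and category C is checked before category A in the (unusual) case where a PROM meets both conditions.

```python
# Evidence levels ordered from lowest to highest.
LEVELS = ("very low", "low", "moderate", "high")


def imprecision_downgrade(total_sample_size: int) -> int:
    """-2 levels if n < 50, -1 if 50 <= n <= 100 (inclusive bounds assumed), else 0."""
    if total_sample_size < 0:
        raise ValueError("sample size cannot be negative")
    if total_sample_size < 50:
        return 2
    if total_sample_size <= 100:
        return 1
    return 0


def grade_evidence(risk_of_bias: int, inconsistency: int,
                   total_sample_size: int, indirectness: int) -> str:
    """Start at 'high' and subtract downgrades; the grade cannot fall below 'very low'.

    risk_of_bias: 0-3 (none, serious, very serious, extremely serious)
    inconsistency, indirectness: 0-2 (none, serious, very serious)
    """
    if not 0 <= risk_of_bias <= 3:
        raise ValueError("risk_of_bias downgrade must be 0-3")
    for name, value in (("inconsistency", inconsistency), ("indirectness", indirectness)):
        if not 0 <= value <= 2:
            raise ValueError(f"{name} downgrade must be 0-2")
    total = (risk_of_bias + inconsistency
             + imprecision_downgrade(total_sample_size) + indirectness)
    return LEVELS[max(0, len(LEVELS) - 1 - total)]


def recommendation_category(sufficient_content_validity: bool,
                            internal_consistency_rating: str,
                            internal_consistency_level: str,
                            high_quality_insufficient_property: bool) -> str:
    """Assign category A, B, or C (hypothetical helper)."""
    # Assumption: C takes precedence if a PROM also meets the A conditions.
    if high_quality_insufficient_property:
        return "C"
    at_least_low = LEVELS.index(internal_consistency_level) >= LEVELS.index("low")
    if sufficient_content_validity and internal_consistency_rating == "+" and at_least_low:
        return "A"
    return "B"


if __name__ == "__main__":
    # Serious risk of bias (-1) and a total sample of 80 (-1): high -> low.
    print(grade_evidence(risk_of_bias=1, inconsistency=0,
                         total_sample_size=80, indirectness=0))
    print(recommendation_category(True, "+", "low", False))
```

For example, a measurement property supported only by studies with a serious risk of bias and a pooled sample of 80 participants would drop two levels, from high to low.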