Study selection and data extraction
The results of study selection and data extraction are presented in the PRISMA flow diagram (Fig. 1). A total of 41,886 records, published between 1973 and June 2020, were considered potentially eligible and retrieved from eight databases. We removed 14,826 duplicate records. The titles and abstracts of the remaining 27,060 records were screened by three pairs of reviewers, each pair evaluating 9,020 records. A total of 336 records were retained for full-text assessment, of which 86 were included. After manually searching the reference lists of the included studies, eight additional relevant records were added, for a total of 94 records, and 24 different PROMs were included in the review (Fig. 1).
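The record counts reported above can be cross-checked with simple arithmetic. The following Python sketch uses only the figures stated in this paragraph; the variable names are illustrative, and the two exclusion counts are derived here by subtraction rather than reported in the text.

```python
# Figures stated in the text (PRISMA flow).
records_identified = 41_886
duplicates_removed = 14_826
reviewer_pairs = 3
full_text_assessed = 336
included_from_databases = 86
added_from_reference_lists = 8

# Records left for title/abstract screening after removing duplicates.
records_screened = records_identified - duplicates_removed

# Screening load per pair of reviewers (the text reports an even split).
records_per_pair, remainder = divmod(records_screened, reviewer_pairs)

# Derived by subtraction (not reported in the text).
excluded_at_title_abstract = records_screened - full_text_assessed
excluded_at_full_text = full_text_assessed - included_from_databases

total_included = included_from_databases + added_from_reference_lists

print("screened:", records_screened)
print("per reviewer pair:", records_per_pair, "remainder:", remainder)
print("excluded at title/abstract (derived):", excluded_at_title_abstract)
print("excluded at full text (derived):", excluded_at_full_text)
print("total included:", total_included)
```

The computed values for records screened, records per reviewer pair, and total included records match the 27,060, 9,020, and 94 reported in the text.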
Study and PROMs characteristics
The included studies were published between 1986 and 2020, and most of them were conducted in the United States, in English. Sample sizes ranged from 31 to 6,261 participants. The percentage of females ranged from 14.0% to 79.3%, and the mean age ranged from 47.0 to 81.9 years. Most of the studies were conducted in hospital settings (n = 46) and did not report the disease duration (n = 70) or the number of medications being taken (n = 70).
Of the 24 PROMs included in the review, most are one-dimensional and composed of items with Likert-type response scales. Seven PROMs were extracted from 30 studies [22–51] conducted only with patients with T2DM, another seven from 46 studies [52–97] conducted only with patients with CVD, and the remaining 10 from 18 studies [98–115] conducted with patients with T2DM and/or CVD. The most prevalent original language was English, and 11 PROMs, whose original languages were English, German, or Urdu, have been translated into at least one other language (Table 2).
Table 2
PROM | Target population | Recall period | (Sub)scale(s) (number of items) | Response options | Range of scores/scoring | Original language | Available translations
MMAS-8 | DM and CVD | 1 month | 1 (8) | Dichotomous and Likert scale | 0 to 8 | English | 16 languages
SMAQ | DM | NC | 1 (6) | Dichotomous | 6 to 12 | English | 1 language
MEDS | DM and CVD | 6 months | 5 (16) | Likert scale | 16 to 80 | English | None
MNPS | DM and CVD | 1 year | 1 (9) | Dichotomous | 0 to 9 | English | None
DMAS-7 | DM | NC | 3 (7) | Dichotomous | 0 to 7 | Arabic | None
ARMS-12 | DM and CVD | NC | 2 (12) | Likert scale | 12 to 48 | English | 3 languages
MGT | DM and CVD | NC | 1 (4) | Dichotomous | 0 to 4 | English | 7 languages
MTA-OA | DM | NC | 1 (7) | Likert scale | 1 to 6 | Portuguese | None
MTA-Insulin | DM | NC | 1 (7) | Likert scale | 1 to 6 | Portuguese | None
LMAS-14 | CVD | NC | 4 (14) | Likert scale | 0 to 42 | Arabic | None
MTA | DM and CVD | NC | 1 (7) | Likert scale | 1 to 6 | Portuguese | None
MARS-5 | DM and CVD | NC | 1 (5) | Likert scale | 5 to 25 | English | 2 languages
A-14 | CVD | NC | 5 (14) | Likert scale | 0 to 56 | German | 2 languages
ARMS-10 | CVD | NC | 2 (10) | Likert scale | 10 to 40 | English | 1 language
MALMAS | DM | 1 month | 1 (8) | Dichotomous and Likert scale | 0 to 8 | English | 1 language
ARMS-D | DM | NC | 2 (11) | Likert scale | 11 to 44 | English | None
IADMAS | DM | NC | 1 (8) | Dichotomous and Likert scale | 0 to 8 | Arabic | None
GMAS | DM and CVD | NC | 3 (11) | Likert scale | 0 to 33 | Urdu | 1 language
MAQ | CVD | NC | NC | NC | NC | Kannada, Malayalam | None
MMAS-5 | CVD | 1 month | 1 (5) | Likert scale | No score | English | None
ProMAS | DM and CVD | NC | 1 (18) | Dichotomous | 0 to 18 | Dutch | None
DOSE-Nonadherence | CVD | 7 days | 1 (3) | Likert scale | 1 to 5 | English | 1 language
ARMS-7 | DM and CVD | NC | 2 (7) | Likert scale | 7 to 28 | English | 1 language
5-item questionnaire | CVD | 1 month | 1 (5) | Dichotomous | 0 to 1 | English | None
Note: CVD = Cardiovascular diseases; DM = Diabetes mellitus; NC = Not clear; PROM = Patient-reported outcome measures.
A total of 35 studies reported a response rate for the PROMs, which ranged from 21.1% to 99.5%. Of the 94 records, only 60.6% presented information about conflicts of interest, and 67.0% reported whether the research had a funding source (Additional file 3).
Two of the reviewers (HCO and RCMR) are authors of one of the included studies [104]. This article was therefore analyzed by a third reviewer (NMCA).
Summarized results of the PROMs' measurement properties
The summary of findings for the measurement properties of the PROMs is presented in Table 3, and the summary of the risk of bias assessment is available in Additional files 4 and 5. The results for each measurement property are described below.
Content validity
Content validity was rated per PROM for relevance, comprehensiveness, and comprehensibility, and an overall content validity rating was then assigned. Indeterminate ratings for development or content validity studies were ignored in the overall rating. All PROM development studies were classified as being of inadequate methodological quality, and therefore none of them were considered in the overall rating. Although the development studies reported how the PROM items were created, few addressed the theoretical framework on which item creation was based [112, 114, 115]. Similarly, the target population's comprehension of the items was rarely assessed through cognitive interviews or debriefing [32, 107]. In grading the quality of evidence, studies whose results were not considered in the overall ratings were likewise not considered. According to COSMIN, the criteria for recommending the use of a PROM are based on its content validity and at least low-quality evidence for internal consistency. Content validity encompasses the relevance, comprehensiveness, and comprehensibility of the PROM. Comprehensibility was the aspect most often evaluated in the records, although incompletely: most of the studies asked about the comprehensibility of the items [22, 26, 38, 41, 51, 53, 58, 61, 62, 73, 76, 78, 85, 90, 91, 93, 95, 111, 113], but in only a few were participants asked about the comprehensibility of the response options or recall periods [22, 38, 76, 85, 91, 93]. The relevance and comprehensiveness of the PROMs were rarely evaluated among patients [46, 52, 90]. For some aspects, the overall PROM rating was based only on the reviewers' rating.
The risk of bias of the development and content validity studies was rated as doubtful or inadequate because some of the criteria in the COSMIN checklist were not clearly described in the records.
The MGT showed moderate-quality evidence for sufficient content validity, and the ARMS-10, MARS-5, ARMS-12, MALMAS, MTA, ARMS-7, MEDS, IADMAS, ProMAS, and A-14 showed very low-quality evidence for sufficient content validity. The MMAS-8, MTA-OA, MTA-Insulin, and GMAS showed inconsistent content validity. The remaining PROMs included in the review did not have their content validity evaluated in the selected papers.
Structural validity
Exploratory factor analysis (EFA) was the statistical method most often applied to evaluate the structural validity of the PROMs [23, 25, 29, 31, 33, 34, 38, 40, 41, 44–46, 49, 51–53, 56, 60, 64, 66, 71–73, 78, 85, 93, 101, 103, 105, 107, 109, 111–113], followed by confirmatory factor analysis [40, 46, 49, 51, 57, 58, 65, 72, 73, 88, 95, 98, 99, 103, 112, 113] and item response theory [22, 29, 77, 114].
Regarding the methodological quality of the EFA, some studies were classified as being of doubtful quality because they did not report the rotation method used in the analysis [23, 33, 66, 71, 101].
Regarding the EFA results, some studies were rated as indeterminate because they did not report the percentage of variance explained [23, 29, 33, 34, 66, 78, 101] or the factor loadings [29, 66, 105]. One study did not report the fit indices used to evaluate the confirmatory factor analysis [73], and another [29] did not present the indices of the item response theory analysis.
Among the PROMs included in this review, the MEDS, MNPS, GMAS, DOSE-Nonadherence, ProMAS, and ARMS-7 showed high-quality evidence for sufficient structural validity. The DMAS-7, LMAS-14, MARS-5, and ARMS-10 showed moderate-quality evidence for sufficient structural validity, and high-quality evidence for insufficient structural validity was observed for the MTA.
However, the structural validity of the MMAS-8, MGT, and ARMS-12 was classified as inconsistent. The MMAS-8 presented one- or two-factor solutions with sufficient, insufficient, and indeterminate ratings. Similarly, the ARMS-12 presented sufficient and insufficient results in two- or three-factor solutions, while the MGT presented only one-dimensional solutions, but with sufficient, insufficient, and indeterminate ratings. An overall rating of indeterminate was assigned to the SMAQ, since the only study included for this PROM was rated as indeterminate [23]. The remaining PROMs included in the systematic review did not have their structural validity evaluated in the selected records.
Internal consistency
For the analysis of internal consistency, the original factor structure of the PROM was considered in order to decide whether Cronbach's alpha should be calculated for the total scale and/or for the subscales or domains. One included study [76] was classified as being of doubtful methodological quality, since the authors excluded four items from the PROM because of the low Cronbach's alpha coefficient obtained, without considering other reliability or validity estimates. In two studies [59, 115], the number of subscales of the PROM was not clear, and in another two studies, Cronbach's alpha was not calculated for each of the subscales [23, 60].
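As an illustration of the statistic discussed above, the following is a minimal Python sketch of Cronbach's alpha computed for a total scale and separately for each subscale. It assumes complete item-level responses and the standard formula alpha = k/(k − 1) × (1 − sum of item variances / variance of the total score). The function names, the toy data, and the item-to-subscale mapping are hypothetical and are not taken from any included study.

```python
from statistics import variance


def cronbach_alpha(responses):
    """Cronbach's alpha for a list of respondents, each a list of k item scores."""
    k = len(responses[0])
    if k < 2:
        raise ValueError("alpha needs at least two items")
    # Sample variance of each item (column) and of the summed score (row totals).
    item_variances = [variance(item) for item in zip(*responses)]
    total_variance = variance([sum(row) for row in responses])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


def alpha_by_subscale(responses, subscales):
    """Alpha per subscale; `subscales` maps a name to a list of item indices."""
    return {
        name: cronbach_alpha([[row[i] for i in indices] for row in responses])
        for name, indices in subscales.items()
    }


# Hypothetical toy data: 5 respondents x 4 Likert items scored 1 to 4.
toy = [
    [1, 2, 1, 2],
    [2, 2, 3, 3],
    [3, 3, 2, 4],
    [4, 3, 4, 4],
    [2, 1, 2, 1],
]
# Hypothetical two-factor structure: items 0-1 and items 2-3.
structure = {"subscale 1": [0, 1], "subscale 2": [2, 3]}

print("total scale:", round(cronbach_alpha(toy), 2))
for name, value in alpha_by_subscale(toy, structure).items():
    print(name + ":", round(value, 2))
```

For a multidimensional PROM, the per-subscale values are the relevant ones, which is why studies reporting only a total-scale alpha for a multi-factor instrument were flagged.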
Five PROMs (MEDS, MNPS, ARMS-D, DOSE-Nonadherence, and ProMAS) showed high-quality evidence for sufficient internal consistency. The ARMS-10 showed very low-quality evidence for sufficient internal consistency, while the DMAS-7 and ARMS-7 showed moderate-quality evidence for insufficient internal consistency. Three PROMs (MGT, LMAS-14, and GMAS) showed high-quality evidence for insufficient internal consistency. The internal consistency of ten PROMs (MMAS-8, SMAQ, ARMS-12, MTA-OA, MTA-Insulin, MTA, A-14, MALMAS, IADMAS, and MAQ) was classified as indeterminate. The remaining PROMs included in the review did not have their internal consistency evaluated in the selected papers.
Cross-cultural validity/measurement invariance
Cross-cultural validity/measurement invariance was evaluated only for the MTA, in a single included study [57]. The study was of very good methodological quality, and high-quality evidence for sufficient cross-cultural validity/measurement invariance was observed.
Reliability
All included studies that evaluated the reliability of PROMs were classified as being of doubtful or inadequate methodological quality [27, 28, 32, 35, 39–41, 44, 46, 48, 49, 54, 60–62, 64, 67, 71–73, 78, 79, 82, 88, 90, 101, 103, 107, 111–113], as they did not provide enough data to address items 4 (“Did the professional(s) administer the measurement without knowledge of scores or values of other repeated measurement(s) in the same patients?”) and 5 (“Did the professional(s) assign scores or determine values without knowledge of the scores or values of other repeated measurement(s) in the same patients?”) of the risk of bias checklist [116].
In addition, some studies [27, 28, 32, 35, 39, 46, 48, 54, 61, 62, 64, 71, 101, 103, 107, 112] used a correlation coefficient instead of the intraclass correlation coefficient (ICC) to evaluate reliability, and in other studies [40, 41, 46, 71–73, 78, 101, 111] the measurement conditions (item 3 of the risk of bias checklist: “Were the measurement conditions similar for the repeated measurements – except for the condition being evaluated as a source of variation?”) were considered inadequate.
The analyses applied in the evaluation of reliability were the ICC, kappa, and correlation coefficients. The results of some studies were rated as indeterminate because a correlation coefficient was calculated instead of the ICC [27, 28, 32, 35, 39, 46, 48, 54, 61, 62, 64, 71, 101, 103, 107, 112].
The ARMS-10 showed low-quality evidence for sufficient reliability. The MAQ and ARMS-7 showed very low-quality evidence for sufficient reliability. The GMAS showed moderate-quality evidence and the DOSE-Nonadherence showed low-quality evidence for insufficient reliability. The ARMS-7 showed moderate-quality evidence for insufficient internal consistency. The reliability of the ARMS-12, MGT, MARS-5, MALMAS, and IADMAS was classified as indeterminate. The reliability of the MMAS-8 was rated as inconsistent.
Criterion validity
The analyses applied in the evaluation of criterion validity were the area under the curve, sensitivity, specificity, and some hypothesis tests. Some of the included studies did not report sensitivity and specificity analyses [23, 37, 38, 93, 101, 105, 107, 112, 113], and one study [56] did not provide an area under the curve analysis. These studies were therefore classified as being of inadequate methodological quality.
The DMAS-7, GMAS, and 5-item questionnaire showed high-quality evidence for sufficient criterion validity. The MMAS-8, MEDS, MNPS, MGT, MTA, MARS-5, MALMAS, ARMS-D, and DOSE-Nonadherence showed high-quality evidence for insufficient criterion validity. The IADMAS, ARMS-12, and LMAS-14 showed moderate-, low-, and very low-quality evidence for insufficient criterion validity, respectively. The criterion validity of the SMAQ was classified as indeterminate. The remaining PROMs included in the review did not have their criterion validity evaluated in the selected papers.
Hypotheses testing
There were four included studies in which the methodological quality regarding the convergent validity of the PROMs was considered inadequate. In two of them [44, 89], it was rated inadequate because the comparator instrument had insufficient measurement properties. In the other two studies [26, 56], the statistical tests applied were not optimal or appropriate. One study was classified as of indeterminate quality, because the PROMs being correlated were not applied to the same participants [27].
Concerning known-groups validity, one included study did not describe the important characteristics of the groups being compared [112].
The analyses applied by the included studies were mainly correlation coefficients, regression models, and comparison and association tests.
The MMAS-8, MEDS, DMAS-7, ARMS-12, MALMAS, ARMS-D, GMAS, DOSE-Nonadherence, ProMAS, and 5-item questionnaire showed high-quality evidence for sufficient construct validity. The IADMAS and LMAS-14 showed moderate- and low-quality evidence for sufficient construct validity, respectively. The MTA-OA and MTA-Insulin showed very low-quality evidence for sufficient construct validity. The MGT, MTA, MARS-5, and A-14 showed high-quality evidence for insufficient construct validity. The remaining PROMs included in the review did not have their construct validity evaluated in the selected papers.
Responsiveness
Responsiveness was evaluated only for the MMAS-5, in a single study [83]. The study was of very good methodological quality, and high-quality evidence for sufficient responsiveness was observed.
Meta-analysis
In the meta-analysis, the GMAS showed high-quality evidence for insufficient internal consistency. The MMAS-8, ARMS-12, and MGT were classified as indeterminate because their structural validity was classified as indeterminate. The MTA was classified as indeterminate because there was not at least low-quality evidence for sufficient structural validity, and the structural validity of the MALMAS was not evaluated in the included studies. Moderate and high I² values [117] were observed for the MMAS-8, ARMS-12, MGT, and GMAS (Table 4).
Table 4
Summarized results of the meta-analysis, heterogeneity, and quality of evidence of internal consistency of the PROMs included in this analysis.
PROM | Pooled alpha (CI 95%) | Sample size | p-value (χ² test) | I² | Overall rating | Quality of evidence
MMAS-8 | 0.68 (0.63–0.72) | 9,235 | < 0.0001 | 92.24 | ? | No information
ARMS-12 (subscale 1) | 0.91 (0.57–0.98) | 714 | < 0.0001 | 99.24 | ? | No information
ARMS-12 (subscale 2) | 0.72 (0.55–0.83) | 714 | 0.0001 | 93.09 | |
MGT | 0.59 (0.50–0.67) | 8,382 | < 0.0001 | 94.02 | ? | No information
MTA | 0.66 (0.59–0.72) | 701 | 0.1466 | 47.92 | ? | No information
MALMAS | 0.62 (0.54–0.69) | 279 | 0.4233 | 0.00 | ? | No information
GMAS (subscale 1) | 0.77 (0.71–0.82) | 528 | 0.0766 | 61.09 | − | High
GMAS (subscale 2) | 0.75 (0.65–0.82) | 528 | 0.0088 | 78.88 | |
GMAS (subscale 3) | 0.47 (0.18–0.68) | 528 | < 0.0001 | 93.05 | |
Note: CI = Confidence interval; PROM = Patient-reported outcome measures; “−” = Insufficient; “?” = Indeterminate.
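For readers less familiar with the heterogeneity statistic in Table 4, the following Python sketch shows how I² is commonly derived from Cochran's Q. The text does not state the exact computation used, so the widely used definition I² = max(0, (Q − df)/Q) × 100 is assumed here; the function name and the Q values in the example are hypothetical and do not come from the included studies.

```python
def i_squared(q, df):
    """I² (%) from Cochran's Q and its degrees of freedom (number of studies - 1).

    Assumes the common definition I² = max(0, (Q - df) / Q) * 100.
    """
    if df < 1:
        raise ValueError("at least two studies are needed (df >= 1)")
    if q <= 0:
        return 0.0
    return max(0.0, 100.0 * (q - df) / q)


# Hypothetical Q statistics, for illustration only.
examples = [
    ("Q well above df", 10.0, 2),
    ("Q below df", 1.0, 2),
]
for label, q, df in examples:
    print(label, "->", round(i_squared(q, df), 2))
```

Under this definition, I² is truncated at zero whenever Q does not exceed its degrees of freedom, which is how a value of 0.00 can arise together with a non-significant χ² test.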
Interpretability and Feasibility
The information needed to evaluate interpretability and feasibility could not be identified in most of the included records. Because any evaluation of these aspects would have been incomplete, the reviewers decided not to evaluate them.
Recommendations for selecting a PROM
According to the results of our systematic review, the ProMAS was the only PROM that met the criteria for category “a”, i.e., the results obtained across the studies can be trusted and the PROM can be recommended for use. This PROM was the only one with evidence for sufficient content validity and at least low-quality evidence for sufficient internal consistency.
The MTA, LMAS-14, GMAS, MEDS, MNPS, MALMAS, ARMS-D, DOSE-Nonadherence, MGT, MARS-5, and A-14 were categorized as not recommended for use (category “c”) because they presented high-quality evidence for at least one insufficient measurement property.
The remaining PROMs, i.e., the MMAS-8, SMAQ, DMAS-7, ARMS-12, MTA-OA, MTA-Insulin, ARMS-10, IADMAS, MAQ, MMAS-5, ARMS-7, and 5-item questionnaire, were considered as having the potential to be recommended for use (category “b”) because they did not meet the criteria for categories “a” or “c”.
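The categorization rule described above can be sketched as a small decision function. The Python example below is a minimal sketch, not the reviewers' actual procedure: the data structure and function name are hypothetical, and checking category “c” before category “a” is an assumption made so that the sketch reproduces the classifications reported in this section. The three example PROMs use only ratings stated in the text.

```python
from dataclasses import dataclass
from typing import Dict, Optional

# Ordering of the quality-of-evidence levels used in the text.
QUALITY_ORDER = {"very low": 0, "low": 1, "moderate": 2, "high": 3}


@dataclass
class Rating:
    rating: str                    # "sufficient", "insufficient", "inconsistent", "indeterminate"
    quality: Optional[str] = None  # None when the evidence was not graded


def categorize(properties: Dict[str, Rating]) -> str:
    """Return "a", "b", or "c" for one PROM (assumed precedence: "c" first)."""
    # Category "c": high-quality evidence for any insufficient property.
    if any(r.rating == "insufficient" and r.quality == "high"
           for r in properties.values()):
        return "c"
    content = properties.get("content validity")
    consistency = properties.get("internal consistency")
    content_ok = content is not None and content.rating == "sufficient"
    consistency_ok = (
        consistency is not None
        and consistency.rating == "sufficient"
        and consistency.quality is not None
        and QUALITY_ORDER[consistency.quality] >= QUALITY_ORDER["low"]
    )
    # Category "a": sufficient content validity plus at least low-quality
    # evidence for sufficient internal consistency; otherwise category "b".
    return "a" if content_ok and consistency_ok else "b"


# Ratings as stated in the text for three PROMs.
proms = {
    "ProMAS": {
        "content validity": Rating("sufficient", "very low"),
        "internal consistency": Rating("sufficient", "high"),
    },
    "MEDS": {
        "content validity": Rating("sufficient", "very low"),
        "internal consistency": Rating("sufficient", "high"),
        "criterion validity": Rating("insufficient", "high"),
    },
    "ARMS-10": {
        "content validity": Rating("sufficient", "very low"),
        "internal consistency": Rating("sufficient", "very low"),
    },
}

for name, props in proms.items():
    print(name, "->", categorize(props))
```

With these inputs, the sketch assigns the ProMAS to category “a”, the MEDS to category “c”, and the ARMS-10 to category “b”, in line with the categories reported above.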