On the appropriate interpretation of evidence: the example of anti-vascular endothelial growth factor for diabetic macular edema

2 Findings

We identified five NMAs (13–17) by searching electronic databases (PubMed, Embase, Cochrane Library, Web of Science, CNKI, Wanfang, VIP) and screening the results. The AMSTAR-2 tool (3) was used to rate the quality of each review.

The reviews differed in almost every respect (Table 1). Although the question under investigation was the same, the searches, the number of included studies, the definitions of eligible participants, the comparisons from which data were sourced and the acceptable outcomes were not consistent across reviews.

Table 1

Summary of Included NMAs
						Study ID^a
						Korobelnik 2015(13)	Régnier 2014(14)	Zhang 2016(15)	Muston 2018(16)	Virgili 2018(17)
Protocol identified						✖	✖	✖	✖	✓
Search		Cochrane				✓	✓	✓	✓	✓
		EMBASE				✓	✓	✓	✓	✓
		MEDLINE				✓ ^b	✓ ^b	✓ ^c	✓ ^b	✓
		Others								✓ ^d
		Date				01/2013	02/2014	08/2015	12/2016	04/2017
Number of studies						11	8	21	13	24
Statistics	Network model					Bayesian	Bayesian	Bayesian	Bayesian	Frequentist
	Sensitivity analysis					Heterogeneous studies; ethnic group	Ethnic group		Heterogeneous studies; ethnic group	Studies at higher risk of bias^f
	Covariates					baseline BCVA and/or CRT	baseline BCVA and/or CRT		baseline BCVA and/or CRT
	Other								Some IPD
Participants	Diabetic macular edema					✓	✓	✓	✓	✓
						Significant, focal or diffuse ,	Baseline BCVA & CRT varied - 24-78 letters		As for Korobelnik 2015	Baseline visual acuity between 20/200 and 20/40
						DME secondary to diabetes involving the center of the macula
						retinal thickening due to DME/clinically significant macula edema with DR				previously received central/peripheral laser or treatment naïve included
Interventions^h	aflibercept					2q4 or 2q8	2 mg; bimonthly	intravitreal	2q8	2 mg^g
	aflibercept				+ laser	✓			✓
	ranibizumab					0.5 mg, PRN	0.5 mg, PRN	intravitreal	0.5 mg, PRN or 0.5 mg T&E or 0.3 mg, q4	0.5 mg or 0.3 mg
					+ laser	✓	✓		✓	deferred
					+ laser		✓		✓	prompt
	dexamethasone					implants		implants
	bevacizumab							intravitreal	1.25 mg	1.25 mg
	bevacizumab				+ laser	✓		✓	✓
	triamcinolone acetonide							intravitreal	4 mg, q4/PRN or 4 mg, q4
	triamcinolone acetonide				+ laser	✓		✓	✓
	pegaptanib									0.3 mg
	Laser					✓	✓	✓		✓
	Laser		+ sham injection			✓
	Sham						✓			✓
Outcomes	Binary			ETDRS letters^e		>10 and >15 gain; >10 and >15 loss	>10 gain		>10 and >15 gain; >10 and >15 loss	>15 gain
	Binary			AEs		✓		✓		✓
	Continuous (average change)			in BCVA using ETDRS charts		✓		✓	✓	✓
	Continuous (average change)							CMT		CRT measured using OCT
Quality rating (AMSTAR-2)						Low	Low	Low	Low	High
a Sorted by search date
b Including In-Process Citations and Daily Update
c PubMed
d International Clinical Trials Registry Platform; ISRCTN registry; LILACS; Novartis Clinical Trials database; US National Institutes of Health Ongoing Trials Register ClinicalTrials.gov; World Health Organization
e in BCVA
f Post hoc
g Regarding drug dose and monitoring/retreatment regimen, Virgili 2018 included schemes that are either on-label or commonly used in clinical practice (such as monthly, bimonthly, PRN, T&E)
h None of the included NMAs contained RCTs investigating conbercept.
AE: adverse event; BCVA: best corrected visual acuity; CI: credible/confidence interval; CMT: central macular thickness; CRT: central retinal thickness; DME: diabetic macular edema; DR: diabetic retinopathy; ETDRS: Early Treatment Diabetic Retinopathy Study; IPD: individual patient data; NI: no information; NMA: network meta-analysis; OCT: optical coherence tomography; PRN: pro re nata; q4: every 4 weeks; T&E: treat-and-extend; 2q8: 2 mg every 8 weeks

Table 2 summarises three outcomes in the NMAs. The first shows how, for an identical binary outcome, different reviews gathered data from different studies and, partly because of this, arrived at slightly different point estimates, although all agreed that there was no clear difference between the two treatments (all 95% credible intervals straddled 1). The other two outcomes reproduce the results from each review and illustrate how the same measure is reported in different ways and in different combinations across the five reviews.

Table 2

Summary of three outcomes in NMAs
	Régnier 2014(14)	Korobelnik 2015(13)	Zhang 2016(15)	Muston 2018(16)	Virgili 2018(17)
Gain ≥ 10 ETDRS letters at 12 months (three reviews^a)
OR [95% CrI]	0.63 [0.19 to 1.63]	1.59 [0.75 to 3.35]	NR	1.79 [0.63 to 4.06]	NR
Studies reporting these data^b in each NMA
Elman 2010(18) [DRCR.net Protocol I]	Included	Included	NR	Included	NR
Mitchell 2011(19) [RESTORE]	Included	Included	NR	Included	NR
Korobelnik 2014(20) [VIVID; VISTA]	Included	Included	NR	Included	NR
Massin 2010(21) [RESOLVE]	Included	Not included^c	NR	Not included^c	NR
Googe 2011(22) [DRCR.net Protocol J]	Not included^d	Included	NR	Included	NR
Do DV 2012(23) [Da VINCI]	Included	Not included^d	NR	Not included^d	NR
Ishibashi 2015(24) [REVEAL]	Not included focus on Asian population^e	Included^e	NR	Included	NR
RESPOND(25) [NCT01135914]	Included	Not included^f	NR	Included	NR
Nguyen 2009(26) [READ-2]	Included^e	Not included^g	NR	Not included^g	NR
Average change in BCVA^h at 12 months MD [95% CrI] (five reviews)
	4.5 [1.5 to 7]ⁱ	4.67 [2.45 to 6.87]	2.07 [-0.97 to 5.33]	5.20 [1.90 to 8.52]	4 [2.5 to 5.5]
Gain ETDRS letters^h at 12 months OR [95% CrI] (five reviews)
≥ 10	0.63 [0.19–1.63]	1.59 [0.75–3.35]	NR	1.79 [0.63–4.06]	NR
≥ 15	NR	NR	NR	2.30 [1.12-4.20]	1.33 [1.06-1.67]^j
a Zhang 2016 and Virgili 2018 did not report this outcome.
b The additional reasons given for ‘included’ or ‘not included’ were identified by the author team of this review; they were not stated in the original texts of the NMAs.
c data unavailable on ranibizumab 0.5 mg
d unclear reason for exclusion
e included in sensitivity analysis
f unpublished when NMA conducted
g data only reported at 6 months
h higher values represent better visual acuity measured using ETDRS letters
i Data were analysed by the author team of this review (Bayesian network model with random effects, using ADDIS software); they were not reported in the original text of Régnier 2014.
j data were risk ratio (RR) and its 95% CrI
CrI: credible interval; BCVA: best corrected visual acuity; ETDRS: Early Treatment Diabetic Retinopathy Study; NMA: network meta-analysis; NR: not reported; OR: odds ratio.

3 Discussion

The reader must continue to think ‘cleanly’ even when the data are not quite so clean. Below we discuss some points of particular concern that are illustrated by these five similar NMAs.

3.1 Network differences

Network meta-analysis uses data from (in these cases) randomised trials to construct comparisons of interest, relying on somewhat assumption-heavy, essentially observational methods. For example, aflibercept versus ranibizumab is the comparison of interest (these interventions form the decision set). Aflibercept and ranibizumab, however, may each have been compared only with sham injection in randomised trials. We use the term ‘supplementary set’ to refer to interventions, such as sham injection, that are included in the network meta-analysis to improve inference among interventions in the decision set (4). Because review teams select different supplementary sets, different network structures are constructed for the same clinical problem. In selecting which competing interventions to include in the decision set, researchers should ensure that the transitivity assumption is likely to hold, based mostly on clinical considerations (4, 27).

When the theoretical assumptions (transitivity and consistency) hold, there is no absolute right or wrong in the construction of a network. The reader of the review should carefully consider whether the indirect comparisons in the network are sensible and make the best use of the available data.
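The idea of constructing an aflibercept versus ranibizumab comparison through a shared comparator can be illustrated with the simplest adjusted indirect comparison (often called the Bucher method). This is only a minimal sketch and not the Bayesian or frequentist network models that the five NMAs actually used; the odds ratios and standard errors below are hypothetical values chosen for illustration, and the function names are our own.

```python
import math

def indirect_log_or(log_or_a_vs_c, se_a_vs_c, log_or_b_vs_c, se_b_vs_c):
    """Adjusted indirect comparison of A vs B through a common comparator C.

    Valid only if the A-vs-C and B-vs-C trials are similar enough in their
    effect modifiers (the transitivity assumption discussed in the text).
    """
    log_or = log_or_a_vs_c - log_or_b_vs_c
    # Variances add, so the indirect estimate is less precise than either input.
    se = math.sqrt(se_a_vs_c ** 2 + se_b_vs_c ** 2)
    return log_or, se

def to_or_with_interval(log_or, se, z=1.96):
    """Back-transform a log odds ratio to an OR with an approximate 95% interval."""
    return math.exp(log_or), math.exp(log_or - z * se), math.exp(log_or + z * se)

# Hypothetical inputs (NOT taken from any of the reviewed NMAs):
aflibercept_vs_sham = (math.log(3.0), 0.30)   # (log OR, standard error)
ranibizumab_vs_sham = (math.log(2.0), 0.25)

log_or_ind, se_ind = indirect_log_or(*aflibercept_vs_sham, *ranibizumab_vs_sham)
or_ind, lo_ind, hi_ind = to_or_with_interval(log_or_ind, se_ind)
print(f"Indirect OR {or_ind:.2f} [{lo_ind:.2f} to {hi_ind:.2f}]")
```

Even with two fairly precise direct comparisons, the indirect estimate in this illustration has an interval that includes 1, which shows why the choice of supplementary set and the amount of data feeding each link of the network matter.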

3.2 PICO differences

For the most part, the different PICO (participants, interventions, comparisons, outcomes) choices made in the different NMAs all make clinical sense. These differences nevertheless lead to results that are not identical. For example, one review team may feel that a systematic difference in the participants of a particular trial makes it inappropriate to network its data with those of other studies (e.g., Ishibashi 2015, included only in a sensitivity analysis in Régnier 2014 but in the main analysis of Korobelnik 2015). Variations in treatment dosage may, in the view of one review team, make a study ineligible yet be acceptable to other researchers (e.g., Massin 2010, included in Régnier 2014, excluded from Korobelnik 2015 and Muston 2018). The time point of outcome assessment can add further differences (e.g., Nguyen 2009, included in Régnier 2014, excluded from Korobelnik 2015 and Muston 2018).

These decisions, all made with the best of intentions, lead to different studies contributing to the final, slightly different, results, as illustrated in Table 2. It should not be a surprise that the ideas of clinicians and researchers evolve over time and differ even at the same point in time. Readers need to consider and understand which participants are included in, and excluded from, the review, which treatments are its focus and whether there are omissions, and which outcomes are reported and why those choices were made.
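The overlap between reviews can be made explicit by treating each review's contributing studies as a set. The sketch below transcribes the ‘Included’ cells of Table 2 for the ≥10-letter gain outcome; as a simplifying assumption of ours, cells carrying a footnote are taken at their main label (so Nguyen 2009 counts as included in Régnier 2014 and Ishibashi 2015 as included in Korobelnik 2015). The variable and function names are illustrative only.

```python
# Studies contributing to ">=10 ETDRS letters gained at 12 months" per NMA,
# transcribed from Table 2 (footnoted cells read at their main label).
included = {
    "Régnier 2014": {
        "Elman 2010", "Mitchell 2011", "Korobelnik 2014", "Massin 2010",
        "Do 2012", "RESPOND", "Nguyen 2009",
    },
    "Korobelnik 2015": {
        "Elman 2010", "Mitchell 2011", "Korobelnik 2014", "Googe 2011",
        "Ishibashi 2015",
    },
    "Muston 2018": {
        "Elman 2010", "Mitchell 2011", "Korobelnik 2014", "Googe 2011",
        "Ishibashi 2015", "RESPOND",
    },
}

# Studies that every review used, and every study used by at least one review.
common = set.intersection(*included.values())
all_studies = set.union(*included.values())

def only_in(review):
    """Studies that this review used and no other review did."""
    others = set().union(*(s for name, s in included.items() if name != review))
    return included[review] - others

print("Used by all three reviews:", sorted(common))
for review in included:
    print(f"Only in {review}:", sorted(only_in(review)))
```

Under these assumptions only three of the nine trials feed all three estimates, which is consistent with the slightly different point estimates reported for the same outcome.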

3.3 Different data from the same measures of effect

Clinicians and patients tend first to ask whether the treatment will help them, for example, ‘get better’ (a question that merits a binary answer) and only then, as a second preference, seek more detailed information on the degree of improvement (meriting an answer on a continuous scale). The dichotomous or binary outcome, though, is often a crude and even arbitrary cut-off within an ostensibly continuous measure. Continuous measures are, however, often a research fabrication and not truly continuous.

In the examples in Table 2 the average improvement is fairly consistently around 4 points. It is difficult to understand what this may mean for any one patient’s life. In averaging across groups, however, something may be lost that the binary outcomes reveal, and Table 2 gives good grounds for speculation. The reviews reporting a ≥10-point gain consistently show no clear difference between aflibercept and ranibizumab, but the two latest reviews report a new binary outcome (≥15-point gain) and both show an advantage for those allocated to aflibercept. Perhaps averaging across all people in the trials has masked an important group of people who respond better to aflibercept. These, however, are points for clinical and research debate.
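How an average can hide a group of strong responders is easy to show with invented numbers. The sketch below uses two hypothetical arms of ten patients each (these are not trial data): both have the same mean change in letters, yet only one contains patients who reach a ≥15-letter gain.

```python
from statistics import mean

def summarise(changes, threshold=15):
    """Return (mean change in letters, proportion gaining >= threshold letters)."""
    responders = sum(c >= threshold for c in changes)
    return mean(changes), responders / len(changes)

# Hypothetical per-patient changes in ETDRS letters (illustration only).
uniform_arm = [4] * 10            # everyone improves a little
mixed_arm = [16, 16] + [1] * 8    # two strong responders, eight near-zero

uniform_mean, uniform_prop = summarise(uniform_arm)
mixed_mean, mixed_prop = summarise(mixed_arm)

print(f"Uniform arm: mean {uniform_mean}, >=15 letters in {uniform_prop:.0%}")
print(f"Mixed arm:   mean {mixed_mean}, >=15 letters in {mixed_prop:.0%}")
```

The continuous summary cannot distinguish the two arms, whereas the binary summary can; which of the two matters more depends on the reader's question.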

Overall, the five reviews report results that are complicated and thought provoking, but not truly inconsistent with each other. Readers need to consider the value of each outcome for their own needs. Researchers may favour the continuous measure of function, clinicians or patients the binary cut-off for better/not better, and policy makers the economics.

3.4 Differences in what is truly significant

Even when the synthesis of data reaches a pre-stated level of statistical significance, the findings for the outcome measure may not have great clinical impact. Confusion easily arises when the same data are commented on from the statistical perspective in one place and from the clinical perspective in another. The reader needs to consider carefully whether the reviewers are reporting the clinical perspective, the statistical perspective, or a mixture of both.

A further danger of differing interpretations of the same findings arises when confidence intervals straddle zero (for continuous data) or 1 (for binary data, as for all the ≥10 point gain findings in Table 2). It is easy for reviewers and readers of the reviews to confuse ‘no evidence of an effect’ with ‘evidence of no effect’. Wide intervals, for example the 0.63 to 4.06 of Muston 2018 in Table 2, straddle 1 (unity). In this case it is wrong to claim that aflibercept has ‘no effect’ or is ‘no different’ from ranibizumab; both statements carry too much certainty. It is true that there is no clear difference, but it is equally not clear that the two drugs are the same. If a possible beneficial effect is mentioned in the conclusion, the possible harmful effect should also be mentioned and discussed.
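The distinction between ‘no evidence of an effect’ and ‘evidence of no effect’ can be written down as a simple rule applied to the interval of a ratio measure. The sketch below uses the intervals reported in Table 2; the equivalence margin of 0.8 to 1.25 is a hypothetical choice of ours, introduced only so that ‘evidence of no important difference’ has a concrete definition, and is not taken from any of the reviews.

```python
def describe_ratio_interval(lower, upper, margin=(0.8, 1.25)):
    """Classify an interval for a ratio measure (OR or RR) whose null value is 1.

    `margin` is a hypothetical range within which a difference would be
    regarded as clinically unimportant.
    """
    if lower > 1.0 or upper < 1.0:
        return "clear difference"
    if margin[0] <= lower and upper <= margin[1]:
        return "evidence of no important difference"
    return "no evidence of a difference (inconclusive)"

# Interval bounds transcribed from Table 2 (aflibercept vs ranibizumab).
gain_10 = {
    "Régnier 2014": (0.19, 1.63),
    "Korobelnik 2015": (0.75, 3.35),
    "Muston 2018": (0.63, 4.06),
}
gain_15 = {
    "Muston 2018": (1.12, 4.20),
    "Virgili 2018 (RR)": (1.06, 1.67),
}

for label, table in ((">=10 letters", gain_10), (">=15 letters", gain_15)):
    for review, (lo, hi) in table.items():
        print(f"{label:>12} | {review:<18} | {describe_ratio_interval(lo, hi)}")
```

None of the ≥10-letter intervals lies inside the illustrative margin, so under this rule they are inconclusive and cannot be read as showing that the drugs are equivalent.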

As always, really thinking about the meaning of findings is key. Together, the point estimate and confidence interval provide the information needed to assess the effect of the intervention on the outcome. For example, in the evaluation of these drugs on BCVA it could have been decided that it would be clinically useful if the medication increased BCVA from baseline by 5 letters, and at the very least by 2 letters. Virgili 2018 reports an effect estimate of an increase from baseline of 4 letters with a 95% confidence interval from 2.5 to 5.5 letters. This allows the conclusion that aflibercept was useful, since both the point estimate and the entire range of the interval exceed the criterion of an increase of 2 letters. The Régnier 2014 review reported a similar point estimate (4.5 letters) but with a wider interval, from 1.5 to 7 letters. In this case, although it could still be concluded that the best estimate of the aflibercept effect is that it provides net benefit, the reader could not be so confident, because the possibility still has to be entertained that the effect lies between 1.5 and 2 letters, a low range that had been pre-specified to be of little clinical value. The contrast between Régnier 2014 and Virgili 2018 serves well to illustrate how very similar findings may justify subtly different implications. Reviewers carry a responsibility to help the reader through clear reporting and thoughtful, inclusive explanations, but where this has not happened readers may have to do this for themselves.
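The reasoning in this paragraph amounts to comparing each interval with a pre-specified minimum useful effect. As a hedged sketch, the function below applies the illustrative 2-letter criterion from the text to the five mean differences in Table 2; the wording of the three categories and the function name are our own.

```python
def interpret(point, lower, upper, minimum_useful=2.0):
    """Compare a mean difference and its interval with a minimum useful effect."""
    if lower >= minimum_useful:
        return "useful: whole interval at or above the minimum"
    if point >= minimum_useful:
        return "probably useful: interval includes effects below the minimum"
    return "not shown to be useful"

# Mean change in BCVA at 12 months (letters), transcribed from Table 2:
# review -> (point estimate, lower bound, upper bound)
bcva_change = {
    "Régnier 2014": (4.5, 1.5, 7.0),
    "Korobelnik 2015": (4.67, 2.45, 6.87),
    "Zhang 2016": (2.07, -0.97, 5.33),
    "Muston 2018": (5.20, 1.90, 8.52),
    "Virgili 2018": (4.0, 2.5, 5.5),
}

verdicts = {review: interpret(*values) for review, values in bcva_change.items()}
for review, verdict in verdicts.items():
    print(f"{review:<16} {verdict}")
```

With a different minimum useful effect the classification changes, which is why the criterion should be stated before the results are interpreted.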

References

Bastian H, Glasziou P, and Chalmers I. Seventy-Five Trials and Eleven Systematic Reviews a Day: How Will We Ever Keep Up? PLoS Medicine. 2010;7(9):e1000326.
Mulrow CD. Rationale for systematic reviews. BMJ. 1994;309(6954):597–9.
Shea BJ, Reeves BC, Wells G, Thuku M, Hamel C, Moran J, et al. AMSTAR 2: a critical appraisal tool for systematic reviews that include randomised or non-randomised studies of healthcare interventions, or both. BMJ. 2017;358:j4008.
Higgins JPT, Thomas J, Chandler J, Cumpston M, Li T, Page MJ, Welch VA. Cochrane Handbook for Systematic Reviews of Interventions version 6.1 (updated September 2020). Cochrane. 2020.
Ciulla TA, Amador AG, and Zinman B. Diabetic retinopathy and diabetic macular edema: pathophysiology, screening, and novel therapies. Diabetes Care. 2003;26(9):2653–64.
Fenwick EK, Xie J, Ratcliffe J, Pesudovs K, Finger RP, Wong TY, et al. The Impact of Diabetic Retinopathy and Diabetic Macular Edema on Health-Related Quality of Life in Type 1 and Type 2 Diabetes. Investigative Ophthalmology & Visual Science. 2012;53(2):677–84.
Hariprasad SM, Mieler WF, Grassi M, Green JL, Jager RD, and Miller L. Vision-related quality of life in patients with diabetic macular oedema. British Journal of Ophthalmology. 2008;92(1):89–92.
Chen E, Looman M, Laouri M, Gallagher M, Van Nuys K, Lakdawalla D, et al. Burden of illness of diabetic macular edema: literature review. Current Medical Research & Opinion. 2010;26(7):1587.
P.H. S, M.L. M, C. B, E. J, P. H, and S. K. Reported symptoms and quality-of-life impacts in patients having laser treatment for sight-threatening diabetic retinopathy. Diabetic Medicine. 2006(No.1):60–6.
Jain A, Varshney N, and Smith C. The Evolving Treatment Options for Diabetic Macular Edema. Int J Inflam. 2013;2013:689276.
Schmidt-Erfurth U, Garcia-Arumi J, Bandello F, Berg K, Chakravarthy U, Gerendas BS, et al. Guidelines for the Management of Diabetic Macular Edema by the European Society of Retina Specialists (EURETINA). Ophthalmologica. 2017.
Cheung GCM, Yoon YH, et al. Diabetic macular oedema: evidence-based treatment recommendations for Asian countries. Clinical & Experimental Ophthalmology. 2017.
Korobelnik JF, Kleijnen J, Lang SH, Birnie R, Leadley RM, Misso K, et al. Systematic review and mixed treatment comparison of intravitreal aflibercept with other therapies for diabetic macular edema (DME). BMC Ophthalmology. 2015;15(1):52.
Stephane R, William M, Felicity A, Jonathan W, Vladimir B, and Andreas W. Efficacy of Anti-VEGF and Laser Photocoagulation in the Treatment of Visual Impairment due to Diabetic Macular Edema: A Systematic Review and Network Meta-Analysis. Plos One. 2014;9(7):e102309.
Lu Z, Wen W, Yan G, Jie L, Xie L, and Alan S. The Efficacy and Safety of Current Treatments in Diabetic Macular Edema: A Systematic Review and Network Meta-Analysis. Plos One. 2016;11(7):e0159553.
Muston D, Korobelnik JF, Reason T, Hawkins N, Chatzitheofilou I, Ryan F, et al. An efficacy comparison of anti-vascular growth factor agents and laser photocoagulation in diabetic macular edema: a network meta-analysis incorporating individual patient-level data. BMC Ophthalmology. 2018;18(1).
Virgili G, Parravano M, Evans JR, Gordon I, and Lucenteforte E. Anti-vascular endothelial growth factor for diabetic macular oedema: a network meta-analysis. Cochrane Database of Systematic Reviews. 2017;6(6):CD007419.
Elman MJ, Aiello LP, Beck RW, Bressler NM, and Sun JK. Randomized Trial Evaluating Ranibizumab Plus Prompt or Deferred Laser or Triamcinolone Plus Prompt Laser for Diabetic Macular Edema. Ophthalmology. 2010;117(6):1064-77.e35.
Mitchell P, Bandello F, Schmidt-Erfurth U, Lang GE, Massin P, Schlingemann RO, et al. The RESTORE study: ranibizumab monotherapy or combined with laser versus laser monotherapy for diabetic macular edema. Ophthalmology. 2011;118(4):615–25.
Korobelnik JF, Do DV, Schmidt-Erfurth U, Boyer DS, Holz FG, Heier JS, et al. Intravitreal aflibercept for diabetic macular edema. Ophthalmology. 2014;121(11):2247–54.
Massin P, Bandello F, Garweg JG, Hansen LL, Harding SP, et al. Safety and efficacy of ranibizumab in diabetic macular edema (RESOLVE Study): a 12-month, randomized, controlled, double-masked, multicenter phase II study. Diabetes Care. 2010;33:2399–405.
Googe J, Brucker AJ, Bressler NM, Qin H, Aiello LP, Antoszyk A, et al. Randomized trial evaluating short-term effects of intravitreal ranibizumab or triamcinolone acetonide on macular edema after focal/grid laser for diabetic macular edema in eyes also receiving panretinal photocoagulation. Retina (Philadelphia, Pa). 2011(No.6):1009–27.
Do DV, Quan DN, Boyer D, Schmidt-Erfurth U, and Heier JS. One-year outcomes of the DA VINCI study of VEGF trap-eye in eyes with diabetic macular edema. Ophthalmology. 2012;119(8):1658–65.
Ishibashi T, Li X, Koh A, Lai TY, Lee FL, Lee WK, et al. The REVEAL Study: Ranibizumab Monotherapy or Combined with Laser versus Laser Monotherapy in Asian Patients with Diabetic Macular Edema. Ophthalmology. 2015;122(7):1402–15.
ClinicalTrials.gov. Safety, efficacy and cost-efficacy of ranibizumab (monotherapy or combination with laser) in the treatment of diabetic macular edema (DME) (RESPOND). NCT01135914. https://www.clinicaltrials.gov/ct2/show/NCT01135914?term=RESPOND&cond=DME&rank=1. 7 May 2018.
Nguyen QD, Shah SM, Heier JS, Do DV, Lim J, Boyer D, et al. Primary End Point (Six Months) Results of the Ranibizumab for Edema of the mAcula in diabetes (READ-2) study. Ophthalmology. 2009;116(11):2175-81.e1.
Salanti G, Ades AE, and Ioannidis JP. Graphical methods and numerical summaries for presenting results from multiple-treatment meta-analysis: an overview and tutorial. J Clin Epidemiol. 2011;64(2):163–71.
Critical Appraisal Checklist For A Systematic Review. https://www.gla.ac.uk/media/Media_64047_smxx.PDF. Accessed 20 March 2021.
Guyatt G, Oxman AD, Akl EA, Kunz R, Vist G, Brozek J, et al. GRADE guidelines: 1. Introduction-GRADE evidence profiles and summary of findings tables. Journal of Clinical Epidemiology. 2011;64(4):383–94.
