Widespread Pain Phenotypes Impact Treatment Efficacy Results in Randomized Clinical Trials for Interstitial Cystitis/Bladder Pain Syndrome: A MAPP Network Study

Clinical trials of pain are notoriously difficult and inefficient at demonstrating efficacy, even for treatments known to be efficacious. Determining the appropriate pain phenotype to study can be problematic. Recent work has identified the extent of widespread pain as an important factor in the likelihood of response to therapy, but this has not been tested in clinical trials. Using data from three previously published negative studies of treatments for interstitial cystitis/bladder pain syndrome that collected data on the extent of widespread pain, we examined patients' responses to different therapies based on the amount of pain beyond the pelvis. Participants with predominantly local rather than widespread pain responded to therapy targeting local symptoms. Participants with both widespread and local pain responded to therapy targeting widespread pain. Differentiating patients with and without widespread pain phenotypes may be a key feature in designing future pain clinical trials that can distinguish effective from ineffective treatments.


Introduction
Pain is a subjective experience known to arise from multiple mechanisms that generate and maintain painful syndromes. The resulting heterogeneity of patient populations enrolled in clinical trials is thought to be one of the primary obstacles to demonstrating efficacy in clinical trials, even for known efficacious therapies. 1,2 Recent work has identified the extent of widespread pain as an important factor in the likelihood of response to therapy in chronic pelvic pain syndromes, including interstitial cystitis/bladder pain syndrome (IC/BPS), but this has not been tested in clinical trials.
Interstitial cystitis/bladder pain syndrome (IC/BPS) is estimated to affect 2.9-4.2% of the US population, with a high degree of patient morbidity and few effective treatments. 3 In one of the largest comprehensive cohort studies of patients with pain, the Multidisciplinary Approach to the Study of Chronic Pelvic Pain (MAPP) Research Network has measured and evaluated the importance of a number of factors in the course of the disease over time, including the extent of widespread pain beyond the pelvis [4][5][6][7][8][9] and its association with Chronic Overlapping Pain Conditions (COPCs). COPCs occur in 39% of IC/BPS patients, indicating how often pain also manifests in other parts of the body. 10,11 In MAPP patients, the extent of COPCs and of pain beyond the pelvic sites reported on a pain body map were found to contribute to greater IC/BPS symptom severity and lower quality of life. 7 The MAPP Network has also demonstrated fundamental differences between patients with localized versus more widespread pain in pain-related brain connectivity (functional MRI) 8, experimental pain sensitivity (quantitative sensory testing) 12, and immunological factors. 13 Until the end of the 20th century, idiopathic widespread pain was often considered a marker of a "psychogenic" cause of pain, owing to the limited understanding of the interaction between the central and peripheral nervous systems. With the recent rapid growth in knowledge about the complexities of the nervous system, in large part due to the development of functional brain imaging 14, it is now understood that widespread pain is often a marker of peripheral and central pain pathway dysfunction, deserving of intensive basic and clinical research to improve care for patients with chronic pain. 15
A growing number of clinical trials have explored potential therapies for widespread pain, 16,17 but almost none have examined its impact on the outcomes of trials for specific pain syndromes and other symptoms.
This gap exists at a time when advances in personalized therapeutics require increasingly precise characterization of individual phenotypes, especially in similar pain syndromes with different mechanistic phenotypes. 18,19 Without adequate phenotyping, testing of mechanistically based therapies can yield false negative clinical trial results because the potentially responsive phenotype is diluted. Precise characterization of individual phenotypes will have an important impact on future clinical trial designs meant to identify effective targeted therapies for disorders that frequently include patients with diverse symptoms. 20,21 To extend the MAPP cohort study findings to clinical trial data, we used individual patient data from previously conducted randomized clinical trials (RCTs) of treatments for IC/BPS to explore the impact of the extent of widespread pain on treatment responses.

Demographics
Participant characteristics and baseline measures [22][23][24] are summarized by pain widespreadness subgroup in Table 1, generated with the R package tableone. 34 Average age ranged from 37.3 to 49.9 years, the proportion of female participants from 73% to 96%, and the proportion of Caucasian participants from 60% to 89% across subgroups and RCTs. Because of these differences, analyses were adjusted for age, sex, and race (Table 1).

Treatment Response in Combined Sample
Most treatment effects for absolute change in NRS outcomes in the full study population were not significant (α=0.05), with the exception of urgency in the BCG trial (change in urgency=-0.57, 95% CI [-1.09,-0.04], p=.034) and frequency in the amitriptyline trial (change in frequency=-0.78, 95% CI [-1.38,-0.18], p=.011) (Extended Table 1).

PPS/hydroxyzine study
Neither single treatment, PPS or hydroxyzine, was statistically different from placebo for pelvic pain or urinary urgency in either pain widespreadness subgroup. Therefore, the single-treatment and placebo arms were combined to form a comparator arm for the combination PPS/hydroxyzine therapy. The combination treatment was statistically superior to the comparator arm for pelvic pain (change in pelvic pain=-1.44, 95% CI [-2.64,-0.25], p=0.018) (Figure 1, Table 2) and urinary urgency (change in urinary urgency=-1.09, 95% CI [-2.08,-0.10], p=0.031) (Figure 2, Table 2), but only for patients with low pain widespreadness. In the 1 − ECDF curves for observed change in pelvic pain (Extended Figure 8), 35% of participants receiving the combination therapy versus 9% in the comparator arm achieved 50% improvement or greater in the low pain widespreadness subgroup, with nearly overlapping response rates in the high pain widespreadness subgroup.
Intravesical Bacillus Calmette-Guerin (BCG) study
Treatment response was not statistically significant for change in pain or urgency in either subgroup (Figures 1b and 2b). GRA response did differ significantly between BCG and placebo, but only in the high widespread pain group (Table 2). All 1 − ECDF curves overlapped, indicating no difference in observed improvement (Figures 1-2).

Amitriptyline study
Compared with placebo, with all patients receiving an IC/BPS-focused EBMP, amitriptyline produced statistically significant differences in change in pelvic pain (change in pelvic pain=-1.14, 95% CI [-2.08,-0.19], p=0.019) (Figure 1), urinary frequency (change in urinary frequency=-1.53, 95% CI [-2.50,-0.56], p=0.002) (Figure 3), and GRA response (log odds ratio=1.18, 95% CI [0.29,2.07], p=0.009) for patients with high widespread pain (Table 2). The response to amitriptyline with EBMP was similar in both widespread pain subgroups; however, placebo with EBMP was similarly effective in the low widespread pain group, leading to non-significant treatment differences (Figure 3), but much less effective in the high widespread pain group, leading to statistically significant differences. Comparing observed percentage change in the 1 − ECDF curves between subgroups, differences between treatments were observed in the high pain widespreadness subgroup for both pelvic pain and frequency. For frequency, 43% of patients receiving amitriptyline with EBMP versus 19% receiving placebo with EBMP achieved 50% improvement or greater within the high widespreadness subgroup (Extended Figure 10).

Discussion
The primary finding from our re-analysis is the influence of the widespread pain phenotype on treatment efficacy for patients with IC/BPS. We identified symptom improvement from the PPS/hydroxyzine combination only in patients with low widespread pain. Amitriptyline produced similar improvement in both widespread pain groups, but its difference from control was statistically significant only in the high widespread pain group, because the IC/BPS-focused EBMP control group showed similar efficacy in the low widespread pain group but much lower benefit in the high widespread pain group. To the best of our knowledge, this study is the first to support the importance of widespread pain for treatment efficacy in patients with IC/BPS, and it supports the likelihood of a similar impact in other centrally versus peripherally mediated pain syndromes.
Importantly, the effect of widespread pain on response to treatment differed across the three studies, but in each case was consistent with the putative mechanism of action of the treatment. PPS/hydroxyzine is thought to target localized pelvic symptoms specifically. The combination therapy was superior only in the absence of widespread pain, suggesting that its local benefit was not adequate in patients with pain extending beyond the pelvis. In contrast, the noradrenergic/serotonergic effects of amitriptyline have been demonstrated to provide benefit in generalized pain 35,36, and its anticholinergic effects are thought to reduce some urinary symptoms. 37 Amitriptyline demonstrated an effect in both the high and low widespread pain phenotypes; however, the EBMP behavioral therapy alone was effective for pain and urinary symptoms, including frequency, only in the absence of widespread pain. The level of response in the EBMP behavioral control group limited the detection of a statistically significant effect for amitriptyline in the low widespread pain group, despite a similar level of response to amitriptyline in both groups. BCG instillation, a local therapy, provided little benefit for pain or urgency in either the low or high widespread pain group. Without consideration of the widespread pain phenotype, the negative results of analyzing the whole population may fail to identify potentially effective therapies for a group of patients for whom there are few treatments.
The heterogeneity of treatment effect is known to be important in evaluating clinical trials. A baseline risk-based approach that defines outcome heterogeneity by identifiable phenotypic characteristics may provide useful insights for interpreting previously conducted RCTs and have implications for the design of future clinical studies. 38 Such a re-analysis approach to defining phenotypes in patients with diabetes has led to a better understanding of response to drug versus behavioral therapy. 39 Specific phenotype-based analyses should be planned a priori to increase the assay sensitivity of clinical trials. 40 In addition, our results suggest that the spatial distribution of a patient's pain should also be measured and considered when choosing therapies for patient care. 40

We recognize several important limitations of our study. Although our results support the importance of overall pain widespreadness in treatment response, these findings need to be confirmed in future clinical trials and other prospective studies, as subgroup analyses have well-known limitations. 38,41 Because none of the original trials included sensory testing, it will be important for future studies to use sensory testing methods, in addition to patient reports of pain, to differentiate the underlying processes that may maintain widespread pain. The differences between our three studies involve more than just the focus of the therapies: the amitriptyline study enrolled previously untreated IC/BPS patients, whereas the PPS/hydroxyzine and BCG studies enrolled patients with more chronic conditions who had undergone previous treatment. These differences are also a strength, since the presence of widespread pain had a substantial impact on two of the three trials. Lastly, our measure of widespread pain phenotype was limited to five available questions about non-pelvic pain, which may be less robust than a body map.
Nevertheless, we detected a substantial difference in treatment response in two of the three studies. A strength of our study is the quality of the trial data, which were collected as part of well-designed and carefully conducted clinical trials with rigorous data quality procedures and limited missing data.

Conclusions
The present study strongly supports the importance of baseline identification of patient widespread pain phenotypes, which substantially affected the outcomes in two of three previously published RCTs for IC/BPS. The data suggest that treatments focused on urinary and pelvic pain symptoms are more likely to demonstrate benefit in patients in whom local symptoms predominate. In contrast, patients with widespread pain may require centrally directed treatments intended for more generalized pain. Our findings support the importance of additional research to identify methods of measuring the characteristics that define important symptom phenotypes, to include those measures in the design of clinical trials, and to make evidence-based decisions about which patients to include in studies of symptomatic therapy, including those for IC/BPS.

Three previously published RCTs (PPS/hydroxyzine 22, BCG 23, and amitriptyline 24) collected WSS data, permitting re-analyses incorporating baseline stratification by widespread pain. Patients were >18 years of age, and for the PPS/hydroxyzine and BCG studies, had symptoms for 24 weeks, a pain/discomfort score of >4/9, urinary frequency >11 times in 24 hours for four weeks, and an IC/BPS diagnosis verified through cystoscopy and hydrodistension. Patients were followed for 24 weeks in the PPS/hydroxyzine trial and 34 weeks in the BCG trial. The amitriptyline trial focused on untreated IC/BPS patients with at least 6 weeks of symptoms and pain severity and urinary frequency scores of ≥3/10 for at least four weeks; it did not require a previous IC/BPS diagnosis for inclusion. Amitriptyline doses were titrated from 25 to 75 mg, as tolerated, and patients were followed for 12 weeks. All patients in this trial also received a standardized IC/BPS-focused education and behavioral modification program (EBMP), including instruction on fluid and food, bladder, and stress management.

Baseline Stratification by Pain Widespreadness
To construct subgroups with maximal separation of baseline symptoms, an unsupervised consensus clustering (CC) algorithm (ConsensusClusterPlus) in R (version 3.4.1) 26 was applied to the combined RCT dataset, consisting of 16 WSS pain, urinary, and frequency symptom questions and 5 measures of pain beyond the pelvis (headache, backache, chest pain, joint aches, abdominal cramps), each reported on a 0-6 scale from "not at all" to "a lot". 25 An optimal classification rule using only the 5 WSS measures of pain beyond the pelvis was then developed to predict membership in the high widespread pain cluster, using logistic regression models and receiver operating characteristic (ROC) curves applied to a harmonized analysis dataset of all three RCTs. 27,28 Specifically, participants had to report a score of ≥2/6 on ≥3 of the 5 WSS non-pelvic pain questions to be classified as having high pain widespreadness in our analyses. Further details about consensus clustering and the development of the widespreadness classification can be found in the Extended Methods and Extended Figures 1-4.
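Because the final classification rule uses only five item scores, it can be stated compactly as code. The sketch below is a hypothetical Python helper, not the authors' implementation (their analyses used R and SAS). The function and constant names are our own, and it assumes the five non-pelvic WSS items are supplied as integers on the 0-6 scale.

```python
from typing import Sequence

# The five WSS items for pain beyond the pelvis named in the text.
# The ordering here is an illustrative assumption.
NON_PELVIC_ITEMS = ("headache", "backache", "chest pain", "joint aches", "abdominal cramps")


def is_high_widespread(scores: Sequence[int], threshold: int = 2, min_items: int = 3) -> bool:
    """Hypothetical helper: True if at least `min_items` of the five
    non-pelvic WSS pain items are scored >= `threshold` (0-6 scale)."""
    if len(scores) != len(NON_PELVIC_ITEMS):
        raise ValueError(f"expected {len(NON_PELVIC_ITEMS)} item scores, got {len(scores)}")
    if any(not 0 <= s <= 6 for s in scores):
        raise ValueError("WSS items are scored on a 0-6 scale")
    # Count items at or above the cut-off and compare with the required number of items.
    n_elevated = sum(1 for s in scores if s >= threshold)
    return n_elevated >= min_items


# Three of five items are >= 2, so this participant is classified as high widespreadness.
print(is_high_widespread([2, 3, 0, 2, 1]))  # True
# Only two items are >= 2, despite their high scores.
print(is_high_widespread([1, 6, 6, 0, 1]))  # False
```

The rule counts elevated items rather than summing scores, so two severe non-pelvic complaints alone do not place a participant in the high widespreadness group.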
Outcomes
IC/BPS symptoms of pelvic pain and urinary urgency were measured on a numeric rating scale (NRS), and absolute change was estimated as the difference between the study endpoint measure and the average of the screening and baseline visit measures. Percent change for each NRS scale was calculated as the outcome at end of study minus the average of the baseline and screening measures, divided by that baseline average and multiplied by 100. A 0-9 NRS scale was used for pain and urinary urgency in the PPS/hydroxyzine and BCG trials, whereas a 0-10 scale was used in the amitriptyline trial, which also measured urinary frequency. As an alternative to evaluating change in 3 separate IC/BPS symptoms, the original RCT analyses used a global response assessment (GRA) as the primary outcome measure. The 7-point GRA scale, collected at study endpoint, ranged from 1: markedly worse, 2: moderately worse, 3: slightly worse, 4: no change, 5: slightly improved, 6: moderately improved, to 7: markedly improved, with a responder defined as ≥6.
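A minimal Python sketch of these outcome definitions follows. It assumes, as is conventional, that percent change is divided by the baseline average (the mean of the screening and baseline measures). The function names are hypothetical, and raising an error for a zero baseline is our own choice because the source does not say how that case was handled.

```python
def baseline_average(screening: float, baseline: float) -> float:
    """Average of the screening and baseline (randomization) visit measures."""
    return (screening + baseline) / 2.0


def absolute_change(screening: float, baseline: float, endpoint: float) -> float:
    """End-of-study value minus the baseline average; negative values mean improvement."""
    return endpoint - baseline_average(screening, baseline)


def percent_change(screening: float, baseline: float, endpoint: float) -> float:
    """Absolute change as a percentage of the baseline average (assumed denominator)."""
    base = baseline_average(screening, baseline)
    if base == 0:
        # Not specified in the source; refuse rather than guess.
        raise ValueError("percent change is undefined for a zero baseline average")
    return 100.0 * absolute_change(screening, baseline, endpoint) / base


def is_gra_responder(gra: int) -> bool:
    """GRA responder: 6 (moderately improved) or 7 (markedly improved) on the 1-7 scale."""
    if gra not in range(1, 8):
        raise ValueError("GRA must be an integer from 1 to 7")
    return gra >= 6


# Toy example: pelvic pain of 6 at screening, 8 at baseline, 3 at end of study.
print(absolute_change(6, 8, 3))           # -4.0
print(round(percent_change(6, 8, 3), 1))  # -57.1
print(is_gra_responder(6))                # True
```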

Analyses
All statistical analyses were implemented in SAS 9.4. The primary modeling was conducted with the three NRS outcomes specified above. All statistical hypothesis tests were two-sided with a significance level of α=0.05 and were not corrected for multiple comparisons. As this work is a re-analysis of clinical trial data, all analyses are post hoc, and results should be interpreted as exploratory and hypothesis-generating.

Absolute Change Modeling for NRS Measures
The primary re-analysis of data from each RCT was designed to detect differential response between treatment arms within widespread pain strata for each NRS measure (pelvic pain, urinary urgency, urinary frequency). The absolute change between the outcome at the end of study and the baseline average of the outcome at the screening and randomization visits was modeled within a GLM with separate treatment effects for each widespread pain subgroup and covariate effects for average baseline outcome, age, race, and sex. Negative treatment effects indicate symptom improvement. All primary analyses implemented multiple imputation with predictive mean matching and m=100 imputations 29,30 to impute missing outcome and baseline covariates, assuming data are missing at random. Rubin's rules were used to calculate the final model coefficients and corresponding standard errors [29][30][31] (Extended Methods). Treatment effect heterogeneity between widespread pain subgroups was formally tested within the GLM for each NRS measure. Complete case analyses and models of treatment effects without regard to widespreadness were run as sensitivity analyses (Extended Tables 1-2). Estimated changes in NRS measures for treatment and control in each widespreadness subgroup were calculated (Extended Table 3) and displayed along with the mean trajectory plots (Figures 1-3).

The second re-analysis modeled responders (GRA=6,7) with logistic regression to detect differential responder rates between treatment arms within widespread pain strata, adjusted for age, race, and sex. Treatment effect heterogeneity in GRA responder rates between widespread pain subgroups was formally tested within the overall GLM.
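Rubin's rules combine the m per-imputation estimates of a model coefficient into a single estimate and standard error. The sketch below shows the standard formulas for one scalar coefficient in Python. It is illustrative only: the authors' pooling was done in SAS 9.4, the function name is hypothetical, and the degrees of freedom follow Rubin's original large-sample formula rather than the small-sample Barnard-Rubin adjustment that some software applies.

```python
import math
from statistics import mean, variance
from typing import Sequence, Tuple


def pool_rubin(estimates: Sequence[float], std_errors: Sequence[float]) -> Tuple[float, float, float]:
    """Pool one coefficient across m imputations using Rubin's rules.

    Returns (pooled estimate, pooled standard error, degrees of freedom).
    """
    m = len(estimates)
    if m < 2 or len(std_errors) != m:
        raise ValueError("need at least two imputations with one SE per estimate")
    if any(se <= 0 for se in std_errors):
        raise ValueError("standard errors must be positive")
    q_bar = mean(estimates)                       # pooled point estimate
    w_bar = mean([se ** 2 for se in std_errors])  # average within-imputation variance
    b = variance(estimates)                       # between-imputation variance (divisor m - 1)
    t = w_bar + (1 + 1 / m) * b                   # total variance
    if b == 0:
        df = math.inf                             # no between-imputation variability
    else:
        r = (1 + 1 / m) * b / w_bar               # relative increase in variance due to missingness
        df = (m - 1) * (1 + 1 / r) ** 2
    return q_bar, math.sqrt(t), df


# Toy values for three imputations (the study itself used m = 100).
est, se, df = pool_rubin([1.0, 2.0, 3.0], [1.0, 1.0, 1.0])
print(est, round(se, 4), df)  # 2.0 1.5275 6.125
```

The between-imputation term inflates the standard error to reflect uncertainty about the missing values, which is why pooled confidence intervals are wider than those from any single imputed dataset.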

Percentage Change Modeling for NRS Measures
Empirical cumulative distribution functions (ECDFs) of observed percentage change from baseline to the end of study for the 3 NRS symptom outcomes were generated by treatment arm within widespread pain subgroups. Patients with missing end-of-study outcomes were assigned a percentage change of zero. Plots of the complement (1 − ECDF) display the proportion of patients with a percentage improvement above the value indicated on the x-axis (generated in R with the package ggplot2 32; Figures 1-3, Extended Figures 8-10). A non-parametric Wilcoxon rank-sum test was used to test differences between treatments within widespread pain subgroups in the distribution of observed percentage change. 33 Further details regarding the figures can be found in the Extended Methods.

Figure 1
Predicted change and observed percentage improvement in pelvic pain from baseline by RCT and pain widespreadness.

The mean trajectory plots (left within each cell) represent the estimated change in pelvic pain from average baseline for the average participant within each RCT, accounting for baseline outcome, age, race, and sex. The treatment effect, corresponding p-value, and 95% confidence interval are displayed within each plot. Mean trajectories were derived from the primary analysis models for change in outcome at end of study from average baseline, using multiple imputation for missing data. The 1-ECDF plots (right within each cell) represent the proportion of patients whose observed percentage improvement in end-of-study outcome from average baseline exceeds the value on the x-axis. Patients missing end-of-study data have an observed percentage change of 0. P-values for testing the difference between the observed percentage improvement curves for treatment (blue) and control (red) are displayed within each plot and are derived from the non-parametric Wilcoxon rank-sum test (see Extended Methods).

Figure 2
Predicted change and observed percentage improvement in urinary urgency from baseline by RCT and pain widespreadness.
The mean trajectory plots (left within each cell) represent the estimated change in urinary urgency from average baseline for the average participant within each RCT, accounting for baseline outcome, age, race, and sex. The treatment effect, corresponding p-value, and 95% confidence interval are displayed within each plot. Mean trajectories were derived from the primary analysis models for change in outcome at end of study from average baseline, using multiple imputation for missing data. The 1-ECDF plots (right within each cell) represent the proportion of patients whose observed percentage improvement in end-of-study outcome from average baseline exceeds the value on the x-axis. Patients missing end-of-study data have an observed percentage change of 0. P-values for testing the difference between the observed percentage improvement curves for treatment (blue) and control (red) are displayed within each plot and are derived from the non-parametric Wilcoxon rank-sum test (see Extended Methods).

Figure 3
Predicted change and observed percentage improvement in urinary frequency from baseline for amitriptyline trial by pain widespreadness.
The mean trajectory plots (left within each cell) represent the estimated change in urinary frequency from average baseline for the average participant within the amitriptyline trial, accounting for baseline outcome, age, race, and sex. The treatment effect, corresponding p-value, and 95% confidence interval are displayed within each plot. Mean trajectories were derived from the primary analysis models for change in outcome at end of study from average baseline, using multiple imputation for missing data. The 1-ECDF plots (right within each cell) represent the proportion of patients whose observed percentage improvement in end-of-study outcome from average baseline exceeds the value on the x-axis. Patients missing end-of-study data have an observed percentage change of 0. P-values for testing the difference between the observed percentage improvement curves for treatment (blue) and control (red) are displayed within each plot and are derived from the non-parametric Wilcoxon rank-sum test (see Extended Methods).
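The 1 − ECDF curves and Wilcoxon rank-sum comparisons described in these legends can be sketched in plain Python. This is an illustration under stated assumptions, not the authors' R/SAS code: percentage improvement is taken as the negative of percent change relative to the baseline average, missing end-of-study values count as 0% as the legends describe, and the test uses the large-sample normal approximation with a tie correction and no continuity correction. Statistical packages may instead report exact p-values or apply a continuity correction, so p-values can differ slightly. All function names are hypothetical.

```python
import math
from collections import Counter
from typing import List, Optional, Sequence, Tuple


def percent_improvement(base_avg: float, endpoint: Optional[float]) -> float:
    """Observed % improvement from the baseline average; a missing endpoint counts as 0."""
    if endpoint is None:
        return 0.0
    return 100.0 * (base_avg - endpoint) / base_avg


def one_minus_ecdf(values: Sequence[float], x: float) -> float:
    """Proportion of patients whose improvement is strictly above x, i.e. 1 - ECDF(x)."""
    return sum(1 for v in values if v > x) / len(values)


def average_ranks(data: Sequence[float]) -> List[float]:
    """1-based ranks, with tied values sharing the average of their ranks."""
    order = sorted(range(len(data)), key=lambda i: data[i])
    ranks = [0.0] * len(data)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and data[order[j + 1]] == data[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def rank_sum_test(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float, float]:
    """Two-sided Wilcoxon rank-sum test (normal approximation, tie-corrected).

    Returns (W = rank sum of x, z statistic, two-sided p-value).
    """
    n1, n2 = len(x), len(y)
    if n1 == 0 or n2 == 0:
        raise ValueError("both groups need at least one observation")
    n = n1 + n2
    pooled = list(x) + list(y)
    w = sum(average_ranks(pooled)[:n1])
    mean_w = n1 * (n + 1) / 2
    # Tie correction: subtract sum(t^3 - t) over groups of tied values.
    ties = sum(t ** 3 - t for t in Counter(pooled).values())
    var_w = n1 * n2 / 12 * ((n + 1) - ties / (n * (n - 1)))
    if var_w == 0:
        return w, 0.0, 1.0  # all observations identical: no evidence of a difference
    z = (w - mean_w) / math.sqrt(var_w)
    p = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return w, z, p


# Toy values (not trial data): baseline average 7, None marks a missing endpoint.
treatment = [percent_improvement(7.0, e) for e in (2.0, 3.5, None, 1.0)]
control = [percent_improvement(7.0, e) for e in (6.0, 7.0, 5.0, None)]
print(one_minus_ecdf(treatment, 50.0))  # 0.5
print(rank_sum_test(treatment, control))
```

Note that `one_minus_ecdf` counts values strictly above the threshold, following the Methods wording; the Results report "50% improvement or greater", for which a `>=` comparison would be the matching choice.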
