This study finds no association between anxiety and lung function impairment, defined by FEV1(%), in 12 out of 15 eligible studies. Findings on a potential association between depression and FEV1(%) are inconsistent, as 15 out of 32 studies show a significant negative association. In general, the eligible studies vary in study size and in the prevalence and definition of anxiety or depression, which could to some degree explain the inconsistency. Only half as many studies were allocated to the anxiety group as to the depression group.
Associations between anxiety and impairment of pulmonary function
Most studies show no association between anxiety and FEV1(%). There are even studies showing positive correlation coefficients or higher FEV1(%) in patients with anxiety compared to patients without anxiety [30,33-34,36,40].
Three studies, by Funk et al 2009 [31], Livermore et al 2012 [32] and Allam et al 2017 [37], found a significant negative association between FEV1(%) and anxiety, while one study by Hieba et al 2021 [38] found a marginally significant association with GOLD stage 1-4 but not with FEV1(%). These studies had small to medium study sizes (62-150), so the significant associations cannot be explained by study size. Study population sizes in the studies with significant associations did not differ from those in the studies showing a positive but non-significant association between FEV1(%) and anxiety (30-291) [30,33-34,36,40]. The study populations in the anxiety group were generally small, as only one study with a large study population was included.
The prevalence of anxiety in the significant studies (22-54.5%) was unremarkable compared to the other studies in the anxiety group (Table 1), and comparable to that in the studies showing positive non-significant associations (11.3%-50%) [30,33-34,36,40]. The differences in anxiety prevalence could be due to the use of different rating scales in different study populations. However, there does not seem to be a pattern in the use of anxiety rating scales: for example, two of the significant studies and two of the studies obtaining positive but non-significant associations used HADS-A, though the cut-off varied between 8 and 11 [31-31,37,40]. Moreover, a study on patients with Parkinson’s disease found a high association among various anxiety scales [63]. In general, there is a great degree of heterogeneity in the prevalence of anxiety between the studies, which might influence the results.
The three significant studies were all conducted in outpatient clinics in Australia, Egypt and Australia respectively. The study population in Livermore et al [32] included only patients in GOLD group II and III, whereas Allam et al [37] and Funk et al [31] included patients with stable COPD. Exclusion criteria (other unstable diseases) were not substantially different from those of the remaining studies in the anxiety group, as most studies excluded unstable patients in general. Livermore et al [32] specifically investigated panic disorder, which sets it apart from the other studies. The patients in the study by Allam et al [37] had an average FEV1(%) of 76.6%, corresponding to mild COPD [22]. This was considerably higher than the average FEV1(%) of 52.9% in the study by Livermore et al [32] and the average of 58.0% in one of the largest studies, by Hernández-Pérez et al [40]. It is possible that the rating scales have a different sensitivity and specificity in patients with severe COPD compared to mild COPD, as anxiety often mimics somatic symptoms [64].
The two studies with the youngest study populations found a significant association between anxiety and FEV1(%). The average age of the patients included in the study by Allam et al [37] was 50.3 years, comparable to the study by Hieba et al [38] with an average age of 57.2 years, while the average age of the patients in the remaining studies was 60-75 years (Table 2). Previous studies have shown that younger people are better at describing their symptoms as anxiety, which might affect the result of a study [65]. It is outside the scope of this study to determine whether the risk factors for anxiety differ between younger and older populations, and further studies would be needed to investigate this.
The percentage of females was 25%, 44% and 56% respectively in the studies with significant findings [31-32,37]. As the study populations in the studies with non-significant results consisted of 3% to 61% women, this does not separate the studies with significant results from those with non-significant results (Table 2). In this review there are no indications of specific gender differences in the association between FEV1(%) and anxiety.
Four studies used a direct comparison of average FEV1(%) between patients with and without anxiety. Three studies directly compared the prevalence of anxiety across GOLD groups 1-4. Eight studies used different correlation coefficients. The studies showing significant results used both direct comparisons and correlation coefficients [31-32,37], and the same applies to the studies showing non-significant positive associations [30,33-34,36,40]. This does not suggest that the statistical method, as long as it is appropriate, determines whether a result is significant. However, there are findings that could indicate that the choice of statistical method has influenced the outcome: Hieba et al 2021 [38] found a significant association between anxiety and GOLD group 1-4, i.e. FEV1(%) as a categorical value, but no significant association with FEV1(%) as a continuous value. Conversely, a significant association between FEV1(%) and anxiety severity was found, but no association with GOLD group 1-4 [38]. It is not unreasonable to think that the choice of continuous versus categorical values could influence the results.
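The point about continuous versus categorical values can be illustrated with a small simulation. The sketch below is not taken from any of the included studies: the sample size, the FEV1(%) distribution and the strength of the underlying association are hypothetical, and the stage thresholds (80%, 50% and 30% of predicted) are the usual GOLD spirometric cut-offs. Under these assumptions, grouping FEV1(%) into four stages tends to weaken the observed correlation with an anxiety score, which is one mechanism by which a continuous and a categorical analysis of the same data can diverge.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def gold_stage(fev1_pct):
    """Map FEV1 (% predicted) to GOLD stage 1-4."""
    if fev1_pct >= 80:
        return 1
    if fev1_pct >= 50:
        return 2
    if fev1_pct >= 30:
        return 3
    return 4


def simulate(n=20000, seed=1):
    """Simulated cohort; every parameter here is hypothetical."""
    rng = random.Random(seed)
    fev1 = [rng.gauss(55, 18) for _ in range(n)]
    # Anxiety score rises slightly as FEV1(%) falls, plus independent noise.
    anxiety = [-0.3 * (f - 55) / 18 + rng.gauss(0, 1) for f in fev1]
    stage = [gold_stage(f) for f in fev1]
    return pearson(fev1, anxiety), pearson(stage, anxiety)


r_continuous, r_categorical = simulate()
print(f"FEV1(%) continuous: r = {r_continuous:.3f}")
print(f"GOLD stage 1-4:     r = {r_categorical:.3f}")
```

The two coefficients have opposite signs because a higher GOLD stage means a lower FEV1(%); the comparison of interest is their absolute size.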
The great heterogeneity in outcome, exposure and statistical methods makes comparison between studies challenging.
With three studies indicating a negative association and five indicating a positive association, this review does not indicate any association between anxiety and FEV1(%) in patients with COPD. However, most studies are small and show great heterogeneity in study population, statistical method and definition of anxiety.
Associations between depression and impairment of pulmonary function
The definition of depression varies greatly between studies. Nine different scales were used, with BDI, PHQ-9 and HADS-D being the most frequent. Even among studies using the same scale, cut-off values vary greatly. For the studies using GDS, cut-off values of, for example, 6, 8 and 11 have been used [48-49,55]. GDS is meaningful in screening for depression in patients >65 years, as common somatic symptoms in the elderly (loss of appetite, sleep disturbances, tiredness) and possible symptoms of dementia are not included [66]. Some studies using GDS included only patients over a certain age [48,55], but two had no age limit or even excluded older patients >65 years [46,49]. The use in younger populations is not validated. The sensitivity and specificity of GDS are comparable to those of the other scales (~80%) [66].
The scale yielding the highest proportion of significant studies is PHQ-9, as three out of four studies using this scale showed significant results [58-59,62]. This might be due to the large sample sizes in those studies (630-1800). Studies have shown a specificity and sensitivity for PHQ-9 similar to those of the other scales [66]. Only one of the four studies using CES-D showed any significant results [43,45,50-51]. CES-D was developed for epidemiologic studies. A review by Smarr et al showed that CES-D yields a high proportion of false positives at cut-off >16, which three of the studies in this review used, while one used >24 [43,45,50-51,66]. In addition, CES-D is sensitive to anxiety, which has not been shown to be associated with FEV1(%) in this study, and might misclassify somatic symptoms as symptoms of psychiatric disease [66]. Nonetheless, all four studies using CES-D had a small to medium sample size and may therefore lack statistical power [43,45,50-51]. The heterogeneity in the definition of depression between the studies may be reflected in the prevalence of depression, ranging between 5.8% and 75% (Table 2).
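How much a given cut-off inflates the measured prevalence depends on how common depression actually is in the study population. The sketch below applies Bayes' rule to a screening scale. The sensitivity and specificity of 80% follow the approximate figure quoted above for the rating scales, whereas the true prevalence values are hypothetical and chosen only for illustration.

```python
def positive_predictive_value(sensitivity, specificity, prevalence):
    """Share of screen-positive patients who truly have depression (Bayes' rule)."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    return true_pos / (true_pos + false_pos)


# Sensitivity and specificity of ~80% as quoted in the text;
# the true prevalence values are hypothetical.
for prevalence in (0.10, 0.20, 0.50):
    ppv = positive_predictive_value(0.80, 0.80, prevalence)
    print(f"true prevalence {prevalence:.0%}: PPV = {ppv:.2f}")
```

Under these assumptions, half of the patients scoring above the cut-off would be false positives at a true prevalence of 20%, so the same scale can produce very different misclassification rates in different populations.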
Common symptoms of depression include somatic symptoms such as sleep disturbances, appetite loss and weight loss [67]. Studies have shown that sleep disturbances and sedentary behaviour lead to depression in the elderly [68]. It has also previously been shown that 70% of patients with COPD have some degree of sleep disturbance [69]. Patients with COPD have a high degree of sedentary behaviour, which is even higher in case of comorbid depression [52,70]. Thus, symptoms of depression and burden of illness can be hard to distinguish in patients with COPD, possibly leading to bias or residual confounding. Most studies exclude patients with other severe comorbidities or cognitive impairment, and some exclude the oldest patients [34,40,46,57,60]. Age, comorbidities and cognitive impairment, which is highly prevalent in COPD [71], also carry a risk of bias, misclassification and residual confounding. The scores used in the studies to define depression could possibly act as confounders in themselves, but this cannot be confirmed in this review.
The only large study without unambiguously significant results was Miravitlles et al [52]. Depression was here defined as BDI >10, which was the most commonly used cut-off value in the studies using BDI. Nevertheless, a prevalence of mild depression of 74.6% and of moderate to severe depression of 51.1% suggests a higher prevalence in this study than in most studies in this review (Table 2). In Miravitlles et al, the degree of depression as a continuous value, rather than a categorical one, was significantly correlated with FEV1(%). This suggests a significant association between depression and FEV1(%) after all. As most patients were allocated to the depression group, the heterogeneity of this group could be too great to obtain significant results when using categorical values for depression [52].
Studies have suggested that the increased mental health awareness in the last decade has led individuals to overinterpret normal emotions as pathological [72]. Though the perception of depression and mental illness has changed over time, it is not clear whether time of publication was a significant confounder, as there is no apparent pattern of higher prevalence of anxiety and depression over time in this review.
Almost all of the large studies showed a significant association between FEV1(%) and depression, and even the remaining large study showed some significant results [52]. This indicates that a larger sample size is needed to obtain sufficient power in studies of affective disorders in patients with COPD. All but one of the studies with small to medium-sized study populations show a trend towards lower FEV1(%) in patients with depression compared to patients without depression. Thus, this review suggests there may be an association between FEV1(%) and depression.
Limitations
As is the case with systematic reviews in general, this review is susceptible to publication bias and outcome reporting bias [73]. Conference abstracts were not included, and it is possible that some data could have been retrieved from those.
Strict selection criteria increase homogeneity and make it possible to compare studies, but also increase the risk of exclusion bias. Three studies mentioned FEV1(%) in a group of COPD patients with and without depression or anxiety but did not perform significance testing. FEV1(L) is not useful in this systematic review because it varies with sex, age, ethnicity and height [22], and sixteen studies were excluded for only reporting FEV1(L). This could lead to bias if FEV1(%) was deliberately omitted from those studies because of non-significance, or if demographic characteristics differ between groups or studies.
Many of the included studies had small to modest sample sizes, which increases the risk of insufficient power and thereby of underestimating the association between anxiety or depression and FEV1(%).
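The relationship between sample size and power can be sketched with a normal approximation for a two-group comparison of mean FEV1(%). The standardised difference of 0.3 and the group sizes below are hypothetical and are not estimates from the included studies; the function name is likewise an illustration only.

```python
from statistics import NormalDist


def approx_power(effect_size, n_per_group, alpha=0.05):
    """Approximate power of a two-sided, two-sample comparison of means.

    Uses the normal approximation and ignores the negligible opposite tail.
    effect_size is the standardised mean difference (Cohen's d).
    """
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)
    noncentrality = effect_size * (n_per_group / 2) ** 0.5
    return z.cdf(noncentrality - z_crit)


# Hypothetical standardised difference in FEV1(%) between patients with and
# without depression; the group sizes are illustrative only.
for n in (30, 100, 500):
    print(f"n per group = {n}: power = {approx_power(0.3, n):.2f}")
```

Under these assumptions, a study with 30 patients per group would miss a true difference of this size most of the time, while a study with 500 per group would almost always detect it.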
Quality assessments were done using NOS. As statistical analysis with p-value and appropriate definition of outcome were part of the selection criteria, all studies scored the maximum of two stars in the ‘Outcome’ evaluation. All studies also obtained the pulmonary function in an appropriate way, as spirometry-verified COPD was part of the selection criteria. Therefore, all studies scored at least 3/8 points. The studies with smaller sample sizes would be susceptible to bias, which is also reflected in a lower NOS score. At study level the greatest risks of bias would be the lack of control for confounders, especially symptoms, and unexplained non-respondents. Most studies (23/32) failed to report on non-respondents, and even where they did, the risk of bias would not be completely eliminated. A previous study has found a greater risk of depressive symptoms in non-respondents [74], which could also be the case for these studies. Most studies did not control for confounders. The most important confounder would probably be symptoms of COPD, since previous studies have found a link between the symptom burden and both FEV1(%) and anxiety or depression [6-8,14,15,75].
Most studies included patients from outpatient clinics or rehabilitation. Whether this gives satisfactory external validity depends on the access to these facilities (e.g. waiting lists, referral criteria and payment), which would differ greatly from country to country. Most studies did not describe this. It is reasonable to believe that the prevalence of anxiety or depression could depend on whether patients in the study are stable or not, as a greater burden of symptoms has previously been shown to lead to anxiety and depression [6-8,14,15]. Selection bias at study level may therefore have influenced the results.
There was a great degree of heterogeneity between studies, which is especially evident when assessing the prevalence of depression and anxiety. It is possible that the risk of anxiety and depression differs between countries, as it does in the general population [76]. This review does not contain enough studies from each country to enable evaluation of differences between nationalities.