A Real-World Data Analysis of The Distribution of Traditional Chinese Medicine Syndromes and Their Elements Among Bronchiectasis Patients

Introduction: We sought to investigate the distribution of traditional Chinese medicine (TCM) syndromes and their elements among bronchiectasis patients using real-world data. Methods: A real-world study was performed to explore the relationship between TCM syndromes and bronchiectasis using electronic medical information from 1,113 patients in China. Factor analyses were used to reduce the dimensions of TCM syndrome elements and to detect common factors. Additionally, cluster analyses were employed to assess combinations of TCM syndrome elements. Finally, association rule analyses were performed to investigate the structures of TCM syndrome elements in order to estimate the patterns of TCM syndromes. Results: A total of 17 TCM syndrome elements were extracted using these methods. Four Shi TCM syndromes had a distribution frequency of >5.0% in the total sample: Tan_Re_Yong_Fei (36.39%), Tan_Zhuo_Zu_Fei (12.94%), Gan_Huo_Fan_Fei (11.59%), and Feng_Re_Fan_Fei (11.32%). The most frequent Xu TCM syndrome was Fei_Yin_Xu (18.24%). Factor analysis, cluster analysis, and association rule analysis found that Tan, Huo, Feng, Yin_Xu, Fei, and Gan were the core TCM syndrome elements. Conclusion: In this study, the TCM Shi syndromes of Tan_Re_Yong_Fei, Tan_Zhuo_Zu_Fei, Gan_Huo_Fan_Fei, and Feng_Re_Fan_Fei were detected with a high frequency among bronchiectasis patients using real-world data, as was the TCM core Xu syndrome of Fei_Yin_Xu. The core elements of Huo, Tan, Feng, Yin_Xu, Fei, and Gan were found across the entire sample.


Introduction
Bronchiectasis is defined as a chronic inflammatory bronchial disease with irreversible dilation of the bronchial lumen (1). Clinically, it usually presents with chronic recurrent cough, expectoration, hemoptysis, progressive dyspnea, pulmonary functional deterioration, recurrent respiratory infections, and a reduced quality of life and life expectancy (2). Pathologically, patients have abnormally dilated bronchi leading to impairment of host defenses, chronic bacterial infection, and airway inflammation. A retrospective cohort study showed a high prevalence of bronchiectasis in the United States (3), and the prevalence of bronchiectasis in China is 1.5% in men and 1.1% in women (4,5). Moreover, the prevalence of bronchiectasis appears to have been increasing in recent years (6).
Since a significant proportion of patients fail to get satisfactory treatment from modern medicine, many patients seek help from complementary and alternative medicine, especially traditional Chinese medicine (TCM) (7,8). TCM uses a classification system developed over thousands of years wherein different "syndromes" are used to summarize the cause, nature, and location of pathological changes at various stages of disease (9,10). These syndromes stratify diseases according to groups of specific symptoms that are regarded as a summary of the body's condition in the disease process (11). They describe differences in the etiology and pathogenesis of a disease and emphasize the variation in individual bodies' constitutions (12). Thus, patients with the same disease may present with different syndromes because of individual differences in physiological responses; TCM considers these differences vital for prescribing appropriate treatment (9). Syndromes are therefore the foundation for diagnosis and prescription in TCM, and are the key concept of TCM theory (9). TCM treatment for bronchiectasis has a long history, and numerous basic and clinical studies have found that TCM has curative effects on bronchiectasis (8). Since applying TCM to the treatment of bronchiectasis must be based on syndrome differentiation for optimum efficacy, it is especially important to identify the TCM syndromes commonly associated with bronchiectasis. However, such information is currently limited to textbooks and individual expert counseling (7), and there is limited clinical evidence of the TCM syndromes that typically present with bronchiectasis. A better understanding of TCM syndrome distribution in bronchiectasis patients may help healthcare providers improve the clinical efficacy of Chinese medicine treatment.
The advent of big data creates both opportunities and challenges for TCM (13). Real-world studies (RWS), in which data are collected in real-life practical circumstances, have become hotspots for clinical research (14,15). Clinical data collected in the hospital, at home or abroad, are more accessible due to the development of information technology and the wide use of hospital information systems (HIS) and electronic medical records (EMRs) (16). It is crucial to perform RWS on TCM syndromes, using large clinical datasets with modern data-mining and machine-learning techniques, to produce a scientific and reliable evidence base for TCM, thereby transforming TCM from experience-based medicine to evidence-based medicine (16,17). With this in mind, the present study attempted to investigate common distribution patterns and core elements of TCM syndromes in patients with bronchiectasis by searching real-world medical records for evidence of TCM treatment in patients with bronchiectasis.

Study design and participants
We performed an RWS to explore the distribution of common TCM syndromes in patients with bronchiectasis by using patient medical records, mainly from EMR and HIS. All medical records from 2013 to 2016 in respiratory disease wards at five Chinese hospitals were collected. Bronchiectasis was identified using a centralized database of patients with bronchiectasis. All adult patients with a physician-established diagnosis of bronchiectasis were eligible for inclusion. The diagnosis was based on blood tests, sputum cultures, and review of imaging, following the recommendation that all non-cystic-fibrosis-related bronchiectasis be confirmed by CT scan (18). The institutional review board of each participating site approved the study, as did an administrative institutional review board for the data collection center. After patients provided informed consent, medical records were queried by a study coordinator or principal investigator using standardized recording forms. Participants with other comorbid pulmonary diseases (e.g., asthma or chronic obstructive pulmonary disease), or any other comorbid diseases and conditions that might confound data interpretation (e.g., cancer, immune diseases, endocrine diseases, cardiovascular, hematologic, hepatic, renal, or neurological diseases, oral contraceptive use, or pregnancy) were excluded from the study.

Data collection and preparation
The contents of patient medical records were based on EMR and standard Chinese guidelines. General information, complaints, medical histories, modern medicine diagnoses, and TCM diagnoses were extracted. Medical record data were transferred from the EMR systems of five hospitals and loaded onto electronic platforms capable of handling large datasets at the Institute of Biomedical Informatics and Biostatistics, the Institute of Integrative Medicine at Fudan University. Demographic information, hospitalization records, and clinical outcomes were extracted using Python 3.5 programs. Standard common data models for integrative medical treatment of bronchiectasis, including diagnosis of TCM syndromes, were created to establish uniform standard codes for data analysis. A total of 1,113 medical records with complete data, consisting of general information, TCM syndrome features, and final TCM and modern medicine diagnoses, were available for analysis.

Data analysis
Standard TCM syndrome guidelines were used to establish the elements of each TCM syndrome (19). Data analyses were carried out to estimate the distribution of different TCM syndromes and their associated elements among bronchiectasis patients, including the frequency of individual distribution, the frequency of different combinations, and difference analysis of TCM syndromes and combinations of their elements. Differences in continuous variables between genders were assessed by one-way analysis of variance (ANOVA), and differences in categorical properties among groups were assessed by the χ² test. Tests were two-sided, and a p-value of < 0.05 was considered significant. Frequency analyses were employed to explore the proportion of TCM syndrome diagnoses for bronchiectasis. Moreover, frequency analyses were performed to assess the proportion of TCM syndrome elements present in different diagnoses. Results were analyzed using the Statistical Package for Social Sciences for Windows, version 16.0 (SPSS, Chicago, IL, USA).
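As an illustration of the χ² comparison described above, the following is a minimal sketch for a single 2×2 table. It is not the SPSS procedure used in the study: the function name `chi_square_2x2` and the counts are hypothetical, no continuity correction is applied, and the p-value formula holds only for one degree of freedom.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square test (no continuity correction) for [[a, b], [c, d]].

    Returns (statistic, p-value) with 1 degree of freedom.
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    if 0 in (row1, row2, col1, col2):
        raise ValueError("every row and column total must be positive")
    # Shortcut formula for a 2x2 table, equivalent to sum((O - E)^2 / E).
    statistic = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # With 1 degree of freedom, P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value


# Hypothetical counts (not study data):
# rows = male / female, columns = syndrome present / absent.
stat, p = chi_square_2x2(30, 70, 45, 55)
print("chi2 = {:.3f}, p = {:.4f}, significant at 0.05: {}".format(stat, p, p < 0.05))
```

For tables larger than 2×2, the degrees of freedom change and the closed-form p-value above no longer applies; a statistics package would normally be used instead.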
We performed factor analyses to reduce the dimensions of TCM syndrome elements and to explore the structures of these elements. The Kaiser-Meyer-Olkin (KMO) test and Bartlett's test of sphericity were used to evaluate the suitability of the collected TCM syndrome elements for factor analysis (20). Common factors were extracted using principal component analyses (20). Varimax rotation was applied to the extracted factors, and loadings with an absolute value ≥ 0.20 were retained for each new common factor (20). Hierarchical cluster analyses were used to classify TCM syndrome elements, using Ward's method to generate a dendrogram for estimation of similar clusters (21). Association rule analyses were carried out to investigate the structures of TCM syndrome elements in order to estimate the distribution of TCM syndromes. A set of frequent rules was generated, and the strength of these rules was then evaluated using three parameters: support, confidence, and lift (22). Rules with a support of > 10% and a confidence of > 80% were reported. The Apriori algorithm was used to evaluate the pattern of association within TCM syndrome elements. Data mining was performed using SPSS Modeler (version 18.0, Chicago, IL, USA) and packages in Python 3.5.
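The three rule-strength parameters can be made concrete with a short sketch. This is an illustration under stated assumptions, not the SPSS Modeler implementation: the records are invented, the helper name `rule_metrics` is hypothetical, support is taken as the fraction of records containing both sides of the rule (some tools report antecedent support instead), and Apriori's candidate generation is not implemented; two candidate rules are simply scored and filtered with the thresholds given above.

```python
def rule_metrics(transactions, antecedent, consequent):
    """Return (support, confidence, lift) for the rule antecedent => consequent.

    transactions: list of sets of items; antecedent and consequent: sets.
    Support and confidence are fractions in [0, 1].
    """
    n = len(transactions)
    n_a = sum(1 for t in transactions if antecedent <= t)
    n_c = sum(1 for t in transactions if consequent <= t)
    n_both = sum(1 for t in transactions if (antecedent | consequent) <= t)
    if n == 0 or n_a == 0 or n_c == 0:
        raise ValueError("antecedent and consequent must each occur at least once")
    support = n_both / n          # how often both sides occur together
    confidence = n_both / n_a     # how often the consequent follows the antecedent
    lift = confidence / (n_c / n) # confidence relative to the consequent's base rate
    return support, confidence, lift


# Hypothetical records: each set holds the syndrome elements of one patient.
records = [
    {"Tan", "Huo", "Fei"}, {"Tan", "Huo", "Fei"}, {"Tan", "Fei"},
    {"Gan", "Huo", "Fei"}, {"Yin_Xu", "Fei"}, {"Feng", "Huo", "Fei"},
    {"Tan", "Huo", "Fei"}, {"Yin_Xu", "Fei"}, {"Shi", "Pi"},
    {"Tan", "Huo", "Fei"},
]

# Keep only rules with support > 10% and confidence > 80%.
candidates = [({"Huo"}, {"Fei"}), ({"Tan"}, {"Huo"})]
reported = []
for lhs, rhs in candidates:
    support, confidence, lift = rule_metrics(records, lhs, rhs)
    if support > 0.10 and confidence > 0.80:
        reported.append((sorted(lhs), sorted(rhs), support, confidence, lift))

for rule in reported:
    print(rule)
```

A lift above 1 indicates that the antecedent and consequent co-occur more often than would be expected if they were independent.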

Demographic characteristics
The baseline demographics for the 1,113 individuals are listed in Table 1. The proportion of females was 60.02%, and the mean age was 65.31 years. The average duration of hospitalization was 12.06 days in the total sample. There was no significant difference in the duration of hospitalization between males and females (10.03 vs. 12.23 days, p = 0.115). The primary ethnicity of the participants was Han Chinese (94.79%). The vast majority (98.74%) of patients experienced improvements in their condition or were cured.

Frequency of TCM syndromes and their elements among bronchiectasis patients
The distribution of TCM syndromes among the bronchiectasis patients in our study is shown in Table 2.

Factor analysis of TCM syndrome elements
Using the entire sample, the KMO value of the partial correlation of variables was 0.731, and the approximate chi-square value of Bartlett's test of sphericity was 672.74 (p < 0.001). Similar results were obtained using the Shi and Xu groups of elements (KMO > 0.50 and p < 0.001 for Bartlett's test for both). The results of the factor load matrix after rotation transformation are listed in Table 4. In the entire sample, the characteristic root values of the first eight common factors were > 1.0, and their cumulative variance contribution rate reached 81.93%. A scree plot was used to show the relevance of common factors and characteristic root values (Figure 1A), indicating that the curve was steep across the first eight common factors and that the characteristic root values for the rest of the common factors were small. Varimax rotation was used for factor rotation and transformation, and absolute factor load values that were ≥ 0.20 are listed in Table 4. Similarly, four and two common factors were extracted from the Shi and Xu syndrome groups, respectively (Figures 1B and 1C).
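The retention rule used here (characteristic roots, i.e. eigenvalues, greater than 1.0) and the cumulative variance contribution rate can be sketched as follows. The eigenvalues below are invented for six variables and are not the study's values; the function name `retain_factors` is hypothetical, and the sketch assumes the eigenvalues come from a correlation matrix, so that they sum to the number of variables.

```python
def retain_factors(eigenvalues, threshold=1.0):
    """Apply the eigenvalue-greater-than-threshold retention rule.

    Returns (number of retained factors,
             cumulative percentage of variance explained by them).
    """
    ordered = sorted(eigenvalues, reverse=True)
    total = sum(ordered)
    if total <= 0:
        raise ValueError("eigenvalues must have a positive sum")
    retained = [value for value in ordered if value > threshold]
    cumulative_pct = 100.0 * sum(retained) / total
    return len(retained), cumulative_pct


# Hypothetical eigenvalues for six variables (they sum to 6); not study data.
example = [2.4, 1.5, 1.1, 0.5, 0.3, 0.2]
n_factors, pct = retain_factors(example)
print("retained factors: {}, cumulative variance: {:.2f}%".format(n_factors, pct))
```

A scree plot shows the same ordered eigenvalues graphically, so the cut-off chosen by this rule can be checked against the point where the curve flattens.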

Cluster analysis of TCM syndrome elements
Using the entire sample, hierarchical cluster analysis found that TCM syndrome elements were significantly different among three clusters (Figure 2A). Cluster 1 comprised the Huo, Tan, and Fei elements; Cluster 2 consisted of the Shi, Pi, Qi_Zhi, Xue_Yu, Du, Xue_Xu, and Han elements, while Cluster 3 comprised the rest of the elements. Additionally, in the Shi syndrome group, three further clusters were identified: Cluster 1 included the Qi_Zhi, Xue_Yu, Fei, and Han elements; Cluster 2 included the Gan, Feng, Han, and Tan elements (Figure 2B). In the Xu syndrome group, Cluster 1 consisted of the Fei and Yin_Xu elements, while Cluster 2 consisted of the Shen, Qi_Xu, and Xue_Xu elements (Figure 2C).
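To show how Ward's method groups elements, the following is a minimal pure-Python sketch. It makes several assumptions that are not taken from the study: each element is represented as a binary presence vector over a handful of invented patients, squared Euclidean distance between centroids is used, and the procedure stops at a fixed number of clusters rather than producing a full dendrogram. The function name `ward_clusters` is hypothetical.

```python
def ward_clusters(points, n_clusters):
    """Agglomerative clustering with Ward's criterion.

    points: dict mapping a label to a numeric vector (all of equal length).
    Repeatedly merges the two clusters whose union gives the smallest
    increase in within-cluster sum of squares until n_clusters remain.
    Returns a sorted list of sorted label lists.
    """
    if not 1 <= n_clusters <= len(points):
        raise ValueError("n_clusters must be between 1 and the number of points")
    # Each cluster is stored as (labels, centroid, size).
    clusters = [([label], list(vec), 1) for label, vec in points.items()]

    def merge_cost(a, b):
        # Increase in within-cluster sum of squares caused by merging a and b.
        sq_dist = sum((x - y) ** 2 for x, y in zip(a[1], b[1]))
        return a[2] * b[2] / (a[2] + b[2]) * sq_dist

    while len(clusters) > n_clusters:
        pairs = ((i, j) for i in range(len(clusters))
                 for j in range(i + 1, len(clusters)))
        i, j = min(pairs, key=lambda p: merge_cost(clusters[p[0]], clusters[p[1]]))
        a, b = clusters[i], clusters[j]
        size = a[2] + b[2]
        # Size-weighted mean of the two centroids.
        centroid = [(a[2] * x + b[2] * y) / size for x, y in zip(a[1], b[1])]
        merged = (a[0] + b[0], centroid, size)
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return sorted(sorted(c[0]) for c in clusters)


# Hypothetical presence (1) / absence (0) of each element in eight patients.
element_vectors = {
    "Tan":    [1, 1, 1, 1, 0, 0, 0, 0],
    "Huo":    [1, 1, 1, 0, 0, 0, 0, 0],
    "Yin_Xu": [0, 0, 0, 0, 1, 1, 1, 0],
    "Qi_Xu":  [0, 0, 0, 0, 1, 1, 0, 0],
}
result = ward_clusters(element_vectors, 2)
print(result)
```

In this toy data, elements that occur in the same patients end up in the same cluster, which is the sense in which the clusters reported above group co-occurring syndrome elements.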

Association rule analysis of TCM syndrome elements
An association rule analysis indicated associations among the Tan, Huo, Feng, Yin_Xu, Fei, and Gan elements (Table 5). Using the rule algorithm, eight rules were found and are listed in Table 5. The strongest support percentage parameter was between the Fei and Huo elements (60.907%). Three rules were established which applied to the full set of elements: Huo => Gan and Fei; Fei => Tan and Huo; and Fei => Yin_Xu. In the Shi syndrome group, three further rules were identified, suggesting associations among the Huo, Feng, Tan, Fei, and Gan elements. In the Xu syndrome group, one association was identified: Fei => Yin_Xu.

Discussion
Although TCM is widely used in the treatment of bronchiectasis in China, a system of syndrome differentiation using evidence-based medicine has not been established (16). Since syndrome diagnosis is key to TCM therapy, there is an urgent need for TCM researchers to perform more epidemiological studies of high methodological quality on syndrome distribution in bronchiectasis patients. To our knowledge, this is the first study to investigate TCM syndrome distribution in bronchiectasis patients using real-world data. RWS offer insights into the interactions between patient characteristics, preferences, lifestyles, and treatment outcomes that are often excluded from double-blind clinical randomized controlled trials (RCTs) (23). This type of study also offers the opportunity to explore how these interactions differ between therapies and treatment modalities and to evaluate important clinical outcomes, such as bronchiectasis exacerbations, that can be underpowered in RCTs owing to their short duration and their pre-selection of idealized patients in whom negative outcomes are often relatively infrequent (24).
Taken together, the results of the present study demonstrate that the four most common Shi syndromes among bronchiectasis patients, in decreasing order of frequency, were Tan_Re_Yong_Fei, Tan_Zhuo_Zu_Fei, Gan_Huo_Fan_Fei, and Feng_Re_Fan_Fei. Frequency analyses showed that all four Shi syndromes were present with > 5.0% frequency. These results suggest that the four Shi syndromes are the most common in bronchiectasis. Similarly, the Xu syndrome of Fei_Yin_Xu was identified in a significant proportion of bronchiectasis patients. Additionally, frequency analyses, factor analyses, cluster analyses, and association rule analyses for different syndrome elements demonstrated that the predominant elements in the pathogenesis of bronchiectasis were Huo, Tan, Yin_Xu, and Feng, representing the four elements fire, phlegm, Yin-deficiency, and wind, respectively. The main disease locations were Fei (lung) and Gan (liver), indicating that bronchiectasis can be attributed to functional disorders of the lungs and liver. However, it should be pointed out that the "liver" in TCM refers not only to the anatomical organ but also to a complex system which includes a series of functions which modern medicine ascribes to the metabolic system, central nervous system, endocrine system, blood system, digestive system, and others (25).
According to TCM theory, bronchiectasis arises from the invasion of evils (i.e., wind, heat, and dampness, which conceptually resemble pathogenic infection in western medicine) (26). Our results indicate that bronchiectasis can be divided into five overlapping categories based on TCM syndromes. In the first type of cases, bronchiectasis may lead to retention of body fluids in all meridians and collaterals, leading to the production of sputum, which clinically manifests as Tan_Zhuo_Zu_Fei (phlegm-turbidity obstructing the lung) (27). In the second type of cases, the accumulation of phlegm and heat readily results in bronchial necrosis (27). This type of bronchiectasis is the end result of a pathological process involving a vicious circle of inflammation, recurrent infection, and bronchial wall damage caused by phlegm and heat (28). This phlegm-heat obstructing the lung has been associated with exaggerated inflammatory responses secondary to airway destruction (28). In the third type of cases, bronchiectasis is the result of stagnated wind evils, particularly in patients with frailty (27). In these cases, wind and heat (the pathogenetic causes) stagnate in the lungs; this usually occurs in the early stage of bronchiectasis (26). In the fourth type of cases, the syndrome of liver-fire invading the lung appears in cases of bronchiectasis with hemoptysis, which is consistent with the traditional understanding (7). In the fifth type of cases, prolonged courses of bronchiectasis, coupled with evil stagnation and frailty, can cause a substantial loss of primordial Yin (29). According to TCM theory, a deficiency of bodily fluid in the respiratory system, especially in the mucous epithelium, is the mechanism behind the syndrome of Yin-deficiency in the lung (29). Therefore, these patients presented with the syndrome of Yin-deficiency in the lung.
In the present study, Yin-deficiency syndrome was the most common Xu TCM syndrome among bronchiectasis patients, suggesting the importance of nourishing Yin and moisturizing the lungs (30).
In terms of the distribution of TCM syndromes in bronchiectasis, the detected frequency of phlegm-heat syndrome was significantly higher than that of other syndromes, indicating that phlegm-heat is a major contraindication in the treatment of bronchiectasis. The frequency of Yin-deficiency was also significantly higher than that of the other three syndromes (phlegm turbidity, liver-fire, and wind-heat). It can therefore be inferred that Yin-deficiency has a wide distribution as a fundamental syndrome pattern with a high frequency of occurrence in bronchiectasis patients, especially in the elderly, the frail, and patients with long illness durations.
Identification of the aforementioned five syndrome categories suggests that they form the core pathogenesis of bronchiectasis and should thus be the fundamental diagnostic elements taken into account during differentiation of bronchiectasis. It follows that, when formulating a prescription to treat bronchiectasis, the strategies that should be considered depending on the presenting syndrome pattern include removing sputum and clearing heat, resolving phlegm turbidity, clearing away liver-fire, and nourishing Yin. Furthermore, the target organs of treatment should be the lung and liver (as defined by TCM).
This study achieved three primary results of significance. First, this study identified the distribution of TCM syndromes in patients with bronchiectasis using objective parameters from real-world datasets. This method may help reduce bias and may also encourage more practitioners to use this syndrome differentiation approach. Second, results from this study can guide clinical trials in evaluating the efficacy of TCM treatments for bronchiectasis. Once the most common TCM syndromes of bronchiectasis are identified, further interventional research can investigate the efficacy and safety of TCM treatments for bronchiectasis based on the specific syndromes identified. Third, since this study established that wind-heat syndromes usually occur in the early stage of bronchiectasis, effective treatment can be targeted to this population to slow down the progression of the disease, ultimately reducing the burden for both patients and the medical system caused by advanced complications.
This study has some limitations. First, selection bias may exist because the present study was based on an RWS design. All data were derived from participants in hospitals in five cities using a relatively small sample, and therefore may not be representative of the distribution of TCM syndromes in bronchiectasis patients in the rest of China. Additional multi-center studies with larger samples are required to verify the conclusions of the present study. Second, the different causes, severity, and stages of bronchiectasis in the original sample population were unclear because we did not subdivide the data. Third, the associations between TCM syndromes and modern medicine indicators were not explored in this work. Future research should focus in greater depth on the objective evidence of TCM syndromes in bronchiectasis patients.

Conclusion
In summary, the TCM Shi syndromes of Tan_Re_Yong_Fei, Tan_Zhuo_Zu_Fei, Gan_Huo_Fan_Fei, and Feng_Re_Fan_Fei were detected with a high frequency among bronchiectasis patients using real-world data, as was the TCM core Xu syndrome of Fei_Yin_Xu. The core elements of Huo, Tan, Feng, Yin_Xu, Fei and Gan were found across the entire sample.

Interpretation for items
Fei: the lung; Pi: the spleen; Shen: the kidney; Xin: the heart; Nao: the brain; Shi: the excess; Xu: the deficiency; Huo: the fire; Tan: the phlegm; Feng: the wind.

This study was approved by the Committee of Huashan Hospital, Shanghai, China. The methods were carried out in accordance with the approved guidelines.

Consent for publication:
All authors read and approved the final manuscript.
Availability of data and material: The datasets generated and/or analyzed during the current study are not publicly available because they contain private information, but they are available from the corresponding author on reasonable request. The datasets are from the study whose authors may be contacted at the Institute of Bioinformatics and Biostatistics, Institutes of Integrative Medicine, Fudan University.

Competing interests:
No conflict of interest declared.

Note: *entrance from outpatient clinic; **outcome includes improved or cured; ***difference analyses for variables between males and females.

Note: values in parentheses are results of the factor load matrix after rotation transformation; only factor load values that were positive and greater than or equal to 0.20 are shown.