Epidemiology and association rules analysis for pulmonary tuberculosis cases with extrapulmonary tuberculosis from age and gender perspective: a large-scale multi-center observational study in China


Background: Tuberculosis (TB), a multi-systemic disease with protean presentation, remains a major global health problem. Although concurrent pulmonary tuberculosis (PTB) and extrapulmonary tuberculosis (EPTB) cases are commonly observed clinically, knowledge regarding concurrent PTB-EPTB is limited. Here, a large-scale multi-center observational study conducted in China aimed to study the epidemiology of concurrent PTB-EPTB cases by diagnostically defining TB types and then implementing association rules analysis.
Methods: This study was conducted at 21 hospitals in 15 provinces in China and included all inpatients with confirmed TB diagnoses admitted from Jan 2011 to Dec 2017. Association rules analysis was conducted for cases with concurrent PTB and various types of EPTB using the Apriori algorithm.
Results: Evaluation of 438,979 TB inpatients indicated PTB was the most commonly diagnosed (82.05%) followed by tuberculous pleurisy (23.62%). Concurrent PTB-EPTB was found in 129,422 cases (29.48%) of which tuberculous pleurisy was the most common concurrent EPTB type observed. Fully adjusted multivariable logistic regression models demonstrated that odds ratios of concurrent PTB-EPTB cases varied by gender and age group. For PTB cases with concurrent EPTB, the strongest association was found between PTB and concurrent bronchial tuberculosis (lift=1.09). For EPTB cases with concurrent PTB, the strongest association was found between pharyngeal/laryngeal tuberculosis and concurrent PTB (lift=1.11). Confidence and lift values of concurrent PTB-EPTB cases varied with gender and age.
Conclusions: Numerous concurrent PTB-EPTB case types were observed, with confidence and lift values varying with gender and age. Clinicians should screen for concurrent PTB-EPTB in order to improve treatment outcomes.


Background
Tuberculosis (TB) is an infectious disease caused by the bacillus Mycobacterium tuberculosis. Despite eradication efforts, TB remains a major global health problem and cause of serious illness for millions of people each year. According to the World Health Organization (WHO), the estimated global incidence of TB was approximately 10.0 million cases in 2019 [1]. TB typically affects the lungs (pulmonary TB, PTB), but can also affect other sites (extrapulmonary TB, EPTB) that include pleura, lymph nodes, abdomen, genitourinary tract, skin, joints and bones, meninges, etc. [2][3][4].
In recent years, considerable effort has been expended to gain a deeper understanding of TB [5][6][7], a multi-systemic disease with a protean presentation. In clinical practice, PTB and EPTB have both been detected in the same patient [8,9]. It was reported that about 10%-50% of EPTB patients have concomitant pulmonary involvement [8]. Effective treatment regimens for concurrent PTB-EPTB cases were shown to be more difficult to administer and different from effective regimens used for treating single PTB or EPTB cases [10]. As information pertaining to concurrent PTB-EPTB is scarce, research efforts are needed to diagnostically define TB case types and explore the epidemiology and association rules between PTB and different types of concurrent EPTB. In this work we addressed this gap in knowledge by conducting a large-scale multi-center observational study. The resulting knowledge will be used to alert clinicians to the common occurrence of concurrent PTB-EPTB disease and to improve treatment regimens used for such cases.

Study subjects
The study was performed at 21 hospitals in 15 provinces in China. All inpatients with confirmed TB diagnoses from Jan 2011 to Dec 2017 were included in the study. TB was mainly categorized by lesion site. Diagnosis of TB was made based on WHO guidelines [11] and the Clinical Diagnosis Standard of TB issued by the Chinese Medical Association [12]. In general, TB was diagnosed using both traditional and modern methods based on clinical symptoms and physical signs together with results obtained using bacteriological methods (including sputum smear microscopy, bacterial culture and molecular diagnostic methods), tuberculin skin testing (TST) or purified protein derivative (PPD) testing, X-ray findings, T-SPOT.TB and/or GeneXpert MTB/RIF assay results, outcomes of treatment with a course of anti-tuberculosis chemotherapy, etc.

Data management and statistical analysis
Measures taken to guarantee data quality included both the implementation of a standardized study protocol and standardized training of research staff. Moreover, trained health workers collected medical information using a standardized questionnaire. Meanwhile, we obtained clinical characteristics of TB-afflicted inpatients (e.g., age, gender, site of disease, etc.) from medical records. Descriptive statistical analysis included frequencies and proportions with 95% confidence intervals (CIs) for categorical variables. Multivariable logistic regression analysis was used to examine the associations of gender and age group with concurrent PTB-EPTB and to estimate the corresponding odds ratios. P<0.05 was the threshold for statistical significance.
Association rules analysis, a technique used to discover relationships hidden in large databases, was used in this work. This technique, which was developed for the computer sciences, has been used in a variety of other fields [13][14][15]. The Apriori algorithm is used to mine such association rules from data. The principle of Apriori is based on two steps: the first step searches for item sets that exceed the minimum support threshold value, while the second step generates association rules and then filters them by selecting, from the item sets found in the first step, those that meet a confidence threshold [16,17]. For the association rule of A concurrent with B, support, confidence and lift were defined as: Support=P(A), Confidence=P(B|A), Lift=P(A∩B)/[P(A)×P(B)], where A is the antecedent and B is the consequent. Lift was used to evaluate the magnitude of association rules, whereby lift>1 indicates a positive association rule. Association rules for concurrent PTB and diverse types of EPTB were analyzed using the Apriori algorithm by setting the minimum support degree and the minimum confidence degree.
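The two filtering steps and the three measures defined above can be illustrated with a minimal Python sketch. It is not the software used in this study: it is restricted to rules with a single antecedent and a single consequent, the function name mine_pairwise_rules and the record layout are assumptions made for illustration, and the toy records are hypothetical. Support follows the definition given above (P(A), the antecedent frequency).

```python
from typing import List, NamedTuple, Set


class Rule(NamedTuple):
    antecedent: str
    consequent: str
    instances: int     # number of records containing the antecedent
    support: float     # P(A), following the definition in the text
    confidence: float  # P(B|A)
    lift: float        # P(A and B) / (P(A) * P(B))


def mine_pairwise_rules(records: List[Set[str]],
                        min_support: float,
                        min_confidence: float) -> List[Rule]:
    """Return single-item rules A -> B that pass both thresholds."""
    n = len(records)
    items = sorted(set().union(*records))
    count = {item: sum(1 for r in records if item in r) for item in items}

    # Step 1: keep antecedents whose support P(A) reaches the minimum.
    frequent = [a for a in items if count[a] / n >= min_support]

    rules = []
    for a in frequent:
        for b in items:
            if a == b:
                continue
            n_ab = sum(1 for r in records if a in r and b in r)
            confidence = n_ab / count[a]
            # Step 2: drop rules that fall below the minimum confidence.
            if confidence < min_confidence:
                continue
            lift = (n_ab / n) / ((count[a] / n) * (count[b] / n))
            rules.append(Rule(a, b, count[a], count[a] / n, confidence, lift))
    # Sort by confidence, highest first.
    return sorted(rules, key=lambda r: r.confidence, reverse=True)


# Hypothetical toy data: each set lists the TB sites recorded for one patient.
TOY_RECORDS = [
    {"PTB"}, {"PTB"}, {"PTB"}, {"PTB"},
    {"PTB", "pleurisy"}, {"PTB", "pleurisy"},
    {"PTB", "bronchial"}, {"PTB", "bronchial"},
    {"pleurisy"}, {"lymph node"},
]

if __name__ == "__main__":
    for rule in mine_pairwise_rules(TOY_RECORDS, 0.1, 0.5):
        print(rule)
```

In the toy data every bronchial case also has PTB, so the rule bronchial -> PTB has a confidence of 100% and a lift above 1, whereas pleurisy -> PTB passes the confidence threshold but has a lift below 1 because PTB is less frequent among pleurisy cases than overall.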

TB patient characteristics
A total of 438,979 TB inpatients were included from Jan 2011 to Dec 2017 at 21 hospitals from 15 provinces in China, most of which were specialized tuberculosis hospitals. The ratio of male to female patients was 1.83. In these patients, 83 tuberculous lesion types were detected at 604,114 sites, with each patient harboring, on average, 1.38 TB lesion types. The most common types of TB were PTB (82.05%, 95%CI: 81.94%-82.16%), followed by tuberculous pleurisy (23.62%, 95%CI: 23.49%-23.74%) and bronchial tuberculosis (7.01%, 95%CI: 6.94%-7.09%). Types of TB found in proportions of ≥0.1% of total cases are shown in Table 1.
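As a plausibility check on the reported intervals, the sketch below computes a normal-approximation (Wald) 95% CI for a proportion. The choice of the Wald method is an assumption, since the interval method is not stated in the text; the function name wald_ci is illustrative. The PTB count of 360,187 is the number of PTB instances reported in the association rules results.

```python
import math
from typing import Tuple


def wald_ci(successes: int, total: int, z: float = 1.96) -> Tuple[float, float]:
    """Normal-approximation confidence interval for a proportion."""
    p = successes / total
    half_width = z * math.sqrt(p * (1 - p) / total)
    return p - half_width, p + half_width


if __name__ == "__main__":
    # 360,187 PTB cases among 438,979 TB inpatients (figures from the text).
    low, high = wald_ci(360_187, 438_979)
    print(f"{low:.2%} to {high:.2%}")
```

Under this assumption the interval for PTB rounds to 81.94%-82.16%, in agreement with the reported values; with a sample of this size, other common interval methods would be expected to give nearly identical bounds.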

The most common concurrent PTB-EPTB types
According to association rules analysis of concurrent PTB-EPTB cases, the 20 most common concurrent PTB-EPTB case types are listed in Table S1 in the Supplementary Appendix, sorted by case number. Case numbers of concurrent PTB and tuberculous pleurisy (15.35%, 95%CI: 15.25%-15.46%) and concurrent PTB and bronchial tuberculosis (6.28%, 95%CI: 6.20%-6.35%) were greater than numbers of other types of concurrent PTB-EPTB cases.

Association rules analysis of concurrent PTB-EPTB cases
In order to find most of the possible association rules for Antecedent=PTB, the minimum confidence degree was set to 1.00%. After executing the association model, six association rules were obtained. The association rules are shown in Table 3 and were sorted by confidence. The first rule row (ID=1) in Table 3 was interpreted as demonstrating that for a total of 360,187 PTB cases (Instances), PTB accounted for 82.05% of all TB cases (Support), while PTB with concurrent tuberculous pleurisy accounted for 18.71% of PTB cases (Confidence). The confidence value for concurrent bronchial tuberculosis in PTB cases was next highest (7.65%), followed by tuberculous meningitis (2.72%). The strongest association rule for PTB with concurrent EPTB was found for PTB with concurrent bronchial tuberculosis (lift=1.09), whereby the lift value of 1.09 indicated that PTB was positively associated with bronchial tuberculosis.
In order to find most of the possible association rules for Consequent=PTB, the minimum support degree was set to 0.1% and the minimum confidence degree was set to 40%. After executing the association model, 22 association rules were obtained, including five rules with confidence values above 70%. The association rules shown in Table 4 are sorted by confidence. The first rule row (ID=1) in Table 4 was interpreted as showing that 2,382 cases (Instances) of pharyngeal/laryngeal tuberculosis accounted for 0.54% of all TB cases (Support), while pharyngeal/laryngeal tuberculosis with concurrent PTB accounted for 91.23% of pharyngeal/laryngeal tuberculosis cases (Confidence). The confidence value of concurrent PTB in bronchial tuberculosis cases was next highest (89.51%), followed by that of tuberculosis of mediastinal lymph nodes (77.57%). The strongest association rule for EPTB with concurrent PTB was found for pharyngeal/laryngeal tuberculosis (lift=1.11), indicating that pharyngeal/laryngeal tuberculosis was positively associated with PTB.
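Because Confidence=P(B|A), the lift defined in the Methods equals the confidence divided by the overall proportion of the consequent, P(B). The short sketch below applies this identity to the rounded percentages reported in the text for Antecedent=PTB; the function names are illustrative, and the results are recomputed from rounded figures rather than taken from Table 3.

```python
def lift_from_confidence(confidence_pct: float, consequent_pct: float) -> float:
    """Lift = P(B|A) / P(B), with both inputs given in percent."""
    return confidence_pct / consequent_pct


def joint_pct(support_pct: float, confidence_pct: float) -> float:
    """P(A and B) in percent = P(A) * P(B|A)."""
    return support_pct * confidence_pct / 100


if __name__ == "__main__":
    # PTB -> bronchial tuberculosis: confidence 7.65%, overall proportion 7.01%.
    print(round(lift_from_confidence(7.65, 7.01), 2))
    # PTB -> tuberculous pleurisy: confidence 18.71%, overall proportion 23.62%.
    print(round(lift_from_confidence(18.71, 23.62), 2))
    # Share of all TB inpatients with both PTB and tuberculous pleurisy.
    print(round(joint_pct(82.05, 18.71), 2))
```

With these inputs the bronchial rule gives a lift of about 1.09, matching the reported value, while the pleurisy rule gives about 0.79: tuberculous pleurisy is the most frequent concurrent type, yet it is proportionally less common among PTB cases than among all TB inpatients. The joint proportion of about 15.35% agrees with the figure reported for concurrent PTB and tuberculous pleurisy.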
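The thresholds used for Consequent=PTB can be illustrated with the following sketch, which filters candidate rules by minimum support (0.1%) and minimum confidence (40%) and then derives lift as confidence divided by the overall PTB proportion (82.05%). The two reported rules use the rounded percentages given in the text; the two rows marked hypothetical are invented solely to show rules being rejected, and the function name filter_rules is an assumption.

```python
from typing import List, Tuple

P_PTB = 82.05          # overall proportion of PTB among TB inpatients, percent
MIN_SUPPORT = 0.1      # minimum support, percent
MIN_CONFIDENCE = 40.0  # minimum confidence, percent

# (antecedent, support %, confidence % for the consequent PTB)
CANDIDATES = [
    ("pharyngeal/laryngeal tuberculosis", 0.54, 91.23),  # reported
    ("bronchial tuberculosis", 7.01, 89.51),             # reported
    ("hypothetical rare site", 0.05, 95.00),             # hypothetical: support too low
    ("hypothetical weak rule", 1.00, 30.00),             # hypothetical: confidence too low
]


def filter_rules(candidates: List[Tuple[str, float, float]]
                 ) -> List[Tuple[str, float, float, float]]:
    """Keep rules meeting both thresholds and append lift = confidence / P(PTB)."""
    kept = []
    for name, support, confidence in candidates:
        if support >= MIN_SUPPORT and confidence >= MIN_CONFIDENCE:
            kept.append((name, support, confidence, confidence / P_PTB))
    # Sort by confidence, highest first, as in Table 4.
    return sorted(kept, key=lambda rule: rule[2], reverse=True)


if __name__ == "__main__":
    for name, support, confidence, lift in filter_rules(CANDIDATES):
        print(f"{name}: support={support}%, confidence={confidence}%, lift={lift:.2f}")
```

For the two reported rules this yields lift values of about 1.11 and 1.09. Since PTB is present in 82.05% of all inpatients, a rule with Consequent=PTB has a lift above 1 only when its confidence exceeds 82.05%.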
Association rules of concurrent PTB-EPTB types with gender
Most types of TB can be found in both males and females, with obvious exceptions of ovarian tuberculosis, oviduct tuberculosis, etc. We found association rules in males and females by setting the minimum support degree and the minimum confidence degree (Tables S2 and S3 in the Supplementary Appendix). In males, tuberculous empyema with concurrent PTB was the strongest association rule (lift=1.20), followed by costal tuberculosis with concurrent PTB (lift=1.16). In females, bronchial tuberculosis with concurrent PTB was the strongest association rule (lift=1.64), followed by supraclavicular lymph node tuberculosis with concurrent PTB (lift=1.56).
Association rules of concurrent PTB-EPTB types with age
We found association rules in all age groups by setting the minimum support degree and the minimum confidence degree (Tables S4-S10 in the Supplementary Appendix). In patients <15 years of age, tuberculous meningitis with concurrent PTB was the strongest association rule (lift=3.89), followed by tuberculosis of axillary lymph nodes with concurrent PTB (lift=3.78) and cervical vertebra tuberculosis with concurrent PTB (lift=3.61). In patients aged 15-24 years, splenic tuberculosis with concurrent PTB was the strongest association rule (lift=2.23), followed by tuberculous myelitis with concurrent PTB (lift=2.18).
In patients aged 25-34 years, oviduct tuberculosis with concurrent PTB was the strongest association rule (lift=3.09), followed by endometrial tuberculosis with concurrent PTB (lift=2.17). In patients aged 35-44 years, the strongest association rule was again oviduct tuberculosis with concurrent PTB (lift=1.71), with endometrial tuberculosis with concurrent PTB (lift=1.69) ranked next. In patients aged 45-54 years, the strongest association rule was vocal cord tuberculosis with concurrent PTB (lift=1.61), followed by wrist joint tuberculosis with concurrent PTB (lift=1.57). In TB patients aged 55-64 years, the strongest association rule was ankle joint tuberculosis with concurrent PTB (lift=1.55), followed by adrenal tuberculosis with concurrent PTB (lift=1.51). In TB patients aged ≥65 years, the strongest association rule was tuberculous pericarditis with concurrent PTB (lift=1.42), with hilar lymph node tuberculosis with concurrent PTB (lift=1.30) ranked next.
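One way to obtain stratum-specific lift values is to compute the rule separately within each gender or age group. The sketch below illustrates that idea only; whether the supplementary tables were computed exactly in this way is an assumption, the function name lift_by_stratum is illustrative, and the records are hypothetical toy data.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Set, Tuple


def lift_by_stratum(records: List[Tuple[str, Set[str]]],
                    antecedent: str,
                    consequent: str = "PTB") -> Dict[str, Optional[float]]:
    """Compute lift of antecedent -> consequent separately in each stratum."""
    groups: Dict[str, List[Set[str]]] = defaultdict(list)
    for stratum, diagnoses in records:
        groups[stratum].append(diagnoses)

    result: Dict[str, Optional[float]] = {}
    for stratum, group in groups.items():
        n = len(group)
        n_a = sum(1 for d in group if antecedent in d)
        n_b = sum(1 for d in group if consequent in d)
        n_ab = sum(1 for d in group if antecedent in d and consequent in d)
        if n_a == 0 or n_b == 0:
            result[stratum] = None  # lift is undefined in this stratum
        else:
            result[stratum] = (n_ab / n) / ((n_a / n) * (n_b / n))
    return result


# Hypothetical toy data: (age group, TB sites recorded for one patient).
TOY_STRATA = [
    ("<15", {"PTB", "meningitis"}), ("<15", {"PTB", "meningitis"}),
    ("<15", {"lymph node"}), ("<15", {"lymph node"}),
    (">=65", {"PTB"}), (">=65", {"PTB"}),
    (">=65", {"PTB", "meningitis"}), (">=65", {"meningitis"}),
]

if __name__ == "__main__":
    print(lift_by_stratum(TOY_STRATA, "meningitis"))
```

In the toy data the same rule has a lift of 2.0 in the younger group and below 1 in the older group. Within a stratum, lift cannot exceed 1/P(PTB), so larger lift values are only attainable where PTB is less prevalent.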

Discussion
TB is transmitted when people afflicted with PTB expel M. tuberculosis bacteria into the air; these airborne bacilli are subsequently breathed in by another person and then settle in the lungs and grow. From there they can be disseminated through the lymphatic or hematogenous systems and subsequently infect single or multiple extrapulmonary sites, such as the pleura, lymph nodes, meninges, bones and joints, etc. Although PTB is the most common presentation of TB, EPTB contributes considerably to morbidity, lifelong sequelae, and mortality [18]. The mechanisms for EPTB dissemination are complicated [19]. Notably, cases of PTB concurrent with EPTB are common in clinical practice, but data regarding such cases are limited. Given the variety of clinical presentations and the nonspecific systemic symptoms of TB, a more profound understanding of the site distribution of TB is needed. In this study, we delineated the diagnostic types of TB observed in clinical practice and explored the association rules of concurrent PTB-EPTB cases in order to alert clinicians to the frequency of concurrent PTB-EPTB cases within a large TB sample population.
Tuberculous pleurisy is one of the most common forms of EPTB [20]. Here we found that tuberculous pleurisy (23.62%), thought to primarily result from a hypersensitivity reaction to tuberculous protein [21], was the second most common type of TB observed. Previous studies had also noted concurrent PTB-EPTB cases [8,9]; Boonsarngsuk et al. [8] demonstrated that 12.2% of TB cases had concurrent PTB-EPTB (120/986). In this study, concurrent PTB-EPTB was found in about 30% of TB patients. We also found that the strongest association rule of PTB concurrent with EPTB was PTB with concurrent bronchial tuberculosis (lift=1.09); indeed, here 7.65% of PTB cases had concurrent bronchial tuberculosis. This association may stem from the fact that bronchi are adjacent to the lungs and this close proximity predisposes PTB patients to bronchial tuberculosis. However, this outcome is not inevitable, as the proportion of EPTB cases with concurrent PTB varied depending on whether patients were primarily viewed as EPTB or PTB patients. Meanwhile, pharyngeal/laryngeal tuberculosis was an infrequent manifestation of EPTB and was usually seen as a complication of PTB [22]. The strongest association rule of EPTB concurrent with PTB was pharyngeal/laryngeal tuberculosis concurrent with PTB (lift=1.11). In this study, 91.23% of pharyngeal/laryngeal tuberculosis patients had concurrent PTB, consistent with the supposition that expulsion of M. tuberculosis by PTB patients would likely affect the larynx. These findings suggest that once PTB is diagnosed, patients should be evaluated for bronchial tuberculosis first. Conversely, once a diagnosis of pharyngeal/laryngeal tuberculosis or bronchial tuberculosis is confirmed, attention should be paid to evaluating the patient for PTB. In essence, the diagnosis of TB must be carefully undertaken to avoid misdiagnosis.
Although most types of tuberculous lesions can be found both in males and females, females (OR=1.119, 95%CI: 1.104-1.134) were more likely to have concurrent PTB-EPTB than males in this study. Jung et al. [23] also found female gender to be an independent predictor of concomitant endobronchial TB in patients with active PTB (OR=4.35, 95%CI: 1.78-10.63). Meanwhile, the magnitude of association rules of concurrent PTB-EPTB also varied with gender. For males in this study, tuberculous empyema with concurrent PTB was the strongest association rule (lift=1.20), and more than 70% of tuberculous empyema cases in males had concurrent PTB. In females, bronchial tuberculosis with concurrent PTB was the strongest association rule (lift=1.64), and more than 55% of bronchial tuberculosis cases in females had concurrent PTB, although the reason for the difference in association rules between genders was not clear and warrants additional investigation.
TB affects all age groups, but the best estimate for 2019 indicated that about 90% of cases were adults (aged ≥15 years), and TB prevalence has been shown to be strongly associated with age [1]. Here we found that the strongest association rule in children and adolescents (<15 years) was tuberculous meningitis with concurrent PTB (lift=3.89), a result that may reflect particular physiological characteristics and immunological mechanisms unique to children and adolescents. We also found that the strongest association rule in TB patients aged 15-24 years was splenic tuberculosis with concurrent PTB (lift=2.23). Splenic tuberculosis is mainly caused by hematogenous dissemination of a small number of bacteria that directly spread to the spleen via lymphatic pathways and adjacent organs [24]. Meanwhile, our results showed that oviduct tuberculosis with concurrent PTB was the strongest association rule in groups of patients aged 25-34 years (lift=3.09) and 35-44 years (lift=1.71). Indeed, oviduct tuberculosis is an important chronic pelvic disease and an etiological factor underlying infertility. Transmission of infection usually occurs via hematogenous systems, while direct spread from abdominal organs and the peritoneum is also possible [25]. Other EPTB types, including isolated vocal cord tuberculosis and ankle tuberculosis, have rarely been reported; however, vocal cord tuberculosis with concurrent PTB was the strongest association rule in patients aged 45-54 years (lift=1.61), while ankle joint tuberculosis with concurrent PTB was the strongest association rule in patients aged 55-64 years (lift=1.55). Yet another form of EPTB, tuberculous pericarditis, refers to an M. tuberculosis infection of the membrane that covers the heart (pericardium). In endemic TB areas, tuberculous pericarditis has been found in 1% to 2% of people afflicted with pulmonary TB [26]. We found that tuberculous pericarditis with concurrent PTB was the strongest association rule in ≥65-year-old patients (lift=1.42).
This study had several strengths, including its large-scale multi-center representative sample, its detailed analysis of diagnostic types of TB and its novel determinations of confidence/lift values of concurrent PTB-EPTB cases. However, the study also had several limitations. First, it may have been subject to Berkson's bias: the study population consisted of hospitalized TB patients, and because concurrent PTB-EPTB cases have a high likelihood of being hospitalized, their proportions may be overestimated relative to the general population. Therefore, data collected from whole population-based studies are still needed to clarify the associations. Secondly, most hospitals in our study were TB-specialized hospitals; thus, our findings may not represent the general TB patient population or apply to settings elsewhere in the country. Thirdly, some TB-specialized hospitals in China do not admit pediatric TB patients and therefore our results may underestimate proportions of pediatric TB cases. Lastly, the analysis did not consider low-frequency disease complications and comorbidities in China (e.g., HIV) or other factors such as new or retreatment status, income, smoking, etc.

Conclusions
In conclusion, the present study revealed several types of concurrent PTB-EPTB cases and analyzed the association rules between PTB and EPTB for the first time in a large sample population. Concurrent PTB and tuberculous pleurisy was the most common concurrent PTB-EPTB case type observed. The strongest association rule in PTB with concurrent EPTB was PTB with concurrent bronchial tuberculosis. The strongest association rule in EPTB with concurrent PTB was pharyngeal/laryngeal tuberculosis with concurrent PTB. Confidence and lift values of concurrent PTB-EPTB cases varied with gender and age. Clinicians should be aware that concurrent PTB-EPTB cases are common and that these patients require administration of customized treatment regimens in order to achieve the best outcomes.

Declarations
Ethical Approval and Consent to participate: Given that the medical information of inpatients was recorded anonymously by case history, which would not bring any risk to the participants, the Ethics Committee of Beijing Chest Hospital, Capital Medical University approved this study with a waiver of informed consent from the patients.
Consent for publication: Not applicable.
Availability of supporting data: Data are not publicly available. However, de-identified data can be obtained upon reasonable request to the corresponding author.
Competing interests: The authors declare that they have no competing interests.
The first column represents the consequents (the "then" part of the rule), while the next column represents the antecedents (the "if" part of the rule).
ID displays the sequence of the association rules.
Instances display the cases of TB.