Study design, setting, and population
This was a prospective, observational study conducted among 13,980 public service employees of two medium-sized towns in Finland (Jyväskylä: 8,499 employees, mean age 47 years, 80% female; Kuopio: 5,481 employees, mean age 46 years, 78% female). In both towns, the employees represented a wide range of professions, such as nurses, teachers, caretakers, and sanitation workers. An invitation to the study and the first questionnaire were sent to the employees’ e-mail addresses in March and April 2017. Answers were collected via an electronic reply form. One reminder was sent if a subject had not responded within two weeks. The subjects who reported current cough in the first questionnaire formed the population for the cluster analysis.
To define the prognosis of the identified clusters, a second questionnaire was sent via e-mail in April 2018 to all participants who had reported cough in the first survey and had given permission for follow-up. One reminder was sent if a subject had not responded within two weeks. One phone contact was made if a subject had not answered within two weeks of the reminder.
The study was approved by the Ethics Committee of Kuopio University Hospital (289/2015). Permission to conduct the study was obtained from officials of the towns. The invitation e-mail included detailed information about the study. A subject’s decision to reply was considered informed consent.
Questionnaires
The first questionnaire included 80 items. It covered the subject’s household, pets, moisture damage both at the workplace and at home, family income, occupation, physical activity, smoking history, alcohol consumption, current medications, recent somatic symptoms, disorders diagnosed by a doctor, and general health. Many questions were adopted from two previous studies, the Health Behaviour and Health among the Finnish Adult Population study (15) and the Finnish National FINRISK study (16). Symptoms related to asthma, rhinosinusitis, and gastroesophageal reflux disease were assessed with questions currently suggested for epidemiologic studies (17-19). Depressive symptoms were assessed with the Patient Health Questionnaire-2 (20). The subjects who suffered from current cough also answered detailed cough-related questions, such as those about the frequency of coughing bouts and the duration of the cough episode. They also filled in a list of potential cough triggers and the Leicester Cough Questionnaire (LCQ), which was used to measure cough-related quality of life (C-QOL) (21). An English version of the first questionnaire is provided as a supplementary file (additional file 1).
In the second questionnaire, sent 12 months later, the subjects were asked whether they suffered from cough and how long the cough had lasted. The questionnaire also included questions about current smoking, current moisture damage both at the workplace and at home, current pets, and current medications. Both questionnaires were first tested in a preliminary sample of 25 subjects and slightly revised before the final study.
Definitions of variables that were formed on the basis of the raw data in the first questionnaire
Current asthma was defined as a doctor’s diagnosis of asthma at any age together with wheezing during the last 12 months (17). Chronic rhinosinusitis was present if there was either nasal blockage or nasal discharge (anterior or posterior nasal drip) and either facial pain/pressure or reduction/loss of smell for more than three months (18). Gastroesophageal reflux disease was present if there was heartburn and/or regurgitation on at least one day per week during the last three months (19). The number of cough background disorders was calculated by summing these disorders, giving a value from zero to three. Idiopathic cough was defined as the absence of all of these disorders. Autoimmune disorder was defined as the presence of a doctor’s diagnosis of hypothyroidism, rheumatoid arthritis, or another autoimmune disorder. Presence of depressive symptoms was defined as a Patient Health Questionnaire-2 score of three or more (20). The symptom sum was calculated by summing all reported symptoms except those associated with airway disorders, giving a value from zero to 14. The trigger sum was calculated by summing all reported cough triggers. The list contained 11 triggers. In addition, LCQ question number 18 was used to capture speaking as a cough trigger, giving a maximum of 12 potential triggers. Allergy was defined as a self-reported allergy to pollens, animals, or food. A family history of chronic cough was defined as the presence (now or in the past) of chronic cough (duration of more than eight weeks) in parents, sisters, or brothers.
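The derived variables above can be illustrated with a short Python sketch. It is not the authors' code: the field names (such as asthma_diagnosis and reflux_days_per_week) are hypothetical stand-ins for questionnaire items, and only the definitions stated in the text are implemented.

```python
def derive_variables(r):
    """Derive cough-related variables from one respondent's answers.

    `r` is a dict with hypothetical field names; the real questionnaire
    items are not named in the text.
    """
    # Doctor's diagnosis at any age AND wheezing during the last 12 months.
    current_asthma = bool(r["asthma_diagnosis"] and r["wheeze_12m"])

    # (blockage OR discharge) AND (facial pain OR smell loss), > 3 months.
    chronic_rhinosinusitis = bool(
        (r["nasal_blockage"] or r["nasal_discharge"])
        and (r["facial_pain"] or r["smell_loss"])
        and r["nasal_symptoms_over_3_months"]
    )

    # Heartburn and/or regurgitation on at least one day per week.
    reflux_disease = r["reflux_days_per_week"] >= 1

    n_disorders = sum([current_asthma, chronic_rhinosinusitis, reflux_disease])

    return {
        "current_asthma": current_asthma,
        "chronic_rhinosinusitis": chronic_rhinosinusitis,
        "reflux_disease": reflux_disease,
        "n_background_disorders": n_disorders,   # 0 to 3
        "idiopathic_cough": n_disorders == 0,
        "depressive_symptoms": r["phq2_score"] >= 3,
    }


example = {
    "asthma_diagnosis": True, "wheeze_12m": True,
    "nasal_blockage": True, "nasal_discharge": False,
    "facial_pain": False, "smell_loss": False,
    "nasal_symptoms_over_3_months": True,
    "reflux_days_per_week": 0, "phq2_score": 2,
}
derived = derive_variables(example)
```

In this example the respondent has nasal blockage but neither facial pain nor loss of smell, so the rhinosinusitis criterion is not met and only asthma counts as a background disorder.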
Statistical analysis
All variables presented in the first questionnaire were included in partitional clustering with the k-means method (12). The dimension reduction and cluster analysis steps were performed using R statistical software version 3.5.1 (R Foundation for Statistical Computing, Vienna, Austria) with the diffusionMap, NbClust, and cluster packages.
In the first phase, the data were preprocessed. Right-skewed variables (skewness > 1) were transformed with the log(x+1) function. Next, ordinal and continuous variables were scaled to the interval from 0 to 1, so that a variable’s minimum value or lowest class received the value 0 and its maximum value or highest class the value 1. Binary variables remained unchanged, with 0 indicating the negative (‘no’) alternative and 1 the positive (‘yes’) alternative. After that, the distance matrix between observations was calculated from the scaled variables using the Manhattan distance function. The diffusion map dimension reduction method was then applied to extract diffusion coordinates from the distance matrix, using the function diffuse with default settings.
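The preprocessing steps described above can be sketched in Python as follows. This is an illustration only, not the R code used in the study: the function names are hypothetical, the skewness estimator (third standardized moment with population variance) is an assumption because the text does not say which estimator was used, and the diffusion map step is not reproduced.

```python
import math


def skewness(xs):
    """Moment-based skewness (an assumed estimator)."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m3 = sum((x - mean) ** 3 for x in xs) / n
    return 0.0 if m2 == 0 else m3 / m2 ** 1.5


def preprocess(column):
    """log(x+1) for right-skewed columns, then min-max scaling to [0, 1]."""
    if skewness(column) > 1:
        column = [math.log1p(x) for x in column]
    lo, hi = min(column), max(column)
    if hi == lo:                      # constant column: nothing to scale
        return [0.0] * len(column)
    return [(x - lo) / (hi - lo) for x in column]


def manhattan_matrix(rows):
    """Pairwise Manhattan (L1) distances between observations."""
    return [[sum(abs(a - b) for a, b in zip(r, s)) for s in rows]
            for r in rows]


skewed = preprocess([0, 0, 0, 0, 10])   # skewness 1.5, so log-transformed
even = preprocess([1, 2, 3])            # skewness 0, scaled only
dist = manhattan_matrix([[0.0, 0.0], [1.0, 0.5]])
```

Binary variables coded 0/1 would bypass preprocess and enter the distance calculation unchanged, as the text describes.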
The number of clusters was evaluated using the 24 criteria provided by the software. The extracted diffusion map coordinates were then clustered into groups with the k-means method. Cluster membership was added to the original data for further analysis.
To validate the clustering, the analysis was repeated separately for the two hometowns. Furthermore, the analyses were repeated after excluding background variables with no plausible biological association with cough (such as hometown, years of education, and alcohol consumption). The validation also included a comparison of prognosis between the clusters.
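For readers unfamiliar with k-means, the clustering step can be sketched with a minimal Lloyd-style implementation in Python. This is not the R routine used in the study: the function name is hypothetical, the starting centers are passed in explicitly (an assumption made to keep the example deterministic), and the input points merely stand in for diffusion map coordinates.

```python
def kmeans(points, centers, max_iter=100):
    """Lloyd's algorithm: assign each point to the nearest center, then
    move each center to the mean of its members, until labels stop changing."""
    centers = [list(c) for c in centers]
    labels = None
    for _ in range(max_iter):
        new_labels = [
            min(range(len(centers)),
                key=lambda j: sum((p - c) ** 2
                                  for p, c in zip(pt, centers[j])))
            for pt in points
        ]
        if new_labels == labels:      # converged
            break
        labels = new_labels
        for j in range(len(centers)):
            members = [pt for pt, lab in zip(points, labels) if lab == j]
            if members:               # keep the old center if cluster is empty
                centers[j] = [sum(col) / len(members)
                              for col in zip(*members)]
    return labels, centers


coords = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
labels, centers = kmeans(coords, centers=[(0.0, 0.0), (10.0, 10.0)])
```

The returned labels correspond to the cluster membership that the text describes as being added back to the original data.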
Differences between the clusters were tested with the Mann-Whitney U test and the chi-square test, and the interrelationships of the variables were analyzed with Spearman’s correlation coefficient (rs), using SPSS software version 22.0 (IBM SPSS Statistics for Windows, Armonk, NY, USA). Cut-off values were defined with the Youden index (sensitivity + specificity - 1) by choosing the value that maximized it. The values are expressed as means and standard deviations, medians and ranges, or percentages. A p value less than 0.05 was accepted as the level of statistical significance.
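The cut-off selection with the Youden index can be sketched in Python as follows. The function name and the example data are hypothetical. The sketch assumes that higher values indicate the positive outcome and that both outcome classes are present, and it does not reproduce the SPSS procedure.

```python
def youden_cutoff(values, outcomes):
    """Return (cutoff, J) maximizing J = sensitivity + specificity - 1.

    A value >= cutoff is classified as positive (an assumed direction).
    `outcomes` holds 1 for positive and 0 for negative cases.
    """
    positives = sum(outcomes)
    negatives = len(outcomes) - positives
    best_cutoff, best_j = None, -1.0
    for c in sorted(set(values)):
        true_pos = sum(1 for v, o in zip(values, outcomes) if v >= c and o)
        true_neg = sum(1 for v, o in zip(values, outcomes) if v < c and not o)
        j = true_pos / positives + true_neg / negatives - 1
        if j > best_j:                # the lowest cutoff wins ties
            best_cutoff, best_j = c, j
    return best_cutoff, best_j


# Made-up data: a score that separates the two outcome groups perfectly.
cutoff, j = youden_cutoff([1, 2, 3, 4, 5, 6], [0, 0, 0, 1, 1, 1])
```

With real data the groups overlap, so the maximal J is below 1 and the chosen cut-off reflects a trade-off between sensitivity and specificity.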