Socio-demographic Correlates of Diabetes Self-reporting Validity: An Adult’s Kurdish Population-based Study

Background: The aim was to assess the validity of self-reported diabetes according to demographic and socio-economic features in the Dehgolan Prospective Cohort Study (DehPCS).
Methods: We performed a cross-sectional analytical study on 4400 subjects aged 35-70 years from DehPCS. The reference for having diabetes was oral hypoglycemic drug consumption, insulin injection, or a high FBS indicating diabetes. Self-reported diabetes status was recorded by well-trained interviewers before diabetes status was identified from the reference criteria. The validity of self-reported diabetes was assessed using sensitivity, specificity, and positive and negative predictive values. Socio-demographic correlates of self-report agreement were examined by multinomial logistic regression.
Results: Of those invited, 3996 agreed to participate in this study (participation rate 90.8%). The diabetes prevalence among the study population was 13.1% based on self-report and 9.7% based on the reference. Of the 523 people who reported diabetes, 213 (41.28%) did not have diabetes. We found a good agreement of 92.3%, with an acceptable kappa value of 65.1%, between self-reported diabetes and the reference. Self-reported diabetes had a sensitivity of 78.5%, a specificity of 93.9%, and positive and negative predictive values of 58.7% and 98.0%, respectively. Female sex, higher economic class, higher BMI, and a family history of diabetes increased the chance of a false positive report. Male sex, older age, and moderate economic class increased the chance of a false negative report.
Conclusion: Self-reported diabetes is a relatively valid tool that can fairly determine diabetes prevalence in epidemiological studies. Its validity, however, is influenced by some socio-demographic characteristics.

received incorrect information from specialists (5). However, studies show that about half of people with diabetes are aware of their disease (2) and only one-fifth of these people have controlled diabetes. So, the accuracy and validity of disease estimates could be affected by self-reporting of people (12). Analyzing self-reported data can help us to better understand the quality of self-reported information. Identifying sociodemographic factors influencing self-report validity can also be important for planning public health policies for more vulnerable groups.
In fact, it is important both for interpreting existing data and for planning future research on the diagnosis of diabetes and its consequences. Understanding the causes of discrepancies between self-reported diabetes and standard criteria is an important basis for determining the most appropriate approach in future research programs. To this end, we used data from the DehPCS study to assess the validity of self-reported diabetes against reference criteria, namely a history of taking oral anti-diabetic drugs, insulin injection, or high fasting blood sugar (FBS).

Study population
The present study is a cross-sectional analytical study using enrollment-phase data of the Dehgolan Prospective Cohort Study (DehPCS). DehPCS is one of 18 prospective epidemiological cohort studies in Iran (PERSIAN) and is being performed on permanent residents of Dehgolan aged 35-70 years, with the aim of assessing the risk factors of common non-communicable diseases in the region. All PERSIAN sites use the same protocol to conduct the study. The questionnaires used in this study have different sections, including general factors (demographic and socioeconomic characteristics, lifestyle, environmental exposure, occupational exposure, physical activity, and personal habits), medical factors (medical history, clinical symptoms, family medical history, drug use, reproductive history, oral health, general health, anthropometry, physical exam, blood and urine analysis), and nutritional factors (food frequency, eating habits, and supplementation). Sampling was done by a simple cluster sampling method. A total of 4400 people were invited to participate in the study. The participation rate was 90.8% among eligible individuals. Of the 3996 participants, 3976 had adequate information about diabetes self-reporting, blood samples, and medication or insulin use and were considered for further assessment. The study design and rationale were published previously (13,14).

Data collection and measurements
In the first step, participants were invited to the study site. Initially, the informed consent form was signed by participants. Then, to collect information, they were enrolled in the online software and received a unique code. All data were collected by expert interviewers who had completed the necessary training courses according to the executive protocol. For para-clinical tests, biological samples (blood and urine) were first collected on an empty stomach. We measured weight using the Seka scale and height using the Seka stadiometer to the nearest 0.1 cm. Body mass index (BMI) was calculated as weight in kilograms divided by the square of height in meters. Blood pressure was measured using a Richter aneroid sphygmomanometer after at least 15 minutes of rest, with two measurements in the right arm at intervals of at least half an hour. The mean of the two measurements was taken as the systolic and diastolic blood pressure. According to the JNC-7 criteria, people with systolic blood pressure ≥ 140 mmHg, or diastolic blood pressure ≥ 90 mmHg, or a history of taking antihypertensive drugs were considered hypertensive. The official age of the participants was taken from their identity cards. Education was measured as the number of years the person had studied. Economic status was calculated based on the wealth index using the Multiple Correspondence Analysis (MCA) method with analysis of principal components regarding durable goods, housing features, and other facilities. Individuals with a history of smoking fewer than 100 cigarettes during their lifetime were considered non-smokers. The use of illicit drugs was defined as the use of drugs once a week for at least six months, and alcohol consumption as drinking about 200 ml of beer or 45 ml of alcohol once a week for at least six months. Family history of diabetes was also assessed in first- and second-degree relatives.
Second-degree relatives are those with whom a person shares 25% of the genome. It is noteworthy that we collected self-reported diabetes data before identification of diabetes status based on the reference criteria.
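The BMI and hypertension definitions above can be written as a short Python sketch. The function names are hypothetical and introduced only for illustration. Only the obesity cut-off (BMI ≥ 30 kg/m²) is stated in the text, so the 25 kg/m² overweight cut-off below is an assumption based on the usual convention.

```python
def body_mass_index(weight_kg: float, height_cm: float) -> float:
    """BMI = weight in kilograms divided by the square of height in meters."""
    height_m = height_cm / 100.0
    return weight_kg / (height_m ** 2)


def bmi_category(bmi: float) -> str:
    """Three BMI groups; only the >= 30 cut-off comes from the text."""
    if bmi >= 30:
        return "obese"
    if bmi >= 25:  # assumed overweight cut-off, not stated in the text
        return "overweight"
    return "normal or underweight"


def is_hypertensive(sbp_readings, dbp_readings, on_antihypertensives: bool) -> bool:
    """JNC-7 style rule as described: mean SBP >= 140, or mean DBP >= 90,
    or a history of taking antihypertensive drugs."""
    mean_sbp = sum(sbp_readings) / len(sbp_readings)
    mean_dbp = sum(dbp_readings) / len(dbp_readings)
    return mean_sbp >= 140 or mean_dbp >= 90 or on_antihypertensives
```

For example, two systolic readings of 138 and 144 mmHg average 141 mmHg, so the participant would be classified as hypertensive even with normal diastolic readings and no drug history.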

Diabetes measurement
Diabetes self-reporting was assessed by asking the following question, "Have you ever had diabetes in the past?" People who answered yes were asked the next question, "Who told you that you had diabetes?" All those who answered that they had been diagnosed by a physician were considered to have self-reported diabetes. The reference criterion for the diagnosis of diabetes was an abnormal fasting blood sugar (FBS) indicating diabetes or a positive history of routine use of insulin or oral hypoglycemic drugs. FBS ≥ 126 mg/dL (7 mmol/L) was considered diabetes. Drug use on the day of blood sampling was assessed with the following question, "Do you routinely use anti-diabetic drugs or insulin?" If the answer was yes, the drugs used were visually inspected.
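The two definitions above, and the way they combine into the four agreement cells used later, can be sketched in Python as follows. All names are hypothetical and chosen for illustration. This is a minimal reading of the stated criteria, not the study's actual data-processing code.

```python
FBS_THRESHOLD_MG_DL = 126.0  # FBS >= 126 mg/dL (7 mmol/L), as stated in the text


def self_reported_diabetes(ever_had_diabetes: bool, told_by_physician: bool) -> bool:
    """Self-report is positive only if the participant answers yes and
    names a physician as the source of the diagnosis."""
    return ever_had_diabetes and told_by_physician


def reference_diabetes(fbs_mg_dl: float, uses_oral_drugs: bool, uses_insulin: bool) -> bool:
    """Reference criterion: high FBS, or routine oral hypoglycemic drugs, or insulin."""
    return fbs_mg_dl >= FBS_THRESHOLD_MG_DL or uses_oral_drugs or uses_insulin


def agreement_cell(self_report: bool, reference: bool) -> str:
    """Label one participant as TP, FP, FN, or TN relative to the reference."""
    if self_report and reference:
        return "TP"
    if self_report and not reference:
        return "FP"
    if not self_report and reference:
        return "FN"
    return "TN"
```

Under this reading, a participant with an FBS of 131 mg/dL who reports never having had diabetes falls in the FN cell, which corresponds to undiagnosed or unreported diabetes.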

Statistical analysis
Diabetes self-reporting validation was performed using the following criteria: sensitivity (Se), specificity (Sp), positive predictive value (PPV), negative predictive value (NPV), and the negative likelihood ratio (LR-), calculated as the false negative rate (FNR) divided by Sp. The kappa coefficient was also calculated. Kappa measures chance-corrected agreement between two diagnostic approaches. 95% Confidence Intervals (CI) were calculated for all values based on the standard method for proportions. Validity was calculated overall and by demographic and socioeconomic characteristics, three categories of body mass index (BMI), personal habits, and hypertension status. Binary and multinomial logistic regression were used to examine concordance between self-reported diabetes and the reference value. To examine the diagnostic characteristics of self-reported diabetes combined with sex and age, we used the Precision-Recall Curve (PRC), which plots PPV against Se. A two-sided test with an alpha level of 0.05 was considered for statistical significance. All analyses were done using Stata software version 16 (Stata Corp, College Station, TX, USA).
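As an illustration of the validity measures named above, the following Python sketch computes them from the four cells of a 2x2 table. The function names are hypothetical, the counts in the usage note are invented for illustration and are not the study's data, and the Wald interval is only an assumption about what "the standard method for proportions" refers to. The study itself used Stata.

```python
import math


def wald_ci(p: float, n: int, z: float = 1.96):
    """Approximate 95% CI for a proportion (Wald method, assumed here)."""
    half_width = z * math.sqrt(p * (1 - p) / n)
    return max(0.0, p - half_width), min(1.0, p + half_width)


def validity_measures(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Se, Sp, PPV, NPV, LR-, overall agreement, and Cohen's kappa."""
    n = tp + fp + fn + tn
    se = tp / (tp + fn)              # sensitivity
    sp = tn / (tn + fp)              # specificity
    ppv = tp / (tp + fp)             # positive predictive value
    npv = tn / (tn + fn)             # negative predictive value
    lr_neg = (1 - se) / sp           # false negative rate divided by Sp
    observed = (tp + tn) / n         # overall agreement
    # Agreement expected by chance, from the row and column totals.
    expected = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / n ** 2
    kappa = (observed - expected) / (1 - expected)
    return {
        "se": se, "sp": sp, "ppv": ppv, "npv": npv,
        "lr_neg": lr_neg, "agreement": observed, "kappa": kappa,
        "se_ci": wald_ci(se, tp + fn), "sp_ci": wald_ci(sp, tn + fp),
    }
```

Calling `validity_measures(tp=80, fp=20, fn=20, tn=880)` on these made-up counts shows how a high overall agreement can coexist with a noticeably lower kappa, because most of the agreement comes from the large TN cell.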

Results
Out of 3976 participants with adequate information about diabetes self-reporting and the reference criteria, 2241 (56.26%) were female and 1735 (43.74%) were male. The mean age of male and female participants was 47.98 ± 8.91 years and 48.78 ± 8.91 years, respectively. Most participants had less than a high school education, and about 31% of them were illiterate. The mean BMI of the participants was 28.00 ± 4.58 kg/m² and 32.31% of them were in the obese group with BMI ≥ 30 kg/m². In terms of blood pressure, 21.50% of them had a systolic blood pressure ≥ 140 or a diastolic blood pressure ≥ 90. Also, 27.81% of people reported a history of diabetes in their first-degree relatives. Demographic characteristics and basic information of the participants are shown in Table 1. The prevalence of diabetes was 13.1% based on self-report and 9.7% based on the reference criteria. Of the 523 people who reported diabetes, 213 (41.28%) did not have diabetes according to the reference criteria. Among people treated for diabetes, 167 (63.74%) had poorly controlled diabetes, and among people with diabetes (high FBS or treatment with drugs or insulin), 83 (21.50%) did not know they had it (Figure 1). Table 2 shows the validation of diabetes self-reporting by demographic, socioeconomic, and some individual variables. The overall agreement and the kappa statistic were 92.3% and 65.1%, respectively. The estimated kappa varied between 45.5% and 81.1% depending on the characteristics of the individuals under study. In general, kappa agreement was higher in men, older age groups, people with poor economic status, people with normal weight, ex-smokers, and people with high blood pressure. The overall Se and Sp were 78.5% and 93.9%, respectively. Se increased with age and decreased with weight.
The overall PPV and NPV were 58.7% and 98.0%, respectively. Unlike Se, PPV was significantly higher in men than in women. With age, PPV increased by more than 38%, so that in the age group over 60 years it reached more than 72%. Also, PPV was higher among people with hypertension and people with a family history of diabetes. Figure 2 shows the PRC. The area under the curve was 64.24% for the full model and 61.60% for the reduced model. According to Table 3, in multivariate analysis, the independent factors that increased the discrepancy between diabetes self-reporting and the reference value were female gender, being single, moderate to high economic status, higher BMI, and having a first-degree relative with diabetes. Multinomial logistic regression results suggested that females, single people, those in the upper economic class, people with higher BMI, and those having a first-degree relative with diabetes were more likely to falsely report diabetes (FP). Conversely, the probability of falsely reporting not having diabetes (FN) was higher in men, older people, and those in the lower economic class. However, female gender, older age, higher BMI, previous history of smoking, high blood pressure, and family history of diabetes significantly increased true reports of diabetes mellitus (TP) compared to true reports of non-diabetes (TN).
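The link between Se, Sp, prevalence, and the predictive values reported above can be illustrated with Bayes' rule. The sketch below is a reader's aid with a hypothetical function name. It is not part of the study's analysis, and because the inputs quoted in the text are rounded, the result need not match the reported PPV and NPV exactly.

```python
def predictive_values(se: float, sp: float, prevalence: float):
    """PPV and NPV from sensitivity, specificity, and prevalence (Bayes' rule)."""
    true_pos = se * prevalence                # P(test+, disease+)
    false_pos = (1 - sp) * (1 - prevalence)   # P(test+, disease-)
    true_neg = sp * (1 - prevalence)          # P(test-, disease-)
    false_neg = (1 - se) * prevalence         # P(test-, disease+)
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


# Rounded values quoted in the text: Se 78.5%, Sp 93.9%, reference prevalence 9.7%.
ppv, npv = predictive_values(se=0.785, sp=0.939, prevalence=0.097)
```

The calculation makes one point visible: with a reference prevalence near 10%, even a specificity above 90% leaves enough false positives among the many non-diabetic participants to hold PPV well below Se and Sp, while NPV stays high.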

Discussion
In this study on the validity of self-reported diabetes in a large Kurdish population, we found that self-reported diabetes had a moderate sensitivity of 78.5%, a high specificity of 93.9%, a fairly good positive predictive value of 58.7% for self-reporting diabetes, and a high negative predictive value of 98.0% for self-reporting no diabetes. The agreement between self-reported diabetes and the reference criteria was fairly good, with a kappa of 65.1% and a concordance of 92.3%. Besides, we showed that the demographic, anthropometric, and habitual features of subjects largely influenced the accuracy of self-reported diabetes. Specifically, being female, older age, higher BMI, being an ex-smoker, having HTN, and a family history of diabetes increased the odds of a true positive diabetes self-report. We found that 31% of diabetic participants (120 out of a total of 386) were not taking any medication for diabetes. Previous reports on this issue showed almost the same statistics (15,16); however, we provide an updated validation of diabetes self-reports in a large Kurdish population of Iran.
Epidemiological surveys commonly use either self-reports or medical records of chronic diseases to estimate their incidence or prevalence (17). Among chronic diseases, self-reports of diabetes were identified to be more accurate, with a higher level of agreement (18-20). Our findings on the accuracy of self-reported diabetes were in line with recent similar studies that showed a sensitivity of 75-79.3% and a specificity of 95.8-98.4% (16, 21).
However, older studies showed a lower sensitivity of 61.5-69.7% for diabetes self-reports (20,22). This increasing trend in the accuracy of diabetes self-reports can be explained by the increase in public awareness and the development of the health care system over time (23). Meanwhile, the difference in accuracy over time may also be due to the different demographic features of the studied populations, since, similar to previous studies, we found that the accuracy of diabetes self-reports was largely dependent on the baseline characteristics of study participants (4,16,20).
The results of the multivariable analysis showed that women were more likely than men to have a disagreement between self-reported diabetes and the reference, with higher false positive and true positive rates and a lower false negative rate. One explanation for this finding is that women practice better self-care behaviors and use more health care services (24). Moreover, women pay more attention to their dietary consumption. For instance, they tend to count daily carbohydrate intake and consume less fat (25). Thus, they were more likely to consider themselves diabetic and reported more true positives and false positives. We also found that increasing age was associated with higher odds of true positive and false negative self-reports of diabetes. The higher rate of false negative self-reports in older participants may be due to recall bias related to Alzheimer's disease or age-related memory loss (26), and the higher rate of true positive self-reports among older individuals may be due to more health care delivery and more opportunity to undergo blood sugar testing in this population (24). We also observed that higher BMI was associated with higher odds of discordance between diabetes self-reports and the reference criteria, and with higher true positives, false negatives, and false positives of self-reported diabetes. In previous studies, in line with this study, obesity, as well as an increase in BMI, resulted in higher odds of diabetes development and consequently higher true positive and false negative rates (16, 20,27). This finding can be attributed to insulin resistance in obesity as well as poor self-care among overweight and obese individuals (28, 29).
In this study, participants with HTN were more likely to truly report their diabetes. This finding can be due to better monitoring of other metabolic syndrome risk factors in this population and higher awareness about their health. In line with previous studies, we observed no significant change in the odds of false negative and false positive rates among populations with HTN (16).
A positive family history of diabetes, particularly in first-degree relatives, was associated with a high level of discordance in diabetes self-reports. Subjects with a positive family history were more likely to develop diabetes, which explains the higher true positive rate of diabetes self-reports in this group. Besides, similar to previous studies, subjects with a positive family history tended to report diabetes more frequently, which leads to higher false positive rates (16, 30).

Strengths and Limitations:
This study had several strengths worth stating. It was a large population-based survey derived from the PERSIAN cohort of Iran and had a low risk of attrition bias, with a high response rate (91%) among enrolled residents of Dehgolan, in the Kurdish region of Iran. Thus, we could generalize our findings to the whole Kurdish population of Iran. This study also had several limitations worth discussing. It examined self-reported prevalent diabetes; thus, this validation cannot be applied to studies investigating incident diabetes. As stated, this validation was conducted in the west of Iran, and because of the racial, ethnic, and socio-cultural diversity of other regions of Iran, further validation is required to determine the accuracy of self-reported diabetes in other populations and to elucidate the impact of the socio-cultural nature of each region on the accuracy and discordance of self-reported diabetes.

Conclusion
We found that self-reported diabetes had a moderate sensitivity, indicating high awareness among the general Kurdish population of Iran about their diabetic status, together with a high specificity, a fairly good PPV, and a very high NPV, reflecting good accuracy of self-reported diabetes for detecting diabetes in this population. We also found good agreement between self-reported diabetes and the reference criteria. Thus, diabetes self-reporting could be used as a relatively valid tool to identify diabetes prevalence in future epidemiological studies on the Kurdish population of Iran.
Besides, we found that the sociodemographic and habitual characteristics of individuals largely affected this validity and should be considered to ensure more accurate estimation.

Consent for publication
Not applicable

Availability of data and materials
The datasets generated and/or analysed during the current study are not publicly available due to privacy and ethical restrictions but are available from the corresponding author on reasonable request.

Figure 1 Frequency overlap of self-reported diabetes and diabetes measured by reference criteria (high FBS + treatment)
Figure 2 Diagnostic characteristics of the reduced model (self-reported diabetes, sex, age) in comparison with the full model (self-reported diabetes, sex, age, marital status, economic status, BMI, smoking status, alcohol use, HTN, and family history of HTN)