2.1 Subjects
The study population was a stratified random sample of resident patients with type 2 diabetes mellitus (T2DM) aged 18 to 75 years, stratified by gender and urban/rural area and drawn from the Shanxi chronic disease health surveillance sites. This study was approved by the ethics committee of Shanxi Provincial Center for Disease Control and Prevention (Number: SXCDCIRBPJ2019036001). The inclusion criteria were: (1) a diagnosis of type 2 diabetes according to the 1999 WHO diagnostic criteria for T2DM(11); (2) signed informed consent. The exclusion criteria were: (1) mental disorders; (2) pregnancy; (3) serious acute or chronic diseases; (4) respiratory infectious diseases; (5) inability to complete the information collection or refusal to sign the informed consent form. A total of 1830 patients completed all surveys; after excluding 85 patients who had already received DR treatment, 1745 patients were included in the final analysis.
2.2 Data collection
This is a cross-sectional study, and the data were obtained from a four-part survey consisting of a questionnaire, a physical examination, a DR examination and laboratory tests. Detailed information about the variables is available in Supplementary Table 1.
The questionnaires were prepared by the National Task Force on Chronic Disease Surveillance and were administered face to face by uniformly trained and qualified surveyors in a dedicated interview area; respondents were not allowed to complete the questionnaire themselves. The survey was conducted from 21 June 2019 to 1 August 2019. The questionnaire covers basic information, lifestyle habits and the diabetes history of the respondents. The demographic information comprises age, sex, region, educational background and marital status (cohabiting or separated).
Lifestyle habits consist of smoking, daily diet, physical activity and sleep disorders. The daily diet included alcohol consumption and the consumption of 12 food groups: cereals (rice and rice products, and refined cereals such as steamed bread and noodles), mixed cereals (coarse grains such as corn, buckwheat and millet), potatoes, legumes and their products, fresh vegetables, fresh fruits, dairy and its products, animal meat, poultry, aquatic products, eggs and nuts. Using the food frequency method, patients were asked how often they had consumed each of the above food groups in the past year and how much they consumed on average each time, and the weekly food intake was calculated (weekly food intake equals the weekly intake frequency multiplied by the average intake of that food per time). The intake of each food group was graded as low, normal or high according to the dietary recommendation guidelines(12). Physical activity included work, agricultural and domestic physical activity, transport-related physical activity, and recreational and exercise activity. We also recorded the occurrence of sleep disorders. The details of the questionnaire items on lifestyle habits are shown in the Supplementary material.
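As an illustration of the food frequency calculation described above, the following minimal Python sketch computes a weekly intake and grades it. The function names, the example respondent and the cut-off values are hypothetical; the actual grading thresholds are those of the dietary recommendation guidelines(12) and are not reproduced here.

```python
# Sketch of the food-frequency calculation. All numbers below are
# placeholders, NOT the cut-offs from the dietary guidelines cited in the text.

def weekly_intake(times_per_week: float, amount_per_time_g: float) -> float:
    """Weekly intake (g) = weekly frequency x average amount per time (g)."""
    return times_per_week * amount_per_time_g


def grade_intake(weekly_g: float, low_cutoff_g: float, high_cutoff_g: float) -> str:
    """Grade a weekly intake against a recommended range (inclusive bounds)."""
    if weekly_g < low_cutoff_g:
        return "low"
    if weekly_g > high_cutoff_g:
        return "high"
    return "normal"


# Hypothetical respondent: fresh fruit 5 times per week, 150 g each time.
fruit_week = weekly_intake(5, 150)
# Hypothetical recommended weekly range of 1400 to 2450 g.
fruit_grade = grade_intake(fruit_week, 1400, 2450)
```

With these assumed inputs the respondent's weekly fruit intake falls below the assumed lower bound and is graded as low intake.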
The history of DM includes the type of DM, the duration of diabetes and the history of DR. Diabetes had to be diagnosed by a hospital at or above the county/district level and meet the 1999 WHO diagnostic criteria for T2DM(11). Diabetes is a chronic disease, and only type 1 diabetes has an acute onset; T2DM has an insidious onset and patients are usually unaware of the disease in its early stages, so the duration of the disease is usually counted from the date of diagnosis.
The physical examination was carried out in a separate, quiet and comfortable room by uniformly trained and qualified surveyors using specified measuring instruments. Measurements were taken in the early morning, with the subject fasting and with an empty bladder, having avoided strenuous exercise and contact with sources of irritation beforehand. Measurements included height, weight, blood pressure and pulse rate. Two investigators were assigned to each measurement. Body Mass Index (BMI) is defined as a person's weight in kilograms divided by the square of height in meters. The subject was asked to relax and sit still for 5 minutes before the blood pressure was measured, and blood pressure was then measured three times on the left arm, one minute apart. Blood pressure was measured in millimetres of mercury (mmHg); the instrument was accurate to 1 mmHg and the results were recorded to the nearest 1 mmHg. Pulse was measured in beats per minute (bpm); the instrument was accurate to 1 bpm and the results were recorded to the nearest 1 bpm.
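The BMI definition above can be written as a short Python sketch. The function name and the example weight and height are hypothetical and serve only to illustrate the formula.

```python
def bmi(weight_kg: float, height_m: float) -> float:
    """Body Mass Index: weight (kg) divided by the square of height (m)."""
    if height_m <= 0:
        raise ValueError("height must be positive")
    return weight_kg / height_m ** 2


# Hypothetical subject weighing 70 kg with a height of 1.75 m.
example_bmi = bmi(70.0, 1.75)  # roughly 22.9 kg/m^2
```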
The DR examination was carried out by uniformly trained and qualified professional photographers in a separate darkroom, using a non-mydriatic fundus camera (Canon CR-2) and accompanying computer equipment to take and record fundus photographs of the patient. Two photographs were taken of each eye: one centred on the optic papilla and one centred on the fovea. The four fundus photographs of each patient were collected centrally and sent to the testing centre for diagnosis by two independent ophthalmologists. Each eye was scored for DR according to the 2017 Guidelines for Imaging and Reading Diabetic Retinopathy Screening in China(13). The two readings were checked for consistency, and discordant cases were re-evaluated by the director of the testing centre, whose reading was taken as the final result. Each eye was classified as no DR or any DR based on the DR score above. If at least one eye was diagnosed with DR, the patient was classified as a DR patient.
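The reading and classification rules above can be sketched in Python as follows. This is a simplified illustration that assumes each reading has already been reduced to a binary result (no DR or any DR); the function names are hypothetical and the guideline scoring itself is not modelled.

```python
from typing import Optional


def final_eye_grade(reader_a: bool, reader_b: bool,
                    adjudicator: Optional[bool] = None) -> bool:
    """Final DR status of one eye (True = any DR, False = no DR).

    Concordant readings are accepted; discordant readings fall back to the
    adjudicator's reading (the director of the testing centre in the text).
    """
    if reader_a == reader_b:
        return reader_a
    if adjudicator is None:
        raise ValueError("readers disagree; an adjudicated reading is required")
    return adjudicator


def patient_has_dr(left_eye: bool, right_eye: bool) -> bool:
    """A patient is a DR patient if at least one eye has any DR."""
    return left_eye or right_eye


# Hypothetical patient: readers agree on the left eye (no DR) and disagree on
# the right eye, which the adjudicator reads as any DR.
left = final_eye_grade(False, False)
right = final_eye_grade(True, False, adjudicator=True)
is_dr_patient = patient_has_dr(left, right)
```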
Fasting plasma glucose (FPG), glycosylated haemoglobin (HbA1c), total cholesterol (TC), triglyceride (TG), high-density lipoprotein (HDL), low-density lipoprotein (LDL), alanine aminotransferase (ALT), aspartate aminotransferase (AST), γ-glutamyl transpeptidase (γ-GGT), total protein (Tpro), albumin (ALB), globulin (GLB), blood urea nitrogen (BUN), blood creatinine (Scr), blood uric acid (UA), urine albumin (UAlb) and urine creatinine (Ucr) were measured in the patients' fasting blood and urine samples. The albumin/globulin (A/G) ratio was calculated as ALB divided by GLB; the urine albumin-to-creatinine ratio (UACR) was calculated as UAlb divided by Ucr.
Laboratories at each survey site used uniform testing equipment and standards, and analytical tests were performed only after laboratory performance had been verified. The above indicators were graded and assigned values, as detailed in Table 1.
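The two derived indicators can be illustrated with a minimal Python sketch. The units are assumptions made for the example (ALB and GLB in g/L, UAlb in mg/L, Ucr in g/L, giving UACR in mg/g); the units actually used in the study are not stated in this section, and the sample values are hypothetical.

```python
def ag_ratio(alb_g_per_l: float, glb_g_per_l: float) -> float:
    """A/G ratio = albumin / globulin (both assumed to be in g/L)."""
    return alb_g_per_l / glb_g_per_l


def uacr(ualb_mg_per_l: float, ucr_g_per_l: float) -> float:
    """UACR = urine albumin / urine creatinine.

    With UAlb in mg/L and Ucr in g/L (assumed units) the result is in mg/g.
    """
    return ualb_mg_per_l / ucr_g_per_l


# Hypothetical laboratory values.
example_ag = ag_ratio(45.0, 30.0)   # 1.5
example_uacr = uacr(30.0, 1.5)      # 20.0 mg/g
```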
(Table 1)
2.3 Algorithms and models
2.3.1 Elastic Net
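Lasso regression does not take correlations between features into account and is poorly suited to multicollinear variables(3); Ridge regression does not shrink any coefficient exactly to zero and therefore cannot perform variable selection. In 2005, Zou and Hastie proposed the Elastic Net penalty model, which is a convex combination(14) of Lasso and Ridge regression, whose estimates can be expressed as:
$$\hat{\beta} = \arg\min_{\beta} \left\{ \lVert y - X\beta \rVert_2^2 + \lambda_2 \lVert \beta \rVert_2^2 + \lambda_1 \lVert \beta \rVert_1 \right\}$$
where $y$ is the response vector, $X$ is the matrix of predictors, $\beta$ is the vector of regression coefficients, and $\lambda_1, \lambda_2 \ge 0$ are tuning parameters. The penalty function could be listed as:
$$\lambda_2 \lVert \beta \rVert_2^2 + \lambda_1 \lVert \beta \rVert_1$$
Letting $\alpha = \lambda_2 / (\lambda_1 + \lambda_2)$, the penalty is expressed as: $(1-\alpha)\lVert \beta \rVert_1 + \alpha \lVert \beta \rVert_2^2$. When $\alpha = 0$, the elastic net is equivalent to Lasso, while when $\alpha = 1$, the elastic net is equivalent to Ridge regression. Elastic net regression thus combines the two: with the tuning parameters chosen by cross-validation, it yields a sparse model for feature selection while compensating for the effects of correlation between the observed variables.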
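The behaviour of the penalty at the two extremes can be checked with a minimal Python sketch. It follows the convention of the text, in which a mixing weight of 0 gives the Lasso penalty and a weight of 1 gives the Ridge penalty; the function name, the argument name `mix` and the coefficient values are hypothetical.

```python
def elastic_net_penalty(coefs, mix):
    """Convex combination of the Lasso (L1) and Ridge (squared L2) penalties.

    mix = 0 gives the pure L1 penalty and mix = 1 gives the pure squared L2
    penalty, matching the convention used in the text.
    """
    if not 0.0 <= mix <= 1.0:
        raise ValueError("mix must lie in [0, 1]")
    l1 = sum(abs(c) for c in coefs)
    l2_squared = sum(c * c for c in coefs)
    return (1.0 - mix) * l1 + mix * l2_squared


coefs = [0.5, -2.0, 0.0]                      # hypothetical coefficients
lasso_like = elastic_net_penalty(coefs, 0.0)  # 2.5, the L1 norm
ridge_like = elastic_net_penalty(coefs, 1.0)  # 4.25, the squared L2 norm
blended = elastic_net_penalty(coefs, 0.5)     # 3.375, halfway between them
```

Some software uses the opposite convention for the mixing weight (for example, the R package glmnet takes a value of 1 as Lasso and 0 as Ridge), so the direction of the parameter should be checked before results are compared.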
2.3.2 Bayesian Networks (BNs)
A BN is a probabilistic graphical model that represents a set of variables and their conditional dependencies via a directed acyclic graph (DAG)(4). BNs are well suited to reasoning about an event that has occurred and predicting the likelihood that any one of several possible known causes was a contributing factor. For example, a BN can represent the probabilistic relationship between a disease and a symptom. BNs use a graph structure and network parameters to uniquely determine the joint probability distribution of the random variables $X = \{x_1, \ldots, x_n\}$, which can be listed as:
$$P(x_1, \ldots, x_n) = \prod_{i=1}^{n} P\left(x_i \mid \pi(x_i)\right)$$
where $\pi(x_i)$ denotes the set of parent nodes of $x_i$ in the DAG.
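The disease and symptom example can be made concrete with a minimal Python sketch, in which the joint probability of a full assignment is the product of each node's conditional probability given its parents. The two-node network and all probabilities are hypothetical and are not estimates from this study.

```python
from itertools import product

# Hypothetical two-node network: disease -> symptom.
parents = {"disease": (), "symptom": ("disease",)}

# cpt[node][parent_values][value] = P(node = value | parents = parent_values)
cpt = {
    "disease": {(): {1: 0.1, 0: 0.9}},
    "symptom": {(1,): {1: 0.8, 0: 0.2},
                (0,): {1: 0.2, 0: 0.8}},
}


def joint_probability(assignment):
    """Multiply P(node | parents) over all nodes for one full assignment."""
    prob = 1.0
    for node, node_parents in parents.items():
        parent_values = tuple(assignment[p] for p in node_parents)
        prob *= cpt[node][parent_values][assignment[node]]
    return prob


# P(disease = 1, symptom = 1) = 0.1 * 0.8, i.e. about 0.08.
p_both = joint_probability({"disease": 1, "symptom": 1})

# The joint probabilities over all four assignments sum to 1 (up to rounding).
total = sum(joint_probability({"disease": d, "symptom": s})
            for d, s in product((0, 1), repeat=2))
```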
2.3.3 MMHC algorithms
Structure learning for BNs is mainly implemented by constraint-based (CB) algorithms and search-and-score (SS) algorithms(9). CB algorithms use conditional independence tests to learn conditional independence constraints from the data. These constraints are in turn used to learn the structure of the BN. However, the premise is that conditional independence implies graphical separation (so that two independent variables cannot be connected by an arc). SS-based algorithms are generic optimization algorithms that rank candidate network structures by their goodness-of-fit scores. The max-min hill-climbing (MMHC) algorithm is a hybrid algorithm that combines the CB and SS-based approaches(15): conditional independence tests are used to reduce the search space, and the score-based search then looks for the best-scoring network structure within that reduced space.
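To give a sense of the kind of test statistic on which constraint-based learning relies, the following Python sketch computes the Pearson chi-square statistic for the independence of two discrete variables. It is a deliberate simplification: it tests marginal rather than conditional independence, it does not compute a p-value, and it is not the implementation used by MMHC or bnlearn. The function name and the toy data are hypothetical.

```python
from collections import Counter


def chi_square_statistic(xs, ys):
    """Pearson chi-square statistic for independence of two discrete variables.

    Compares observed joint counts with the counts expected under
    independence (row total * column total / sample size).
    """
    n = len(xs)
    joint = Counter(zip(xs, ys))
    x_counts = Counter(xs)
    y_counts = Counter(ys)
    stat = 0.0
    for x, n_x in x_counts.items():
        for y, n_y in y_counts.items():
            expected = n_x * n_y / n
            observed = joint.get((x, y), 0)
            stat += (observed - expected) ** 2 / expected
    return stat


# Toy data: the first pair is perfectly balanced, the second pair is identical.
stat_independent = chi_square_statistic([0, 0, 1, 1], [0, 1, 0, 1])  # 0.0
stat_dependent = chi_square_statistic([0, 0, 1, 1], [0, 0, 1, 1])    # 4.0
```

A large statistic argues against independence and therefore for keeping a candidate arc between the two variables in the reduced search space; a statistic near zero supports removing it.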
2.4 Statistical method
Descriptive statistical analysis of DR-related factors was performed using SPSS 26, with categorical variables expressed as rates (%). The BN structure was learned using the mmhc() function in the R package “bnlearn” (R version 3.5.0). Parameter learning was performed by maximum likelihood estimation, implemented in Netica software (version 5.18).
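For discrete variables, maximum likelihood estimation of a node's conditional probability table reduces to conditional relative frequencies. The Python sketch below illustrates this principle on toy data; it is not the Netica procedure used in the study, and the variable names, the arc from parent to child and the records are all hypothetical.

```python
from collections import Counter, defaultdict


def mle_cpt(records, child, parent_names):
    """Maximum likelihood estimate of P(child | parents) from complete data.

    Each entry is count(parents = pv, child = v) / count(parents = pv).
    """
    joint_counts = Counter()
    parent_counts = Counter()
    for record in records:
        parent_values = tuple(record[p] for p in parent_names)
        joint_counts[(parent_values, record[child])] += 1
        parent_counts[parent_values] += 1
    cpt = defaultdict(dict)
    for (parent_values, value), count in joint_counts.items():
        cpt[parent_values][value] = count / parent_counts[parent_values]
    return dict(cpt)


# Toy records for a hypothetical arc parent_var -> child_var (not study data).
records = [
    {"parent_var": 1, "child_var": 1},
    {"parent_var": 1, "child_var": 1},
    {"parent_var": 1, "child_var": 0},
    {"parent_var": 1, "child_var": 0},
    {"parent_var": 0, "child_var": 0},
    {"parent_var": 0, "child_var": 0},
    {"parent_var": 0, "child_var": 0},
    {"parent_var": 0, "child_var": 1},
]
cpt = mle_cpt(records, "child_var", ["parent_var"])
# cpt[(1,)][1] is 0.5 and cpt[(0,)][1] is 0.25 for these toy records.
```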