Rationale and design of the Brazilian diabetes study: a prospective cohort of type 2 diabetes

Abstract
Background: Optimal control of traditional risk factors only partially attenuates the excess cardiovascular mortality of individuals with diabetes. Using machine learning (ML) techniques to identify novel features for risk prediction is a compelling approach to tackling residual cardiovascular risk. The objective of this study is to identify clinical phenotypes of type 2 diabetes (T2D) that are more prone to developing cardiovascular disease.
Methods: The Brazilian Diabetes Study (BDS) is a single-center, ongoing, prospective registry of individuals with T2D. Eligible patients are 30 years old or older, with a confirmed T2D diagnosis. After an initial visit to sign the informed consent form and record their medical history, all volunteers undergo biochemical analysis, echocardiography, carotid ultrasound, ophthalmologist visit, dual x-ray absorptiometry, coronary artery calcium score, polyneuropathy assessment, advanced glycation end-products reader, and ambulatory blood pressure monitoring. A 5-year follow-up will be conducted through yearly phone interviews to ascertain endpoints. The primary endpoint is the difference between ML-based clinical phenotypes in the incidence of a composite of death, myocardial infarction, revascularization, and stroke.
Results: Since June/2016, 1030 patients (mean age: 57 years, diabetes duration of 9.7 years, 58% male) have been enrolled in our study. The mean follow-up time was 3.7 years in October/2021.
Conclusion: The BDS will be the first large population-based cohort dedicated to the identification of clinical phenotypes of T2D at higher risk of cardiovascular events. Data derived from this study will provide valuable information on risk estimation and prevention of cardiovascular and other diabetes-related events.
ClinicalTrials.gov Identifier: NCT04949152


Introduction
The steady increase in the global prevalence of diabetes, currently estimated to be as high as 463 million individuals, has fueled the burden of cardiovascular disease (CVD) and other diabetes-related complications 1 . In fact, compared to their nondiabetic counterparts, individuals with diabetes have a 2-fold increased cardiovascular mortality and a 4-fold increased risk of peripheral vascular disease and limb amputation 2 . In addition, diabetes, alongside hypertension, remains a major cause of end-stage kidney disease and blindness, thus entailing a significant loss of quality-adjusted life years 3,4 . In the past decades, the prevention of diabetic complications has been grounded chiefly in stricter control of traditional risk factors, such as glycated hemoglobin, low-density lipoprotein cholesterol, and blood pressure 5,6 . Though reasonable, this strategy does not fully address the complex, multifactorial pathophysiology of diabetes 7 . As a matter of fact, cardiovascular mortality remains elevated even among individuals with optimal metabolic control 8 . Consequently, growing attention has been directed to the development of risk prediction models dedicated to the early detection of individuals at higher risk of complications, for whom earlier, tailored clinical interventions yield the greatest therapeutic benefit 5 .
In this context, artificial intelligence (AI) tools have been pursued as a compelling strategy to refine the accuracy of existing risk equations and to identify novel features for risk prediction [9][10][11] . Machine learning (ML) models have shown better accuracy than current algorithms, which are based on regression models from large clinical studies, for predicting outcomes such as diabetic retinopathy 12 , limb amputation 13 , hospitalization for heart failure 14 and hypoglycemia 15,16 . ML-based algorithms have also forecasted avoidable costs, which increases their relevance for cost-effectiveness analyses of novel antidiabetic drugs in the face of sharply rising diabetes-related global healthcare expenditures 17 . Moreover, data-based hierarchical clustering has identified subgroups of patients with newly diagnosed diabetes who are more prone to develop chronic kidney disease and retinopathy 18 . Despite all these advances, whether these models accurately identify those at higher risk of atherosclerotic cardiovascular disease remains unanswered.
To address this unmet challenge, we designed the Brazilian Diabetes Study (BDS) as a population-based, prospective, ongoing cohort of adults with type 2 diabetes mellitus (T2DM). Participants undergo in-depth clinical evaluation, biochemical analysis, and advanced cardiac imaging exams. Since June/2016, the BDS has enrolled 1030 participants, who are currently followed yearly to assess predefined outcomes. The main goal is to identify clinical phenotypes of T2DM that are more prone to developing CVD. Furthermore, the dataset derived from this cohort may be an insightful source of information on current treatment status and may provide ML-based models with data for generating novel risk estimation algorithms and cost-effectiveness analyses of clinical interventions.

Study design and participants
The Brazilian Diabetes Study is a prospective, ongoing, single-center cohort of individuals with T2DM (clinicaltrials.gov: NCT04949152). Clinical and laboratory analyses are performed by the Atherosclerosis and Vascular Biology Laboratory (Aterolab), situated at the Clinical Research Center at the University of Campinas (Unicamp), Brazil. Recruitment is supported by social media and newspaper campaigns. Eligible participants are 30 years old or older, of either sex, with a confirmed diagnosis of T2DM according to the latest ADA criteria 19 . The study was approved by the local ethics committee (CAAE: 89525518.8.1001.5404) and complies with the principles of the Declaration of Helsinki 20 .
Eligible patients are invited to the research center for an explanation of the study protocol. After signing the informed consent form, participants have their demographic, anthropometric, and medical history data registered and are examined by a licensed doctor. After this visit, the following exams are scheduled at baseline and 5 years after enrollment: blood and urine sample collection; ambulatory blood pressure monitoring; advanced-glycation end-products measurement; echocardiogram; carotid ultrasound; ophthalmologic evaluation; dual X-ray absorptiometry; bone densitometry; handgrip strength; usual gait speed test; and coronary artery calcium score. Between the first and the 5-year visit, patients are contacted yearly by phone to ascertain endpoints (Table 1).
Clinical research data management is based on the Research Electronic Data Capture (REDCap, Vanderbilt, USA) platform. Access to this system is restricted to investigators, and data exported for analysis are de-identified to protect individual confidentiality. All researchers are equally responsible for the collection and storage of information gathered throughout this research. Results from both imaging and biochemical analyses are stored in this system and backed up remotely to a dedicated storage rack at the Unicamp data center. A dedicated investigator is responsible for generating a summary of exam results and delivering it to participants.

First visit
On this occasion, participants are interviewed by a licensed physician, who takes a complete medical history. Information regarding the time since diagnosis of diabetes, medications in use, and comorbidities is registered. Race is self-reported. Participants are considered as having established cardiovascular disease if they present any of the following: coronary heart disease, cerebrovascular disease, or peripheral artery disease. Subjects then undergo a complete physical examination, including registration of waist circumference, weight, height, and body mass index. Socioeconomic status is registered as years of study and current family income. Physical activity, recorded as minutes per session and number of sessions per week, is also registered. In addition, at least 3 different phone numbers are recorded for follow-up.

Blood pressure measurement
Blood pressure measurements are performed using the HEM-7113 device (Omron Healthcare, São Paulo, Brazil) according to the latest guidelines 21 . After 3 min of rest, three consecutive measurements are obtained with the patient sitting and then standing, and the mean value of the last two measurements for each position is considered. Orthostatic hypotension is diagnosed when a difference of at least 20 mmHg in systolic blood pressure, or at least 10 mmHg in diastolic blood pressure, is found between seated and standing measurements.
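The averaging and threshold rule described above can be sketched as follows. This is a minimal illustration, not the study's own code: the function names and the example readings are hypothetical, and the "at least" comparison at the 20/10 mmHg cut-offs is an assumption based on the usual consensus definition of orthostatic hypotension.

```python
from statistics import mean

def position_mean(readings):
    """Mean (systolic, diastolic) of the last two of three consecutive readings."""
    last_two = readings[-2:]
    return (mean(s for s, _ in last_two), mean(d for _, d in last_two))

def has_orthostatic_hypotension(seated, standing, sys_drop=20, dia_drop=10):
    """True when the seated-to-standing fall reaches either threshold (mmHg).

    The inclusive comparison (>=) is an assumption; the protocol text only
    gives the threshold values.
    """
    seated_sys, seated_dia = position_mean(seated)
    standing_sys, standing_dia = position_mean(standing)
    return (seated_sys - standing_sys >= sys_drop
            or seated_dia - standing_dia >= dia_drop)

# Hypothetical readings as (systolic, diastolic) pairs in mmHg.
seated = [(142, 88), (138, 86), (136, 84)]      # last two average 137/85
standing = [(120, 80), (116, 78), (114, 76)]    # last two average 115/77
print(has_orthostatic_hypotension(seated, standing))
```

Discarding the first reading in each position limits the influence of an initial alerting reaction on the averaged value.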

Electrocardiogram
A digital 15-min electrocardiogram was performed using the WinCardio (Micromed Surface Digital Electrocardiogram, Brazil). This software allowed further heart rate variability analysis.

Diabetic distal polyneuropathy
Diabetic distal polyneuropathy (DDP) is evaluated according to the Michigan Neuropathy Screening Instrument (MNSI), as previously validated 22,23 . In short, patients answer a 15-question survey on DDP-related symptoms. They then undergo a lower extremity examination to assess neurological reflexes and tactile and vibratory sensitivity. Each abnormal finding scores 0.5 or 1 point. A score above 7 on the questionnaire, or 2.5 on the physical examination, has positive and negative predictive values of 84% and 73%, respectively 24 .
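As a rough illustration of how the two MNSI scores could be combined into a screening flag, consider the sketch below. The function names are hypothetical, and the handling of values exactly at the cut-offs is an assumption: the questionnaire is treated as positive strictly above 7, as stated in the text, while the examination is treated as positive from 2.5 upward.

```python
def exam_score(findings):
    """Sum the points of the abnormal examination findings (0.5 or 1 each)."""
    if any(points not in (0.5, 1) for points in findings):
        raise ValueError("each abnormal finding scores 0.5 or 1")
    return sum(findings)

def mnsi_screen_positive(questionnaire_score, findings,
                         questionnaire_cutoff=7, exam_cutoff=2.5):
    """Flag a positive screen if either component reaches its cut-off.

    Boundary handling (> for the questionnaire, >= for the examination)
    is an assumption of this sketch.
    """
    return (questionnaire_score > questionnaire_cutoff
            or exam_score(findings) >= exam_cutoff)

# Hypothetical participant: 4 positive questionnaire answers,
# three abnormal examination findings worth 1, 1 and 0.5 points.
print(mnsi_screen_positive(4, [1, 1, 0.5]))
```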

Ambulatory blood pressure monitoring
Ambulatory Blood Pressure Monitoring (ABPM) is performed using the automated Spacelabs 90207 monitor (Spacelabs Healthcare, Washington, USA), according to the latest ESC guidelines 25 . This exam allows the identification of white-coat hypertension, defined as elevated blood pressure values in the office but normal readings outside this setting. Furthermore, it permits the diagnosis of masked hypertension, defined as elevated values on daily routine measurements but normal blood pressure in the office. Blood pressure thresholds for hypertension are 140/90, 130/80, 135/85 and 120/70 mmHg for office, 24 h, daytime and nighttime measurements, respectively 25 . Other outcomes will include blood pressure variability and nocturnal dipping characterization 26 .
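The definitions above map onto a simple two-by-two classification of office versus ambulatory readings. The sketch below is illustrative only. It assumes that a reading is elevated when either the systolic or the diastolic value reaches its threshold, and that an elevation in any ambulatory period (24 h, day or night) counts as elevated out-of-office pressure. The labels "sustained hypertension" and "normotension" and all function names are additions of this sketch, not terms from the protocol.

```python
# Thresholds in mmHg, (systolic, diastolic), as listed in the text.
THRESHOLDS = {
    "office": (140, 90),
    "24h": (130, 80),
    "day": (135, 85),
    "night": (120, 70),
}

def is_elevated(setting, systolic, diastolic):
    """Elevated if either component reaches the threshold (assumed inclusive)."""
    sys_limit, dia_limit = THRESHOLDS[setting]
    return systolic >= sys_limit or diastolic >= dia_limit

def classify_bp_phenotype(office, ambulatory):
    """Combine office and ambulatory status into one of four phenotypes."""
    office_high = is_elevated("office", *office)
    ambulatory_high = any(
        is_elevated(period, *reading) for period, reading in ambulatory.items()
    )
    if office_high and ambulatory_high:
        return "sustained hypertension"
    if office_high:
        return "white-coat hypertension"
    if ambulatory_high:
        return "masked hypertension"
    return "normotension"

# Hypothetical participant: high in the office, normal on ABPM.
abpm = {"24h": (122, 74), "day": (126, 78), "night": (110, 64)}
print(classify_bp_phenotype((150, 95), abpm))
```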

Blood and urine samples
After a 12 h fast, peripheral blood samples were obtained according to proper guidelines 27 , centrifuged at 3500 rpm and analyzed for the following measurements: complete blood count, fasting glucose, glycated hemoglobin, lipid profile, triglycerides, thyroid-stimulating hormone, urea, creatinine, sodium, potassium, calcium, phosphorus, C-reactive protein, aspartate aminotransferase, alanine aminotransferase, ultrasensitive troponin T and brain-type natriuretic peptide. Glomerular filtration rate (GFR) was estimated using the CKD-EPI equation. Urinalysis was also performed on the same day, including the urinary albumin-to-creatinine and protein-to-creatinine ratios.
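For readers unfamiliar with the CKD-EPI approach, the sketch below shows one published form of the creatinine-based equation. The text does not state which version of CKD-EPI the study uses, so the choice of the 2021 race-free creatinine equation here is an assumption made purely for illustration; the 2009 version uses different coefficients. The function name and example values are hypothetical.

```python
def egfr_ckd_epi_2021(serum_creatinine_mg_dl, age_years, female):
    """Estimated GFR (ml/min/1.73 m^2), CKD-EPI 2021 creatinine equation.

    Assumption: the 2021 equation is used here only as an example;
    the study protocol does not specify the CKD-EPI version.
    """
    kappa = 0.7 if female else 0.9
    alpha = -0.241 if female else -0.302
    ratio = serum_creatinine_mg_dl / kappa
    egfr = (142
            * min(ratio, 1.0) ** alpha
            * max(ratio, 1.0) ** -1.200
            * 0.9938 ** age_years)
    if female:
        egfr *= 1.012
    return egfr

# Hypothetical participant: 57-year-old man with creatinine 1.1 mg/dl.
print(round(egfr_ckd_epi_2021(1.1, 57, female=False), 1))
```

The two-slope form (separate exponents below and above the sex-specific creatinine knot kappa) is what distinguishes CKD-EPI from single-exponent equations.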

AGE Reader
The autofluorescence reader (AGE Reader; DiagnOptics, Groningen, the Netherlands, serial number: 09-10138) illuminates a skin surface of 4 cm², guarded against surrounding light, with an excitation light source with peak intensity at 370 nm. Emission light and reflected excitation light from the skin area are measured with a spectrometer in the 300-600 nm range, using glass fiber. Autofluorescence is computed by dividing the average light intensity of the emission spectrum (420-600 nm) by the average light intensity of the reflected excitation spectrum (300-420 nm) and is expressed in arbitrary units (AU) 28 . The measurements are performed in triplicate, and the average value is considered the definitive value of the AGE-Skin autofluorescence. AGE-Skin autofluorescence of all patients is assessed at the volar side of the arm, 10 cm below the elbow fold, in areas without tattoos, scars, cream or sunscreen 28 .
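The ratio described above can be written out in a few lines. This is a schematic sketch of the arithmetic only, not the device's firmware: the spectrum representation (a wavelength-to-intensity mapping), the function names and the sample values are hypothetical, and assigning the shared 420 nm boundary to the emission band is an assumption.

```python
from statistics import mean

def autofluorescence(spectrum):
    """Mean emission intensity (420-600 nm) over mean reflected excitation (300-420 nm).

    `spectrum` maps wavelength in nm to measured intensity. The 420 nm
    boundary is counted in the emission band (an assumption of this sketch).
    """
    excitation = [i for wl, i in spectrum.items() if 300 <= wl < 420]
    emission = [i for wl, i in spectrum.items() if 420 <= wl <= 600]
    return mean(emission) / mean(excitation)

def skin_autofluorescence(spectra):
    """Average of the triplicate measurements, in arbitrary units (AU)."""
    if len(spectra) != 3:
        raise ValueError("measurements are performed in triplicate")
    return mean(autofluorescence(s) for s in spectra)

# Hypothetical, coarsely sampled spectrum.
spectrum = {300: 10, 360: 30, 400: 20, 420: 0.4, 500: 0.6, 600: 0.2}
print(skin_autofluorescence([spectrum, spectrum, spectrum]))
```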

Echocardiography assessment
Transthoracic echocardiography is performed by fully licensed cardiologists with specialization in cardiovascular imaging, following the technical recommendations and measurement techniques of the latest American Society of Echocardiography (ASE) guidelines 29 . Images were acquired with a 1.5-4.5 MHz phased array transducer (Epiq CVX, Philips, Eindhoven, The Netherlands) and processed with the Echo PAC software version 8.0 (GE Healthcare). The following variables were assessed and interpreted according to their respective guidelines: cardiac chamber diameters, chamber volumes, left ventricle (LV) mass, LV and right ventricular (RV) systolic function, and global longitudinal, circumferential, and radial strain assessed by speckle tracking. The LV diastolic function analysis considered tissue Doppler myocardial velocities, mitral inflow wave velocities, indexed left atrial volume, and tricuspid regurgitation peak velocities, as recommended in ASE guidelines 30,31 .

Carotid doppler ultrasound
Trained cardiologists performed carotid Doppler ultrasound with a 5-13 MHz linear array transducer (Epiq CVX, Philips, Eindhoven, The Netherlands). Briefly, the longitudinal image of the bilateral common, internal, and external carotid artery, and the vertebral artery, was scanned for atherosclerotic plaque detection, following ASE guidelines 32 . The carotid intima-media thickness (CIMT) was measured from the common carotid artery 20 mm from the carotid bulb and at least 10 mm from the bifurcation using a semi-automated method.
Carotid atherosclerosis was considered present if participants presented any of the following: (i) atherosclerotic plaque, defined as a localized projection of more than 1.5 mm into the lumen or thickening of 50% of the artery compared with an adjacent wall; (ii) IMT 1 mm; or (iii) mean IMT above the 75th percentile, as previously determined for our population in the ELSA-Brazil study 33 .

Usual gait speed test
A usual gait speed test is performed on a 10-meter surface, free from irregularities or obstacles. Four ground markers are placed along the route, at 0, 2, 8, and 10 meters, delimiting the starting point (0 meters), the acceleration section (0-2 meters), the measurement section (2-8 meters), the deceleration section (8-10 meters), and the arrival point (10 meters). During the test, the examiner starts the chronometer when the subject's first foot touches the 2-meter mark and stops it when the last foot crosses the 8-meter mark. The assessment is performed three times, with 1-minute intervals in between. The final measure corresponds to the average speed of the three assessments.
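The calculation implied by this protocol is the 6-meter measurement section divided by each timed walk, averaged over the three trials. A minimal sketch follows, with hypothetical function names and timings.

```python
from statistics import mean

MEASUREMENT_START_M = 2
MEASUREMENT_END_M = 8
MEASUREMENT_DISTANCE_M = MEASUREMENT_END_M - MEASUREMENT_START_M  # 6 m timed section

def usual_gait_speed(times_s):
    """Average speed (m/s) over three timed walks of the measurement section."""
    if len(times_s) != 3:
        raise ValueError("the assessment is performed three times")
    if any(t <= 0 for t in times_s):
        raise ValueError("walk times must be positive")
    return mean(MEASUREMENT_DISTANCE_M / t for t in times_s)

# Hypothetical chronometer readings, in seconds.
print(round(usual_gait_speed([6.0, 5.0, 4.0]), 2))
```

Because the text specifies the average speed of the three assessments, the sketch averages the three speeds rather than dividing the distance by the average time; the two differ slightly when the times vary.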

Ophthalmologic evaluation
The ophthalmological evaluation measures the best-corrected visual acuity (BCVA), following the Snellen

Objectives
The general objective is to identify the clinical phenotypes of T2DM that are more prone to developing CVD. The specific aims of the study include: (i) investigating the association of clinical features or biomarkers with clinical or subclinical CVD; (ii) generating algorithms based on artificial intelligence to estimate the risk of diabetes-related events; (iii) building a database for Markov-based modeling to estimate the cost-effectiveness of clinical interventions; and (iv) developing risk prediction algorithms for recurrent CV events in individuals with established CVD.

Cluster analysis
Cohort individuals are clustered based on the similarity of their attributes. To this end, a k-means clustering algorithm assigns each individual in the dataset to a cluster, with a randomization process (Random Forest) used to position the initial cluster centers, or centroids, as close to optimal as possible. Based on previous studies, the attributes are the following: age, diabetes duration, BMI, and the homeostasis model assessment 2 (HOMA2) estimates of β-cell function (HOMA2-B) and insulin resistance (HOMA2-IR) 18 . K-means clustering is performed using the kmeansruns function (fpc package) in R version 3.3.1, and cluster stability is assessed by resampling the dataset 5000 times and computing Yule's Q coefficient.
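To make the clustering step concrete, the sketch below implements plain k-means with several random starts on standardized attributes, keeping the solution with the lowest within-cluster sum of squares. It is a simplified stand-in written in Python, not the R kmeansruns routine used in the study. It uses ordinary random selection of initial centroids rather than the randomization process described above, and it omits the resampling-based stability assessment. All function names and the six example participants are hypothetical.

```python
import random
from statistics import mean, pstdev

def standardize(rows):
    """Scale each attribute to zero mean and unit standard deviation."""
    columns = list(zip(*rows))
    means = [mean(c) for c in columns]
    sds = [pstdev(c) or 1.0 for c in columns]  # guard against constant columns
    return [[(v - m) / s for v, m, s in zip(row, means, sds)] for row in rows]

def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(rows, k, rng, max_iter=100):
    """One k-means run from randomly chosen initial centroids."""
    centroids = rng.sample(rows, k)
    labels = None
    for _ in range(max_iter):
        new_labels = [
            min(range(k), key=lambda j: squared_distance(row, centroids[j]))
            for row in rows
        ]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        for j in range(k):
            members = [row for row, label in zip(rows, labels) if label == j]
            if members:  # keep the previous centroid if a cluster empties
                centroids[j] = [mean(col) for col in zip(*members)]
    return labels, centroids

def kmeans_best_of(rows, k, runs=20, seed=0):
    """Repeat k-means and keep the run with the lowest within-cluster sum of squares."""
    rng = random.Random(seed)
    best = None
    for _ in range(runs):
        labels, centroids = kmeans(rows, k, rng)
        inertia = sum(squared_distance(r, centroids[l]) for r, l in zip(rows, labels))
        if best is None or inertia < best[0]:
            best = (inertia, labels, centroids)
    return best[1], best[2]

# Hypothetical participants: [age, diabetes duration, BMI, HOMA2-B, HOMA2-IR].
participants = [
    [45, 3, 36.0, 110, 4.5], [48, 4, 35.0, 105, 4.2], [50, 5, 37.5, 115, 4.8],
    [68, 15, 24.0, 40, 1.2], [70, 18, 23.5, 35, 1.0], [72, 20, 25.0, 45, 1.4],
]
labels, _ = kmeans_best_of(standardize(participants), k=2)
print(labels)
```

Standardizing first keeps attributes measured on large scales, such as HOMA2-B, from dominating the Euclidean distances.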

Machine learning models
Machine learning techniques are employed to evaluate the main risk factors for cardiovascular outcomes. The following are considered as independent (predictor) variables: age, sex, years of study, income, diabetes duration, hypertension, LDL-C, CVD, BMI, smoking status, glycated hemoglobin (A1c), fasting blood glucose (FBG), glomerular filtration rate (GFR), skin autofluorescence measured by the AGE Reader, carotid intima-media thickness on ultrasound, and lean-to-total mass. The dependent (outcome) variable is any major acute cardiovascular event (myocardial infarction, stroke, and revascularization). Other dependent variables include microvascular complications, such as diabetic nephropathy (proteinuria or GFR < 60 ml/min/1.73 m²), diabetic retinopathy, and polyneuropathy, and major adverse renal outcomes (doubling of creatinine, GFR decline greater than 50% from baseline, treatment for end-stage kidney disease by dialysis or kidney transplantation, or new-onset proteinuria).
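Before any model is fitted, the renal outcomes listed above have to be encoded as binary labels. The sketch below shows one way this could be done. The function and parameter names are hypothetical, and treating a creatinine value of exactly twice baseline as "doubling" is an assumption.

```python
def diabetic_nephropathy(proteinuria, gfr):
    """Proteinuria or GFR below 60 ml/min/1.73 m^2."""
    return bool(proteinuria or gfr < 60)

def major_adverse_renal_event(baseline_creatinine, current_creatinine,
                              baseline_gfr, current_gfr,
                              dialysis=False, kidney_transplant=False,
                              new_onset_proteinuria=False):
    """True if any component of the composite renal outcome is met."""
    doubled_creatinine = current_creatinine >= 2 * baseline_creatinine  # inclusive: assumption
    gfr_decline = (baseline_gfr - current_gfr) / baseline_gfr > 0.50
    return any([doubled_creatinine, gfr_decline, dialysis,
                kidney_transplant, new_onset_proteinuria])

# Hypothetical participant: creatinine 1.0 -> 2.1 mg/dl, GFR 90 -> 40.
print(diabetic_nephropathy(False, 40))
print(major_adverse_renal_event(1.0, 2.1, 90, 40))
```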

Discussion
The Brazilian Diabetes Study is the largest population-based diabetes cohort ever performed in a developing country. Between June/2016 and July/2021, 1030 participants were enrolled in the cohort. Baseline characteristics are summarized in Table 2. Other strengths are as follows: (i) the Brazilian Heart Study investigators, a group with a strong background in clinical research, collected the data; (ii) examinations include highly specialized tests performed with latest-generation equipment by fully licensed doctors and trained cardiologists; (iii) in the future, data from this cohort may become a resource for hypothesis-generating and mechanistic studies on diabetes complications. This study also has some limitations. Most importantly, attending all steps of the study protocol is time-consuming and may require an absence from work. This, conceivably, favors the recruitment of highly motivated, and presumably more educated, individuals whose treatment and risk factor control tend to be better. Furthermore, as the reassessment of micro- and macrovascular complications is expected to occur 5 years after enrollment, a significant loss to follow-up may occur. Moreover, we expect that most of the participants will be inhabitants of the metropolitan region where the research center is located, which has a higher human development index and easier access to healthcare services than other Brazilian regions. This should be borne in mind when extrapolating our results to other populations.
In line with the above, other limitations include the following: (i) whether participants regularly attended healthcare facilities was not recorded in the interview at the first visit; (ii) as individuals in both primary and secondary CV prevention were enrolled, some of the analyses dedicated to the assessment of risk factors for incident CVD may include a smaller number of participants with no prior CV event, with a potential influence on statistical power. Finally, since the COVID-19 outbreak in Brazil, in January 2020, the recruitment rate has slowed, and it is currently recovering gradually to prior levels. As a result, the initially established sample size target (3 thousand participants before Jan/2023) will be reassessed considering the current circumstances.

Conclusion
The BDS will be the first large population-based cohort dedicated to the identification of clinical phenotypes of T2D at higher risk of cardiovascular events. Data derived from this study will provide valuable information on risk estimation and prevention of cardiovascular and other diabetes-related events.

Declaration of financial/other relationships
The authors have no relevant affiliations or financial involvement with any organization or entity with a financial interest in or financial conflict with the subject matter or materials discussed in the manuscript. This includes employment, consultancies, honoraria, stock ownership or options, expert testimony, grants or patents received or pending, or royalties. Peer reviewers on this manuscript have no relevant financial or other relationships to disclose.

Author contributions
Professor Carvalho had full access to all the data in the study and takes responsibility for the integrity of the data and the accuracy of the data analysis.

Data availability statement
The Brazilian Diabetes Study investigators will hold intellectual property over data. Its availability may be considered upon reasonable request.