Cohort Profile: The China Northwest Cohort study

The China Northwest Cohort study (CNC), a community population-based prospective observational study, aimed to investigate the specific aetiological causes of non-communicable diseases (NCDs) and the long-term health hazards of NCDs among the different ethnic groups in the northwest region of China. A total of 118,572 participants (aged 35–74 years and including the Han, Uygur, Kazakh, Hui and Tibetan ethnic groups from five provinces of northwest China) were recruited between June 2018 and May 2019. Approximately 10% of the participants will be actively followed every 3 years via face-to-face interviews, using concise questionnaires to review risk exposures and disease incidence; biological specimens, including blood, saliva and stool samples, will also be collected. Passive follow-ups will be conducted by periodic linking (every 6 months) of the baseline survey data to established electronic disease registries. The questionnaire survey, a regular medical examination and the storage of blood samples were conducted in the CNC baseline survey for all of the participants. Several other items from the medical examination were recorded for approximately 40% of the participants. The proportions of Han, Uygur, Kazakh, Hui and Tibetan participants were 75.3%, 13.0%, 1.7%, 8.2% and 1.3%, respectively. Many lifestyle and medical history factors differed across the ethnic groups. The genetic information from the multi-ethnic individuals, combined with abundant personal and environmental information, provides an important opportunity to reveal complex and specific mechanisms of the genetic and environmental factors associated with NCDs.

In addition, to obtain accurate data, we excluded participants who had severe mental diseases or disabilities that hindered their ability to communicate. All participants were asked to bring their national identity cards so that individuals who were not local permanent residents could be excluded, because these individuals could not be followed up. All participants provided written informed consent, which allowed us to access their medical records and to store their blood samples over the long term solely for medical research purposes.
How often will participants be followed?
The National Death Surveillance System (NDSS), Chronic Disease Registries (CDR), the National Central Cancer Registry (NCCR) and National Health Insurance Claim Datasets (NHICD) enabled us to conduct passive follow-ups by periodically linking (every 6 months) the baseline survey data to these four sources. The NDSS and CDR are both based on China's Disease Surveillance Points system. The causes of death of the study participants were regularly monitored through official death certificates reported to the regional Center for Disease Control and Prevention (CDC), where the NDSS was based; the NDSS was available in all 13 study regions. Information on chronic diseases (diabetes, hypertension, heart failure, stroke and cancer, among other diseases) is currently being collected through linkage with the NCCR, CDR and NHICD. The new national health insurance claim datasets are now fully established in all 13 study regions, and future chronic disease incidence and hospitalization information will be extracted primarily from the NHICD. Because the quality of the data from the NCCR and CDR is insufficient, these two registries served as complementary rather than primary data sources for the passive follow-up. The unique national identity card and health insurance card numbers were used as key variables to link the baseline data to the NDSS, CDR and NHICD. The diagnoses of these diseases are based on well-accepted international standards.
The causes of death are coded according to the 10th revision of the International Statistical Classification of Diseases (ICD-10) (21).
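The key-based linkage described above can be illustrated with a minimal Python sketch. It is not the study's actual linkage software: the record layouts, the function name `link_events`, the identifiers and the example ICD-10 codes are all hypothetical, and the only assumption taken from the text is that national identity card and health insurance card numbers serve as the linkage keys.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class BaselineRecord:
    participant_id: str          # hypothetical study ID
    national_id: str             # national identity card number
    insurance_id: Optional[str]  # health insurance card number, if known


@dataclass(frozen=True)
class RegistryEvent:
    source: str                  # e.g. "NDSS", "CDR" or "NHICD"
    national_id: Optional[str]
    insurance_id: Optional[str]
    icd10: str                   # ICD-10 code reported by the registry
    event_date: str              # ISO date string


def link_events(
    baseline: List[BaselineRecord], events: List[RegistryEvent]
) -> Tuple[Dict[str, List[RegistryEvent]], List[RegistryEvent]]:
    """Attach registry events to participants by exact key match.

    The national ID is tried first; the insurance ID is a fallback.
    Events that match neither key are returned separately for review.
    """
    by_national = {b.national_id: b.participant_id for b in baseline}
    by_insurance = {
        b.insurance_id: b.participant_id for b in baseline if b.insurance_id
    }
    linked: Dict[str, List[RegistryEvent]] = {}
    unmatched: List[RegistryEvent] = []
    for ev in events:
        pid = by_national.get(ev.national_id) if ev.national_id else None
        if pid is None and ev.insurance_id:
            pid = by_insurance.get(ev.insurance_id)
        if pid is None:
            unmatched.append(ev)
        else:
            linked.setdefault(pid, []).append(ev)
    return linked, unmatched
```

Keeping unmatched events in a separate list, rather than discarding them, makes it possible to audit each 6-monthly linkage round for missing or mistyped identifiers.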
Approximately 10% of the participants will be actively followed every 3 years via face-to-face interviews, in order to estimate changes in lifestyle and exposure factors, to identify participants who have permanently moved out of the study regions and to minimize the under-reporting of vital status and death data. In addition, biological specimens, including blood, saliva and stool samples, are planned to be collected. We aim to keep the loss to follow-up below 10%.
What has been measured?
In general, all the participants were asked to fast overnight for at least 8 hours before arriving at the local clinical centre, after which questionnaire surveys, physical examinations, laboratory tests and biological sample collections were conducted. A tablet computer with a self-developed application (the CNC App) was used to collect the questionnaire information, the main content of which was based on the China Kadoorie Biobank study (22); the information was collected via face-to-face interviews performed by well-trained technicians. In addition, blood sampling and anthropometric measurements were performed by professional physicians or nurses. The entire visit, including the electronic questionnaire interview, medical examinations and sample collections, typically took 60-75 min for each participant to complete.

Questionnaire survey
The face-to-face questionnaire interview was conducted to collect baseline information about the participants, including demographic and socio-economic characteristics (sex, date of birth, ethnic group, education, marital status, occupation and income, among other factors), lifestyle factors (tea and coffee drinking habits, alcohol consumption, smoking history and dietary status, among other factors), environmental exposures (passive smoking and indoor air pollution, among other factors), medical history, mental health status and reproductive history (Table 1). To ensure the quality of the questionnaire data, the interviewers were trained in a unified procedure and a standard way of asking the questions.

Medical examinations
The medical examinations were primarily conducted by using the resources and personnel at the local clinical centres. Standardized training for the doctors, nurses and technicians was implemented before the investigation. The items of the medical examinations for all of the participants included height, weight, waist circumference, heart rate, body fat percentage (BFP), visceral fat index (VFI), basal metabolic rate (BMR), bone mass, muscle mass, total body water (TBW) and resting blood pressure.
The project group (the provincial CDC or a local medical college) had primary responsibility for quality control, as guided by the alternative methods of the study. The medical examination values were assessed to be consistent across the devices used in the 5 provinces. In addition, other items (lung function, routine blood and urine examinations, liver and kidney function tests, glycosylated haemoglobin measurements and B-mode ultrasound examinations, among other tests) were measured in a subset of the participants (approximately 40%).

Blood samples
After at least 8 hours of fasting, 10 mL venous blood samples were collected from all of the participants at the time of the baseline survey. The venous blood was transferred into one 5.0 mL dipotassium ethylenediaminetetraacetic acid (EDTA-K2) anticoagulation tube and one 5.0 mL vacuum tube without anticoagulant. Plasma, serum and buffy coat samples were separated from the whole blood via centrifugation for 10 min at a relative centrifugal force of 3000 × g at 4 °C. Due to limited funds, only a portion of the serum samples (approximately 40%) were sent, within 2 hours, for the measurement of various biochemical indexes, including fasting blood glucose, blood lipids, kidney function and high-sensitivity C-reactive protein (hsCRP), among other indexes.
During the baseline survey period, five provincial biobanks were established, which were responsible for the storage and management of the biological samples of the CNC. All of the blood samples were stored in a -20 °C freezer before being transferred via cold chain to -80 °C cryogenic refrigerators located in each provincial biobank in the local CDC, medical college or affiliated hospital of the university. To ensure the safety of the biological samples, a portion of the blood samples stored in the four other provincial biobanks (including 5 mL of plasma and 5 mL of whole blood from the participants) had to be transferred to the Xi'an biobank in Shaanxi province as back-up storage. Due to the long distance between Shaanxi province and the other 4 provincial sites (especially Xinjiang), the quality control of each shipment was a key issue that had to be specifically considered. To prevent any blood sample from thawing during the shipment from each site to the Xi'an biobank, an electronic thermometer was placed in each dry ice sample box to monitor the temperature dynamically; additionally, the temperature of the samples was maintained between -60 °C and -80 °C in the cold chain trucks.
A computer-based scanning and packing system for the biological samples was developed, in order to assist in blood sample entry into the cryogenic refrigerators in the biobank. Moreover, quality control requirements were strictly followed throughout the process of blood sample collection, shipment and storage, such as checking the identity of the sample and the participant ID, regulating the storage time and standardizing the volume of the samples and the temperature conditions of the samples on site, in order to ensure high-quality blood samples. In total, approximately 0.88 million blood samples (including plasma, serum, buffy coat and whole blood samples) have been stored in the CNC biobank.
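As an illustration of the shipment temperature monitoring described above, the following Python sketch flags logged readings that rise above the warm end of the stated -60 °C to -80 °C range. The log format (timestamp and temperature pairs), the function name `thaw_risk_readings` and the example values are assumptions made for illustration; the study does not describe the thermometer's data format or any alerting rule.

```python
from typing import List, Tuple

# Warm end of the transport range stated in the text (-60 to -80 °C).
MAX_TRANSPORT_TEMP_C = -60.0


def thaw_risk_readings(
    readings: List[Tuple[str, float]],
    max_temp_c: float = MAX_TRANSPORT_TEMP_C,
) -> List[Tuple[str, float]]:
    """Return the (timestamp, temperature) readings warmer than the limit."""
    return [(stamp, temp) for stamp, temp in readings if temp > max_temp_c]


if __name__ == "__main__":
    # Hypothetical log from one dry ice box during a shipment.
    log = [("08:00", -78.5), ("09:00", -61.0), ("10:00", -58.2)]
    for stamp, temp in thaw_risk_readings(log):
        print(f"Reading at {stamp} exceeded the limit: {temp} °C")
```

Only readings warmer than the upper limit are flagged, on the assumption that thawing, not over-cooling, is the risk the monitoring is meant to catch.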

Discussion
What has it found?
Table 2 shows the socio-economic characteristics of the participants (in total and across the five provinces). In general, almost 120,000 participants were involved in this cohort in Northwest China. Among the total participants, the proportions of Han, Uygur, Kazakh, Hui and Tibetan ethnicities were 75.3%, 13.0%, 1.7%, 8.2% and 1.3%, respectively. There were more female participants (59.8%) than male participants, the mean age of the participants was 52.41 years and the majority of the participants were of Han ethnicity, had middle school educational levels and were employed. Nearly 75% of the participants had a household income of less than 50,000 yuan (RMB) per year. In addition, the participants in the provinces of Gansu, Shaanxi and Qinghai were predominantly of Han ethnicity (over 90%), whereas participants from Muslim ethnic groups accounted for 67.2% (Uygur, Kazakh and Hui) in Xinjiang and 37.3% (Hui) in Ningxia; a small proportion of Tibetan participants was observed in the Gansu (7.2%) and Qinghai (1.3%) provinces. The Shaanxi province was the main contributor to the CNC cohort because it has the largest population. The participants in the Shaanxi province had higher education levels and higher household incomes than those in the other four provinces. The proportions of participants with a college education or above and of participants with a household income over 100,000 yuan (RMB) per year were 35.1% and 15.9%, respectively, in the Shaanxi province. In the other four provinces, these proportions ranged from 0.6% to 7.9% for a college education or above and from 2.0% to 5.6% for a household income over 100,000 yuan (RMB). Table 3 reports the lifestyle and medical history factors across the different ethnic groups. Specifically, the participants from the Hui and Uygur ethnic groups preferred to drink tea, with the proportions of individuals who usually drank tea being 47.9% and 38.8%, respectively.
Kazakh (30.0%) and Tibetan participants (19.9%) had a higher proportion of current smokers. There was also a higher proportion of participants who usually drank alcohol in the Han (9.2%) and Tibetan (8.2%) ethnic groups. The participants from the Uygur, Kazakh, Hui and Tibetan ethnic groups consumed more animal food (but less salt) than the Han ethnic group; however, the frequency of plant food intake was lowest among Tibetan participants, and egg and dairy foods were most frequently eaten by participants in the Kazakh ethnic group. The proportion of participants who exercised daily was higher in the Hui (21.8%) and Han (20.7%) ethnic groups. In addition, the overall prevalences of self-reported diabetes and CVD were low in the baseline survey, and these prevalences were extremely low among several ethnic groups other than the Han. This may be related to poorer access to health services and chronic disease management; thus, improving healthcare utilization should be an urgent consideration in Northwestern China. For life satisfaction, the results showed that more participants of other ethnicities than of the Han ethnicity were very satisfied with their lives. The mean numbers of pregnancies and deliveries were 2.91 and 2.42, respectively, among the total participants. Compared with the Han ethnic group, the women in the other four ethnic groups had more pregnancies and deliveries. In addition, the prevalences of overweight, obesity and hypertension were 37.2%, 16.1% and 30.1%, respectively, among the total participants. The participants from the Uygur, Kazakh, Hui and Tibetan ethnic groups had higher prevalences of overweight (24 ≤ BMI < 28) and obesity (BMI ≥ 28) than those in the Han ethnic group.
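The BMI cut-offs quoted above (overweight: 24 ≤ BMI < 28; obese: BMI ≥ 28) can be applied with a short Python sketch. The function names and the label for values below 24 are illustrative assumptions; the text defines only the overweight and obese categories, and BMI is taken in its usual form of weight in kilograms divided by the square of height in metres.

```python
def bmi(weight_kg: float, height_cm: float) -> float:
    """Body mass index in kg/m^2 from weight (kg) and height (cm)."""
    height_m = height_cm / 100.0
    return weight_kg / (height_m ** 2)


def bmi_category(bmi_value: float) -> str:
    """Classify BMI using the cut-offs quoted in the text.

    Values below 24 are labelled generically because the text does not
    define any categories below the overweight threshold.
    """
    if bmi_value >= 28:
        return "obese"
    if bmi_value >= 24:
        return "overweight"
    return "below overweight threshold"


if __name__ == "__main__":
    # Hypothetical participant: 70 kg, 170 cm.
    value = bmi(70, 170)
    print(round(value, 1), bmi_category(value))
```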
Similar to the other factors of lifestyle and medical history, the prevalence of hypertension (as defined by the Chinese guidelines for prevention and treatment of hypertension [23]) was also different across the five ethnic groups. In general, the prevalence of hypertension was higher in the Hui, Tibetan and Han ethnic groups (> 30%), and there was a higher proportion of stage 2 and stage 3 hypertension in these ethnic groups. The prevalences of stage 2 and stage 3 hypertension were 13.4%, 11.4% and 11.1% in the Tibetan, Uygur and Hui ethnic groups, respectively.
What are the main strengths and weaknesses?
This study has several limitations and strengths. First, the information on the personal history of diseases, physical conditions and some lifestyle factors (such as smoking, drinking and dietary intake, among other factors) was self-reported, and recall bias could not be avoided. Second, we only included participants aged 35 to 74 years, which may have missed information on early life exposures. Third, only 40% of the participants had their biochemical indexes measured, due to limited funding. To the best of our knowledge, this is a unique, large-scale cohort study conducted in Northwest China, and it has several strengths. First, because of the specific features of lifestyle, dietary habits, climate, geography and rapid social and economic transitions in Northwest China, this cohort study will provide important data for estimating the prevalence and risk factors of NCDs in Northwest China, which will give us an opportunity to investigate the aetiology of NCDs broadly. Second, this cohort comprised multi-ethnic individuals, urban and rural inhabitants, plateau and basin residents, populations living in remote mountain areas and individuals living in densely populated and air-polluted areas. These features allow us to evaluate the effects of various environmental exposures on different health-related outcomes. Third, a large number of blood samples (approximately 0.88 million), including whole blood, plasma, serum and buffy coat samples, were collected at baseline. These valuable samples allow us to identify genetic and epigenetic variants related to health outcomes.
In addition, the genetic information from the multi-ethnic individuals, combined with abundant personal and environmental information (including the specific lifestyle and dietary habits recorded at baseline), provides an important opportunity to reveal complex and specific mechanisms of the genetic and environmental factors associated with NCDs. Finally, the information on follow-up outcomes, including the diagnosed diseases that are linked to this cohort database through the NDSS, CDR and NHICD systems, enables the reliable and real-time confirmation of the participants' health statuses and the tracking of their medical histories.
Can I obtain the data? Where can I find out more?
The China Northwest Cohort Study welcomes collaboration throughout the world to maximize the use of these data. Currently, the data are not available because they contain some sensitive information. However, possible collaborators are invited to contact the corresponding author (Shaonong Dang: tjdshn@xjtu.edu.cn).

Profile in a nutshell
• The China Northwest Cohort study (CNC), a community population-based prospective observational study, aimed to investigate the specific aetiological causes of NCDs and the long-term health hazards of NCDs among the different ethnic groups in the northwest region of China.
• Approximately 10% of the participants will be actively followed every 3 years via face-to-face interviews, using concise questionnaires to review risk exposures and disease incidence; biological specimens, including blood, saliva and stool samples, will also be collected. Passive follow-ups will be conducted by periodic linking (every 6 months) of the baseline survey data to established electronic disease registries.
• The questionnaire survey, a regular medical examination and the storage of blood samples were conducted in the CNC baseline survey for all of the participants. Several other items from the medical examination were recorded for approximately 40% of the participants.
• Collaborations from all over the world are welcome. Please contact the corresponding author with specific research ideas or questions at: tjdshn@xjtu.edu.cn.

Declarations
Ethics approval and consent to participate
The study was conducted in accordance with the Declaration of Helsinki and approved by the Human Research Ethics Committee of the Xi'an Jiaotong University Health Science Center (No: XJTU2016-411). Written informed consent was obtained from each participant or their guardian.

Consent for publication
Not applicable.

Availability of data and materials
All data generated or analysed during this study are included in this published article.

Competing interests
The authors declare no competing interests.