Development of Information Road map for Personalized Colorectal Cancer Screening in Iran

Background: A data-driven colorectal cancer screening strategy based on a personalized approach can improve health outcomes, facilitate early stratification of at-risk patients and reduce health care costs. This study aims to develop an information road map for personalized colorectal cancer screening in Iran.
Methods: This mixed methods research (MMR) study consisted of three phases: phase I, development of a 275-item checklist for assessing the data elements required for personalized colorectal cancer screening; phase II, situational analysis of the colorectal cancer screening dataset according to the checklist; phase III, development of a national information road map for personalized colorectal cancer screening through in-depth interviews and focus groups.
Results: Personalized datasets for colorectal cancer screening were defined in four dimensions: a clinical dataset (5 sub-dimensions, 162 items), a genetic dataset (2 sub-dimensions, 67 items), a demographic dataset (1 sub-dimension, 6 items) and a social determinant dataset (3 sub-dimensions, 40 items). In the next step, the data elements of the current colorectal cancer screening program were analyzed against the personalized datasets. Of the 275 items, only 96 were recorded. Only 17.8% of the clinical dataset of the screening program was entered. Within the clinical dimension, the pathological dataset had the highest share of recorded data elements (53.6%) and the clinical history dataset the lowest (3.4%). Overall, 73% of pedigree data elements and 15.33% of social determinant datasets were entered. In the final step, a national information road map for personalized CRC screening with 6 layers (information leadership, personalized datasets, data integration, data architecture, data descriptor, and screening program layers) was developed.


Background
Colorectal cancer (CRC) is the fourth leading cause of cancer mortality worldwide (1). The global burden of CRC is expected to increase by 60%, to more than 2.2 million new cases and 1.1 million deaths, by 2030 (1). According to GLOBOCAN 2012 v1.0, the incidence of CRC in Iran will double by 2030 (2). The economic burden of CRC in Iran was estimated at US$298,148,718 in 2012. Considering the high economic burden and mortality that CRC imposes on health care organizations, policies should be adopted to reduce the disease burden and to increase prevention and early diagnosis of CRC (3). Studies show that CRC is one of the most preventable cancers if detected early (4). Regular screening reduces CRC incidence and mortality (5)(6)(7)(8). Screening can also improve patient safety and ultimately reduce health care costs (9).
Decision-making for screening programs has become more complex in recent decades (10,11). CRC is a heterogeneous cancer caused by multiple risk factors (12,13). Tumor heterogeneity, as one of the greatest challenges, must be taken into account (14). Thus, customizing and personalizing decision-making for screening is recommended (15). Although a growing number of screening strategies use molecular technologies, not all of them include personalized criteria for screening (16). Personalized medicine aims to design and offer appropriate diagnosis and treatment based on the individual patient's characteristics (17,18).
Personalized medicine focuses on integrating genomic and clinical datasets to support prevention strategies (19). Developing personalized CRC prevention could lead to more effective use of health resources (20,21). Personalized cancer care, with its individual clinical assessment approach, can also minimize cost and improve efficacy (22). Personalized, patient-specific screening schedules facilitate early stratification of at-risk individuals and the detection of significant biomarkers for predicting clinical status in individual patients (23). While individualized screening is an affordable strategy, implementing personalized CRC screening poses challenges (24). One of the major challenges facing personalized strategies is a lack of data (25).
The completeness of data is an important factor in risk assessment for precision cancer screening (26).
The meaningful use of data for personalized protocols is also essential. Integration and precise interpretation of massive amounts of data play a vital role in empowering personalized medicine (18).
A data-driven cancer screening strategy based on a personalized approach can improve health outcomes and help manage healthcare costs (27). The CRC screening process involves collecting and analyzing a massive volume of clinical data in order to select appropriate evidence-based interventions (28).
Therefore, the lack of clinical data is a considerable barrier to implementing a personalized protocol (29). Panahiazar and coauthors explored some of the challenges of using data in personalized strategies, including the variety, quality, volume and velocity of the data (18). Personalized programs need vast amounts of data (clinical, environmental and genetic datasets). These huge datasets are extracted from different and heterogeneous sources.
The integration of these data elements is a core barrier for personalized programs (30). Another challenge is the quality of data. Completeness and quality of data are very important factors in the decision-making process of cancer prevention (16). In this paper, national personalized datasets for colorectal cancer screening were developed, and the current CRC screening datasets were analyzed against this personalized format. To address the above challenges, the present study also explored an information road map for personalized colorectal cancer screening.
An information road map can describe the relationships between multiple sources and heterogeneous data components. Generally, an information road map optimizes the integration of heterogeneous processes (31). Given the importance of this subject, our study developed an information road map for personalized colorectal cancer screening in Iran. In this project, we were bound by the availability of screening documents in the research institute; we aimed to develop a personalized screening road map for integrating heterogeneous datasets.

Methods
This study used mixed methods research (MMR), which combines quantitative and qualitative methodology. The project was conducted at the Research Institute for Gastroenterology and Liver Diseases (RIGLD), Shahid Beheshti University, in Iran. RIGLD has designed the comprehensive plan for colorectal cancer screening over the last 18 years (32). The project was performed from 2016 to 2017.
In the first step, a 287-item yes/no checklist was developed for assessing the data elements required for personalized screening. This checklist contained 4 dimensions sourced from the literature. In this step, books, articles, research projects, theses, manuals and scientific reports related to personalized colorectal cancer screening were retrieved from MEDLINE, IEEE, Scholar, Web of Science, Scopus, ProQuest and other databases. We synthesized reliable evidence from multiple sources to determine the personalized datasets of this checklist. Content validity of the developed checklist was assessed based on literature reviews and the opinions of experts involved in the CRC screening program. Descriptive analyses evaluating the current status of the RIGLD dataset against our checklist were performed with SPSS software version 24.
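The checklist-based assessment described above amounts to marking each item as recorded or not and reporting the share of recorded items per dimension. The following is a minimal Python sketch of that calculation, not the study's actual procedure (the study used SPSS). The function name `completeness_by_dimension`, the tuple layout, and all item names and values are hypothetical and for illustration only.

```python
from collections import defaultdict


def completeness_by_dimension(checklist):
    """Return {dimension: percent of checklist items recorded}.

    `checklist` is an iterable of (dimension, item_name, is_recorded)
    tuples; this layout is an assumption made for the sketch.
    """
    totals = defaultdict(int)
    recorded = defaultdict(int)
    for dimension, _item, is_recorded in checklist:
        totals[dimension] += 1
        if is_recorded:
            recorded[dimension] += 1
    # Percentage of recorded items, rounded to one decimal place.
    return {d: round(100 * recorded[d] / totals[d], 1) for d in totals}


# Hypothetical example rows (NOT the study's data).
example = [
    ("clinical", "colonoscopy_date", True),
    ("clinical", "bowel_prep_quality", False),
    ("clinical", "polyp_size", False),
    ("clinical", "pathology_diagnosis", True),
    ("genetic", "pedigree_first_degree", True),
    ("genetic", "msi_status", False),
    ("demographic", "age", True),
]

result = completeness_by_dimension(example)
print(result)
```

Keeping one row per checklist item, rather than only per-dimension totals, makes it possible to list exactly which data elements are missing, which is the information a gap analysis needs.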
The second step of this study was qualitative. A national information road map for personalized CRC screening was developed in this step. A literature review was conducted to identify the components of the information road map. Then in-depth interviews and focus group discussions were conducted with clinicians and technical experts in CRC screening. In total, 37 experts from 15 different fields related to CRC screening participated in the interviews. The participants consisted of three epidemiologists, six genetic experts, one biochemist, one molecular biologist, two pathologists, one oncologist, seven gastroenterologists, five general practitioners, one anesthesiologist, one surgeon, one psychometrician, two social medicine specialists, three nutritionists, two internists and one statistician. The researchers explained the study and obtained initial consent for further contact from participants. The researchers also asked for consent to audio-record the interviews. A semi-structured interview with six major themes (personalized dataset, data architecture, data integration, data descriptor, monitoring program, screening program) was used.
Participants' experiences with the components of the information road map were collected by interview. We used content analysis and frequency distributions for data analysis. For final approval of the information road map, focus groups with semi-structured discussions were conducted. 18 experts from 8 different fields related to CRC screening and information technology participated in the focus group discussions. The participants consisted of two information technologists, three software engineers, three statisticians, six genetic experts, one oncologist, one gastroenterologist and one pathologist. This team had at least 5 years of executive experience in the screening program. Two focus groups were conducted after the in-depth interviews to approve the final road map.

Results
The results are presented in three parts. In the first part, the national personalized datasets for screening are described. In the second part, the data components of the present CRC screening datasets are assessed against the approved datasets for personalized screening. In the third part, the information road map for personalized CRC screening is developed.
National datasets of personalized CRC screening
The situational analysis of the clinical dataset is shown in Table 1. The lowest data elements of the clinical dimension were related to the clinical history dataset (3.4%). Of the pedigree data elements, 73% were entered. Table 2 illustrates the situational analysis of the demographic and social determinant datasets based on personalized CRC screening. Datasets related to the participant's perspective were considered part of the social determinants dataset. However, the CRC screening program contained no data elements on the participant's perspective (Table 3).
Table 1 Situational analysis of the clinical dataset based on personalized CRC screening (59,60
We developed a national information roadmap for personalized colorectal cancer screening through in-depth interviews and focus group discussions (Table 4). This roadmap was approved by CRC experts.
Information leadership is at the top of this roadmap (Fig. 1). The roadmap has 6 layers: information leadership, personalized dataset, data integration, data architecture, data descriptor, and screening program. Information leadership describes a process that leads to data resource management and information infrastructure organization (33). The second layer is the personalized dataset.
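The layered structure can be pictured as an ordered sequence. The sketch below is only an illustration in Python: the text states that information leadership is the top layer and the personalized dataset the second, while the order of the remaining four layers is assumed here to follow the order in which they are listed. The names `ROADMAP_LAYERS` and `layer_position` are hypothetical.

```python
# Six roadmap layers, top to bottom. Positions 3 to 6 follow the listing
# order in the text, which is an assumption of this sketch.
ROADMAP_LAYERS = (
    "information leadership",
    "personalized dataset",
    "data integration",
    "data architecture",
    "data descriptor",
    "screening program",
)


def layer_position(name):
    """Return the 1-based position of a layer, counting from the top."""
    return ROADMAP_LAYERS.index(name) + 1


print(layer_position("personalized dataset"))
```

An immutable tuple is used because the set of layers is fixed by the roadmap; a lookup for an unknown layer name raises `ValueError` rather than returning a misleading position.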

Discussions
Personalized datasets for colorectal cancer screening were defined in four dimensions with 275 items. In the next step, the data elements of the current colorectal cancer screening program were analyzed against the personalized datasets. In the final step, the national information road map for personalized CRC screening with 6 layers was developed.
Personalized screening approaches can optimize the efficiency, equity, and safety of cancer screening, but require precise and comprehensive patient information (34). Recent studies show that accurate and detailed information can support personalized prevention strategies (24,35,36). It has been emphasized that screening recommendations require comprehensive patient information (34). Considering the vital role of patient information in effective and affordable screening, we developed national personalized datasets with 275 items. This personalized dataset has been approved by experts in 15 different fields related to CRC screening. Despite the importance of accurate and complete datasets for more effective management of a personalized screening program, the existing datasets in our study contained limited and inadequate elements. Of the 275 items, only 96 were recorded. Only 17.8% of the clinical dataset of the screening program was entered.
Yet clinical information plays a key role in risk assessment (37). Determining individual risk factors is a significant element of successfully implementing personalized CRC screening.
Sufficient and up-to-date risk factor information is an integral part of more efficient individual screening (22). A standard documentation tool is necessary for evaluating clinical datasets (38). In this study, we developed a standard checklist of the personalized dataset for evaluating the clinical dataset. Detailed information is needed in the clinical part of the personalized dataset.
In the present study, only 9% of colonoscopy datasets were entered. Patient-centered colonoscopy has become a critical issue in colorectal cancer screening (39). The safety and effectiveness of a cancer screening program depend on the quality of the colonoscopy. Documentation of detailed and complete data is a significant parameter for a high-quality procedure (40). In this research, colonoscopy data were collected inadequately; more information needs to be added to the report.
Colonoscopy plays a key role in the screening process, and colonoscopy reports reflect the quality of the procedure (41). Colonoscopy reporting was poor in this study, and the data collection process should be revised.
The highest data entry in the clinical dimension was related to the pathological dataset in the RIGLD screening program. According to studies, a pathology sample is obtained in 30-50% of colonoscopy interventions (42). Accurate histopathological data are a requirement for providing high-quality care to patients with colorectal cancer (42). Precise pathology reports can enhance screening recommendations for follow-up. The Quality Assurance Task Group presents key data indicators for pathology documentation to achieve continuous quality improvement (CQI) (43).
The study showed that the highest data entry in the genetic dimension was related to the pedigree dataset.
A pedigree is useful for tracking and presenting detailed family history data (44)(45)(46). Pedigree data elements form a powerful dataset that can support the genetic dimension in screening approaches (46).
Lifestyle factors are an important predictor for screening participants (47-51). However, only 11.7% of lifestyle datasets are entered in the present system. In addition, participants' perspective datasets, as part of the social determinant dimension, are not documented. In general, 62.5% of social determinant datasets were entered.
A standard dataset is needed for comprehensive data documentation (52). Standard datasets for complete and accurate data gathering are therefore one of the requirements of a personalized screening system. In this paper, a standard personalized dataset was developed; in the next step, the present datasets were assessed against this standard dataset and incomplete data elements were identified. Studies show that information gaps and deficiencies can affect clinicians' decision making (53)(54)(55). Thus, addressing the deficiencies of the present datasets is necessary for better screening decisions.
In addition, completeness and accuracy of data are essential factors for increasing the effectiveness of the screening program. Clinical and genetic information should be integrated for individual risk stratification in personalized prevention (56,57). To meet this need, we developed a national road map for personalized CRC screening. Information leadership is at the top of this roadmap. As a layer of the roadmap, information leadership facilitates effective decision making (33). Successful CRC screening depends on a precise and data-driven plan (58). In our information roadmap, there is continuous interaction between the personalized dataset and components of the screening plan, such as the vision, mission and goals of the program. One caveat of this study is that the data sample came from only one of the screening centers.
Personalized colorectal cancer screening is a comprehensive approach to prevention based on each person's unique datasets. One of the major challenges in this approach is the provision of complete and precise data. In this study, we developed a standard tool for information gap analysis. The results of this analysis can be considered in screening program planning and in improving the quality of documentation. Integrating the high volume of information in a precise screening program is another problem. The developed roadmap identifies the various data components of the program and integrates all information segments. It can be used as a tool for data process reengineering.

Conclusions
The researchers conducted this project because of the importance of integrated data in a personalized screening approach and the lack of such datasets. Personalized prevention based on an integrated dataset plays a key role in more efficient implementation of the screening program. The developed road map can be used for integration and interoperability of screening datasets. Data deficiencies can reduce the efficiency of the decision-making process in a screening plan. Eliminating data deficiencies can improve the quality of documentation and may lead to improved screening performance. Therefore, the causes of data deficiencies and missing values should be identified and eliminated. In this study, information deficiencies were identified with a standard instrument. Data entry in the screening program was inadequate and poor. Implementation of the national roadmap can help improve the quality of data in personalized screening. According to recent studies, the use of standard datasets and indicators can help identify information gaps and facilitate evidence-based decision making. Establishing an individual screening program requires a comprehensive and accurate database. In our study, data gaps were analyzed with a national checklist, and a roadmap was developed for the interaction and integration of heterogeneous data.
In a future study, the present map will be implemented and the results will be reported. Continuous monitoring of the data process via this roadmap can facilitate quality improvement of personalized screening. Implementation of this map is expected to lead to comprehensive data for decision making.

Ethics approval and consent to participate
Ethical approval was granted by the RIGLD ethical committee.

Availability of data and material
The data that support the findings of this study are available from the corresponding author on request.

Competing interests
No conflict of interest declared.

Funding
This study was funded with the authors' own contributions.

Authors' contributions
Elham Masert: Design research, gathering data, analysis and Interpretation of data