Methods and implementation of a Hospital-Based Cancer Registry in a major city in a low- to middle-income country: the case of Cali, Colombia

We describe our experience developing and implementing a hospital-based cancer registry (HBCR) in a quaternary-care, private, non-profit academic medical center in Cali, Colombia. An HBCR captures every patient with a confirmed malignancy seen at a given institution. In this study, all cases evaluated between 2014 and 2018 were included in the HBCR. In compliance with the International Agency for Research on Cancer recommendations, cases were classified as analytic or non-analytic. Data derived from an exhaustive selection of patients were stored in a computing platform owned by the institution, meeting the 2016 Facility Oncology Registry Data Standards recommendations. Quality control was performed by evaluating comparability, timeliness, validity, and completeness. A total of 24,405 new cases were registered between 2014 and 2018, and 4253 of these patients (17.4%) died. Based on anatomic location, the most common malignancies were breast (n = 1554), thyroid (n = 1346), hematolymphoid (n = 1251), prostate (n = 805), and colorectal (n = 624). The number of new cases showed an increasing trend. Implementing the HBCR revealed major challenges: a precise definition of cases, the development of processes for capturing new cases, a standardized data collection strategy, and appropriate patient follow-up. Based on our experience, the success of an HBCR largely relies on the interest of the institution, the engagement of stakeholders, and financial support; that is, it depends on adequate access over time to funding, technological, and staffing resources.


Introduction
A cancer registry is an information system designed to collect, store, analyze, and evaluate cancer data from a given population [1]. This activity is characterized by its consistency and is strictly controlled over time, ensuring the capture of new cases and update of those already included. Collected data represent a primary source for epidemiological research on cancer predictors as well as for planning and evaluation of healthcare services [2].
There are two main types of cancer registries. Population-Based Cancer Registries (PBCRs) measure the impact of the disease in defined populations [3]. Hospital-Based Cancer Registries (HBCRs), on the other hand, mainly evaluate the burden of the disease and the quality of healthcare services, as well as the organizational and administrative support from the institution [4,5]. Although PBCRs are a valuable source of information, in Colombia they often do not include clinical data, which limits the assessment of important variables such as accuracy of diagnosis, quality of treatment, and demand for health services [6].
Although the benefits of implementing an HBCR have been demonstrated [7,8], its success over time requires the interest of the institution, the engagement of stakeholders, and financial support. In Colombia, for instance, the National Cancer Institute (INC, by its Spanish acronym: Instituto Nacional de Cancerología) is the only institution with an HBCR that has published data [9,10]. The purpose of this study is to describe our experience developing and implementing the Institutional Cancer Registry (RIC, by its Spanish acronym: Registro Institucional de Cáncer) at Fundación Valle del Lili (FVL). Our methodology may serve as a model for other health centers in the country, Latin America, and the Caribbean.

Population and registry area
FVL is a quaternary-care, private, non-profit academic medical center in Cali, the capital of Valle del Cauca, a State in the Southwestern region of Colombia. FVL serves as a reference center that delivers healthcare services to an estimated 12,700 patients per year. Referrals come mainly from the Southwestern States of Colombia (Valle del Cauca, Cauca, and Nariño), but also from other Latin American and Caribbean countries. In 2018, FVL reported a total of 14,421 cases of patients with cancer [11].
Cancer treatment facilities include the following services: pathology, clinical laboratory, diagnostic imaging, hematology, oncology, surgical oncology, chemotherapy, radiotherapy, nuclear medicine, transplants, and palliative care. Furthermore, there is a Department of Data Management (including statistics), Department of Clinical Management (good clinical practices and health care quality), and an Epidemiological Surveillance Committee. FVL is located in the urban area of Cali (District 17, San Joaquín neighborhood). The RIC is an HBCR that started to function on April 13, 2018, and its database includes patients with diagnosed cancer from January 1, 2014.

Registry organization
The RIC is located at the Centro de Investigaciones Clínicas (CIC) of FVL, which belongs to the institution's Research and Innovation Sub-directorate. FVL funds this registry. The work team comprises individuals from various functional specialties: a physician, a statistics professional with a master's degree in statistics, a systems engineer, and a data entry specialist (auxiliary nurse). The process is supervised by specialized physicians: a pathologist and an oncologist.
Cancer registries are considered one of the primary sources of cancer information in Colombia and are part of the public health surveillance system, according to Act 1384, 2010 [12].
The RIC is advised by two cancer registries: the Registro Poblacional de Cancer de Cali (RPCC) and the Smilow Cancer Hospital-Yale Cancer Center Tumor Registry. The first is a PBCR created in 1962, funded and supported by La Universidad del Valle, a public university. The RPCC is affiliated with the Department of Pathology of the School of Medicine, has more than 50 years of experience in cancer registration in the city, and is a pioneer registry in Latin America [3,13,14]. The second, the Smilow Cancer Hospital-Yale Cancer Center Tumor Registry, is the oldest tumor registry in the USA; it was organized in 1926 and operates under the leadership of the Cancer Committee, in accordance with the American College of Surgeons Commission on Cancer (ACoS CoC), Connecticut Tumor Registry, and SEER coding manuals. It is located in the Yale Department of Therapeutic Radiology.

Implementation process
The implementation approach was organized around four implementation outcomes (acceptability, adoption, feasibility, and cost) [15] (Table 1). We established strategies aimed mainly at the stakeholders and decision-makers of our hospital, including the chief executive officer (CEO), executive board, cancer committee, tumor board, cancer specialists, and cancer researchers. The strategies included presentations about cancer registries and their impact on cancer epidemiology and control (Departments of Data Management and Clinical Management and Epidemiological Surveillance Committee) and participation in hospital meetings such as tumor board and research meetings. Then, we created a small working group to develop the HBCR with specialized support and consulting from the RPCC and the Smilow Cancer Hospital-Yale Cancer Center Tumor Registry; this group reviewed the most relevant literature on cancer registration, methodologies, and statistical analysis. Finally, the group proposed a work plan that included capacity building in cancer registration and methods (participation in International Association of Cancer Registries courses and conferences), development of a cancer registry software tool, the data collection process (including extraction from different sources of information and education in cancer coding and staging), presentation of results (periodic reports, an annual report, and participation in scientific and academic events), and sustainability (funding).

Case definition
A case is any individual, of any age, sex, and origin, who was treated in any FVL service and was diagnosed with a malignant tumor, regardless of its anatomical location. The basis of diagnosis may be microscopic (fluid cytology, peripheral blood, marrow, histology of primary tumors, and autopsy) or macroscopic (clinical, surgical, and imaging diagnosis). The following types of cancer are included: single or multiple primary malignant tumors, central nervous system tumors, in situ breast and cervical cancers, tumors of uncertain behavior, metastatic tumors, and basal and squamous cell skin carcinomas.
The class-of-case definition provided by the International Agency for Research on Cancer (IARC) is adopted to distinguish analytic from non-analytic cases [16]. Table 2 shows the different classes of cases in the registry. Analytic cases are those included in the hospital annual report and are used to assess the care provided to cancer patients; conversely, non-analytic cases are excluded from most tabulations, especially from survival estimates, but may be included in tabulations assessing, among others, the cancer burden of the hospital [17].

Data collection
The RIC collects data through software designed and created by FVL, the Sistema de Información del Registro Institucional de Cáncer (SIRIC, by its Spanish acronym). Data are stored in four modules: patient identification, cancer identification, first course of treatment, and outcomes [18]. Furthermore, the registry includes breast, cervical, and childhood cancer data collected for the Mandatory Notification Record established by the National System of Public Health Surveillance (SIVIGILA, by its Spanish acronym: Sistema Nacional de Vigilancia en Salud Pública) of the Colombian National Health Institute (INS, by its Spanish acronym: Instituto Nacional de Salud) [19], as well as information from Resolution 0247 of 2014, which governs the reporting of patients with cancer to the High-Cost Diseases Fund (CAC, by its Spanish name: Cuenta de Alto Costo) established by the Ministry of Health [20].
Cancer diagnostic coding is done according to the International Classification of Diseases for Oncology, third edition (ICD-O-3) [21], and the staging is done using the American Joint Committee on Cancer, eighth edition (AJCC 8th) [22].
Case finding combines automatic and manual processes. The RIC obtains information passively through the Epidemiological Surveillance Committee (with the mandatory notification forms for tumors of public health interest in Colombia) and through the pathology and clinical laboratory reports, in which every sample analyzed is flagged for malignancy (present or absent).
Active case finding is carried out through the Department of Data Management and the Cancer Functional Unit. First, the medical records of all patients treated in the hospital are searched for ICD-10 codes corresponding to malignancy. Then, all the records retrieved are reviewed manually to verify that they are cancer cases. Likewise, the data are compared with the information available in the Cancer Functional Unit to capture patients who come only to receive treatment (those who were not initially diagnosed or followed up within the hospital).
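The ICD-10 screening step can be illustrated with a minimal sketch. The code ranges used here (C00-C97 for malignant neoplasms, D00-D09 for in situ neoplasms, D37-D48 for neoplasms of uncertain or unknown behavior) are an assumption about how such a screen might be configured, not the RIC's documented query; the record layout and function names are hypothetical. Every record the screen returns would still go through the manual review described above.

```python
import re

# Assumed screening ranges (ICD-10 chapter II), not the registry's actual query:
#   C00-C97 malignant, D00-D09 in situ, D37-D48 uncertain or unknown behavior.
_ICD10_PATTERN = re.compile(r"^([A-Z])(\d{2})")


def is_candidate_code(icd10_code: str) -> bool:
    """Return True when an ICD-10 code falls in one of the screened ranges."""
    match = _ICD10_PATTERN.match(icd10_code.strip().upper())
    if match is None:
        return False
    letter, number = match.group(1), int(match.group(2))
    if letter == "C":
        return 0 <= number <= 97
    if letter == "D":
        return 0 <= number <= 9 or 37 <= number <= 48
    return False


def screen_records(records):
    """Keep records with at least one candidate code (pending manual review)."""
    return [r for r in records if any(is_candidate_code(c) for c in r["icd10_codes"])]


# Hypothetical discharge records, for illustration only.
records = [
    {"record_id": "A1", "icd10_codes": ["C50.9", "I10"]},
    {"record_id": "A2", "icd10_codes": ["J18.9"]},
    {"record_id": "A3", "icd10_codes": ["D05.1"]},
    {"record_id": "A4", "icd10_codes": ["D12.6"]},  # benign neoplasm, outside the screen
]
candidates = screen_records(records)
```

A deliberately broad screen favors completeness: false positives are removed during manual review, whereas a missed code would leave a case out of the registry.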
Once cases are identified, data are abstracted module by module. The patient identification and outcomes modules are populated automatically by linking data across the different databases (hospital discharge, vital statistics, medical records). In contrast, data for the cancer identification and first course of treatment modules are obtained through manual review of clinical records.
For training in coding, abstracting, and staging, we used The Cancer Registry CASEbook published by April Fritz [23,24].

Data sources
Four primary data sources have been identified for the RIC: (1) Department of Data Management; (2) Department of Pathology and Clinical Laboratory; (3) Epidemiological Surveillance Committee; (4) Cancer Functional Unit. All the data are stored in the RIC Data System (SIRIC), which has been designed to manage and store cancer cases for the registry. Figure 1 summarizes the data capture and collection process conducted for the HBCR. Data are presented in a structured digital format.

Follow-up
Once a cancer case is identified within the hospital, SIRIC links the following databases to keep the modules up to date:

Cancer Functional Unit
A hospital service that acts as a sentinel for the follow-up of cancer patients. All patients who are going to receive systemic therapy and bone marrow transplantation are followed through this service.

Department of Data Management
This area is responsible for managing and saving data about vital statistics, High-Cost Diseases Fund reports and Hospital discharges. In the case of the deceased, it records the date of death, while for the living, it enters the last date of contact with the hospital. Also, this department stores information regarding oncological surgery and radiotherapy procedures. It is the most relevant database in the hospital.

RPCC
A data transfer process between this PBCR and our HBCR was defined to update the date of last contact, because some patients die at other health facilities in the city and the RPCC has long experience in collecting these data.
If a patient continues treatment in an institution outside Cali, follow-up capacity is limited, since the patient cannot be monitored after leaving the city. For this reason, the RIC classifies cases as analytic or non-analytic, according to IARC recommendations (see Table 2). Non-analytic cases count toward the care burden (volume), but not toward institutional management outcomes (quality and performance of care).
To determine the date of death or last contact after the period of treatment in the hospital has ended, we have an agreement with the RPCC. This collaborative inter-institutional alliance allows bidirectional exchange of information to complete valuable data in both registries.
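The follow-up rule described in this section (date of death for deceased patients, last date of contact for living patients) can be sketched as a small function that consolidates what the different sources report for one patient. The function name and the tie-breaking choices (any reported death takes precedence, the earliest reported death date is kept, the latest contact date is kept) are assumptions for illustration; the source does not state how SIRIC resolves conflicting dates.

```python
from datetime import date


def follow_up_status(death_dates, contact_dates):
    """Combine per-source dates into (vital_status, reference_date).

    death_dates / contact_dates: lists of datetime.date values (or None)
    reported for one patient by the linked sources.
    """
    deaths = [d for d in death_dates if d is not None]
    if deaths:
        # Assumption: a death reported by any source takes precedence,
        # and the earliest reported date is kept.
        return ("dead", min(deaths))
    contacts = [d for d in contact_dates if d is not None]
    if contacts:
        # Living patients are followed up to the most recent contact.
        return ("alive", max(contacts))
    return ("unknown", None)


# Hypothetical patient: no death recorded in hospital data, death reported externally.
status = follow_up_status(
    death_dates=[None, date(2017, 5, 2)],
    contact_dates=[date(2017, 3, 15)],
)
```

In practice, records whose sources disagree on the date of death would likely be queued for manual review rather than resolved automatically.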

Quality control
There is no standard method for quality control of HBCR data. The techniques used are extrapolated from the quality control performed in PBCRs. The recommended indicators are comparability, timeliness, validity, and completeness [25,26]; of these, validity and completeness are essential for this process [27].
In our HBCR, a randomized review of at least 10% of the cases is performed in each calendar year. A general practitioner trained in filling out information from the HBCR reviews the information recorded in each case, verifying the consistency of the data and looking for possible errors in both coding and tumor identification. When doubts arise despite this process, a review is carried out together with a pathologist and an oncologist to guarantee data quality.
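As an illustration of the sampling step, the sketch below draws a random sample of at least 10% of cases within each calendar year. The field names (`case_id`, `year`), the fixed seed, and the rounding-up rule are assumptions; the source only states that at least 10% of cases per year are reviewed.

```python
import random
from collections import defaultdict


def sample_for_review(cases, percent=10, seed=2018):
    """Return {year: sorted case ids} with at least `percent`% of each year's cases."""
    by_year = defaultdict(list)
    for case in cases:
        by_year[case["year"]].append(case["case_id"])

    rng = random.Random(seed)  # fixed seed so the audit sample can be reproduced
    selected = {}
    for year, ids in sorted(by_year.items()):
        # Integer ceiling of percent% of the year's cases, never fewer than one.
        k = max(1, (len(ids) * percent + 99) // 100)
        selected[year] = sorted(rng.sample(ids, k))
    return selected


# Hypothetical case list, for illustration only.
cases = [{"case_id": f"2017-{i}", "year": 2017} for i in range(25)]
cases += [{"case_id": f"2018-{i}", "year": 2018} for i in range(40)]
picked = sample_for_review(cases)
```

Sampling within each year, rather than across the whole database, keeps the reviewed fraction at or above the threshold even when yearly case volumes differ.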

Statistical analysis
Statistical analysis is mainly descriptive. Absolute frequencies per year are presented according to the primary site and site group (systems). This information is stratified by analytic and non-analytic cases. Distribution by sex, as well as trends for the leading cancer types defined by the 10-year Plan for Cancer Control in Colombia 2012-2020 [28], are also shown.
The Kaplan-Meier non-parametric method is used for survival analysis. Survival is estimated from the patient's diagnosis date to the date of death (event) or last follow-up (censoring). Because complete 5-year follow-up is not yet available, 5-year survival is estimated with the period analysis described by Brenner and Gefeller [29].
HBCRs cannot be used to measure cancer incidence because the population from which the cases arise cannot be identified [30]. Therefore, the RIC does not generate incidence data.
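For readers unfamiliar with the estimator, a minimal sketch of the standard Kaplan-Meier product-limit calculation follows. At each distinct event time, the running survival probability is multiplied by one minus the ratio of deaths to patients still at risk; censored patients leave the risk set without changing the estimate. This is the conventional cohort estimator only; it does not implement the period analysis of Brenner and Gefeller, and the function name and toy data are hypothetical.

```python
def kaplan_meier(durations, events):
    """Return [(time, survival)] at each distinct time with at least one death.

    durations: follow-up time from diagnosis (any consistent unit).
    events: 1 if the patient died at that time, 0 if censored.
    """
    pairs = sorted(zip(durations, events))
    at_risk = len(pairs)
    survival = 1.0
    curve = []
    i = 0
    while i < len(pairs):
        t = pairs[i][0]
        deaths = 0
        leaving = 0
        # Group all patients whose follow-up ends at time t.
        while i < len(pairs) and pairs[i][0] == t:
            deaths += pairs[i][1]
            leaving += 1
            i += 1
        if deaths > 0:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        # Deaths and censored patients both leave the risk set after time t.
        at_risk -= leaving
    return curve


# Toy data: five patients, follow-up in years, two of them censored.
curve = kaplan_meier(durations=[1, 2, 2, 3, 4], events=[1, 0, 1, 1, 0])
```

With these toy data the estimate drops to 0.8 after the first death, to 0.6 at year 2 (one death among four at risk), and to 0.3 at year 3 (one death among two at risk).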

Ethical considerations
FVL's HBCR complies with the Standards and Guidelines for Cancer Registration in Europe (IARC Technical Publication No. 40) [31].
The registry was reviewed and approved by the Institutional Review Board (Protocol number 1337), followed the ethical principles for medical research outlined in the Declaration of Helsinki [32], and took into account the regulations of Resolution 8430/1993 of the Ministry of Health of Colombia [33]. The board considered the registry to be of national and local public health interest. It waived informed consent because there is no contact with patients (all data were obtained retrospectively from four data sources) and because our primary purpose is to evaluate the burden of cancer, the quality of healthcare services, and administrative support.
To protect patient identity and guarantee the security of sensitive information, SIRIC houses the data in a double-layer architecture. The data layer is on the internal FVL server, which, in turn, is protected by a firewall that safeguards the hospital's information. Additionally, the data are encrypted and masked through a numerical system under the SHA-512 feature set, and viewing them requires being within the clinic's LAN or having authorization assigned through a VPN.
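The text does not specify how SHA-512 is applied within SIRIC. One common way to mask identifiers with this hash family is keyed hashing (HMAC-SHA-512), sketched below as an assumption rather than a description of the actual system. The function name, the normalization step, and the key handling are hypothetical. Note that hashing is one-way: it masks an identifier but cannot be reversed the way encryption can.

```python
import hashlib
import hmac


def mask_identifier(patient_id: str, secret_key: bytes) -> str:
    """Return a 128-character hex digest that stands in for the identifier."""
    # Normalize so the same patient always maps to the same masked value.
    normalized = patient_id.strip().upper().encode("utf-8")
    # A keyed hash prevents re-identification by hashing guessed identifiers
    # without the key; the key itself must be stored separately and securely.
    return hmac.new(secret_key, normalized, hashlib.sha512).hexdigest()


# Hypothetical identifier and key, for illustration only.
key = b"example-key-kept-outside-the-database"
masked = mask_identifier(" cc-12345678 ", key)
```

Because the masked value is deterministic for a given key, it can still serve as a linkage key between modules without exposing the original identifier.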

Preliminary results of the registry
The RIC database includes patients with cancer diagnosed since January 1, 2014. In 2014-2018, a total of 29,370 cancer cases were treated at FVL; 8.3% (n = 2439) were reported as dead within the institution, 58.87% (n = 17,290) were women, and 50.83% (n = 14,928) were analytic cases according to the IARC definition.
From its foundation on October 20, 1982, to December 31, 2018, FVL provided the RPCC with a total of 21,641 cases, indicating that 73.68% of tumors diagnosed and treated at FVL belong to patients living in Cali. Table 3 presents the distribution of cancer cases at FVL between 2014 and 2018 for both sexes. The most common tumors were breast (n = 4315), hematolymphoid (n = 3481), thyroid (n = 3056), prostate (n = 2733), and colorectal (n = 1265). Figure 2 shows the frequency distribution for the main anatomical locations according to RIC estimates. Breast cancer was the most frequent in women (n = 4275), and prostate cancer was the most frequent in men (n = 2733). Figure 3 presents the top ten cancer sites by sex.
The RIC database includes all cancer cases diagnosed as of January 1, 2014; when this information was crosslinked with the RPCC, a consistent increasing trend in new cases was observed for 2014-2018. Table 4 presents the distribution of cases per year for the period 2014-2018, showing new and prevalent cases over the same period. All cases from 2014 were defined as new. Table 5 shows the frequency of the top ten cancer sites for both the HBCR and the PBCR. Breast cancer was the most frequent site in both registries.
The top three States with the highest number of cases treated in the FVL were Valle del Cauca (n = 12,741), Cauca (n = 948) and Nariño (n = 188). Figure 4 shows

Discussion
The HBCR of FVL is the first of its kind in the Southwestern region of Colombia and the second nationwide. It collects data of oncologic patients treated in a university hospital in Cali, a referral facility in Colombia and Latin America for the management of cancer. It was created to improve cancer patient data availability and quality, and to establish and strengthen the cancer control program.
The epidemiological transition observed in low- and middle-income countries has changed the leading causes of morbidity and mortality. Non-communicable diseases, such as cancer, have displaced externally caused injuries and infectious diseases, and are now a major cause of morbidity and mortality and a public health challenge [34]. Therefore, it is necessary to consolidate complete, reliable, and lasting sources of data that support policies and strategies for cancer control within the region.
HBCRs serve different purposes from PBCRs. Their functions include providing an objective assessment of oncological patients' needs, cancer programs, and health care quality within a health institution [2]. A systematic review performed in 2017 found that HBCRs also serve other purposes, such as epidemiological and clinical research, education, policy development, assessment of clinical practice guideline implementation, and planning and monitoring of cancer control programs, including prevention, detection, treatment, and palliative care [5,27]. We plan to establish ourselves as a reliable organization that contributes to the consolidation of cancer information in our setting. In this way, we expect to secure strategic allies in the public and private sectors and to have a large-scale impact on prevention, education, and policy development for cancer control.
There are different implementation experiences with HBCRs worldwide; for example, Japan has 397 HBCRs that provide evidence for clinical measurements and support more accurate health policies for its population [35]. Countries such as Australia [36], Sweden [37], the United States [38], and Thailand [39] also have HBCRs. All of them have reported successful experiences that improved the quality of cancer treatment and consolidated information systems. Experiences with HBCRs have also been recorded in low- and middle-income countries such as Nigeria, which has 19 hospital-based cancer registries, most of which were created in 2009 with the support of the Nigerian National System of Cancer Registries. At least 11 of these registries belong to reference hospitals within the country. They have contributed to improved cancer programs, a better understanding of the region's capacity to respond to cancer, and optimal coverage of cancer data. Such strategies are valuable in low- to middle-income countries with weak surveillance systems and scarce financial, human, and infrastructural resources for cancer management and control [7]. Nations in Latin America and the Caribbean have limited experience with this type of registry, which creates an information gap regarding cancer and its outcomes.
When comparing these findings with those of high-income countries, we observed that the United States has multicenter hospital-based registries such as the National Cancer Database (NCDB), a nationwide database that contains approximately 34 million records from > 1500 HBCRs in the United States and Puerto Rico [38]. Therefore, it is essential to create and consolidate HBCRs in regions such as Latin America and the Caribbean that will provide feedback to PBCRs and contribute to further knowledge about cancer and its impact in different contexts.
An increase in the number of cases was found for 2018, and two factors could be related: (1) the implementation of the HBCR in the hospital and (2) the increase in the supply of oncology services in the hospital. The first could reflect the identification of critical data sources and their integration into the registry (completeness). As for the second, the hospital has increased its operational capacity for cancer management by implementing the Cancer Functional Unit, increasing the number of healthcare workers involved in cancer management, and establishing agreements with health insurers for cancer patients' care.
This work describes the methods of an HBCR in Cali, Colombia. This city has the first PBCR in Latin America, which has been used as a cancer information source since 1962. The RPCC generates quality data included in all eleven volumes of Cancer Incidence in Five Continents (CI5), as follows: 50 years of incidence (1962-2012), 30 years of mortality, and 15 years of survival (1995-2009) [3]. Its background and expertise have motivated the creation of cancer information systems at the regional and national levels; for example, in 1990, the INC supported 13 HBCRs created in different cities [40]. However, although this initiative was valuable and essential, such support alone is not sufficient for success: most of those registries have disappeared over time, and only one remains active. Based on the experience of establishing the RIC at FVL, we recommend starting with early goals and showing short- and medium-term results that demonstrate the usefulness of the data for clinical and administrative decision-making. Examples include annual cancer reports for the institution and active participation in Cancer Boards and the Cancer Committee (which are institutional discussion spaces).
Data quality control is one of the pillars of cancer registration, and it is a significant challenge. Unfortunately, there is no standard method for quality control of HBCR information, and initiatives on this issue should be promoted and guided by IARC and other scientific societies in the future. We also consider it important to organize training and workshops geared specifically to HBCRs because the data quality activities currently offered focus on PBCRs.
Computer platforms for information storage are an essential part of cancer registry planning. They make it possible to collect, consolidate, and store data more safely. Different software and commercial platforms are available worldwide for this task; for example, IARC promotes the use of CanReg5 [41], while in the USA, the National Program of Cancer Registries (NPCR) recommends the use of Registry Plus™ [42].
Similarly, commercial platforms such as METRIQ® [43] and OncoLog [44], developed by private companies, are available on the market. However, economic constraints limit their use in low- and middle-income countries because additional budget is needed for annual membership fees and license purchases. Language is also a barrier, since most of the platforms are in English, which limits their use in non-English-speaking countries. Because of these limitations, each country has found its own way to meet this need; Japan, for instance, registers its data in HosCanR, a standard software for registering cancer information, developed and distributed by the National Cancer Center [35].
In contrast, some HBCRs in countries such as Pakistan and Italy use Microsoft Access and Excel. To allow the platform to be used in our native language (Spanish), limit annual spending, and include specific details related to institutional research and administrative needs, a multidisciplinary institutional team developed a custom software platform (SIRIC) for our HBCR. In addition, a novel proposal for evaluating cancer registry software, made by researchers from Iran, was considered in the development of our platform [45].
Financial support is a cornerstone of a cancer registry's sustainability and continuity over time. In Colombia, a study estimated the cost of operating five PBCRs; it showed an almost three-fold variation in average cost per case (77,932-214,082 Colombian pesos, or USD 41-113 in 2013) across registries. Despite differences in data collection approaches, types of data collected, activities performed, volume of cases collected, number of reporting sources, follow-up, and geographic area, no clear associations were reported between population size, case volume, or the number of abstracts handled and the cost per case in our country [46]. These results are an essential guide for proper budget planning when implementing new cancer registries in Latin America and the Caribbean, where funding opportunities are limited.
Numerous efforts have been made to optimize the cost-effectiveness of information systems. However, HBCRs are usually expensive and do not receive enough support from IARC or other international communities [27]. One way to optimize economic resources is machine learning, a strategy that seeks to reduce the time and human cost of processing large amounts of data. It uses algorithms to classify, interpret, and predict quickly and accurately, making it an excellent ally in automation processes [47]. Moreover, big data has been explored as a technological tool capable of integrating different cancer information systems and transforming raw data into structured clinical information through advanced mathematical algorithms and high-technology electronic platforms [48]. Through this process, disaggregated data can be brought into a standard format, yielding greater clinical meaning, a closer approximation of reality, and higher-quality analyses. Consequently, this process can support more accurate decision-making. The availability and use of these technologies foster intersectoral and interdisciplinary collaboration that could improve data collection and management.
Intersectoral and governmental support is essential to consolidate cancer registries because their information helps develop policies, guidelines, and comprehensive models for cancer control. In Colombia, Act 1384/2010 (Sandra Ceballos Act) regulates cancer care actions within the country [12], and the Colombian 10-Year Plan for Cancer Control proposed strategies to reduce the prevalence of modifiable risk factors and cancer-related deaths. Act 1384 aims to improve patients' and survivors' quality of life, guaranteeing the generation of scientific knowledge and its availability for decision-making, as well as strengthening human talent management for the control of the disease [28]. These governmental initiatives have been made possible by the information provided by the RPCC and the combined efforts of different national institutions.
Finally, different strategic actors in Cali have come together through the City Cancer Challenge 2025 initiative (C/Can 2025), whose purpose is to design, plan, and implement better solutions for cancer care in the region [45]. Among its objectives for the city is the creation of an integrated information system for the management of oncological services, in whose development HBCRs have a crucial role. Therefore, the creation of new HBCRs should be promoted based on the experience of existing registries and with active support.

Conclusion
The consolidation of cancer information is an increasing need for adequate decision-making in cancer policy and research. Health care facilities such as hospitals are an essential primary source of information on the diagnosis, treatment, and follow-up of cancer patients. That is why the creation and strengthening of HBCRs are becoming important, especially at cancer care reference centers in Latin America.
Challenges related to the creation of an HBCR include a clear case definition, an adequate process for identifying new cases (exhaustiveness), standardized data collection, appropriate follow-up, data quality control, and information disclosure for decision-making. These processes depend on the availability of economic, technological, and human resources, and they are related to institutional commitment, administrative interest (involving decision-makers and stakeholders), and financial support.