Cancer stage at diagnosis informs the healthcare team of the patient’s prognosis and aids in determining the most effective treatment approach (1). It describes the extent or spread of cancer at the initial diagnosis and after staging investigations for distant disease, before any treatment has been delivered. Population-level collection of staging data can guide health service planning and support the evaluation of cancer control and early detection initiatives (2). When linked with other national and international data sources, staging data can be used to explore stage-specific cancer outcomes, geographic and socioeconomic variation, and survival (3).
Our recent scoping review of how cancer stage is determined in population-based cancer registries (PBCRs) identified three categories of staging classification systems: 1) tumour–node–metastasis (TNM)-based, 2) categorisation by local, regional and distant spread, and 3) miscellaneous systems (4). In Australian clinical practice, cancer stage is classified primarily using the American Joint Committee on Cancer (AJCC) 8th edition TNM staging classification, the most widely used system for solid tumours (4, 5). The TNM classification describes the extent of the primary tumour (T category), the involvement of nearby lymph nodes (N category), and the presence or absence of distant metastasis (M category). Based on the findings of TNM and occasionally non-anatomic values (e.g., Gleason score and prostate-specific antigen level for prostate cancer), an overall summary stage group can be assigned, ranging from Stage I to Stage IV (5). To apply TNM categories accurately, certain staging rules and classifications are necessary, which involve considering the diagnosis date, the timeframe for staging, and the prefix stage classifications. The AJCC staging system uses prefixes including “p” for the pathological stage (pTNM), “c” for the clinical stage (cTNM), and “y” for the post-therapy stage (yTNM), which assist the PBCR in determining the stage at diagnosis (5).
Cancer staging information is typically documented in unstructured free-text format, dispersed across various sources, such as multidisciplinary team meeting notes, medical correspondence, hospital-based cancer databases, and pathology and radiology reports, instead of being stored in structured data fields (6). This unstructured approach makes it challenging to systematically capture assessments of cancer stage in clearly defined data fields suitable for population-level analysis. Moreover, achieving a comprehensive assessment of the stage at diagnosis involves correlating data from multiple diagnostic tests and physician reviews, all of which must align with the staging classification system (5). These individual pieces of staging information may be distributed across different medical records or locations, often spanning several weeks of clinical investigative processes.
Current approaches to the routine collection of cancer stage at diagnosis in PBCRs are constrained by the absence of standardised methodologies for collecting staging data, resulting in poor quality or incomplete data, and difficulties in accessing relevant data sources (4, 7, 8). The use of various staging classification systems also creates challenges in achieving harmonisation and comparisons across jurisdictions and countries (4). The collection of cancer stage at diagnosis in PBCRs has been found to encompass a variety of methods, relying on a wide range of data sources connected to routine data pipelines and collection processes, which highlights the diversity and complexity of how cancer stage information is gathered in PBCRs (4). Additionally, multiple staging classification systems are employed, leading to the use of staging conversion systems and increasing the risk of misclassification (4). The collection of cancer stage in PBCRs has not always been considered justifiable because of the substantial effort and time required for manual review and input, which has generally been the primary method for staging (1, 2). This is challenging for PBCRs, which often have limited financial and physical resources (3), such as digital health infrastructure and workforce, and which must also contend with evolving rules and guidelines in staging systems (9).
In Australia, many PBCRs typically do not collect or report cancer stage information as part of their routine data collection practices, prioritising reporting incidence and mortality rates (10). The inability to meet the demand for cancer stage information to assess outcomes and evaluate healthcare at the population level for cancer control has been a long-standing concern (11, 12). To investigate how to progress this unmet need, each Australian state and territory cancer registry is collaborating with Cancer Australia (a government agency established in 2006 to benefit all Australians affected by cancer) to scope out current collection methods and explore sustainable solutions for routine capture. The Staging, Treatment and Recurrence (STaR) project in 2015 was an early initiative aimed at collecting 2011 cancer stage data in Australian PBCRs (13).
2011 STaR project and current cancer staging approaches in Australia
The 2011 STaR project, piloted by Cancer Australia in collaboration with the Australian Institute of Health and Welfare (AIHW) and state and territory cancer registries, is the sole national-level initiative for gathering staging data (13). The staging data only captured those diagnosed in 2011, and the data from this pilot remains the most recent available (8). It aimed to improve cancer outcomes by providing consistent and accurate staging information to healthcare professionals, researchers, and patients (14).
The STaR project required PBCRs to provide a registry-derived stage (RD-stage) for the five highest-incidence cancers diagnosed in 2011 (prostate, breast, lung, colorectal and melanoma) (2, 13). RD-stage was defined as the stage category at diagnosis obtained from notification sources routinely available to PBCRs and derived using simplified AJCC business rules and algorithms developed by the Victorian Cancer Registry (VCR) (2). Business rules were developed to articulate the decision-making process used to define each stage category (15). The STaR project yielded nearly comprehensive national cancer staging information, with the exception of lung cancer, for which almost one-third of staging data remained unknown. However, it required significant manual effort and training for registry coders to extract TNM data from the mandatory notification sources, as well as adequate resources for applying business rules (15). Additionally, the time spent on deriving RD-stage impacted routine coding processes (2). In Western Australia, for example, participation was entirely dependent on short-term additional project funding, which was not sustained.
Since the conclusion of the STaR project, a limited number of state and territory cancer registries have continued collecting staging information within the constraints of their data pipelines and available resources – Victoria was the only PBCR that continued with RD-stage business processes (10). Other PBCR approaches range from foundational efforts like manually collecting explicit pathological stage data (pTNM) from pathology reports, to developing data science techniques such as natural language processing (NLP) and machine learning (ML) to automate and facilitate extracting information from relevant data sources, reducing or eliminating manual intervention. Figure 1 summarises the current operational business processes of each jurisdictional PBCR for routinely recording cancer stage, the activities that are currently under development by PBCRs, including NLP and ML extraction, as well as efforts and barriers to enhancing data availability (10). The Western Australian Cancer Registry (WACR) is developing NLP and ML techniques through the Western Australian (WA) Cancer Staging Project and, to recognise the implications of utilising diverse data sources for staging, it has proposed a tiered framework for the ongoing collection of cancer stage at diagnosis.
[INSERT Figure 1]
The need for a Cancer Staging Tiered Framework
Our recent research recommends the use of a tiered framework to standardise cancer stage collection, addressing variable data maturity levels among PBCRs throughout Australia (4, 8). The tiered approach not only promotes data standardisation and comparability but also serves as an implementation strategy for capturing stage at diagnosis using existing data, allowing adjustments as data quality and completeness improve. The cancer staging tiered framework enables PBCRs to assess their data systematically, preventing the comparison of incomparable data, and recognising the variability in staging information. By providing a structured methodology, it also ensures adaptability to future updates in staging classification systems, templated and structured pathology reporting advancements, and enhancements in data accessibility. Such an approach has the potential to harness all available data sources, address gaps in national cancer staging comparisons, and yield more accurate estimates of cancer stage at diagnosis, ultimately permitting assessment of patient outcomes and healthcare evaluation at the population level.
Aim
The aim of this paper is to demonstrate the application of a cancer staging tiered framework by the WA Cancer Staging Project in the WACR to establish a standardised method for collecting cancer stage information in PBCRs.
This paper does not adhere to a standard research format, and therefore its remaining structure is organised as follows: 1) Overview of the current approach to collecting cancer stage in the WACR, 2) Development of the cancer staging tiered framework, encompassing the business rules, 3) Application of the cancer staging tiered framework in the WACR to breast, colorectal and melanoma staging data. This is followed by discussions on the quality implications and appropriate use of staging data, the transition between tiers, and considerations for futureproofing the framework.
Collecting cancer stage in the WACR
WACR background
Since 1982, the WACR has provided population-based data on invasive cancer incidence and outcomes (survival and mortality) for use in health service planning, cancer control evaluation, and to support cancer-related research (16). The main sources of information to the WACR are reports from pathologists, haematologists, and radiation oncologists, supplemented by death registrations, hospital statistical discharge records, as well as information from hospital files and clinical information systems. The WACR collects detailed information on patient demographics, tumour-specific details, and diagnosis information.
WA Cancer Staging Project
In collaboration with the WACR, the Cancer Network WA has provided funding to Curtin University since June 2021 to support the WA Cancer Staging Project, which aims to develop and deliver statewide population-based staging in the registry. The project is establishing sustainable data collection methods, including NLP and ML algorithms, to decrease reliance on manual extraction. A Project Advisory Group (PAG), along with expert tumour-specific clinical working groups, offers strategic advice and guidance to the project. Further information on the WA Cancer Staging Project has been published in our recent process evaluation, which explored key stakeholders’ perceptions of implementing cancer staging in the WACR (8). The findings from our process evaluation highlighted major barriers to collecting cancer staging data, primarily stemming from a lack of standardisation, which in turn limits opportunities for benchmarking and for collaboration in cancer research and care.
Collecting cancer stage
The WACR relies primarily on pathology data as the source of cancer incidence. Information on the extent of disease (regional and distant involvement), which is necessary for cancer staging and is often captured in radiology reports, is not routinely notified to the WACR (16). Since 2018, the WACR has opportunistically collected cancer staging data by manually extracting TNM information from pathology reports during routine coding. This data has been collected based on explicit reporting of TNM values within the pathology report, has not undergone validation and remains incomplete in its capture. For example, only patients who undergo resection of their primary tumour will have pathological stage (pTNM) documented in their pathology report for WACR coding staff to collect. Consequently, there is a possibility of under-staging patients without additional clinical correlation to determine the extent of the disease. This approach also excludes patients who are not suitable for resection of the primary tumour, especially those with advanced disease.
To facilitate the routine and comprehensive collection of cancer stage in both WACR and other Australian PBCRs, steps must be taken including integration of additional data sources, implementation of staging procedures (business rules), and infrastructure reform. The capacity of the WACR to collect cancer staging within the routine coding process has been limited by the manual effort required, need for trained personnel, the restricted data entry fields in the bespoke WACR database, and the incompleteness of cancer staging information due to the lack of access to radiology reports and other data sources (such as multidisciplinary team (MDT) meeting notes), as highlighted in our process evaluation (8).
To address these challenges in the WACR, the database and data collection tool will need to be enhanced to incorporate additional data fields capturing staging information and other important data elements from multiple sources, including coded hospital admitted patient data (known as the hospital morbidity data collection (HMDC) in WA), which contains International Statistical Classification of Diseases and Related Health Problems, Tenth Revision, Australian Modification (ICD-10-AM) coding. During our process evaluation, a significant concern arose regarding the outdated WACR database’s ability to accommodate staging information (8). Since updating the existing fields in the registry’s database is not currently possible, the WA Cancer Staging Project has created Research Electronic Data Capture (REDCap) platforms to store and manage all cancer staging information that is currently being collected (17). In the absence of primary sources such as radiology reports (e.g., computed tomography (CT) scans, positron emission tomography (PET) scans, and magnetic resonance imaging (MRI)), the WACR must rely on secondary data, specifically the HMDC, to collect information on disease spread for staging purposes. All HMDC data elements are collected as individual variables in REDCap, separate from TNM information obtained from primary sources (e.g., pathology reports). The HMDC data can complement primary sources, and storing them separately enables assessment of dependence on secondary sources and of how this reliance might evolve over time. The inclusion of secondary data in the routine collection process requires systematic review of all HMDC records that occur within a pre-specified time frame of the initial diagnosis date. The timing rule for HMDC collection inclusion was taken from the AJCC and 2011 STaR definitions for determining stage at diagnosis, which specify 4 months (120 days) from the date of diagnosis as the window for staging data collection (5, 18).
The time frame restriction is critical for accurately determining the extent of the disease prior to initiating first treatment and ensuring the most accurate estimation of TNM staging at diagnosis (5). The WA Cancer Staging Project worked closely with the PAG to define business rules for utilising all data sources in cancer stage assignment. These rules cover various aspects, including: defining inclusion dates for primary and secondary data (e.g., 120 days from the date of diagnosis); determining priority through decision-tree logic (for instance, favouring more advanced TNM values in case of conflicting clinical reports); and allocating stages within the cancer staging tiered framework, as examples. These business rules were also heavily informed by those used in the 2011 STaR project.
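Two of the business rules described above, the 120-day inclusion window and the preference for the more advanced value when sources conflict, can be illustrated with a minimal sketch. The function names, the record layout, and the ordering of M values are hypothetical and are not taken from the WACR rule set; the sketch also assumes that the window runs forward from the diagnosis date and is inclusive at both ends.

```python
from datetime import date, timedelta

STAGING_WINDOW_DAYS = 120  # 4 months from the date of diagnosis, as stated in the text

# Hypothetical ordering of M values from least to most advanced.
M_ORDER = {"MX": 0, "M0": 1, "M1": 2}


def within_staging_window(diagnosis_date, record_date, window_days=STAGING_WINDOW_DAYS):
    """True if the record falls on or after diagnosis and within the window (assumption: inclusive)."""
    return diagnosis_date <= record_date <= diagnosis_date + timedelta(days=window_days)


def most_advanced_m(diagnosis_date, records):
    """Pick the most advanced M value among in-window records.

    records: iterable of (record_date, m_value) pairs, e.g. from HMDC episodes.
    Returns "MX" when no record falls inside the window.
    """
    eligible = [m for d, m in records if within_staging_window(diagnosis_date, d)]
    if not eligible:
        return "MX"
    return max(eligible, key=lambda m: M_ORDER[m])


diagnosis = date(2019, 3, 1)
episodes = [
    (date(2019, 3, 20), "M0"),  # in window
    (date(2019, 5, 15), "M1"),  # in window, more advanced
    (date(2019, 9, 1), "M1"),   # outside the 120-day window, ignored
]
print(most_advanced_m(diagnosis, episodes))
```

Keeping the window length as a named constant makes it straightforward to re-run the derivation if the timing rule changes in a future staging edition.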
Developing the Cancer Staging Tiered Framework
At the outset of the WA Cancer Staging Project, the framework was developed to provide guidance and flexibility for the collection of cancer stage data. It acknowledged the diversity of stage data collection in Australia, emphasising that it is not a one-size-fits-all approach, and recognising that data restrictions are often encountered (4). The tiered framework is a set of rules for staging that incorporates different available data sources and presents an explicit hierarchy of completeness (Figure 2). Tier 1 (the gold standard) facilitates the full AJCC TNM Staging Classification and provides staging information suitable for both epidemiological and clinical use. The lowest level (Tier 3) describes pathology derived stage using basic information available to all registries. While this tier is the least complex, and therefore most achievable, there is a significant risk of under-staging. Tier 2 provides a middle ground by incorporating available secondary data sources to partially fill the gap between full AJCC TNM and pathology derived stage. The tiered framework was aimed at ensuring long-term data integrity, facilitating interoperability (i.e., explicit understanding of the level of staging) for sharing and collaborating using staged data, and, lastly, standardisation for stage categorisation and reporting across time and/or jurisdictions.
[INSERT Figure 2]
Project Advisory Group Involvement
The cancer staging tiered framework was collaboratively developed with the WA Cancer Staging Project’s PAG, which included a range of expertise including healthcare professionals and specialists, Department of Health WA registry and coding staff, consumer representatives (patients with lived cancer experience), health researchers, and cancer organisations (8). The development of the tiered framework was an iterative process that included reviewing the evidence-base and presenting the evidence at consultative meetings with the PAG. Additionally, working groups, primarily consisting of clinical staff specialising in the specific cancer type for which cancer stage data were being collected, were actively involved in the development of the business rules (See Supplementary Material 1 for list of PAG and working group members). Figure 3 summarises this process.
[INSERT Figure 3]
Business rules at each tier
The following section details the business rules of the tiered framework as implemented in the WA Cancer Staging Project for collection of stage at diagnosis for the following tumour groups: breast cancer, colorectal cancer, and melanoma. The subsequent section will then discuss the quality implications and appropriate use of information at each tier, the process of transitioning between tiers, and future proofing the cancer staging tiered framework.
Tier 1: Complete AJCC TNM
A Tier 1 classification is reached when complete AJCC TNM 8th edition can be collected, including all individual TNM values (category and subcategory), summary stage, AJCC staging version, prefix classifications, tumour-specific fields (e.g., depth of invasion for colorectal cancer or hormone receptor status for breast cancer), and information related to the data source (e.g., MDT meeting notes) (see Figure 2). Complete data is collected from trusted clinical sources, such as MDT software. In some cases, the complete TNM staging information may be available in the pathology report if the reporting pathologist has transferred clinical staging information (such as the M category) from the electronic health record or MDT notes into the report. This is often reported using the “c” prefix to denote clinical staging information (see Table 1). This tier would also be suitable for including any prognostic staging scores. Additional information on the collection of prognostic staging is provided in the section titled ‘Future proofing the cancer staging tiered framework’. The recommended minimum dataset and data sources for each tier are available in Supplementary Material 2. Tier 1 data is suitable for clinical and epidemiological population-based analyses.
Table 1. Tier 1: Complete AJCC TNM (Colorectal Cancer)
| Pathology Report with Clinical Staging Information | | | Summary Stage |
| Tumour (T) | Nodes (N) | Metastasis (M) | |
| pT1 | pN2 | cM1a | Stage IVA |
Based on the data available above, the final TNM derived is pT1N2cM1a, Stage IVA. The bolded “p” and “c” represent “pathological” and “clinical” as the value’s data source.
Tier 2: Registry-Derived stage
The WACR derives RD-stage using available data sources where full AJCC TNM cannot be collected. Within the cancer staging tiered framework, this is classified as Tier 2. Tier 2 builds and expands on Australian RD-stage methods (STaR project), and the collection includes individual TNM values where possible, summary stage, AJCC staging version, prefix classifications, tumour-specific fields, and the data source of TNM values (e.g., currently pathology reports or HMDC). In the WACR, Tier 2 leverages data supplementation from secondary data in the HMDC to make assumptions for nodal and distant metastases (see Table 2). HMDC is available from public and private facilities. To assign a summary stage, missing or not stated variables are assumed to be absent (i.e., NX=N0 or MX=M0). For example, if a secondary metastatic disease code is present in HMDC (M=1), summary stage IV is assigned in colorectal cancer. In contrast, the absence of a secondary metastatic disease code (MX = M0) and a positive nodal involvement code (N=1) would result in assigning summary stage III. A limitation of Tier 2 collection is that the subcategory of nodal and distant metastases (e.g., N2a, where four to six regional lymph nodes are positive in colorectal cancer) cannot be attained from ICD-10-AM coding in the HMDC. The ICD-10-AM coding only provides binary (yes/no) detail as to whether there are involved lymph nodes or secondary metastases present and does not provide the count of involved nodes or distant sites required for subcategory classification (see Table 3). This may potentially lead to under-staging; however, it still allows for the appropriate allocation of TNM within the main (umbrella) stage category. Since RD-stage is derived solely from this limited dataset and excludes additional factors like radiology reports and clinical correlation to assign stage, the data generated is recommended for primary use in population-based epidemiological studies.
Table 2. Tier 2: RD-Stage Only (Colorectal Cancer)
| Pathology Report | | | Hospital Morbidity Data Collection | | Summary Stage |
| Tumour (T) | Nodes (N) | Metastasis (M) | Nodes (N) | Metastasis (M) | |
| pT1 | NX | MX | N1 | M0 | RD-Stage III |
Based on the data available above, the final TNM is T1N1M0, RD-Stage III, using the business rule assumptions that a positive lymph node ICD-code is equal to N1.
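The Tier 2 derivation for colorectal cancer can be sketched as follows. This is an illustration only: the function names are hypothetical, the precedence of an explicit pathology value over an HMDC value is an assumption, and the Stage I and Stage II branches use the simplified AJCC grouping for colorectal cancer (T1 to T2 versus T3 to T4 with N0 M0), which is not spelled out in the text above.

```python
def _core(value):
    """Drop prefix letters such as p, c or y, e.g. 'pN2a' -> 'N2a'."""
    return value.lstrip("pcy")


def _resolve(path_value, hmdc_value, axis):
    """Resolve the N or M value for one axis.

    Assumption: an explicit pathology value wins, then the HMDC value;
    if both are missing or 'X', the value is assumed absent (NX=N0, MX=M0).
    """
    for candidate in (path_value, hmdc_value):
        if candidate is not None:
            core = _core(candidate)
            if core != f"{axis}X":
                return core
    return f"{axis}0"


def rd_stage_colorectal(path_t, path_n="NX", path_m="MX", hmdc_n=None, hmdc_m=None):
    """Return (TNM string, RD-stage) using simplified, illustrative grouping rules."""
    t = _core(path_t)
    n = _resolve(path_n, hmdc_n, "N")
    m = _resolve(path_m, hmdc_m, "M")
    if m.startswith("M1"):
        stage = "IV"       # any secondary metastatic disease code
    elif n.startswith(("N1", "N2")):
        stage = "III"      # nodal involvement without distant metastasis
    elif t.startswith(("T3", "T4")):
        stage = "II"       # simplified grouping, assumption
    elif t.startswith(("T1", "T2")):
        stage = "I"        # simplified grouping, assumption
    else:
        stage = "unknown"
    return t + n + m, "RD-Stage " + stage


# The Table 2 example: pT1 from pathology, N1 and M0 from HMDC.
print(rd_stage_colorectal("pT1", "NX", "MX", hmdc_n="N1", hmdc_m="M0"))
```

Because HMDC codes are binary, the sketch never produces a subcategory such as N2a from HMDC input, which mirrors the limitation described for Tier 2.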
Table 3. Tier 1 and 2 Comparison (Colorectal Cancer)
| Pathology Report | | | Hospital Morbidity Data Collection | | Tier 1 Summary Stage | Tier 2 Summary Stage |
| Tumour (T) | Nodes (N) | Metastasis (M) | Nodes (N) | Metastasis (M) | | |
| pT1 | pN2a | M0 | N1 | M0 | Stage IIIA | Stage III |
Based on the data available above, the Tier 1 TNM derived is T1N2aM0, Stage IIIA. If pN2a was not available in the pathology report and HMDC is the only data source for nodal involvement, this will be categorised as Stage III (pT1 from pathology and N1M0 from HMDC) without the subcategory detail of Stage IIIA because information on the number of involved nodes is not available in HMDC. Nodal values are bolded for easy comparison.
Tier 3: Pathology Stage
Tier 3 (also termed “Pathology Stage”) applies when neither Tier 1 nor Tier 2 can be collected; in these cases, the WACR collects only the pathological stage described in pathology reports. The collection includes individual pTNM scores where available, AJCC staging version, prefix classifications, and tumour-specific fields (see Table 4). In the WACR, not all patient events are recorded in HMDC, with the most common examples being patients who are treated privately or as an outpatient as they are not admitted to a public hospital. A limitation of relying solely on pathology stage is that the summary stage may not accurately represent the complete extent of disease, potentially resulting in under-staging, especially when patients have clinically confirmed metastatic disease that is not reported in HMDC. Additionally, this approach is susceptible to bias since it predominantly includes patients undergoing surgical modalities. In instances where patients have received neoadjuvant therapy (NAT) before resection, which is often the preferred approach for certain cancer types, the pathology report may not explicitly indicate whether NAT was administered prior to the resection. This can occur because the pathologist is unaware of the patient’s prior treatment or has not used the “yTNM” classification. The data generated through the Tier 3 method is recommended for epidemiological studies to offer a minimum level of insight into disease patterns and the population-level burden of the disease.
Table 4. Tier 3: Pathology Stage (Colorectal Cancer)
| Pathology Report | | | Summary Stage |
| Tumour (T) | Nodes (N) | Metastasis (M) | |
| pT1 | pN2a | pMX | Stage IIIA |
Based on the data available above, the final pathological stage is pT1N2aMX, Stage IIIA. This data does not consider admitted hospital data, clinical correlation with radiological imaging or MDT consultation. The bolded “p” represents “pathological” as the value’s data source.
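The hierarchy across the three tiers can be summarised as a simple decision sequence. The sketch below is a deliberately coarse illustration: the three boolean flags and the class and function names are hypothetical, and the real business rules consider many more fields (prefixes, staging version, tumour-specific items) than are shown here.

```python
from dataclasses import dataclass


@dataclass
class StagingSources:
    """Hypothetical summary of what is available for one case."""
    complete_clinical_tnm: bool  # full T, N and M from MDT notes or a report carrying clinical values
    pathology_tnm: bool          # explicit pTNM values present in a pathology report
    hmdc_linked: bool            # at least one HMDC record inside the staging window


def assign_tier(src):
    """Return the highest tier the available sources support (illustrative rules only)."""
    if src.complete_clinical_tnm:
        return "Tier 1"
    if src.pathology_tnm and src.hmdc_linked:
        return "Tier 2"
    if src.pathology_tnm:
        return "Tier 3"
    return "Unstageable"


print(assign_tier(StagingSources(False, True, True)))
```

Checking the tiers from most to least complete means a case is always labelled with the best tier its data can support, and the label can be re-derived as new sources become available.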
Applying the Cancer Staging Tiered Framework: The WACR experience
Given the current data pipelines and infrastructure in the WACR, Tier 2 is achievable in most instances and is anticipated to remain the primary classification for most cases in the WACR well into the future. However, Tier 1 (Complete AJCC TNM) remains the gold standard for stage collection in the registry, should the data sources become available (8).
Table 5 illustrates the application of the cancer staging tiered framework to breast cancer, colorectal cancer, and melanoma cases collected in the WA Cancer Staging Project. Among the melanoma cases reported for 2019-2020 (N=3049), the completeness of staging data varied across the defined tiers: Tier 1 was reached by only 1% of the cohort (N=20), while Tier 2 was reached by 98% (N=2981), as expected. Tier 3 accounted for fewer than 1% of cases (N=4), and 1% of cases were categorised as unstageable due to the absence of available staging data (N=44).
In contrast, the staging data for colorectal cancer cases in 2019 (N=999) were spread more widely across tiers: Tier 1 was attained by 18% (N=182), Tier 2 by 60% (N=598), and Tier 3 by 2% (N=21). Additionally, 20% of colorectal cases were classified as unstageable (N=198). The staging data for breast cancer cases in the same year (N=1712) resembled the distribution observed in colorectal cancer cases more closely than that of melanoma. Specifically, Tier 1 was attained by 5% of cases (N=84), Tier 2 by 72% (N=1229), and Tier 3 by 1% (N=18). Unstageable cases comprised 22% of the breast cancer cohort (N=381).
The differences in tier distribution among the cancer types can be attributed primarily to the distinct treatment approaches adopted for each group of cancers, as recommended by the optimal care pathways (19). As an example, stage data is frequently found in histopathology reports for cancer types that necessitate immediate resection post-diagnosis and have a higher incidence of early-stage cancer detection, like melanoma.
Table 5. Tiered staging framework application in melanoma, colorectal cancers, and breast cancers
| Tier | Melanoma 2019-2020 (N=3049), N (%) | Colorectal Cancer 2019 (N=999), N (%) | Breast Cancer 2019 (N=1712), N (%) |
| Tier 1: Full AJCC TNM | 20 (1) | 182 (18) | 84 (5) |
| Tier 2: RD-Stage | 2981 (98) | 598 (60) | 1229 (72) |
| Tier 3: Pathology Stage | 4 (0) | 21 (2) | 18 (1) |
| Unstageable | 44 (1) | 198 (20) | 381 (22) |
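The percentages reported in Table 5 follow directly from the case counts, rounded to the nearest whole number. The short sketch below recomputes them; the dictionary layout and function name are illustrative choices, while the counts are those given in the table.

```python
# Case counts as reported in Table 5.
TABLE5_COUNTS = {
    "Melanoma 2019-2020": {"Tier 1": 20, "Tier 2": 2981, "Tier 3": 4, "Unstageable": 44},
    "Colorectal Cancer 2019": {"Tier 1": 182, "Tier 2": 598, "Tier 3": 21, "Unstageable": 198},
    "Breast Cancer 2019": {"Tier 1": 84, "Tier 2": 1229, "Tier 3": 18, "Unstageable": 381},
}


def tier_percentages(counts):
    """Return (cohort total, {tier: percentage rounded to a whole number})."""
    total = sum(counts.values())
    return total, {tier: round(100 * n / total) for tier, n in counts.items()}


for cohort, counts in TABLE5_COUNTS.items():
    total, pct = tier_percentages(counts)
    print(cohort, total, pct)
```

Rounding to whole percentages is why the four melanoma Tier 3 cases appear as 0% in the table even though the count is not zero.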
Quality implications and appropriate use of staging data
The cancer staging tiered framework allows for the utilisation of staging data, irrespective of its level of completeness. Additionally, this framework enables PBCRs to assess their data in comparison to other PBCRs and exercise caution when interpreting data across various tiers.
To ensure the reliability and accuracy of the cancer stage collection at each tier, ongoing data quality validation processes are in progress. The staging data, currently extracted from pathology reports and supplemented by HMDC where necessary, undergoes ad-hoc validation. In this process, the WA Cancer Staging Project-collected data is compared with hospital clinical datasets containing staging data, particularly clinician-collected databases. As the WACR acquires additional existing stage datasets, further validation will take place. Additionally, oversight for HMDC is carried out by the Department of Health WA Data Quality Team, which executes formal validation processes on the dataset (20).
The depth and specificity of information available at each tier directly influence the accuracy and quality of cancer staging. When considering individual patients, the issue of data completeness arises because of variations in treatment pathways; not all patients undergo the same number of healthcare service interactions, resulting in differences in the availability of the detailed information required for stage calculation. Tier 1 achieves full data completeness by relying on comprehensive clinical data, offering the highest level of clinical accuracy in cancer staging as it directly draws from patient-specific clinical information, such as clinical and pathological correlation. In contrast, Tier 2 relies on assumptions about nodal and distant metastases based on secondary administrative data (hospital admitted patient data). While Tier 2 still provides reasonably accurate staging information, there may be some reduction in clinical accuracy. The accuracy of Tier 2 varies according to the cancer type; for example, lung cancer, with a higher incidence of metastatic disease at diagnosis (13), will likely yield more frequent metastatic disease codes in hospital admitted patient data compared to cancer types diagnosed at earlier stages. Lastly, Tier 3 exclusively utilises data from pathology, potentially resulting in an incomplete collection of the extent of disease. The clinical accuracy of this tier also varies according to the cancer type. For instance, earlier-stage cancers, such as melanoma (13), where surgical resection of the primary tumour is the initial treatment, are more likely to have pathological staging available. In contrast, cancers diagnosed at a later stage, where resection is not an option, may lack this information. The tiered approach balances clinical accuracy with data availability, ensuring that cancer staging remains relevant and informative across various data sources and contexts.
Despite potential concerns surrounding the conclusions drawn from the lower tier’s limited data, the insights it provides support a broader understanding of the population’s disease patterns, prevalence, and trends. It also aids in evaluating common risk factors and assessing overall disease burden at a population level, which can prove invaluable throughout public health planning, resource allocation, and policymaking decisions. Even at lower tiers, extracting staging data provides a valuable resource for epidemiological insights that would remain unknown if staging were completely unreported.
When analysing staging information across multiple tiers, the analysis should be conducted at the lowest tier present, as this offers a more conservative and standardised approach and minimises the risk of misclassification associated with lower-tier data (for example, under-staging). For example, in a calendar year with staging data covering both Tier 1 and Tier 2 cases, the analyst should treat the entire cohort as Tier 2, follow the business rules for Tier 2, and use the data for epidemiological analysis only. In certain cases, subsets of the staging data may be analysed separately. For instance, if the public sector’s data contain only Tier 1 staging information due to the integration of MDT meeting software in WA during a calendar year of data capture, this cohort could be analysed for clinical as well as epidemiological use, while the data from the private sector, which may not have adopted MDT meeting software, are kept separate. The inclusion of the tier alongside staging details allows the analyst to interpret and utilise the information appropriately.
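The rule described above can be illustrated with a minimal Python sketch. It is not part of the WACR business rules: the record fields (`case_id`, `sector`, `tier`), the function names, and the example values are hypothetical, and the permitted uses assigned to Tier 3 are an assumption, since only Tier 1 and Tier 2 uses are described here.

```python
from collections import defaultdict

# Tiers are coded 1 (highest) to 3 (lowest), following the framework.
# Assumption: the permitted uses for Tier 3 are not stated in the text above;
# here Tier 3 is treated like Tier 2 (epidemiological use only).
PERMITTED_USES = {
    1: ("clinical", "epidemiological"),
    2: ("epidemiological",),
    3: ("epidemiological",),
}


def cohort_tier(records):
    """Return the lowest tier (largest tier number) present in a cohort."""
    if not records:
        raise ValueError("cohort is empty")
    return max(record["tier"] for record in records)


def analysis_plan(records, split_by=None):
    """Assign one analysis tier, and its permitted uses, to each cohort.

    If split_by is given (for example "sector"), each group is assessed
    separately; otherwise the whole data set is treated as one cohort.
    """
    groups = defaultdict(list)
    for record in records:
        key = record[split_by] if split_by else "all"
        groups[key].append(record)
    plan = {}
    for key, group in groups.items():
        tier = cohort_tier(group)
        plan[key] = {"tier": tier, "uses": PERMITTED_USES[tier]}
    return plan


# Hypothetical records for one calendar year (illustrative values only).
records = [
    {"case_id": "A", "sector": "public", "tier": 1},
    {"case_id": "B", "sector": "public", "tier": 1},
    {"case_id": "C", "sector": "private", "tier": 2},
]

combined = analysis_plan(records)
by_sector = analysis_plan(records, split_by="sector")
```

In this sketch, the combined cohort resolves to Tier 2 and epidemiological use only, whereas splitting by sector keeps the public sector cohort at Tier 1, where both clinical and epidemiological use are permitted.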
Future improvements in staging information and quality could involve integrating Tier 1 and 2 staging data with the routine collection of Patient-Reported Outcome Measures (PROMs) and Patient-Reported Experience Measures (PREMs). PROMs capture patients' self-reported information on health-related aspects, such as symptoms and quality of life, while PREMs gather feedback on overall experiences with healthcare services, assessing satisfaction and perceptions of care (21). Increasingly utilised in Australian registries, PROMs and PREMs have demonstrated benefits, including enhancing transparency of care, facilitating quality assessment, and enabling cost-effectiveness analysis (22). These tools offer the potential for further comparisons with cancer treatments and cancer registries, informing healthcare delivery (22). Integrating staging information with routine patient-reported data provides a more holistic understanding of both the clinical and patient-centred aspects of cancer care. Embedding these data in PBCRs not only supports continuous quality improvement by identifying areas for enhancement in clinical care and patient experiences but also creates opportunities for population research on the relationship between clinical outcomes and patient-reported data, contributing to evidence-based practices. The importance of collecting PROMs and PREMs as essential quality measures has been emphasised by the WA Cancer Staging Project’s PAG and is acknowledged in the literature, including the Australian Cancer Plan (23, 24).
Transition between tiers
Lower tiers are only employed when capacity does not exist to collect the highest tier. A PBCR may collect staging information at all three tiers at any given time. As new data sources emerge, a PBCR might transition towards a higher tier for a greater proportion of cases. For instance, using the earlier example, if MDT meeting software is integrated across the public health sector in WA and linked into the WACR, this integration could facilitate a shift to Tier 1 collection for select cases. This shift is feasible due to the expectation that MDT meeting data will contain explicit clinical stage (cTNM). However, the private sector might lack this capacity, necessitating the continued use of Tier 2 or 3 collection.
Future proofing the tiered staging framework
A tiered staging framework that standardises the collection of cancer stage in PBCRs not only enhances data consistency and comparability, but also ensures adaptability to improved access to more comprehensive data and updates in staging classifications. The framework presented in this paper captures anatomic stage information, offering insights into the extent and location of cancer within the body, as indicated by TNM. However, staging classifications continue to evolve in response to advancements in diagnostic and treatment technologies, alongside the discovery of clinically relevant tumour markers. There is a shift towards a more personalised approach that combines anatomic staging with biological and molecular markers (9). This combination aims to provide a more precise prognostic stage, with the ultimate goal of enhancing prognosis prediction and optimising treatment delivery, leading toward a more individualised approach to cancer management and better outcomes (9). To incorporate prognostic staging into data collection, additional data variables are necessary. Typically, these data variables are available and summarised within data sources at a Tier 1 level, such as in clinical MDT notes. PBCRs in Australia have centred their efforts on the collection of anatomic stage data (10). The data sources necessary for capturing these additional data variables for prognostic staging are either absent from their minimum datasets or have not yet been utilised to enhance their staging information. If a PBCR aims to integrate prognostic staging into its data collection, this expansion would logically align with Tier 1, which has the technical capability to capture the necessary data variables within the existing structure. The tiered framework is therefore dynamic and adaptable to changes in cancer staging classifications and data inputs.
Integration of high-quality data sources and improvement in data collection processes, as advances occur in their use and availability, are necessary for enhancing the collection of high-tier cancer staging data. The inherent flexibility of the tiered framework, which enables registries with limited data (Tiers 2 or 3) to adapt their staging information collection according to available resources, provides a versatile approach to facilitate comparability between population-based cancer registries with varied resourcing. This approach mitigates the risk of data collection efforts being abandoned due to constraints related to data sources and infrastructure. The framework also motivates registries to continually refine their data collection procedures, particularly in recognising potential improvements, which may help to future-proof the tiered staging framework. For instance, the four major pathology laboratory organisations in WA are actively improving the completeness of pathology data supplied to the WACR. This was an outcome of a pathology roadshow delivered by the WA Cancer Staging Project following a process evaluation recommendation (8). Pathologists leading the WA Cancer Staging Project’s pathology working group are currently assessing their compliance with structured reporting standards, resulting in more robust, complete, and comprehensive staging data. These enriched data will be fed into the WACR data pipeline for NLP and ML extraction and then into the tiered approach, increasing the number of patients with complete data and minimising the number of unstageable cases.
The WACR is currently engaged in active discussions with the Australasian Association of Cancer Registries (AACR), highlighting the WA experience (10). As a result, the AACR is working towards establishing a comprehensive national tiered staging framework within Australia, taking into consideration a full assessment of the diverse data sources each state and territory PBCR has access to and the feasibility of collecting each tier (10). A national framework would enable all PBCRs to engage in the national collection of cancer stage data, thereby fostering national benchmarking. To support the implementation of a national framework, our framework, as experienced in the WA setting, provides an estimation of the data sources required to achieve each tier and has assisted with establishing a national tiered staging framework. As potential next steps, assessing the feasibility of applying our business rules to other PBCRs in Australia and testing the adaptability and effectiveness of the framework may prove valuable for advancing a national staging initiative.
Strengths and limitations of this study
A wide range of experts collaborated in developing the cancer staging tiered framework, including a project advisory group and expert clinical working groups comprising healthcare professionals and specialists, the Department of Health WA registry and coding staff, consumer representatives with lived experience of cancer, and delegates from major cancer organisations. The framework’s flexibility in accommodating diverse data collection approaches in recording stage at diagnosis recognises the need for tailored solutions. It supports long-term data integrity, interoperability, and standardisation for effective cancer stage data management. Each tier within the framework incorporates adaptable business rules that can evolve alongside improved resources, data pipelines/sources, and technical capabilities. Additionally, the framework permits the utilisation of staging data regardless of its completeness, facilitates inter-PBCR data comparison, and distinguishes between clinical and epidemiological applications for better data interpretation. A limitation of the framework is that it requires additional resources and time for data collection and management, potentially posing logistical challenges for some PBCRs.