Median follow-up was similar in the two data sources at each data freeze (Figure 1).
Completeness of death reporting
At each data freeze, the majority of reported deaths were reported by both data sources. Some deaths were captured only by the trial-specific data collection, and more were reported only in the NHS Digital dataset (Figure 2).
Table 1 shows the number of deaths recorded in only one data source and the year in which each death appeared in that source. For either data source, only a few deaths remain persistently missing from the other source (five available only in the trial-specific data collection and eight only in the NHS Digital data).
Table 1: For each data source, the number of deaths at each data freeze known only from that source, broken down by the year in which each death first appeared
(Columns 2013 to 2017 and New give the year in which the death first appeared only in this data source.)

| Year of freeze | Total | 2013 | 2014 | 2015 | 2016 | 2017 | New |
|---|---|---|---|---|---|---|---|
| Available only in trial-specific data collection | | | | | | | |
| 2013 | 4 | - | - | - | - | - | 4 |
| 2014 | 9 | 3 | - | - | - | - | 6 |
| 2015 | 6 | 2 | 4 | - | - | - | 0 |
| 2016 | 11 | 2 | 3 | 0 | - | - | 6 |
| 2017 | 15 | 2 | 3 | 0 | 5 | - | 5 |
| 2018* | 11 | 2 | 3 | 0 | 3 | 0 | 3 |
| Available only in NHS Digital dataset | | | | | | | |
| 2013 | 21 | - | - | - | - | - | 21 |
| 2014 | 31 | 9 | - | - | - | - | 22 |
| 2015 | 85 | 5 | 18 | - | - | - | 62 |
| 2016 | 47 | 4 | 4 | 16 | - | - | 23 |
| 2017 | 92 | 4 | 4 | 16 | 11 | - | 57 |
| 2018 | 68 | 4 | 4 | 12 | 6 | 15 | 27 |
Table 1 Legend: The Total column gives the number of deaths found in only one data source at each data freeze. The columns to the right break this total down: the five columns 2013 to 2017 show the freeze at which the death first appeared in the data set, and the far-right column, labelled 'New', gives the deaths appearing only in that data source for the first time at that freeze. A dash marks a year that had not yet been reached.
*For example: in the 2018 freeze, of the 11 deaths available only in the trial-specific data collection (top part of the table), two were first known in 2013, three in 2014, three in 2016, and three were new in 2018. Assuming that deaths first recorded in 2013 and 2014 will not be found in the NHS Digital data after 2018, these two plus three deaths give five deaths that appear only in the trial-specific data and are completely missing from the NHS Digital data.
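The arithmetic in the footnote can be checked directly against Table 1. The sketch below transcribes the table's counts (a dash becomes `None`), verifies that each row's breakdown sums to its Total, and counts the "persistently missing" deaths at the 2018 freeze under the footnote's stated assumption that deaths first seen in 2013 or 2014 will not turn up later. The function names are illustrative, not part of the study's analysis code.

```python
# Counts transcribed from Table 1. Each entry: freeze year -> (Total, breakdown),
# breakdown order = first appeared in 2013, 2014, 2015, 2016, 2017, then 'New'.
# None stands for a '-' cell (year not yet reached).
TRIAL_ONLY = {
    2013: (4, [None, None, None, None, None, 4]),
    2014: (9, [3, None, None, None, None, 6]),
    2015: (6, [2, 4, None, None, None, 0]),
    2016: (11, [2, 3, 0, None, None, 6]),
    2017: (15, [2, 3, 0, 5, None, 5]),
    2018: (11, [2, 3, 0, 3, 0, 3]),
}
NHS_ONLY = {
    2013: (21, [None] * 5 + [21]),
    2014: (31, [9, None, None, None, None, 22]),
    2015: (85, [5, 18, None, None, None, 62]),
    2016: (47, [4, 4, 16, None, None, 23]),
    2017: (92, [4, 4, 16, 11, None, 57]),
    2018: (68, [4, 4, 12, 6, 15, 27]),
}

def check_totals(table):
    """True if every row's breakdown (ignoring '-' cells) sums to its Total."""
    return all(total == sum(c for c in cells if c is not None)
               for total, cells in table.values())

def persistently_missing(table, freeze=2018, old_years=(2013, 2014)):
    """Deaths still only in this source at `freeze` that first appeared in
    one of `old_years` (assumed never to be found in the other source)."""
    _total, cells = table[freeze]
    return sum(cells[year - 2013] for year in old_years)

print(persistently_missing(TRIAL_ONLY), persistently_missing(NHS_ONLY))
```

Run as-is, this prints `5 8`, the two figures quoted in the text.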
Agreement in death reporting
In 2013 and 2014, the median difference between the dates of death reported in the trial data and in the NHS Digital data was 10 days, ranging from one day to nearly a year (additional file 2, Table A2). The disparities appear to vanish in 2016 and 2017 (additional file 2, Table A2) because the trial team adopted a new policy of simply copying the date of death given in the NHS Digital data into the BOSS dataset. By 2018, the original trial date of death was favoured (additional file 2, Table A2).
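One way to compute such a summary is sketched below. It assumes each source is held as a mapping from a participant identifier to a date of death, and, since the reported range starts at one day, that only deaths with differing dates enter the median. Both the input structure and that exclusion rule are assumptions, and the example data are hypothetical.

```python
from datetime import date
from statistics import median

def date_discrepancies(trial_dates, nhs_dates):
    """(median, min, max) absolute difference in days between the two
    sources' death dates, over deaths in both sources whose dates differ.
    Returns None if no discrepant pairs exist."""
    diffs = []
    for pid, trial_dod in trial_dates.items():
        nhs_dod = nhs_dates.get(pid)
        if nhs_dod is None:
            continue  # death not (yet) reported by NHS Digital
        gap = abs((trial_dod - nhs_dod).days)
        if gap > 0:  # assumption: identical dates are not discrepancies
            diffs.append(gap)
    if not diffs:
        return None
    return median(diffs), min(diffs), max(diffs)

# Hypothetical example data, not study records.
trial = {"P1": date(2013, 5, 1), "P2": date(2013, 6, 10),
         "P3": date(2014, 1, 1), "P4": date(2014, 3, 1)}
nhs = {"P1": date(2013, 5, 11), "P2": date(2013, 6, 10),
       "P3": date(2014, 1, 2), "P4": date(2014, 3, 21)}
print(date_discrepancies(trial, nhs))  # (10, 1, 20)
```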
Timeliness of death reporting
Figure 3 plots the year in which a death was first reported in the BOSS trial-specific data against the year in which it was first reported in the NHS Digital data, and therefore visualises the reporting time lag between the two datasets. Many deaths are reported in the same year by both data sources; most of the remainder are reported earlier by NHS Digital. The small number of deaths reported in the same year in 2015 is due to a trial-specific data extraction error, discovered by the statisticians in 2016.
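The cross-tabulation behind a plot like Figure 3 can be sketched as follows, assuming (hypothetically) that each source is reduced to a mapping from participant identifier to the freeze year in which the death first appeared. The lag summary collapses the table by the difference in years, so positive values correspond to NHS Digital reporting a death first.

```python
from collections import Counter

def first_report_crosstab(first_trial, first_nhs):
    """Counter of (trial year, NHS year) of first report, for deaths present
    in both sources. Inputs: dict participant id -> freeze year."""
    shared = first_trial.keys() & first_nhs.keys()
    return Counter((first_trial[p], first_nhs[p]) for p in shared)

def lag_summary(crosstab):
    """Collapse the cross-tab by lag in years (trial year minus NHS year).
    0 = same year; positive = NHS Digital reported the death first."""
    lags = Counter()
    for (trial_year, nhs_year), n in crosstab.items():
        lags[trial_year - nhs_year] += n
    return lags

# Hypothetical example data.
first_trial = {"A": 2014, "B": 2015, "C": 2016, "E": 2016}
first_nhs = {"A": 2014, "B": 2014, "D": 2015, "E": 2014}
table = first_report_crosstab(first_trial, first_nhs)
print(sorted(lag_summary(table).items()))  # [(0, 1), (1, 1), (2, 1)]
```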
Figure 4 presents the time lag between the date of death and the data freeze in which the death first appears. Each death was assigned to the reporting period, between two consecutive data freezes, in which its date of death falls, so deaths that occurred in one period but were reported only in a later one are marked. This time lag is more apparent in the trial-specific data collection, which also contains one obvious typographical error.
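The period assignment described above can be sketched with a binary search over sorted freeze dates. The sketch assumes that a death on a freeze date belongs to the period ending at that freeze, and that the freeze in which a death first appeared is given as an index into the same list; the freeze dates in the example are hypothetical. A negative lag means the death appeared before its own period had ended, which is the kind of impossible record a typing error in a date would produce.

```python
from bisect import bisect_left
from datetime import date

def reporting_period(death_date, freeze_dates):
    """Index of the first freeze on or after death_date, i.e. the period
    (previous freeze, this freeze] containing the death.
    freeze_dates must be sorted ascending."""
    return bisect_left(freeze_dates, death_date)

def periods_late(death_date, reported_in, freeze_dates):
    """Number of freezes after its own period in which a death first appeared.
    reported_in: index of that freeze. Negative values flag impossible
    records (e.g. a mistyped date of death)."""
    return reported_in - reporting_period(death_date, freeze_dates)

# Hypothetical freeze dates, one per year.
freezes = [date(2013, 11, 1), date(2014, 11, 1), date(2015, 11, 1)]
print(periods_late(date(2014, 3, 1), 2, freezes))  # 1: reported one freeze late
print(periods_late(date(2015, 6, 1), 1, freezes))  # -1: suspicious date
```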
The interval between the date of death and the data freeze was investigated for the RCHD death data (additional file 3, Table 3). In our data sets, all death dates are at least 7 weeks before the data freeze date.
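The 7-week check amounts to a minimum date difference. A minimal sketch, assuming a hypothetical list of death dates from one data freeze and that freeze's cut-off date:

```python
from datetime import date

def min_weeks_before_freeze(death_dates, freeze_date):
    """Smallest gap, in whole weeks (rounded down), between any death date
    and the freeze date."""
    return min((freeze_date - d).days for d in death_dates) // 7

# Hypothetical inputs: the closest death is 61 days before the freeze.
print(min_weeks_before_freeze([date(2017, 12, 1), date(2017, 6, 1)],
                              date(2018, 1, 31)))  # 8
```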
Agreement in cause of death
Table 2: Oesophageal cancer as cause of death within the two datasets for 2018
| NHS Digital data (rows) / BOSS trial-specific data (columns) | OAC | Other | Total |
|---|---|---|---|
| OAC | 15 | 4 | 19 |
| Other | 2 | 267 | 269 |
| Total | 17 | 271 | 288 |
Table 2 legend: The table shows the number of deaths with oesophageal adenocarcinoma reported as the cause of death in each data source. The sources agree when both state a death due to the diagnosis of interest or both report a death due to any other cause. OAC: oesophageal adenocarcinoma; Other: non-OAC cause of death.
In the 2018 data freeze, including all previous years, a total of 288 deaths were reported by both the BOSS trial data and NHS Digital (Table 2). For cause of death, classified as OAC or other, 282 cases (97.9%) agreed. Within the limits of this binary classification, the two data sources therefore distinguished consistently between deaths caused by oesophageal cancer and deaths from any other cause. The remaining six cases (2.1%) disagreed: in two cases the trial data reported a death due to oesophageal cancer that was not recorded as such in NHS Digital, and in four cases NHS Digital reported oesophageal cancer as the primary cause of death, which was not confirmed by trial staff.
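The agreement figures follow directly from the four cells of Table 2: the diagonal cells (OAC in both, other cause in both) are agreements and the off-diagonal cells are disagreements. A short sketch of that calculation, with cell counts taken from Table 2 and parameter names chosen for illustration:

```python
def cause_agreement(oac_both, oac_nhs_only, oac_trial_only, other_both):
    """Simple percentage agreement for a 2x2 cause-of-death table."""
    total = oac_both + oac_nhs_only + oac_trial_only + other_both
    agree = oac_both + other_both          # diagonal cells
    disagree = oac_nhs_only + oac_trial_only  # off-diagonal cells
    return {
        "total": total,
        "agree": agree,
        "agree_pct": round(100 * agree / total, 1),
        "disagree": disagree,
        "disagree_pct": round(100 * disagree / total, 1),
    }

# Cell counts from Table 2 (2018 freeze).
result = cause_agreement(oac_both=15, oac_nhs_only=4,
                         oac_trial_only=2, other_both=267)
print(result)
```

This reproduces the 282 of 288 (97.9%) agreeing and 6 (2.1%) disagreeing cases reported above.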
Implications for overall survival estimates
Figure 5 shows the impact on data maturity and overall survival, not split by allocated treatment group, for every data freeze year and separately for each data source.
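This section does not name the survival estimator behind Figure 5. Assuming the usual Kaplan-Meier product-limit estimator, with each participant followed from randomisation until death or censored at the data freeze, a minimal pure-Python sketch is shown below. The `follow_up` helper, its 365.25-day year, and the rule that deaths after the freeze are not yet visible are all assumptions for illustration; running the estimator once per data source and freeze would give one curve for each.

```python
from datetime import date

def follow_up(randomised, death, freeze):
    """(years, event) for one participant at a given freeze. Assumption:
    a death dated after the freeze is not yet known, so it is censored."""
    if death is not None and death <= freeze:
        return (death - randomised).days / 365.25, 1
    return (freeze - randomised).days / 365.25, 0

def kaplan_meier(times, events):
    """Kaplan-Meier estimate as [(t, S(t))] at each distinct death time.
    times: follow-up durations; events: 1 = death, 0 = censored."""
    data = sorted(zip(times, events))
    at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        # Group all observations tied at time t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            surv *= 1 - deaths / at_risk
            curve.append((t, surv))
        at_risk -= removed  # deaths and censorings leave the risk set
    return curve

# Hypothetical example: five participants, two censored.
print(kaplan_meier([1, 2, 2, 3, 4], [1, 1, 0, 1, 0]))
```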
Diagnosis of new cases of oesophageal cancer
Across both datasets, 47 cases of oesophageal cancer were reported up to 15 November 2016: five only in the NHS Digital data, 34 only in the trial-specific data, and the remaining eight in both datasets.
None of the eight shared cases had the same date of diagnosis in both datasets. The differences ranged from 2 to 28 days, suggesting that the two datasets draw on different information sources.
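The overlap counts and the date differences for shared cases can be obtained with set operations on participant identifiers. The sketch assumes each source is a hypothetical mapping from participant identifier to date of diagnosis; the example data are invented for illustration.

```python
from datetime import date

def case_overlap(trial_cases, nhs_cases):
    """Overlap between two case lists plus sorted absolute differences
    (days) in diagnosis date for cases present in both.
    Inputs: dict participant id -> diagnosis date."""
    trial_ids, nhs_ids = trial_cases.keys(), nhs_cases.keys()
    shared = trial_ids & nhs_ids
    summary = {
        "trial_only": len(trial_ids - nhs_ids),
        "nhs_only": len(nhs_ids - trial_ids),
        "both": len(shared),
        "union": len(trial_ids | nhs_ids),
    }
    gaps = sorted(abs((trial_cases[p] - nhs_cases[p]).days) for p in shared)
    return summary, gaps

# Hypothetical example data.
trial = {"A": date(2015, 1, 1), "B": date(2015, 2, 1), "C": date(2016, 3, 1)}
nhs = {"B": date(2015, 2, 3), "C": date(2016, 3, 29), "D": date(2014, 1, 1)}
print(case_overlap(trial, nhs))
```

With the counts reported above, the same bookkeeping gives 34 + 5 + 8 = 47 cases in total, 34 + 8 = 42 in the trial-specific data, and 5 + 8 = 13 in the NHS Digital data.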
For the 42 oesophageal cancer diagnoses in the trial-specific data, the median time from randomisation to diagnosis was 3.6 years (range 0.1 to 5.3). For the 13 oesophageal cancer diagnoses in the NHS Digital data, the median was 2.8 years (range 0.3 to 5.5). Given the small number of cases and the differences between the datasets, no time-to-event analysis was conducted.
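A summary of this kind (median and range of years from randomisation to diagnosis) can be sketched as below. The input mappings are hypothetical, and converting days to years by dividing by 365.25 is an assumed convention, not one stated in the text.

```python
from datetime import date
from statistics import median

def years_to_diagnosis(randomised, diagnosed):
    """(median, min, max) years from randomisation to diagnosis, rounded to
    one decimal. Inputs: dict participant id -> date; only participants
    present in both mappings are used. Assumes 365.25 days per year."""
    years = [(diagnosed[p] - randomised[p]).days / 365.25
             for p in diagnosed if p in randomised]
    return round(median(years), 1), round(min(years), 1), round(max(years), 1)

# Hypothetical example data.
randomised = {"A": date(2010, 1, 1), "B": date(2011, 1, 1), "C": date(2012, 1, 1)}
diagnosed = {"A": date(2013, 1, 1), "B": date(2012, 1, 1), "C": date(2017, 1, 1)}
print(years_to_diagnosis(randomised, diagnosed))  # (3.0, 1.0, 5.0)
```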