Antigenic diversity and dengue disease risk

Summary: Many pathogens continuously change their protein structure in response to immune-driven selection, resulting in weakened protection. In addition, for some pathogens such as dengue virus, poorly targeted immunity is associated with increased risk of severe disease, through a mechanism known as antibody-dependent enhancement. However, it remains a mystery whether the antigenic distance between an individual’s first infection and subsequent exposures dictates disease risk, explaining the observed large-scale differences in dengue hospitalisations across years. Here we develop an inferential framework that combines detailed antigenic and genetic characterisation of viruses with hospitalised cases from 21 years of surveillance in Bangkok, Thailand, to identify the role of the antigenic profile of circulating viruses in determining disease risk. We find that the risk of hospitalisation depends on both the specific order of infecting serotypes and the antigenic distance between an individual’s primary and secondary infections, with risk maximised at intermediate antigenic distances. These findings suggest immune imprinting helps determine dengue disease risk, and provide a pathway to monitor the changing risk profile of populations and to quantify the risk profiles of candidate vaccines.

It has been shown that in an endemic setting, there are shifts in the antigenic properties of the DENV strains that circulate within any year, and long-term trends with circulating viruses tending to become more antigenically distant over decadal time frames 13 . However, it remains unknown if the antigenic relationship between circulating viruses and those that have previously infected individuals has any bearing on the risk of severe disease. Knowing this could improve our ability to predict the potential risk of newly circulating viruses. It would also provide biological insight into a role for antigenic imprinting to explain an individual's long-term disease risk, where the specific virus that first infects an individual largely determines future disease risk when they are exposed to antigenically different viruses [14][15][16] . Further, the specific antigenic properties of strains used in vaccines could be evaluated for their potential to cause disease from generating sub-neutralisation titres, the magnitude of which may differ across settings due to heterogeneity in strain-specific immune histories and the antigenic characteristics of circulating viruses 17,18 .
Understanding the role of antigenic space in determining disease risk is highly complex. We need a detailed characterisation of the viruses circulating within a community over long time periods, the antigenic properties of the circulating viruses, and an understanding of who is getting sick in that same setting. Hospitalised cases in surveillance systems are also overwhelmingly from secondary infections, which means that the majority of primary infections and their antigenic signature are unobserved 19 . Here we overcome these hurdles by combining detailed genetic (N=2,587 viruses) and antigenic (N=348 viruses) characterisation of DENV isolated in Bangkok, Thailand, from 1994 to 2014, with long-term serotype and age-specific linelist case data from a large children's hospital in the city (N=15,281 cases in individuals 1-14y) (Figure 1A, Figure S1, and Methods). We develop a mathematical framework that integrates over the lifetime exposures of birth cohorts to the virus to explore whether the specific serotypes and antigenic distance between viruses causing primary and secondary infections are linked to disease risk.

Antigenic and genetic characterisation of dengue viruses
We use a detailed characterisation of antigenic space, where viruses from Thailand and 20 other countries were used in neutralisation assays 13 . The viruses were individually tested by plaque (immunofocus) reduction neutralisation test (PRNT), with antisera from African green monkeys that had been inoculated with reference viruses that capture the breadth of DENV antigenic diversity (total of 8,643 neutralisation measurements) 13 . This allowed us to build a three-dimensional antigenic map, where the Euclidean distance between any two viruses is inversely proportional to the capacity for the antiserum raised against one virus to neutralise the other virus. To increase the resolution of the population antigenic profile in any year, we used full genome sequences to place additional viruses 20 circulating in Bangkok from 1994 to 2014 on the antigenic map (Figure 1B). For each sequenced virus, we identified the genetically closest virus that was used to construct the original antigenic map and gave the sequenced virus the same coordinates. As the original map was developed with a broad representativeness of circulating viruses, there was little genetic difference between those used to build the map and the extra sequenced viruses (median difference of 4 amino acids across the genome) (Figure 1C). We consider the sequenced viruses from a serotype as representative of the viruses of that serotype circulating in Bangkok in the study period. This enhanced antigenic map can therefore be used to capture the changing antigenic profile of the virus population.
By considering the viruses that are possibly responsible for an individual's primary and then secondary infections (i.e., sequential in time), we find that the overall mean inter-serotype distance is 5.74 units, ranging from a mean distance of 3.82 units for individuals with a primary DENV-3 infection followed by a DENV-4 infection (equivalent to a 14.1-fold change in dilution), to a mean distance of 6.46 units for individuals with DENV-1 followed by a DENV-2 infection (equivalent to an 88.0-fold change in dilution) (Figure 1D, Table S3). The antigenic distance can be significantly greater or shorter depending on the specific viruses within each serotype responsible for the primary and secondary infections (coefficient of variation of 0.24 across the different serotype pairs) (Table S3).
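The correspondence between antigenic units and fold changes in dilution follows from the map convention that one unit equals one twofold dilution of antiserum. A minimal sketch of that conversion is given here; the function name `fold_change` is illustrative and not part of the original analysis.

```python
def fold_change(antigenic_units: float) -> float:
    """Convert an antigenic distance (map units) to a linear fold change in titre.

    Assumes one antigenic unit corresponds to one twofold dilution of antiserum,
    as stated for the antigenic map.
    """
    return 2.0 ** antigenic_units


# Distances quoted in the text for the closest and most distant serotype pairs.
for distance in (3.82, 6.46):
    print(f"{distance:.2f} units -> {fold_change(distance):.1f}-fold")
```

The same conversion applied to a distance of 5.5 units gives the linear titre difference of roughly 45 quoted for the peak of disease risk.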

Risk of severe dengue by serotype pair and antigenic distance
We use the age and serotype information of cases caused by secondary infections that attended our surveillance hospital to explore whether the specific order of serotypes an individual was infected with, as well as the antigenic distance between the specific viruses, is linked to risk of disease. Secondary infections were identified as part of standard protocols in this hospital based on IgM/IgG ratios between acute and convalescent samples 19 . While we do not know the serotype, virus, and age of each individual's primary infection, we can use the age and serotype information along with our observed distribution of viruses in prior years to integrate over all possible infection histories, where we also estimate the annual force of infection by serotype (Methods).
To model the sequential infections for each case, we assume that the force of infection for a given serotype is equivalent for people at risk of primary and secondary infection, but following a primary infection, susceptibility to other serotypes is mediated by cross-protection that lasts one year, with the magnitude of cross-protection estimated by our model. We assume no homotypic reinfections 10,11 . To assess the role of antigenic diversity in determining disease risk from a secondary infection, we develop three different models where the probability of becoming a case in hospital depends on (1) the identity of the secondary infecting serotype only, (2) the serotype of both the primary and secondary infection, and (3) the serotype of the primary and secondary infection and the antigenic distance between the two infecting viruses (full model). These different model hypotheses were compared using several model comparison metrics (Tables S1, S2). The model that considers the serotype of both the primary and secondary infection performed better than the model only considering the serotype of the secondary infection (ΔDIC = 63). The full model that further incorporates the antigenic distances separating viruses has the best performance as compared to the model only considering the serotype of both the primary and secondary infection (ΔDIC = 32). This provides strong evidence that the serotypes of both the historic primary infecting virus and the secondary virus are needed to explain the observed patterns of disease in hospitals, and moderate evidence that the specific viruses of both infections are also important.
In the full model, we find that the probability of disease was greatest for those with a primary DENV-2 and a secondary DENV-1 infection, with a relative risk of 2.15 (95% credible interval (CrI): 1.49-2.85) compared to a DENV-1 followed by DENV-3 infection (the reference pair) (Figure 2A). Disease risk was consistently lower for secondary DENV-4 infections, consistent with other findings 21 . We estimate that in the first year following primary infection there is a 69.8% (95% CrI: 61.1-76.7%) reduction in the probability of infection, consistent with temporary cross-protection as identified elsewhere 22 . The mean antigenic distance between each serotype pair was not associated with the probability of disease for that pair (p-value 0.77). We find similar relative risk of disease by serotype pair for models that did and did not include antigenic distances between virus pairs (Figure S2A), suggesting that underlying serotype-pair risk differences are mediated through immune mechanisms not linked to quantitative measures of antibody titre. This may include proinflammatory cytokines induced by cross-reactive T cells or other factors [9][10][11][12]23 that are parallel to the progression of humoral immunity.
We find that over and above the effect of the infecting serotype of the primary and secondary infections, disease risk is maximised at intermediate antigenic distances between the two infecting viruses (Figure 2B). We find that peak risk occurs when the antigenic distance between the primary and secondary infection is around 5.5 units (linear titre difference of 45.3). This provides evidence that the changing antigenic profile of circulating viruses within a serotype shifts the disease risk of the population. By translating antigenic distance to an approximation of the absolute titres generated from a primary infection when measured against different secondary infecting viruses, we estimate that peak risk occurs at an absolute titre of around 1:35 (Figure S3). These findings indicate a clear role of antibody titres in driving disease risk, with the specific pair of viruses that lead to primary and secondary infections altering the probability of disease. The magnitude of titres linked to peak risk is consistent with previous work 6,7 that identified the association between intermediate titres averaged across four serotypes and disease risk in longitudinal cohort studies, where the specific antigenic properties of circulating viruses could not be considered due to limited prototype antigens used in the serologic assay.
Using the full model, we find an average annual force of infection of 0.047 (95% CrI: 0.043-0.052) across the 18 years, with an overall steady decline over the time series (Figure S4), consistent with that found elsewhere 24 . Unlike previous efforts, our model also allows us to estimate serotype-specific changes in the force of infection (Figure 3A). We identify a pattern of cycling between the serotypes, with a mean period of 6.3 (95% CrI: 5.6-7.8) years between recurring epidemic peaks caused by the same serotype (Table S4). We find limited correlation between the annual force of infection of the different serotypes (mean absolute correlation of 0.17) (Table S5). We find a strong association between the annual force of infection and the mean number of hospitalised cases across all ages for each serotype in each year (mean correlation coefficient of 0.82; Table S6). To evaluate the adequacy of our model estimations, we reconstruct the serotype and age-specific counts of secondary cases hospitalised in each year from 1997 to 2014, using the posteriors of the full model to simulate the infection history of each person in Bangkok, Thailand (Methods). Results of our reconstruction are consistent with our original data of secondary cases, even for years with distinct serotype patterns (Figure 3B, Figure S5), and can explain 88.1% of the deviance in the annual serotype and age-specific case counts in our surveillance hospital.

Evolution of population disease risk by year and age
We use our full model to capture the changing disease risk for individual birth cohorts. We compare the risk of hospitalisation among those experiencing a secondary infection to focus on the role of antigenic properties of the circulating viruses in driving disease risk, rather than the role of extrinsic factors (e.g., climate factors) (Methods). We find that the relative risk of hospitalisation following a secondary infection can be up to 1.4 times as high for some age groups and years, as compared to the overall mean risk of hospitalisation from a secondary DENV infection across all age groups and years in the study ( Figure 4A). It also shows a distinct pattern beyond the empirical analysis using secondary dengue cases observed in the hospital alone ( Figure S1).
We find that overall, the risk of disease from a secondary infection is much less variable across age groups within a year ( Figure 4B) than across years ( Figure 4C). To assess how the antigenic properties of the circulating viruses and the serotype-specific epidemic patterns can drive the changing risk profile in the population by year, we compare the variability in disease risks by year across the different models. We find a greater year-to-year variability in underlying disease risk when using both the virus and serotype information to estimate disease risks (coefficient of variation of 0.20 for the full model; Figure 4C), compared to using only serotype information to estimate disease risks (coefficient of variation of 0.12 and 0.16 for the other two serotype-based models; Figure 4C). Therefore, alongside the primary and secondary infecting serotypes, interannual variation in the antigenic properties of the circulating viruses has important contributions to the changes in disease risk ( Figures S8-9).
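The comparison of year-to-year variability rests on the coefficient of variation, the ratio of the standard deviation to the mean. A minimal sketch is given here under two assumptions that are not confirmed by the text: the population (rather than sample) standard deviation is used, and the input is a plain list of yearly relative risks. The values in the example are hypothetical.

```python
import statistics


def coefficient_of_variation(values):
    """Return the standard deviation divided by the mean.

    Assumption: population standard deviation (pstdev); the original analysis
    may have used the sample standard deviation instead.
    """
    mean = statistics.fmean(values)
    if mean == 0:
        raise ValueError("coefficient of variation is undefined for a zero mean")
    return statistics.pstdev(values) / mean


# Hypothetical yearly relative risks, for illustration only.
yearly_relative_risk = [0.8, 1.1, 1.3, 0.9, 1.0]
print(round(coefficient_of_variation(yearly_relative_risk), 3))
```

Because the measure is scale-free, it allows the spread of risk to be compared between models whose mean risk differs.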
To assess the extent to which the fluctuating risk of disease by year is driven by the cyclical epidemic nature of the four dengue serotypes as shown in Figure 3A, we repeated the above analysis but instead assumed that all serotypes circulate with the same constant force of infection across all years, with the antigenic properties of the circulating viruses remaining the same. In this scenario, we find that the disease risk from a secondary infection has substantially weakened variance across age groups and years ( Figure S6). These comparisons suggest that the cyclical epidemic patterns of the four serotypes mediate both the cyclical antigenic priming of individuals and their subsequent secondary infections, which in turn propel the development of severe dengue. The extent of such disease progression depends on both the specific serotype and the antigenic characteristics of the sequential infecting viruses.
We further use the full model to assess the respective importance of antigenic imprinting or secondary infection in determining disease risk. In a hypothetical scenario, if disease risk from a secondary infection depends solely on the nature of the primary infecting virus, individuals with the same primary infecting virus would have no variation in the risk of disease across subsequent years, regardless of the secondary infecting virus. Conversely, if disease risk depends solely on the secondary infecting virus, individuals with the same initial infecting virus would have high variability in the year-to-year risk of disease, as the primary infecting virus becomes irrelevant. We find that the variation in the risk of subsequent disease is comparable between individuals with a specific primary infecting virus across different possible secondary viruses (coefficient of variation of 0.86 (95% CrI: 0.27 to 1.68)) and those with a specific secondary infecting virus across different possible primary infecting viruses (coefficient of variation of 0.69 (95% CrI: 0.24 to 1.38)). This shows an approximately equal contribution of the primary and secondary infecting virus to the risk of disease.

Discussion
Our findings challenge the prevailing paradigm that the introduction of a new serotype is responsible for shifts in disease risk in a population. Instead, our findings suggest a more nuanced picture where the specific impact of a new virus on patterns of disease will depend on both the characteristics of that virus and the population immunity derived from previous circulations of antigenically different viruses, with the impact of a particular virus being potentially different across populations with different exposure histories. Our findings are consistent with the concept of original antigenic sin and other related hypotheses, such as immune imprinting, where an individual's first infection largely determines which viruses they are most protected against and contributes to future disease risk by other related viruses 15,25,26 . Our approach of bringing together sequence data, antigenic maps, and surveillance data into integrative frameworks will be relevant to other antigenically variable pathogens, including influenza, SARS-CoV-2, and norovirus 1,2,27 . Such frameworks also provide a route to determining the evolutionary pathways that viruses take in adapting to local immunity.
Our findings support a role for the standardised antigenic characterisation of circulating DENV strains against reference antisera mounting immunity to a diverse set of DENV, as is done with influenza 28 . By characterising the virus-specific immune history of populations, we could quantify setting-specific risk profiles to existing and emerging strains. Systematically quantifying the properties of human sera against previous and current circulating strains will help discern the changes in immune profiles and pathogenesis after natural infection and vaccination [29][30][31] . This will become increasingly important as dengue vaccines begin to get used 12,17,18 . The antigenic distance between vaccine strains and locally circulating strains may help explain underlying differences in the efficacy of vaccines across populations, with the potential of a long-term goal of tailored vaccines based on local antigenic profiles.

Figure 1. Long-term hospital-based dengue case data along with the antigenic and genetic characterisation of dengue viruses in Bangkok, Thailand. (A) Monthly number of secondary dengue cases hospitalised in the Queen Sirikit National Institute of Child Health in Bangkok, Thailand, from 1997 to 2014. Infecting age and year were known for each case, with 69.7% of cases diagnosed with the infecting serotype. The inset illustrates the aggregation of the original case linelist data into the serotype and age-specific case counts per year. Table S7 summarises the age groups analysed for each year. (B) Two-dimensional antigenic map of 2,594 Thailand viruses across four serotypes, coloured by year of isolation. Each coloured circle indicates one of the 348 dengue viruses antigenically characterised using PRNT assay. The size of each circle indicates the number of sequenced viruses placed onto the corresponding map location. Serotype clusters are labelled. Each grid square side in any direction represents one unit of antigenic distance, which is equivalent to a twofold dilution of antiserum in the PRNT assay. (C) Time-calibrated maximum clade credibility phylogenies built with sequenced viruses from each serotype. Coloured circles at the tips of each phylogeny indicate viruses selected for antigenic characterisation. In the inset, coloured bars indicate the distribution of the amino acid (AA) differences across the whole genome of each sequenced virus as compared to its genetically closest virus used for antigenic characterisation, while the grey curve indicates the same distribution but compares each sequenced virus to an antigenically characterised virus randomly selected from those of the same serotype.

Figure 3. [...] (B) Reconstruction of the yearly hospitalised secondary dengue cases by serotype and age. Results for 2008 and 2013, two years with distinct serotype patterns, are shown to illustrate the adequacy of model estimations. Figure S5 provides results for each year from 1997 to 2014. Vertical bars show the serotype and age-specific counts of secondary cases observed in our surveillance hospital. Dots and error bars indicate the median and 95% CrI of the corresponding case counts estimated by simulating the infection histories of individuals using posteriors inferred from the full model (Methods). The observed and estimated case counts are coloured by the identity of the secondary serotype. D1 to D4 indicate the secondary cases of each serotype. ND indicates cases without serotype information.

Figure 4. [...] (Table S7). Colour in the heat map corresponds to the disease risk, which is estimated by adjusting the marginal probability of hospitalisation with the marginal probability of secondary infection for each cohort of individuals based on their year and age of secondary infections (Methods). (B) Mean relative disease risk for individuals of each age across years from 2008 to 2014. (C) Mean relative disease risk for individuals acquiring secondary infection in each year, which averages the estimated relative disease risk over different ages in the same year. In (B) and (C), colours indicate the estimates using different models, with the line and shaded regions indicating the respective mean and 95% confidence interval.

Methods
Hospitalised dengue case data. Linelist data of hospitalised dengue cases were obtained from the Queen Sirikit National Institute of Child Health (QSNICH) in Bangkok, Thailand. QSNICH is the only public children's hospital at the tertiary level in Bangkok that serves dengue cases in children requiring hospitalisation 19 . All suspected dengue cases that underwent hospitalisation at QSNICH were tested using reverse transcriptase polymerase chain reaction or IgM/IgG serology at the Armed Forces Research Institute of Medical Sciences 19 . Primary or secondary infection was determined using dengue hemagglutination inhibition assay and/or dengue IgM/IgG capture enzyme-linked immunosorbent assay (ELISA) 19 . The infecting serotype was determined using serotype-specific PCR and/or antigen-capture ELISA 19 . From 1997 to 2014, there were 11,918 secondary cases, 2,464 primary cases, and 899 cases without infection parity. We only used secondary cases. The infant cases (N = 38) were excluded to avoid the influence of maternally-derived antibodies 9,32 . Only 335 (2.8%) secondary cases were aged above 14 years, partly because young adults tended to attend general hospitals. Using the age groups between 1 and 14, we identified 11,546 secondary cases in QSNICH from 1997 to 2014. The yearly age-specific population data in Bangkok was retrieved from 33 .

Full genome sequencing of dengue viruses.
The selection criteria of QSNICH serum samples for virus isolation and full genome sequencing were described previously 13,20 . In total, 1,848 dengue viruses isolated from QSNICH patients over the 21-year period from 1994 to 2014, including 622 DENV-1, 438 DENV-2, 424 DENV-3, and 364 DENV-4 strains, were sequenced at Walter Reed Army Institute of Research (WRAIR), using Illumina MiSeq or Roche 454 sequencing. An additional 739 dengue viruses isolated from other locations in Thailand also underwent full genome sequencing at WRAIR. Assembly of consensus genomes and construction of consensus sequences were previously described 13,20 .
The Maximum Clade Credibility phylogenies of all strains from the same serotype, based on full genome sequences, were built using BEAST v1.10.4 34 , under an HKY codon substitution model 35 , a relaxed lognormal clock model 36 , and a skygrid population size model 37 . Three independent chains were run for each serotype, with parameters sampled every 10,000 iterations. Runs were optimised using the GPU BEAGLE 4 library 38 . For each serotype, combined chains were manually checked for convergence using the Tracer software 39 , with effective sample size (ESS) values > 200. For DENV-2 to DENV-4, the chains were run for 1 to 1.5 billion iterations, where we manually removed the initial 10% of iterations as burn-in depending on the serotype. For DENV-1, considering the large size of the sequence dataset, we allowed some ESS values to be between 150 and 200 to avoid prohibitive computational time. Two chains were run for 900 million iterations with the initial 18% of iterations as burn-in, while the third chain was run for 1.2 billion iterations with the initial 40% of iterations as burn-in to correct for convergence issues.

Antigenic characterisation of the representative dengue viruses isolated from Bangkok.
Among the 1,848 dengue viruses isolated from Bangkok, 348 antigenically representative strains, including 87 DENV-1, 80 DENV-2, 90 DENV-3, and 91 DENV-4 strains, were selected for the antigenic measurements using PRNT assay. The titration experiments used a panel of 20 antisera from African green monkeys 13 . The selection criteria of the representative viruses and antisera for antigenic measurements and the adjustment of titres for experimental conditions were described previously 13 . We constructed the original antigenic map of dengue viruses using the approach of antigenic cartography 40,41 , as described previously 13 .
To provide map coordinates for each of these viruses uncharted in the original antigenic map, we developed the following data curation algorithm. First, consider DENV strains of serotype si isolated at year T, with sequencing data but no antigenic data. We identified all antigenically characterised viruses of the same serotype si isolated between year T-2 and T+2. For each sequenced virus of serotype si isolated at year T, we identified the genetically closest virus from this subset of antigenically characterised viruses and gave the sequenced virus the same coordinates. If multiple sequenced viruses of serotype si isolated at year T were given the same map coordinates, they were merged as a single virus. Then, we identified all antigenically characterised viruses of DENV-2 isolated between 2006 and 2010. For each pair of identified viruses, we used their map coordinates to calculate the corresponding antigenic distance. We excluded those virus pairs with antigenic distance greater than 0.75 (i.e., mean antigenic distance due to the variability of PRNT measurements 42 ). We partitioned the remaining viruses into four groups using k-means clustering 40 , and then aggregated viruses of each group into a single virus by averaging their map coordinates. These procedures introduced 48 additional viruses into the original antigenic map.
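The first step of the curation algorithm, assigning each sequenced virus the coordinates of its genetically closest characterised virus of the same serotype isolated within two years, can be sketched as follows. This is an illustration under stated assumptions rather than the original implementation: sequences are taken to be aligned and of equal length, genetic distance is taken to be the number of differing positions (Hamming distance), and the `Virus` class and `place_on_map` function are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple


@dataclass
class Virus:
    name: str
    serotype: int
    year: int
    sequence: str                       # assumed aligned, equal-length sequence
    coords: Optional[Tuple[float, ...]] = None  # antigenic map coordinates, if known


def hamming(a: str, b: str) -> int:
    """Number of positions at which two aligned sequences differ (assumed metric)."""
    return sum(x != y for x, y in zip(a, b))


def place_on_map(query: Virus, characterised: Sequence[Virus], window: int = 2):
    """Return the coordinates of the genetically closest characterised virus of the
    same serotype isolated within `window` years of the query, or None if there is
    no such virus."""
    candidates = [
        v for v in characterised
        if v.serotype == query.serotype and abs(v.year - query.year) <= window
    ]
    if not candidates:
        return None
    nearest = min(candidates, key=lambda v: hamming(query.sequence, v.sequence))
    return nearest.coords


# Toy example with made-up sequences and coordinates.
reference = [
    Virus("A", 2, 2006, "MKTL", (1.0, 2.0, 3.0)),
    Virus("B", 2, 2007, "MRSV", (4.0, 5.0, 6.0)),
    Virus("C", 2, 2012, "MKTV", (7.0, 8.0, 9.0)),  # outside the two-year window
]
print(place_on_map(Virus("Q", 2, 2006, "MKTV"), reference))
```

Sequenced viruses that receive the same coordinates in the same year would then be merged, as described in the text.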
Our study used the 3D antigenic map to conduct all the inference and analyses, and the 2D antigenic map only for the visualisation of Figure 1B. To ensure the coverage of antigenic distance data for all years and ages of secondary dengue cases hospitalised in QSNICH, we excluded those older age groups with no antigenic distance data (Table S7). The final case data aligned with the antigenic distance data contains 6,903 hospitalised secondary cases, with 69.7% of cases diagnosed with the infecting serotype of the secondary infection.

Modelling dengue virus infection and disease process.
We only analyse the secondary dengue cases observed in hospital, with no information about their primary infections. By considering the serotype, virus, and age of each individual's primary infection as latent variables, the probability that an individual of age aT at year T is hospitalised as a severe dengue case due to the secondary infection with serotype si in that year is given by Eq. (1), where the two terms on the right-hand side are explained as follows.
Primary infection. The first term is the conditional probability of getting secondary infection with serotype si at year T, given the previous primary infection with serotype sj at year Y.
Temporary cross-protection. The immune responses elicited by primary infection generate both serotype-specific antibodies and cross-reactive antibodies 11,12 . The serotype-specific antibodies often maintain at a relatively high concentration over time, providing life-long protection against homotypic reinfections 10,11 . Following primary infection, cross-reactive antibodies can provide temporary cross-protection to reduce the risk of getting heterotypic infections. However, the waning of immunity over time may reduce the cross-reactive antibodies to a low concentration and give rise to sub-neutralising antibody titres that enhance the risk of severe disease 6,7 .
Therefore, we make two assumptions: (1) The homotypic reinfections are unlikely to occur or be observed in hospitals, and hence excluded from our analysis; and (2) the temporary cross-protection reduces the susceptibility to heterotypic infections by a factor of 1  − in the first year following primary infection. To account for the effect of temporary cross-protection, we introduce the effective FOI to adjust the risk of infection at year to characterise the probability of disease given the sequential infections with primary serotype sj and then secondary serotype si.
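To make the two assumptions concrete, the sketch below computes the probability of a secondary infection with a given serotype in a given year, conditional on the serotype and year of the primary infection. It is an illustration only and not the model's actual equations: it assumes a discrete-time annual catalytic form in which the yearly escape probability is the exponential of minus the summed FOI, that cross-protection applies only in the single calendar year after the primary infection, and that no secondary infection occurs in the same year as the primary infection. The names `effective_foi`, `prob_secondary`, and `protection`, and the FOI values, are hypothetical.

```python
import math


def effective_foi(foi, year, primary_year, protection):
    """Scale the FOI by (1 - protection) in the first year after primary infection."""
    scale = 1.0 - protection if year - primary_year == 1 else 1.0
    return foi * scale


def prob_secondary(foi, primary, secondary, primary_year, year, protection):
    """P(secondary infection with `secondary` in `year` | primary infection with
    `primary` in `primary_year`), under the assumptions stated above.

    foi[s][t] is a hypothetical annual force of infection of serotype s in year t.
    """
    if secondary == primary or year <= primary_year:
        return 0.0  # no homotypic reinfection; secondary must follow primary
    heterotypic = [s for s in foi if s != primary]

    def total_hazard(t):
        return sum(
            effective_foi(foi[s][t], t, primary_year, protection) for s in heterotypic
        )

    # Escape every heterotypic serotype between the two infections.
    escape = math.exp(-sum(total_hazard(t) for t in range(primary_year + 1, year)))
    hazard = total_hazard(year)
    if hazard == 0.0:
        return 0.0
    share = effective_foi(foi[secondary][year], year, primary_year, protection) / hazard
    return escape * (1.0 - math.exp(-hazard)) * share


# Hypothetical constant FOI of 0.01 per serotype per year.
foi = {s: {t: 0.01 for t in range(2000, 2006)} for s in (1, 2, 3, 4)}
print(prob_secondary(foi, primary=1, secondary=2, primary_year=2000, year=2003, protection=0.698))
```

Summing such terms over all possible primary serotypes and years is one way to integrate over the unobserved infection histories.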
Relative probability of disease by antigenic distance. Alongside the effect of the two specific serotypes causing primary and secondary infections, we further consider the influence of the specific pair of viruses that are potentially responsible for an individual's primary and secondary infections on the probability of disease from a secondary infection. This is captured by the term To incorporate uncertainty in the virus causing each infection, we average the virus-specific relative probability of disease, as described by Eq. (4), over all possible pairs of viruses that could result in the given trajectory of primary and secondary infections as follows: In our case, let be the whole dataset of the antigenic distance between the two viruses that are responsible for an individual's primary and secondary infections. We partition the range  (Table S2).

Serotype-pair model of dengue disease process.
We consider an alternative model where the probability of disease from a secondary infection depends on the serotype of both the primary and secondary infection: as the ratio between the number of secondary cases having serotype information in year T and the total number of secondary cases observed in year T.
The probability that an individual of age aT at year T acquiring secondary infection with serotype si at year T becomes a severe case with known serotype in hospital is given by: The probability that an individual of age aT at year T acquiring secondary infection with serotype si at year T becomes a severe case in hospital but with no serotype information is given by:

Let $N(T, a_T)$ be the population size for individuals of age $a_T$ at year $T$ in Bangkok. The expected number of secondary cases with known serotype $s_i$ in our hospital is given by: The expected number of secondary cases with no serotype information in our surveillance hospital is given by: where $a_T^U$ is the oldest age analysed for year $T$, for which individuals older than $a_T^U$ in year $T$ are excluded as they do not have antigenic distance data (Table S7).
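The relationship between population size, the probability of becoming a hospitalised case, and the proportion of cases with serotype information can be sketched as below. This is an assumed form for illustration, since it is not taken from the model equations: expected cases are the population size multiplied by the per-person probability of hospitalisation, split by the yearly proportion of cases that were serotyped. The function name `expected_cases` and all input values are hypothetical.

```python
def expected_cases(population, prob_case_by_serotype, prop_serotyped):
    """Split expected hospitalised secondary cases into serotyped and untyped counts.

    population: number of individuals of a given age in a given year.
    prob_case_by_serotype: {serotype: probability of becoming a hospitalised case}.
    prop_serotyped: proportion of that year's secondary cases with serotype information.
    """
    known = {
        serotype: population * prob * prop_serotyped
        for serotype, prob in prob_case_by_serotype.items()
    }
    unknown = population * sum(prob_case_by_serotype.values()) * (1.0 - prop_serotyped)
    return known, unknown


# Hypothetical inputs: 80,000 children of one age, 69.7% of cases serotyped.
known, unknown = expected_cases(
    population=80_000,
    prob_case_by_serotype={1: 0.0004, 2: 0.0006, 3: 0.0003, 4: 0.0001},
    prop_serotyped=0.697,
)
print(known, unknown)
```

By construction, the serotyped and untyped expectations add up to the total expected number of hospitalised secondary cases for that age and year.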

Model fitting: Prior settings and diagnostics.
We infer all model parameters in a Bayesian framework using Markov Chain Monte Carlo (MCMC) method with Hamiltonian Monte Carlo sampling 45 . We implement all inference models using CmdStanR 46 and RStan 47 packages in R. We fit the model using the following weakly informative priors 48 : (1) Serotype-specific FOI per year was sampled with the following prior distribution: (2) Relative change in FOI due to temporary cross-protection was sampled with the prior: ( ) 1 normal 0 , 0.5 0.5 0 ,10 w * ith  +   .
(3) Based on our previous estimates of the case reporting rates in Thailand 33 , the probability of disease by the serotype of the primary and secondary infection was sampled in the following two steps. First, DENV-1 followed by DENV-3 infection was regarded as the reference pair, with the corresponding probability of disease sampled with the prior:
Then, the probability of disease for each of the other 11 serotype pairs relative to that of the reference pair was sampled with the prior distribution:

To fit each model, we use four independent chains with random initialisations. Each chain is run for 45,000 iterations, with the initial 30,000 iterations as warm-up. We assess the convergence of 24 MCMC chains using trace plots and the Gelman-Rubin $\hat{R}$ statistic 45 . We obtain the posterior samples using a thinning interval of 50.
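The convergence check and thinning can be sketched as follows. This is a minimal illustration of the classic (non-split) Gelman-Rubin statistic applied to simulated draws, not the diagnostic code used in the study; Stan's own reported diagnostic may use a different variant. The chain length of 15,000 corresponds to 45,000 iterations minus 30,000 warm-up iterations, and the synthetic draws are an assumption made only so the example runs.

```python
import random
from statistics import mean, variance


def gelman_rubin(chains):
    """Classic potential scale reduction factor for a single parameter.

    chains is a list of equal-length lists of post-warm-up draws.
    """
    m = len(chains)
    n = len(chains[0])
    if m < 2 or n < 2 or any(len(chain) != n for chain in chains):
        raise ValueError("need at least two chains of equal length >= 2")
    chain_means = [mean(chain) for chain in chains]
    within = mean([variance(chain) for chain in chains])   # W
    between = n * variance(chain_means)                    # B
    pooled = (n - 1) / n * within + between / n            # pooled variance
    return (pooled / within) ** 0.5


def thin(draws, interval):
    """Keep every interval-th draw, starting from the first."""
    return draws[::interval]


# Synthetic draws standing in for four well-mixed chains (illustration only).
rng = random.Random(1)
chains = [[rng.gauss(0.0, 1.0) for _ in range(15000)] for _ in range(4)]
r_hat = gelman_rubin(chains)
kept = [thin(chain, 50) for chain in chains]
```

With a thinning interval of 50, each chain of 15,000 post-warm-up draws contributes 300 retained samples, so four chains yield 1,200 posterior samples per model under these assumptions.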
Reconstruction of the serotype and age-specific counts of hospitalised secondary cases in each year from 1997 to 2014.
Using an individual-based stochastic simulation model parameterised with the posteriors inferred from the full model, we simulate the infection and disease process of each person in Bangkok over the study period. We randomly select 40 posterior samples inferred from the full model, each of which is simulated with 100 stochastic realisations. The simulator has the following components.
Pre-calculation of the probability of disease. Before simulations, we first use Eq.

Initial demographics. The first simulation year is set to 1994. Based on the age-specific population data in Bangkok 33 , a total of 76,688 newborns fully susceptible to all four serotypes are introduced at the start of each simulation realisation.

New primary and secondary infections generated in each simulation year.
At the start of each year T, we identify all seronaive individuals fully susceptible to all four serotypes and all exposed individuals who have had only a primary infection before year T.
Then, we simulate the occurrence of primary infections at year T. The probability that each seronaive individual acquires a primary infection with serotype sj at year T is given by:

The probability that each seronaive individual escapes all four serotypes and remains fully susceptible in year T is given by:

Then, we simulate the occurrence of secondary infections at year T. The probability that each exposed individual who has had only a primary infection with serotype sj at year Y acquires a secondary infection with a heterotypic serotype si at year T is given by:

The probability that each exposed individual also escapes secondary infection at year T is given by:

Therefore, for each individual exposed only once before year T through a primary infection with serotype sj at year Y, the process of acquiring a secondary infection with a heterotypic serotype $s_i \neq s_j$ or escaping all three heterotypic serotypes at year T is simulated using a multinomial distribution with probabilities:
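A single multinomial draw of this kind can be sketched as below. Because the infection probabilities themselves are not reproduced here, the sketch assumes a standard competing-risks form in which the probability of escaping all heterotypic serotypes is the exponential of minus their summed force of infection, and infections are allocated in proportion to each serotype's force of infection; it also ignores temporary cross-protection. The FOI values and function names are hypothetical.

```python
import math
import random


def secondary_infection_probs(foi, primary):
    """Outcome probabilities for one exposed individual in one year.

    foi maps serotype -> annual force of infection (assumed values).
    Returns a dict over the heterotypic serotypes plus 'escape'.
    """
    heterotypic = {s: lam for s, lam in foi.items() if s != primary}
    total = sum(heterotypic.values())
    p_escape = math.exp(-total)
    probs = {
        s: (lam / total) * (1.0 - p_escape) if total > 0 else 0.0
        for s, lam in heterotypic.items()
    }
    probs["escape"] = p_escape
    return probs


def draw_outcome(probs, rng):
    """Draw one outcome from the multinomial defined by probs."""
    outcomes = list(probs)
    weights = [probs[o] for o in outcomes]
    return rng.choices(outcomes, weights=weights, k=1)[0]


# Hypothetical forces of infection, for illustration only.
foi = {"DENV-1": 0.03, "DENV-2": 0.05, "DENV-3": 0.02, "DENV-4": 0.01}
probs = secondary_infection_probs(foi, primary="DENV-1")
rng = random.Random(42)
outcome = draw_outcome(probs, rng)
```

Treating infection with each heterotypic serotype and escape as mutually exclusive outcomes of one draw guarantees that an individual acquires at most one secondary infection per simulated year.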

New hospitalisation of secondary cases per year.
The probability that an individual with a primary infection with serotype sj at year Y and a secondary infection with serotype si at year T becomes a severe case in hospital with known serotype is given by:

The probability that an individual with the same infection trajectory becomes a severe case in hospital but with no serotype information is given by:

The disease process of each secondary infection at year T is simulated using a multinomial distribution with probabilities:

Update of demographics. The following procedures run in succession at the end of each year T: (1) the age of each individual increases by one year; (2) individuals older than $a_T^U$ (i.e., the oldest age analysed for year T, Table S7) are removed; (3) each remaining individual is removed at random according to the death rate at year T; and (4) newborns are generated according to the age-specific population data for the subsequent year T + 1.
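The four end-of-year steps can be sketched as a single function. This is an illustrative simplification that tracks only ages, assumes one death rate applied independently to every individual, and takes the number of newborns as a given input; the function name and arguments are hypothetical.

```python
import random


def update_demographics(ages, oldest_age, death_rate, n_newborns, rng):
    """Apply the end-of-year demographic steps to a list of individual ages."""
    aged = [age + 1 for age in ages]                       # (1) everyone ages by one year
    kept = [age for age in aged if age <= oldest_age]      # (2) drop those above the oldest analysed age
    survivors = [age for age in kept
                 if rng.random() >= death_rate]            # (3) random removal by death
    return survivors + [0] * n_newborns                    # (4) newborns enter at age 0


# Hypothetical inputs, for illustration only.
rng = random.Random(7)
no_deaths = update_demographics([0, 5, 17, 18], oldest_age=18,
                                death_rate=0.0, n_newborns=2, rng=rng)
all_deaths = update_demographics([0, 5, 17, 18], oldest_age=18,
                                 death_rate=1.0, n_newborns=1, rng=rng)
```

In a fuller simulator each individual would also carry an infection history, and the same filtering would be applied to whole records rather than to ages alone.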

Characterisation of the evolution of population disease risk by year and age.
We extend the above simulation approach to estimate the changing disease risk for individual birth cohorts by year and age. We compare the estimated disease risks using the full model, the serotype-pair model, and the serotype model. With the parameters inferred from each model, we randomly select 40 posterior samples and use each of them to simulate 100 stochastic realisations. Each realisation first simulates the infection process of the primary and secondary infections for each person in Bangkok across the study period, and then calculates the disease risk by adjusting the marginal probability of hospitalisation with the marginal probability of secondary infection for each cohort of individuals based on their year and age of secondary infections. This simulator has the following components.

Initial demographics. The first simulation year is set to 1994. Based on the age-specific population data 33 , a total of 76,688 newborns fully susceptible to all four serotypes are introduced at the start of each simulation realisation.

Infection process of each individual.
At the start of each year T, we identify all seronaive individuals fully susceptible to all four serotypes and all exposed individuals who have had only a primary infection before year T.
Then, we simulate the occurrence of primary infections at year T. For each seronaive individual, the process of acquiring a primary infection with one serotype or escaping all four serotypes at year T is simulated using a multinomial distribution with probabilities:

For each secondary infection with a given serotype si at year T, the infecting virus is randomly chosen from all co-circulating viruses of the same serotype with probability:

using the posterior estimates of the serotype-specific FOI per year and the effect of temporary cross-protection.
where the last term on the right-hand side is calculated using the infection trajectories of individuals generated from the simulation. For example, using the full model, we have

Data and materials availability. All code and data necessary to reproduce the analyses will be available at the time of publication.

Ethical approval.
This study explored the aggregate counts of hospitalised patients using previously published antigenic maps and dengue viral sequences. The study was approved by the institutional review board of the Walter Reed Army Institute of Research (WRAIR #2624) and was determined not to constitute human subjects research. The study was also approved by

In each year from 1997 to 2007, older age groups without antigenic distance data were excluded from the analysis (Table S7). The observed and estimated case counts are coloured by the identity of the secondary serotype. D1 to D4 indicate the secondary cases of each serotype. ND indicates cases without serotype information.

Figure 1D, Figure S7B uses the inter-serotype antigenic distances without considering the order of the possible infecting viruses.

Table S1. Comparison of three models. The serotype model assumes that the probability of disease given a secondary infection depends only on the identity of the secondary infecting serotype. The serotype-pair model assumes that the probability of disease given a secondary infection depends on the serotype of both the primary and secondary infection. The full model assumes that the probability of disease given a secondary infection depends not only on the serotype of the primary and secondary infection, but also on the antigenic distance between the two viruses that are responsible for the primary and secondary infections.

Model | Leave-one-out (LOO) cross-validation 49,50 | Deviance information criterion (DIC) 51
LOOIC a (95% CI d )   50 .
b The effective number of parameters (p_loo) calculated using the LOO method 50 .
c The effective number of parameters (pD) estimated using the posterior median of parameters 52 .
d Confidence interval.