Vaccine adverse event enrichment tests

Vaccination safety is critical for individual and public health. Many existing methods have been used to conduct safety studies with the VAERS (Vaccine Adverse Event Reporting System) database. However, these methods frequently identify many adverse event (AE) signals and they are often hard to interpret in a biological context. The AE ontology introduces biologically meaningful structures to the Vaccine Adverse Event Reporting System (VAERS) database by connecting similar AEs, which provides meaningful interpretation for the underlying safety issues. In this paper, we develop rigorous statistical methods to identify “interesting" AE groups by performing AE enrichment analysis. We extend existing gene enrichment tests to perform AE enrichment analysis, while incorporating the special features of the AE data. The proposed methods were evaluated using simulation studies and were further illustrated on two studies using VAERS data. The proposed methods were implemented in R package AEenrich and can be installed from the Comprehensive R Archive Network, CRAN, and source code are available at https://github.com/umich‐biostatistics/AEenrich.


INTRODUCTION
The Centers for Disease Control and Prevention (CDC) and the U.S. Food and Drug Administration (FDA) conduct post-licensure vaccine safety monitoring using the Vaccine Adverse Event Reporting System (VAERS). 1,2 VAERS accepts spontaneous reports of suspected vaccine adverse events (AEs) after administration of any vaccine licensed in the United States from 1990 to present. As a national public health surveillance resource, VAERS is a key component in ensuring the safety of vaccines. The VAERS report form includes data fields for date of vaccination, vaccine(s) administered, date of onset and description of the AE, recovery status, and other relevant information. VAERS data from the primary reports, with identifying patient information removed, are publicly available on the VAERS website (www.vaers.hhs.gov/data/index); for a detailed description of the VAERS system and its limitations, see Reference 3. Numerous methods have been used to conduct safety studies with the VAERS database. [4][5][6][7][8][9][10][11][12][13][14][15][16] In these methods, a contingency table is generally created to display counts for all vaccine and AE pairs during a specified time period. In this table, each row represents a vaccine and each column represents an AE. Each cell in the table contains the number of VAERS reports that mention both that vaccine and that event for a defined period. A statistical measure is then calculated to quantify the association between an AE and a vaccine. A large value of the measure shows a strong association, which might indicate a vaccine safety problem (called "signal"). A signal is considered evidence that an AE might be caused by vaccination and warrants further investigation or action. However, these methods frequently identify many AE signals and they are often hard to interpret in a biological context. AEs are naturally related; for example, events of retching, dysphagia and reflux all belong to an AE at a higher-level (ie, the abnormal digestive system). AE group analysis (ie, enrichment analysis) is able to provide meaningful interpretation for the underlying safety issues. For example, strong signals of muscle weakness, loss of sensation and seizures might indicate a potential problem in the nervous system. Moreover, enrichment analysis is able to detect AE groups containing weak/moderate but concordant AE signals. For example, the AE group of respiratory system includes five AEs: pneumonia, sinusitis, asthma, bronchitis, and rhinorrhea, which all carry weak to moderate signals. Each individual AE might not be detected due to lack of statistical power, but the respiratory system as a whole might be detected by the enrichment analysis. In this paper, we use the phrase "enriched AE groups" to describe AE groups containing unexpectedly large number safety signals.
The largest resource for AE ontology is MedDRA (Medical Dictionary for Regulatory Activities). 17 It has a five-level hierarchy. VAERS uses the second lowest term, "Preferred Terms" (PT), which is a distinct descriptor for a symptom, sign and disease. Related PTs are grouped into higher-level AE terms, including "High Level Group Terms" (HLGT) and "System Organ Classes" (SOC). Higher layers of HLGT and SOC represent biologically and clinically meaningful categories for the AEs observed on the lower PT level. The AE ontology has been used to classify AE signals. [18][19][20][21] For example, 18 showed that most AE signals identified on the PT level were found to be in behavior/neurological AEs on the SOC level. However, these findings are based on an ad hoc strategy of comparing proportions of signaled AEs between AE groups. In this paper, we present rigorous statistical methods to identify AE group that are associated with a vaccine of interest and quantify AE group uncertainty in the enrichment analysis.
Over the last few decades, bioinformatics methods have used gene ontology to systematically dissect large gene lists in order to assemble a summary of the most enriched and pertinent biology. The first idea for gene enrichment analysis is to take the user's preselected significant genes, and then compare the difference between the proportion of significant genes that fall into the gene set and the proportion of significant genes that do not. 22 There are several drawbacks of this method as discussed in References 23,24. For example, it depends on an arbitrary threshold to select the significant genes, and it is incapable of detecting low but concordant signals (below the used threshold) from genes within a gene set. Also, it relies on the gene-gene independence assumption that is known to be biologically invalid. A more recent approach is the Gene Set Enrichment Analysis (GSEA) method, 25,26 which assumes that few major gene expression changes have a considerable effect on pathways function, and the sum of several weaker and concurrent changes in pathways' genes impact the general functioning as well. 27 GSEA uses gene ranks based on a difference measure, such as fold change, and compares the distribution of gene ranks from the gene set to the distribution for the rest of the genes based on a Kolmogorov-Smirnov statistic. Compared to the "cut-off" strategy based on gene significance, GSEA is more statistical powerful by using more data information of the ranks.
The enrichment analyses for gene sets can be applied to study AE groups. However, important issues exist when gene enrichment analysis is applied to AE enrichment analysis. Unlike continuous gene expression data, AE data are counts, and a large amount of AEs have a zero count. For example, in the VAERS dataset, approximately 40% AEs were never mentioned with the "FLU4" vaccine, resulting in 40% AEs with a zero count. The current gene enrichment tests can not handle excessive zeros. Additionally, we encountered 20% ties in a ratio measure (as defined in the Methods section below) with the count data. The current GSEA assigns random ranks to the tied statistics, which can lead to inaccurate results. In this work, we extend the current enrichment tests to appropriately address these two issues to perform AE enrichment analysis. The proposed method was implemented as R package AEenrich https://CRAN.R-project.org/package= AEenrich.

Data structure
For a particular vaccine (denoted as the target vaccine), we create a 2 × N contingency table (see Table 1), with two rows for the target vaccine (Yes/No) and N columns for the AEs reported in the VAERS database during a study period. In this table, n 1i is the number of VAERS reports that mention both the target vaccine and the ith AE in a defined period, n ⋅i is the total number of reports that mention the ith AE, n 1⋅ is the total number of reports that mention the target vaccine, and n ⋅⋅ is the total number of reports in the study period.

AEKS: AE enrichment analysis based on modified K-S statistic
In this section, we extend the current GSEA 25,26 to handle AE data with ties and excessive zero. Poisson distribution has been commonly used to model the n 1i , 4,11,28 where i is the reporting ratio (RR) for the ith AE with the target vaccine, with a large value indicating a strong safety signal. RRs are the statistics of interest and we will use their maximum likelihood estimates n 1i n ⋅i 's as observed values. Our goal is to determine whether members of a AE group tend to have higher RRs.

2.2.1
Calculate enrichment score for each AE group 1. Rank order the N AEs based on the statistic RR. Assume that there are J distinct RRs (J ≤ N) and we order the N AEs from the highest to the lowest rank as Extend GSEA to handle tied RRs. Given position i in L, evaluate the fraction of AEs in group G ("hits") and the fraction of AEs not in G ("misses"). N G denotes the number of distinct AE terms in group G.
where 1(⋅) is the indicator function. We then compute a running sum across all N AEs. The K-S statistic for AE group G is defined as KS(G) = max 1≤i≤J (P hit (G, i) − P miss (G, i)), which is the maximum value that P hit is above P miss . When many members of G appear at the top of the list, KS(G) is high. 3. Handle zero counts. The maximum likelihood estimate for i is n 1i n ⋅i . Thus, a zero count will produce a zero RR. The above KS(G) treats zero RRs as ties and can not efficiently handle excessive zeros. For example, it may identify a group as enriched if the group has majority of AEs with zero counts and only a few AEs with large signals. To include the number of zero counts in the enrichment test, we define group G as enriched only if it contains a smaller proportion of zero counts compared to the rest of groups. To this end, we let p G 0 denote the proportion of zero counts in group G, p G c 0 denote the proportion not in group G, and create an indicator function by comparing these two zero proportions. The function is zero if p G 0 is larger than p G c 0 . 4. Combine the statistics in 2 and 3, we propose a composite enrichment score (ES) where ES(G) ∈ [0, 1], and ES(G) is zero if the proportion of zero counts in group G is larger than the proportion of zero counts in other groups. ES(G) is large if group G has a smaller proportion of zero counts than the remaining groups and the nonzero counts in group G are concentrated at the top of the list L.

Vaccine/AE i Yes No
Yes n 1i n ⋅i − n 1i No TA B L E 2 A 2 by 2 contingency table for a vaccine-AE i pair

Estimate statistical significance for an AE group
The distribution of ES(G) under the null is not analytically tractable and is obtained using Monte Carlo hypothesis testing. 28 Under the null hypothesis, H 0 : Based on the relationship between Poisson and Multinonial distributions, the joint distribution of (n 11 , … , n 1N ), conditioning on where Given this multinomial distribution, we generate the AE count data and compute ES * (G) using formula (2). We repeat this process M times (M is generally large; say 5000) to create a null distribution of ES * (G). The P-value is the proportion of ES * (G) that is greater than or equal to the observed ES(G). Finally, we apply the Benjamini-Hochberg procedure 29 converting P-values into q-values to control the false discovery rate.

Estimate statistical significance for individual AEs within the AE group
Based on the null distribution generated by the Monte Carlo hypothesis testing method, we can also estimate the statistical significance of each individual AE by computing the proportion of RR * in the null distribution that is greater than or equal to the observed RR. The Benjamini-Hochberg procedure is then used to adjust for the false discovery rate.

AEFisher: AE enrichment test based on modified Fisher's exact test
This approach first assesses the significance of the association between each AE and the vaccine and then uses a "cutoff" strategy to classify the AEs into signaled and unsignaled AEs. To test the significance of the association, we apply the Fisher's exact test to data in a 2 by 2 table (see Table 2) and then use the Benjamini-Hochberg procedure to convert P-values into q-values for controlling the false discovery rate. A signaled AE is defined based on both the strength of the signal and statistical significance, such as q-value <0.1 and odds ratio (OR) >1.5.
To conduct the enrichment analysis for a particular AE group G, a conventional approach is to compare proportions of the signaled AEs in group G and not in group G. If there are significantly more signaled AEs in group G, then group G is enriched. To incorporate the excessive zero RRs in the test, we propose a composite enrichment score where OR G is the odds ratio estimating the association between signaled AEs and group G. A large OR G (OR G > 1) indicates more signaled AEs in group G than in the remaining groups. As in the AEKS test, 1(p G 0 ≤ p G c 0 ) ensures that an enriched group has the proportion of zeros smaller or equal to remaining groups.

Estimate statistical significance for an AE group
We perform a permutation test to assess the significance of the enrichment score for group G by randomly reshuffling the signaled/unsignaled labels. This in spirit is the same as the Fisher's exact test of fixing the row and column margins (here, the group size and the total number of signaled and unsignaled AEs are fixed), while considering the zero proportions.

F I G U R E 1 ROC curves using K-S statistics (AEKS vs GSEA) (left) and Fisher's exact tests (AEFisher vs Fisher's exact test) (right). The
ROC curve is constructed using P-values from the 10 simulated datasets. By comparing the enrichment decision from the test to the true enrichment status in the data generating step, the true and false positive rates were estimated and connected to make the ROC curve [Colour figure can be viewed at wileyonlinelibrary.com]

SIMULATION STUDIES
We ran simulation studies to investigate our proposed methods and compared them to existing enrichment tests. To make simulation studies more realistic, data in simulated datasets were made similar to the real dataset. We first created the AE group structure using the AE groups defined on the HLGT level in MedDRA. In each simulated dataset, we set the number of AE groups to be 150 and determined the group size, N G , by randomly sampling the group size data in MedDRA under the constraint of N G ≥ 10. Similarly, the total count of each AE was determined by randomly sampling the AE total count data in VAERS. Then we randomly selected 10% of the AE groups as enriched and the remaining groups as un-enriched. In VAERS, the proportion of zero AEs per group, p 0 , is between 10% and 60%, therefore, we used p 0 in this range in simulations. We generated a nonzero AE count from a Poisson distribution in formula (1) with the rate parameter randomly sampled from the estimated 's in VAERS. Specifically, in an enriched group, P 0 was sampled uniformly from 0.1 to 0.3 and the rate parameter was constrained to be larger than 0.3. In an un-enriched AE group, we either set the range of p 0 between 0.4 and 0.6 without constraining the rate parameter, or set the rate parameter smaller than 0.4 with p 0 in the range of 0.1 to 0.6. To compare performance of different methods, we constructed the ROC curve using P-values from the 10 simulated datasets. By using a threshold on the P-values, an AE group is identified as enriched if it has a P-value smaller than the threshold. By comparing the enrichment decision from the test to the true enrichment status in the data-generating step, we estimated the true positive rate (ie, the percentage of AE groups that are correctly identified as enriched) and false positive rate (ie, the percentage of AE groups that are incorrectly identified as enriched). By varying the threshold, we obtained a series of false and true positive rates and then connected them to make an ROC curve. As shown in Figure 1, AEKS and the AEFisher performed significantly better than the GSEA and the conventional Fisher's exact test, respectively.
In order to study the effect of sample size on the results of the analysis, we repeated the simulation studies with different sample sizes. Specifically, we let the total count, n .i , for the ith AE (i = 1, … , N) equal to 10%, 20%, 40%, 60%, or 80% of the original count, such that the total sample size is reduced proportionally. All the parameters in the data generating model remain the same. For each sample size, we calculated the area under the ROC curves (AUC) (see Figure 2). As shown in this figure, both AEKS and AEFisher improve its performance in distinguishing enriched from un-enriched AE groups as sample size increases, and AEKS performs better than AEFisher for different sample sizes. This result is expected as AEKS is more statistical powerful by using ranks of individual AEs rather than the binary (significant vs F I G U R E 2 Area under the ROC curves for AEKS and AEFisher with the reduced sample sizes. The ROC curve is constructed using p-values from the 10 simulated datasets. By comparing the enrichment decision from the test to the true enrichment status in the data generating step, the true and false positive rates were estimated and connected to make the ROC curve [Colour figure can be viewed at wileyonlinelibrary.com] nonsignificant) data from the single-AE analysis in AEFisher. Moreover, the advantage of AEKS over AEFisher is more obvious when the sample size is small. One reason is that a small sample size produces inaccurate estimations on statistical significance for individual AEs in AEFisher, thereby leading to under-performed enrichment test.

APPLICATION TO VAERS DATASETS
We applied AEKS and AEFisher to VAERS dataset to study flu and hepatitis vaccines. In both studies, we used the HLGT level of MedDRA to define AE groups. In AEKS, a signaled AE is defined if the q-value <0.1. In AEFisher, a signaled AE is defined if the q-value <0.1 and OR >1.5. In both AEKS and AEFisher, an AE group is enriched if q-value <0.1.

Study flu vaccines
Influenza vaccine is given in large quantities and it prevents millions of illnesses and flu-related doctor's visits each year. CDC recommends the appropriate vaccine during the flu season. Options include inactivated influenza vaccine (IIV) ("FLU3" or "FLU4" in VAERS) or live attenuated influenza vaccine (LAIV) ("FLUN3" or "FLUN4" in VAERS). By restricting the age of the vaccine recipients between 2 and 49, there were 139 353 and 21 820 reports for IIV and LAIV, respectively. To compare LAIV relative to IIV, we created a contingency table with two rows representing vaccine (LAIV vs IIV) and 3534 columns representing AEs. These AEs were classified into 287 AE groups. As in the GSEA, we removed AE groups containing less than 5 AEs, resulting in 132 AE groups with a total of 1,828 AEs. Finally, we applied both AEKS and AEFisher to identify enriched AE groups based on the data contained in this 2 × 1828 contingency table. As shown in Table 3, of the 132 AE groups we studied, both AEKS and AEFisher identified the same two enriched AE groups: respiratory tract infections and upper respiratory tract disorders. Relative to IIV, LAIV is associated with increased risk of respiratory system disorders. Individual AE identified in each group include rhinitis, nasal congestion, sinus disorder, which have been reported before. [30][31][32] New signals, such as epistaxis, are clinically interesting, and it might be true signals that need to be validated in large health care databases.

Study hepatitis A and B combination vaccines
In this study, we were interested in identifying safety problems that are likely due to interactions of two vaccines when they are administered to an individual at the same time. Specifically, we compared AE profiles induced by the hepatitis A  Table 4, AEKS identified peripheral neuropathies, while AEFisher identified musculoskeletal and connectivetissue disorders. Peripheral neuropathies were also mentioned in Reference 33 as an important AE group associated with the combination hepatitis vaccine.

DISCUSSION
AEKS and AEFisher have inherited pros and cons from the GSEA and Fisher's exact test in the gene enrichment analysis, respectively. For example, AEFisher depends on the threshold to select the significant AEs, and it can't be applied if no AEs are found to be significant based on the threshold. Compared to AEFisher, AEKS is more statistical powerful by using ranks of individual AEs rather than a "cut-off" strategy based on an arbitrary threshold. AEKS is capable of detecting AE groups with a few large AE signals or several weaker and concordant AE signals. In simulation studies, we have demonstrated the advantage of AEKS over AEFisher. Moreover, similar to the gene enrichment analysis, both AEKS and AEFisher can not efficiently deal with AE group overlap, where some AEs may belong to several AE groups. There have been several attempts to alleviate the effect of gene set overlap. 24 However, these methods suffer from low sensitivity. Developing methods dealing AE group overlap in the context of AEs would be an interesting future work. AEFisher uses a list of binary (signaled vs unsignaled AEs) data from the single-AE analysis to perform the enrichment test. In this paper, signaled AEs are determined using the Fisher's exact test. Alternatively, other tests for single-AE This is the first study on AE enrichment analysis, and we aim to set up a general framework for future research in this field. In both AEKS and AEFisher, the enrichment score has two components, one for AEs with nonzero counts and one for AEs with zero counts. For the zero component, we used an indicator function, which is 0 when the proportion of zero counts in group G is larger than the proportion in other groups. This is created based on our definition that an enriched AE group is required to have the zero count proportion less than or equal to other groups. However, this definition can be modified; for example, an AE group may be considered as enriched if it has many large AE signals from the nonzero component and slightly larger proportion of zero counts compared to other groups. In this case, a ratio of zero proportion not in the group versus in the group can replace the indicator function. However, an extremely large or small ratio will make the zero-component dominate the enrichment decision (ie, ES is solely driven by the ratio). To mitigate this issue, a weight may be introduced for the ratio to achieve a good balance in decision between the zero and nonzero components. This topic will be an interesting work for future research.

CONCLUSIONS
In this article, we develop two enrichment tests for vaccine AE enrichment analysis by incorporating the special features of the AE count data. AEFisher is a modified Fisher's exact test based on pre-selected significant AEs, while AEKS is based on a modified Kolmogorov-Smirnov statistic. By appropriately addressing the issues of ties and excessive zeros in AE count data, our enrichment tests performed well as demonstrated by simulation studies and analyses of VAERS data. While the proposed methods are developed for vaccine safety, they are broadly applicable to other safety surveillance projects. For example, they can be directly applied to the FDA AEs Reporting System and the Adverse Drug Reactions database for national and international drug safety.

DATA AVAILABILITY STATEMENT
VAERS data from the primary reports, with identifying patient information removed, are publicly available on the VAERS website (www.vaers.hhs.gov/data/index).