Classification and Coding of Data About IgE-mediated Food Allergic Reactions

Background: Collation of clinical data on IgE-mediated food allergies is essential to provide evidence-based approaches to managing and treating food allergies and preventing accidental reactions. However, this can be a time-consuming and difficult process because studies collect such data in heterogeneous ways. To facilitate data harmonisation, a set of standardised terminologies has been identified and a consensus technique established to code food allergy data. Methods: Different terminologies to encode the most common signs, symptoms and problematic foods associated with IgE-mediated food allergies were identified. Their suitability for classifying and coding information about the signs and symptoms of food allergic reactions, causative foods and reaction severity was assessed. The assessment covered existing conceptual coverage and data descriptions, classification schemes and additional relevant information. Results: All of the terminologies reviewed included classification schemes, allowing broader concepts to be related to those that are more specialised. Additional information, such as equivalence between terms, was often present. Of the clinical coding systems assessed, the Systematized Nomenclature of Medicine Clinical Terms (SNOMED-CT) provided the most complete coverage, with options to code symptom severity. Only food coding systems, such as FoodEx2, provided comprehensive conceptual coverage of the food terms. Conclusions: Using the SNOMED-CT and FoodEx2 standards together will support the harmonisation of food allergy data from diverse sources, providing a transparent and effective way to collate the data required for effective food allergen management in the future.


Introduction
IgE-mediated food allergies are estimated to affect around 1% of infants 1,2 and up to 4% of adults 3 in Europe. The prevalence varies between countries, with rates being higher in Australia 4,5 and lower in countries such as India and parts of China 6. Approved therapies are currently only available in the USA and are solely used to treat peanut allergy in children and adolescents. The therapy can protect allergic individuals from accidental reactions 7 but is not accessible to all and is not available for all the different foods that can precipitate an allergic reaction. Consequently, avoidance of the causative food is generally the only recourse and, to help individuals avoid their problem food, labelling of a group of priority allergenic foods has been made mandatory in most parts of the world. However, traces of allergenic ingredients can find their way, unintentionally, into food products and can cause adverse reactions 8. To warn allergic consumers of the potential hazard posed by unintended allergens, food manufacturers place precautionary allergen labels (PAL) on such foods. PAL should only be used in conjunction with a risk assessment 9, but such approaches require reference doses of allergens that are accepted as being generally safe for the majority of food allergic consumers and are usually derived from oral food challenge data 10. Such risk-based approaches to food allergen management also provide a transparent and consistent system to prevent the overuse of 'may contain' labelling on foods where allergens are less likely to pose a risk, and to avoid mistrust among allergic consumers, which can lead to risky choices of foods that could result in a reaction.
The ThRAll project aims to support the application of risk-based approaches to food-allergen management 11. This involves the collation, harmonization and integration of data from individuals with IgE-mediated allergies undergoing a diagnostic procedure called an oral food challenge 12. These data can be used to identify doses of allergens below which food allergic subjects are unlikely to react, or react with only mild symptoms 13. Through the ThRAll project a publicly available database of oral food challenges is being developed, which can be used for the dose distribution modelling that underpins identification of doses that have an acceptable level of risk of eliciting a reaction in the allergic population 10. Data from oral food challenges are often collected and reported in a heterogeneous manner, with studies conducted in different ways (double blind placebo controlled, single blind [when only the patient is blinded], interspersed, open challenge), using different dosing protocols and food matrices to deliver the allergenic food. In addition, different investigators can use multiple terminologies to describe the same clinical symptom (i.e. a change in function, sensation or appearance that indicates disease and is reported by the patient) or sign (i.e. an objective observation or evidence of disease), even within the same study centre. This ambiguity can lead to different interpretations when harmonizing existing datasets. To facilitate the integration of existing data and support repeatability of future studies, it is important to have a consensus approach to encoding and reporting food allergy information. This paper aims to identify and encode common variables relating to food allergy, to promote the repeatability of data analysis and facilitate the interoperability of the food allergy data being collated in the ThRAll project.

Materials And Methods
Relevant concepts that are used to understand, describe and diagnose food allergies were first identified and defined (Supplementary material Table 1). Clinical record forms used for recording oral food challenges from the EuroPrevall (The prevalence, cost and basis of food allergy across Europe) and iFAAM (Integrated approaches to food allergen and allergy management) studies were then used to identify common signs and symptoms experienced during an IgE-mediated reaction, which were also defined (Supplementary material Table 2) 9,14-17. In addition, key food terms were compiled based on the foods that initiate an allergic reaction (including species of origin, common derivative ingredients and manufactured foods) based on the 14 major allergens which must be stated on food labelling as described by Annex II of the EU Food Information for Consumers Regulation 1169/2011 18,19 (Supplementary material Table 3).
In the next step the utility of different terminologies for classifying and encoding this information was assessed. Clinically relevant and validated terminologies were identified using The National Library of Medicine. These were the Systematized Nomenclature of Medicine Clinical Terms (SNOMED-CT), Medical Subject Headings (MeSH) and the Medical Dictionary for Regulatory Activities (MedDRA) systems (Table 1). LOINC (Logical Observation Identifiers Names and Codes) was also considered for classifying and encoding the symptom concepts. However, this system is used to represent the type or "question" for a clinical test or measurement. This is out of scope for this study, which focuses on coding observation results or "responses", so LOINC was removed from further consideration. In addition, the European Food Safety Authority's FoodEx2, Langua aLimentaria (LanguaL) and AGROVOC, a vocabulary developed by the Food and Agriculture Organization, were compared for their classification of food products (Table 1).
Table 1 Terminologies assessed for their utility in encoding food allergy data.
Terminology | Description
SNOMED-CT (Systematized Nomenclature of Medicine Clinical Terms) 20 | SNOMED-CT is the most comprehensive and international clinical terminology system.
MeSH (Medical Subject Headings) 21 | MeSH is used for indexing, cataloguing and searching biomedical and health-related information.
MedDRA (Medical Dictionary for Regulatory Activities) 22 | MedDRA provides a clinically validated and internationally recognised medical terminology.
LOINC (Logical Observation Identifiers Names and Codes) 23 | LOINC is an international standard, which facilitates the storage, exchange and harmonization of data.
FoodEx2 24 | FoodEx2 is a standardised food classification and description system. Data is compiled from EU organizations, industries and academic research.
LanguaL ("Langua aLimentaria" or "language of food") 25 | LanguaL provides a standardised technique for describing foods based on the combination of characteristics of that food 26.
AGROVOC (Agriculture and Vocabulary), FAO (Food and Agriculture Organization of the United Nations) 27 | AGROVOC is a controlled vocabulary including both food and nutrition that can translate concepts into 37 languages.
The ThRAll project aimed to evaluate the classification of symptoms, signs and food relating to an IgE-mediated food allergy from a risk management and public health perspective. Given this objective each terminology was assessed in four areas:
1. Conceptual coverage: Every symptom and food term identified was searched in each of the terminologies to quantitatively compare which system provided the greatest existing conceptual coverage.
2. Concept descriptions: A clear and validated definition was identified for each symptom or sign and food term. This was used as a 'benchmark' definition to be assessed and compared with the descriptions provided in each of the terminologies.
3. Classification: The information provided by each terminology to support classification of specific symptoms and signs together with food were compared. The inclusion and clinical appropriateness of information that expresses the relationships between general and specialised concepts (classification schemes) was considered to be a key differentiator.
4. Additional information: Each terminology was assessed for support for synonyms specified in the ThRAll protocol. This was completed for symptoms and signs, but not for food where equivalence is more complex to determine. Any additional information was also assessed for use in the ThRAll study.
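The quantitative comparison in area 1 can be pictured as a simple set calculation. The following is a minimal sketch only: the concept list and the contents of the two terminologies are hypothetical placeholders, not the study data, and exact matching after lower-casing is an assumption about how a term would be counted as "covered".

```python
from typing import Dict, Iterable, Set


def coverage(required: Iterable[str], terminology_terms: Set[str]) -> float:
    """Return the percentage of required concepts found in a terminology."""
    required_norm = {t.strip().lower() for t in required}
    if not required_norm:
        raise ValueError("no concepts to assess")
    available = {t.strip().lower() for t in terminology_terms}
    return 100.0 * len(required_norm & available) / len(required_norm)


# Hypothetical concept list and terminology contents (placeholders only).
concepts = ["urticaria", "vomiting", "wheeze", "diarrhoea"]
terminologies: Dict[str, Set[str]] = {
    "terminology_a": {"Urticaria", "Vomiting", "Wheeze", "Diarrhoea"},
    "terminology_b": {"Urticaria", "Vomiting", "Diarrhoea"},
}

report = {name: coverage(concepts, terms) for name, terms in terminologies.items()}
for name, pct in sorted(report.items()):
    print(f"{name}: {pct:.0f}% of concepts covered")
```

In practice a search of a real terminology would also need to consider synonyms and near matches, which is why the synonym support in area 4 was assessed separately.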

Conceptual coverage
For the symptom and sign concepts (Supplementary material Table 1) both SNOMED-CT and MedDRA provided complete coverage, whilst MeSH covered 88%. FoodEx2, LanguaL and AGROVOC did not cover any of the symptom concepts but did provide superior coverage for food concepts (Table 2). AGROVOC covered 85% of the food concepts, LanguaL had complete coverage and FoodEx2 covered all of the food concepts except for sulphur dioxide. In addition, LanguaL uses the EFSA FoodEx2 coding and classification system for products in the European Union, which validates the legitimacy and effectiveness of FoodEx2.

Description of concepts
The term definitions provided by each of the coding systems were reviewed and compared with the definitions found in the literature (Supplementary material Tables 2 and 3). MedDRA does not provide formal definitions for the symptom and sign concepts and so fails to provide any additional clarity or description for each term. In contrast, SNOMED-CT and MeSH provide clear and unambiguous descriptions for each concept. This is illustrated in Table 3 using "urticarial rash" and "hazelnut" as an example sign and food respectively. Both MeSH and SNOMED-CT include a detailed definition of the sign "urticarial rash" that is consistent with the key definitions (Supplementary material Table 2). Similarly, FoodEx2, LanguaL and AGROVOC all provide a detailed description for each food term and give both the common and the Latin name for each food to reduce ambiguity (Supplementary material Table 3).

Hazelnut
Terminology | Description
SNOMED-CT | "Tree nut (substance)"
MeSH | "A plant genus of the family BETULACEAE known for the edible nuts"
FoodEx2 | "Tree nuts from the plant classified under the species Corylus avellana L., commonly known as Hazelnuts or Cobnuts or Common hazelnut. The part consumed/analysed is not specified. When relevant, information on the part consumed/analysed has to be reported with additional facet descriptors. In case of data collections related to legislations, the default part consumed/analysed is the one defined in the applicable legislation."
LanguaL | "The group includes kernels of the seeds of all species similar to Hazelnuts or similar nuts sharing the same pesticide to the maximum residue level (MRL) as Hazelnuts."
AGROVOC | "The fruit of small trees of shrubs of the Corylaceae family. The round-oval nuts are surrounded by a leafy involucre, which comes out easily when the fruit is mature. Remains the nut, with a pericarp, the shell, more or less woody, and depending on varieties. Inside is the seed, covered by a very thin tegument."
Where appropriate, MedDRA aggregates and highlights similar terms related to the specific concept and so allows comparable terms to be easily accessed. For example, 'urticaria rash' is the preferred name that classifies 33 related concepts including 'urticaria localized' and 'generalised urticarial rash', which provide varying levels of detail and alternative terms that include a level of clinical interpretation. This is useful but the large number of similar terms may reduce the specificity and level of detail initially identified by a term. In contrast, concepts in SNOMED-CT are associated with a unique Fully Specified Name (FSN); this is the ideal term that a clinician would use in a particular language, dialect or context. SNOMED-CT also identifies relationships to other similar and related concepts and considers the preferred name for 'urticarial rash' to be 'weal' with 'wheal', 'welt', 'hives' and 'nettle rash' being noted as alternative terms.
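The practical value of a preferred term with listed alternatives is that differently worded records can be collapsed onto one concept before coding. The sketch below illustrates this for the 'urticarial rash' example, using only the terms named above; the lookup table and the function name `to_preferred` are illustrative assumptions, not part of SNOMED-CT or of the ThRAll tooling.

```python
from typing import Dict, Optional

# Preferred term and alternatives as reported in the text for SNOMED-CT.
SYNONYMS: Dict[str, str] = {
    "weal": "weal",
    "wheal": "weal",
    "welt": "weal",
    "hives": "weal",
    "nettle rash": "weal",
    "urticarial rash": "weal",
}


def to_preferred(recorded: str) -> Optional[str]:
    """Map a recorded free-text sign to its preferred term, or None if unknown."""
    key = " ".join(recorded.lower().split())  # lower-case and collapse whitespace
    return SYNONYMS.get(key)


for entry in ["Hives", "  Nettle   rash", "itchy eyes"]:
    print(repr(entry), "->", to_preferred(entry))
```

Returning `None` for an unrecognised entry, rather than guessing, leaves unmatched records visible for manual review.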
Since concepts are coded in MeSH for the purpose of indexing publication records, each MeSH term can comprise several synonyms; for example, 'urticaria' also includes 'urticarias' and 'hives'. All concepts that come under one record are considered equivalent and this is useful when trying to maximise the number of relevant articles identified in a search but not necessarily when identifying synonyms in the context of compiling data on food allergy in the ThRAll database.
LanguaL and FoodEx2 do not provide related concepts since this is not appropriate in the context of a food classification system, where there would be no suitable synonyms. In certain cases AGROVOC provides equivalent terms and, where appropriate, states the broader and/or narrower relevant concepts as well as identifying what the product is produced from (i.e. the plant/animal species of origin).

Mechanistic basis to classify an adverse reaction
It was also important to compare the pathway that each of the terminologies used to classify an adverse reaction. Food can induce a range of adverse reactions but, although the symptoms and signs may be similar, the mechanistic bases of allergies, intolerances and sensitivities are completely different. An adverse reaction to food 28 encompasses both immune- and non-immune-mediated adverse reactions (Fig. 1). Sub-types of immune-mediated adverse reactions include those involving the development of food-specific IgE antibody responses. This type of food allergy results in symptoms and signs that appear immediately (in less than 2 hours, usually within 30 minutes), sometimes even after ingesting a small dose of the allergen, and can involve multiple organs including the skin, respiratory, digestive and cardiovascular systems. A second, well-defined type of immune-mediated adverse reaction to food is non-IgE-mediated: the T-cell mediated syndrome triggered by ingestion of gluten, known as coeliac disease (CD) 29. This life-long disease involves sensitivity to gluten and individuals with CD often present with gastrointestinal signs and symptoms, including diarrhoea, together with weight loss due to the malabsorption of nutrients. In contrast, food intolerance conditions are not immune mediated but nevertheless can be reproducibly induced following ingestion of specific foods. One example is lactose intolerance, where individuals lack the lactase enzyme, which is involved in the digestion of lactose. Symptoms appear shortly after drinking milk or consuming dairy and are commonly reported as stomach pain, bloating and diarrhoea. Lactose intolerance is different from an IgE-mediated milk allergy, in which symptoms appear within 2 h of consuming milk-containing foods.
The classification of an adverse reaction used in the ThRAll project (Fig. 1) was used to benchmark how the different terminologies classify an IgE-mediated food allergy (Fig. 2). SNOMED-CT considers the trigger of an allergic reaction as the causative food and then links this back to the allergic hypersensitivity. It provides an overview of the process of the reaction but also filters into specific details about the adverse response, including branching to the causative agent and a qualifier for the severity of the reaction. This is consistent with terminology from the World Allergy Organisation (WAO) and the European Academy of Allergy and Clinical Immunology (EAACI) 30. MedDRA and MeSH classify food allergy from a disease perspective and then acknowledge the response as a consequence of ingestion of the problematic food. MeSH also includes a logical and clear flow using the descriptor "food hypersensitivity" as a type of "immediate hypersensitivity", which is used as a synonym of IgE-mediated food allergy. This is linked to the causal food and the eliciting symptoms. In contrast, MedDRA utilizes a tree flow diagram to show how a food allergy is classified, but this was a simpler and less detailed pathway compared to MeSH and SNOMED-CT. Since the ThRAll project is considering the reaction from a food and public health perspective, the SNOMED-CT approach to classifying a food allergy was considered the most appropriate. Figure 3 illustrates how MeSH, MedDRA and SNOMED-CT classify coeliac disease, a non-IgE immune-mediated adverse reaction to food. MedDRA and MeSH both provide multiple ways to classify the pathway of coeliac disease. These terminologies consider this disease as a nutritional or gastrointestinal disorder that leads to malabsorption, which clearly demonstrates the reaction is triggered by food.
MedDRA has a third pathway to classify coeliac disease from a disease perspective as an autoimmune disorder; this is consistent with the ThRAll approach, which classifies coeliac disease as an immune-mediated reaction. Again, SNOMED-CT has a different way of approaching the classification of coeliac disease in comparison to MedDRA and MeSH but still considers it a malabsorption syndrome caused by the ingestion of gluten. Figure 3 demonstrates that coeliac disease is an adverse response with a clear food trigger, yet the pathway to classify this reaction is very different to the classification of a food allergy and so these reactions should be considered separately.
Lastly, Fig. 4 shows the pathways used to describe a non-immune-mediated adverse reaction, taking the classification of lactose intolerance as the example. MedDRA, MeSH and SNOMED-CT all describe lactose intolerance as a metabolic or gastrointestinal disorder which disrupts the absorption of carbohydrates. The ThRAll approach classifies lactose intolerance as a non-immune-mediated reaction due to a disorder of an enzymatic process. The lack of lactase enzyme in lactose intolerance patients causes the malabsorption of the carbohydrate lactose, which demonstrates the similarities between these classifications.
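The mechanistic classification described in this section can be written down as a small tree, which makes it easy to see where two reactions diverge. This is a sketch under stated assumptions: the node labels are paraphrased from the text rather than copied from Fig. 1, and the intermediate node "enzymatic disorder" is an illustrative name for "a disorder of an enzymatic process".

```python
from typing import Dict, List, Optional

# Child -> parent links for the classification described in the text.
# Node labels are paraphrased for illustration, not taken from Fig. 1.
PARENT: Dict[str, Optional[str]] = {
    "adverse reaction to food": None,
    "immune-mediated": "adverse reaction to food",
    "non-immune-mediated": "adverse reaction to food",
    "IgE-mediated food allergy": "immune-mediated",
    "non-IgE-mediated": "immune-mediated",
    "coeliac disease": "non-IgE-mediated",
    "enzymatic disorder": "non-immune-mediated",
    "lactose intolerance": "enzymatic disorder",
}


def pathway(concept: str) -> List[str]:
    """Return the path from the root of the classification down to `concept`."""
    path: List[str] = []
    node: Optional[str] = concept
    while node is not None:
        path.append(node)
        node = PARENT[node]
    return list(reversed(path))


for condition in ["IgE-mediated food allergy", "coeliac disease", "lactose intolerance"]:
    print(" > ".join(pathway(condition)))
```

Printed side by side, the three pathways share only the root, which reflects the argument that these reactions should be considered separately even when their symptoms overlap.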

Classification schemes for symptoms and signs
MedDRA has a logical five-tier structure expanding from very specific to more general concepts. Lowest Level Terms (LLTs) represent the most specialised concepts including symptoms, medical procedures and personal characteristics. An LLT can be either the preferred term (PT) or a synonym of the PT 31. Similar PTs are aggregated into High Level Terms (HLTs) based upon anatomy, pathology, physiology or aetiology. These HLTs are further categorised into High Level Group Terms (HLGTs), which are in turn grouped into one of 26 System Organ Classes (SOCs), providing the most general classification. This is a logical and methodical organisation system, but concepts are confined to these five levels and further clarification or granularity cannot be expressed beyond the LLT to indicate the severity or manifestation of a symptom. Concepts in SNOMED-CT can vary in their specificity; more general concepts are aggregated together and filter down to more specific terms. Relationships are used to portray a confirmed association between multiple concepts. The branching structure in SNOMED-CT makes it possible to code and represent clinical data at a level of detail that is appropriate to a range of different uses. In contrast, MeSH is a cataloguing system with a slightly different framework to the other terminologies, as terms do not identify clinical phenomena but instead represent a category. MeSH is used to categorise and retrieve records and is organised in a hierarchical structure with 16 primary categories that split into subcategories providing more detailed terms. A letter corresponding to a category and a number representing the hierarchical level provide an identifier for each term.
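The fixed five-tier structure can be illustrated with a short roll-up routine that walks from an LLT to its SOC and checks that each step rises exactly one level. This is a sketch only: the LLT and PT names are the examples quoted earlier in the text, while the HLT, HLGT and SOC entries are explicit placeholders and not real MedDRA terms; the `Term` class and `roll_up` function are hypothetical names.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

LEVELS = ["LLT", "PT", "HLT", "HLGT", "SOC"]  # most specific -> most general


@dataclass(frozen=True)
class Term:
    name: str
    level: str
    parent: str = ""  # key of the term one level up; empty for an SOC


# LLT and PT names come from the text; higher levels are placeholders.
TERMS: Dict[str, Term] = {
    "llt1": Term("generalised urticarial rash", "LLT", "pt1"),
    "pt1": Term("urticaria rash", "PT", "hlt1"),
    "hlt1": Term("<HLT placeholder>", "HLT", "hlgt1"),
    "hlgt1": Term("<HLGT placeholder>", "HLGT", "soc1"),
    "soc1": Term("<SOC placeholder>", "SOC"),
}


def roll_up(key: str, terms: Dict[str, Term]) -> List[Tuple[str, str]]:
    """Return (level, name) pairs from a term up to its System Organ Class."""
    chain: List[Tuple[str, str]] = []
    current = terms[key]
    while True:
        chain.append((current.level, current.name))
        if not current.parent:
            return chain
        parent = terms[current.parent]
        if LEVELS.index(parent.level) != LEVELS.index(current.level) + 1:
            raise ValueError(f"{parent.name} is not one level above {current.name}")
        current = parent


for level, name in roll_up("llt1", TERMS):
    print(f"{level:>4}: {name}")
```

Because the chain always ends after five levels, there is nowhere in such a structure to attach a severity qualifier, which is the limitation noted above.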

Classification schemes for food
There are many levels of granularity that need to be considered when describing and encoding a food product including the origin of the food, the food matrix and processing techniques. For example, when considering a food allergy, we need to identify differences in the frequency or severity of a reaction,1 which may be affected by cooking technique (e.g. dry roasted compared to raw peanuts) or the food matrix/vehicle used (e.g. peanut butter, whole peanut, baked goods containing peanuts). It is useful to have a system with the ability to encode multiple levels of detail depending on the amount of data and information that is available. This is important given that the literature shows that some food allergens are sensitive to food-processing techniques and a high fat content may increase the allergenicity of the protein 32,33. The structure of these different classification systems represents their primary use whether that is to understand nutritional value, physical characteristics or the type of food product.
FoodEx2 is arranged into 21 clearly defined food groups, such that every food aligns to exactly one group. The system is made up of base terms and facets; the base term is defined by a unique five-character alphanumeric code and represents the specific foods within the hierarchy. Facets provide additional detail to the base term, such as the origin of the product, its ingredients and the process involved in its preparation. This additional information can be combined with the base term to provide a more detailed and complete description of the food product. This demonstrates the range of granularity and amount of information that has been considered in this classification system, making it useful for encoding allergenic foods and common matrices or derivatives used in oral food challenges.
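The combination of a base term with facets can be sketched as a small compose-and-parse pair. Several things here are assumptions and should be checked against the EFSA FoodEx2 guidance before use: the `#` separator between base term and facets, the `$` separator between facets and the `group.descriptor` layout reflect one common way such codes are written, and every code shown (`X0001`, `FX1`, `Y0001`, and so on) is a made-up placeholder rather than a real FoodEx2 entry.

```python
from typing import List, Tuple

Facet = Tuple[str, str]  # (facet group, descriptor code)


def compose_code(base_term: str, facets: List[Facet]) -> str:
    """Join a base term and optional facets into one string (assumed layout)."""
    if len(base_term) != 5 or not base_term.isalnum():
        raise ValueError("base term must be a five-character alphanumeric code")
    if not facets:
        return base_term
    facet_part = "$".join(f"{group}.{descriptor}" for group, descriptor in facets)
    return f"{base_term}#{facet_part}"


def parse_code(code: str) -> Tuple[str, List[Facet]]:
    """Split a composed string back into its base term and facets."""
    base, _, facet_part = code.partition("#")
    facets: List[Facet] = []
    if facet_part:
        for item in facet_part.split("$"):
            group, _, descriptor = item.partition(".")
            facets.append((group, descriptor))
    return base, facets


# Placeholder codes: a base food with a "process" and an "ingredient" facet.
code = compose_code("X0001", [("FX1", "Y0001"), ("FX2", "Y0002")])
print(code)
print(parse_code(code))
```

Keeping the base term usable on its own means a record can still be coded when nothing is known about processing or matrix, and refined later when that detail becomes available.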
LanguaL classifies food systematically according to 14 key concepts including product type, cooking method and packing medium. These concepts are used to encode the product and enable almost any food product to be classified to the level of detail that is required. A unique code is provided for each food concept, which can then be translated into multiple languages.
AGROVOC relates concepts in a hierarchical and non-hierarchical structure. A branching logic is used whereby the term becomes more specific and precise. For example, nuts provides a broader way of describing hazelnuts. There is also a non-hierarchical relationship and this expresses related concepts where appropriate. Figure 6 demonstrates how each of the standards classifies hazelnuts as an example of a food concept. The figure shows that LanguaL utilises the logic from FoodEx2 and that each of the terminologies distinguish between plant and animal products before identifying the relevant concept from a list of key categories and then filtering into the specific species.
Encoding symptom severity
When considering the dose at which an allergen induces an objective reaction it is useful to encode the severity of that reaction. Thus, when identifying the dose that elicits a reaction in p% of the allergic population (eliciting dose, EDp), it is useful to understand the proportion of individuals that presented with either a mild, moderate or severe reaction at this dose. Some studies, such as iFAAM, have collected detailed information on severity which can also be used to identify criteria for stopping a challenge [16].
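To make the link between eliciting dose and severity concrete, the sketch below computes a purely empirical EDp and the severity mix among individuals who have reacted by that dose. It is an illustration only: the records are invented, and the empirical quantile stands in for the dose distribution modelling mentioned in the Introduction, which is a different and more rigorous technique.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple

# Hypothetical records: (eliciting dose in mg protein, severity at that dose).
records: List[Tuple[float, str]] = [
    (1.0, "mild"), (3.0, "mild"), (10.0, "moderate"), (30.0, "mild"),
    (100.0, "moderate"), (300.0, "severe"), (1000.0, "moderate"),
    (1000.0, "mild"), (3000.0, "severe"), (3000.0, "moderate"),
]


def empirical_ed(doses: List[float], p: float) -> float:
    """Smallest observed dose at which at least p% of individuals have reacted."""
    if not doses or not 0 < p <= 100:
        raise ValueError("need at least one dose and 0 < p <= 100")
    ordered = sorted(doses)
    k = math.ceil(p * len(ordered) / 100)  # number of reactors required
    return ordered[k - 1]


def severity_mix(recs: List[Tuple[float, str]], dose: float) -> Dict[str, float]:
    """Proportion of each severity among individuals reacting at or below `dose`."""
    reacted = [severity for d, severity in recs if d <= dose]
    if not reacted:
        return {}
    counts = Counter(reacted)
    return {severity: n / len(reacted) for severity, n in counts.items()}


ed50 = empirical_ed([d for d, _ in records], 50)
print("ED50:", ed50, "mg")
print("Severity mix at ED50:", severity_mix(records, ed50))
```

A summary of this kind is only possible when severity has been recorded in a consistent, coded form alongside the dose.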
This information can be used to identify eliciting doses that present a tolerable level of risk of reaction in allergic individuals, as observed in the Peanut Allergen Threshold study 13. The classification of reaction severity is inherently subjective, arising from clinical interpretation of signs and symptoms, and many different approaches have been developed and compared 34,35. SNOMED-CT codes do not always provide sufficient detail to describe the severity as well as the presence of a symptom, but this can be addressed by using an additional qualifier code (mild, moderate or severe) to fully express the specific, observed sign or symptom. This is illustrated for the iFAAM oral food challenge record, where symptoms have been classified and coded with both the symptom code and the terms mild, moderate or severe using the approach of Sampson 36 (Table 4). For example, if a patient experienced one episode of diarrhoea, this would be coded as the SNOMED-CT code for diarrhoea paired with the code for mild severity: 62315008-255604002.
Signs and symptoms recorded in the iFAAM challenge record [16] were classified and coded as to their severity using the approach of Sampson 36 as being either mild (Sampson grade 1), moderate (Sampson grades 2 and 3) or severe (Sampson grades 4 and 5) and then encoded using SNOMED-CT.
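The pairing described above can be sketched in a few lines. The diarrhoea code (62315008) and the mild qualifier (255604002) are taken from the text, as is the grouping of Sampson grades; the moderate and severe qualifier codes are assumptions added for illustration and should be verified against a current SNOMED-CT release, and the function names are hypothetical.

```python
from typing import Dict

SNOMED_DIARRHOEA = "62315008"  # from the text

SEVERITY_CODES: Dict[str, str] = {
    "mild": "255604002",     # from the text
    "moderate": "6736007",   # assumption: verify against a SNOMED-CT release
    "severe": "24484000",    # assumption: verify against a SNOMED-CT release
}


def sampson_to_severity(grade: int) -> str:
    """Collapse Sampson grades 1-5 into mild, moderate or severe."""
    if grade == 1:
        return "mild"
    if grade in (2, 3):
        return "moderate"
    if grade in (4, 5):
        return "severe"
    raise ValueError(f"Sampson grade must be 1-5, got {grade}")


def paired_code(symptom_code: str, grade: int) -> str:
    """Pair a symptom code with its severity qualifier as 'symptom-severity'."""
    return f"{symptom_code}-{SEVERITY_CODES[sampson_to_severity(grade)]}"


print(paired_code(SNOMED_DIARRHOEA, 1))
```

Keeping the symptom and the qualifier as two separate codes means the same three severity codes can be reused for every sign and symptom in the challenge record.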

Discussion
The results demonstrate the complexity required to encode food allergy information, and each terminology has a different way of classifying an allergic reaction. Furthermore, each terminology varies in the system used to classify and organise concepts, which depends on the primary purpose of the classification system. The analysis showed that SNOMED-CT encoded all of the key terms and provided the conceptual coverage required to maximise the representation and classification of food allergy data for public health purposes aimed at food allergen management. In addition, SNOMED-CT provides a classification scheme that relates specific and more general concepts and that considers food as the trigger of an allergic reaction, which is clinically relevant. Therefore, it was chosen as the most appropriate terminology to code data relating to food allergies collated within the ThRAll project. SNOMED-CT also provides severity codes, allowing for more precise, qualified records of reactions.
Although FoodEx2, SNOMED-CT, MeSH, LanguaL and AGROVOC all provide terms for foods, FoodEx2 provided the greatest conceptual coverage including species of origin and food processing methods.
Consequently, FoodEx2 was selected to code the foods that initiate an allergic reaction, enabling analysis to be undertaken to identify how factors, such as food processing, may influence eliciting dose threshold and reaction severity.

Conclusion
The ThRAll study has determined that SNOMED-CT and FoodEx2 are presently the most competent terminologies for the representation of clinical knowledge relating to food allergy, the former delivering encodings for observed symptoms or signs and the latter representing eliciting foods (Fig. 6). This combined set of terminologies provides not only complete conceptual coverage but also a depth of granularity that gives a structured flexibility to encode a given required level of detail, via symptom severity in SNOMED-CT, and facets for food matrices and processing techniques in FoodEx2.
It is expected that defining a coding system for the ThRAll oral food challenge database will increase accessibility through standardisation. This will in turn extend the usefulness of these data into the future. Furthermore, the approach taken could also be expanded in future to encompass data from other clinical studies and registries of food anaphylaxis such as the network of severe allergic reactions (Network for Online-Registration of Anaphylaxis, NORA) 37. The proposed pairing of codes to qualify severity of reaction is supported in HL7's FHIR (Fast Healthcare Interoperability Resources), a standard for exchanging healthcare information electronically. This system represents data as resources made up of related hierarchical elements and values. For the purpose of capturing data relating to allergic reactions, it is possible to record a symptom element, a severity element and a food element within the same resource, meaning that there is a standard definition of the relationships between the codes. Using the proposed approach to coding within this standard could further increase the future utility of these data, making for a promising approach for recording and storing reusable datasets. The coding approach is also applicable to representing an individual's food allergy information in a standardised way in their electronic health records by including other metadata, such as diagnostic certainty (e.g. whether the data was a self-reported, negative, possible, probable or confirmed allergy), which could ensure correct interpretation across medical systems.
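As an illustration of how a symptom, a severity and a food can sit together in one resource, the sketch below builds a fragment loosely modelled on the FHIR R4 AllergyIntolerance resource. It is an assumption-laden example rather than a validated resource: the element names reflect one reading of R4 and should be checked against the specification, required elements such as the patient reference are omitted, the FoodEx2 system URI is a made-up placeholder because no official identifier is given in the text, and the food code `X0001` is hypothetical.

```python
import json
from typing import Any, Dict

SNOMED_SYSTEM = "http://snomed.info/sct"
FOODEX2_SYSTEM = "urn:example:foodex2"  # hypothetical placeholder URI


def allergy_fragment(food_code: str, food_display: str,
                     symptom_code: str, symptom_display: str,
                     severity: str) -> Dict[str, Any]:
    """Build a dict with food, symptom and severity held in one reaction entry."""
    if severity not in {"mild", "moderate", "severe"}:
        raise ValueError("severity must be mild, moderate or severe")
    food = {"coding": [{"system": FOODEX2_SYSTEM,
                        "code": food_code, "display": food_display}]}
    symptom = {"coding": [{"system": SNOMED_SYSTEM,
                           "code": symptom_code, "display": symptom_display}]}
    return {
        "resourceType": "AllergyIntolerance",
        "category": ["food"],
        "reaction": [{
            "substance": food,           # the eliciting food
            "manifestation": [symptom],  # the observed sign or symptom
            "severity": severity,        # the severity qualifier
        }],
    }


example = allergy_fragment("X0001", "hazelnut (placeholder code)",
                           "62315008", "Diarrhoea", "mild")
print(json.dumps(example, indent=2))
```

Holding the three elements inside a single reaction entry is what gives the relationship between the codes a standard definition, instead of leaving it to be inferred from a paired string.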

Declarations
Ethics approval and consent to participate

Figure 1
Classification of non-toxic adverse reactions to food.

Figure 2
Comparison of the pathways to describe the onset of IgE-mediated food allergy according to MedDRA, MeSH, SNOMED-CT and the ThRAll approach.

Figure 4
Comparison of the pathways to describe lactose intolerance, a non-immune mediated adverse reaction to food, according to MedDRA, MeSH, SNOMED-CT and the ThRAll approach.

Figure 5
Branching logic to classify the sign "urticarial rash" according to MedDRA, MeSH and SNOMED-CT.
*SNOMED-CT preferred term is Wheal (disorder) with urticarial rash as a synonym.

Figure 7
The ThRAll pathway for coding food allergy information using SNOMED-CT and FoodEx2.

Supplementary Files
FrenchetalFoodAllergyCodingSupplInformationver8.docx