Trends in reasons for emergency calls during the COVID-19 crisis in the department of Gironde, France using automatic classification.

Objectives During periods such as the COVID-19 crisis, there is a need for responsive public health surveillance indicators related to the epidemic and to preventative measures such as lockdown. The automatic classification of the content of calls to emergency medical communication centers could provide relevant and responsive indicators. Methods We retrieved all 796,209 free-text call reports from the emergency medical communication center of the Gironde department, France, between 2018 and 2020. We trained a natural language processing neural network model with a mixed unsupervised/supervised method to classify all reasons for calls in 2020. Validation and parameter adjustment were performed using a sample of 20,000 manually-coded free-text reports.


Introduction
Coronavirus 2019 Disease  caused by Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2) was detected for the first time in December 2019 in China 1 . Since then, as of first November 2020, more than 40 million cases and more than 1,100,000 deaths have been reported worldwide 2 .
Health-related surveillance data are indispensable for the adjustment of public health policies aimed at curbing the spread of the virus and protecting vulnerable populations, for adapting the means of caring for the most seriously ill, and for assessing the indirect consequences of the pandemic and of the measures implemented to control it: lockdown, movement restriction, social distancing and quarantine for some countries.
Many countries have relied on an extrapolation of classic infection-control and public-health measures to contain the COVID-19 pandemic, similar to those used for SARS in 2003 and H1N1 in 2009. These range from extreme quarantine measures in China to painstaking detailed contact tracing with hundreds of contact tracers (e.g., Singapore, Hong Kong, South Korea). However, these measures may not be effective in 2020 for tackling the scale of COVID-19 3 . Monitoring, surveillance, detection and prevention of COVID-19 have launched a new challenge on a global scale and digital technologies have so far failed to be the driving force behind it with a few notable exceptions 4 .
The number of people who test positives by PCR or chest CT images with characteristic lung damage is the most reliable indicator of the number of people with the virus. It is, however, heavily dependent on the screening strategy, which varies greatly from one country to another. The number of people entering an ER with symptoms suggesting SARS-CoV-2 infection was implemented nationally in France on March 16, 2020, with specific coding of the main diagnosis in ER summary reports centralized by Santé Publique France in the OSCOUR® Emergency Department Surveillance Network 5 set up in 2004. The French health authorities are also monitoring the number of COVID-19 patients hospitalized, admitted to intensive care units, and the number of deaths in hospital and in nursing homes (EHPAD).
All these tools, completed by a large set of ad-hoc surveys and specific surveillance systems have been mobilized to monitor the course of the epidemics and its consequences. For example, analysis of the registry of the Paris-Sudden Death Expertise Center observed a transient increase in cardiac arrest incidence 6 . A similar observation was reported in Lombardy, Italy, a region heavily impacted by the COVID-19 pandemic 7 .
Reports of call content to emergency medical communication centers (EMCC) is a source of information that needs to be considered for health surveillance during such a pandemic. By dialing 15, French citizens can get medical advice and, if necessary, a medical mobile care unit can be sent to the scene. In addition, the health authorities had instructed the population to call 15 in the case of symptoms or concerns, and to avoid spontaneous uninvited visits to the emergency room (ER) 8 .
We used an automatic classification tool based on an artificial neural network language model we recently adapted for free-text clinical notes 9 to identify the main reasons for calls to the Gironde EMCC, in order to monitor trends in the nature of these calls before, during and after the national lockdown established in France on March 17, 2020 and eased on May, 11, 2020.

Method Setting
The Gironde department (1.6 million inhabitants) is served by a medicalized EMCC known as SAMU 33 (Service d'Aide Médicale Urgente de la Gironde) which answers calls to the French toll-free number dedicated to medical emergencies (the "15"). A call is first received by a medical assistant, and then an emergency physician or a general practitioner (depending on the severity of the case) decides on the appropriate response, from medical advice to the dispatch of an ambulance or a mobile intensive care unit 5 .

Clinical reports
For all cases handled, a clinical report is created in the form of a computerized free-text note and updated by a medical assistant and a physician through the various telephone interactions with the patient, family, witnesses, and then with the paramedics if applicable. These clinical notes contain all the elements that make it possible to know the circumstances of the event at the origin of the call and the clinical observations made by the protagonists, whether they are witnesses, the patients themselves, or the medical personnel involved on site or responding to the call.

Classification of clinical reports using the GPT-2 neural language model
Over the past 10 years, neural language models have progressively taken the largest share in the field of natural language, with applications like machine translation, document classification, text summarization and speech recognition.
New levels of performance have only recently been achieved with the use of models based on the concept of attention, which consists in learning dependencies between words in a sentence without regard to their distances. This mechanism has been implemented in a sequence to sequence neural network model, the Transformer architecture, proposed in 2017 12 . One of the latest examples is the GPT-2, published in February 2019 by OpenAI. GPT-2 is a large transformer-based language model trained on a dataset of 8 million web pages 13 . Beyond the capability to generate coherent texts, Transformer models have the potential to perform other tasks such as question answering and document classification with a limited number of training examples. The training of the model is performed in two distinct phases 14 : the first pre-training unsupervised phase (i.e. not needing a human classification), consists in exploitation of a text corpus. In our application, this consisted in the 312,367 clinical reports of the year 2018. This leads to the ability of automatic text generation (see Supplementary material 1). The relevance of these synthetic sentences suggests that the networks learn contextual semantic representations. The second training phase, this time a supervised one (i.e. using examples classified by humans), creates a system able to perform the specific classification tasks.
The 117-million parameters version of the GPT-2 model was trained on a workstation with one Nvidia GeForce RTX Titan Graphic Processing Unit with 24GB of VRAM. The computation time for the pre-training phase was 26 hours per epoch. On average, 12 samples were processed per second in the training phase.

Extraction of free-text digital calls reports from the EMCC of the Gironde department, France
We retrieved call reports from the digital medical record system of the EMCC of the Gironde department, France.

Selected groups of reasons for calls
A first model was built by grouping calls according to the main reasons into 12 broad categories (chest pain, gastroenteritis and abdominal pain, flu-like symptoms and breathing difficulties, focal neurologic deficit and stroke, road traffic crash, violence, suicide and self-harm, injury other than violence, self-harm and road crash, pregnancy and delivery problems, malaise with loss of consciousness, stress and anxiety). Many symptoms or situations did not fall into these broad categories and were grouped into a thirteenth "other" category.
Another distinct model was built to count calls for which an acute alcohol intoxication was involved.

Classification procedure
Classification models were built and validated using a five-phase procedure: (iii) These two models were validated using the manual coding of a 39,907 random sample from the 2019 dataset.
(v) All 254,633 call reports of 2020 (January to September) were classified using the two models built in the previous steps;

Emergency room admissions with suspected COVID-19 and SARS-CoV2-2 PCR tests result
We retrieved daily aggregated data of ER admission with suspected COVID-19 and the daily number of SARS-Cov-2 PCR positive tests in the department of Gironde from the national public health agency "Santé Publique France". All data are available on the Geodes website (https://geodes.santepubliquefrance.fr/).

Analysis
For all call reasons, a probability cutpoint was determined in order to optimize the accuracy of the number of calls estimated by the model. This was done by selecting the cutpoint for which precision and recall are equal in the validation sample (i.e. when the number of false positive predictions equals the number of false negative predictions). The confidence of the method was assessed using a bootstrap procedure with 10,000 1:1 random partitions of the validation sample. For each partition (and each reasons), the cutpoint is selected using the first half of this partition sample and is the one for which precision equals recall. The difference between the manually determined prevalence of the given reasons and the predicted prevalence (using the selected cutpoint) is then computed in the second half of the sample. Median bias and 95% confidence interval were derived from the distribution of these 10,000 bias measures. Estimated daily counts of calls to EMCC were plotted against time with a 7-days moving average window to smooth the one-week periodicity in calls, which were systematically 25% more frequent on Saturday and Sunday.

Model training, parametrization and validation
The unsupervised training phase led to a model capable of generating artificial texts with the same structure as the learning texts, containing essentially the same clinical notions, but whose coherence is often inconsistent (see examples in the appendix). The content of these synthetic reports however suggests that the model has learned contextual semantic representations.
The training sample included from 1,829 (road traffic crash) to 412,218 (other reasons) samples per group (see Table 1). Performance statistics computed using the 39,907 manually-coded samples are shown in Table 1. Area under the ROC curve ranged from 0.785 to 0.990.
The optimal cutpoints to minimize bias in prevalence estimates were chosen using the same manually-coded sample of 39,907 call reports. Bootstrap 95% confidence interval assessment showed that the method provided narrow confidence intervals (Table1).

Reasons for calls during the Covid-19 crisis
In 2020, the median daily number of calls to the EMCC with a report was 892 with a total number of 254,633 and a peak of 1,926 on March 14. The distribution of the estimated number of calls per group of reasons and according to the lockdown period is shown in Table 2.
Three groups exhibited a peak at the onset of the lockdown period. Figure 1 shows that calls for flu-like symptoms and breathing difficulties began to rise on February 21, 20 days before the rise in ER admission for COVID-19, and peaked on March 14 with 928 calls out of a total of 1,926 calls that day. By the 16th day after February 21, the number of calls in this group had reached 403 calls per day, which is higher than all levels over the past 5 years. Starting from March, 1, 2020, the correlation between daily calls for flu-like symptoms and daily ER admissions with a delay of 14 days was 0.80 (p <10 -15 ; Pearson's test). Two other reasons, chest pain and stress and anxiety, were found to peak 12 days after March 14, with a curve of the same shape as the number of ER visits for COVID-19. Figure 2 shows that calls for alcohol intoxication, road traffic crash and malaise with loss of consciousness experienced a sharp decline around the onset of lockdown and that these trends started several days before the first day of lockdown. In a lesser extent, the latter curves were parallel with those for violence and injuries other than violence and road crash. Finally, no noticeable trends in relation to lockdown was found for other groups of reasons including gastroenteritis and abdominal pain, stroke, suicide and self-harm, pregnancy and delivery problems.

Discussion
The number of calls to EMCC showed a significant spike centered on the March 14, the day before the closure of all nonessential public places in France, three days before lockdown. The main reasons were related to flu-like symptoms (cough and/or fever), followed 14 days later by a peak in ER admissions, in calls for chest pain and calls for stress and anxiety. Calls for road traffic crashes, malaises with loss of consciousness, non-voluntary injuries and alcohol intoxication fell by 59%, 38%, 24% and 23% respectively during lockdown.
Probably the most interesting finding of our study is the delay we observed between the rise in calls for flu-like symptoms (mainly cough and fever) and the rise in ER visits for suspected COVID-19. Thus, the curve began to rise 20 days before the increase in ER visits. By the 16th day after the start of this rise, the levels reached a level higher than any levels in the past 5 years. One could hypothesize that the peak of calls recorded around March 14 was due by the concern, if not anxiety, caused by the announcement on television of the closure of public places by the French President on that day. However, in a more affected part of the country, the Ile-de-France region, the peak was reached much earlier (10 days earlier) 15 , suggesting that most of the calls we recorded were more motivated by symptoms than by concerns raised by communication by the authorities. Further, a spike in calls for stress and anxiety was measure 14 days later. EMCC call content is therefore probably the most predictive early indicator of the start of the epidemic, as recently shown by Riou and colleagues who found in the Ile-de-France region a strong correlation between calls regarding suspected COVID-19 and the number of patients in intensive care, with a delay of 23 days 15 . This is why this is considered for the monitoring of a potential relapses in the epidemic 16 . Finally, while the number of calls for flu-like symptoms proved to be an early and relevant signal, its intensity was probably increased by the request of the authorities not to go directly to the ER and to contact instead the EMCC.
An important difference between the work of Riou and colleagues and this study is that our process was clearly agnostic to the COVID-19 epidemic or to lockdown, as the automatic classification used models trained using reports from previous years. This results in a procedure that remains independent of the COVID-19 epidemic and lockdown, which would have influenced human codification. The signal thus obtained depends less on the context and is more likely to be an indicator of the actual public health situation. More generally, the added value of using an automatic classification procedure based on a natural language processing model is that it frees us from the context in which the reported events are coded. For this reason, we did not use the coded diagnoses at the time of the call to observe trends. In addition, these diagnoses were absent from one-third of the reports.
In the context of the COVID-19 epidemic, several research teams have used a similar approach, attempting to investigate the internet or social media to build early indicators of the epidemic 11,17 . However, no such signal could be found from a Google keyword search 18 , as the peak for cough, fever, coronavirus or COVID-19 was not reached until the week of [15][16][17][18][19][20][21] Contrary to what was observed in Paris 19 , no increase in calls related to cardiac arrest was observed in our study. This observation supports the hypothesis that the transient twofold increase in out-of-hospital cardiac arrests observed in Paris and its suburbs could be due to COVID-19 infections and to pandemic-related health system issues in heavily impacted regions. This was clearly not the case in the Gironde department where EMCC and intensive care units have never been overwhelmed.
A very significant decrease in calls for malaise with loss of consciousness, and to a lesser extent for strokes, was observed, starting one week before lockdown. This paralleled the sudden drop in ER visits that was observed in many countries that issued a statewide stay-at-home order 20 , raising concerns that patients who needed medical care were not presenting to the hospitals and, for example, that stroke patients arrived too late to receive tissue plasminogen activator. The actual overall public health impact of this phenomenon will have to be carefully assessed when we have enough hindsight to appreciate its medium-term health consequences.
The decrease in calls associated with interpersonal violence and alcohol intoxication is less surprising as it is probably due to the reduction in social interactions. Interestingly, the figures returned to normal levels by the end of the lockdown period. Early on, concerns were raised about the risk of domestic violence as a result of lockdown 21 . This was not confirmed here by calls to EMCC. Although, unfortunately, not all domestic violence is reported to EMCC, this is an interesting result because most statistics used during lockdown to estimate the incidence of intimate partner violence were derived from Police reports and not all violent events reported to EMCC are reported to the police.
In order to produce results in a time frame compatible with the health emergency related to the recent lifting of lockdown measures, we used the samples from 2016-2018 for which a diagnosis was coded during the call by the medical assistant in charge of handling it. The ideal procedure would have been to perform a manual coding of this training sample, which was done for a sample of 39,907 reports from 2019, but retained in this work as a validation sample. Our previous work has shown that it takes about a thousand different examples to maximize the performance of the model 9 . This would have meant manually coding more than 100,000 notes, a task that was out of reach in a short period of time. The performances of the GPT-2 model measured with the manually coded validation samples were, however, very high and allowed us to derive a reason for call for all reports including the 22% of them with a missing value for the diagnosis.
Some limitations need to be acknowledged. First, although we have shown that an AI-based natural language model shows high performance in classifying free-text clinical reports, a small proportion of reports remains misclassified. Because our exercise here was to provide prevalence estimates, we adjusted the decision cutpoints so that precision equals recall. The bootstrap analysis showed that this was a very reliable strategy. Second, not all calls are handled by the EMCC, a proportion of them remains unanswered and this proportion increases during peak periods. It is therefore likely that around March 14 the number of attempted calls was higher than those handled. Finally, the study was done in Gironde, a department with a reportedly low rate of SARS-Cov-2 infection if compared to the Ile de France and the north-east regions of France. However, lockdown and fear of the epidemic affected all French people and the Gironde EMCC are the third largest in terms of the number of calls received in France, which has made it possible to build up a sufficiently large database.

Conclusion
Given the fast spread and severity of COVID-19, the availability of reliable surveillance platforms is crucial for timely monitoring and for responding with adequate control measures to the COVID-19 epidemic and others to come, together with other major events with public health consequences. The results of this study, carried out in the context of a period of lockdown related to the COVID-19 epidemic, illustrate the extent to which automated classification of the reasons for calling EMCC is a powerful epidemiological surveillance tool, provides insights into the societal upheavals induced by a health crisis and would be instrumental to better anticipate the needs of the health care system.  Table legend   Table 1. Training sample size (from 2016-8 datasets) and validation from manually coded samples from 2019 dataset Table 2. Reasons for calls to EMS in 2020 as determined by the GPT-2 model. Calls per day and per motive

Confidentiality, ethics and data protection
No personal data were necessary for this work. All reports were automatically de-identified using a grep-based textsearch procedure that was applicable because of the standardized format of personal information inserted in the reports. This work conforms in terms of the protection of personal health data and the protection of privacy to the application framework provided by Article 65-2 of the amended French Data Protection Act and the General Regulation on the protection of personal data. It was approved by the Bordeaux Teaching Hospital committee for ethics and data protection.