Multilingual topic modelling for tracking COVID-19 trends based on Facebook data analysis

DOI: https://doi.org/10.21203/rs.3.rs-45177/v1

Abstract

Social data has been shown to play an important role in tracking, monitoring and risk management of disasters. Indeed, several works have focused on the benefits of social data analysis for healthcare practice and treatment. Such data are now also being exploited for tracking the COVID-19 pandemic, but the majority of works use Twitter as the source. In this paper, we choose to exploit Facebook, which is rarely used, for tracking the evolution of COVID-19 related trends. A multilingual dataset covering 7 languages (English (EN), Arabic (AR), Spanish (ES), Italian (IT), German (DE), French (FR) and Japanese (JP)) is extracted from Facebook public posts. The proposal is an analytics process comprising a data gathering step, pre-processing, LDA-based topic modelling and a presentation module using a graph structure. The data analysis covers the period from January 1, 2020 to May 15, 2020, divided cumulatively into three periods: a first period covering January-February, a second covering March-April, and a last one running to May 15. The results show that the extracted topics correspond to the chronological development of what circulated about the pandemic and of the measures taken, in each of the languages studied.
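The cumulative division into periods can be illustrated with a minimal Python sketch. It assumes that "cumulative" means each period's analysis includes every post from January 1, 2020 up to that period's end date; the abstract does not spell this out. The post records, field names and the function `cumulative_windows` are hypothetical and are not taken from the authors' pipeline.

```python
from datetime import date

# Shared start date and period end dates, taken from the abstract.
# Assumption: windows are cumulative, so every window starts at START.
START = date(2020, 1, 1)
PERIOD_ENDS = {
    "P1 (Jan-Feb)": date(2020, 2, 29),
    "P2 (Mar-Apr)": date(2020, 4, 30),
    "P3 (to 15 May)": date(2020, 5, 15),
}


def cumulative_windows(posts):
    """Map each period label to all posts dated from START to that period's end."""
    return {
        label: [p for p in posts if START <= p["date"] <= end]
        for label, end in PERIOD_ENDS.items()
    }


# Hypothetical posts, for illustration only.
posts = [
    {"date": date(2020, 1, 20), "lang": "EN", "text": "example post A"},
    {"date": date(2020, 3, 12), "lang": "FR", "text": "example post B"},
    {"date": date(2020, 5, 10), "lang": "IT", "text": "example post C"},
    {"date": date(2020, 5, 20), "lang": "DE", "text": "outside the study range"},
]

windows = cumulative_windows(posts)
sizes = {label: len(items) for label, items in windows.items()}
print(sizes)
```

Under this reading, each window is a superset of the previous one, so a topic model fitted per window would see the January posts in all three periods, while posts dated after May 15 are excluded throughout.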

Declarations

Acknowledgment

The work was supported by the Ministry of Higher Education and Scientific Research of Tunisia (MHESR) in the framework of Federated Research Project PRFCOV19-D1-P1.

Conflict of Interest

The authors declare no competing interests. The authors whose names are listed in the paper certify that they have no affiliations with or involvement in any organization or entity with any financial interest (such as honoraria; participation in speakers' bureaus; membership, employment, consultancies, stock ownership, or other equity interest; and expert testimony or patent-licensing arrangements), or non-financial interest (such as personal or professional relationships, affiliations, knowledge or beliefs) in the subject matter or materials discussed in this manuscript. The data are available at the link given in the submitted paper: https://mohamedalihadjtaieb.github.io/Covid19-based-Facebook-Research/ The data were collected according to the public data provided in our previous paper: Amina Amara, Mohamed Ali Hadj Taieb, Mohamed Ben Aouicha: Identifying i-bridge Across Online Social Networks. AICCSA 2017: 515-520.
