A Self-Assembling Blockchain-Based Database in Global Data Sharing of SARS-CoV-2

In 2020, the world experienced a pandemic caused by a virus known as SARS-CoV-2. The uncontrolled worldwide spread of COVID-19 and the rapid transmission of the virus lead to more and more cases, resulting in a massive data stream. Each day, new data points are reported on the number of positive test results, hospitalized patients, deaths, ventilator shortages, etc. This paper reviews the principles of Blockchain technology that make it suitable for sharing pandemic data and highlights the potential applications and opportunities in applying this form of data deposition and sharing to combat the COVID-19 pandemic.

1. Background

1.1. Lack of sufficient data sharing tools for SARS-CoV-2
Data generated by medical units are readily amenable to recording, analysis, and sharing. In contrast, data from scattered research labs or from individuals are sometimes not included in the current open-source COVID-19 datasets, or they may be added late or only partially [1].
The raw data from research on assessing druggable targets [2], antiviral assays, and finally clinical trials all need to be readily accessed and exchanged between scientists so that a drug can be obtained in the shortest possible time. Since Machine Learning (ML) is essential for accelerating the analysis of large, complex datasets and for tasks such as disease prediction and mortality risk estimation [28], the global sharing of scientific data helps gather large amounts of data and offers a promising tool to fight the disease [27].
Ease of access to, and rapid worldwide sharing of, SARS-CoV-2 data is critical for a deep understanding of the virus's characteristics and spread. It gives researchers around the world the chance to collaborate and help develop treatments and a vaccine. Therefore, verified and up-to-date databases or data sharing platforms are essential for facilitating the exchange of all COVID-19-related data.

Principles of the Blockchain technology
Following the introduction of Bitcoin by Nakamoto in 2008 [1], Blockchain became popular as an emerging distributed ledger technology. Rather than the mathematics behind Blockchain, this paper discusses the principles that explain its suitability for healthcare purposes, including COVID-19 data sharing. Blockchain's main utility is that it makes it possible to participate in a distributed network without the need for a centralized, trusted third party. It accelerates transactions by eliminating the delay introduced by a central authority. It also lowers transaction costs, since the central authority's transaction fees are eliminated. Instead of relying on a central authority, Blockchain uses a consensus mechanism to reconcile inconsistencies among the nodes of a distributed application.

Types of Blockchain Technology
Different types of Blockchain currently exist: some are public, while others are private and can be accessed only by authorized users. A public blockchain is an open-access, fully decentralized network in which everyone using the protocol can read, write, or participate. Any new block must be confirmed and validated by every computer participating in the network before it is added to the chain. Every transaction in a public blockchain is transparent and immutable, which means that once it is verified, it cannot be altered. Private blockchains, on the other hand, are networks governed by a single authority. Everyone requires authorization to read, write, or audit the Blockchain. There are various levels of access, and information is usually encrypted to preserve commercial confidentiality. Private Blockchains allow organizations to use distributed ledger technology without making the data public.
Private Blockchains work faster and more efficiently than public blockchains, which take a long time to validate and verify transactions [15,20].

Different Versions of Blockchain
Blockchain technology has gone through different versions. Blockchain 1.0 corresponds to the first cryptocurrency and primarily records Bitcoin transactions. The motivation behind this first version was the implementation of distributed ledger technology. It is a permissionless Blockchain in which any participant can perform a valid Bitcoin transaction. In Blockchain 2.0, Bitcoin was followed by a new platform known as Ethereum. One of the most prominent and successful blockchain applications is the smart contract, which reduces transaction costs significantly. Currently, the Ethereum network is the leading platform for creating smart contracts. Blockchain 3.0 refers to Dapps, short for decentralized applications, which are now the basis of many decentralized systems and applications [21]. Many decentralized applications run on peer-to-peer networks, like BitTorrent, Tor, etc. The latest version of Blockchain technology is mostly known as Blockchain for industry. The objective of this fourth version is to resolve the issues of the previous three. Blockchain 4.0 clarifies the strategies that make blockchain technology applicable to data sharing demands such as those of medical and clinical practice.

Properties
The Blockchain is an ingenious method for passing information peer-to-peer, automatically and safely.
One party to a transaction begins the procedure by generating a block, which is verified by thousands of computers distributed across the network. The verified block is appended to a chain stored across the network, creating a unique record with a unique history. Altering a single record would mean changing the whole chain in a vast number of instances, which is virtually impossible. Decentralization, Transparency, and Immutability are the three main properties of Blockchain technology, and they have helped it gain widespread acclaim. In a decentralized system, the information is not stored by one single entity; it is stored by every node in the network [2]. Figure 1 illustrates the difference between centralized and decentralized systems.
Transparency is one of the main and most interesting, yet often misunderstood, concepts in Blockchain. Some argue that Blockchain provides privacy, whereas others claim that it is transparent. Blockchain derives most of its capabilities from cryptographic primitives. Participants in a Blockchain network are represented as nodes. Each node uses a specific public key, which serves as the user's public address. Each participant also holds a private key with which to authenticate themselves.
Immutability in Blockchain means that once a record has been added to the Blockchain network, others cannot tamper with it to cause damage or make unauthorized alterations. Blockchain achieves this property by using cryptographic hash functions. A cryptographic hash function is a particular class of hash function with properties that make it ideal for this purpose. In hashing, no matter how long the input is, the output always has a fixed length. This is very useful when dealing with massive amounts of data: instead of keeping track of the input data, which may be huge, you only need to remember its hash value. A critical property of cryptographic hash functions is the "Avalanche Effect": even a minor change in the input data produces a drastically different hash value. Imagine blocks linked in a chain, each storing the hash of the block before it. Any modification to block #1 changes its hash, which then no longer matches the hash stored in block #2; correcting block #2 changes its own hash, which in turn affects block #3, and so on. This is precisely how Blockchains attain immutability: any minor change would require altering the whole chain, which is practically infeasible [2].
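The two ideas in this paragraph, the avalanche effect and tamper detection through linked hashes, can be illustrated with a minimal Python sketch. It assumes SHA-256 as the hash function and the common layout in which each block stores the hash of the block before it; the record strings and the helper names (`build_chain`, `first_invalid_block`) are hypothetical and chosen only for illustration. It is not a description of any particular Blockchain platform.

```python
import hashlib


def sha256_hex(text: str) -> str:
    """Return the SHA-256 digest of text as a 64-character hex string."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def bit_difference(hex_a: str, hex_b: str) -> int:
    """Count how many bits differ between two equal-length hex digests."""
    return bin(int(hex_a, 16) ^ int(hex_b, 16)).count("1")


def build_chain(records):
    """Link records so that each block stores the hash of the previous block."""
    chain, prev_hash = [], "0" * 64  # assumed placeholder hash for the first block
    for data in records:
        block_hash = sha256_hex(prev_hash + data)
        chain.append({"data": data, "prev_hash": prev_hash, "hash": block_hash})
        prev_hash = block_hash
    return chain


def first_invalid_block(chain):
    """Return the index of the first block that fails verification, or None."""
    prev_hash = "0" * 64
    for i, block in enumerate(chain):
        expected = sha256_hex(prev_hash + block["data"])
        if block["prev_hash"] != prev_hash or block["hash"] != expected:
            return i
        prev_hash = block["hash"]
    return None


# Avalanche effect: inputs differing in one character give very different digests
# of the same fixed length (hypothetical example records).
h1 = sha256_hex("positive tests: 100")
h2 = sha256_hex("positive tests: 101")
differing_bits = bit_difference(h1, h2)

# Tamper detection: altering an early record makes verification fail at that block.
chain = build_chain(["report A", "report B", "report C"])
intact = first_invalid_block(chain)          # None while the chain is untouched
chain[0]["data"] = "report A (altered)"
tampered_at = first_invalid_block(chain)     # index of the altered block
```

To hide the change, an attacker would have to recompute the hash of the altered block and of every block after it, on every node that holds a copy of the chain.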
The appeal of Blockchain technology and its application in various industries has reached a point where several uses for Blockchain in healthcare have been identified [23,25,26]. Decentralization, a key feature of Blockchain that is advantageous for medical applications, enables a distributed implementation of healthcare data sharing that does not depend on a centralized authority. Furthermore, transparency is attained by a routine that copies the information on the Blockchain to all the nodes. This allows healthcare workers, stakeholders, and patients to see how, when, and by whom their information is used. More importantly, compromising any single node in the Blockchain network does not affect the ledger's state, because the information is replicated throughout the network. In this way, Blockchain protects healthcare data from potential problems like corruption, data loss, and, above all, security attacks. Besides, the impossibility of altering records once they are added to the Blockchain, which follows from its immutability property, aligns with the guarantee needed for the validity and integrity of health records. Figure 2 shows how a transaction is performed in the Blockchain network.
Some applications of Blockchain in healthcare systems are even commercial. Examples include the MedRec project, which was created to facilitate the management of permissions, authorization, and data sharing among health care providers, and Guardtime, a company that operates a Blockchain-based healthcare platform to validate patients' identities for Estonian citizens [3].

Current Companies Using Blockchain
Yet, Blockchain also has the potential to disrupt clinical data management and its transparency. Its immaturity in healthcare applications is an obstacle that still needs to be overcome. Despite this, pharmaceutical companies have shown increasing interest in applying Blockchain to clinical trials, and collaborations with technology companies have increased over the last few years. Blockchain technology in clinical trials offers a decentralized system that provides data integrity, transparency, patient participation, and process automation while improving trial quality and patient security at a lower cost. Table 1 lists companies in particular areas that use Blockchain technology, and Fig. 3 displays market investments in blockchain technology worldwide.

Constructing a Blockchain-based Database
As computer-assisted human activities become increasingly dependent on data, data integrity has become much more important. Moreover, nowadays, data is a very tempting target for cyber-attacks.
Threats to data integrity are very prevalent, and compromised data integrity may affect critical decisions in any area [4]. The integrity of medical data is important to stakeholders, including academic researchers and, most importantly, prospective patients and public services. Numerous threats to the validity of clinical trial data challenge data integrity [5]. Records can be modified or even permanently lost, whether unintentionally or through malicious intent. Redundancy exists in database systems, but it is usually not transparent to outside observers. Moreover, there is a critical risk that the published analysis of the data does not faithfully represent the analysis that was initially planned [6]. This problem is even more critical in cloud computing environments, where the physical storage of data and the control of data access are beyond the control of the data holders. Therefore, electronic data preservation technology is essential to provide legal evidence in cases of medical disputes and medical negligence [8]. Figure 4 illustrates a simple overview of a Blockchain-based database.
Blockchain has emerged as a convenient technology that provides attractive data integrity features. However, there are many barriers to using Blockchain to address data integrity threats, including low throughput, high latency, and instability, which can make Blockchain-based solutions impractical. A Blockchain database resembles a conventional database in that it can store data that can be accessed, modified, and added, but there are fundamental differences [7].
To maintain the database and users' access to it, a typical client-server database uses centralized servers, through which clients can access and modify the data. The original version is always securely stored on the central servers. Data security is established by the administrators who control the network, and users must be granted permission to access it. When a user modifies data, the changes are logged by the central server before the other users who have access to the database are updated.
A Blockchain database, on the other hand, is entirely decentralized. It is maintained and controlled by users who act as active participants. The main point is that all participants must verify a transaction's validity before it is registered in the database. Blockchain technology creates an immutable ledger of transactions that provides more security and eliminates the need for a single central controlling entity with database management rights. The data is recorded as a block in the database, and when a new entry is recorded, it is appended to the chain. Consequently, the database retains all recorded COVID-19 data as blocks from the time the ledger was first started. This makes the Blockchain database much more secure, because each participant works independently of the others and will therefore act to prevent unauthorized alteration of the data recorded in the chain [6]. An important feature of a highly scalable distributed database is that it allows companies to store and access large amounts of data in real time.
Blockchain as a database can contain any kind of information. Still, Blockchains are limited in storing large amounts of data, because the number of transactions they can process at a given time is bounded and can never exceed the capacity of any single node participating in the network. This leads to a crucial problem that affects Blockchain, known as scalability. As a client-server database grows, additional resources are added to supply the extra computing power required. In a Blockchain, by contrast, growth adds additional nodes to the network. Because transactions must be validated across all nodes, the so-called inter-node latency increases logarithmically, rendering the database inefficient and slowing down the growth of the network.
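The rule that an entry is registered only after all participants have verified it can be sketched as follows. This is a simplified, single-process illustration in Python, not a networked implementation: each participant is modeled as a plain validation function, and the class name `SharedLedger`, the field names, and the sample entries are assumptions made for the example.

```python
import hashlib
import json


class SharedLedger:
    """Append-only ledger; an entry is added only if every participant approves it."""

    def __init__(self, validators):
        # One validation function per participant (hypothetical stand-in for a node).
        self.validators = list(validators)
        self.blocks = []

    def append(self, entry: dict) -> bool:
        # All participants must verify the entry before it is registered.
        if not all(validate(entry) for validate in self.validators):
            return False
        prev_hash = self.blocks[-1]["hash"] if self.blocks else "0" * 64
        payload = json.dumps(entry, sort_keys=True)
        block_hash = hashlib.sha256((prev_hash + payload).encode("utf-8")).hexdigest()
        # Existing blocks are never modified; new data is only appended.
        self.blocks.append({"entry": entry, "prev_hash": prev_hash, "hash": block_hash})
        return True


def has_required_fields(entry: dict) -> bool:
    """Participant 1 checks that the expected (assumed) fields are present."""
    return {"region", "date", "positive_tests"}.issubset(entry)


def has_plausible_count(entry: dict) -> bool:
    """Participant 2 checks that the reported count is a non-negative integer."""
    count = entry.get("positive_tests")
    return isinstance(count, int) and count >= 0


ledger = SharedLedger([has_required_fields, has_plausible_count])
# Hypothetical entries, for illustration only.
accepted = ledger.append({"region": "X", "date": "2020-04-01", "positive_tests": 120})
rejected = ledger.append({"region": "X", "date": "2020-04-02", "positive_tests": -5})
```

In this sketch the second entry is refused because one participant rejects it, so only the first entry ends up in the ledger.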

Security of Blockchain databases
Every day, a massive amount of medical data is produced during diagnosis and treatment in health centers. This data is vital for both patients and health care workers. The content must be valid and complete when created, and once validated, the data must be traceable and immune to falsification, modification, or deletion. Because it contains sensitive personal information, medical data must also be secure and anonymous when stored. Hence, data retention mechanisms play an important role [8]. In addition, the data must be encrypted so that stolen data cannot be understood without decryption. Although the data generated varies from person to person, data formats can be classified into specific types of information.
Data sharing among research communities worldwide can help compile powerful data sets, which would play a critical role in COVID-19 research. Data-sharing tools must not violate data-sharing protocols and regulations. Patients' privacy is one of the most crucial concerns and impediments to implementing a medical data sharing system and to its wide adoption. For instance, national and international regulations, such as the Health Insurance Portability and Accountability Act (HIPAA), impose strict controls and define data access control policies [24]. Blockchain is used to strengthen effective protocols and to develop a suitable basis for an efficient, evidence-based process that plays a strategic role in the secure sharing of data between people, clinics, and research centers, independent of how reliably these groups cross-check one another [23]. The mechanism used in Blockchain aims to provide a secure system for storing distributed data. The risk associated with a centralized system is mitigated by the cybersecurity properties of the distributed blocks [22].

Potential of Blockchain in the data registry
Blockchain can be vital in addressing challenges that the clinical research community faces, such as reproducibility, personal data privacy concerns, data sharing, and patient registration in clinical trials. The main characteristic of Blockchain is that services that depend on a trusted third party can be rebuilt in a transparent, secure, decentralized way that gives users full control and makes the service observable; hence, it ensures that information can be traced and permits clinical trials to be automated securely. At the same time, it provides shareable parameters for patients and clinical trial stakeholders. Implementations of Blockchain technology in clinical trials will show how current clinical trial methods could evolve and profit from Blockchain technologies to confront the challenges mentioned above [16][17][18][19]. Figure 5 outlines the strengths, opportunities, weaknesses, and threats of Blockchain.

Clinical trials of drug development for COVID-19
Nowadays, patients are willing to be more involved, and their engagement is becoming a major success factor for clinical trials [9]. When giving consent to take part in a trial, patients agree to the protocol recorded in the Blockchain. The smart contract feature guarantees that any protocol change triggers a renewal of patients' consent [10]. This technology also enables key success features, including direct patient engagement, reporting to regulators, and secure data tracking. Data transparency can be guaranteed through Blockchain's structural characteristics, including decentralization, encrypted immutable records, and trusted time-stamping [12].
During the current pandemic, data is being collected from various sources, including the datasets mentioned above. As the virus has spread worldwide, misleading and false information has spread with it [13], leading to erroneous conclusions. Consequently, data quality plays a significant role in revealing trustworthy information through research, and falsified clinical data poses a serious risk to clinical research findings. Blockchain can help ensure data accuracy and quality through its service perspective and immutability, thus contributing significantly to the quality of pandemic data and of data transfer transactions [12]. Studies have estimated that the number of patients in several countries is considerably greater than what is officially reported [11], which can lead to numerous errors in vital decisions; hence, maintaining data transparency during a pandemic is required for efficient decision-making.
Enrolled patients must also share all of their medical information, including past and current medical conditions. Owing to the lack of a generalized digital health record system, information about a patient's prior condition is based on personal statements, which can be inaccurate. For instance, in the recent Estonian experience, 1.3 million residents benefited from a Blockchain-enabled e-Health record system that automatically gives investigators access to the patient medical information recorded in the network, including their documents. This access would be granted only once the patient gives consent according to the smart contract; hence, it would lead to precise patient information. Moreover, the ongoing logging of patients' health records on the Blockchain network provides transparency and trust between patients and investigators.
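The consent-renewal rule described above can be illustrated with a small Python model. This is only a sketch of the logic, not an actual smart contract: a real deployment would express it in a smart-contract language on a Blockchain platform. The class name `TrialConsentRegistry`, the patient identifier, and the protocol texts are hypothetical. The assumed mechanism is that each consent is bound to the hash of the protocol version the patient agreed to, so any protocol change invalidates earlier consents until they are renewed.

```python
import hashlib


class TrialConsentRegistry:
    """Tracks which protocol version each patient has consented to."""

    def __init__(self, protocol_text: str):
        self.protocol_hash = self._digest(protocol_text)
        self.consents = {}  # patient_id -> hash of the protocol version consented to

    @staticmethod
    def _digest(text: str) -> str:
        return hashlib.sha256(text.encode("utf-8")).hexdigest()

    def give_consent(self, patient_id: str) -> None:
        # Consent is bound to the protocol version that is current right now.
        self.consents[patient_id] = self.protocol_hash

    def update_protocol(self, new_text: str) -> None:
        # Changing the protocol changes its hash, so earlier consents stop matching.
        self.protocol_hash = self._digest(new_text)

    def has_valid_consent(self, patient_id: str) -> bool:
        return self.consents.get(patient_id) == self.protocol_hash


# Hypothetical protocol texts and patient identifier, for illustration only.
registry = TrialConsentRegistry("Protocol v1: two doses, 21 days apart")
registry.give_consent("patient-001")
valid_before_change = registry.has_valid_consent("patient-001")

registry.update_protocol("Protocol v2: two doses, 28 days apart")
valid_after_change = registry.has_valid_consent("patient-001")   # consent must be renewed

registry.give_consent("patient-001")
valid_after_renewal = registry.has_valid_consent("patient-001")
```

Binding consent to a hash of the protocol rather than to a version label means that even an unannounced edit to the protocol text would require consent to be renewed.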

Clinical trials of vaccine development for COVID-19
Controlling the spread of the virus requires rapid immunization against it with an effective vaccine. At the time of writing, many research institutes and laboratories worldwide are conducting clinical trials of several vaccine candidates. A few, such as Pfizer-BioNTech and Moderna, have already been verified. As the CDC reports, based on evidence from clinical trials, the Pfizer-BioNTech vaccine was 95% effective at preventing laboratory-confirmed COVID-19 illness in people without previous infection.

Challenges in applying Blockchain to pandemic data collection
Despite Blockchain's considerable potential in battling the COVID-19 pandemic, several challenges should be considered. Some of the major challenges are outlined below.

Lack of Technology Professionals
Developing a blockchain platform requires skill sets spanning security, engineering, medical knowledge, and other related areas, so companies face a shortage of expert staff. Consequently, some companies train employees in their own training centers to fill blockchain-related vacancies, while others outsource. Yet, finding experts remains a critical challenge because of the technology's novelty [14].

Scalability
As the number of transactions grows, the storage required by the blockchain network grows rapidly, since all network nodes must store all validated data. This becomes a challenge because there are limits on the block size and on the time interval required to add a new block. Existing blockchain platforms process only a small number of transactions per second, which hinders large-scale deployment. Hence, scalability is also an obstacle to applying Blockchain to pandemic data.
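The effect of block size and block interval on throughput, and of full replication on storage, can be made concrete with a short back-of-the-envelope calculation in Python. The figures used here (a 1 MB block, 250-byte transactions, a 600-second block interval, a 300 GB ledger, and 10,000 nodes) are illustrative assumptions, not measurements from the paper or from any specific platform.

```python
def transactions_per_second(block_size_bytes: int,
                            avg_tx_size_bytes: int,
                            block_interval_s: float) -> float:
    """Upper bound on throughput given a block size limit and a block interval."""
    tx_per_block = block_size_bytes // avg_tx_size_bytes
    return tx_per_block / block_interval_s


def network_storage_bytes(ledger_size_bytes: int, node_count: int) -> int:
    """Total storage when every node keeps a full copy of the ledger."""
    return ledger_size_bytes * node_count


# Assumed parameters, for illustration only.
tps = transactions_per_second(block_size_bytes=1_000_000,
                              avg_tx_size_bytes=250,
                              block_interval_s=600)
total_storage = network_storage_bytes(ledger_size_bytes=300 * 10**9,
                                      node_count=10_000)
```

With these assumed values, each block holds 4,000 transactions, which gives fewer than 7 transactions per second, while the network as a whole stores 10,000 copies of the same ledger.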

Legal Considerations and Data Privacy
Legal issues also need to be considered, since one of the most crucial concerns in the current pandemic is how data is accessed and shared in the blockchain network. Several issues must be resolved by different parties, such as international health organizations. Data privacy is another critical challenge, which has already been discussed in detail [14].

Concluding remarks
Throughout the paper, we have discussed how the features and capabilities of emerging blockchain technology could be leveraged for storing and sharing data and for combating the ongoing COVID-19 pandemic. We explored the basics of the technology and its applications and presented its potential in medical applications. We identified the key requirements that participating organizations must meet to develop blockchain-based systems for healthcare emergency services. We discussed recently developed blockchain-based systems that implement various services related to data privacy assurance, remote testing, tracing, and remote outpatient health monitoring. In cooperation with traditional health systems, such systems can enhance service quality, diagnostic capabilities, and monitoring, and give patients actual ownership of their medical data. We have also considered the limitations of Blockchain usability and the challenge of identifying and understanding their consequences in order to formulate strategies to overcome them.

Future directions
Blockchain networks can be optimized for lower network latency and improved security, making Blockchain well suited to healthcare applications such as the response to the current pandemic. Well-organized blockchain design in healthcare is required to optimize data verification and safety. Creating local and private blockchain networks is another possible way to reduce the size of the Blockchain network, as is creating customized ledgers located on local servers in each outbreak area, in order to improve blockchain performance and speed.
Pioneering technologies such as Blockchain and Artificial Intelligence (AI) could support the response to the Coronavirus outbreak. While Blockchain can confront pandemics by enabling rapid and trusted data sharing, early detection of epidemics, and protection of patients' data privacy, AI offers smart solutions for recognizing symptoms caused by the Coronavirus and for assisting drug manufacturing and vaccination.
Additionally, combining technologies can help infected people who use smart devices, with the Internet of Things (IoT) complementing Blockchain. IoT-based healthcare devices collect useful information, provide additional insight into symptoms, enable remote monitoring, and offer individuals greater autonomy and better healthcare, relying on Blockchain's ability to secure the sharing of patient health records.

Figure 2. The steps of performing a transaction on the Blockchain network.
Figure 3. Market investments into blockchain technology worldwide.
Figure 5. Pros and cons of applying blockchain technology to the global sharing of COVID-19 data.
Figure 6. Cumulative COVID-19 vaccination doses administered per 100 people.