Experimental Environment to Model, Simulate and Analyze Contagious Diseases as a Diffusion Process in Social Networks

In this paper a novel model of diffusion on networks and an experimental environment are presented. We consider the use of graph and network theory in modelling and simulating the dynamics of contagious diseases. We describe the basic principles and methods and show how they can be used to fight the spread of such diseases. We also present our software solution, CARE (Creative Application to Remedy Epidemics), which can be used to support decision-making.

The drawback of the standard diffusion models is that they do not take into account the underlying topology of real-world networks. Who (or what) is connected to whom (or what) seems to be a fundamental question. Networks derived from data on real-life cases (most often networks that grow spontaneously) are neither regular graphs nor random ones. Real networks, which have been studied intensively in recent years, turn out to have some interesting features. These features, whose origins are now being discovered, modelled (Barabási and Albert 1999; Barabási and Albert 2000; Erdös and Rényi 1959a; Erdös and Rényi 1959b; Newman 2000; Watts and Strogatz 1998) and examined (Barabási and Albert 2002; Kasprzyk 2012a; Newman 2003; Strogatz 2001; Wang and Chen 2003), significantly affect the dynamics of diffusion processes within real-world networks. Three models of real-world networks, namely Random Graphs, Small World Networks and Scale Free Networks, are described later in this paper.
We also have to remember that every kind of phenomenon spreading over a network has its own unique properties, and we should be able to model them. The notion of a state machine is useful here. Using probabilistic finite-state machines (Sokolova and de Vink 2004; Vidal et al. 2005) we can model the spreading of a vast variety of phenomena. For example, we are able to build models of diseases with any set of states (e.g. susceptible, infected, carrier, immunized, dead) and with probabilities of transition from one state to another that result from social interactions (contacts). Again, as already mentioned, the underlying contacts (the social network topology) seem to have a huge impact on the dynamics of diffusion processes.
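The probabilistic finite-state machine idea described above can be illustrated with a minimal sketch. The state names follow the example in the text; the transition tables, all probability values, the synchronous update rule and the helper names (`draw`, `step`) are illustrative assumptions, not the model defined in this paper.

```python
import random

# Hypothetical disease model. All probabilities are illustrative assumptions.
# Spontaneous transitions happen once per time step, regardless of contacts.
SPONTANEOUS = {
    "infected": [("immunized", 0.10), ("dead", 0.01)],
    "carrier": [("immunized", 0.05)],
}
# Contact-driven transitions are tried once per infectious neighbour.
CONTACT = {"susceptible": [("infected", 0.20), ("carrier", 0.05)]}
INFECTIOUS = {"infected", "carrier"}


def draw(transitions, rng):
    """Pick a target state from (state, probability) pairs, or None to stay."""
    u = rng.random()
    cumulative = 0.0
    for target, prob in transitions:
        cumulative += prob
        if u < cumulative:
            return target
    return None


def step(neighbours, state, rng):
    """Advance every node by one synchronous time step."""
    new_state = dict(state)
    for node, current in state.items():
        if current in CONTACT:
            for other in neighbours[node]:
                if state[other] in INFECTIOUS:
                    target = draw(CONTACT[current], rng)
                    if target is not None:
                        new_state[node] = target
                        break
        elif current in SPONTANEOUS:
            target = draw(SPONTANEOUS[current], rng)
            if target is not None:
                new_state[node] = target
    return new_state


rng = random.Random(0)
neighbours = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}
state = {0: "infected", 1: "susceptible", 2: "susceptible", 3: "susceptible"}
for _ in range(10):
    state = step(neighbours, state, rng)
print(state)
```

Keeping the transition tables as plain data means a different disease can be described by editing the two dictionaries without touching the simulation loop.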

Basic definitions and notations
Let us define a network as follows (Kasprzyk 2012a; Korzan 1978) [...] $h$ can also change over time. In this paper we are particularly interested in the relationship between the structure of real-world networks and the dynamics of phenomena on them. For this reason we focus on the characteristics of the graph $G(t)$, while the functions on the graph's vertices (nodes) and edges (links) are omitted.
Simple dynamic graphs are very often represented by a matrix $A(t)$ called the adjacency matrix, which is a square matrix whose entry $a_{ij}(t)$ equals 1 if an edge joins vertices $v_i$ and $v_j$ at time $t$ and 0 otherwise.
The degree $k_i(t)$ of a vertex $v_i$ is the number of vertices in the first-neighbourhood of $v_i$, i.e.:
$$k_i(t) = \sum_{j} a_{ij}(t)$$
The path starting in vertex $v_i$ and ending in vertex $v_j$ is a sequence of [...]
Let us denote the number of existing edges between the first-neighbours of a vertex $v_i$ as $N_i(t)$, i.e. (for an undirected simple graph):
$$N_i(t) = \frac{1}{2} \sum_{j} \sum_{l} a_{ij}(t)\, a_{il}(t)\, a_{jl}(t)$$
Now we can define a very important concept, the so-called local clustering coefficient $C_i$, which is given by $N_i(t)$ divided by the number of edges that could possibly exist between the first-neighbours of $v_i$ (every neighbour of $v_i$ connected to every other neighbour of $v_i$). Formally:
$$C_i = \frac{2 N_i(t)}{k_i(t)\,(k_i(t) - 1)}$$
The clustering coefficient $C$ for the whole network is defined as the average of $C_i$ over all vertices:
$$C = \frac{1}{|V(t)|} \sum_{i} C_i$$
The degree distribution $P(k, t)$ of a network is defined as the fraction of nodes in the network with degree $k$. Formally:
$$P(k, t) = \frac{|V_k(t)|}{|V(t)|}$$
where $|V_k(t)|$ is the number of nodes with degree $k$ and $|V(t)|$ is the total number of nodes.
These features, whose origins are now being discovered, do affect the diffusion processes within networks.
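The quantities defined above can be computed directly from an adjacency matrix. The following is a minimal sketch for a single snapshot of an undirected simple graph; the function names and the convention that a vertex with fewer than two neighbours has local clustering 0 are assumptions made for illustration.

```python
from collections import Counter


def degree(adj, i):
    """Number of first-neighbours of vertex i (row sum of the adjacency matrix)."""
    return sum(adj[i])


def local_clustering(adj, i):
    """Edges among the neighbours of i divided by the maximum possible number."""
    nbrs = [j for j, a in enumerate(adj[i]) if a]
    k = len(nbrs)
    if k < 2:
        return 0.0  # assumed convention: undefined case treated as 0
    links = sum(adj[u][v] for idx, u in enumerate(nbrs) for v in nbrs[idx + 1:])
    return 2.0 * links / (k * (k - 1))


def clustering(adj):
    """Average of the local clustering coefficients over all vertices."""
    n = len(adj)
    return sum(local_clustering(adj, i) for i in range(n)) / n


def degree_distribution(adj):
    """Fraction of vertices having each degree k."""
    n = len(adj)
    counts = Counter(degree(adj, i) for i in range(n))
    return {k: c / n for k, c in sorted(counts.items())}


# Triangle 0-1-2 with a pendant vertex 3 attached to vertex 2.
A = [
    [0, 1, 1, 0],
    [1, 0, 1, 0],
    [1, 1, 0, 1],
    [0, 0, 1, 0],
]
print(degree(A, 2), local_clustering(A, 2), clustering(A))
print(degree_distribution(A))
```

In this small example vertex 2 has three neighbours with one edge among them, so its local clustering coefficient is 1/3.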
Understanding the balance of order and chaos in networks is one of the goals of current research on so-called complex networks. Identifying and measuring the properties of real-world networks is a first step towards understanding their topology. The next step is to develop a mathematical model, which typically takes the form of an algorithm for generating networks with the same statistical properties. The above-mentioned features have contributed to the creation of a wide range of models that describe genuine networks more adequately than Random Graphs. The first such network models, i.e. Small World Networks and Scale Free Networks, were created in the 1990s, and their development is still under way.
For a long time real networks without a visible or known rule of organization were described using the Erdös and Rényi model of Random Graphs (Erdös and Rényi 1959a; Erdös and Rényi 1959b). Starting from a graph with no edges and connecting every pair of vertices independently with equal probability, they proposed a model whose topology is rather unrealistic. Their model now has only limited use for modelling real-world networks.
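The random graph construction described above (each pair of vertices connected independently with the same probability) can be sketched as follows. This is the G(n, p) variant; the function name `random_graph` and the parameter values are illustrative assumptions.

```python
import random


def random_graph(n, p, seed=None):
    """Connect every pair of the n vertices independently with probability p."""
    rng = random.Random(seed)
    neighbours = {v: set() for v in range(n)}
    for u in range(n):
        for v in range(u + 1, n):
            if rng.random() < p:
                neighbours[u].add(v)
                neighbours[v].add(u)
    return neighbours


g = random_graph(200, 0.05, seed=1)
mean_degree = sum(len(nb) for nb in g.values()) / len(g)
# The expected mean degree is p * (n - 1); a single sample only approximates it.
print(mean_degree)
```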
Watts and Strogatz proposed the Small World model of real-world networks (Watts and Strogatz 1998) as a result of the simple observation that real networks have a topology somewhere between regular and random.
They began with a regular graph, such as a ring, and then "rewired" some of the edges to introduce randomness. If all edges are rewired, a Random Graph appears. The idea of this method is depicted in Figure 1. The process of rewiring affects not only the average path length but also the clustering coefficient. Both of them decrease as the probability of rewiring increases. The interesting property of this procedure is that for a wide range of rewiring probabilities the average path length is already low while the clustering coefficient remains high. This combination is typical of real-world networks.
Barabási and Albert introduced yet another model of real-world networks (Barabási and Albert 1999), based on two main assumptions: constant growth and preferential attachment. They showed why the node degree distribution is described by a power law. The process of network generation is quite simple. The network grows gradually, and when a new node is added, it creates links (edges) to the existing nodes with probability proportional to their connectivity. As a consequence, nodes with a very high degree appear (so-called hubs or superspreaders), which are very important for communication in networks. There are many modifications of this basic procedure for generating networks. Scale Free models are currently considered the most adequate models of real-world networks.
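Both generation procedures described above can be sketched in a few lines. The code below is a simplified illustration, not the generators used in the experimental environment: the function names, the choice of a small complete graph as the starting point for growth, and the requirement that `n` is larger than `k` (or `m`) are assumptions.

```python
import random


def small_world(n, k, p, seed=None):
    """Ring lattice with k neighbours per node; each edge rewired with probability p."""
    rng = random.Random(seed)
    nbrs = {v: set() for v in range(n)}
    for v in range(n):
        for offset in range(1, k // 2 + 1):
            w = (v + offset) % n
            nbrs[v].add(w)
            nbrs[w].add(v)
    for v in range(n):
        for offset in range(1, k // 2 + 1):
            w = (v + offset) % n
            if w in nbrs[v] and rng.random() < p:
                # Move the far end of edge (v, w) to a node not yet linked to v.
                candidates = [u for u in range(n) if u != v and u not in nbrs[v]]
                if candidates:
                    new = rng.choice(candidates)
                    nbrs[v].discard(w)
                    nbrs[w].discard(v)
                    nbrs[v].add(new)
                    nbrs[new].add(v)
    return nbrs


def scale_free(n, m, seed=None):
    """Growth with preferential attachment: each new node links to m existing nodes."""
    rng = random.Random(seed)
    # Assumed starting point: a complete graph on m + 1 nodes.
    nbrs = {v: set() for v in range(m + 1)}
    for u in range(m + 1):
        for v in range(u + 1, m + 1):
            nbrs[u].add(v)
            nbrs[v].add(u)
    # Each node appears in `ends` once per incident edge, so a uniform draw
    # selects a node with probability proportional to its degree.
    ends = [u for u in nbrs for _ in nbrs[u]]
    for new in range(m + 1, n):
        targets = set()
        while len(targets) < m:
            targets.add(rng.choice(ends))
        nbrs[new] = set(targets)
        for t in targets:
            nbrs[t].add(new)
            ends.extend([t, new])
    return nbrs


sw = small_world(20, 4, 0.3, seed=2)
sf = scale_free(100, 2, seed=3)
print(max(len(nb) for nb in sw.values()), max(len(nb) for nb in sf.values()))
```

Rewiring preserves the number of edges, while preferential attachment tends to produce a few nodes whose degree is far above the average.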

Measures of nodes importance
Figure 3 shows an example of a real social network. Nodes represent individuals and links represent social interactions.

Fig. 3.
An example of a real social network.
The most basic and most frequently asked question is how to identify the most important nodes. The answer can help to maximize or, on the contrary, to minimize the diffusion dynamics of any phenomenon within a network. We decided to use so-called centrality measures to assess node importance. No single centrality measure suits every application. Several noteworthy measures are degree centrality, radius centrality, closeness centrality, betweenness centrality and eigenvector centrality. Thanks to these measures we can show, for example, how to disintegrate a network in a minimum number of steps and consequently minimize the diffusion area, and in particular how to optimize vaccination strategies (Kasprzyk 2009; Kasprzyk 2012a).

• degree centrality
Gives the highest score of influence to the vertex with the largest number of first-neighbours. Degree centrality is traditionally defined analogously to the degree of a vertex, normalized by the maximum number of neighbours this vertex could have:
$$dc_i(t) = \frac{k_i(t)}{|V(t)| - 1}$$
Fig. 4.
Importance of nodes according to degree centrality.

• radius centrality
Chooses the vertex with the smallest value of the longest shortest path starting in that vertex. So if we need to find the most influential node for the most remote nodes, it is quite natural and easy to use this measure:
$$rc_i(t) = \frac{1}{\max_{j} d_{ij}(t)}$$
where $d_{ij}(t)$ is the length of the shortest path between $v_i$ and $v_j$.
Fig. 5.
Importance of nodes according to radius centrality.

• closeness centrality
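Focuses on the idea of communication between different vertices: the vertex that is "closer" to all other vertices gets the highest score:
$$cc_i(t) = \frac{|V(t)| - 1}{\sum_{j \neq i} d_{ij}(t)}$$
where $d_{ij}(t)$ is the length of the shortest path between $v_i$ and $v_j$.
[...]
For eigenvector centrality, using matrix notation, we have:
$$\lambda \, \vec{ec}(t) = A(t) \, \vec{ec}(t)$$
Hence, $\vec{ec}(t)$ is an eigenvector of the adjacency matrix associated with the largest eigenvalue $\lambda$.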
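Several of the centrality measures listed above can be sketched with breadth-first search and power iteration. The code assumes a connected, undirected graph with at least two nodes, stored as a dictionary of neighbour sets; the function names and the normalizations are illustrative assumptions, and betweenness centrality is left out to keep the sketch short.

```python
from collections import deque


def distances(nbrs, source):
    """Shortest path lengths from source to every reachable node (BFS)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in nbrs[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def degree_centrality(nbrs):
    n = len(nbrs)
    return {v: len(nbrs[v]) / (n - 1) for v in nbrs}


def radius_centrality(nbrs):
    # Inverse of the longest shortest path starting in each vertex.
    return {v: 1.0 / max(distances(nbrs, v).values()) for v in nbrs}


def closeness_centrality(nbrs):
    n = len(nbrs)
    return {v: (n - 1) / sum(distances(nbrs, v).values()) for v in nbrs}


def eigenvector_centrality(nbrs, iterations=200):
    # Power iteration on A + I: same eigenvectors as A, but the shift avoids
    # oscillation on bipartite graphs. Scores are scaled so the maximum is 1.
    score = {v: 1.0 for v in nbrs}
    for _ in range(iterations):
        new = {v: score[v] + sum(score[u] for u in nbrs[v]) for v in nbrs}
        norm = max(new.values())
        score = {v: x / norm for v, x in new.items()}
    return score


# Triangle 0-1-2 with a pendant vertex 3 attached to vertex 2.
graph = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
dc = degree_centrality(graph)
rc = radius_centrality(graph)
cc = closeness_centrality(graph)
ec = eigenvector_centrality(graph)
print(dc, rc, cc, ec)
```

In this example all four measures rank vertex 2 highest, but on larger networks the rankings generally differ, which is why several measures are compared.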

Global Connection Efficiency Coefficient
Using centrality measures it is possible to optimize vaccination strategies. At first sight, it seems that the most central nodes should be considered as potential individuals to be vaccinated. However there is an issue connected with the fact that real-life networks' structures are not fully known.
To evaluate how well a network $G$ is connected before and after the removal of a set of nodes we use the global connection efficiency (GCE) (Crucitti et al. 2004). We assume that the connection efficiency between vertices $v_i$ and $v_j$ is inversely proportional to the shortest distance $d_{ij}$ between them:
$$e_{ij} = \frac{1}{d_{ij}}$$
The global connection efficiency is defined as the average connection efficiency over all pairs of nodes:
$$GCE(G) = \frac{1}{|V|\,(|V| - 1)} \sum_{i \neq j} e_{ij}$$
Unlike the average path length, the global connection efficiency is a well-defined quantity also for non-connected graphs, because $e_{ij} = 0$ when no path joins $v_i$ and $v_j$.
[...] (Random-Random Nodes), which means that the removal strategy is a two-step one.
First, nodes are chosen randomly; second, among all first-neighbours of these nodes, random nodes are chosen again. This strategy is often called Vaccinate Thy Neighbor (Cohen et al. 2003; Kasprzyk 2009; Madar 2004). All other strategies are based on centrality measures. With these strategies the nodes with the greatest value of one of the following measures are removed: dc (degree centrality), rc (radius centrality), cc (closeness centrality), bc (betweenness centrality), ec (eigenvector centrality).
The lower the value of the global connection efficiency after removal, the higher the effectiveness of the removal/vaccination strategy for a particular graph $G$.
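The evaluation procedure described above can be sketched as follows: compute the global connection efficiency, remove the nodes selected by a strategy, and compute it again. The function names are hypothetical, degree centrality stands in for the other centrality measures in the target strategy, and the efficiency after removal is averaged over the remaining nodes, which is an assumption.

```python
import random
from collections import deque


def bfs(nbrs, source):
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in nbrs[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def gce(nbrs):
    """Average of 1/d over ordered node pairs; unreachable pairs contribute 0."""
    n = len(nbrs)
    if n < 2:
        return 0.0
    total = 0.0
    for v in nbrs:
        for u, d in bfs(nbrs, v).items():
            if u != v:
                total += 1.0 / d
    return total / (n * (n - 1))


def remove_nodes(nbrs, removed):
    removed = set(removed)
    return {v: {u for u in nb if u not in removed}
            for v, nb in nbrs.items() if v not in removed}


def random_strategy(nbrs, count, rng):
    return rng.sample(sorted(nbrs), count)


def target_strategy(nbrs, count):
    # Highest degree first; other centrality measures could be used instead.
    return sorted(nbrs, key=lambda v: len(nbrs[v]), reverse=True)[:count]


def random_random_strategy(nbrs, count, rng):
    """Pick a random node, then a random first-neighbour of that node."""
    nodes = sorted(nbrs)
    chosen = set()
    for _ in range(100 * count):  # bounded number of attempts
        if len(chosen) >= count:
            break
        v = rng.choice(nodes)
        if nbrs[v]:
            chosen.add(rng.choice(sorted(nbrs[v])))
    return list(chosen)


# Star graph: centre 0 connected to leaves 1..6.
star = {0: set(range(1, 7))}
star.update({leaf: {0} for leaf in range(1, 7)})
rng = random.Random(0)
before = gce(star)
after_target = gce(remove_nodes(star, target_strategy(star, 1)))
after_rr = gce(remove_nodes(star, random_random_strategy(star, 1, rng)))
print(before, after_target, after_rr)
```

The random-random strategy needs no knowledge of the whole topology, yet it tends to reach well-connected nodes because such nodes are neighbours of many others.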

A novel model of diffusion
All in all, who is connected to whom seems to be crucial for diffusion in networks, but every kind of phenomenon has its own unique properties. Consequently, we defined the model of diffusion in a network as a vector with three elements (Kasprzyk 2012a; Kasprzyk 2012b): [...] Thus both concepts, i.e. real-world network topology and probabilistic state machine models, are highly pertinent to the subject and objectives of the presented idea. The aim is to uncover the diffusion mechanisms hidden in the structure of networks.

Experimental environment
The software platform for developing the experimental environment is the Gephi framework (Bastian 2009; Gephi 2020) [...] Framework are not responsible for their proper functioning. Such an approach responds to the contemporary need to create software quickly from existing components. On the other hand, the adopted solution ensures high software quality and at the same time guarantees that its plugins can be used by the already numerous Gephi users. The experimental environment was created as a set of original Gephi plugins, and its functionality is presented through use cases.
Fig. 11.
A use case diagram: "Generation of synthetic networks (social network models)".
[...] Free networks. We take into account three main strategies, i.e. random, target and random-random.
Fig. 15.
The designation of vaccination strategies.
As we can see in Figures 16-18, the random strategy is very ineffective. It is therefore natural to use the target strategy, which is based on centrality measures. In this particular case we use betweenness centrality, which seems to be the most effective. The problem with the target strategy is that it requires knowledge of the exact topology of the network. Our knowledge about most real networks is incomplete and uncertain, which is why the target strategy is very often unusable. In that case, as the experiments show, the random-random strategy can be used, and it is much more effective than the random one (Bartosiak et al. 2013; Kasprzyk 2012a).

SIS model of a disease on networks: research question no. 1
Let us now analyse a very simple case study of a diffusion process from the field of epidemiology. One of the most extensively studied epidemic models is SIS (Susceptible-Infected-Susceptible). In each time step a susceptible individual is infected by each infected neighbour with probability $\beta$, and infected individuals recover to the susceptible state at rate $\alpha$. The parameter $\lambda$ is known in the literature as the speed of spreading or virulence of the disease and is defined as:
$$\lambda = \frac{\beta}{\alpha}$$
[...] the nodes to infect at the start time are chosen according to:
• the lowest value of the degree centrality;
• the highest value of the degree centrality;
• randomly.
Then the simulation of an epidemic is started. This experiment was repeated 100 times. The dynamics of an infectious disease in different networks are presented in Figure 20 (Bartosiak et al. 2013; Kasprzyk 2012a). We can see that the topology of the network, as well as the way in which the initially infected nodes are chosen, has a great impact on the dynamics of the infectious disease.
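The SIS experiment described above can be sketched as a discrete-time simulation. This is a minimal illustration under stated assumptions: updates are synchronous, the recovery parameter is treated as a per-step probability, and the function names, the example ring network and all parameter values are hypothetical.

```python
import random


def sis_step(nbrs, infected, beta, alpha, rng):
    """One synchronous SIS step; returns the new set of infected nodes."""
    new_infected = set()
    for v in nbrs:
        if v in infected:
            # An infected node recovers (becomes susceptible) with probability alpha.
            if rng.random() >= alpha:
                new_infected.add(v)
        else:
            # Each infected neighbour infects v independently with probability beta.
            for u in nbrs[v]:
                if u in infected and rng.random() < beta:
                    new_infected.add(v)
                    break
    return new_infected


def seed_nodes(nbrs, count, how, rng):
    """Choose initially infected nodes: 'lowest', 'highest' degree, or 'random'."""
    nodes = sorted(nbrs)
    if how == "random":
        return set(rng.sample(nodes, count))
    ranked = sorted(nodes, key=lambda v: len(nbrs[v]), reverse=(how == "highest"))
    return set(ranked[:count])


def simulate(nbrs, beta, alpha, count, how, steps, seed=None):
    """Return the number of infected nodes at each time step."""
    rng = random.Random(seed)
    infected = seed_nodes(nbrs, count, how, rng)
    history = [len(infected)]
    for _ in range(steps):
        infected = sis_step(nbrs, infected, beta, alpha, rng)
        history.append(len(infected))
    return history


# Example network: a ring of 10 nodes.
n = 10
ring = {v: {(v - 1) % n, (v + 1) % n} for v in range(n)}

# Average the final number of infected nodes over 100 repetitions.
runs = [simulate(ring, beta=0.4, alpha=0.2, count=1, how="random", steps=30, seed=s)
        for s in range(100)]
print(sum(run[-1] for run in runs) / len(runs))
```

Replacing `ring` with a generated Scale Free or Random Graph network and varying `how` reproduces the kind of comparison the experiment makes between topologies and initial conditions.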

SIS model of a disease on networks: research question no. 2
The central question then becomes: how might network topology affect the disease diffusion process for different virulence (lambda values)? Our focus is still on the SIS model of a disease spreading in networks with different topologies. We use three networks: Scale Free, Random Graph and Regular Graph that is exactly GRID-

System CARE
As a practical application of our research, a system called CARE (Creative Application to Remedy Epidemics) was developed (Kasprzyk et al. 2011; Kasprzyk et al. 2010; Kasprzyk et al. 2009) [...]
In the Disease Modelling module, using the probabilistic finite-state machine approach, we can model any kind of disease based on knowledge from the field of epidemiology. The editor we propose allows models of diseases to be built with any states and transitions. We are able to define essential parameters such as the transition probability between states, the minimum and maximum time that an individual spends in each state, the maximum number of neighbours that can be infected by an individual in a given time period, and many more.
In the Social Network Modelling module we can model and generate social networks using complex network theory. Using the proposed generators we obtain synthetic networks with the same statistical properties as real-world social networks. The algorithms generate networks that are Regular Graphs, Random Graphs, Small World Networks, Scale Free Networks or modifications thereof.
Using the Simulation module we can visualize and simulate how an epidemic will spread in a given population. The system offers two ways of visualizing information. The first is called "Layout" and helps the user to manipulate networks and to set up simulation parameters.

Fig. 29.
Graphical User Interface of the Simulation module with "Layout" visualization.
The alternative is the "Geo-contextual" view, which allows networks to be visualized on the world map.
Based on the centrality measures, the Vaccination module helps the user to identify so-called "super-spreaders" and to come up with the most efficient vaccination strategy. The identification and subsequent vaccination or isolation of the most important individuals in a given network helps decision makers to reduce the consequences of epidemics or even to stop them at an early stage. We use a number of centrality measures to address the question "Who is the most important person in a given social network from the epidemic point of view?". We show how to discover the critical elements of any network, the so-called "super-spreaders" of a disease.
The crucial step in fighting a disease is to get information about the social network subject to that disease. The Questionnaires module helps to build special polls, based on sociological knowledge, that help to discover the network topology. Polls designed in this way are deployed on mobile devices to gather data about social interactions.

Conclusion
In this paper we have presented some basic principles and ideas that can provide a deeper understanding of diffusion processes on networks, in particular the dynamics of contagious diseases. We believe that we are a little closer to understanding diffusion in networks. As the relevant deliverables, the novel model of diffusion on networks and the experimental environment, based on the Gephi platform, were introduced. These tools can be used in a variety of applications:
• diffusion of any phenomena on any networks;
• identification of the nodes in a network that are the most important from different points of view;
• estimating the effectiveness of removal strategies;
• estimating the amount of resources needed to stop, slow down or, on the contrary, speed up any phenomena on networks.
The solutions presented in the paper have a practical implementation as CARE, a system for fighting contagious diseases. It is worth mentioning that CARE has a counterpart for fighting malware on the Internet, called VIRUS (Kasprzyk 2010).

Declarations
• Ethics approval and consent to participate
Not applicable

• Consent for publication
Not applicable

• Availability of data and material
The datasets used and/or analysed during the current study are available from the corresponding author on reasonable request.

• Competing interests
The author(s) declare(s) that they have no competing interests.