Implementing Integrity Assurance System for Big Data

The computation of big data and related services has become a popular research topic due to the rapid progress of big data technology and statistical data analysis solutions. Several data quality issues contribute to erroneous decisions in organizations and institutions, yet current research only partially covers how to adequately validate data to assure its validity. Data integrity is often treated as synonymous with data validity, and validating it is a difficult undertaking that is typically performed by national statistics organizations and institutes. There is a significant need for a general system for validating big data integrity. This work presents a model for data integrity, particularly for big data, and shows how to carry out the validation process. The validation covers the validity of the data fields, the validity of measuring the data, and compliance with the data cycle chain. For big data integrity, both the processing speed and the accuracy of the verification process are taken into account. The research was implemented in the Python programming language and evaluated on real test data, making use of recent technologies and programming tools.


Introduction
Big Data (BD) is an expression that describes the enormous size of digital data in terms of its speed of generation, its large size, and the variety of its sources and structures, such as relational databases [2]. The conventional database system faces several challenges and problems with BD. One of these is the capture and storage of BD: it does not support the continuously growing volume of data and is designed for traditional data processing in governments or small organizations [1]. Nowadays, BD is the most frequently discussed topic in data technology communities, and it is likely to remain popular, given the open questions of how this data is managed, processed, and secured in applications such as health care, statistics, and education. Organizations have become more open and flexible in receiving and generating new and modern forms of data [2].
The term BD is believed to have been used for the first time by web search companies that handle large and distributed data. The characteristics that relate to the security and validity of such data are the following [3]: (a) Volume: streaming data, data acquired from sensors, and other sources all contribute to an increase in volume. (b) Variety: nowadays, data arrives in a variety of formats, including emails, audio, video, and transactions. (c) Velocity: this refers to how quickly data is created and how quickly it must be processed to satisfy demand. Two further aspects to consider for BD are variability and complexity. (d) Variability: data flows, like velocity, can be very inconsistent, with periodic peaks. (e) Complexity: when data comes from various sources, its complexity must also be addressed; before processing, the data must be connected, matched, cleaned, and converted into the appropriate forms.
BD systems can be divided into two main paths: systems that provide real-time operation and an interactive customer view, where data is validated as it is collected and saved, and systems that perform data validation operations after saving [2].
Data is collected and processed, aggregated or integrated, prior to being evaluated or interpreted. Hence, "big data is not just about being big": the sheer amount of data has long been recognized as a significant obstacle for the big data domain. In particular, once the large volume is paired with higher speed and a wide range of data sources, the problem becomes more complex, as additional tools and countermeasures are required to extract analytical value from the data.
If the data volume is high, data processing becomes more difficult, particularly at the data capturing and extraction level, when important and interesting data are not integrated together. When moving through the processing stack shown in Fig. 1 (Process of Big Data [4]), the data volume shrinks. In particular, Fig. 1 illustrates schematically how the data become more valuable as they are processed along the integration pipeline; thus, the value of the data is inversely proportional to its size. Volume here denotes the combination of stored data that is incorrect, needless, or of differing degrees of value. The data's size is reduced from petabytes to just a few megabytes in the integration layer, and to a few kilobytes in the decision layer. Consequently, the difficulty of data collection is attributed to the hard analysis required to distinguish good-value data from poor data; when working with small volumes of data, things get simpler.
However, after the integration process, the data's importance and value are improved. Only specific, processed data is preserved for use in the decision-making process, whereas poor data is discarded. In other words, precision becomes harder to maintain when vastly more detail must be examined.
The evolution of algorithms applied to big data in different domains is shown in Fig. 2. To analyze complex data and identify patterns, it is essential to securely store, manage, and share large amounts of complex data [3]. Using data in the BD domain involves masking heterogeneous data forms, structures, and systems, as well as varied inputs and outputs; the data relating to each source and schema must then be handled [4]. BD-style analysis techniques help detect threats in their early stages by applying more sophisticated pattern analysis across multiple data sources, so the challenge of detecting and preventing advanced threats and malicious intruders is better addressed with BD-style analysis [3]. Data validation is an essential factor in almost any data and computation related context. It serves as one of the qualities of service and a necessary segment of data privacy and security. With the proliferation of cloud computing and the progressing requirements of BD analytics, data validation verification becomes very essential, specifically for outsourced data [5]. Data validation has two types: physical integrity and logical integrity. Procedures and approaches for both are built into hierarchical and relational databases to improve data accuracy.
Physical integrity is the protection of data validity and consistency as data is saved and recovered. It is undermined when natural catastrophes occur, electricity is lost, or hackers interrupt network functions. Errors, storage degradation, and various other problems can prevent data processing managers, system engineers, and internal auditors from correctly collecting results.
Logical integrity keeps data unchanged and consistent as it is used in numerous ways, as in a relational database. Logical integrity, somewhat differently from physical integrity, protects data from human and hacker error. There are four types of logical integrity (a minimal code sketch of these constraint types is given after the list of risks below):
• Entity integrity depends on the existence of primary keys or unique data values to ensure that data is not recorded more than once and that no key field in a table is empty. It is a feature of relational systems, which store data in tables that can be interconnected and used in various ways.
• Referential integrity refers to the set of processes that ensure consistent storage and use of data. Rules built into the database's design concerning how foreign keys are used ensure that only acceptable changes, insertions, or deletions are made. Such rules may include requirements that prevent duplicate entry, guarantee that the data is correct, and/or disallow entry of data that does not apply.
• Domain integrity is a collection of methods that guarantees the validity of each data element in a domain. A domain is the acceptable set of values that may appear in a field. It can include restrictions and other steps that limit the data entered by size, type, and format.
• User-defined integrity comprises the rules and requirements that the user establishes to satisfy individual needs. Data is often not sufficiently safeguarded by entity, referential, and domain integrity alone; specific compliance rules also have to be taken into consideration and built into data validation behaviour.
Data validation risks. A variety of considerations influence the quality of the data contained in a database:
• Human error: when people enter information inappropriately, duplicate or remove data, do not follow the required protocol, or make mistakes while executing protocols designed to safeguard information, data integrity is at risk.
• Transfer errors: if data cannot be transferred intact from one location in the database to another, a replication error has occurred. Transfer errors happen when a piece of data exists in the target table of a linked database but not in its source table.
• Bugs and viruses: spyware, spam, and viruses are software that can enter a computer and change, erase, or steal files.
• Compromised hardware: sudden program or server crashes and faults in how a program or system operates are examples of serious failures that suggest the hardware is in trouble. Compromised hardware can render data improperly or incompletely, limit or erase access to data, or make information difficult to use.
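To make the logical integrity types above concrete, here is a minimal Python sketch using the standard sqlite3 module; the patient/visit tables and their columns are illustrative assumptions, not taken from this work. Primary keys enforce entity integrity, foreign keys referential integrity, and CHECK constraints domain integrity:

    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when asked

    # Entity integrity: PRIMARY KEY forbids duplicate and NULL keys.
    # Domain integrity: CHECK restricts age to an acceptable range of values.
    conn.execute("""CREATE TABLE patient (
        patient_id INTEGER PRIMARY KEY,
        age INTEGER NOT NULL CHECK (age BETWEEN 0 AND 130))""")

    # Referential integrity: visit.patient_id must reference an existing patient.
    conn.execute("""CREATE TABLE visit (
        visit_id INTEGER PRIMARY KEY,
        patient_id INTEGER NOT NULL REFERENCES patient(patient_id))""")

    conn.execute("INSERT INTO patient VALUES (1, 45)")
    conn.execute("INSERT INTO visit VALUES (10, 1)")       # accepted
    try:
        conn.execute("INSERT INTO visit VALUES (11, 99)")  # rejected: no patient 99
    except sqlite3.IntegrityError as err:
        print("rejected:", err)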
Data integrity threats can largely be reduced or removed by doing the following: • restricting access to data and modifying permissions to prevent changes by unauthorized parties; • validating the data to make sure it is correct both when it is collected and when it is used; • backing up and recovering records; • using logging to keep track of which data is inserted, updated, or deleted (a minimal sketch follows this list); • undertaking routine organizational audits; • using error-monitoring tools.
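To illustrate the logging point in the list above, the following minimal sketch (hypothetical table and user names; Python's standard logging module) records every insert, update, and delete so that routine audits can trace who changed which data and when:

    import logging
    from datetime import datetime, timezone

    # Append-only audit trail; in production this would go to a protected file or service.
    logging.basicConfig(filename="audit.log", level=logging.INFO, format="%(message)s")

    def audit(user, action, table, record_id):
        """Record one data modification for later integrity audits."""
        logging.info("user=%s action=%s table=%s record=%s at=%s",
                     user, action, table, record_id,
                     datetime.now(timezone.utc).isoformat())

    audit("alice", "INSERT", "patient", 1)
    audit("bob", "DELETE", "visit", 10)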
According to [4], the core problem with BD is not its range but the loss of control over data sources. Consequently, the unknown nature of the data source causes accuracy issues. In this context, it is evident that the responsibility for integrity starts as soon as the data begins to appear: the first input for data validation is the data collection or capture point. Indeed, for data to carry actual meaning for decision-making, it must be accurate, valid, genuine, and not changed or adjusted incorrectly. The importance of the data lies in its precision and validity; consequently, its integrity must be protected from source to destination. Thus, given big data metrics such as velocity (the rate at which data arrives and must be responded to), the speed of accumulation, and the high computing power required to keep data usable and up to date, integrity may be jeopardized, in particular by the need for real-time or near-real-time decision-making.
Consequently, this research on big data integrity focuses on two main objectives: 1. identifying the core problems affecting the credibility of big data; 2. creating a new paradigm to protect the credibility of big data.

Related Work
While researching this topic, little prior work was found that addresses data integrity as its main theme. Hence, most of the research reviewed in this chapter treats the issue as part of a wider set of challenges in the big data domain.
'Machine learning', currently a popular term, is also referred to as 'statistical learning' for predictive analysis [5]; it comprises the set of traditional and current regression and classification techniques. The modelling itself is not entirely new: algorithms such as Fisher's linear discriminant analysis date back to 1936; further linear models, called generalized linear models, were developed in the 1970s; and in the 1980s various non-linear algorithms, such as classification and regression trees, were developed, often tied to the available computing power [5]. Additional computation, memory, available records, and significant advances in science and free software have provided very fertile soil for progress in many fields, including official statistics, over the past two decades. There are three specific forms of machine learning algorithms: supervised learning, unsupervised learning, and semi-supervised learning [19]. Unsupervised algorithms are not used for forecasting data because the response variable does not exist. Supervised learning takes two forms: regression to predict quantitative data and classification to predict categorical data [20]. It is commonly used to analyze data and to detect patterns within it. National Statistical Institutes (NSIs) perform Data Validation (DV) to assure the reliability of the findings they supply, and suspect data is returned to the providers for clarification or correction.
This is of particular significance where two-way contact with the data providers is practicable and appropriate, since these data were not originally gathered for statistical purposes. Until now, such DV has mostly been conducted in two ways: by eyeballing the collected data manually, or through automatic logical checks. In some cases, the automation is sufficiently sophisticated (though not achieved by machine learning) to prompt data providers to rectify all apparent errors and to confirm values in questionable circumstances. The data reaching the NSI is thus of better quality and analyzed more rapidly, and the process is therefore more resource-efficient. A comparable DV for data fields is envisaged, so that existing automated systems would not be replaced but complemented.
The researchers in [6] discussed the challenges imposed by data fields on modern and future Scientific Data Infrastructure (SDI). They looked at the different scientific communities that define requirements on data management, access control, and security, and proposed a generic model and architectural solution comprising a new Scientific Data Lifecycle Management (SDLM) method and a general SDI architectural model, which provided a basis for heterogeneous SDI component interoperability and integration built on cloud infrastructure technologies [6]. However, this work can be considered more of a survey of the challenges of the topic; it did not include a specific proposed implementation for integrity assurance in data fields [6]. Among all the parameters considered in [5], efficiency and security were two of the most pressing. That work: • provided an analysis of authenticator-based efficient data validation verification; • surveyed the main aspects of this research problem; • summarized the motivations, methodologies, and main achievements of several representative approaches; • put forth a model for possible future developments.
A key factor that distinguishes "data fields" from "lots of data" lies in changes to the traditional, well-established "control zones" that facilitated clear provenance of scientific data, thereby ensuring data validation and providing the foundation for credible science [7]. To ensure data integrity, [8] proposed a dynamic data update scheme with optimized public auditing. It involves three phases:
• Setup Phase: this phase consists of key generation, file pre-processing that produces block metadata (homomorphic linear authenticators, HLAs) and an mCAT for the file, and the appointment of a third-party auditor.
• Dynamic Data Update Phase: during this phase, the client uses the mCAT to execute block-level, fine-grained changes on its data stored in the cloud. Following that, it computes a new HLA for the changed block and saves it at the TPA's site.
• Third-Party Auditing Phase: during this phase, an approved third-party auditor (TPA) sends the cloud storage server (CSS) a challenge request. The CSS returns to the TPA an integrity proof matching the set of challenged blocks, and the TPA then validates the integrity of that set.
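As a rough illustration of this challenge-response pattern only (a greatly simplified sketch: plain SHA-256 block hashes stand in for the HLAs of [8], and keys, the mCAT, and dynamic updates are not modelled), consider:

    import hashlib, os, random

    blocks = [os.urandom(256) for _ in range(10)]               # file split into blocks
    metadata = [hashlib.sha256(b).hexdigest() for b in blocks]  # kept by the auditor

    def css_prove(challenge):
        """Cloud storage server: return the challenged blocks as its proof."""
        return [blocks[i] for i in challenge]

    def tpa_verify(challenge, proof):
        """Third-party auditor: check each returned block against stored metadata."""
        return all(hashlib.sha256(b).hexdigest() == metadata[i]
                   for i, b in zip(challenge, proof))

    challenge = random.sample(range(len(blocks)), 3)
    print(tpa_verify(challenge, css_prove(challenge)))  # True while the data is intact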
Others, such as [9], proposed a blockchain-based framework for a data validation service, claiming that more reliable data validation verification could be provided for both data owners and data consumers. Some researchers [9] presented an approach for strengthening the security of Big Data Analytics Services (BDAS) by modifying the widespread Spark infrastructure to monitor the integrity of data manipulated at run-time. In this way it can be ensured that the results of complex and resource-intensive computations performed on the cloud are based on correct data, rather than data that has been tampered with or corrupted through faults in one of the many complex subsystems of the overall system. However, this work needs improvement in several aspects, such as processing speed (by utilizing parallel processing) and easier fine-tuning of the system. Others, such as [10], looked at ways to enhance data field auditing using Remote Data Auditing (RDA) schemes, the core schemes being Provable Data Possession (PDP) and Proof of Retrievability (POR). That work only compared such auditing tools for data fields to decide on the optimal tool and did not propose a specific implementation.
Other researchers, such as [10], addressed the complex key management issue in cloud data validation checking by introducing fuzzy identity-based auditing, under the same subject of data field auditing, and claimed to be the first to take such an approach. The primitive of fuzzy identity-based data auditing was presented, where a user's identity is viewed as a set of descriptive attributes. The system model and the security model for this new primitive were formalized, and a concrete construction of a fuzzy identity-based auditing protocol was presented, utilizing biometrics as the fuzzy identity. Such a proposal is only an auditing solution for data field validation and may not be suitable for real-time data field validation requirements. The underlying algorithms are very complex, and they all function very differently. A useful categorization for both regression and classification distinguishes three subtypes: linear models, non-linear models, and tree-based models [5].
Well-known algorithms such as linear and logistic regression were introduced to explain the fundamental underlying ideas [1]. Common to all supervised algorithms is the idea that a dependent variable may be predicted from a set of predictor variables; how this prediction is carried out is the cornerstone that separates these algorithms [12]. Analogies may also help to understand some of them: for example, separating the prediction groups by a particular boundary (for the support vector machine), or drawing logical trees ("if the age is below X, then Y") [13]. Cerberus provides efficient, quick methods for data validation. It is intended to be easily generalized and extended by users [14]; in other words, it is a validation library for Python. This study chose Cerberus as an example because its language-agnostic validation schemas (JSON-like) work well with different languages, rendering it more flexible across various workflows. In the basic Cerberus workflow, to verify data one only needs to define the rules [15]; this is done in a Python dictionary called a schema. To cover all facets of data validation, other modules were also compared and hybrid models proposed.
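As a minimal sketch of this workflow (the field names and rules here are illustrative assumptions, not the study's actual schema), a schema dictionary is defined and a record validated with Cerberus:

    from cerberus import Validator

    # The rules are declared in a plain Python dictionary: the "schema".
    schema = {
        "name": {"type": "string", "required": True},
        "age": {"type": "integer", "min": 0, "max": 130},
    }

    v = Validator(schema)
    print(v.validate({"name": "Alice", "age": 30}))  # True: record obeys the rules
    print(v.validate({"name": "Bob", "age": -5}))    # False: age violates its rule
    print(v.errors)                                  # e.g. {'age': ['min value is 0']}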

Methodology
Statistical services conduct DV to verify the accuracy and reliability of administrative and survey data. Data suspected to be inaccurate is returned to the providers with a request for clarification. Until now, such DV has primarily been conducted at two levels: manual checks, or automatic processes using threshold values and logical tests. This two-way "plausibility test" method requires considerable effort: in certain instances, workers are forced to recheck the data manually, and some rules require extra verification. This rule-based approach emerged from prior practice but is neither inherently exhaustive nor guaranteed correct; machine learning can allow quicker and more precise tests [16]. In the area of data fields, further data validation concerns arise. Fields that have a specified meaning require form validation (or validation of a subset of fields) so that there is a better justification for their values, a better reason not to consult something else, and a better reason to store those values [17]. There are also instances when operations do not touch the data at all, as is the case for most memory operations; where there is no data in the database, the issue may not be displayed [18]. In some instances this contributes to a transmission error.
At the storage level, data may be packed into bit-fields: small values encoded as a handful of bits, which saves considerable memory capacity [17]. A field store closely resembles the in-memory representation; it may be a single bit-field or comprise a range of components. Crucially, the encoding of a bit-field is not necessarily preserved when the bits are exchanged between systems or when a file is shared, because bits may be encoded together into a shared representation [26]. The same applies at the level of files and strings: a string is composed of bytes and characters that can themselves be encoded as bit sequences, and a string embedded in another file does not necessarily carry all of its original components with it [21-24]. Consequently, when validating fields, the physical encoding and storage representation of each field must be checked alongside its logical content, since mismatched encodings can silently corrupt values.
The project's methodology is composed of three main phases:
• System modelling and design
• Data validation
• Implementation, testing and results.

System Modelling and Design
The proposed Integrity Assurance System is based upon the model proposed by [4], shown in Fig. 3. The proposed model is built on the following processes and analyses.

Validation and Filtering of Input
A system for input validation and filtering is proposed, which verifies the reliability of data provenance and sources. For this, a sub-score is assigned to every data provenance (the data input source and the objects it contains are known, as is which object each data item belongs to). Following that, a score InputData(sources, t_capture) is computed and delivered, which indicates the level of integrity of the data.
Then, based on the outcome of InputData (a threshold for integrity fulfilment is specified), the model either passes the captured/entered/extracted data on to the following systems or recommends discarding it (if the integrity level is poor). The resulting input integrity rate is saved in a separate database for future integrity controls. A filter for good data must be applied at this stage to prevent poor data from proceeding to the next stage of processing, thereby also reducing the volume of data. As a result, only data with a high level of integrity and a trusted provenance is allowed through, and decisions based on the filtered data are stable and precise. However, protecting integrity is more than merely validating data input or performing initial integrity filtering; the subsequent data analysis must also preserve data integrity.
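A minimal sketch of this stage follows, assuming a simple per-source trust table and a fixed threshold; the names provenance_score and INTEGRITY_THRESHOLD and the sample sources are illustrative assumptions, not the paper's implementation:

    # Hypothetical per-source trust scores in [0, 1], maintained by the organization.
    provenance_score = {"who_export": 0.95, "manual_entry": 0.60, "web_scrape": 0.30}

    INTEGRITY_THRESHOLD = 0.70  # assumed threshold for integrity fulfilment

    def input_data_score(sources):
        """InputData(sources, t_capture): average provenance sub-score of the sources."""
        return sum(provenance_score.get(s, 0.0) for s in sources) / len(sources)

    def filter_input(record, sources):
        """Pass the record on only if its input integrity score clears the threshold."""
        score = input_data_score(sources)
        # In the model, this score would also be saved for future integrity controls.
        return record if score >= INTEGRITY_THRESHOLD else None

    print(filter_input({"age": 30}, ["who_export"]))  # accepted: returned unchanged
    print(filter_input({"age": 30}, ["web_scrape"]))  # rejected: returns None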

Monitoring Continuous Integrity
As described in the literature, integrity is violated when the data processing sequencing is not followed, or when some analyses are not performed properly or completely. Further, even when the sequence of processing is followed, integrity is at risk when intruders are involved in data processing. Monitoring user behaviour is a difficult task given the dimensions of big data, particularly its volume and velocity. To overcome this difficulty, examining every request that pertains to every data movement is proposed. As a result, a composite measure called data movement monitoring is created, which returns two numerical values: data processing cycle integrity (CI) and data cycle sequencing respect (SR). Cycle integrity (CI) in data processing consists in answering the question: who (users) does what (activities) with the data (recorded data) in a certain time period (t). Accordingly, CI(users, activities, data, t_start) is calculated with "U" indicating the users of interest, "O" the relevant objects, and "A" the actions performed on the objects by the users. Then, as indicated in Algorithm 2, a function is defined as Privacy(my_U, my_O, my_A) = f_{O,A}(U) = score. The value returned by Algorithm 2 is used to calculate the total value of all requests processed during the data cycle procedure.

Function Privacy(my_U, my_O, my_A)
  Int n  // privacy score of user my_U performing action my_A on object my_O
  Begin
    1. Check whether my_U, my_O, and my_A have participated in the current processing cycle
    2. Select the privacy percentage into n from table privacy_observations
       where user = my_U and object = my_O and action = my_A
    Return n
  End
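A minimal Python reading of Algorithm 2 is sketched below; the SQLite table privacy_observations(user, object, action, score) and its sample content are assumptions made for illustration:

    import sqlite3

    conn = sqlite3.connect(":memory:")
    # Assumed schema for the table of privacy observations.
    conn.execute("CREATE TABLE privacy_observations"
                 "(user TEXT, object TEXT, action TEXT, score REAL)")
    conn.execute("INSERT INTO privacy_observations VALUES ('alice', 'patients', 'read', 0.9)")

    def privacy(my_u, my_o, my_a):
        """Algorithm 2: observed privacy score of user my_u doing my_a on object my_o."""
        row = conn.execute(
            "SELECT score FROM privacy_observations WHERE user=? AND object=? AND action=?",
            (my_u, my_o, my_a)).fetchone()
        return row[0] if row else 0.0  # unknown triples get the lowest score

    print(privacy("alice", "patients", "read"))  # 0.9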
As a result, the analysis yields a sense of how the data is leveraged throughout the cycle process, as well as its quality with respect to who manipulates it and in which way (allowed or forbidden, with complete or only partial authorization). Still, it must be controlled that the data is supplied in the right sequence throughout the whole processing cycle. This is the final control the model recommends before integrating data into the analysis schemes.

Sequencing Respect of Data Cycle
Here, SR (sequencing respect) is proposed: an indicator that returns the percentage of data analyses that were effectively completed prior to integration into the analysis models, together with a level of order respect (checking that step n was completed before step n + 1 and after step n - 1).

Function CI(users, actions, data, t_start)
  Int s
  Begin
    Select average Privacy(my_U, my_O, my_A) into s
    from myTable_control where start_time = t_start;
    Return s;
  End
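Continuing the previous sketch (and under the same illustrative assumptions, with a hypothetical control table recording each request of a processing cycle), CI can be read as the average Privacy score over the cycle:

    # Assumed control table recording each request of a processing cycle.
    conn.execute("CREATE TABLE mytable_control"
                 "(user TEXT, object TEXT, action TEXT, start_time TEXT)")
    conn.execute("INSERT INTO mytable_control VALUES ('alice', 'patients', 'read', 't0')")

    def cycle_integrity(t_start):
        """CI(users, actions, data, t_start): average Privacy over the cycle's requests."""
        rows = conn.execute(
            "SELECT user, object, action FROM mytable_control WHERE start_time=?",
            (t_start,)).fetchall()
        if not rows:
            return 0.0
        return sum(privacy(u, o, a) for (u, o, a) in rows) / len(rows)

    print(cycle_integrity("t0"))  # 0.9 for the single recorded request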
The measure is determined from the time data is collected until the time analyses are submitted. The executed cycle is compared to a control cycle stored in the system, checking whether each analysis was performed in the right order and only after the preceding one had been completed fully and appropriately. The result is a percentage that indicates whether, and to what extent, the reference sequence was followed. Again, this illustrates the need to adhere to the specified process stages.
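A minimal sketch of SR under the same illustrative assumptions follows: the executed step sequence is compared against the stored control sequence, and the indicator returns the percentage of steps completed in the correct order (the list representation of cycles is an assumption):

    def sequencing_respect(executed, control):
        """Percentage of control steps found in the executed cycle in the right order."""
        pos = -1
        respected = 0
        for step in control:
            try:
                i = executed.index(step, pos + 1)  # each step must occur after the last
            except ValueError:
                continue  # step missing or out of order: not respected
            respected += 1
            pos = i
        return 100.0 * respected / len(control)

    cycle = ["capture", "clean", "integrate", "analyze"]
    print(sequencing_respect(["capture", "clean", "integrate", "analyze"], cycle))  # 100.0
    print(sequencing_respect(["capture", "integrate", "clean", "analyze"], cycle))  # 75.0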

Data Validation
In the second phase of the project, an input system gives an automated description of service fields. This matters because high prediction efficiency and interpretability cannot be achieved by the same module: even though good forecasts are accomplished in the first part, the same module cannot also serve for explanation. Instead, a separate feedback module provides a local explanation to open the black box of the service fields. The project utilizes data field tools and mainly the Python language to model, handle, and access the data required by the designed system. Several important factors justify the choice of Python: it offers excellent data visualization, extensive data processing, scalability, flexibility, ease of learning, high compatibility with distributed frameworks such as Hadoop, and, most importantly, support for many powerful scientific and machine learning library packages.

Implementation, Testing and Results
Data validation is meant to guarantee a certain degree of final data consistency. However, in official data, quality has multiple dimensions: relevance, accuracy, timeliness and punctuality, accessibility and clarity, comparability, coherence, and completeness. Therefore, it is essential to decide which of these components data validation addresses. DV relies on the quality dimensions related to the data structure, i.e., accuracy, comparability, and coherence; it does not depend explicitly on quality issues arising from suggested manual procedures (e.g., device fields). It is worth examining in depth to what degree DV applies to the various quality dimensions. This part of the project concentrates on real device coding and uses the existing code model. Step 3 will include all software distribution system facets: code analysis, source coding, validation, and manual verification.
These steps occur in order as follows: input source, input validation, approved protocol; the method is extended at all stages; intermediate data is protected against distortion and accessed only by designated users and programs; data that is not valid/trustworthy is rejected, while valid data is utilized and kept for later usage [25][26][27][28]. At the beginning of implementation, the first step is to enter data; the entered data is either data entered in real-time through fields designated by the user, or data previously saved. After that, it is ensured that the entered data conforms to the general field rules; if it does not conform, it is rejected directly. In the next stage, conditions are set for all fields: for example, if a field is designated for a date, then the conditions are as shown in Fig. 4.
For other types of data, or the types of fields to be entered by the user, Fig. 4 shows an example of how to set age rules, whether for user input or for previously saved data.
Figure 5 shows that several rules have been set up: the entry must be numeric, and the number must not exceed three digits. In this way, the entries for each field are verified according to the rules assigned to that field (Fig. 6).
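Since the figures themselves are not reproduced here, the following Cerberus sketch approximates such field rules (the exact rules of Figs. 4, 5 and 6 are assumptions made for illustration): a date field must parse as a valid date, and an age field must be numeric with at most three digits:

    from datetime import datetime
    from cerberus import Validator

    def is_date(field, value, error):
        """Custom check: the value must be a valid YYYY-MM-DD date."""
        try:
            datetime.strptime(value, "%Y-%m-%d")
        except (ValueError, TypeError):
            error(field, "must be a valid date in YYYY-MM-DD format")

    schema = {
        "birth_date": {"type": "string", "check_with": is_date},
        "age": {"type": "integer", "min": 0, "max": 999},  # numeric, at most three digits
    }

    v = Validator(schema)
    print(v.validate({"birth_date": "1990-05-17", "age": 31}))              # True
    print(v.validate({"birth_date": "not-a-date", "age": 1200}), v.errors)  # False + reasons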

Results
The methodology was tested on several files, each ranging up to about 100 thousand rows and about 50 columns, based on World Health Organization data, which is among the most recent and accurate data available. The methodology can handle user inputs and enforce integrity rules on them in real-time. The datasets used, which come closest to the term big data, are summarized across the three case studies in Fig. 7 and Table 1.
As shown in Table 2, the total number of columns tested is 93 and the number of rows is 1,497,055. In the first case, the methodology achieved integrity of up to 92% and was able to filter the file and exclude all noise; by noise is meant incorrectly entered, incomplete, or inaccurate data.
In the second case, the methodology achieved integrity of up to 96% and was able to purify the file and exclude all noise; this case consisted of large, unstructured, and corrupted data, and all unstructured and incorrect data were revealed. In the third case, the methodology achieved integrity of up to 95%; the sample contained statistics from the World Health Organization, real and highly accurate data. Table 3 compares the proposed model with existing models such as Hadoop and MapReduce for validation, based on the data integrity results obtained using WHO data. Integrity was achieved at a rate exceeding previous studies, and the sample used is larger than the samples of previous studies, so the methodology appears to improve integrity in a meaningful way. The tests were run on a system with Windows 10 Pro 64-bit, 6 GB of RAM, and an Intel Core i5 CPU at 2.50 GHz.

Conclusion and Future Work
This paper discusses problems relating to data validation in the context of data fields and illustrates some aspects of data fields dealing with data validation. Two issues were discussed, and a new approach was developed to handle and protect the quality of data under these data field issues. Three metrics address the model-based problems of data field validation: the validity of measuring data, the assessed confidence in interfering requests during the system creation period, and conformity with the data cycle sequence. Applying this model across the evolving V-dimensions of data fields also provides a different view of the current study. The opportunity for unsupervised learning, which requires only input data, should be explored in the future. More precisely, the impact of cross-validation on unsupervised learning needs to be investigated to recognize the possibilities of the validation process mentioned above. The influence of the train-and-test validation approach on time-ordered data is a potential extension of this research. It is also intended to analyze the effect of validating training data in random order, which may interrupt the chronology but could reveal patterns and regularities that cannot be detected with time-ordered data. Using random order reduces time dependency and its sporadic effects, and increases the chance of identifying unique similarities or regularities within the gathered data. Data validation and processing are important fields, relevant to procedures that include a human aspect and that involve exploring an appropriate validation method.
Authors' Contributions Fawaz and Saad Almutairi contributed equally.

Conflicts of interest
The authors declare no conflict of interest, financial or otherwise.

Data Availability
The authors confirm that the data supporting the findings of this research are available within the article.
Code Availability
Custom code.

Human and Animal Rights
No animals/humans were used for studies that are basis of this research.