Three fundamental themes run through the related work. The first subsection discusses the security and privacy problems of the Data Middle Platform. The second summarizes the contribution of blockchain to authentication. The last reviews contributions that relate blockchain technology to Zero-Knowledge Proofs.
2.1 Security and Privacy Challenges of Data Middle Platform
Because the Data Middle Platform emerged only recently, there has been little discussion of its security and privacy preservation. Researchers have so far been able to summarise the challenges only from a few academic papers and technical blogs written by Data Middle Platform implementers. Specifically, based on the data lifecycle, the challenges can be classified into five areas, which are shown in Fig. 3 [32–34].
2.1.1 Data Collection
In the data collection phase of the Data Middle Platform, three security and privacy issues need to be addressed. The first is that data protection measures do not match the sensitivity level of the data. The second is the weakness of the unified access control mechanism. The third is the risk of operation and maintenance personnel dragging and bumping the database.
Data protection measures do not match sensitivity level
Data classification is not only the basis of data governance but also the premise of data security. China's relevant laws and regulations also require different security measures for data of different security levels. For example, if the data in a government big data platform is not classified, security protection will be applied according to the default sensitivity level of government network data, which results in the following problems. First, more human, material, and financial resources must be invested in security protection. Second, unclassified data whose sensitivity is lower than the default level will be overprotected, resulting in security problems. In addition, the data protected at the default level will also contain a large amount of data that is more sensitive than the default level, so highly sensitive data risks being insufficiently protected. Enterprises face the same problems.
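The mismatch between a single default level and the true sensitivity of each field can be illustrated with a minimal Python sketch. The four level names, the choice of INTERNAL as the default level, and the sample field names are assumptions made for illustration only; they are not taken from any regulation or platform cited here.

```python
from enum import IntEnum


class Sensitivity(IntEnum):
    """Hypothetical four-level classification scheme (illustrative only)."""
    PUBLIC = 1
    INTERNAL = 2
    CONFIDENTIAL = 3
    SECRET = 4


# Assumed default level applied to every field when nothing is classified.
DEFAULT_LEVEL = Sensitivity.INTERNAL


def protection_gap(true_level, applied_level=DEFAULT_LEVEL):
    """Compare the level a field is protected at with its true level."""
    if applied_level > true_level:
        return "over"    # resources spent on data that does not need them
    if applied_level < true_level:
        return "under"   # highly sensitive data is insufficiently protected
    return "match"


# Hypothetical fields with their true sensitivity levels.
fields = {
    "press_release": Sensitivity.PUBLIC,
    "staff_roster": Sensitivity.INTERNAL,
    "id_number": Sensitivity.SECRET,
}

gaps = {name: protection_gap(level) for name, level in fields.items()}
```

In this sketch, protecting everything at the default level overprotects `press_release` and underprotects `id_number`, which are the two failure modes described above.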
Weakness of the unified access control mechanism
The big data resource pool includes relational databases, massively parallel data warehouses, and distributed big data stores. Each kind of database has its own account and permission system. Applying a global security policy therefore requires configuring each kind of database separately, which multiplies the workload. In addition, each type of database supports a different granularity and standard of access control. Suppose that field-level access control is required, while some components can only control access at table level. This difference in access control capability makes a global security policy difficult to formulate. The increased workload and the difficulty of global policy making are thus the weak points of the unified access control mechanism.
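The granularity problem can be made concrete with a small Python sketch that translates one field-level policy into what each store can enforce. The store names and the granularity assigned to each store in `STORE_GRANULARITY` are hypothetical and chosen only to show the effect. The sketch is not a description of any particular product.

```python
from dataclasses import dataclass

# Hypothetical capability table: the finest granularity each store enforces.
STORE_GRANULARITY = {
    "relational_db": "field",
    "mpp_warehouse": "field",
    "distributed_store": "table",
}


@dataclass(frozen=True)
class Rule:
    """One entry of the global policy, written at field level."""
    role: str
    table: str
    field: str


def compile_policy(rules, store):
    """Translate field-level rules into grants that `store` can enforce.

    Returns (grants, coarsened). `coarsened` lists the rules that had to be
    widened to the whole table because the store lacks field-level control.
    """
    granularity = STORE_GRANULARITY[store]
    grants, coarsened = set(), []
    for rule in rules:
        if granularity == "field":
            grants.add((rule.role, rule.table, rule.field))
        else:
            grants.add((rule.role, rule.table, "*"))
            coarsened.append(rule)
    return grants, coarsened
```

A non-empty `coarsened` list marks the places where the global policy cannot be enforced as written, which is where a policy maker must either accept wider access or deny it altogether.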
Risk of operation and maintenance personnel dragging and bumping the database
Operation and maintenance personnel use database accounts for operation and maintenance management, and the privileges of these accounts may exceed what the work actually requires. If access behavior is not controlled, operation and maintenance personnel are technically able to carry out operations unrelated to their tasks. Driven by curiosity or personal interest, they can drag and bump the database (that is, bulk-export its contents or try credentials against it) to obtain the data they are interested in, resulting in large-scale data leakage.
2.1.2 Data Storage
There is only one significant challenge in the data storage stage: authority out of control.
Risk of authority out of control
Every database in the big data resource pool has a super administrator account. The super user is the default user created with the database and can be regarded as its "creator". Like the root user of a UNIX system or the Administrator user of Windows, the super user has supreme power. For the sake of security, the password of this account is generally held by only a few people, but such an excessive concentration of power also brings problems. Holding the highest privileges means that the super user can do anything without leaving a trace. Therefore, the big data resource pool also carries the risk of authority out of control. In addition, the data storage stage shares the weakness of the unified access control mechanism and the risk of operation and maintenance personnel dragging and bumping the database, which are not repeated here.
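One common way to make privileged actions leave a trace is a hash-chained audit log, sketched below in Python as an illustration rather than as a mechanism proposed by the cited sources. The entry fields (`actor`, `action`) and the all-zero genesis value are assumptions. The sketch only detects tampering if the latest hash is also kept somewhere the super user cannot rewrite.

```python
import hashlib
import json

GENESIS = "0" * 64  # assumed placeholder hash for the first entry


def _digest(actor, action, prev_hash):
    """Hash one entry together with the hash of its predecessor."""
    body = {"actor": actor, "action": action, "prev": prev_hash}
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()


def append_entry(log, actor, action):
    """Append an entry that commits to the previous entry's hash."""
    prev_hash = log[-1]["hash"] if log else GENESIS
    log.append({
        "actor": actor,
        "action": action,
        "prev": prev_hash,
        "hash": _digest(actor, action, prev_hash),
    })


def verify(log):
    """Return True only if no entry was altered, removed, or reordered."""
    prev_hash = GENESIS
    for entry in log:
        expected = _digest(entry["actor"], entry["action"], entry["prev"])
        if entry["prev"] != prev_hash or entry["hash"] != expected:
            return False
        prev_hash = entry["hash"]
    return True
```

Editing or deleting an earlier entry breaks every later link, so `verify` fails unless the whole chain is recomputed. This is why the head hash needs an independent anchor.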
2.1.3 Data Processing
Data processing is a critical stage. The data must be pulled out and processed, which raises security and privacy issues. Specifically, there are four weaknesses: malicious operations in processing, the lack of an automatic command approval mechanism, an uncontrolled management process, and the risk of privacy leakage.
Malicious operation in processing
Data governance is implemented by executing governance scripts, which contain various commands that operate on data sources. If these commands are not effectively controlled, malicious commands may appear. Once a malicious command is executed, the data source will be damaged, which can lead to other unpredictable security problems.
Lack of an automatic command approval mechanism
Although an approval step can control the risk of a script, the approval process should judge not only the risk of the script itself but also whether the database tables affected by the script are consistent with the application information, so as to ensure that the governance script is correct from a business perspective. Governance scripts therefore need different approval strategies for different target objects. If manual approval is adopted and the execution of a script cannot be prevented before its approval is completed, there is a risk of unauthorized execution. In addition, manual approval depends heavily on personal experience, which may lead to misjudgment of scripts that contain a large number of commands, so high-risk scripts may still be approved. Finally, manual approval cannot guarantee timeliness, which is a weak point for the overall efficiency of the data center.
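The two checks named above (script risk, and consistency between affected tables and the application) can be sketched as an automatic gate in Python. The deny-list in `HIGH_RISK`, the regular expression used to find table names, and the `run` callback are all simplifying assumptions. A real gate would need a proper SQL parser and per-target approval strategies.

```python
import re

# Assumed deny-list of command verbs treated as high risk.
HIGH_RISK = ("DROP", "TRUNCATE", "DELETE", "ALTER")


def tables_touched(script):
    """Roughly extract table names that follow common SQL keywords."""
    pattern = r"\b(?:FROM|INTO|UPDATE|TABLE|JOIN)\s+([A-Za-z_][\w.]*)"
    return {name.lower() for name in re.findall(pattern, script, flags=re.IGNORECASE)}


def review(script, declared_tables):
    """Return (approved, reasons) for a governance script."""
    reasons = []
    for statement in filter(None, (s.strip() for s in script.split(";"))):
        verb = statement.split()[0].upper()
        if verb in HIGH_RISK:
            reasons.append(f"high-risk command: {verb}")
    undeclared = tables_touched(script) - {t.lower() for t in declared_tables}
    if undeclared:
        reasons.append(f"undeclared tables: {sorted(undeclared)}")
    return (not reasons, reasons)


def execute(script, declared_tables, run):
    """Run the script only after it passes review; otherwise refuse."""
    approved, reasons = review(script, declared_tables)
    if not approved:
        raise PermissionError("; ".join(reasons))
    return run(script)
```

Because `execute` calls `review` before `run`, a script that has not been approved cannot be executed through this path, which addresses the unauthorized-execution gap of manual approval.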
Management process out of control
Data governance is accomplished by executing governance scripts. When a governance script is executed through the governance platform, the platform controls the execution process and its results. However, a governance script can also be executed manually. The process and results of manual execution cannot be synchronized to the governance platform, so the platform cannot control them and the data governance process goes out of control. Because a governance script operates directly on the original database, executing an uncontrolled script may destroy data.
Risk of privacy leakage
Abnormal data governance behavior, such as the illegal execution of data query scripts, can lead to privacy leakage. Without means of analysis and audit, no timely alarm can be raised when abnormal behavior occurs, and the behavior cannot be traced or used as evidence afterwards.
2.1.4 Data Exchange and Sharing
The data exchange and sharing stage of the Data Middle Platform faces security and privacy challenges similar to those of the processing stage. In particular, the vulnerabilities include the risk of data leakage, illegal access by application servers, untraceable data leakage, and illegal use of data.
Data leakage risk
A data demander requests batch data from the data center for analysis. Because the data center does not understand the business analysis indicators, it tends to provide excessive data. For example, suppose an app needs to analyze the monthly number of marriages in the current year. The app applies for all information on the population married in that year, including names, addresses, and contact information. In fact, the monthly statistics require only accurate time information, and there is no need to provide sensitive information such as names, addresses, and contact details. When the data center does not understand what is being analyzed, it provides the complete records, which leads to the leakage of sensitive information.
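The marriage-statistics example can be sketched in Python to show how little data the analysis actually needs. The record layout, field names, and sample values are hypothetical. The sketch shows two options: aggregating inside the platform, or releasing rows restricted to the required fields.

```python
from collections import Counter
from datetime import date

# Hypothetical marriage-registration records held by the platform.
records = [
    {"name": "A", "address": "addr-1", "phone": "tel-1", "registered": date(2023, 1, 14)},
    {"name": "B", "address": "addr-2", "phone": "tel-2", "registered": date(2023, 1, 30)},
    {"name": "C", "address": "addr-3", "phone": "tel-3", "registered": date(2023, 5, 2)},
    {"name": "D", "address": "addr-4", "phone": "tel-4", "registered": date(2022, 12, 31)},
]


def monthly_counts(rows, year):
    """Aggregate inside the platform so only month -> count leaves it."""
    counts = Counter(r["registered"].month for r in rows if r["registered"].year == year)
    return {month: counts[month] for month in sorted(counts)}


def project(rows, allowed_fields):
    """Release row-level data restricted to the fields the analysis needs."""
    return [{field: r[field] for field in allowed_fields} for r in rows]
```

Either option keeps names, addresses, and contact details inside the platform, provided the platform knows which indicator the demander is computing.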
Risk of illegal application server access
The data resource management platform provides data services to applications through interfaces. If an interface lacks a trusted authentication mechanism for application servers, the legitimacy of an application server cannot be verified, and there is a risk of illegal application server access. Once an illegal application server is connected, it can directly access all the interfaces provided by the data resource platform, which poses a serious security risk.
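One simple form of trusted authentication for application servers is a keyed signature on each request, sketched below with Python's standard `hmac` module. The registry `REGISTERED_APPS`, the message layout, the sample path, and the five-minute clock tolerance are assumptions for illustration. The sketch is not the mechanism of any platform discussed here.

```python
import hashlib
import hmac
import time

# Hypothetical registry of application servers and their shared secrets.
REGISTERED_APPS = {"app-001": b"demo-secret-not-for-production"}
MAX_SKEW_SECONDS = 300  # assumed tolerance to limit replay of old requests


def sign(app_id, secret, path, timestamp):
    """Compute the signature an application server attaches to a request."""
    message = f"{app_id}\n{path}\n{timestamp}".encode()
    return hmac.new(secret, message, hashlib.sha256).hexdigest()


def is_trusted(app_id, path, timestamp, signature, now=None):
    """Accept a request only from a registered server with a valid signature."""
    now = time.time() if now is None else now
    secret = REGISTERED_APPS.get(app_id)
    if secret is None:
        return False  # unknown application server
    if abs(now - timestamp) > MAX_SKEW_SECONDS:
        return False  # request is too old or too far in the future
    expected = sign(app_id, secret, path, timestamp)
    return hmac.compare_digest(expected, signature)
```

Binding the signature to the requested path means that a signature captured for one interface cannot be reused to call another.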
Data leakage is not traceable
Sometimes the data service platform needs to provide batch data to a data demander, and the same data may be provided to multiple data demanders at the same time. Suppose a data leakage event occurs and the leaked data had been provided to multiple demanders. The leakage channel may be one or more of the demanders or the Data Middle Platform itself, so it is difficult to determine the leakage path accurately. The leakage therefore cannot be traced, and it is difficult to assign responsibility.
Risk of illegal use of data
Data publishing outside the normal process is abnormal behavior. For example, requesting data beyond actual needs is abnormal behavior, as is the unauthorized use of published data. Such abnormal use of published data leads to the risk of illegal use of data. Without means of monitoring abnormal behavior, no timely alarm can be raised when it occurs, and it cannot be traced or used as evidence afterwards.
2.1.5 General Security and Privacy Risks
Beyond those four stages, the Data Middle Platform faces three further challenges.
Lack of data access analysis means
Optimizing the security policy depends on understanding the current data security state. The more that is known about it, the more support can be provided for subsequent security policy optimization. Data generally goes through collection, processing, storage, exchange and sharing, use, and destruction. Without linked analysis across these stages, it is difficult to form an overall picture of data flow, to grasp the current data security status as a whole, and to provide effective support for subsequent data security decisions.
Lack of security event exception warning mechanism
When a security incident occurs, a timely alarm is needed to trigger the emergency response process. If there is no alarm mechanism for security events, timely measures cannot be taken when they occur. The later a security incident is discovered, the greater the loss will be.
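As a minimal illustration of such an alarm mechanism, the Python sketch below flags an account whose activity exceeds a threshold within a sliding time window. The class name `RateAlert`, the threshold, the window length, and the account name are hypothetical. A production mechanism would combine many more signals than a simple rate.

```python
from collections import deque


class RateAlert:
    """Signal an alert when one account exceeds `limit` events in `window` seconds."""

    def __init__(self, limit, window):
        self.limit = limit
        self.window = window
        self.events = {}  # account -> deque of recent event timestamps

    def record(self, account, timestamp):
        """Store one event and return True if the account is over the limit."""
        recent = self.events.setdefault(account, deque())
        recent.append(timestamp)
        # Drop events that have fallen out of the sliding window.
        while recent and timestamp - recent[0] > self.window:
            recent.popleft()
        return len(recent) > self.limit


alert = RateAlert(limit=3, window=60)
results = [alert.record("ops1", t) for t in (0, 10, 20, 30)]
```

Here the fourth event within 60 seconds is the first to return `True`, so the response process can start when the anomaly happens instead of after the loss is discovered.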
Risk of sensitive information leakage
After the scripts of the collection platform and governance platform have been developed, their effectiveness and stability must be tested. Developers and testers need database operation permissions to test the scripts, which involves sensitive information, so there is a risk of sensitive information leakage at this stage. Other business platforms will also undergo customized development in the future, following a process similar to that of the collection and governance platforms, so sensitive data may be leaked throughout development and testing.
To summarize, the Data Middle Platform is vulnerable to a variety of security threats at every stage of the data life cycle.
2.2 Related Work on Blockchain in Authentication
Abbas's research group [35] examines how blockchain technology might be applied to the Internet of Vehicles and Vehicle-to-Everything networks (IoV-VANETs), with particular emphasis on the authentication subsystem. Yuan et al. [36] present a blockchain-based approach to privacy protection for a credit inquiry system. With this method, users of credit-investigation systems, credit-investigation agencies, and cloud service providers can all exchange credit-investigation data securely. Although numerous conditional authentication techniques that preserve users' anonymity have been proposed to secure communication, they still have flaws, such as frequent interactions or an inability to be linked together. To overcome these problems comprehensively, the study in [37] presents a hierarchical authentication method assisted by blockchain technology. Because of limited resources, traditional security and privacy architectures cannot be applied to the Internet of Things (IoT). To solve this issue, a blockchain-based security mechanism [38] has been implemented that permits secure, authorized access to smart city resources. In blockchain application scenarios, the identity authentication approach in [39] resolves the apparent contradiction between the need for anonymity and the need for traceability. The main purpose of Lee's work [40] is to design a vaccination passport (VP) validation system that is based on a broad blockchain architecture and intended for use in a simulated international scenario. The authors of [41] propose a secure user authentication approach that exploits the characteristics of blockchain technology and smart contracts.
Several attacks, such as linkability attacks, DoS attacks, and DDoS attacks, have been found to be possible against the 5G-AKA protocol. A safer authentication method is therefore needed. To address these security challenges and improve the overall dependability of the 5G network, Chow et al. [42] present Secure Blockchain-based Authentication and Key Agreement for 5G Networks (5GSBA). Bathalapalli et al. [43] have developed a hardware-assisted blockchain system that addresses both device and data security concerns in smart healthcare applications. This blockchain framework is designed specifically for smart healthcare and has demonstrated better security and functionality than other available solutions. A detailed comparison study has shown that the framework offers lower transaction and compute overhead, making it a promising solution for secure and efficient smart healthcare applications [44].