TFPA: A traceable federated privacy aggregation protocol

Federated learning is gaining significant interest because it enables model training over large volumes of data distributed across many users. However, a malicious or dishonest aggregator may still infer sensitive information from local model updates, restore private data, or even disrupt the training process. To address this problem, researchers have proposed many effective methods based on privacy-protection technologies, such as secure multiparty computation (MPC), homomorphic encryption (HE), and differential privacy. However, these methods not only ignore users' address and identity privacy, but also offer no feasible scheme for tracing malicious users and malicious gradients. In this paper, we propose a general decentralized Byzantine-fault-tolerant federated learning protocol, named TFPA, which can integrate multiple learning algorithms. This protocol not only ensures the accuracy of aggregation under the adversary setting of 4f+1, but also provides assurance of user address privacy and identity privacy. In addition, we provide a heuristic malicious-gradient discovery and tracking scheme to help participants better resist malicious gradients and ensure the fairness of aggregation to a certain extent. We evaluate our framework on Linear Regression, Logistic Regression, SVM, MLP and RNN, and obtain good results in both accuracy and performance. Last but not least, we briefly prove the correctness and security of TFPA.


Introduction
Federated learning (FL) [1,2] is a framework that allows multiple participants to train a global model cooperatively by sharing only their local models or local gradients instead of their local datasets. Compared with traditional centralized machine learning, federated learning can protect participants' privacy to a certain extent.
However, the security and effectiveness of federated learning still depend on the honesty and credibility of the participants and the aggregator. On the one hand, if the aggregator is malicious or dishonest, it can attempt to exploit the collected gradients to infer sensitive information, or even local private data [17,18]. On the other hand, participants can attempt to mount poisoning attacks [15,16] by adding noise to local gradients or private data, so as to reduce the accuracy of the global model. Therefore, it is essential to achieve Byzantine-fault-tolerant aggregation as well as confidential aggregation to guarantee the security of federated learning.
Differential privacy [19] is recognized as the first effective scheme to achieve confidential aggregation. However, its accuracy loss makes it difficult to adopt widely. To preserve accuracy, secure multi-party computation is widely used in federated learning to construct confidential aggregation schemes [20]. These schemes can be classified into three categories according to the cryptographic technology used: secret sharing [8,10-12], additive masking [4,6,21] and homomorphic encryption [3,5,9,13]. However, these schemes are inefficient due to complex encryption and data transmission. To balance efficiency, accuracy and security, some researchers use hybrid schemes, combining either differential privacy and SMC protocols [28,29] or homomorphic encryption and secret sharing [7,14].
However, most schemes focus on achieving confidential aggregation to protect the privacy of participants, while only a few pay attention to Byzantine-fault-tolerant aggregation [13,30]. Besides, almost all aggregation schemes are based on centralized [1-9,12-14] or bicentric network settings [10,11], while few consider decentralized network settings. Last but not least, almost all schemes ignore address and identity privacy and give no consideration to how to trace malicious gradients. In this paper, we design TFPA, a robust confidential and Byzantine-fault-tolerant aggregation protocol that protects participants' privacy in decentralized network settings. The protocol can adapt to the malicious setting of 4f+1. In our scheme, each participant participates in confidential aggregation anonymously. Our main contributions are as follows.
- We first design an address protection and identity concealment protocol for each client based on traceable ring signatures and one-way proxy re-encryption, protecting participants' address as well as identity privacy. The protocol can integrate multiple learning algorithms.
- We propose a universal Byzantine-fault-tolerant privacy-preserving aggregation protocol based on secret sharing. Besides, we improve the federated averaging algorithm based on the distance correlation coefficient to ensure the effectiveness of aggregation, and we prove the convergence of the algorithm.
- We first propose a heuristic malicious-gradient discovery and tracking scheme to help participants better resist malicious gradients and ensure the fairness of aggregation to a certain extent.

Federated learning
Federated learning is a distributed machine learning framework comprising n clients C = (C_1, C_2, ..., C_i, ..., C_n) that cannot share data, and an honest gradient aggregation server S, where i is the client index. All participants and the server collaborate to train a unified model via the stochastic gradient descent (SGD) method [22]. Figure 1 shows the complete process of federated learning.
As shown in the figure, each client first downloads the model from the server and initializes it. Then each client trains the local model and computes the local gradient G_i on its private dataset D_i as G_i = ∇f(θ_i; D_i), where f(·) is the loss function and θ_i represents the model parameters of client i. Note that these datasets have the same type.
Next, the clients upload their local gradients to the server. The server waits until the gradients of all clients are gathered and then aggregates them as the weighted mean G = β_1 G_1 + β_2 G_2 + ... + β_n G_n, where β_i represents the weight assigned by the server to client i. In the gradient update phase, the clients download the aggregated gradient and update the local model as θ_i ← θ_i − α_i G for the next training epoch, where α_i represents the model learning rate of client i.
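The round described above can be sketched as follows. This is an illustrative toy, not the paper's implementation: the quadratic loss, the helper names `local_grad` and `fed_round`, and all parameter values are assumptions made for the example.

```python
import numpy as np

def local_grad(theta, X, y):
    """Gradient of a least-squares loss f(theta) = ||X @ theta - y||^2 / (2m)."""
    m = len(y)
    return X.T @ (X @ theta - y) / m

def fed_round(theta, datasets, betas, alpha):
    """One federated round: local gradients, weighted mean, model update."""
    grads = [local_grad(theta, X, y) for X, y in datasets]
    agg = sum(b * g for b, g in zip(betas, grads))  # G = sum_i beta_i * G_i
    return theta - alpha * agg                      # theta <- theta - alpha * G

# Three toy clients holding disjoint datasets drawn from the same model.
rng = np.random.default_rng(0)
true_theta = np.array([2.0, -1.0])
datasets = []
for _ in range(3):
    X = rng.normal(size=(50, 2))
    datasets.append((X, X @ true_theta))

theta = np.zeros(2)
for _ in range(200):
    theta = fed_round(theta, datasets, [1 / 3] * 3, alpha=0.5)
```

With equal weights β_i = 1/3, the aggregated gradient is exactly the gradient of the average loss, so `theta` converges to the data-generating parameters.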

Threshold secret sharing
Shamir's Secret Sharing (s, n, t) [23] is a cryptographic primitive. It provides a reversible method of splitting a secret s into n shares for a client such that fewer than t shares reveal nothing about s. It consists of two algorithms:
- SS(s, n, t) ⇒ {s_1, ..., s_n} takes the secret s, the number of shares n and the threshold t as inputs and outputs n shares, where t ≤ n.
- RS({s_δ}, t) ⇒ s takes the threshold t and at least t shares as inputs and outputs the secret s, where t ≤ n.
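A minimal sketch of the SS/RS pair over a prime field; the field and parameters are toy choices for illustration, not the SM2-based implementation used later in the paper.

```python
import random

P = 2**61 - 1  # a Mersenne prime serving as the toy field modulus

def SS(s, n, t):
    """Split secret s into n shares; any t shares recover s, fewer reveal nothing."""
    coeffs = [s] + [random.randrange(P) for _ in range(t - 1)]
    def f(x):
        return sum(c * pow(x, k, P) for k, c in enumerate(coeffs)) % P
    return [(x, f(x)) for x in range(1, n + 1)]

def RS(shares):
    """Recover the secret by Lagrange interpolation at x = 0 over >= t shares."""
    s = 0
    for xi, yi in shares:
        num = den = 1
        for xj, _ in shares:
            if xj != xi:
                num = num * (-xj) % P
                den = den * (xi - xj) % P
        s = (s + yi * num * pow(den, P - 2, P)) % P  # den^{-1} via Fermat
    return s

shares = SS(1234, n=5, t=3)
assert RS(shares[:3]) == 1234
```

Any subset of at least t = 3 shares reconstructs the secret; two or fewer are information-theoretically useless.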

Traceable ring signature
A ring signature is a cryptographic primitive that allows a signer to hide among n ring members and generate an anonymous signature. A traceable ring signature is an improved scheme that additionally allows ring members to trace signatures through labels. A traceable ring signature scheme holds three properties: tag-linkability, anonymity and exculpability. A traceable ring signature scheme [25] is defined by a tuple of algorithms (KeyGen, Sign, Ver, Trace), where Trace(L, m_i, δ_i^m, m_j, δ_j^m) ⇒ sg is a deterministic algorithm that takes a tag L = (issue, pk) and two message/signature pairs (m_i, δ_i^m) and (m_j, δ_j^m) as input, and outputs a string sg that is equal to indep, linked, or an element pk ∈ pk.

Unidirectional proxy re-encryption
Proxy re-encryption is a cryptographic primitive that allows a semi-trusted proxy to convert a ciphertext under the public key of a data sender into a ciphertext of the same message under the public key of a data receiver, without full decryption [24].
Unidirectional proxy re-encryption [26] is an improved scheme of proxy re-encryption in which a re-encryption key only converts ciphertexts in one direction, from the data sender to the data receiver, and cannot be used in the reverse direction. It is defined by a tuple of algorithms (PP, KeyGen, ReKeyGen, Enc, ReEnc, Dec), where:
- PP(p) ⇒ p_α is a deterministic algorithm that takes a large prime p as input and outputs public parameters p_α.
- KeyGen(1^k) ⇒ (pk, sk) is a PPT algorithm that takes a security parameter k as input and outputs a public/private key pair (pk, sk).
- ReKeyGen(sk_i, pk_i, pk_j) ⇒ rk_{i,j} takes a secret key sk_i, a public key pk_i and a public key pk_j as inputs and outputs the unidirectional re-encryption key rk_{i,j}.
- Enc(pk, m) ⇒ C takes p_α, a message m and a public key pk as inputs and outputs a ciphertext C.
- ReEnc(rk_{i,j}, C_i) ⇒ C_{i,j} takes a ciphertext C_i and a re-encryption key rk_{i,j} as inputs and outputs a re-encrypted ciphertext C_{i,j}.
- Dec(p_α, sk_j, C_{i,j}) ⇒ m takes p_α, a private key sk_j and a ciphertext C_{i,j} as inputs and outputs the message m.
Correctness requires that Dec(p_α, sk_j, ReEnc(rk_{i,j}, Enc(pk_i, m))) = m; the security requirements follow [26].
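To make the Enc/ReEnc/Dec interface concrete, here is a toy ElGamal-style sketch in the spirit of BBS98. Note two loud caveats: this toy re-key is bidirectional (rk can be inverted), unlike the unidirectional scheme of [26], and the tiny group parameters are for illustration only, not for security.

```python
import random

# Toy group: p = 23 is a safe prime, q = 11, and g = 4 generates the order-q subgroup.
p, q, g = 23, 11, 4

def keygen():
    sk = random.randrange(1, q)
    return sk, pow(g, sk, p)

def enc(pk, m):                        # m must be a subgroup element, e.g. g^k
    r = random.randrange(1, q)
    return ((m * pow(g, r, p)) % p,    # c1 = m * g^r
            pow(pk, r, p))             # c2 = pk^r = g^{sk * r}

def rekeygen(sk_i, sk_j):
    return (sk_j * pow(sk_i, -1, q)) % q   # rk = sk_j / sk_i (mod q)

def reenc(rk, ct):
    c1, c2 = ct
    return (c1, pow(c2, rk, p))        # c2 becomes g^{sk_j * r}

def dec(sk, ct):
    c1, c2 = ct
    gr = pow(c2, pow(sk, -1, q), p)    # recover g^r from c2
    return (c1 * pow(gr, -1, p)) % p

sk_a, pk_a = keygen()
sk_b, pk_b = keygen()
m = pow(g, 7, p)
ct = enc(pk_a, m)                      # ciphertext for A
ct_b = reenc(rekeygen(sk_a, sk_b), ct) # proxy converts it for B
assert dec(sk_a, ct) == m and dec(sk_b, ct_b) == m
```

The proxy sees only `ct` and `rk`; it learns nothing about `m`, yet B can decrypt the converted ciphertext, which is exactly the correctness condition above.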

Symbol description
Table 1 lists some important symbol definitions.

System structure
Unlike the federated learning setting introduced in [30], we consider a decentralized setting in which the system consists of participants, each with a unique identity in [n] = {1, 2, ..., n}, connected in a peer-to-peer network. We denote a participant by U_i, i ∈ [n]. Each participant U_i holds an aggregator A_i, a dynamic proxy P_i, a key generator K_i, a local model trainer T_i and a local private dataset D_i = (x(j), y(j)), where 1 ≤ |D_i|. Each proxy has a primary address for sending messages; its subaddress changes at the end of each round. Besides, each aggregator owns a public subaddress for communicating with other aggregators. Moreover, we add a public key server so that participants can query other participants' public keys; the server does not participate in the training of the global model. Figure 2 shows the complete system structure and the brief process of the protocol.
In the system, each participant generates n shares of its local gradient using the threshold secret sharing algorithm and confidentially distributes n − 1 shares to the other participants' aggregators. Besides, to achieve the system goal, the participants also cooperate to calculate candidate gradients and screen the global gradient.

System goals
The goal of our system is to help all participants train a global model over the combined dataset D = D_1 ∪ D_2 ∪ ... ∪ D_n, even if some participants (fewer than n/2) drop out or share malicious candidate gradients or gradient shares. Besides, the system aims to protect participants' address and identity privacy.

Threat models
In our system, we mainly consider four kinds of adversaries.
- External semi-honest adversaries: During transmission, a participant's gradient shares may be eavesdropped by external adversaries. An attacker can monitor any communication channel at a specific time to capture the information being transmitted. Although the shares are vectors, the attacker can still perform an inference attack to learn a participant's private dataset and local gradient from the gradients obtained by monitoring.
- External malicious adversaries: A malicious party may try to mount a man-in-the-middle attack or channel interference to disrupt the parameter interaction process.
- Internal semi-honest adversaries: Internal semi-honest adversaries may try to obtain an individual's model update parameters through the global gradient aggregation process, and may collude with one another to learn an honest participant's private parameters.
- Internal malicious adversaries: A malicious adversary may not only share false or malicious gradient shares but also feed back false or malicious candidate gradients. Besides, malicious adversaries may collude with other malicious participants to affect or even destroy the training process of the global model, and to learn other participants' private parameters.
- Assumptions: We assume that no participant changes its public and private keys during a complete training process. Besides, we assume that at least 3t + 1 of the n participants are honest, where t = n/4, and that malicious adversaries can neither control honest participants or the public key server nor steal the keys of honest participants. Moreover, we assume that the channel between any two participants has probability r of being destroyed or monitored by the adversary, and that no malicious adversary can monitor or destroy two channels at the same time. Last but not least, we assume that the number of exiting or offline participants does not exceed t during each round of training.

TFPA protocol
In this section, we present our TFPA protocol for federated learning, which achieves robust aggregation even if t participants drop out or behave maliciously. First, we give an overview of the TFPA protocol. Next, we design a one-time dynamic proxy protocol, based on traceable ring signatures and unidirectional proxy re-encryption, to protect participants' address and identity privacy. The protocol takes the number of participants n, the secret-recovery threshold t and a large prime p as inputs, and outputs the n participants' public keys pk = {pk_1, pk_2, ..., pk_n}, private keys sk = {sk_1, sk_2, ..., sk_n}, unidirectional re-encryption keys rk = {rk_1, rk_2, ..., rk_n} and aggregator addresses. Then, we design a robust candidate gradient aggregation algorithm to achieve secure aggregation. This second component takes the participants' local gradients G = {G_1, G_2, ..., G_n} as input and outputs a set of weighted candidate gradients for the global model. Then, we design a Byzantine-fault-tolerant global gradient screening protocol. Last but not least, we design a gradient calibration protocol so that all participants can trace the source of a received gradient. All participants anonymously participate in the TFPA protocol to aggregate the global gradient and trace malicious gradients. Finally, we give our analysis of correctness, robustness, privacy and performance. The specific protocol is described in detail in Algorithm 2.

Functionality overview
We describe the protocol in Figure 3. In the protocol, the participants collaboratively train the global model, upload the gradients confidentially, and trace malicious gradients, all without a central server.
In the offline phase, each participant runs the one-time dynamic proxy protocol independently to establish the anonymous communication environment. Then each participant trains a local model and calculates the n shares of its local gradient.
During the online phase, we provide privacy protection for the gradients and enhance the robustness of the aggregated results to a certain extent, but we do not guarantee 100% accuracy of the aggregated results. As mentioned before, the total number of adversaries in our protocols is less than t, while semi-honest adversary participants follow the protocol specification but may try to learn more than allowed from the protocol transcript.
In the calibration phase, we provide a way of tracing the malicious gradients so that all participants can find the source of the malicious gradient.

One-time dynamic address proxy
In this section, we describe the one-time dynamic address proxy protocol. As a part of the aggregation protocol, it protects participants' address and identity privacy against potential collusion between the interacting parties. The protocol is a three-step procedure. In the first step, each participant's key generator generates its own public key pk_i and private key sk_i through the KeyGen algorithms introduced in sections III.D and III.E, and sends the public key pk_i to the participant's aggregator. Then each participant's aggregator sends its address and the participant's public key to the public key server. In the second step, the public key server generates the public parameter p_α through the PP algorithm introduced in section III.E, a random public parameter t, and t polynomial coefficients, where t = n/3 and n is the number of participants in training. In the third step, each participant's aggregator downloads the public parameters p_α and t, the t polynomial coefficients, and the other n − 1 participants' public keys and aggregator addresses from the public key server. It then sends the n − 1 participants' public keys to the participant's proxy and trainer, and sends the t polynomial coefficients to the trainer. Each trainer separately generates a set of re-encryption keys rks = {rk_{i,1}, rk_{i,2}, ..., rk_{i,n}} and sends them to the participant's proxy. The participant is then able to send and receive messages confidentially. Before the aggregation starts, all participants can join and exit freely; when a participant joins or exits, the other participants need to update the public key set. Figure 4 presents an overview of two participants using this mechanism to realize confidential communication.

Robust aggregation protocol
From section III.B, we know that threshold secret sharing guarantees that the adversary learns nothing about the secret as long as no more than t − 1 secret shares are public. In addition, Shamir's Secret Sharing is additively homomorphic. We introduce this property and its corollary in Lemmas 1 and 2, which we prove in the Appendix.

Lemma 1 If s_δ^x is a secret share of s^x and s_δ^y is a secret share of s^y, then s_δ^z = s_δ^x + s_δ^y is a secret share of s^z = s^x + s^y.

Lemma 2 If s_δ^x is a secret share of s^x and s_δ^y is a secret share of s^y, then s_δ^z = γ s_δ^x + η s_δ^y is a secret share of s^z = γ s^x + η s^y.
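Lemma 2 (which subsumes Lemma 1 with γ = η = 1) can be checked numerically with a toy Shamir implementation; the field modulus, helper names and parameter values below are illustrative assumptions.

```python
import random

P = 2**61 - 1  # toy prime field

def share(s, n, t):
    """Shamir sharing: shares are points on a random degree-(t-1) polynomial."""
    coeffs = [s] + [random.randrange(P) for _ in range(t - 1)]
    return [(x, sum(c * pow(x, k, P) for k, c in enumerate(coeffs)) % P)
            for x in range(1, n + 1)]

def recover(shares):
    """Lagrange interpolation at x = 0."""
    s = 0
    for xi, yi in shares:
        lam = 1
        for xj, _ in shares:
            if xj != xi:
                lam = lam * (-xj) % P * pow(xi - xj, P - 2, P) % P
        s = (s + yi * lam) % P
    return s

sx, sy = 42, 99
shx, shy = share(sx, 5, 3), share(sy, 5, 3)

# Lemma 2: a pointwise linear combination of shares is a sharing of the
# same linear combination of the secrets.
gamma, eta = 7, 3
shz = [(x, (gamma * y1 + eta * y2) % P)
       for (x, y1), (_, y2) in zip(shx, shy)]
assert recover(shz[:3]) == (gamma * sx + eta * sy) % P
```

This is the property that lets aggregators sum gradient shares locally and still recover the correct aggregated gradient.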
However, the Shamir's Secret Sharing scheme in [23] shares only a single value rather than a matrix. We generalize the theory of [23] and give the definition of matrix secret sharing in Theorem 1.
Obviously, matrix secret sharing also satisfies Lemmas 1 and 2. Based on Lemmas 1, 2 and 3, we design our robust aggregation protocol, in which, unlike the strategy in [14], we introduce two or more non-colluding participants to share their own local gradients and aggregate the global gradients securely. Figure 2 shows a simple flow chart of our protocol. We give a complete description of the protocol process here.
During one training epoch, our protocol consists of a gradient sharing phase, a rough aggregation phase, a gradient voting phase and a gradient screening phase. Besides, in all training stages, the aggregators, trainers and proxies of different participants run independently.
In the gradient sharing phase, the participants' trainers T = {T_1, T_2, ..., T_n} train the model on the private datasets D = {D_1, D_2, ..., D_n} and compute the gradients G = {G_1, G_2, ..., G_n} as in federated learning. Then they calculate the n shares through the SS algorithm introduced in section III.A, and the corresponding n ring signatures through the Sign algorithm introduced in section III.C. Besides, they encrypt each G_{i,j} and the corresponding ξ_{i,j} through the Enc algorithm introduced in section III.D and send the ciphertexts to their proxies. After updating their own addresses, the proxies re-encrypt the ciphertexts through the ReEnc algorithm introduced in section III.D and send the re-encrypted ciphertexts to the jth participant's aggregator, where G_i is the local gradient vector of participant i, G_{i,j} is the jth share of G_i, 1 ≤ i, j ≤ n, and m is the characteristic dimension of the datasets D. Besides, the participants' trainers send the local gradient G_i to their own aggregators.
In the rough aggregation phase, the participants' aggregators wait and collect at least 3t + 1 re-encrypted shares from other participants' aggregators. They decrypt them through the Dec algorithm introduced in section III.D and record the attached ring signatures. Then they trace and verify these signatures so as to exclude gradients carrying malicious signature labels. After verifying the correctness of the signatures through the Ver algorithm introduced in section III.C, the aggregators compute the aggregated result of the received shares as a candidate gradient through Algorithm 1 and send the candidate gradients, together with the corresponding ring signatures, back to the other participants' aggregators.
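The published listing of Algorithm 1 is partly garbled, so the following is only one plausible reading of it: among the retained vectors, select the pair with the smallest pairwise distance r_{u,v} and average them as the candidate. The cosine distance and the averaging step are assumptions made for this sketch.

```python
import numpy as np

def candidate_gradient(grads):
    """Pick the two closest vectors (cosine distance) and average them.
    An interpretation of Algorithm 1, not a transcription of it."""
    l = len(grads)
    best, pair = float("inf"), (0, 1)
    for u in range(l):
        for v in range(u + 1, l):
            gu, gv = grads[u], grads[v]
            r = 1 - gu @ gv / (np.linalg.norm(gu) * np.linalg.norm(gv))
            if r < best:
                best, pair = r, (u, v)
    u, v = pair
    return (grads[u] + grads[v]) / 2

# Two consistent gradients and one outlier: the outlier is ignored.
grads = [np.array([1.0, 0.0]), np.array([0.9, 0.1]), np.array([-1.0, 0.0])]
cand = candidate_gradient(grads)
```

The intuition is robustness: a Byzantine share that points in a very different direction cannot end up in the closest pair and thus cannot contaminate the candidate.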
In the gradient voting phase, the participants' aggregators wait and collect at least 3t + 1 candidate gradients from other participants. After verifying the correctness of the signatures, they calculate the cosine distance between each received candidate gradient and the local gradient and screen out the t + 1 candidate gradients most similar to the local gradient. Each aggregator packs these candidate gradients as a candidate set V, calculates the corresponding ring signatures R_V, and sends the set and signatures to the other participants' aggregators.
In the gradient screening phase, the participants' aggregators wait and collect n candidate sets. After verifying the correctness of the signatures, they count the votes of each candidate gradient and screen the candidate gradient with the largest number of votes as the global gradient G_w of the current round. Besides, they send the global gradient G_w to their own trainers, and the trainers update the local models. The protocol stops running once the model converges to the specified accuracy. Figure 5 shows the view of a participant in the aggregation protocol; other participants' views are similar.
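The voting and screening phases can be sketched as follows; the helper names are hypothetical, and signatures, encryption and tie-breaking are omitted for brevity.

```python
import numpy as np
from collections import Counter

def cos_sim(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

def vote(local_grad, candidates, t):
    """Gradient voting: return indices of the t+1 candidates most similar
    to the local gradient."""
    sims = [cos_sim(local_grad, c) for c in candidates]
    return sorted(range(len(candidates)), key=lambda i: -sims[i])[:t + 1]

def screen(vote_sets):
    """Gradient screening: the candidate index with the most votes wins."""
    return Counter(i for vs in vote_sets for i in vs).most_common(1)[0][0]

# Three toy candidate gradients; candidate 2 is broadly agreeable.
cands = [np.array([1.0, 0.0]), np.array([0.0, 1.0]), np.array([0.8, 0.2])]
votes = [vote(np.array([1.0, 0.1]), cands, t=1),
         vote(np.array([0.9, 0.0]), cands, t=1),
         vote(np.array([0.0, 1.0]), cands, t=1)]
winner = screen(votes)
```

Because every honest aggregator votes for candidates close to its own gradient, a candidate must be plausible to many participants to accumulate the winning vote count.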

Gradient calibration protocol
From section III.C, we know that a traceable ring signature not only allows participants in a ring to generate anonymous signatures, but also allows these signatures to be traced through labels. These two functions mainly depend on its anonymity and tag-linkability. Literature [25] introduces these properties and proves their correctness in detail.

Algorithm 1 Gradient aggregation.
Require: a set G of l m-dimensional gradient vectors
Ensure: an m-dimensional composite gradient vector G_c
1: for i = 1 to l do
2:   for j = 1 to l do
3:     if i ≠ j then
4:       compute the pairwise distance r_{i,j}
5:     else
6:       do nothing
7:     end if
8:   end for
9: end for
10: select the smallest r_{u,v} and the corresponding G_u and G_v, where u, v ∈ [1, l]
11: return the composite gradient G_c computed from G_u and G_v

Based on traceable ring signatures, we design our gradient calibration protocol, in which we introduce two or more non-colluding participants to calibrate the global gradient and trace the source of malicious gradients. The protocol consists of a calibration phase and a tracing phase.
During the calibration phase, each participant's aggregator packs the gradient screened in the aggregation protocol and the corresponding ring signatures as a voucher and sends it to the other participants' aggregators, while collecting the vouchers sent by the others. Each aggregator then verifies whether its screened global gradient is consistent with the gradients in most vouchers, so as to correct the global gradient or wait for the next calibration.
In the tracing phase, after correcting the global gradient, the participants' aggregators pick out the "malicious" gradients in the vouchers that differ from their own. They trace these gradients through the Trace algorithm introduced in section III.C and record the ring signature labels as malicious labels, so that gradients carrying these labels can be excluded more conveniently later.

Correctness and efficiency
It is easy to verify that the training protocol is correct if the participants provide their true gradient inputs and do not alter the model or use a malicious model. Theorem 2 describes the correctness of our protocols; we prove the conclusion in the Appendix.

Theorem 2 (CORRECTNESS) Consider an execution of the TFPA protocol where n participants follow the protocol. In the one-time dynamic address proxy protocol, all participants can join and exit freely and achieve confidential communication with other participants. Besides, no party can learn the IP address of the other parties.

The full protocol proceeds as follows.

Part I: One-time dynamic address proxy protocol
(2) The key generator of participant i generates the key pair k_i = (pk_i, sk_i) and sends k_i to A_i.
(3) A_i sends pk_i to the dynamic address proxy P_i, and sends pk_i together with its own IP address to the public key server.
(4) The public key server initializes the threshold parameter t, the public parameter p_α and t polynomial coefficients, and packs the received pk_i into the public key set pk.
(5) A_i downloads p_α, t, the t polynomial coefficients and pk.
(6) A_i sends pk, t and the t polynomial coefficients to T_i, and sends pk to P_i.
(7) T_i generates the set of re-encryption keys rk_i = {rk_{i,1}, rk_{i,2}, ..., rk_{i,n}} and sends rk_i to P_i.

Part II: Aggregation protocol
(1) T_i initializes the parameter vector of the local model θ = 0.
(2) T_i computes the gradient of the local model, where z ∈ [m], and regards the jth share of the gradient G_i as G_{i,j} = {G^1_{i,j}, G^2_{i,j}, ..., G^m_{i,j}}.
(4) T_i computes the ring signature ξ_{i,j} = Sign(sk_i, pk, G_{i,j}), where j ∈ [n].
(5) T_i packs G_{i,j} and ξ_{i,j} as M_{i,j}, computes C_{i,j} = Enc(pk_j, M_{i,j}) and sends C_{i,j} to P_i.
(6) P_i computes the re-encrypted ciphertext R_{i,j} = ReEnc(rk_{i,j}, C_{i,j}) and sends R_{i,j} to A_j.
(7) A_i waits and collects ciphertexts, then computes the plaintext M_{i,j} = Dec(C_{i,j}, sk_i) and verifies whether Ver(pk, G_{i,j}, ξ_{i,j}) is true to decide whether to keep the gradient share and the corresponding ring signature.
(8) Once at least 2t + 1 gradient shares are retained, A_i computes the candidate gradient G^w_i according to Algorithm 1 and the corresponding ring signature ξ^w_i = Sign(sk_i, pk, G^w_i).
(9) A_i packs G^w_i and ξ^w_i as M^w_i, computes C^w_i = Enc(pk_ℓ, M^w_i) and sends C^w_i to A_ℓ, where ℓ ∈ [n] and ℓ ≠ i.
(10) A_i waits and collects ciphertexts, then computes the plaintext M^w = Dec(C^w, sk_i) and verifies whether Ver(pk, G^w, ξ^w) is true to decide whether to keep the candidate gradient and the corresponding ring signature.
(11) Once at least 3t + 1 candidate gradients are retained, A_i computes the cosine similarity between G_i and each received candidate gradient and screens the t + 1 most similar candidate gradients as the vote set V_i.
(12) A_i computes the ring signature ξ^v_i = Sign(sk_i, pk, V_i), packs V_i and ξ^v_i as M^v_i, computes C^v_i = Enc(pk_ℓ, M^v_i) and sends C^v_i to A_ℓ, where ℓ ∈ [n] and ℓ ≠ i.
(13) A_i waits and collects ciphertexts, then computes the plaintexts and verifies whether Ver(pk, V, ξ^v) is true, both to initially identify and record malicious participants and to decide whether to keep the vote set and the corresponding ring signature.
(14) Once at least 3t + 1 vote sets are retained, A_i takes the gradient with the highest number of occurrences in these votes as the global gradient G^W_i of this epoch and sends G^W_i to T_i.
(15) T_i updates the parameter vector of the local model θ according to formula (2).
(16) T_i tests the accuracy of the model. If accuracy ≥ Acc or Epoch_current ≥ Epoch_max, training stops; otherwise, training continues.

Part III: Gradient calibration protocol
(1) A_i packs G^W_i and the corresponding ring signature as the voucher V_i, computes C^v_i = Enc(pk_ℓ, V_i) and sends C^v_i to A_ℓ, where ℓ ∈ [n] and ℓ ≠ i.
(2) A_i waits and collects ciphertexts, then computes the plaintext V = Dec(C^v, sk_i) and verifies whether Ver(pk, V, ξ^v) is true to decide whether to take it as a basis for decision-making and to further identify and record malicious participants.
(3) Once at least 3t + 1 vouchers are retained, A_i verifies whether its global gradient is consistent with the gradient in at least t + 1 vouchers. If it is consistent, this verification is ignored. Otherwise, A_i checks whether at least t + 1 vouchers contain a common gradient; if so, it takes this consistent gradient as the global gradient, and otherwise ignores this verification.
(4) Besides, A_i filters out the gradients in the vouchers that differ significantly from the corrected gradient and calculates their cosine similarity with the corrected gradient. If the similarity exceeds the error threshold κ, A_i traces the source of the gradient according to the signature in the voucher, together with the previously retained signature information and malicious records, and sends the encrypted tracing result with a ring signature to the other aggregators.
(5) A_i collects the tracing results and verifies their correctness. A_i then records these results so that gradients from malicious participants can be identified and excluded in subsequent training.
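The voucher check in step (3) of the calibration protocol amounts to a majority test. A minimal sketch, assuming gradients are encoded as hashable tuples and ignoring signatures:

```python
from collections import Counter

def calibrate(my_grad, vouchers, t):
    """Accept my_grad if at least t+1 vouchers agree with it; otherwise
    adopt a gradient backed by at least t+1 vouchers if one exists;
    otherwise keep my_grad (the verification is ignored)."""
    counts = Counter(vouchers)
    if counts[my_grad] >= t + 1:
        return my_grad
    common, n = counts.most_common(1)[0]
    return common if n >= t + 1 else my_grad

# A participant whose screened gradient disagrees with the majority
# corrects it using the vouchers.
vouchers = [(1.0, 2.0)] * 3 + [(9.0, 9.0)]
assert calibrate((9.0, 9.0), vouchers, t=1) == (1.0, 2.0)
assert calibrate((1.0, 2.0), vouchers, t=1) == (1.0, 2.0)
```

With at most t malicious participants among 3t + 1 retained vouchers, a gradient backed by t + 1 vouchers is guaranteed to have at least one honest endorsement.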

Security and privacy
In this section, we evaluate the security of our protocols and state our security theorems. We consider the security of the TFPA protocol against two different kinds of adversaries: semi-honest adversaries (inside and outside) and malicious adversaries (inside and outside). The security of the training protocols is proved in the simulation paradigm using hybrid arguments. Theorems 3 and 4 describe the security of the TFPA protocol against semi-honest and malicious adversaries, respectively.

Theorem 3 (SECURITY AGAINST SEMI-HONEST ADVERSARIES) Consider an execution of the TFPA protocol where n participants follow the protocol. The TFPA protocols are secure in the presence of semi-honest adversaries, meaning that participants' local gradients will not be disclosed to any adversary if the number of colluding semi-honest adversaries is less than t = n/4. We prove the conclusion in the Appendix.

Theorem 4 (SECURITY AGAINST MALICIOUS ADVERSARIES) Consider an execution of the TFPA protocol where n participants follow the protocol. The TFPA protocols are secure in the presence of malicious adversaries, meaning that participants can cooperate to train a good global model without disclosing local gradients to any adversary if the number of colluding malicious adversaries is less than t = n/4. We prove the conclusion in the Appendix.

Complexity
In this section, we assess the computational and communication complexity of our protocols and state our complexity theorems. We measure the computational complexity of the TFPA protocol for computing the global gradient in terms of the number of ring signatures, the number of encryptions, and the number of cosine similarity (CS) computations. Table 2 summarizes their ranges across the five phases, including the gradient sharing, rough aggregation, gradient voting and gradient screening phases. Asymptotically, the overall computational complexity of the TFPA protocol is O(n^3), where m is the characteristic dimension of the local dataset and n is the number of participants. Let ζ be the average bit length of a ciphertext sent by a single participant in a single communication. Then the total communication cost over the five phases of the TFPA protocol is 5ζn^2.

Experimental evaluation

Implementation details
We implement the cryptographic primitives in our protocol, including Shamir's secret sharing, traceable ring signature and unidirectional proxy re-encryption, based on the SM2 algorithm. We also implemented the complete TFPA protocol in Python without using any third-party machine learning or cryptographic libraries. Our experiments were executed on a personal computer with a four-core Intel(R) Core(TM) i7-10510U CPU @ 1.80GHz (2.30 GHz boost) and 16 GB RAM in a LAN setting. All protocols were implemented in Python 3 with the Wing Pro 7.2 IDE.

Experiments setup
We evaluate the performance of the TFPA protocol on five different real-world datasets, four of which come from the UCI ML repository [31] and have no predefined split into training and test data; they are summarized in Table 3. We use these datasets of various sizes and dimensions to verify the effectiveness of the TFPA protocol for different machine learning / deep learning algorithms, including Logistic Regression, SVM, Linear Regression, RNN, and MLP, and to evaluate the performance of the TFPA protocol under different parameter settings. The dimension of the datasets ranges from 13 to 784. For Logistic Regression, SVM, RNN and MLP, we use the proportion of correct classifications to measure the predictive accuracy of the model. For Linear Regression, we measure the accuracy of the model using the decline rate of the root mean squared error (RMSE). Note that no privacy-preserving mechanism is used when computing model accuracy. In the experiments, for datasets without a predefined split, we take 70% as the training set and 30% as the test set. Moreover, if n participants take part in an experiment, we first divide the dataset into n disjoint subsets and distribute one to each participant. Each participant then takes 70% of its subset as the training set and 30% as the test set.
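The partitioning procedure above can be sketched as follows. This is a minimal illustration only; the round-robin shard assignment and the shuffling seed are our own assumptions, not the paper's exact procedure:

```python
import random

def partition_for_participants(dataset, n, train_frac=0.7, seed=0):
    """Split a dataset into n participant shards, then split each shard
    70/30 into a local train and test set, as in the setup above.
    Returns a list of (train, test) pairs, one per participant."""
    rng = random.Random(seed)
    data = list(dataset)
    rng.shuffle(data)
    # Round-robin sharding (an illustrative choice of disjoint subsets).
    shards = [data[i::n] for i in range(n)]
    splits = []
    for shard in shards:
        cut = int(len(shard) * train_frac)
        splits.append((shard[:cut], shard[cut:]))
    return splits
```

For 100 samples and n = 5 this yields five shards of 20 samples, each split into 14 training and 6 test samples.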

Accuracy
We compare the same model in different scenarios and find that our accuracies in both the semi-honest and malicious settings are close to the prediction accuracy of standard federated learning and of local training. We plot the accuracy over the training epochs for Logistic Regression. For instance, Figure 6 shows that the curve for TFPA almost coincides with the curves for federated averaging, local training, and Fltrust [30]. We obtain similar results for Linear Regression, SVM, and MLP, plotted in the Appendix due to space limits. The best accuracy for each model under all protocols is shown in Table 4.
Convergence Rate Our experiments also show that the convergence rates of all protocols over the training epochs are close; the security techniques do not noticeably influence the results. For instance, the convergence rates of TFPA and standard FL are almost the same as that of Fltrust for both Logistic Regression and RNN. The models converge at around 30 epochs under the different protocols for Logistic Regression in Figure 6, and at around 100 epochs for RNN in Figure 7.
Performance Our protocol mainly consists of an offline phase and an online phase. In the offline phase, participants train their local models and generate shares of their local gradients; they also generate ring signatures over these shares and encrypt the content. In the online phase, participants synthesize candidate gradients and vote to filter the global gradient. We run the complete protocol 10 times for each model and take the average as the result (100 epochs). To better evaluate the runtime, we also implement the standard federated protocol and the Fltrust protocol and test their runtimes in the same way; we also measure the runtime of local training. The details are in Table 5. From Table 5, we can see that the offline runtime of TFPA is close to that of standard FL and Fltrust. In the online phase, the runtime of TFPA is about 40 s more than standard federated learning on average, while it is close to that of Fltrust; TFPA needs extra time to generate shares of local gradients and the corresponding ring signatures and ciphertexts. In some prior schemes, participant dropout during training requires extra recovery operations that greatly reduce efficiency, while our protocol needs no such extra operation even if the same misfortune happens. This is because the participants in our protocol are independent of each other, so a few dropped participants do not impact the remaining online participants. Combining the offline and online phases, the overall performance of our protocol is quite good. On the other hand, TFPA only needs a public key server, whereas standard FL and other schemes based on secret sharing need two or more servers, and we require only that at most t = n/4 participants collude, a security requirement that is easy to satisfy in practice. Meanwhile, in standard FL the parameter server learns the aggregated gradients and even the trained model, but in TFPA the public key server learns nothing. What's more, our protocol is more robust than standard FL, since we allow up to t = n/4 participants to halt in the worst case, while standard FL must ensure that the parameter server runs normally.
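The measurement procedure above (offline and online phases timed separately, averaged over repeated runs) can be sketched as a small harness. The phase callables are placeholders standing in for the real protocol steps, which are not shown here:

```python
import time
from statistics import mean

def time_phases(run_offline, run_online, repeats=10):
    """Average offline/online wall-clock time over `repeats` runs.

    run_offline / run_online are caller-supplied callables; in the real
    experiment they would cover local training + share/signature
    generation (offline) and candidate synthesis + voting (online)."""
    offline, online = [], []
    for _ in range(repeats):
        t0 = time.perf_counter()
        run_offline()
        t1 = time.perf_counter()
        run_online()
        t2 = time.perf_counter()
        offline.append(t1 - t0)
        online.append(t2 - t1)
    return mean(offline), mean(online)
```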

Related work
Based on the client-server (C-S) architecture, federated learning realizes multi-party collaborative training, which protects clients' data security to a certain extent, but security issues such as privacy disclosure remain. A large number of experts and scholars have proposed security schemes for this problem.
In 2017, Bonawitz et al. [8] proposed an efficient and robust protocol for secure aggregation of high-dimensional data. The protocol has a constant number of rounds, low communication overhead, and robustness to failures, requires only a server with limited trust, and can securely compute the sum of vectors. However, the scheme needs a centralized trusted execution environment to ensure the security of aggregation. Corrigan-Gibbs and Boneh [12] then extended the classical private aggregation protocol and designed a system for large-scale data collection and statistics based on secret-shared non-interactive proofs (SNIPs). However, this scheme is only applicable to regression statistics and cannot be readily generalized to general federated learning scenarios. In 2018, Mandal et al. [10] proposed a non-interactive pairwise key generation scheme for mobile participants and, improving on the scheme of Bonawitz et al., designed an efficient and robust low-communication aggregation scheme. However, this scheme requires trusted service providers to participate in gradient verification and cannot achieve Byzantine fault tolerance. In 2019, Mandal and Gong [9] designed two privacy-preserving protocols for training linear and logistic regression models based on an additively homomorphic encryption (HE) scheme and an aggregation protocol, which can resist semi-honest adversaries. However, those schemes cannot be readily extended to other learning algorithms, nor can they achieve Byzantine fault tolerance. In 2020, D. Wu et al. [3] proposed an effective secure aggregation scheme, which realizes efficient secure updates of participant models by introducing matrix transformation and functional encryption. However, the scheme needs the participation of a trusted third party and a centralized aggregator, so it is unable to defend against malicious attacks from the aggregator. Besides, Bell et al.
[6] improved the secure aggregation protocol proposed by Bonawitz et al. in 2017 using k-regular graphs, reducing the communication overhead of secure aggregation to O(n log n). Based on the anonymity of data collection, that paper designs a secure shuffle model and two new protocols with semi-honest and malicious security, which can provide better privacy protection for participants. However, the scheme requires a central server to faithfully forward the mask shares and calculate aggregate values. Dong et al. [13] then proposed a vector-weighted aggregation protocol based on Hamming distance that can resist Byzantine attacks. This protocol can complete secure aggregation when up to a 1/2 proportion of the nodes in the system are malicious. However, the scheme needs a centralized aggregator to perform complex operations. Moreover, in order to ensure communication security, two-party security protocols are used frequently, which greatly increases the communication complexity of the scheme. In addition, Dong et al. [14] improved the TernGrad protocol based on secret sharing and homomorphic encryption, designed a privacy-preserving protocol that can resist semi-honest adversaries, and used SIMD to optimize the aggregation efficiency of the protocol. However, as the authors mention in that article, the scheme cannot guarantee the accuracy of aggregation. In contrast, our protocol can ensure the accuracy of aggregation even if there are a few malicious participants. Furthermore, Cao et al. [30] improved the federated averaging algorithm to achieve Byzantine fault tolerance and designed a series of attack experiments to verify the robustness of the algorithm. However, that scheme is based on a centralized network setting and cannot resist attacks from malicious aggregators. Moreover, it also cannot protect the local gradients and address privacy of participants. In 2021, So et al.
[4] designed a secure and robust federated learning aggregation framework based on a multi-group circular strategy and secret sharing. This framework ensures that when the node dropout ratio is less than 50%, secure aggregation over the online nodes can be completed, with communication complexity controlled at O(n log n). However, the scheme realizes neither complete decentralization nor Byzantine fault tolerance. So et al. [5] also proposed a secure aggregation framework with multi-round privacy guarantees. The use of a structured participant selection strategy lets the framework better guarantee the fairness of aggregation, so as to protect participant privacy in the long term. However, the scheme needs a centralized aggregator to complete the aggregation operation, so it cannot achieve Byzantine fault tolerance. Moreover, Hosseini et al. [7] extended the classical MPHE method based on secret sharing and designed a secure aggregation scheme in which the central node only needs to receive at least one encrypted version of each individual gradient to compute the aggregate gradient; to keep the new scheme efficient, that paper also designs a gradient vector compression scheme. However, this scheme likewise needs a centralized aggregator and thus cannot achieve Byzantine fault tolerance. In addition, Dong et al. [11] proposed an efficient distributed machine learning protocol based on secret sharing, which ensures that each client learns nothing except the model and that the parameter server learns no private information about the clients. However, the scheme cannot resist malicious attacks from aggregators due to its centralized network setting. In contrast, our scheme also protects user address privacy and identity privacy.

Conclusion
We propose a novel secure decentralized Byzantine-fault-tolerant federated learning protocol by combining distributed machine learning with secret sharing and ring signatures. Participants can complete the training of the global model without disclosing identity privacy, address privacy, or local datasets. We also propose a method to help participants find and trace malicious participants and malicious gradients. We evaluate our framework on Linear Regression, Logistic Regression, SVM, MLP and RNN, and achieve excellent results in both accuracy and performance. In future work, we will evaluate our framework on PCA, Bayesian classification, gradient boosting trees and so on, and will design and implement attack experiments to further verify the security of the framework.

Proof of Lemma 1
We define the sum of the two polynomial functions F(r) and G(r) as H(r). Obviously, H(r) is a polynomial function of degree t − 1 with t coefficients, which can be regarded as a polynomial share of a new secret $s_z^\delta$, so we can get that $\sum_{i=0}^{t-1}(a_i + b_i)\delta^i \bmod P = s_x^\delta + s_y^\delta$.

Proof of Lemma 2
According to the result formula (8), it suffices to obtain the linear superposition of the two shares, $\sum_{i=0}^{t-1}(\gamma a_i + \eta b_i) r^i \bmod P$ (11). Similarly, we define the linear superposition of the two polynomial functions F(r) and G(r) as Q(r).
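The share-level linearity used in the proofs of Lemmas 1 and 2 can be checked concretely with a toy Shamir implementation. This is a minimal sketch with an illustrative prime modulus and toy parameters; the protocol's actual parameters are not shown here:

```python
import random

P = 2**61 - 1  # a public prime modulus (illustrative choice)

def share(secret, t, n, rng):
    """Shamir-share `secret` with threshold t: a random polynomial of
    degree t-1 whose constant term is the secret, evaluated at x=1..n."""
    coeffs = [secret] + [rng.randrange(P) for _ in range(t - 1)]
    return [(x, sum(c * pow(x, i, P) for i, c in enumerate(coeffs)) % P)
            for x in range(1, n + 1)]

def reconstruct(shares):
    """Lagrange interpolation at x=0 over the given t shares."""
    total = 0
    for j, (xj, yj) in enumerate(shares):
        num, den = 1, 1
        for k, (xk, _) in enumerate(shares):
            if k != j:
                num = num * (-xk) % P
                den = den * (xj - xk) % P
        total = (total + yj * num * pow(den, -1, P)) % P
    return total

rng = random.Random(42)
t, n = 3, 7
sx, sy = 123, 456
gamma, eta = 5, 9
shares_x, shares_y = share(sx, t, n, rng), share(sy, t, n, rng)
# Each participant locally forms gamma*a_i + eta*b_i on its own shares;
# interpolating any t of these yields gamma*s_x + eta*s_y, as in (11)-(13).
combined = [(x, (gamma * ax + eta * bx) % P)
            for (x, ax), (_, bx) in zip(shares_x, shares_y)]
assert reconstruct(combined[:t]) == (gamma * sx + eta * sy) % P
```

Setting gamma = eta = 1 gives the sum case of Lemma 1.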

Proof of Theorem 2
We consider the aggregation of n gradients in Figure 11(a), namely G_i, i ∈ [1, 2t+1]. Each gradient is a high-dimensional vector; due to space constraints, a 2-dimensional vector is shown here as an example. According to the federated averaging algorithm, we get the aggregate gradient $G_w = \frac{1}{n}\sum_{i=1}^{2t+1} G_i$, shown in Figure 11(b). On the other hand, according to Algorithm 1, we first find the two vectors G_x and G_y with the minimum distance correlation coefficient. Then we take them as cluster centers and divide all vectors into two groups based on the distance correlation coefficient. We let G_x ∈ G_A and G_y ∈ G_B, where G_A and G_B are two collections containing c_A and c_B gradients, respectively. According to the algorithm, if c_A ≥ c_B then G_core = G_x, else G_core = G_y; we assume c_A ≥ c_B. Last, the aggregate gradient $G_w = \sum_{i=1}^{2t+1} G_i \cdot r_{i,core}$ is obtained by computing the weighted sum of these gradients, where r_{i,core} is the distance correlation coefficient between G_i and G_core, i ∈ [1, n]. The whole process is shown in Figure 11(c) as an example. The distance correlation coefficients between G_core and the gradients in G_A are greater than 0, while the distance correlation coefficients between G_core and most of the gradients in G_B are smaller than 0; the latter gradients are reversed by their negative weights. Therefore, the distance correlation coefficient between G_w and each gradient in {G_i}_1^n is greater than 0. Each participant then screens a global gradient by calculating distance correlation coefficients and voting, so more than t + 1 participants obtain a global gradient whose distance correlation coefficient with their local gradient is greater than 0.
It can be clearly seen from the figure that, compared with the federated averaging algorithm, the gradient direction aggregated by Algorithm 1 better represents the majority of gradient directions and can accelerate model convergence for most participants. Besides, under our assumption, the number of malicious gradients is always less than the number of normal gradients, so the protocol ensures that the candidate gradient is close to the normal gradients. Moreover, each malicious gradient is multiplied by a negative weight when forming the composite gradient.
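The reversal of negatively correlated gradients can be illustrated with a small sketch. We use plain cosine similarity as a stand-in for the distance correlation coefficient and hand-pick the core gradient; both are simplifying assumptions, not the paper's Algorithm 1:

```python
import math

def cosine(u, v):
    """Cosine similarity between two (nonzero) vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def weighted_aggregate(grads, core):
    """Weight each gradient by its similarity to the core gradient;
    negatively correlated (malicious-looking) gradients get negative
    weight and are therefore reversed, as argued above."""
    agg = [0.0] * len(grads[0])
    for g in grads:
        w = cosine(g, core)
        for j, x in enumerate(g):
            agg[j] += w * x
    return agg

honest = [[1.0, 0.9], [0.9, 1.1], [1.1, 1.0]]
malicious = [[-1.0, -1.0]]   # points opposite the honest direction
core = honest[0]             # assume the algorithm picked an honest core
agg = weighted_aggregate(honest + malicious, core)
# The malicious gradient's negative weight flips it, so it still
# contributes in the honest direction: both components stay positive.
assert agg[0] > 0 and agg[1] > 0
```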

Proof of Theorem 3
Fewer than t semi-honest adversaries cannot recover the original gradient of any participant through a collusion attack. The subsequent screening process can clearly tolerate attacks from semi-honest adversaries.

Proof of Theorem 4
According to our protocol, in the gradient share exchange phase, each participant receives at least 2t + 1 shares from different participants, including l ≤ t malicious gradients. Considering l = t, the remaining t + 1 gradients are normal, so the number of malicious gradients is less than the number of normal gradients. Therefore, the protocol ensures that the candidate gradient is close to the normal gradients. Besides, the threshold secret sharing protocol guarantees that malicious participants cannot recover the local gradient of any honest participant through a collusion attack when the number of malicious participants is less than t + 1. In the candidate gradient exchange phase, each participant also receives at least 3t + 1 candidate gradients from different participants. Assuming p ≤ t of them are malicious, each honest node screens t + 1 gradients for voting according to the protocol. When p = t, the remaining 2t + 1 gradients are normal. Then the maximum probability of a malicious gradient being selected as the global gradient is l * (…).

The traceable ring signature syntax used in the protocol is as follows:
-KeyGen(1^k) ⇒ (pk, sk) is a PPT algorithm that takes a security parameter k as input and outputs a pair of public and private keys (pk, sk);
-Sign(sk_i, L, m) ⇒ δ_m is an algorithm that takes a secret key sk_i, a tag L = (issue, pk) and a message m as input and outputs a signature δ_m, where pk = (pk_1, ..., pk_n);
-Ver(L, m, δ_m) ⇒ b is a deterministic algorithm that takes a tag L = (issue, pk), a message m and a signature δ_m as input and outputs a bit b such that b = 1 if the signature is valid and b = 0 otherwise.

Figure 2
Figure 2 System structure and brief process of the protocol

Figure 3
Figure 3 Overview of TFPA protocol

Algorithm 2
Detail of TFPA protocol.
Parties: participants 1, ..., n and a public key server.
Public parameters: characteristic dimension m, input domain D^m, and PRG F : R^λ → D^m.
Input: D_i ∈ D^m (by each participant i).
Output: a global model M.
We denote by D_i the training dataset of participant i, P_i the dynamic address proxy of participant i, A_i the aggregator of participant i, and T_i the trainer of participant i, where i ∈ [n].
Part I: One-Time Dynamic Address Proxy Protocol. (1) A_i and P_i initialize their IP addresses, while A_i, P_i, T_i and the corresponding key generators exchange each other's internal communication addresses.

Figure 6
Figure 6 Experimental results of logistic for accuracy

Figure 9
Figure 9 Experimental results of Linear Regression for accuracy

Figure 10
Figure 10 Experimental results of MLP for accuracy

Figure 11
Figure 11 Acceleration of the algorithm

Table 1
Symbol description

Since each participant does not send its private inputs to other participants, each participant needs to collect gradients from the other participants and composite the global gradient. All participants should perform the majority of the work in the training phase, and the computation and communication costs of training should be minimal.

Table 2
Computational complexity range

Table 3
a Dimension of the dataset

Table 4
Accuracy for each model in all protocols

Figure 7
Figure 7 Experimental results of RNN for accuracy

Table 5
Run time of all protocols (s, 100 epochs)

Q(r) is a polynomial function of degree t − 1 with t coefficients, which can be regarded as a polynomial share of a new secret $s_w^\delta$, so we can get that $\sum_{i=0}^{t-1}(\gamma a_i + \eta b_i)\delta^i \bmod P = \gamma s_x^\delta + \eta s_y^\delta$ (13)