OSM-DSSE: A Searchable Encryption Scheme with Hidden Search Patterns and Access Patterns

Dynamic searchable encryption allows a client to perform searches and updates over encrypted data stored in the cloud. However, existing research shows that general dynamic searchable symmetric encryption (DSSE) schemes are vulnerable to statistical attacks because they leak search patterns and access patterns, which undermines users' privacy. Although the traditional Oblivious Random Access Machine (ORAM) can hide the access pattern, it incurs significant communication overhead and cannot hide the search pattern. These limitations make ORAM difficult to deploy in real cloud environments. To overcome them, this paper proposes a DSSE scheme called obliviously shuffled incidence matrix DSSE (OSM-DSSE) that accesses the encrypted data obliviously. OSM-DSSE realizes efficient search and update operations based on an incidence matrix. In particular, a shuffling algorithm using Paillier encryption is combined with the 1-out-of-n oblivious transfer (OT) protocol and local differential privacy to obfuscate the search targets. In addition, a formal security analysis and a performance analysis of the proposed scheme are provided, which indicate that OSM-DSSE achieves high security, efficient searches, and low storage overhead. The scheme not only completely hides the search and access patterns but also provides adaptive security against malicious attacks by adversaries. Furthermore, experimental results show that OSM-DSSE achieves 3-4x better execution efficiency than state-of-the-art solutions.


Introduction
The rise of cloud services provides vast benefits to society and the IT industry. Storage-as-a-Service, one of the most common cloud services, allows a client to store data remotely and access it from anywhere, reducing the cost of data management and maintenance. Despite these merits, Storage-as-a-Service also raises significant security and privacy issues. Once data is outsourced, the client loses control over it, and sensitive information may be tampered with or stolen by a malicious user. Although the client can encrypt data with standard encryption schemes (e.g., AES) to ensure confidentiality, basic operations (e.g., search and update) can then no longer be performed on the encrypted data, and substantial computational overhead is incurred, which greatly reduces the benefits of the cloud service.
To solve the above problems, Song et al. [1] first proposed the concept of searchable symmetric encryption (SSE) in 2000. As a new encryption primitive, searchable encryption enables the user to search for a keyword over the ciphertext. However, the scheme was limited to searching static encrypted data and could not resist even simple adversarial attacks. In 2003, Goh et al. [2] formally defined the secure index and developed a security model called "semantic security" against adaptive chosen keyword attacks. However, the accuracy of the query result was limited due to the use of the Bloom filter. In 2006, Curtmola et al. [3] proposed two new security models called "adaptive security" and "non-adaptive security", introducing a single-keyword-search SSE with a formal security definition. Due to the limitations of the earlier SSE schemes and the dilemma between ensuring user privacy and using data efficiently in the cloud, Kamara et al. [4] introduced the dynamic searchable symmetric encryption (DSSE) method, which enables the user to perform search and update operations on encrypted data.
General searchable encryption algorithms improve search efficiency at the cost of leaking some information about files or queries to the server, such as the search pattern and the access pattern. A searchable encryption scheme is generally considered secure if it reveals nothing about the user data and queries beyond the information disclosed by the leakage profile. However, in the real world, an adversary can exploit these leakages to launch statistical attacks that recover the user data and query information. For instance, Islam et al. [5] and Cash et al. [6] first exploited access pattern leakage and prior knowledge about the dataset to recover the user's query information. Liu et al. [7] exploited the search pattern to launch attacks and obtained users' query information. Zhang et al. [8] completely exposed the client's queries and recovered user data and query information through the file injection attack. Simon et al. [9] leveraged both access and search pattern leakages to recover the keywords of queries. Therefore, an important direction for future research is to suppress information disclosure rather than accept it as a default.
Some solutions have been proposed for the leakages and attacks described above, but most of the research focuses on forward-secure and backward-secure methods [10][11][12]. The Oblivious Random Access Machine (ORAM) can address the problem of access pattern leakage [13][14][15], but it is impractical for widespread adoption. Garg et al. [16] exploited ORAM and garbled RAM (Random Access Memory) to hide the search pattern. Kamara et al. [17] proposed a general scheme for suppressing search pattern leakage. The combination of structured encryption (SE) with ORAM made the scheme more efficient than ORAM alone, but the scheme was static. Besides, recent work introduces differential privacy mechanisms [18] and hashing techniques [19] to obfuscate access patterns. In principle, solutions that do not leak any information to the server can be built on powerful techniques such as secure two-party computation and fully homomorphic encryption (FHE), but they are often impractical.
To achieve more secure searchable encryption, both the search pattern and the access pattern need to be hidden. The above-proposed solutions solve either the search pattern leakage or the access pattern leakage but not both. Although the method proposed by Hoang et al. [20,21] exploited distributed data structures to hide the search pattern or the access pattern, it is usually necessary to consider whether there is collusion between servers. Akavia et al. [22] proposed a secure search on an encrypted data structure using FHE. This scheme can seal the leakages of the search pattern and the access pattern, but it is difficult to deploy in real environments. In addition, these solutions only address the information leakage on the index (explicit) without considering the information leakage in accessing the file (implicit).
In this paper, a new dynamic searchable encryption scheme, called obliviously shuffled incidence matrix DSSE (OSM-DSSE), is proposed to access encrypted data obliviously under a single server. OSM-DSSE can hide both explicit and implicit search patterns in addition to the access pattern, contributing to a higher level of security. The contributions are as follows:
• This paper proposes a shuffling algorithm with Paillier encryption [23,24] to address the problem of access pattern leakage. It shuffles the data in the incidence matrix to change the access path.
• This paper performs the search based on the group query and an efficient 1-out-of-n OT protocol, ensuring the privacy of both the server and the client. At the same time, random tokens are generated and combined with the shuffling algorithm to hide the explicit search pattern.
• Since searching the same keyword always returns the same file set, the response length is disclosed (implicit search pattern). This paper utilizes differential privacy based on the random response to hide the response length, so that the search pattern can be completely hidden.
[25,27,26]. Chase and Kamara proposed structured encryption to support queries on arbitrary data structures. Kamara et al. [4] proposed the concept of dynamic searchable encryption, making searchable encryption no longer limited to static operations. Although subsequent research efforts focused on effectiveness [17,28,29], dynamics [10,30,31], localization [32,33], security [34][35][36], and complex functions [37][38][39], these schemes still leak some important information. Attackers can use these leakages to recover data and cause more serious information leakage. At present, some solutions have been proposed to deal with these leakages and attacks, but this research has primarily focused on forward-secure and backward-secure properties.
Forward security refers to the ability to break the linkability between newly added data and query keywords; backward security means that the server is no longer able to match and retrieve deleted data. Stefanov et al. [10] proposed the first solution to support the forward-secure property, but its search exhibits linear time complexity. Bost et al. [11] proposed a scheme relying on primitives such as constrained pseudorandom functions and puncturable encryption to achieve fine-grained control of the adversary's power, preventing the adversary from evaluating functions on selected inputs or decrypting specific ciphertexts, for forward and backward security. Sun et al. [12] proposed the first practical, non-interactive backward-secure SSE scheme using symmetric puncturable encryption. However, the forward-secure and backward-secure methods mainly target the information leakage in the update phase, without considering the leakage of the access pattern and the search pattern. As a result, the problem of information leakage was not completely solved.
The oblivious random access machine (ORAM) can hide the access pattern by obfuscating each access so that it is indistinguishable from a random access. The access pattern refers to the sequence of operations and memory addresses. ORAM was first proposed by Goldreich et al. [13] to ensure that no data block in memory permanently resides at one physical address and that any two accesses are unrelated. Goldreich et al. also proposed an ORAM model, giving a square-root solution and a hierarchical solution. Zhang et al. [14] proposed an ORAM-based method for access pattern protection in the cloud storage environment. Garg et al. [16] proposed the TWORAM scheme, which reduces the client storage overhead while hiding the file access pattern with ORAM. However, research has shown that using ORAM to eliminate information leakage leads to high overhead and low execution efficiency [40][41][42].
The rest of this paper is organized as follows. The preliminaries are presented in Section 2. The overview of the proposed scheme is described in Section 3. The detailed description of the algorithms is provided in Section 4. The security analysis is provided in Section 5. The experiment and analysis are provided in Section 6. The conclusion is in Section 7.

Preliminaries
In this section, some preliminaries used in the following sections are introduced. In this paper, $F = (f_1, f_2, \ldots, f_n)$ denotes a collection of $n$ files, and $W = (w_1, w_2, \ldots, w_m)$ denotes the set of all possible keywords. Each file $f_i$ is associated with a unique keyword list $W_i \subseteq W$.

Symmetric encryption
The symmetric key encryption scheme is represented as a tuple $\varepsilon = (\mathsf{Gen}, \mathsf{Enc}, \mathsf{Dec})$ that is an IND-CPA secure encryption scheme. It is used to encrypt and decrypt the documents and consists of three polynomial-time algorithms:
• $K_F \leftarrow \varepsilon.\mathsf{Gen}(1^{\kappa})$: The key generation algorithm is a probabilistic algorithm. It accepts a security parameter $\kappa$ as input and outputs the key $K_F$ for the file collection.
• $c \leftarrow \varepsilon.\mathsf{Enc}_{K_F}(m)$: The encryption algorithm is a probabilistic algorithm. It accepts a key $K_F$ and a message $m$ as inputs and outputs the ciphertext $c$.
• $m \leftarrow \varepsilon.\mathsf{Dec}_{K_F}(c)$: The decryption algorithm is a deterministic algorithm. It accepts a key $K_F$ and a ciphertext $c$ as inputs and outputs the message $m$.

Dynamic Symmetric Searchable encryption
A dynamic SSE scheme $\mathsf{DSSE} = (\mathsf{KeyGen}, \mathsf{BuildIndex}, \mathsf{SrchToken}, \mathsf{UpdToken}, \mathsf{Search}, \mathsf{Update}, \varepsilon)$ is a tuple that consists of six polynomial-time algorithms and a symmetric key encryption scheme:
• $K \leftarrow \mathsf{KeyGen}(1^{\kappa})$ is a probabilistic key generation algorithm run by the client for initialization. It takes the security parameter $\kappa$ as input and outputs the key $K = (K_I, K_F)$, where $K_I$ and $K_F \leftarrow \varepsilon.\mathsf{Gen}(1^{\kappa})$ are for the secure index and the file collection, respectively.
• $I \leftarrow \mathsf{BuildIndex}(K_I, (F, W))$ is a probabilistic algorithm for the client to build a secure index. It takes the key $K_I$ and a file collection $F$ along with the keyword lists $W_i$ as inputs, and outputs a secure index $I$.
• $c \leftarrow \varepsilon.\mathsf{Enc}(K_F, f)$ is a probabilistic algorithm for the client to encrypt the file collection. It takes the key $K_F$ and a file $f$ as inputs, and outputs the ciphertext $c$. The ciphertext collection $C = (c_1, c_2, \ldots, c_n)$ is obtained by encrypting the files one by one.
• $\tau_s \leftarrow \mathsf{SrchToken}(K_I, w)$ is a possibly probabilistic algorithm for the client to generate a search token. It takes the key $K_I$ and a keyword $w$ as inputs, and outputs a search token $\tau_s$.
• $R \leftarrow \mathsf{Search}(I, \tau_s)$ is a deterministic algorithm for the server. It takes the secure index $I$ and the search token $\tau_s$ as inputs, and outputs the search result $R$.
• $(\tau_u, c_f) \leftarrow \mathsf{UpdToken}(K, f)$ is a possibly probabilistic algorithm for the client to generate an update token. It takes the key $K$ and a file $f$ as inputs, and outputs an update token $\tau_u$ together with the ciphertext $c_f$.
• $(I', C') \leftarrow \mathsf{Update}(I, c, \tau_u)$ is a deterministic algorithm for the server. It takes the secure index $I$, a ciphertext $c$, and an update token $\tau_u$ as inputs, and outputs the new secure index $I'$ and the new ciphertext collection $C'$.
• $f \leftarrow \varepsilon.\mathsf{Dec}(K_F, c)$ is a deterministic algorithm for the client to decrypt a ciphertext. It takes the key $K_F$ and a ciphertext $c$ as inputs, and outputs a file $f$.

Paillier encryption
Paillier encryption [23], a public-key cryptosystem based on composite degree residuosity classes, is an additively homomorphic encryption (AHE) scheme that provides semantic security. It is a tuple that consists of three algorithms $PE = (\mathsf{Gen}, \mathsf{Enc}, \mathsf{Dec})$:
• $(pk, sk) \leftarrow \mathsf{Gen}(p, q)$: Key generation algorithm. It takes two large prime numbers $p, q$ as input. Let $n = pq$ and $\lambda = \mathrm{lcm}(p - 1, q - 1)$, then select $g \in \mathbb{Z}^{*}_{n^2}$ with uniform probability and compute $\mu = \left(L(g^{\lambda} \bmod n^2)\right)^{-1} \bmod n$, where $L$ is defined as $L(x) = \frac{x - 1}{n}$. The public key is $pk = (n, g)$, and the private key is $sk = (\lambda, \mu)$.
• $c \leftarrow \mathsf{Enc}_{pk}(m)$: Encryption algorithm. For a message $m \in \mathbb{Z}_n$, an $r \in \mathbb{Z}^{*}_n$ is selected with uniform probability, and the output is the ciphertext $c = r^{n} g^{m} \bmod n^2$.
• $m \leftarrow \mathsf{Dec}_{sk}(c)$: Decryption algorithm. For a ciphertext $c$, it outputs the message $m = L(c^{\lambda} \bmod n^2) \cdot \mu \bmod n$.
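To make the additive homomorphism used later by the shuffling algorithm concrete, the following is a minimal sketch of textbook Paillier in Python. The function names, the choice $g = n + 1$, and the toy primes are assumptions made for illustration only; they are not taken from the scheme, and the parameters are far too small for real use.

```python
import math
import secrets


def L(x, n):
    """L(x) = (x - 1) / n, computed with exact integer division."""
    return (x - 1) // n


def paillier_gen(p, q):
    """Return (pk, sk) = ((n, g), (lam, mu)) for two primes p and q."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p - 1, q - 1)
    g = n + 1  # assumption: a common simple choice of g in Z*_{n^2}
    mu = pow(L(pow(g, lam, n * n), n), -1, n)  # modular inverse (Python 3.8+)
    return (n, g), (lam, mu)


def paillier_enc(pk, m):
    """c = r^n * g^m mod n^2 for a fresh random r coprime to n."""
    n, g = pk
    while True:
        r = secrets.randbelow(n - 1) + 1
        if math.gcd(r, n) == 1:
            break
    return (pow(r, n, n * n) * pow(g, m, n * n)) % (n * n)


def paillier_dec(pk, sk, c):
    """m = L(c^lam mod n^2) * mu mod n."""
    n, _ = pk
    lam, mu = sk
    return (L(pow(c, lam, n * n), n) * mu) % n


if __name__ == "__main__":
    pk, sk = paillier_gen(1009, 1013)  # toy primes, illustration only
    n2 = pk[0] ** 2
    c1, c2 = paillier_enc(pk, 42), paillier_enc(pk, 58)
    # Multiplying ciphertexts adds plaintexts; exponentiation scales them.
    print(paillier_dec(pk, sk, (c1 * c2) % n2))
    print(paillier_dec(pk, sk, pow(c1, 3, n2)))
```

The two homomorphic operations shown in the usage section (ciphertext multiplication and exponentiation by a plaintext) are the ones a server needs in order to multiply an encrypted matrix by a matrix it holds.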

Oblivious transfer protocol
The OT protocol [43] transfers information obliviously so as to protect the privacy of both parties, i.e., the sender and the receiver. In this paper, the 1-out-of-n OT protocol is applied to the search result. The server and the client are the sender (S) and the receiver (R), respectively.
Oblivious transfer phase:
• The client randomly selects $r$ ($r < p$), calculates $y = g^{r} h^{\sigma} \bmod p$, where $\sigma$ is the client's choice, and sends $y$ to the server.
• The server calculates $(\alpha_1, c_1), \ldots, (\alpha_n, c_n)$ and sends them to the client, where $\alpha_j = g^{k_j} \bmod p$, $c_j = \varepsilon(m_j) \left(y / h^{j}\right)^{k_j} \bmod p$, $k_j \in_R \mathbb{Z}_p^{*}$, $j = 1, \ldots, n$.
• The client decrypts $\varepsilon(m_\sigma) = c_\sigma / \alpha_\sigma^{r} \bmod p$ and obtains the symmetrically encrypted ciphertext.
The OT protocol has the following three basic properties:
Correctness: If S and R follow the protocol step by step, R obtains only the selected message.
Sender's privacy: After S and R execute this protocol, R does not learn any message except its choice. That is, the other ciphertexts are computationally indistinguishable for R.
Receiver's privacy: After S and R execute this protocol, S does not know which message R has obtained. That is, the choice of R is computationally indistinguishable for S.
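The three protocol messages can be traced with a small Python sketch. It assumes a toy safe prime $p = 2039$ with subgroup order 1019 and generators $g = 4$, $h = 9$; these parameters, the function names, and the use of small integers in place of the symmetric ciphertexts $\varepsilon(m_j)$ are all illustrative assumptions. The recovery step divides by $\alpha_\sigma^{r}$, which is what makes the blinding factor cancel.

```python
import secrets

# Toy public parameters (assumptions for illustration, not secure sizes).
P = 2039   # safe prime, P = 2 * Q + 1
Q = 1019   # prime order of the subgroup of quadratic residues mod P
G = 4      # generator of that subgroup
H = 9      # second generator; log_G(H) must be unknown to the receiver


def receiver_choose(sigma):
    """Receiver: hide the choice sigma in y = g^r * h^sigma mod p."""
    r = secrets.randbelow(Q - 1) + 1
    y = (pow(G, r, P) * pow(H, sigma, P)) % P
    return r, y


def sender_respond(y, messages):
    """Sender: for j = 1..n return (alpha_j, c_j) with
    alpha_j = g^k_j and c_j = m_j * (y / h^j)^k_j mod p."""
    pairs = []
    for j, m in enumerate(messages, start=1):
        k = secrets.randbelow(Q - 1) + 1
        alpha = pow(G, k, P)
        base = (y * pow(H, -j, P)) % P  # y / h^j (negative exponent: Python 3.8+)
        pairs.append((alpha, (m * pow(base, k, P)) % P))
    return pairs


def receiver_recover(r, sigma, pairs):
    """Receiver: m_sigma = c_sigma / alpha_sigma^r mod p."""
    alpha, c = pairs[sigma - 1]
    return (c * pow(alpha, -r, P)) % P


if __name__ == "__main__":
    block = [11, 22, 33, 44]  # stand-ins for the encrypted rows of a block
    r, y = receiver_choose(3)
    print(receiver_recover(r, 3, sender_respond(y, block)))
```

For $j = \sigma$ the factor $(y / h^{\sigma})^{k_\sigma}$ equals $g^{r k_\sigma} = \alpha_\sigma^{r}$, so only the chosen entry unmasks cleanly; for every other index an $h$-power remains in the mask.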

Random response
Random response [44] is a perturbation mechanism for differential privacy protection. It protects the privacy of the original data through the uncertainty of the response to a sensitive question, and it typically adds noise by tossing a coin. The question is formalized with two possible answers: 1. I have the sensitive attribute A ("yes"); 2. I do not have the sensitive attribute A ("no"). The respondent tosses a coin. If it lands heads, the respondent answers truthfully. Otherwise, the coin is tossed again: if it lands heads, the answer is "yes"; otherwise, the answer is "no". Because every "yes" and "no" is plausibly deniable, the random response protects privacy. The algorithm is formalized as $b' \leftarrow \mathsf{RandomRes}(b, p, q)$, where $b$ represents the real answer and $b'$ is the output of the random response.
Suppose the value $n$ needs to be perturbed randomly, and each respondent answers the question once. The number of people who answered "yes" is $n_1$, and the number of people who answered "no" is $n - n_1$. According to the statistics, the proportion of users who answered "yes" and "no" is as follows.
where $X$ represents the event, $p$ represents the probability of heads on the first coin, and $q$ represents the probability of heads on the second coin.
Fig. 1 The OSM-DSSE system model
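A minimal Python sketch of the two-coin mechanism follows. It assumes the common variant in which heads on the first coin (probability $p$) yields the truthful answer and the second coin (heads with probability $q$) yields a forced "yes" or "no"; the function names and the estimator helper are illustrative, not part of the scheme.

```python
import random


def random_response(b, p, q, rng=random):
    """Return the perturbed bit b' for the true bit b.

    With probability p the true bit is reported; otherwise a second
    coin decides: 1 ("yes") with probability q, else 0 ("no").
    """
    if rng.random() < p:
        return b
    return 1 if rng.random() < q else 0


def perturb_vector(bits, p, q, rng=random):
    """Apply the random response independently to every bit."""
    return [random_response(b, p, q, rng) for b in bits]


def estimate_true_yes_rate(observed_yes_rate, p, q):
    """Invert E[observed] = p * true_rate + (1 - p) * q."""
    return (observed_yes_rate - (1 - p) * q) / p


if __name__ == "__main__":
    result_vector = [1, 0, 0, 1, 0, 0, 0, 1]
    print(perturb_vector(result_vector, p=0.75, q=0.5))
```

Applied to a 0/1 result vector, the mechanism flips some entries so that the number of ones, and hence the response length, varies between searches for the same keyword.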

System model
Our system utilizes the client-server model (refer to Fig. 1). The client extracts the keywords of the files, constructs an incidence matrix between the keywords and the files, encrypts the incidence matrix and the files, and sends them to the server. The client issues search and update requests to the server. The server stores the encrypted incidence matrix and responds to the client's search and update requests. Note that we consider a semi-honest (honest-but-curious) server. During the access, even though data files are encrypted, the cloud server may try to derive other sensitive information from users' search requests. Thus, although the server faithfully follows the protocol, it may still try to learn information.

System goal
Our goal is to effectively perform privacy-preserving keyword search and file update on an encrypted cloud database. The main objectives of this system are as follows:
• Hide the access pattern. We utilize Paillier encryption to shuffle the incidence matrix. This algorithm randomizes the position of keywords in the incidence matrix, obfuscates access paths, and hides access patterns.
• Hide the search pattern.
(1) Based on the group query, the server utilizes the two-level map to obtain the target data block containing multiple pieces of data, and it executes an efficient 1-out-of-n OT protocol with the client so that the client obtains the target. In this process, the 1-out-of-n OT protocol makes the server unable to distinguish which keyword the client is searching for, and the client learns none of the server's other messages except the one for the searched keyword. This protocol thus protects the privacy of the client and the server simultaneously. Besides, the shuffling performed after each search changes the row position of the keyword in the incidence matrix and thereby converts the deterministic token into a random token (explicit search pattern). The adversary therefore cannot launch an attack by analyzing the search frequency.
(2) If the client searches for the same keyword, the server always returns a file set of the same size, so the adversary can launch an attack by analyzing the response length. Therefore, this paper utilizes a differential privacy strategy based on random response to hide the response length (implicit search pattern).

System overview
The design goal of this system is to hide the search pattern and the access pattern of the searchable encryption scheme. Search and update are two core operations in the system.

Hiding the access pattern
The access pattern refers to the user's access path, where a repeated query is easily identified by the same access path. When the client sends a search request to the server, repeated searches also lead to the disclosure of the access pattern. The attacker can track the access path to obtain the query keyword and the information of the incidence matrix.
It is challenging to hide the access pattern while significantly reducing the computation and communication cost of the searchable encryption scheme. Existing schemes [20,42] usually use the "fetch-decrypt-reencrypt-upload" strategy to hide the access pattern, but it causes high communication and computation overhead. The proposed scheme only uploads the confusion matrix to the server, and the server performs a homomorphic calculation between the confusion matrix and the incidence matrix. The shuffling process is divided into two stages, shuffling and homomorphic decryption, as shown in Fig. 2. In the figure, the yellow lock indicates homomorphic encryption, and the blue lock indicates symmetric encryption.
The specific procedure of shuffling is as follows. On the left in Fig. 2, the client calculates the confusion matrix based on the permutation matrix and the diagonal matrix, and the confusion matrix is encrypted with the Paillier pk. Here, some formulas are given to facilitate the calculation of the confusion matrix.
(1) Matrix-based data shuffling: Given a data sequence $B = (B_1, \ldots, B_n)$ and an $n \times n$ permutation matrix $\pi$, the position of the data blocks is changed by $B \cdot \pi$.
For example, the blocks in $B = (B_1 \; B_2)$ can be exchanged with the permutation matrix $\pi = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}$:
$$B \cdot \pi = (B_1 \; B_2) \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix} = (B_2 \; B_1) \qquad (2)$$
(2) Matrix-based data scaling: Given a data sequence $B = (B_1, \ldots, B_n)$ and an $n \times n$ diagonal matrix $C = \mathrm{diag}(c_1, \ldots, c_n)$, it can be obtained:
$$B \cdot C = (c_1 B_1, \ldots, c_n B_n) \qquad (3)$$
(3) Based on the formulas (2) and (3), the confusion matrix can be obtained:
$$M = \pi \cdot C$$
(4) The client encrypts the confusion matrix with the Paillier public key $pk$:
$$\tilde{M} = PE.\mathsf{Enc}_{pk}(M)$$
On the right in Fig. 2, the server performs the homomorphic calculation between the encrypted confusion matrix and the incidence matrix to obtain the shuffled incidence matrix.
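The effect of the permutation, scaling, and confusion matrices on a data sequence can be checked with plain integer matrices. The helper names below are illustrative assumptions, and the sketch works on unencrypted values; it only shows that multiplying by the product of a permutation matrix and a diagonal matrix moves and scales the blocks in one step.

```python
def vec_mat(row, matrix):
    """Return row * matrix for a row vector and a square matrix."""
    n = len(matrix)
    return [sum(row[i] * matrix[i][j] for i in range(n)) for j in range(n)]


def mat_mat(a, b):
    """Return the matrix product a * b of two square matrices."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def permutation_matrix(perm):
    """perm[i] = j moves block i to position j under B * pi."""
    n = len(perm)
    return [[1 if perm[i] == j else 0 for j in range(n)] for i in range(n)]


def diagonal_matrix(scales):
    """Diagonal matrix with scales[i] at position (i, i)."""
    n = len(scales)
    return [[scales[i] if i == j else 0 for j in range(n)] for i in range(n)]


if __name__ == "__main__":
    B = [5, 7]
    pi = permutation_matrix([1, 0])   # [[0, 1], [1, 0]]
    C = diagonal_matrix([2, 3])
    M = mat_mat(pi, C)                # confusion matrix
    print(vec_mat(B, pi))             # blocks swapped
    print(vec_mat(B, M))              # swapped and scaled in one product
```

Because matrix multiplication is associative, applying $M = \pi \cdot C$ once gives the same result as permuting first and scaling afterwards, which is why a single uploaded matrix suffices.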
When the shuffling phase is over, the row positions in the incidence matrix have changed, and the incidence matrix on the server is additionally encrypted under Paillier. Homomorphic decryption is performed subsequently. By the properties of Paillier encryption, the homomorphic calculation takes one operand that is homomorphically encrypted and one that is not. Therefore, to facilitate the next shuffle operation, the server needs to decrypt the incidence matrix with the Paillier $sk$ before the next search is performed. It should be noted that the client generates a different Paillier public/private key pair $(pk, sk)$ for each shuffle to resist malicious attacks by the server.
The server performs homomorphic decryption with sk to get the symmetric encrypted incidence matrix.

Hiding the search pattern
Simon et al. [9] pointed out that the search pattern can be divided into an explicit and an implicit part. The explicit search pattern means that searching for the same keyword always generates the same deterministic token, while the implicit search pattern means that searching for the same keyword always returns a result set of the same size. Attackers can perform query recovery attacks using the query volume and frequency leakages. Therefore, both the explicit and the implicit search patterns are hidden in this paper. The process of hiding the search pattern is shown in Fig. 3. The group query is performed in this paper. It should be noted that if semantically similar keywords are grouped together, the adversary can use the similar semantics to infer the relationship between the keywords, causing a partial privacy leakage. Therefore, the secure index is constructed so that the locations of keywords and files are determined by a pseudo-random function, which ensures the randomness of the storage location. In this case, the attacker cannot infer the relationship between the keywords in the subsequent group queries.
For search, the two-level map and the efficient 1-out-of-n OT protocol are utilized to hide the search pattern.
However, the adversary can still guess the search target by the response length. Therefore, the random response is used to further hide the response pattern (implicit). Also, after the search, the server performs the shuffling operation (shown in Fig. 2) to change the row position of the incidence matrix. Since the search token is related to the position of the keyword, the shuffling also converts the deterministic search token into a random search token.
The client utilizes the pseudo-random function and hash to randomly locate the keywords before performing the group query. The search process is as follows. The client obtains the row number of the search keyword according to the dictionary D and calculates the block number l to which the keyword belongs. Then, l is combined with the client's selection σ to generate a search token, which is sent to the server. Once the server receives the search token, it looks up the two-level map Ω to obtain a row number group of size ν according to the search token, and then reads the incidence matrix according to the row number group to obtain a data block of size ν. After that, the server and the client execute an efficient 1-out-of-n OT protocol. The server returns the search results to the client for decryption, and the client obtains the set of file identifiers containing the searched keyword. Then, to hide the response length (implicit), the random response is utilized to randomly perturb the set of file identifiers, and the server returns a file set containing dummy items.
After the search, the incidence matrix needs to be shuffled with the shuffling algorithm to generate the random search token. The two-level map is constructed through the following three steps:
a) The dictionary D is partitioned into blocks $I_1, \ldots, I_t$ of size ν, and $I_t$ is padded up to ν elements if necessary.
b) The block number is taken as the key of $M_w$, and the value corresponding to the key is the starting index of each data block in the array A.
c) The row numbers of the incidence matrix are stored in the array A.
5. Address map table $M_f(s_{id}, j)$. For the update, the server determines the column j where the update file is located according to the update token.
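Steps a) to c) for the two-level map can be sketched in a few lines of Python. The sketch assumes that the dictionary D is an ordinary mapping from keyword to row number, that padding uses a sentinel row number of -1, and that the keys of $M_w$ are left unencrypted; all function names are hypothetical.

```python
def build_two_level_map(dictionary, nu, pad_row=-1):
    """Build (M_w, A) from a keyword -> row-number dictionary.

    M_w maps a 1-based block number to the start index of that block
    in A; A stores the row numbers block by block. The last block is
    padded with pad_row (an assumed sentinel) up to nu entries.
    """
    keywords = list(dictionary)  # dicts preserve insertion order
    M_w, A = {}, []
    for block_no, start in enumerate(range(0, len(keywords), nu), start=1):
        block = [dictionary[w] for w in keywords[start:start + nu]]
        block += [pad_row] * (nu - len(block))
        M_w[block_no] = len(A)
        A.extend(block)
    return M_w, A


def rows_for_block(M_w, A, block_no, nu):
    """Server side: return the nu row numbers of the requested block."""
    start = M_w[block_no]
    return A[start:start + nu]


def locate(dictionary, keyword, nu):
    """Client side: return (block number l, choice sigma), both 1-based."""
    position = list(dictionary).index(keyword)
    return position // nu + 1, position % nu + 1


if __name__ == "__main__":
    D = {"a": 3, "b": 0, "c": 4, "d": 1, "e": 2}
    M_w, A = build_two_level_map(D, nu=2)
    l, sigma = locate(D, "d", nu=2)
    print(l, sigma, rows_for_block(M_w, A, l, 2))
```

The server only ever sees the block number, so each lookup narrows the target to ν candidate rows; the choice σ inside the block is what the OT protocol keeps hidden.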

Construction of OSM-DSSE
In this section, the OSM-DSSE scheme is defined. It consists of the four algorithms (Setup, Search, PathShuffling, Update) presented in the subsections below.
Firstly, the client generates public parameters by DSSE.KeyGen. These parameters include the symmetric key K F to encrypt files and the key K I to encrypt index.
Secondly, the client constructs a secure random incidence matrix by DSSE.BuildIndex. As shown in Algorithm 1, the client generates a random key $k_2$. The client extracts the keyword set $W = (w_1, \ldots, w_m)$ from the file set $F = \{f_1, \ldots, f_n\}$ (each file has a unique identifier in $(id_1, \ldots, id_n)$). The positions of each keyword and file in the incidence matrix are determined by the pseudorandom function $G$ and the hash tables $T_f$ and $T_w$.
Then, the client divides the keyword set W into data blocks of size ν. The last data block is padded up to ν elements if necessary, and the blocks are numbered $(1, 2, \ldots, \lceil m / \nu \rceil)$. The client constructs a two-level map $\Omega(M_w, A)$ and encrypts the keys of $M_w$. Simultaneously, the client constructs the dictionary D according to $T_w(s_{w_i}, x_i)$ and $M_f$ according to $T_f(s_{id_j}, y_j, st)$.
Lastly, the client encrypts the files by ε.Enc and sends the secure random incidence matrix $I'$, the encrypted files C, the two-level map Ω, and the address map table $M_f$ to the server. Meanwhile, the client saves δ locally.

Algorithm 1: BuildIndex
Input: files and keywords collection $(F, W)$
Output: secure random incidence matrix $I'$

Search
• $R \leftarrow \mathsf{Srch}(\tau)$: Input the search token and get the response results.
The client generates the search token $\tau \leftarrow (l \| k_3, y)$ by DSSE.SrchToken, where $(l \| k_3)$ is the encrypted block number and $y$ is the client's selection that contains the search target. The client obtains the row number $x_i$ by $x_i \leftarrow D[w_i]$ and calculates $y = g^{r} h^{\sigma} \bmod p$, where $r$ ($r < p$) is a random value.
The server performs group queries according to the search token as follows. Firstly, the server parses $\tau \leftarrow (l \| k_3, y)$ and queries $M_w(l \| k_3, i)$ of $\Omega(M_w, A)$ to obtain the starting position of the keyword block in the array A, and then sequentially reads A to obtain $row = (r_i, \ldots, r_{i+\nu-1})$. Secondly, the server reads the encrypted incidence matrix according to $row$ and obtains a data block $B = (b_1, \ldots, b_\nu)$ of size ν, which is symmetrically encrypted. Then, for this result set, the server and the client execute the efficient 1-out-of-n OT protocol, where n is set to ν. The server calculates $((\alpha_1, c_1), \ldots, (\alpha_\nu, c_\nu))$, where $\alpha_j = g^{k_j} \bmod p$, $k_j \in_R \mathbb{Z}_p^{*}$, $j = 1, \ldots, \nu$, $\alpha_j$ is the auxiliary parameter, and $c_j$ is a ciphertext sent by the server to the client.
Finally, the client decrypts the ciphertext. The client performs the first decryption $\varepsilon(m_\sigma) = c_\sigma / \alpha_\sigma^{r} \bmod p$ to obtain the symmetrically encrypted result. Then, the client performs the second decryption with the initial row number $r_i$ and the key $K_I$ to obtain the identifiers of the files containing the queried keyword, i.e., $\Upsilon_w \leftarrow \varepsilon(m_\sigma) \oplus H(K_I \| r_i)$, where $\Upsilon_w \in \{0, 1\}^{n}$ is the result vector. At this point, the random response is used to randomly fill the result vector and perturb the data (refer to Fig. 4), as shown in Algorithm 2, so that the size of the returned result file set differs each time. The server returns the file sets containing dummy items to the client based on the perturbed result vector. After getting the encrypted data, the client decrypts the data using the key $K_F$ to obtain the data that satisfies the query.
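The second decryption, $\Upsilon_w \leftarrow \varepsilon(m_\sigma) \oplus H(K_I \| r_i)$, is an XOR with a row-specific mask. The Python sketch below assumes that H is instantiated with SHA-256 in a simple counter mode and that a row is a list of bits; both are illustrative choices, since the text does not fix the hash function or the encoding.

```python
import hashlib


def row_mask(key, row_no, n_bits):
    """Derive n_bits mask bits from H(key || row_no || counter).

    Assumption: H is SHA-256, expanded with a 4-byte counter so that
    the mask can be as long as the row.
    """
    bits, counter = [], 0
    while len(bits) < n_bits:
        data = key + row_no.to_bytes(8, "big") + counter.to_bytes(4, "big")
        for byte in hashlib.sha256(data).digest():
            bits.extend((byte >> shift) & 1 for shift in range(8))
        counter += 1
    return bits[:n_bits]


def xor_row(bits, key, row_no):
    """Mask or unmask one incidence-matrix row (XOR is its own inverse)."""
    mask = row_mask(key, row_no, len(bits))
    return [b ^ m for b, m in zip(bits, mask)]


if __name__ == "__main__":
    K_I = b"index-key"                  # hypothetical key material
    plain_row = [1, 0, 0, 1, 1, 0]      # bit j = 1: file j contains the keyword
    masked = xor_row(plain_row, K_I, row_no=7)
    print(xor_row(masked, K_I, row_no=7) == plain_row)
```

Because the mask depends on the row number, the client must know the row under which the entry was originally masked, which is why the dictionary D has to be kept consistent with the matrix.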

PathShuffling
• $I \leftarrow \mathsf{PathShuffling}(I')$: Input the incidence matrix to be shuffled, and output the new incidence matrix after shuffling.
After the search, the server executes the path shuffling, as shown in Fig. 2. The client constructs and encrypts the confusion matrix with Paillier, then uploads it to the server. The server performs the homomorphic calculation. This process can change the access path to hide the search pattern and access pattern.
First, the client constructs the permutation matrix P and the diagonal matrix Q. The product of P and Q forms the confusion matrix M. The realization of P is as follows. $\pi(i) = i'$ represents a random permutation function, where $i = 1, \ldots, 2n$. This function generates a permutation π by selecting items randomly and uniformly from the set $\{1, \ldots, 2n\}$. Let $p_{i,j}$ denote the value in the i-th row and the j-th column of P; P is the permutation matrix determined by π, with exactly one 1 in each row and each column and 0 elsewhere.
The client generates a public and private key pair $(pk, sk)$ by PE.Gen and encrypts the confusion matrix M with pk, i.e., $\tilde{M} = PE.\mathsf{Enc}_{pk}(M)$. Then, the client sends $\tilde{M}$ to the server. The server performs the homomorphic calculation between $\tilde{M}$ and $I'$ to obtain the shuffled incidence matrix $\tilde{I}$, namely $\tilde{I} \leftarrow \tilde{M} \cdot I'$. Since Paillier encryption satisfies semantic security, the same plaintext can generate different ciphertexts. The result of the homomorphic calculation is homomorphically encrypted, which is not conducive to the next data shuffling. So, the client sends $sk$ to the server for homomorphic decryption before the next search, i.e., $I' \leftarrow PE.\mathsf{Dec}_{sk}(\tilde{I})$, where $I'$ represents the symmetrically encrypted result. After shuffling, the position of the keyword in the incidence matrix has changed. To correctly obtain the row number of the next keyword for search, the client needs to update the dictionary D. Moreover, the client should generate a different public and private key pair for each shuffle to ensure that the server cannot decrypt the confusion matrix.
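The server-side step $\tilde{I} \leftarrow \tilde{M} \cdot I'$ can be illustrated with a self-contained Python sketch. It assumes textbook Paillier with $g = n + 1$ and toy primes, treats the entries of the incidence matrix as small non-negative integers, and uses hypothetical function names; it is meant to show how each output entry is built from ciphertext exponentiations and multiplications, not to reproduce the exact implementation.

```python
import math
import secrets


def paillier_gen(p, q):
    """Toy key generation with g = n + 1 (an assumed simple choice)."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)
    g = n + 1
    mu = pow((pow(g, lam, n * n) - 1) // n, -1, n)
    return (n, g), (lam, mu)


def paillier_enc(pk, m):
    n, g = pk
    while True:
        r = secrets.randbelow(n - 1) + 1
        if math.gcd(r, n) == 1:
            break
    return (pow(r, n, n * n) * pow(g, m, n * n)) % (n * n)


def paillier_dec(pk, sk, c):
    n, _ = pk
    lam, mu = sk
    return ((pow(c, lam, n * n) - 1) // n * mu) % n


def encrypt_matrix(pk, M):
    """Client side: encrypt the confusion matrix entry by entry."""
    return [[paillier_enc(pk, v) for v in row] for row in M]


def homomorphic_matmul(pk, enc_M, I):
    """Server side: Enc(M * I) from Enc(M) and the stored matrix I.

    Entry (i, j) is prod_k enc_M[i][k] ** I[k][j] mod n^2, which is an
    encryption of sum_k M[i][k] * I[k][j].
    """
    n2 = pk[0] ** 2
    result = []
    for enc_row in enc_M:
        out_row = []
        for j in range(len(I[0])):
            acc = 1  # 1 decrypts to 0, the neutral element of the sum
            for k, c in enumerate(enc_row):
                acc = (acc * pow(c, I[k][j], n2)) % n2
            out_row.append(acc)
        result.append(out_row)
    return result


def decrypt_matrix(pk, sk, enc):
    """Homomorphic decryption of every entry with sk."""
    return [[paillier_dec(pk, sk, c) for c in row] for row in enc]


if __name__ == "__main__":
    pk, sk = paillier_gen(1009, 1013)
    M = [[0, 1, 0], [0, 0, 1], [1, 0, 0]]   # permutation, Q = identity here
    I = [[1, 0], [0, 1], [1, 1]]            # stand-in for the stored rows
    shuffled = homomorphic_matmul(pk, encrypt_matrix(pk, M), I)
    print(decrypt_matrix(pk, sk, shuffled))
```

Left-multiplying by a permutation matrix reorders the rows of the stored matrix, and since the server only sees Paillier ciphertexts of M it cannot tell which row went where until it receives $sk$, at which point only the already-shuffled matrix is revealed.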

Update
• (C , I ) ← U pda (τ U ) : Input the update token, and output the updated encrypted file and incidence matrix.
The update operation requires an interaction between the client and the server and covers both add and delete operations. The client generates an update token with DSSE.UpdToken, and the server uses it to perform the update. Note that the proposed solution does not reveal the update type, since both add and delete operations write a column back to the server.
Add operation. The client uses T_f to determine the column j to be added and sets its status value to 1. The client extracts the keywords of the file, constructs a column matrix I according to T_w, and encrypts the file with ε.Enc. Then, the client sends (τ_a, c) to the server. The server uses the token to update the incidence matrix I, the address map table M_f, and the ciphertext C.
Delete operation. The client uses T_f to determine the column j to be deleted and sets its status value to 0. The client constructs an all-zero column matrix I. Then, the client sends τ_d to the server. The server uses the token to update the incidence matrix I, the address map table M_f, and the ciphertext C.
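The two update types can be sketched in plaintext as follows. This is a simplified illustration under stated assumptions: encryption, the token format, and the address map table are omitted, and `keyword_rows` (a stand-in for T_w) and both function names are hypothetical. The point it shows is that an add and a delete both produce a full column of the same length, so the shape of the message does not distinguish them.

```python
def build_update_column(keyword_rows, file_keywords=None):
    """Return the m x 1 column for one file.

    For an add, the rows of the file's keywords are set to 1.
    For a delete (no keywords given), the column is all zeros.
    """
    column = [0] * len(keyword_rows)
    for keyword in file_keywords or []:
        column[keyword_rows[keyword]] = 1
    return column


def apply_update(incidence, j, column):
    """Server side: overwrite column j of the incidence matrix in place."""
    for i, bit in enumerate(column):
        incidence[i][j] = bit
```

In the actual scheme the column would be encrypted before it is sent, so the server would not see whether it contains any 1s.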

Security analysis
To prove the security of the above scheme, some definitions and theorems are given first.
(1) (I_M^Sim, C^Sim) ← SimStp(N, ID, <|c_1|, ..., |c_n|>): The simulator constructs an encrypted incidence matrix and encrypted files from random values according to the information (N, ID, <|c_1|, ..., |c_n|>) leaked by the leakage function L_stp(I, C). Then, the encrypted incidence matrix and files are sent to the adversary A.
The simulator randomly selects a keyword to simulate the search token τ_s^{b,k} according to the block number information l leaked by the leakage function L_srch(I, D, Q). Then, the token is sent to the adversary A for the search.
An algorithm A satisfies ε-local differential privacy (ε ≥ 0) if, for any inputs x_1, x_2, it holds:
$$\Pr[A(x_1) = y] \le e^{\varepsilon} \cdot \Pr[A(x_2) = y]$$
where y ∈ Opt(A) and ε represents the privacy budget, $\varepsilon = \ln\left(1 + \frac{1-q}{pq}\right)$.
Corollary 1 Given the perturbation probabilities p and q of the random response, the proposed scheme satisfies ε-differential privacy, where $\varepsilon = \ln\left(1 + \frac{1-q}{pq}\right)$.
Proof. Let b denote the input and b' the output of the random response.
According to the definition of differential privacy, consider two inputs b_1 and b_2 that differ in only one bit and lead to the same output b'. Assuming b_1 and b_2 differ in location i and location j, the bound required by the definition holds.
Theorem 1 (Adaptive Semantic Security). Suppose that Σ_OSM = (Setup, Search, pathShuffling, Update) is an interactive scheme based on an incidence matrix that hides the search pattern and the access pattern, and λ ∈ N is a security parameter. Given a leakage function L = (L_stp, L_srch, L_upd), for any PPT stateful adversary A that issues a polynomial number q of queries, there is a stateful simulator S = (SimStp, SimSrch, SimUpd) such that:
$$\left|\Pr\left[\mathrm{Real}^{\Sigma_{OSM}}_{A}(\lambda) = 1\right] - \Pr\left[\mathrm{Ideal}^{\Sigma_{OSM}}_{A,S,L}(\lambda) = 1\right]\right| \le \mathrm{negl}(\lambda)$$
That is, Σ_OSM exhibits adaptive semantic security under L.
Proof. For every PPT adversary A, the difference between the output probabilities of Real^{Σ_OSM}_A(λ) and Ideal^{Σ_OSM}_{A,S,L}(λ) given in Theorem 1 is negligible. A series of five games is defined to prove this. The first game is the real experiment, and the last game is the ideal experiment. The success event of each game is also defined: Game_i = 1 stands for the event that the adversary correctly guesses the challenge bit b in Game i, and Pr[Game_i = 1] is the probability that the adversary's attack succeeds. Security follows from the stepwise relationship between consecutive games; the full proof is provided in the Appendix.
Server storage: The server maintains the incidence matrix I, a two-level map Ω, and an address map table M_f. The incidence matrix is an m × n matrix with a storage cost of O(m · n). The two-level map Ω consists of two parts: an address map table M_w and an array A. The storage of M_w is proportional to the number of blocks; assuming that the data are divided into t blocks, the storage cost of M_w is O(t). The size of array A depends on the number of rows of the incidence matrix, so its storage cost is O(m). The storage cost of M_f depends on the number of files and is O(n). Therefore, the total storage cost of the server is O(m · n + m + n + t).

Communication overhead
In the setup phase, the client sends the encrypted incidence matrix and the encrypted files to the server. The communication overhead is O(m · n + n · c_i), where m × n is the size of the encrypted incidence matrix and c_i is the size of each encrypted file.
In the search phase, the client sends the m × m confusion matrix to the server, with a communication overhead of O(m²). The server returns an encrypted data block of size ν, with a communication overhead of O(ν).
In the update phase, the client sends the m × 1 column matrix to the server, and the communication overhead is O(m).
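To make the three bounds concrete, the following Python sketch counts the transmitted units per phase. It is a back-of-the-envelope model only: constant factors and ciphertext expansion are ignored, every file is assumed to have the same size, and the function names are hypothetical.

```python
def setup_cost(m, n, file_size):
    """Units sent at setup: m*n matrix entries plus n files of equal size."""
    return m * n + n * file_size


def search_cost(m, block_size):
    """Units exchanged per search: m*m confusion matrix up, one block down."""
    return m * m + block_size


def update_cost(m):
    """Units sent per update: one m x 1 column."""
    return m
```

For m = n = 10³ and a 30-row block, a search moves on the order of 10⁶ units while an update moves 10³, which shows that the confusion matrix dominates the per-operation traffic in this model.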

Computational overhead
Client computational overhead. The client mainly generates a permutation matrix, a diagonal matrix, and an encrypted confusion matrix. Both the permutation matrix and the diagonal matrix are of dimension m × m, so the confusion matrix is also of dimension m × m. In each of the permutation matrix and the diagonal matrix, only m entries are nonzero. The remaining entries are all 0, and the computational cost of generating a 0 is negligible. So, the computational cost of generating the permutation matrix and the diagonal matrix is O(m).
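The sparsity argument can be checked with a short Python sketch. It assumes the convention P[i][π(i)] = 1 and Q[j][j] = d_j, which is not spelled out in the text, and the function names are hypothetical. Under that assumption the product M = P · Q has exactly m nonzero entries, which can be generated directly in O(m) steps without forming the dense matrices.

```python
import random


def sparse_confusion(perm, diag):
    """Nonzero entries of M = P * Q as {(row, col): value}, built in O(m)."""
    return {(i, perm[i]): diag[perm[i]] for i in range(len(perm))}


def dense_confusion(perm, diag):
    """Reference implementation: form P and Q explicitly and multiply them."""
    m = len(perm)
    p = [[1 if perm[i] == j else 0 for j in range(m)] for i in range(m)]
    q = [[diag[i] if i == j else 0 for j in range(m)] for i in range(m)]
    return [
        [sum(p[i][k] * q[k][j] for k in range(m)) for j in range(m)]
        for i in range(m)
    ]
```

The dense version exists only to confirm that the O(m) construction produces the same matrix.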
Server computational overhead. When executing the 1-out-of-n OT protocol, the server needs to re-encrypt data of size ν and perform 2ν modular exponentiations. Beyond that, the server mainly performs the homomorphic calculation between the confusion matrix and the incidence matrix. The size of the confusion matrix is m × m, and the target matrix has m rows. So, the computational cost is O(m³).

Experiment preparation
The proposed OSM-DSSE scheme is evaluated in a real network environment and system setting. For a search operation, a round of interaction is defined as client → server → client, which means that a search request is sent from the client to the server and the data block is then downloaded from the server to the client. For an update operation, a round of interaction is defined as client → server, indicating that the client sends an update request to the server.
The hardware of the client and the server is configured as follows. The client has an Intel Core i5-8400 CPU @ 2.80 GHz, 16 GB RAM, a 256 GB hard disk, and a 1 TB SSD, and runs 64-bit Windows 10. The server has 32 CPUs @ 2.70 GHz and 512 GB RAM, and runs 64-bit CentOS 7.2.
Google sparse hash is used to implement the hash tables T_f and T_w, which are stored on the client. The files and the incidence matrix are pre-encrypted with an IND-CPA-secure scheme and sent to the server.
The online public dataset Enron [45] (a mail dataset) is used as the experimental dataset. The dataset contains data from approximately 150 users, and the corpus contains about 500,000 messages in total. In the experiment, the emails of the 150 users are used. Since most of the emails are personal, they capture informal conversations between two individuals. Therefore, a stemming algorithm, namely the Porter Stemming Algorithm [46], is used to find the root of each word in the document set, and the most common words, such as 'the', 'a', and 'from', are deleted to extract keyword sets from the corpus. For comparison, 300,000 files and 300,000 keywords are selected to construct encrypted incidence matrices of different sizes (the largest incidence matrix has 9 × 10¹⁰ keyword-file pairs).

Experimental results
In the experiment, the performance of search and update operations of the proposed scheme is evaluated and compared with existing schemes.
The time for creating incidence matrices of different sizes is evaluated to illustrate how the size of the dataset influences the construction time. As shown in Fig. 5, the construction time is 10.114 s for an incidence matrix of size 10³ × 10³. When the size of the incidence matrix exceeds 10³ × 10³, the time to construct the encrypted incidence matrix increases rapidly. For example, it takes approximately 20 minutes to construct a 10⁴ × 10⁴ incidence matrix with 10⁸ entries.
[Figure: The relation between confusion metric and block size]
Since the incidence matrix is constructed only once during the setup phase, the evaluation focuses on how the search and update times depend on the size of the incidence matrix.
Then the size of the data block is evaluated. For a search operation, the server returns a data block to the client. A data block is composed of multiple rows of the incidence matrix. The size of the data block affects not only the search performance but also how well the search pattern is hidden. Therefore, it is essential to choose a suitable block size. Here, the confusion metric is defined.

Definition 4
The confusion metric Ψ is defined as the probability that the server guesses the target item.
For the data block size, multiples of 10 rows in [10, 100] are selected as the experimental data to evaluate the confusion metric. It can be seen from Fig. 6 that the probability of the server correctly guessing the target item decreases as the size of the data block increases, which indicates a better hiding effect for the search pattern. Also, the confusion curve tends to flatten when the size of the data block exceeds 30 rows.
The response time of different data block sizes is evaluated. As shown in Fig. 7, the response time is 0.76s, 1.51s, and 2.32s for data blocks with the size of 10 rows, 20 rows, and 30 rows, respectively.
To obtain the optimal block size that balances the confusion metric and the response time, the response time is normalized to the range [0, 1].
[Fig. 8]
[Fig. 9: The effect of returned result size on recall. (a) Fix q = 0.1; (b) Fix p = 0.9]
Recall. The recall is defined as ξ = |Υ'| / |Υ|, where Υ is the set of true results, Υ' is the result set after adding noise, and |·| is the number of 1s in the result set. According to Algorithm 2, p represents the probability of 1 in the result set, and q represents the probability of adding a noise 1 to the confusion result set. Therefore, the impact of different p and q on the recall is evaluated, and the results are shown in Fig. 9. It can be seen from Fig. 9 (a) that, for a fixed q, a larger p usually corresponds to a higher recall. For example, when p = 0.9 and q = 0.1, the recall exceeds 0.9.
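The dependence of recall on p and q can be illustrated with a small random-response simulation in Python. This is a sketch under explicit assumptions, not a reproduction of Algorithm 2: each true 1 is assumed to be kept with probability p, each 0 is assumed to be turned into a noise 1 with probability q, and recall is computed here as the fraction of true matches that survive the perturbation, which is only one possible reading of the ratio above. The function names are hypothetical.

```python
import random


def random_response(bits, p, q, rng):
    """Perturb a 0/1 result vector.

    Assumed mechanism: a 1 stays 1 with probability p,
    and a 0 becomes a noise 1 with probability q.
    """
    return [int(rng.random() < (p if bit == 1 else q)) for bit in bits]


def recall(true_bits, noisy_bits):
    """Fraction of the true 1s that are still 1 after perturbation."""
    true_ones = sum(true_bits)
    kept = sum(1 for t, z in zip(true_bits, noisy_bits) if t == 1 and z == 1)
    return kept / true_ones
```

Under this reading, recall concentrates around p for large result vectors, while q controls how many false positives the client has to filter out locally.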
As shown in Table 1, the proposed OSM-DSSE scheme is compared with several existing schemes in terms of storage overhead, communication overhead, and the ability to hide search and access patterns. The overhead of all schemes is measured on average. For the server-side storage, only the size of the encrypted incidence matrix is considered. m and n denote the maximum number of keywords and files, respectively; k represents the number of servers, ν represents the size of the data block, and p represents the number of processors.
The IM-DSSE is a traditional DSSE scheme that leaks the search pattern and access pattern. The ODSE employs multi-server PIR and Write-Only ORAM to hide the access pattern. The DOD-DSSE leverages two non-colluding servers to realize the "fetch-reencrypt-swap" strategy, so that the data structure-access pattern can be hidden. Compared with the above schemes, the proposed scheme not only achieves low storage and communication overhead, but also hides the search pattern and access pattern.
Finally, the performance of search and update operations of the proposed scheme is compared with that of DOD-DSSE [20] and ODSE [21] under different incidence matrix sizes, and the results are shown in Fig. 10 and Fig. 11. The DOD-DSSE leverages two non-colluding servers and exploits the properties of an incidence matrix to avoid information leakage. The ODSE harnesses Write-Only ORAM for the update operation and multi-server PIR for the search operation, achieving a low end-to-end delay and good information-theoretic security. It should be noted that although Paillier is used to achieve shuffling in the proposed scheme, the shuffling operation is performed after the search, so it does not affect the search performance. It can be seen from Fig. 10

Conclusion
This article proposes a searchable encryption scheme named OSM-DSSE to hide the search and access patterns. An effective shuffling algorithm based on Paillier is proposed to shuffle the incidence matrix, so that the positions of the rows in the incidence matrix are changed. This scheme combines the 1-out-of-n OT protocol and a differential privacy strategy based on random response to realize random data access. Besides, the security of the proposed scheme is formally analyzed, showing that the proposed scheme provides adaptive semantic security that can withstand selective adversaries. Furthermore, OSM-DSSE achieves approximately 3-4x faster execution than existing schemes. In the future, the optimal block size will be investigated and scenarios with different security levels will be considered.
Acknowledgements This work was supported by the Natural Science Foundation of Chongqing (Grant No. cstc2018jcyjAX0510). The authors thank TopEdit (www.topeditsci.com) for its linguistic assistance during the preparation of this manuscript.

Compliance with ethical standards
Conflict of interest The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

A Appendix
Here, the Real^{Σ_OSM}_A and Ideal^{Σ_OSM}_{A,S,L} games for the semantic security described in Theorem 1 are introduced.
Game 0 (Algorithm 3): it is the same as Real^{Σ_OSM}_A. At the beginning of the game, the adversary selects two sets of plaintext files of the same length and sends them to the challenger (Game 0: line 1). The challenger decides which file set to encrypt by tossing a coin (Game 0: lines 2-3). The adversary outputs the keyword and file id (Game 0: lines 4-5) for the next search or performs update operations based on previous learning results. If op=search, the adversary generates a search token for searching. Then, the server reshuffles the path and changes the next access path (Game 0: lines 6-9). If op=update, the adversary generates an update token.
Game 2 (Algorithm 4): it is the same as Game 1, except that the deterministic function for generating the search or update token is replaced by a random function (Game 2: lines 8, 13). The random function is truly random, and its output is indistinguishable from the output of the hash function (a pseudo-random function). So, Pr[Game_2 = 1] − Pr[Game_1 = 1] ≤ negl(λ).
Moreover, since the 1-out-of-n OT protocol performed in the search is based on the hardness of the DDH problem (Game 2: line 9), the client's choice is unconditionally secure. For any σ, there is an r that satisfies y = g^r h^σ. The client hides its choice in the token sent to the server by introducing a random number. So, the server cannot obtain any information about the client's choice from the token.
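The claim that every choice σ is consistent with the same token y can be checked exhaustively in a toy group. The Python sketch below is illustrative only: the modulus 23, the subgroup of order 11 generated by g = 4, and the element h = g³ are assumed toy parameters far too small for real use, and the function names are hypothetical. It shows that for a fixed y, some r explains y for every candidate σ, so y alone does not single out the client's choice.

```python
MODULUS = 23              # toy prime modulus (assumption, not a real parameter)
ORDER = 11                # order of the subgroup generated by G
G = 4                     # generator of the order-11 subgroup of Z_23^*
H = pow(G, 3, MODULUS)    # second base; its discrete log is unknown to the client in practice


def token(sigma, r):
    """Client's blinded choice y = g^r * h^sigma mod p."""
    return pow(G, r, MODULUS) * pow(H, sigma, MODULUS) % MODULUS


def explaining_randomness(y, sigma):
    """Brute-force an r in [0, ORDER) with g^r * h^sigma == y (toy group only)."""
    for r in range(ORDER):
        if token(sigma, r) == y:
            return r
    raise ValueError("y is not in the subgroup generated by G")
```

Because an explaining r exists for every σ, the server's view of y is identically distributed regardless of the client's choice when r is uniform.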
Game 3 : it is the same as Game 2 , except that the values used for homomorphic calculation in the shuffle phase are replaced with other randomly selected values for calculation.
In this scheme, from the perspective of the server, the path shuffling algorithm involves two parts: the homomorphically encrypted confusion matrix M and the symmetrically encrypted incidence matrix I. The confusion matrix is composed of a permutation matrix P and a diagonal matrix Q that are randomly selected by the client. So, P and Q are not visible to the server. To establish the security of this part, the following theorem is given.
Theorem 2 Even if the encrypted confusion matrix M is given, the server cannot infer the permutation matrix P and the diagonal matrix Q.
Proof : The security of the confusion matrix M is based on the semantic security of Paillier encryption. The confusion matrix M does not reveal any information about P and Q. There are multiple choices of P and Q to generate the same confusion matrix M . These choices are not visible to the server, so the server cannot recognize the correct P and Q. The randomness of the matrix selection ensures that the server cannot correctly infer the true values of the P and Q, so the uploaded confusion matrix is safe.
For the server-side incidence matrix I , symmetric encryption is performed to meet the IND-CPA security standards.
According to Theorem 2 and the above security analysis, both operands involved in the homomorphic calculation are secure. At the same time, according to the homomorphic properties of the Paillier encryption system (i.e., any calculation performed by the homomorphic operation protects the privacy of both the original data and the calculation result), the calculation result is also secure, and the server cannot correctly distinguish the real confusion matrix from a randomly generated confusion matrix. So, the inequality Pr[Game_3 = 1] − Pr[Game_2 = 1] ≤ negl(λ) is obtained.
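The homomorphic properties referred to above can be illustrated with a toy Paillier instance in Python. This is a sketch with assumed, insecure parameters (primes 17 and 19, generator g = n + 1) and hypothetical function names; it is not the scheme's implementation. It shows that the same plaintext encrypts to different ciphertexts under different randomness, and that a server holding an encrypted row of a permutation matrix can compute an encrypted dot product with integers it holds, without learning which entry was selected.

```python
from math import gcd

P_PRIME, Q_PRIME = 17, 19                       # toy primes (insecure, assumption)
N = P_PRIME * Q_PRIME
N2 = N * N
G = N + 1                                       # standard simple choice of generator
LAM = (P_PRIME - 1) * (Q_PRIME - 1) // gcd(P_PRIME - 1, Q_PRIME - 1)  # lcm(p-1, q-1)


def _l(x):
    """Paillier's L function: L(x) = (x - 1) / n."""
    return (x - 1) // N


MU = pow(_l(pow(G, LAM, N2)), -1, N)            # modular inverse (Python 3.8+)


def encrypt(m, r):
    """Encrypt m with explicit randomness r, where r must be coprime to N."""
    return pow(G, m, N2) * pow(r, N, N2) % N2


def decrypt(c):
    """Recover the plaintext modulo N."""
    return _l(pow(c, LAM, N2)) * MU % N


def encrypted_dot(enc_row, plain_col):
    """Return Enc(sum_j row[j] * col[j]) from an encrypted row and known integers."""
    acc = 1                                     # trivial encryption of 0
    for c, a in zip(enc_row, plain_col):
        acc = acc * pow(c, a, N2) % N2          # Enc(x)^a = Enc(a * x)
    return acc
```

Here the integers in `plain_col` stand in for the server's symmetrically encrypted incidence values, treated as opaque numbers; the result stays Paillier-encrypted until the holder of the private key decrypts it.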
Game 4 (Algorithm 5): it is the same as Game 3, except that the outputs of the setup, search, and update phases are replaced by the outputs of the simulators SimStp(·), SimSrch(·), and SimUpd(·) (Game 4: lines 3, 9, 14). According to the above analysis, the outputs of the simulator and of Game 3 are indistinguishable, so Pr[Game_4 = 1] − Pr[Game_3 = 1] ≤ negl(λ) can be obtained. Because Game 4 is the game under the ideal experiment, Game 3 is indistinguishable from the ideal experiment.
Through the above sequence of games, Game 0, Game 1, Game 2, Game 3, and Game 4, it can be obtained that
$$\left|\Pr\left[\mathrm{Real}^{\Sigma_{OSM}}_{A}(\lambda) = 1\right] - \Pr\left[\mathrm{Ideal}^{\Sigma_{OSM}}_{A,S,L}(\lambda) = 1\right]\right| \le \mathrm{negl}(\lambda)$$
Therefore, the scheme proposed in this paper provides adaptive semantic security.