Protecting security-sensitive data using program transformation and trusted execution environment

Cloud computing allows clients to upload their sensitive data to the public cloud and perform sensitive computations in those untrusted areas, which can lead to violations of the confidentiality of the clients' sensitive data. Utilizing Trusted Execution Environments (TEEs) to protect data confidentiality from other software is an effective solution. TEEs are supported by different platforms, such as Intel's Software Guard Extensions (SGX). SGX provides a TEE, called an enclave, which can be used to protect the integrity of the code and the confidentiality of data. Several efforts have proposed solutions to isolate the execution of security-sensitive code from the rest of the application. Unlike our previous work, CFHider, a hardware-assisted method that aimed to protect only the confidentiality of the control flow of applications, in this study we develop a new approach for partitioning applications into security-sensitive code to be run in the trusted execution setting and cleartext code to be run in the public cloud setting. Our approach leverages program transformation and TEEs to hide the security-sensitive data of the code. We describe our proposed solution, which combines the partitioning technique, program transformation, and TEEs to protect the execution of security-sensitive data of applications. Some former works have shown that most applications can run in their entirety inside trusted areas such as SGX enclaves, but that leads to a large Trusted Computing Base (TCB). Instead, we analyze three case studies in which we partition real Java applications and employ the SGX enclave to protect the execution of sensitive statements, thereby reducing the TCB. We also show the advantages of the proposed solution and demonstrate how the confidentiality of security-sensitive data is protected.


Introduction
Applications have grown enormously in the public cloud; the total number of applications developed over the cloud has risen sharply over the past few years. However, the security problems that threaten the public cloud are very serious. These threats pose a significant risk to application security and privacy in the public cloud [1]. In general, the user's application must be uploaded to and executed on the public cloud. However, public clouds are not as well protected as users imagine. Security incidents and vulnerabilities reported by researchers [2,3,5] are common. As a result, the confidentiality and integrity of security-sensitive data can be violated. Revealed incidents involving the loss of confidentiality or integrity of data [8,13] increase these concerns. Under such circumstances, a solution that protects the confidentiality and integrity of cloud users' programs in the public cloud setting is required. One of the most important parts of protecting a program is protecting the confidentiality of its sensitive data, which constitutes the component of the program that must be protected against unintentional, unlawful, or unauthorized access, disclosure, or theft.
*Correspondence: ywang@park.edu. 2 Park University, Parkville, MO, USA. Full list of authors' information is available at the end of the article.
To address this concern, several works such as [4,6] have utilized different Trusted Execution Environment (TEE) technologies to protect security-sensitive data in applications. In contrast to a cryptographic co-processor, a TEE is an execution environment isolated from the rest of the application using the hardware capabilities of the platform. Moreover, TEEs protect their data from being accessed from outside the TEE. The code in a TEE is known as trusted code, while the other code is considered untrusted code. Even though TEEs provide security guarantees against strong attacks, few applications employ this technology. One common approach to protecting the confidentiality of the security-sensitive data in the code is for the developer to annotate some variables (sources), which is considered very helpful in protecting the confidentiality of the program. Therefore, several studies have investigated protecting data confidentiality in order to achieve sufficient confidentiality protection for the program. The sensitive data in the program, including the set of security-sensitive functions, global variables, and local variables, is a significant component of the program that needs to be protected first. The work in [7] aimed to extract the control flow and deploy it into a trusted environment. Some other efforts on protecting control flow and data flow confidentiality mainly leverage program transformation or distributed architectures. The results of the previous studies have limitations with respect to security [7,9,51]. Thus, current works in this line either fail to provide strong confidentiality guarantees or incur high performance overheads. We claim that our proposed solution is suitable for most TEEs. Among the different technologies, we plan to implement the proposed solution based on Intel's Software Guard Extensions (SGX) technology [10,11] due to its novelty and popularity in its field.
In this paper, we present and analyze our approach, which takes a fresh direction for securing applications using trusted execution technologies offered by modern CPUs, such as SGX. SGX provides a TEE, called an enclave, that protects the integrity of the code and the confidentiality of the data inside it from other software, including the operating system and hypervisor. This paper provides a novel approach to program partitioning for TEE-secured applications. It describes the architecture of the proposed solution and the different phases that lead to the partitioning. In general, our approach can be used for partitioning critical Java applications into security-sensitive code to be run in a trusted execution environment and cleartext code to be run in the public cloud setting. It uses a binary search application as a case study to validate the proposed solution. The results of the experimental verification are presented using concrete examples that show how the confidentiality of security-sensitive data is protected.
Our goal in this paper is to propose a security solution that is compatible with all TEE systems and applicable to most Java applications. Our proposed solution analyzes, partitions, and transforms existing Java applications for deployment of the security-sensitive parts and performs the necessary computations in a trusted area such as an SGX enclave.
In general, our proposed solution goes through four main stages as follows.
I. Data Annotation Stage. In this stage, a developer first annotates the variables of interest in the source code of a Java application, i.e., those that contain security-sensitive data and whose confidentiality should be protected. In other words, the developer provides information about the sources (inputs) of sensitive data by annotating variables whose values must be protected in terms of confidentiality.
II. Data Analysis Stage. Based on the annotation stage, our approach will use static program analysis to find data and control dependencies on security-sensitive data. Our approach will also use static forward slicing to obtain the sub-graph of the program dependence graph (PDG) [12] containing all statements on which the annotated source statements have a control or data dependence.
III. Program Partitioning Stage. Based on stages (I) and (II), our solution will generate the partition details (PD) that define the set of security-sensitive functions and the set of security-sensitive variables. PD will also define which part of the code must be placed inside the enclave to protect the confidentiality of its data, as well as the transformed program (untrusted code) that will be executed in the public cloud while the sensitive data is transmitted to an SGX enclave.
IV. Code Generation Stage. In this stage, our solution will demonstrate the computations of the security-sensitive statements inside the enclave based on the output of PD. Moreover, this stage shows how we will return data from the enclave to the user environment. We also show how our approach will handle the security-sensitive and insensitive data that will be deployed to the trusted and untrusted areas, respectively.
Our contributions can be summarized as follows.
• We propose a general solution that protects the confidentiality of sensitive data in most user-level programs that can be executed on TEE systems such as SGX-supported CPUs.
• We analyze our proposed solution using concrete examples to show how the confidentiality of security-sensitive variables and functions is protected.
• In our case studies, we leverage program analysis, program partitioning, and SGX technology to hide only the security-sensitive statements of Java code inside an SGX enclave.

Paper Organization
The rest of this paper is organized as follows. In Section 2, we give a brief background on TEE systems and SGX technology. Section 3 introduces the system design of this work. In Section 4, we discuss three case studies to which our proposed system can be applied. Section 5 describes the proposed implementation. In Section 6, we compare the proposed system to the two most closely related works in the field. Section 7 discusses related work. The last section concludes this paper and discusses future work.

Trusted execution environment (TEE)
There are hardware-based solutions, such as Intel SGX and ARM TrustZone [40], and software-only approaches, e.g., Virtual Ghost [25] and SKEE [41]. Software-based approaches apply compiler instrumentation or kernel deprivileging to isolate the TEE memory from the kernel memory. A TEE provides secure execution of permitted software called Trusted Applications (TAs). A TA is composed of TEE commands that cooperatively offer secure services to the TA's clients; meanwhile, it enforces confidentiality, integrity, and access rights for the code, data, and resources. Each TA is isolated and protected against illegitimate access from other TAs, providing an ecosystem of application vendors. A TEE system such as ARM TrustZone offers a system-wide security solution that partitions the hardware and software resources so that they reside in one of two worlds: the secure world for the security subsystem and the normal world for everything else. Several Android applications utilize this technology because of the lack of standardization. This subject is addressed by Global Platform [42,43], which established the standard for managing applications on secure chip technology and a set of specifications for the TEE system architecture.

Intel SGX
Intel SGX allows developers to move the sensitive parts of their applications into a protected execution environment, called an enclave, to protect the code confidentiality and data integrity. The code and data of the enclave live in a protected memory region called the enclave page cache (EPC). Only application code executing inside the enclave is authorized to access the EPC. The confidentiality of enclave memory is secured by transparent memory encryption performed by the CPU. Enclave calls (ecalls) are used to enter an enclave, and outside calls (ocalls) are used to call out of the enclave. Therefore, any interaction between the enclave and the OS via system calls, such as network access, must execute outside of the enclave. SGX supports local attestation mechanisms, which allow an enclave to prove to another enclave that it has a particular digest and runs on the same processor. This privileged mechanism enables the deployment of enclaves that support remote attestation.
In Intel SGX, the TCB comprises the enclave code and the trusted hardware. Thus, only those portions of an application that require access to sensitive data should be implemented inside the enclave. Some studies [4,31] have shown that increasing the code size increases the number of software bugs and, as a result, the number of potential security vulnerabilities. To overcome this problem, it is important to minimize the size of the TCB. However, other factors also impact the security of enclave data and code, such as the complexity of the enclave interface. For instance, the security-sensitive code inside the enclave needs to interact with the non-enclave environment to receive data from it or return data to it.

The Security Model
The security objective is to protect the confidentiality of sensitive statements in the untrusted area, preventing an attacker from reading or modifying the stored sensitive data. To this end, we assume the attackers are interested in obtaining the sensitive data of the program uploaded by the user, i.e., compromising the data confidentiality. However, the attackers are not interested in compromising computation integrity, such as tampering with the computation results. For the environment setting, we assume that the user's zone is free of attacks. However, the public cloud is untrusted. On the public cloud, we assume the processors support SGX. Yet the software stacks on the public cloud host, such as the hypervisor and the OS, are untrusted.
To facilitate our description, we call the enclaves the trusted area and the software stacks on the public cloud the untrusted area. The attackers can be outside attackers, malicious cloud vendor employees, or malicious users who are co-hosted with benign cloud users. We do not place special restrictions on the programs to be protected. As long as the program itself does not reveal its sensitive data intentionally (e.g., by explicitly printing out the annotated data or other sensitive information), our solution will work well.

Architecture
The architecture of our proposed solution is shown in Fig. 1. For the original Java program P that the user aims to execute on the public cloud, our proposed solution must know which data is security-sensitive in the code of P. Therefore, developers are required to annotate at least one variable in the program (stage 1 in Fig. 1) to provide cues to the partitioning phase. Once the source variables are marked in the program, our proposed solution will perform static dataflow analysis (stage 2 in Fig. 1). Based on the output of the dataflow analysis, the proposed solution will use the PDG to perform forward slicing to isolate the security-sensitive data from the rest of the code in the program; we call the result of this process the Partition Details (PD). PD defines which part of the code must be protected by the enclave. In other words, it partitions the original program P into a transformed program PT and the Sensitive Extracted Data Matrix (SEDM). The latter includes all security-sensitive variables and functions (stage 3 in Fig. 1). After the partitioning, PT will be uploaded to and executed in the public cloud (i.e., the non-enclave area). SEDM will be transmitted to and executed in an SGX enclave. Notice that the user needs to transmit the SEDM to the enclave in encrypted form, denoted E(SEDM). In the enclave, we perform the necessary computations for all security-sensitive statements based on SEDM (stage 4 in Fig. 1). We provide further discussion about each stage in the following section.

System design
In this section, we present our proposed solution, a new approach for securing applications using TEEs. This solution is built upon hiding the security-sensitive data of applications in terms of code confidentiality. Our approach starts with static dataflow analysis supported by data annotation. Then, we will classify the annotated statements and collect the set of statements that will form a secure partition to be deployed to an SGX enclave. For the partitioning goal, we will identify all the statements that deliver confidential data from one variable to another in a given context across reachable paths. To this end, we will apply static dataflow analysis and extend it to accurately capture contextual information by annotating statements that propagate variables from sources to sinks. We will follow the standard dataflow analysis algorithms in [44] and [45] to capture sensitive information for a statement that propagates a variable in a tag t < source, successor >, where source is an incoming security-sensitive variable (predecessor flow) and successor is a security-sensitive variable propagating further (successor flow). The four stages of the proposed solution are explained as follows.

Data annotation stage
In this stage, our proposed solution must know which variables are security-sensitive in the program. In other words, the developer should provide information about the source(s) of security-sensitive data by annotating variables whose values must be protected in terms of confidentiality. These annotated variables are marked as SA.
To understand how a developer marks security-sensitive data in a program, consider the piece of Java code in Fig. 2. Only variable x at line 3 is annotated as a security-sensitive variable, which means that all the variables at lines 6, 9, 11, 16, 20, and 21 become sensitive variables due to the information flow from the annotated variable x to those statements. Therefore, the annotated variable and all its related security-sensitive statements must be stored and executed inside an SGX enclave.
Although there is information flow from variable z to c at line 13, it is considered normal data, because neither variable z nor c interacts with the annotated variable x. That is, variables z and c are cleartext. Thus, they will be executed in the untrusted area, together with all other statements that have no interaction with the annotated variable.
Fig. 2 A piece of Java code with the annotated variable x (fragments recoverable from the figure: "int z = 2;" and "if (x < y) // Sensitive statement - Marked by the algorithm").

Data analysis stage
Based on the annotated source statements SA, our approach will distinguish the part of the code that is considered sensitive from the part that is considered insensitive. The data analysis stage will therefore identify all security-sensitive statements in the program that have dependencies on the set of annotated statements SA. The proposed solution will use static dataflow analysis to examine all security-sensitive statements. Static dataflow analysis is workload independent, and therefore conservative decisions must be made about dependencies.
To extract all security-sensitive variables in Java code, we will transform the original code into another representation. Following standard analysis approaches, we will use the standard PDG, in which vertices represent statements and edges represent both data and control dependencies between statements. We will therefore use a partitioning technique based mainly on the graph reachability problem over the PDG. PDGs are considered efficient representations for program partitioning [46]. The program slicing technique was introduced as a sequence of dataflow analysis problems. Using a standard dataflow analysis algorithm and the PDG, our approach will obtain the set of all security-sensitive statements as follows.
Firstly, for the static dataflow analysis, given SA and the PDG, our approach will use graph reachability to obtain a subgraph PC of the PDG that contains all statements with a transitive control or data dependence on statements in SA (i.e., vertices reachable from statements in SA via edges in the PDG). For statements in SA that are annotated as security-sensitive data in the program, our approach will use an encryption method [47] to encrypt the sensitive data before placing it inside the enclave (i.e., E(SEDM)); see Fig. 3. Secondly, given SA and the PDG, our approach will use static forward slicing to obtain a subgraph PF with all statements in the PDG on which statements in SA have a control or data dependence (i.e., all vertices from which statements in SA are reachable via the PDG).
Thirdly, the set of all security-sensitive statements ST is obtained by combining PC and PF. As a result, our proposed solution constructs a new step, which we call the partition details (PD).
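The three steps above can be sketched as two graph-reachability passes over the PDG followed by a union. The following is only a minimal illustration under stated assumptions: the class name PdgSlicer, its methods, and the representation of statements as integer ids are hypothetical and are not taken from the proposed implementation.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch: the PDG is stored as adjacency sets over statement ids.
final class PdgSlicer {
    // succ: statement -> statements that depend on it (data or control)
    private final Map<Integer, Set<Integer>> succ = new HashMap<>();
    // pred: statement -> statements it depends on
    private final Map<Integer, Set<Integer>> pred = new HashMap<>();

    // Records that statement `to` has a dependence on statement `from`.
    void addDependence(int from, int to) {
        succ.computeIfAbsent(from, k -> new HashSet<>()).add(to);
        pred.computeIfAbsent(to, k -> new HashSet<>()).add(from);
    }

    // Worklist reachability from the seed statements along one edge direction.
    private static Set<Integer> reach(Set<Integer> seeds, Map<Integer, Set<Integer>> edges) {
        Set<Integer> seen = new TreeSet<>(seeds);
        Deque<Integer> work = new ArrayDeque<>(seeds);
        while (!work.isEmpty()) {
            int s = work.pop();
            for (int n : edges.getOrDefault(s, Set.of())) {
                if (seen.add(n)) {
                    work.push(n);
                }
            }
        }
        return seen;
    }

    // PC: vertices reachable from statements in SA.
    Set<Integer> pc(Set<Integer> sa) {
        return reach(sa, succ);
    }

    // PF: vertices from which statements in SA are reachable.
    Set<Integer> pf(Set<Integer> sa) {
        return reach(sa, pred);
    }

    // ST: the combination (union) of PC and PF.
    Set<Integer> sensitive(Set<Integer> sa) {
        Set<Integer> st = new TreeSet<>(pc(sa));
        st.addAll(pf(sa));
        return st;
    }
}
```

For a toy graph with dependences 2 -> 3, 3 -> 6, 6 -> 9, and 5 -> 13, annotating statement 3 yields ST = {2, 3, 6, 9}, while statements 5 and 13 stay in the cleartext partition. The edges in this toy graph are made up for illustration and do not reproduce Fig. 2.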

Program partitioning stage
In this stage, we define which part of the code must be placed inside the enclave to protect the confidentiality of data. Based on static program analysis, we will define the code that will be executed in the enclave. This in turn defines the enclave boundary interface of the sliced code, which includes the ecalls into the enclave and the ocalls to the untrusted area. Our approach will construct the partition details (PD) from ST with the set of security-sensitive functions and variables; these sensitive data will be stored in the SEDM in order to transmit them to the enclave in encrypted form. The PD contains all functions and variables of statements that include at least one variable in ST. It also includes the transformed program PT. Moreover, it provides a special ecall function to the non-enclave code to retrieve these security-sensitive variables when needed. Our proposed solution will generate a tuple, marked as L(s), for each security-sensitive statement inside the SEDM. In general, PD contains two main components: the SEDM, which will be deployed to the trusted area (i.e., the enclave), and the transformed program PT that the user aims to execute on the public cloud. The transformation will be carried out in the user's zone inside PD. Throughout the process, the insensitive functions and variables remain in the user's zone or the untrusted area. Only the security-sensitive statements in the program will be transmitted to the enclave. As a result, this creates an enclave boundary interface through which all security-sensitive statements are passed to enclave functions, all necessary computations are performed inside the enclave, and the results are finally returned outside the enclave (i.e., to the user environment). Our proposed solution will check each security-sensitive statement in the SEDM to determine whether it is a function call, an expression statement, or a control flow statement.
In other words, we classify each security-sensitive statement in the SEDM into one of the following three types.

Expression statement:
For this kind of statement, a tuple will be created that records information about the statement. During the transformation process, the proposed solution will replace each security-sensitive expression statement in the program with bracket (1), which includes two functions: i) the stmtextract() function and ii) the stmtreturn() function. The function stmtextract() will be used to extract all variables from each sensitive expression statement in the program and store them in the SEDM. For each statement passed to stmtextract(), a tuple L(stmt), shown in bracket (2), will be created. It records the statement id stmtid, which is used to pick the proper statement during the execution of the program inside the enclave; the statement type stmttype, which determines the type of the statement (either an expression or a control flow statement); the variable type vartype, which determines the data type of each variable in the security-sensitive statement based on its index in Table 1; the left operand leftop; the right operand rightop; and the operator of the statement stmtop.
< stmtextract(), stmtreturn() > (1)
L(stmt) = < stmtid, stmttype, vartype, leftop, rightop, stmtop > (2)
In Table 1, we assume a different encoding for each primitive data type in Java (i.e., byte, short, int, long, float, double, boolean, char), plus the object type. We index these data types from '00' to '08'. Similarly, these indexes can be used to determine the data type of the value returned by a Java method; for instance, the '09' index can be used when the method does not return a value. In our solution, for each boolean type, we will convert true and false to 1 and 0, respectively. We will also use the hash code (an integer value) to represent each object type. The function stmtreturn() will be used to retrieve the return values of each statement computed inside the enclave based on our scheme in Fig. 4. For each statement inside the enclave, there will be return values; these are generated by the switch-case code in each function (i.e., the stmtexp and stmtcf functions inside the enclave). A special function will be executed inside the enclave. Its idea is as follows. Based on the statement id stmtid and the statement type stmttype, it looks up the SEDM, identifies the proper tuple, chooses the required variables from L(s) based on the variable type vartype and its index in Table 1, and then returns the evaluation result of the statement to the user environment (see Fig. 3). Based on bracket (1) and bracket (2), the proposed solution will store the sensitive statements of the simple code listed in Fig. 2. Table 2 shows the stored sensitive statements of the simple program. The first column in Table 2 represents the statement id, which is generated sequentially for each security-sensitive statement in Fig. 2; that is, we generate a unique sequence number for each security-sensitive statement.
The second column represents the statement type; for each security-sensitive statement, we use 0 and 1 to denote the expression statement and the control flow statement, respectively. For instance, the statement at line 9 in Fig. 2 is a control flow statement, so we encode it with 1 as shown in Table 2, whereas the other statements in Fig. 2 are expression statements, so we encode them all with 0. The third column represents the variable type vartype, which stores the data type of each statement; we pick the proper data type from Table 1 based on its definition in Fig. 2. The fourth and fifth columns represent the left and right operands of each security-sensitive statement in Fig. 2, respectively.
Notice that if the value of the left or the right operand is a constant, we store the actual value in the tuple; otherwise, we store its position in the corresponding array that will be generated inside the enclave (see Table 5). The last column in Table 2 stores the actual operator of each security-sensitive statement in Fig. 2. Table 2 shows all security-sensitive statements in Fig. 2, where stmtid(0), stmtid(1), stmtid(2), stmtid(3), stmtid(4), stmtid(5), and stmtid(6) represent lines 3, 4, 6, 7, 9, 11, and 14, respectively.
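To make the tuple layout concrete, the sketch below models one L(stmt) entry together with the type indexes described for Table 1. It is an illustration only: the names TypeCode, Operand, and StmtTuple are hypothetical, the ordering of the indexes is assumed to follow the order in which the text lists the types, and the array positions used in the example are made up.

```java
// Assumed Table 1 ordering: the text lists the types in this order and
// indexes them '00' to '08', with '09' for methods that return no value.
enum TypeCode {
    BYTE, SHORT, INT, LONG, FLOAT, DOUBLE, BOOLEAN, CHAR, OBJECT, VOID;

    // Two-digit index stored in the tuple, e.g. INT -> "02".
    String index() {
        return (ordinal() < 10 ? "0" : "") + ordinal();
    }
}

// An operand is either a constant (stored directly in the tuple) or a
// position in the enclave-side array that holds values of its type.
final class Operand {
    final boolean constant;
    final int value;

    private Operand(boolean constant, int value) {
        this.constant = constant;
        this.value = value;
    }

    static Operand constant(int v) {
        return new Operand(true, v);
    }

    static Operand position(int p) {
        return new Operand(false, p);
    }

    @Override
    public String toString() {
        return (constant ? "const:" : "pos:") + value;
    }
}

// One entry of the form <stmtid, stmttype, vartype, leftop, rightop, stmtop>.
final class StmtTuple {
    static final int EXPRESSION = 0;
    static final int CONTROL_FLOW = 1;

    final int stmtId;
    final int stmtType;
    final TypeCode varType;
    final Operand leftOp;
    final Operand rightOp;
    final String stmtOp;

    StmtTuple(int stmtId, int stmtType, TypeCode varType,
              Operand leftOp, Operand rightOp, String stmtOp) {
        this.stmtId = stmtId;
        this.stmtType = stmtType;
        this.varType = varType;
        this.leftOp = leftOp;
        this.rightOp = rightOp;
        this.stmtOp = stmtOp;
    }

    @Override
    public String toString() {
        return "<" + stmtId + ", " + stmtType + ", " + varType.index() + ", "
                + leftOp + ", " + rightOp + ", " + stmtOp + ">";
    }
}
```

Under these assumptions, the control flow statement if (x < y) with stmtid 4 would be recorded as new StmtTuple(4, StmtTuple.CONTROL_FLOW, TypeCode.INT, Operand.position(0), Operand.position(1), "<"), where positions 0 and 1 are assumed slots of x and y in the enclave-side int array.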

Control flow statement:
For this kind of statement, we will apply the same solution that is used for the expression statements above. The control flow statement differs from the expression statement only in the value stored in the statement type, where 1 indicates a control flow statement and 0 indicates an expression statement.

Function call statement:
In our proposed solution, we consider a function in a Java program to be a security-sensitive function if its body or its definition contains at least one statement in ST. In other words, any function whose body or definition contains a statement related to the annotated variable(s) will be considered a sensitive function. We replace each sensitive function in a Java application with bracket (3). Note that the function call statement differs from the expression statement and the control flow statement in the value stored in the statement type parameter, where 2 indicates a function call statement, and 0 and 1 indicate the expression statement and the control flow statement, respectively. The function funextract() in bracket (3) will be used to extract the statement based on fun(list) and funid. Note that fun(list) is nothing but bracket (4). The tuple in bracket (4) records the statement id (stmtid), which defines a unique identifier for each security-sensitive statement; the statement type (stmttype), which indicates the current statement type; the function id (funid), which states a unique identifier for each security-sensitive function; the function modifier (funmodifire), which defines the access type of the function (i.e., from where in the application it can be accessed in Java); the function name (funname), which returns the name of the function as a string; the function type (fuctype), which returns the return type of the function; and finally the parameter list (parm(list[])), which stores the list of the input parameters of the sensitive function, preceded by their data types, and lists them in a data matrix as shown in bracket (4).
Table 2 The sensitive expression statements and control flow statements of Fig. 2
In our solution, we will index the access modifier parameter (funmodifire) of each sensitive method based on its access modifier in Table 3 and store the index in the enclave. The function funextract() in bracket (3) will be used to extract all information from the target function based on fun(list) (i.e., bracket (4)) and funid and then list all the information in the SEDM. The function funreturn() in bracket (5) will be used to read the return values that will be generated inside the enclave for each security-sensitive function based on its statement id, statement type, and function id. Note that the list of return values will be created inside the enclave. For each return value, a tuple will be created in the Sensitive Returned Data Matrix (SRDM), which is used to store the values returned from the enclave to the user environment. We will encrypt the data matrix, E(SRDM), before we send it to the user environment. In the user environment, we will decrypt the received data matrix, D(E(SRDM)), during program execution and pick the proper value for each function based on its statement id, statement type, and function id in the given tuple in bracket (6). The tuple in bracket (6) records the return id (retid), the statement id (stmtid), the statement type (stmttype), the function id (funid), and the returned value. Table 4 shows how our proposed solution will store the actual values of the sensitive method at line 16 in Fig. 2 in the enclave based on bracket (4).
Table 3 The indexes of the access modifiers that will be used in our proposed solution.
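The E(SRDM) and D(E(SRDM)) round trip described above can be illustrated with a symmetric authenticated cipher. The sketch below is an assumption-laden illustration in plain Java: the text does not say which cipher the cited encryption method uses, so AES-GCM from the standard javax.crypto API is swapped in here, the class name SrdmCrypto is hypothetical, the SRDM is treated as an opaque serialized string, and key provisioning between the enclave and the user environment is out of scope.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;

// Hypothetical helper: AES-GCM stands in for the unspecified encryption method.
final class SrdmCrypto {
    private static final int IV_LEN = 12;     // bytes, the usual GCM nonce size
    private static final int TAG_BITS = 128;  // authentication tag length

    // Generates a fresh 128-bit AES key (how it is shared is not modeled here).
    static SecretKey newKey() {
        try {
            KeyGenerator kg = KeyGenerator.getInstance("AES");
            kg.init(128);
            return kg.generateKey();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    // E(SRDM): returns IV || ciphertext || tag.
    static byte[] encrypt(SecretKey key, String srdm) {
        try {
            byte[] iv = new byte[IV_LEN];
            new SecureRandom().nextBytes(iv);
            Cipher c = Cipher.getInstance("AES/GCM/NoPadding");
            c.init(Cipher.ENCRYPT_MODE, key, new GCMParameterSpec(TAG_BITS, iv));
            byte[] ct = c.doFinal(srdm.getBytes(StandardCharsets.UTF_8));
            return ByteBuffer.allocate(IV_LEN + ct.length).put(iv).put(ct).array();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    // D(E(SRDM)): fails if the message was modified in transit.
    static String decrypt(SecretKey key, byte[] msg) {
        try {
            Cipher c = Cipher.getInstance("AES/GCM/NoPadding");
            c.init(Cipher.DECRYPT_MODE, key, new GCMParameterSpec(TAG_BITS, msg, 0, IV_LEN));
            byte[] pt = c.doFinal(msg, IV_LEN, msg.length - IV_LEN);
            return new String(pt, StandardCharsets.UTF_8);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

An authenticated mode is chosen in this sketch so that a tampered matrix is rejected on decryption rather than silently yielding wrong return values; whether the actual design needs this property is a decision for the implementation.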

Code generation stage
The code generation stage demonstrates all the computations of the security-sensitive statements inside the enclave based on the Sensitive Extracted Data Matrix (SEDM). Moreover, it illustrates the return values that will be transmitted from the enclave to the user environment, as shown in Fig. 3. Fig. 3 demonstrates the execution process of the security-sensitive and insensitive statements of the program in Fig. 2. After the partitioning process, we obtain two partitions: the sensitive one on the left side of Fig. 3, which includes all security-sensitive variables and functions, and the insensitive one on the right side of Fig. 3, which contains all cleartext statements. The statements in the sensitive part will be transmitted to the enclave, whereas the insensitive ones will be transmitted to the untrusted area. After all the necessary computations are performed inside the enclave based on the scheme in Fig. 4, the return values will be encrypted inside the enclave using a proper encryption method and returned to the user environment. We assume that the user environment is a secured area; therefore, we will decrypt the return values in the user environment using the corresponding decryption method to ensure that the returned value is accessed only by a trusted user and thus cannot be leaked to the attacker. As illustrated in Fig. 3, the return value is that of the function "sum"; thus, this value will be encrypted, E(sum), inside the enclave and then transmitted to the user environment in encrypted form.
Fig. 3 The original program P and the sliced code (the sensitive statements in the trusted area, e.g., int x = 0; int y = 4;).
Table 5 The actual values of the program in Fig. 2 inside the enclave.
In the user environment, the return value will be decrypted, D(E(sum)), using the same encryption method that is used inside the enclave. The main design scheme of our proposed solution inside the enclave is shown in Fig 4. We define an interface to create several arrays, where each array contains values of the same type and different arrays have different types, as shown in Table 5. We use these arrays to store the actual values of the sensitive variables. The tuples store only the positions of these values rather than the values themselves, because we aim to keep the actual values secured inside the enclave. Therefore, we can read the actual sensitive values using their positions in each matching array. In our scheme, we read these array positions to obtain the actual value of each sensitive variable and then perform the necessary computations on it. Table 5 lists the actual values of the program in Fig 2 inside the enclave. Moreover, we explore how we obtain the return value of each statement. Thus, for each statement executed inside the enclave, a return value will be generated and sent back to the user environment.
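The typed-array scheme described above can be illustrated with a minimal Java sketch. The class name EnclaveValueStore, the Ref type, and the sequence numbers are hypothetical names introduced only for illustration (the enclave side itself is planned in C++). The sketch shows a single int array, and tuples would hold only a (sequence, position) reference rather than the value.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: one array per value type; tuples keep only positions.
final class EnclaveValueStore {
    // Reference stored in a tuple: which typed array, and which slot in it.
    static final class Ref {
        final int sequence;
        final int position;
        Ref(int sequence, int position) {
            this.sequence = sequence;
            this.position = position;
        }
    }

    static final int INT_SEQ = 0; // assumed sequence number of the int array

    private final List<Integer> ints = new ArrayList<>();

    // Store an actual value and hand back only its position.
    Ref putInt(int value) {
        ints.add(value);
        return new Ref(INT_SEQ, ints.size() - 1);
    }

    // Read the actual value through its position.
    int getInt(Ref ref) {
        checkInt(ref);
        return ints.get(ref.position);
    }

    // Update the value in place, as happens when a variable is reassigned.
    void setInt(Ref ref, int value) {
        checkInt(ref);
        ints.set(ref.position, value);
    }

    private static void checkInt(Ref ref) {
        if (ref.sequence != INT_SEQ) {
            throw new IllegalArgumentException("not an int reference");
        }
    }
}
```

Further arrays (for example, one for double values) would follow the same pattern with their own sequence numbers.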
We use a switch-case statement inside the enclave to determine which type of statement will be executed based on the statement type stmttype (i.e., an expression statement, a control flow statement, or a function call statement). After each execution, a tuple will be created in the SRDM based on bracket (7) for expression and control flow statements and on bracket (6) for function call statements. These tuples store the returned values of each executed statement. The SRDM will be signed and encrypted inside the enclave and sent to the user environment as E(SRDM) (see Fig 4).
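The text does not fix a concrete encryption method. As a minimal sketch, assuming a symmetric key already shared between the enclave and the user environment, and assuming AES-GCM as the cipher (our choice for illustration, not the paper's), E(SRDM) and D(E(SRDM)) could look as follows with the standard javax.crypto API. AES-GCM provides authenticated encryption, which detects tampering but is not a digital signature. The class name SrdmCrypto is hypothetical.

```java
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;

// Hypothetical sketch of E(SRDM) / D(E(SRDM)) using AES-GCM (an assumption).
final class SrdmCrypto {
    private static final int IV_BYTES = 12;   // recommended GCM nonce size
    private static final int TAG_BITS = 128;  // authentication tag length

    static SecretKey newKey() {
        try {
            KeyGenerator generator = KeyGenerator.getInstance("AES");
            generator.init(128);
            return generator.generateKey();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    // Returns IV || ciphertext || tag for a serialized SRDM.
    static byte[] encrypt(SecretKey key, byte[] srdm) {
        try {
            byte[] iv = new byte[IV_BYTES];
            new SecureRandom().nextBytes(iv);
            Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
            cipher.init(Cipher.ENCRYPT_MODE, key, new GCMParameterSpec(TAG_BITS, iv));
            byte[] sealed = cipher.doFinal(srdm);
            byte[] out = new byte[iv.length + sealed.length];
            System.arraycopy(iv, 0, out, 0, iv.length);
            System.arraycopy(sealed, 0, out, iv.length, sealed.length);
            return out;
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    // Recovers the SRDM bytes; fails if the message was modified in transit.
    static byte[] decrypt(SecretKey key, byte[] message) {
        try {
            Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
            cipher.init(Cipher.DECRYPT_MODE, key,
                    new GCMParameterSpec(TAG_BITS, message, 0, IV_BYTES));
            return cipher.doFinal(message, IV_BYTES, message.length - IV_BYTES);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

How the key is provisioned to the enclave (for example, after attestation) is outside the scope of this sketch.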
In the user environment, we handle the corresponding decryption operations. Thus, we decrypt the received data matrix (i.e., D(E(SRDM))) in the user environment and pick the proper return value for each statement based on the statement id and statement type in the given matrix (SRDM). Notice that both the user environment and the enclave use the same encryption method mentioned above. The tuple in bracket (7) records the return id (retid), the statement id (stmtid), the statement type (stmttype), and the return value (stmtret) of each expression and control flow statement. In our scheme in Fig 4, we define three functions: one for executing function call statements, referred to as funcall; one for executing expression statements, referred to as stmtexp; and one for executing control flow statements, referred to as stmtcf. Based on the statement type stmttype, the three functions can be invoked from a special function in the enclave, called mainstmtFn. The difference between stmtexp() and stmtcf() is that stmtexp() produces return values that will be returned to the user environment in encrypted form, E(SRDM), whereas stmtcf() returns a boolean value (either true or false) that indicates whether the condition of the control flow statement holds. In the user environment, we will use the stmtreturn function in bracket (1) with its tuple in bracket (7) to retrieve the return values of the stmtcf and stmtexp functions from E(SRDM). Similarly, we will use the funreturn function in bracket (5) with its tuple in bracket (6) to retrieve the return values of the funcall function from E(SRDM).
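The dispatch described above can be sketched in Java as follows. Only the names mainstmtFn, stmtexp, stmtcf, funcall, retid, stmtid, stmttype, and stmtret come from the text. The class names, the two-operand signatures, and the operations inside the three functions (an addition, a comparison, and a maximum) are placeholders we assume for illustration. For brevity, the sketch records function calls with the bracket (7) layout and omits the funid field of bracket (6).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the enclave-side dispatch on stmttype.
final class EnclaveDispatcher {
    enum StmtType { EXPRESSION, CONTROL_FLOW, FUNCTION_CALL }

    // One SRDM row in the spirit of bracket (7).
    static final class ReturnTuple {
        final int retid;
        final int stmtid;
        final StmtType stmttype;
        final Object stmtret;
        ReturnTuple(int retid, int stmtid, StmtType stmttype, Object stmtret) {
            this.retid = retid;
            this.stmtid = stmtid;
            this.stmttype = stmttype;
            this.stmtret = stmtret;
        }
    }

    private final List<ReturnTuple> srdm = new ArrayList<>();

    // Placeholder bodies: a real enclave would evaluate the extracted statement.
    private int stmtexp(int a, int b) { return a + b; }
    private boolean stmtcf(int a, int b) { return a <= b; }
    private int funcall(int a, int b) { return Math.max(a, b); }

    // Selects the handler by statement type and records the result in the SRDM.
    ReturnTuple mainstmtFn(int stmtid, StmtType stmttype, int a, int b) {
        Object result;
        switch (stmttype) {
            case EXPRESSION:    result = stmtexp(a, b); break;
            case CONTROL_FLOW:  result = stmtcf(a, b);  break;
            case FUNCTION_CALL: result = funcall(a, b); break;
            default: throw new IllegalArgumentException("unknown statement type");
        }
        ReturnTuple tuple = new ReturnTuple(srdm.size(), stmtid, stmttype, result);
        srdm.add(tuple);
        return tuple;
    }

    List<ReturnTuple> srdm() { return srdm; }
}
```

The user environment would then select a row by stmtid and stmttype after decrypting the matrix.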

Case study
In this section, we analyze our approach and verify it by applying it to three real Java applications: a Binary Search application, a Bubble Sort application, and a QuickSort application.

Binary Search Application
The binary search application in Fig 5 is a real Java application that implements binary search, a search algorithm that finds the position of a target value within a sorted array.

Data annotation
In this section, we assume that the search key at line 19 in Fig 5 is a security-sensitive variable and that all other statements that interact with the variable key are security-sensitive statements. Thus, the annotation process at line 19 in Fig 5 marks the content of the variable key as security-sensitive data. Note that the statements in the function binarySearch() at lines 4, 5, 6, 7, 9, 10, and 12 become security-sensitive statements due to the information flow.

Dataflow analysis
Next, our proposed solution must recognize the annotated variable(s) and the PDG. In general, two main steps will be performed as follows.

Static dataflow analysis
For analyzing and extracting security-sensitive variables in Fig 5, we will follow standard dataflow analysis algorithms to capture all the sensitive information based on the annotated variable(s) in the program.

Static forward slicing
Then, we perform forward slicing to find the subgraph of the PDG that contains all statements with a control or data dependence on the variable key (i.e., the annotated variable). As shown in Fig 6, the sliced statements are highlighted in yellow. We consider the highlighted statements as sensitive statements, while the remaining ones are cleartext statements. Fig 5 and Fig 6 illustrate the original binary search application and the partitioned one, respectively.
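Forward slicing on a dependence graph amounts to a reachability search that starts at the slicing criterion and follows dependence edges. The Java sketch below illustrates this under simplifying assumptions: statements are identified by integers, and the map of dependents (from a statement to the statements that depend on it through data or control) is assumed to be given. The class name ForwardSlicer is hypothetical, and the statement numbers used with it are not those of Fig 5.

```java
import java.util.ArrayDeque;
import java.util.Collections;
import java.util.Deque;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch: forward slice as reachability over dependence edges.
final class ForwardSlicer {
    // dependents maps a statement id to the statements that depend on it.
    static Set<Integer> forwardSlice(Map<Integer, List<Integer>> dependents,
                                     int criterion) {
        Set<Integer> slice = new TreeSet<>();
        Deque<Integer> worklist = new ArrayDeque<>();
        slice.add(criterion);
        worklist.add(criterion);
        while (!worklist.isEmpty()) {
            int current = worklist.remove();
            List<Integer> next =
                    dependents.getOrDefault(current, Collections.<Integer>emptyList());
            for (int statement : next) {
                // add() returns false when the statement was already visited.
                if (slice.add(statement)) {
                    worklist.add(statement);
                }
            }
        }
        return slice;
    }
}
```

Statements in the returned set would form the sensitive partition; all other statements stay in cleartext.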

Program partitioning and transformation
In this part, we show how our proposed solution will store the sensitive variables that will be transmitted to the enclave and the rest of the code that will be deployed to the untrusted area. After partitioning the binary search application, we perform a transformation process on the code, targeting the partitioned part in particular. We replace each expression statement and control flow statement in the partitioned part with bracket (1) to extract all the security-sensitive variables. We replace each function call statement with bracket (3) to extract all the security-sensitive information. As a result, we store all the security-sensitive variables based on bracket (1) and bracket (3) in the SEDM. Based on the previous two stages (i.e., data annotation and dataflow analysis), our proposed solution will construct the partition details (PD), which contain the security-sensitive variables and the insensitive ones. Therefore, we will use the function stmtextract() to extract all variables from each security-sensitive statement in the partitioned part and store them in the SEDM based on bracket (2). Meanwhile, we will store the security-sensitive functions in the SEDM by extracting their data using bracket (3). Table 6 shows how the security-sensitive statements (the expression statements and control flow statements) in Fig 6 will be stored in the SEDM. Each statement id in Table 6 represents a single sensitive statement in the partitioned code. It also shows all the executed statements of the partitioned code in Fig 6. The entries stmtid(0), stmtid(1), stmtid(2), stmtid(3), stmtid(4), and stmtid(5) record the expression statement at line 19, the control flow statement at line 4, the expression statement at line 5, the control flow statement at line 6, the expression statement at line 7, and the control flow statement at line 9, respectively. Table 7 shows the information of the binarySearch function in the SEDM.

Code generation
In this section, we show the statement computations of the binary search application that will be executed inside and outside the enclave. As mentioned in the program partitioning stage, the security-sensitive data will be sent to the enclave side in an encrypted manner (i.e., E(SEDM)).
Once the enclave receives E(SEDM) and verifies the execution environment, it will be able to decrypt it, D(E(SEDM)), using the corresponding decryption method. Following our security model scheme in Fig 4, we define an interface in the enclave that generates several arrays, where each array contains values of the same type and different arrays have different types (see Table 8). For expression and control flow statements, we use these arrays to store the actual values of each variable coming from bracket (1) and bracket (2) in the partitioned code, and from bracket (3) and bracket (4) for function call statements. Next, we store their positions in tuples instead of storing their actual values. As a result, we pick up the actual values using their positions in each array. Table 8 shows that we only have integer values in the binary search application; therefore, all the values will be stored in the integer array (i.e., in sequence 0, the first row). In Table 9, Position-0, Position-1, and Position-2 store the variables low, high, and mid, respectively, whereas Position-3 stores the value of the variable key. The values at the first three positions keep updating until we find the required index of the search key element, as shown in Table 9. Table 9 demonstrates the actual values of the variables low, high, and mid according to their execution sequence inside the enclave. Eventually, we return the last row in Table 9 to the user environment. To do so, we define three functions inside the enclave (see Fig 4). Based on the statement type stmttype, the functions stmtexp() and stmtcf() will be invoked from the main function mainstmtFn() and return the proper results. We compute all security-sensitive expression statements and control flow statements in the functions stmtexp() and stmtcf(), respectively. After performing the necessary computations, we return all security-sensitive variables to the user environment.
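The way the positions evolve during the search can be sketched as follows. The slot layout (positions 0 to 3 for low, high, mid, and key) follows the description above; the class name BinarySearchTrace, the use of -1 in the mid slot when the key is absent, and any input data are assumptions made for illustration, not the contents of Table 9.

```java
// Hypothetical sketch: binary search that keeps its variables in array slots.
final class BinarySearchTrace {
    static final int LOW = 0;
    static final int HIGH = 1;
    static final int MID = 2;
    static final int KEY = 3;

    // Returns the final row of slots; slots[MID] is the found index, or -1.
    static int[] run(int[] sorted, int key) {
        int[] slots = new int[4];
        slots[LOW] = 0;
        slots[HIGH] = sorted.length - 1;
        slots[MID] = -1;
        slots[KEY] = key;
        while (slots[LOW] <= slots[HIGH]) {
            // Overflow-safe midpoint of the current range.
            slots[MID] = slots[LOW] + (slots[HIGH] - slots[LOW]) / 2;
            int value = sorted[slots[MID]];
            if (value == slots[KEY]) {
                return slots;
            }
            if (value < slots[KEY]) {
                slots[LOW] = slots[MID] + 1;
            } else {
                slots[HIGH] = slots[MID] - 1;
            }
        }
        slots[MID] = -1; // key not present (assumed convention)
        return slots;
    }
}
```

Only the final row would be placed in the SRDM, so the intermediate values of low, high, and mid never leave the enclave.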
In the user environment, we will use the function stmtreturn() in bracket (1) with its tuple in bracket (7) to retrieve all expression and control flow values from E(SRDM). Meanwhile, we will use the funreturn() function in bracket (5) with its tuple in bracket (6) to retrieve the function call values from E(SRDM). In the 3-address form of the code in Fig 6, i0, i1, and i2 refer to int low, int high, and int key, respectively. The function statement at line 21 will be transformed into the following Jimple form: i2 = staticinvoke <BSClass: int binarySearch(int[],int,int,int)>(r1, 0, i1, b0);. This transformation includes the variable result and the call of the binarySearch method at line 21. We replace the security-sensitive function in the binary search application with bracket (3), where the function funextract() will be used to extract fun(list) based on funid. Notice that fun(list) is bracket (4), which contains the statement id stmtid, the statement type stmttype, the function id funid, the function modifier funmodifire, the function name funname, the function type fuctype, and finally the parameter list parmlist[], all listed in a data matrix. These values will be placed inside the enclave as shown in Table 7. The function funreturn() in bracket (5) will be used to read the return values that are generated inside the enclave for each security-sensitive function based on its statement id and function id. Note that the list of return values is created inside the enclave. For each return value, a tuple will be created in the Sensitive Returned Data Matrix (SRDM), which stores the values returned from the enclave to the user environment. At that time, we encrypt the data matrix, E(SRDM), before transmitting it to the user environment.
In the user environment, we decrypt the received data matrix, D(E(SRDM)), during program execution and pick the proper value for each function based on its statement id and function id in the tuple in bracket (6). The tuple in bracket (6) records the return id retid, the statement id stmtid, the statement type stmttype, the function id funid, and the return value of the function funreturn. This tuple will be used to return the results of all the security-sensitive functions in the program to the user environment.

Bubble sort application
The bubble sort application in Fig 9 is a real Java application that implements bubble sort, which is considered the simplest sorting algorithm and works by repeatedly swapping adjacent elements if they are in the wrong order.

Data annotation
In this subsection, we assume that the array arr[] at line 21 in Fig 9 is a security-sensitive statement and that all other statements that interact with the array arr[] are security-sensitive statements. Thus, the annotation process at line 21 in Fig 9 marks the content of the array arr[] as security-sensitive data. Note that the statements in the function bubbleSort() at lines 2, 3, 4, 5, and 8 become security-sensitive statements due to the information flow from the annotated statement.

Data flow analysis
Next, our proposed solution must recognize the annotated statement(s) and the PDG. In general, two main steps will be performed for this purpose as follows.

1-Static dataflow analysis
For analyzing and extracting the security-sensitive statement in Fig 9, we will follow the standard dataflow analysis algorithm mentioned in the first case study to capture all sensitive information.

2-Static forward slicing
As mentioned in the program partitioning stage, we perform forward slicing to find the subgraph of the PDG that contains all statements with a control or data dependence on the annotated variable. As shown in Fig 10, the sliced statements are highlighted in yellow. We consider the highlighted statements as sensitive statements, while the remaining ones are cleartext statements.

Program partitioning and transformation
In this part, we show how our proposed solution will store the sensitive statements that will be transmitted to the enclave and the rest of the code that will be deployed to the untrusted area. After partitioning the bubble sort application, we perform a transformation process on the code, targeting the partitioned part in particular. We replace each expression statement and control flow statement in the partitioned part with bracket (1) to extract all the security-sensitive variables. We replace each function call statement with bracket (3) to extract all the security-sensitive functions. As a result, we store all the security-sensitive statements based on bracket (1) and bracket (3) in the SEDM. Based on the first two stages (i.e., data annotation and dataflow analysis), our proposed solution will construct the partition details (PD), which contain the security-sensitive variables and the insensitive ones. Therefore, we will use the function stmtextract() to extract all variables from each security-sensitive statement in the partitioned part and store them in the SEDM based on bracket (2). Meanwhile, we will store the security-sensitive functions in the SEDM by extracting their data using bracket (3). Table 10 shows how the security-sensitive statements (the expression statements and control flow statements) in Fig 9 will be stored in the SEDM. Each statement id in Table 10 represents a single sensitive statement in the partitioned code. It also shows all the executed statements of the partitioned code. Table 11 shows the information on the bubbleSort function in the SEDM.

Fig. 10 The Partitioned Bubble Sort Application.

Code generation
In this section, we show the statement computations of the bubble sort application that will be executed inside and outside the enclave. As mentioned in the program partitioning stage, the sensitive data will be transmitted to the enclave side in an encrypted manner (i.e., E(SEDM)). Once the enclave receives E(SEDM) and verifies the execution environment, it will be able to decrypt it, D(E(SEDM)), using the corresponding decryption method. Under our security model scheme in Fig 4, we define an interface in the enclave that generates several arrays, where each array contains values of the same type and different arrays have different types. For expression and control flow statements, we use these arrays to store the actual values of each variable coming from bracket (1) and bracket (2) in the partitioned code, and from bracket (3) and bracket (4) for function call statements. Next, we store their positions in tuples instead of storing their actual values. To return the proper values to the user setting, we define three functions inside the enclave (see Fig 4). Based on the statement type stmttype, the functions stmtexp() and stmtcf() will be invoked from the main function mainstmtFn() and return the proper results. We compute all security-sensitive expression statements and control flow statements in the functions stmtexp() and stmtcf(), respectively.

Fig. 12 The 3-address code form of the security-sensitive method at line 21 in Fig. 10.
After that, we return all security-sensitive variables to the user setting. In the user setting, the function stmtreturn() in bracket (1) with its tuple in bracket (7) will be used to retrieve all expression and control flow values from E(SRDM). Meanwhile, we will use the funreturn() function in bracket (5) with its tuple in bracket (6) to retrieve the function call values from E(SRDM). Notice that fun(list) is nothing but bracket (4). These security-sensitive statements will be placed inside the enclave as shown in Table 11 for the bubbleSort function. The function funreturn() in bracket (5) will be used to read the return values that are generated inside the enclave for each security-sensitive function based on its statement id and function id. Note that the list of return values is created inside the enclave. For each return value, a tuple will be created in the Sensitive Returned Data Matrix (SRDM), which stores the values returned from the enclave to the user setting. Meanwhile, we encrypt the data matrix, E(SRDM), before transmitting it to the user environment. In the user environment, we decrypt the received data matrix, D(E(SRDM)), during program execution and pick the proper value for each function based on its statement id and function id in the tuple in bracket (6). The tuple in bracket (6) will be used to return the results of all the security-sensitive functions in the program to the user setting.

Quicksort application
The Quicksort application in Fig 13 implements Quicksort, a divide-and-conquer algorithm. It first divides a large list into two smaller sublists and then recursively sorts the two sublists. Our proposed solution can be applied to the Quicksort application as follows.

Data annotation
In this section, we assume that the variable "pivot" at line 16 in Fig 13 is a sensitive variable and that all other statements that interact with the variable "pivot" are security-sensitive statements. Therefore, the annotation process at line 16 in Fig 13 marks the content of the variable "pivot" as sensitive data. Likewise, the QuickSort() method at line 10 is considered a sensitive statement due to the information flow from the annotated variable to the other statements. For the same reason, all statements in the main method at lines 3, 4, 5, 6, 7, and 8 are considered sensitive statements.

Data flow analysis
Next, our proposed solution will recognize the annotated variable(s) as follows:

1-Static dataflow analysis
For analyzing and extracting sensitive variables in Fig 13, we will utilize the standard dataflow analysis algorithm mentioned in the first case study to extract all sensitive information based on the annotated variable(s) in the program.

2-Static forward slicing
In this phase, we perform forward slicing to find the subgraph of the PDG that contains all statements with a control or data dependence on the variable pivot (i.e., the annotated variable). As shown in Fig 14, the sliced statements are highlighted in yellow. We consider the highlighted statements as security-sensitive statements. Fig 13 and Fig 14 illustrate the QuickSort application and the partitioned one, respectively.

Fig. 15 The 3-address code of the sensitive method at line 10 in Fig. 14

Program partitioning and transformation
In this phase, we show how our proposed solution will store the sensitive variables and the rest of the code that will be deployed to the enclave and the untrusted area, respectively. After partitioning the QuickSort application, we perform a transformation process on the code, targeting the partitioned part in Fig 14. We replace each expression statement and control flow statement in the partitioned program with bracket (1) to extract all the security-sensitive variables from the code. We replace each function call statement with bracket (3) in order to extract all security-sensitive information from the method itself. Thus, we store all security-sensitive variables that we extracted from the expression statements, control flow statements, and function call statements based on bracket (1) and bracket (3) in the SEDM.
Table 12 records the first iteration of the application and shows how we will store the security-sensitive statements (the expression statements and control flow statements) in Fig 4 in the SEDM. Each statement id in Table 12 represents a single sensitive statement in the partitioned code. It also shows all the executed statements of the partitioned code in Fig 13. The entries stmtid(0), stmtid(1), stmtid(2), stmtid(3), stmtid(4), and stmtid(5) record the corresponding sensitive statements. Table 13 displays the information of the "quickSort" method in the SEDM.

Code generation
In this section, we show statements' computations of the QuickSort application that will be executed inside and outside the enclave. As mentioned before in the program partitioning stage, the sensitive data will be transmitted to the enclave side in an encrypted manner (i.e., E(SEDM)).
Once the enclave receives E(SEDM) and verifies the execution environment, it will be able to decrypt the data matrix (i.e., D(E(SEDM))) using the corresponding decryption method. According to the scheme of the security model in Fig 4, we define an interface in the enclave that generates several arrays, where each array contains values of the same type and different arrays have different types. For expression and control flow statements, we use these arrays to store the actual values of each variable retrieved from bracket (1) and bracket (2) in the partitioned code, and from bracket (3) and bracket (4) for function call statements.
In order to return the proper values to the user setting, we define three functions inside the enclave (see Fig 4). Based on the statement type stmttype, the functions stmtexp() and stmtcf() will be invoked from the main function mainstmtFn() and return the proper results. We compute all security-sensitive expression statements and control flow statements in the functions stmtexp() and stmtcf(), respectively. After that, we return all security-sensitive variables to the user setting. In the user setting, the function stmtreturn() in bracket (1) with its tuple in bracket (7) will be used to retrieve all expression and control flow values from E(SRDM). Meanwhile, we will use the funreturn() function in bracket (5) with its tuple in bracket (6) to retrieve the function call values from E(SRDM). The function funextract() in bracket (3) will be used to extract fun(list) based on funid. Notice that fun(list) is nothing but bracket (4). For the quickSort method, these security-sensitive statements will be placed inside the enclave as shown in Table 13. The function funreturn() in bracket (5) will be used to read the return values that are generated inside the enclave for each security-sensitive function based on its statement id and function id. Note that the list of return values is created inside the enclave. For each return value, a tuple will be created in the SRDM, which stores the values returned from the enclave to the user setting. Next, we encrypt the data matrix, E(SRDM), before transmitting it to the user environment. In the user environment, we decrypt the received data matrix, D(E(SRDM)), during program execution and pick the proper value for each function based on its statement id and function id in bracket (6). This tuple will be used to return the results of all security-sensitive functions in the program to the user setting. Fig 15 shows the 3-address code of the security-sensitive method at line 10 of the QuickSort application represented in Fig 14.
Note that 3-address code is an intermediate code used by optimizing compilers to aid in the implementation of code-improving transformations. Each 3-address instruction has at most three operands and is typically a combination of an assignment and a binary operator.
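As a small illustration of this form, the Java sketch below writes one compound expression both in its original shape and as a sequence of assignments that each apply a single binary operator, which is the shape that 3-address code takes. The class name, the method names, and the temporaries t0, t1, and t2 are hypothetical and are not taken from the figures.

```java
// Hypothetical sketch: one expression and its 3-address style decomposition.
final class ThreeAddressDemo {
    // Original compound expression.
    static int original(int low, int high) {
        return low + (high - low) / 2;
    }

    // Same computation, one binary operator per assignment.
    static int threeAddress(int low, int high) {
        int t0 = high - low; // t0 := high - low
        int t1 = t0 / 2;     // t1 := t0 / 2
        int t2 = low + t1;   // t2 := low + t1
        return t2;
    }
}
```

Because every instruction names its operands explicitly, each one can be mapped to a single extracted tuple.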

The proposed implementation
In this section, we briefly describe our proposed implementation and validation steps. The workflow of our proposed solution is shown in Fig 1. First, users mark certain variable(s) as security-sensitive sources in the Java program to be analyzed. We will use FlowDroid [45] to perform this task and consider the value tainted by the source as a slicing criterion. Figs 2 and 5 show how developers can mark a certain variable in the code and consider it as a slicing criterion. Once the source(s) are marked in the program, the Soot framework [48] will be used to analyze the original program and then transform it into another representation (i.e., the 3-address form). The Soot framework is an open-source Java-based compiler tool. The program analysis and transformation can be performed in the Jimple Transformation Pack (jtp) phase of Soot's execution. After this step, FlowDroid, a dataflow analysis tool that extends the Soot framework, will be used to perform static dataflow analysis and code partitioning. FlowDroid is a static data flow tracker. There is a certain similarity between the two concepts (data slicing and data flow tracking). In our work, we will use FlowDroid as follows. FlowDroid generates the main method from the list of entry points. This main method is then used to generate a call graph and an inter-procedural control-flow graph (ICFG). We then detect all sources that are reachable from the given entry points. Starting at these sources, the taint analysis tracks taints forward by traversing the ICFG. Thus, the value tainted by the source is considered as a slicing criterion. Each statement that transforms a taint abstraction can be seen as part of a code partition.
However, since FlowDroid is a taint tracker, it doesn't distinguish between statements that simply pass on taints (because they, e.g., do not reference the tainted value at all) and those that actively transform one taint abstraction into another. We may need to extend the implementation to build a graph of taint-transforming statements while computing the IFDS flow functions. In the end, FlowDroid reports all discovered flows from sources to sinks. Depending on the options the user has chosen either the whole path with all intermediate variables is displayed or only the source and the sink statement.
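To make the notion of a taint-transforming statement concrete, the following Java sketch propagates taint forward over straight-line 3-address statements and collects those that read a tainted variable. It is a simplified illustration under our own assumptions and does not use or reproduce the FlowDroid API: there is no ICFG, no aliasing, and no removal of taint on reassignment. The names TaintSketch and Stmt are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch: forward taint propagation over straight-line code.
final class TaintSketch {
    // A 3-address statement: one defined variable and the variables it reads.
    static final class Stmt {
        final int id;
        final String def;
        final List<String> uses;
        Stmt(int id, String def, List<String> uses) {
            this.id = id;
            this.def = def;
            this.uses = uses;
        }
    }

    // Returns the ids of statements that read a tainted variable, in order.
    static List<Integer> taintedStatements(List<Stmt> program, String source) {
        Set<String> tainted = new HashSet<>();
        tainted.add(source);
        List<Integer> partition = new ArrayList<>();
        for (Stmt stmt : program) {
            boolean readsTaint = false;
            for (String used : stmt.uses) {
                if (tainted.contains(used)) {
                    readsTaint = true;
                    break;
                }
            }
            if (readsTaint) {
                partition.add(stmt.id);
                // The defined variable now carries taint as well.
                tainted.add(stmt.def);
            }
        }
        return partition;
    }
}
```

Statements that never read a tainted variable are left out of the partition, which corresponds to the distinction the extension described above would need to draw.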
The enclave code will be implemented with the Intel SGX SDK. Therefore, all the modules on the enclave side will be written in C++. During the implementation, the developer should be aware of two main issues of Intel SGX. i) The limited memory size of 128 MB, plus 4 GB (but with a huge overhead). We encourage curious readers to address this issue by using paging support to go beyond that limitation, because the 128 MB limit comes from the BIOS itself. Notice that the Linux driver supports paging, but Windows does not. ii) The impossibility of executing system calls from within enclaves. System calls form the boundary between user and kernel space. Typically, user-space programs have no direct access to the hardware. Instead, a user-space program requests the operating system to allocate memory and perform I/O on its behalf. The system call interface governs the interaction between the operating system and user-space applications. One way to issue system calls in the presence of SGX is an enclave exit and re-entry for every system call. Besides the standard user/kernel transition, an enclave mode switch could be provided. The developer should note that system calls are disallowed in enclave mode. However, system calls are the standard way for any user-space application to request service from the privileged operating system kernel. Every useful program has to make system calls for external communication; for instance, reading from and writing to the disk and the network involve system calls. We encourage developers to refer to [49] to understand more about how to handle system call issues.
The modules in the untrusted environment will be executed in Java. The two parts (i.e., the Java side and the C++ side) will be linked with the Java Native Interface (JNI). We will convert some data types in Java into certain C++ types. For instance, we will convert types short, boolean, byte, and object into int type. For each object, we use its hash code (an integer) in C++.
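A minimal sketch of the type conversion on the Java side is given below. The mapping of short, boolean, and byte to int and the use of a hash code for objects follow the description above; the class name EnclaveTypeMapper, the encoding of true and false as 1 and 0, and the use of 0 for null are our assumptions. The JNI call that would pass the resulting int to the C++ side is omitted.

```java
import java.util.Objects;

// Hypothetical sketch: reduce Java values to int before crossing the JNI boundary.
final class EnclaveTypeMapper {
    static int toInt(Object value) {
        if (value instanceof Boolean) {
            return ((Boolean) value) ? 1 : 0; // assumed encoding of booleans
        }
        if (value instanceof Byte) {
            return ((Byte) value).intValue();
        }
        if (value instanceof Short) {
            return ((Short) value).intValue();
        }
        if (value instanceof Integer) {
            return (Integer) value;
        }
        // Any other object is represented by its hash code (0 for null).
        return Objects.hashCode(value);
    }
}
```

Since distinct objects can share a hash code, a real implementation would need an additional table to tell such objects apart.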

Comparisons
To illustrate the benefit of our proposed solution, we compare it with the most closely related works in terms of i) system design and ii) theoretical analysis. The system design and theoretical analysis of our proposed solution are inspired by the Glamdring framework [9] and the CFHider prototype system [21]. The Glamdring framework targets C/C++ applications and uses the static analysis functionality provided by the LLVM compiler to separate the code. Glamdring then automatically partitions the application into untrusted and enclave parts. Glamdring uses data flow analysis to identify functions that may be exposed to sensitive data, and it uses backward slicing to identify functions that may affect sensitive data. Glamdring then places the security-sensitive functions inside the enclave and adds runtime checks and cryptographic operations at the enclave boundary to protect it from attack.
CFHider protects the control flow confidentiality of programs: it places the control flow information in a data matrix that is transmitted to the enclave and transmits the other part of the program to the untrusted environment. CFHider combines program transformation with Intel SGX. It transforms the condition of each branch statement into a CFQ function call and moves its execution into the enclave, which is considered an opaque and trusted memory space.
However, our proposed solution differs from the aforementioned approaches in that it goes through four stages. The first three stages (the data annotation stage, the data analysis stage, and the program partitioning stage) are inspired by the design of the Glamdring framework. The fourth stage (the code generation stage) is inspired by the design of the CFHider framework. We propose a prototype system that aims to protect the data confidentiality of Java programs. Our solution will use the static analysis provided by FlowDroid. To perform this task, we consider the value tainted by the source as a slicing criterion. Users must first annotate the sensitive variable(s). Once the source(s) are marked in the program, the Soot framework will be used to analyze the original program and then transform it into another representation (i.e., the 3-address form). Our solution will partition the original program into a transformed program and the SEDM. The latter includes all security-sensitive variables. After the partitioning, the transformed program will be uploaded to and executed in the public cloud (i.e., the non-enclave area). The SEDM will be transmitted to and executed in an SGX enclave. On the enclave side, we perform the necessary computations for all security-sensitive statements based on the SEDM. In our future work, we will implement the proposed solution and compare it with related works in the field with respect to performance, evaluation, and execution time. Table 14 illustrates the comparison between the three approaches. Concerning data flow analysis, our approach applies to both forward and backward data flow analysis, while the Glamdring framework and the CFHider prototype are applicable to backward and forward data flow analysis, respectively. Our approach aims to protect the confidentiality of sensitive data as well as control flow confidentiality.
Glamdring protects the confidentiality and integrity of sensitive data, but it cannot protect program control-flow confidentiality. CFHider protects the confidentiality of control flow but not the program's sensitive data. Another factor is the programming language: the proposed approach and CFHider are applicable to most programming languages, while Glamdring was designed for C/C++. The last aspect of our comparison is the platform for which each approach was designed. Glamdring and CFHider were designed to run on SGX. Although we evaluate the proposed solution with SGX in this study, we claim that it is suitable for most TEEs.
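To make the annotation and partitioning stages of the proposed solution more concrete, the following is a minimal Java sketch of forward taint propagation over three-address statements. It does not use the FlowDroid or Soot APIs; `Stmt`, `PartitionSketch`, and `sensitiveStatements` are hypothetical names introduced only for illustration. A single pass is used, which is sufficient only for straight-line code (loops would need iteration to a fixed point), and the rule is deliberately conservative: any statement that reads or defines a tainted variable is marked as security-sensitive.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical three-address statement: def = op(uses...).
class Stmt {
    final String def;         // variable written by the statement
    final List<String> uses;  // variables read by the statement

    Stmt(String def, String... uses) {
        this.def = def;
        this.uses = Arrays.asList(uses);
    }
}

class PartitionSketch {
    // Forward taint propagation over straight-line code. Returns the indices of
    // statements that read or define tainted data; under the assumptions of this
    // sketch, these are the statements that would be handled on the enclave side.
    static List<Integer> sensitiveStatements(List<Stmt> program, Set<String> annotated) {
        Set<String> tainted = new HashSet<>(annotated);
        List<Integer> sensitive = new ArrayList<>();
        for (int i = 0; i < program.size(); i++) {
            Stmt s = program.get(i);
            boolean touchesTaint = tainted.contains(s.def);
            for (String u : s.uses) {
                if (tainted.contains(u)) {
                    touchesTaint = true;
                }
            }
            if (touchesTaint) {
                tainted.add(s.def); // the result now carries sensitive data
                sensitive.add(i);
            }
        }
        return sensitive;
    }

    // A small straight-line fragment loosely modeled on one binary search step.
    static List<Stmt> demoProgram() {
        return Arrays.asList(
            new Stmt("mid", "low", "high"), // 0: mid  = (low + high) / 2
            new Stmt("v", "arr", "mid"),    // 1: v    = arr[mid]
            new Stmt("c", "v", "key"),      // 2: c    = v - key
            new Stmt("next", "low"));       // 3: next = low + 1
    }

    public static void main(String[] args) {
        // The user annotates "arr" as the sensitive variable.
        Set<String> annotated = new HashSet<>(Arrays.asList("arr"));
        System.out.println(sensitiveStatements(demoProgram(), annotated)); // prints [1, 2]
    }
}
```

Here statements 1 and 2 depend on the annotated array and would be protected, while statements 0 and 3 touch only cleartext data and could remain in the transformed program that runs in the public cloud.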

Related work

TEE Infrastructure
TEEs isolate security-sensitive application logic from the operating system and other applications, and therefore protect applications by moving their confidential partition into the TEE. In general, TEEs can be used to reduce the impact of code injection attacks that attempt to steal an application's data, as is the case for inaudible data attacks [14][15][16], or to exfiltrate data residing in another TEE.
Some recent efforts [17][18][19] propose general solutions for standardizing TEE interfaces and protocols. However, most TEEs do not consider compartments of different types with different privileges. PrivateZone [20] presented a framework that enables individual developers to utilize TrustZone resources. With this framework, developers can run Security Critical Logics (SCL) in a Private Execution Environment (PrEE). This work relies on ARM TrustZone. ARM servers are emerging as a serious and competitive alternative to existing Intel and AMD servers [50].

Program data protection
CFHider [21] and E-CFHider [22] aim to protect control flow confidentiality in the public cloud setting. They move the conditions of branch statements into an opaque SGX enclave and inject fake branch statements to obfuscate the control flow. Wang and Wei [7] proposed runtime control flow obfuscation (RCFO) to protect the confidentiality of the control flow of outsourced programs. Some existing software-based methods, such as [23] and [24], cannot fully meet security, performance, and generality requirements at the same time. These two methods replace conditional instructions with lambda calculus and Turing machine simulations, respectively, which can defeat symbolic execution-based reverse-engineering attacks. Virtual Ghost [25] protects application memory from an untrusted operating system by extending the virtual machine monitor (VMM). Such works place their trust in the virtual machine monitor and are unable to protect against attackers with privileged access, such as system administrators.

Trusted hardware (SGX)
SGX provides a TEE, called an enclave, that protects the integrity of the code and the confidentiality of the data inside it from other software, including the operating system and the hypervisor. LightBox [26] utilizes SGX to build the first system that can drive off-site middleboxes at near-native speed with stateful processing and comprehensive data protection. SGX-Tor [27] presents a practical approach to enhancing the security and privacy of Tor by utilizing Intel SGX. EnclaveDB [28] is a database engine that guarantees confidentiality, integrity, and freshness for data and queries. Panoply [29] allows applications to be partitioned into multiple compartments and run across multiple enclaves, following the principle of least privilege. However, this approach is not easily applicable to complex applications such as databases. Oblix [6] is a search index for encrypted data that hides access patterns. It relies on a combination of novel oblivious-access techniques and recent SGX enclave hardware. Another study [30] designed a scheme that builds on existing software-based and hardware-based methods. Although the scheme is based on SGX, it leaks the access pattern.
Graphene [31,32] and SCONE [4] have demonstrated that whole applications can run inside enclaves by adding appropriate system support, such as a library OS or the C standard library, to the enclave. However, because they place all code inside the enclave, these approaches have a large trusted computing base (TCB), which violates the principle of least privilege. Ryoan [33] aims to protect the confidentiality of security-sensitive data. It provides a distributed sandbox and leverages SGX to protect sandbox instances from possibly malicious software. However, it does not protect the confidentiality of program control flow. VC3 [34] is a system that lets users run distributed MapReduce computations in the cloud while keeping their code and data in a secure area. VC3 depends on SGX to isolate memory regions on individual computers. Bahmani et al. [35] proposed a secure multi-party computation protocol in which one of the parties has access to SGX hardware and performs the bulk of the computation. Coppolino et al. [36] reviewed techniques for securing Java software with Intel SGX and selected some promising projects for an experimental comparison in terms of effort, security, and performance. The SERECA project [37,38,39] aims to remove technical impediments to secure cloud computing. It proposes to develop a secure environment for reactive cloud applications using Intel SGX.

Conclusions and future work
In this paper, we proposed a solution that can be applied to most TEE systems. Given the novelty and popularity of Intel SGX in this field, we chose SGX as the intended platform for protecting the data confidentiality of Java applications.
We described our proposed solution, a partitioning technique that helps developers combine program transformation, program partitioning, and TEE technologies to protect the security-sensitive data of applications. The solution uses static dataflow analysis to decide which security-sensitive statements must be protected. Specifically, it focuses on protecting the computations of the expression statements, control flow statements, and function call statements of applications in the public cloud setting. We analyzed real Java applications, i.e., the Binary Search, Bubble Sort, and QuickSort applications in Fig 2, Fig 5, Fig 6, Fig 9, Fig 10, Fig 13, and Fig 14, to show how the confidentiality of their security-sensitive data is protected.
Our future work is to implement the approach, evaluate it, and compare it with other works in the related field. This work will follow the workflow of the proposed approach shown in Fig 1 and the steps described in detail in the proposed implementation section. We will also further investigate program analysis mechanisms and partitioning techniques for efficient transformation. The proposed solution already gives us a better understanding of how to use program analysis, program transformation, and TEE technologies to protect the security-sensitive data of programs. We expect that this understanding will help us implement the proposed approach with a much lower performance overhead than existing software-based solutions.