Let P1 and P2 be two processes running on the same CPU core. Both use some function Fs from a shared library, so the physical memory frame containing Fs is shared by P1 and P2.
Now let’s say P1 starts executing. It fetches a large number of garbage instructions from memory, which fills all the cache lines with garbage. At this point, P1 is certain that Fs is not in the cache.
Now it’s time for P2 to execute. If P2 needs Fs, it will face a cache miss, so the frame containing Fs will be brought into the cache for execution. After executing for some time, a context switch happens again.
Now P1 comes online. This time P1 fetches Fs and monitors the timing. If P2 used Fs in the previous cycle, P1 will get a cache hit, taking less time to execute the instructions in Fs. Otherwise, P1 will face a cache miss, which takes relatively longer. So by analysing the timing information, P1 can determine whether P2 used Fs or not.
This is a kind of cache side-channel attack; the side channel here is time. There are small variations in this information-retrieval procedure, but the gist is the same.
Now, this is not always a problem. It does not matter if another process knows my process is using a rendering library, some maths library, etc.; it cannot harm or alter the execution of my process with that information. But it becomes a serious issue when someone can recover secret cryptographic keys using this technique. Let us see an example.
Recovering a cryptographic key-
The above figure (1) is an example of an RSA algorithm implementation. The di’s are the digits of the secret key, which must stay private to the program executing this algorithm. Such algorithms are used by millions of programs, so OS’s generally provide a shared library containing the algorithm’s instructions. That way, the programs using it do not each need their own copy of the instructions, reducing the overall memory usage of the system.
Here, the above figure (1) is our shared library (SL). Suppose P1 is executing the algorithm with its own secret di’s. Let’s see how P2 can recover the values of the di’s -
- P1 executes for some time.
- Context switch.
- P2 flushes the cache, or loads some random useless data into the cache, so that the cache no longer contains the memory blocks corresponding to lines 6 & 7 of figure 1.
- P2 starts waiting for some time - context switch.
- P1 starts executing. Now there are two possibilities.
- If the current value of di is ‘0’ then P1 will fetch the memory blocks of lines 6 & 7 from RAM to cache and execute.
- If the current value of di is ‘1’ then P1 will execute lines 9 & 10. So, lines 6 & 7 are still absent in the cache.
- Context switch happens.
- P2 loads the instructions on lines 6 & 7.
- If the instructions run relatively fast (faster than a predefined threshold), it is a cache hit. So P1 must have used those instructions in the previous cycle, meaning the value of di in that cycle was ‘0’.
- If the instructions run relatively slow, it is a cache miss. So P1 did not use those instructions in the previous cycle, meaning di was ‘1’.
By following the above procedure, P2 can recover P1’s secret key.
We have simulated this attack in a C program. One function (vctm_proc, simulating the victim process) generates a 16-bit key in its personal block; the key is never passed to anyone. Despite that, another function (atkr_proc, simulating the attacker) is able to retrieve the key through timing analysis. The output looks something like this,
In practical computing environments, due to background noise (multi-core CPUs, multi-level caches, other running processes, and the OS itself), executing a timing-based side-channel attack is more complicated than this: the attacker needs to analyze timing data statistically over many execution cycles. But the basic principle is the same.
Here we have taken the situation most favorable to the attacker, so the attacker can retrieve the key in a single attempt. Our argument is that if we can prevent attackers from retrieving keys in this most favorable situation, the key is safe in all other possible scenarios.
So how do we do it?
Let's first analyze where the problem begins. If the execution of a block of code in the shared library does not depend on the value of the key, the attacker learns nothing beyond the fact that the victim process uses that library, which is not a problem here. But when a block of code in the shared library is executed conditionally, and the condition depends on the value of the key, the attacker gains extra information.
Two terms are important here: key-dependent block and key-independent block.
Now, our solution is simply to avoid sharing the key-dependent blocks between processes: provide each process with its own copy of the dependent blocks, in separate physical locations.
With static libraries, the library code is copied into each program that uses it at compile time, which takes a lot of memory. On the other hand, shared libraries pose the security loophole described above in some cases. So we have chosen a middle ground.
Programmer’s perspective-
It is left in the hands of the programmers of the shared library to decide which parts/functions of the library may reveal secrets and which parts are safe to share. The parts that should not be shared are marked with a keyword, say, “personal”, before the function name. It will look like this,
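For illustration (the `personal` keyword is this proposal's hypothetical annotation, not standard C, and the function names here are ours), a library header might read:

```c
/* crypto_lib.h -- hypothetical syntax for the proposed annotation */

/* key-dependent: each process gets a private copy of this code */
personal uint64_t modexp(uint64_t base, uint64_t key, uint64_t n);

/* key-independent: shared between processes as usual */
uint64_t mont_reduce(uint64_t x, uint64_t n);
```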
Now, the programs that use this library do not get any copy of the library code at compile time. So, unless a program is executed, there is no memory overhead.
When a program is loaded into memory for execution, the instructions of the personal functions in the shared library have to be copied into a separate memory block and mapped into the virtual memory of the process by manipulating its page table. This job is done by the loader with the help of the kernel.
A similar simulation using the personal function method shows the effectiveness of this technique,
Memory overhead-
Let a shared library have N instructions, of which an m fraction is defined as personal functions. So the number of personal instructions is N * m, and the number of shared instructions is N * (1 - m).
Let there be P programs using the shared library on a computer, of which a p fraction is currently executing. So the total number of in-memory copies of the personal functions is P * p.
So total memory being used by the library is,
\(P\cdot p\cdot N\cdot m+N\cdot (1-m)\) instruction length
For comparison purposes, using only static libraries will take up space,
P * N instruction length (put m = 1 and p = 1 in the previous expression)
And using only a shared library without personal functions will take,
N instruction length (put m = 0 in the previous expression)
Here is how they compare graphically. The X axis is P and the Y axis is memory used:
- Red: static library only.
- Blue: shared library without personal functions.
- Green: shared library with personal functions.
Now, if there are n libraries in total in the system, the overall memory consumption is,
\(\sum_{i=1}^{n}\left(P_i\cdot p_i\cdot N_i\cdot m_i+N_i\cdot (1-m_i)\right)\) instruction length