








Peter T. Breuer
Hecusys LLC, Atlanta, GA, ptb@hecusys.com
Abstract. Privacy for arbitrary encrypted remote computation in the cloud depends on the code running on the server being obfuscated from the standpoint of the operator in the computer room. This paper shows that this can be arranged on a platform with the appropriate machine code architecture, given the obfuscating compiler described here.
A famous result of van Dijk et al. in the security arena reduces security for arbitrary computations in the cloud to obfuscation of the machine code on the server [9]. 'Obfuscation' means that access to the code running on the server gives no advantage, logical or statistical, to the hypothetical spy in the computer room [11]. The computation must take encrypted inputs and produce encrypted outputs, or the I/O could be read directly, but that can be arranged with the appropriate processor hardware. The argument of van Dijk et al. is that the remote user's program and the input for it could together be supplied as (encrypted) input to a virtual machine running on the server. Running the program securely in that environment means that the (encrypted) output must be returned as though from an impenetrable black box, and that is exactly what it means for the – perfectly accessible – code of the virtual machine to be 'obfuscated'.

There are processor contexts that effectively produce this kind of obfuscation. An example is the Ascend co-processor [10], which executes code in 'Fort Knox'-style physical isolation, with no operator access to it while it is running. The external observables, such as power consumption, timing of I/O (including memory accesses), cache hit/miss ratio, etc., that might leak information via side-channels [18, 19], obey statistics that are configured beforehand and have no correlation with the running program, apart from the run time. The machine code executable is also input in encrypted form. Since the operator's access to the running program is physically restricted to no more than would be the case for a black box implementation, the program is obfuscated by definition.

However, a classic result of Barak et al. affirms that obfuscation of programs is generally impossible [2]. Reconciling that with Ascend's evident physically-based obfuscation implies that assumptions about what the operator can see of the code while it runs, or what form the code may take, may not have universal relevance. It turns out that one has to go only a little way along the path of Ascend in securing the processor from the operator, yet obfuscation of the code on the server is technically and practically achievable while still allowing that:
The operator has the conventional full access to running code.
That means access (via a debug exception) to registers and memory, while the user's code may be single-stepped, repeated, retried, altered, etc. The 'magic' is in an appropriate machine code instruction architecture, as set out in Section 2. However, software toolchain support is needed, otherwise the code itself might obviate all protections. For example, it might include a trojan library function that takes one hour to complete if the answer is (encrypted) '1', and two hours if the answer is (encrypted) '2', which slowly generates a codebook that translates the encryption for the interested observer¹. Moreover, a human author will write code incorporating small numbers that can be guessed in a dictionary attack. A kind of obfuscating compiler is still required, and this paper provides it.

This compiler at each recompilation of the same program produces an executable that, in the code and at any point in the runtime trace, has arbitrarily and uniformly distributed differences in the data values (underneath the encryption) with respect to any other compilation. Otherwise the code and trace look the same, in particular the control flow visible in the trace. The idea is that a spy does not know which compilation is being run. Suppose hypothetically that the malfeasant operator has a method of determining what the encrypted output or trace data is under the encryption at any point. Since that is in fact something different under the encryption for every possible compiled executable, and the operator's method must see no difference between the different compilations or traces, because the differences that do exist are all in the (encrypted) constants and data that cannot be read by the operator, the putative method cannot work (Theorem 1, Section 2). For the argument to go through, there are two hardware requirements:

¹ A side-channel consisting of signalling via repeated accesses to the same memory location is also closed in Ascend by the use of oblivious RAM [13], which remaps the logical-to-physical address relation dynamically, maintaining aliases, so access patterns are statistically impossible to spot. It also masks programmed accesses in a sea of independently generated extra random accesses.
(1) Individual machine code instructions must be obfuscated.
That means that individual instructions are treated as Ascend treats an entire program, so they are individually as secure as an entire program is in Ascend. But, as in Ascend, the ‘I/O’ from the instruction must also be in encrypted form or it would be readable by the operator:
(2) Each machine code instruction must read and write data in encrypted form.
otherwise the operator, able to single-step the machine and with access to registers, can read every action. However, (1) and (2) are not necessarily without overlap, as obfuscation and encryption are equivalent in some contexts [1]. There are platforms that satisfy those requirements. HEROIC [17, 16] is a prototype 16-bit machine running with a Paillier (2048-bit) encryption [14]. Its core performs a single machine code addition instruction in 4000 cycles.
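HEROIC's approach rests on Paillier's additive homomorphism: the product of two ciphertexts modulo n² decrypts to the sum of the plaintexts, so an encrypted addition needs no decryption at all. The toy Haskell sketch below illustrates that property with an insecurely small modulus (n = 35); it is a textbook Paillier instance for illustration only, not HEROIC's implementation, and the names are invented here.

-- Toy Paillier over n = 5*7 = 35 (insecure, illustration only).
-- E(m, r) = (1+n)^m * r^n mod n^2 ; a ciphertext product decrypts to m1+m2 mod n.

n, n2, lam, mu :: Integer
n   = 35        -- n = p*q with p = 5, q = 7
n2  = n * n     -- n^2 = 1225
lam = 12        -- lambda = lcm(p-1, q-1)
mu  = 3         -- mu = lambda^{-1} mod n (12*3 = 36 = 1 mod 35)

-- Modular exponentiation by squaring.
powMod :: Integer -> Integer -> Integer -> Integer
powMod _ 0 m = 1 `mod` m
powMod b e m
  | even e    = half * half `mod` m
  | otherwise = b * powMod b (e - 1) m `mod` m
  where half = powMod b (e `div` 2) m

encrypt :: Integer -> Integer -> Integer   -- message, random blinding r
encrypt m r = powMod (1 + n) m n2 * powMod r n n2 `mod` n2

decrypt :: Integer -> Integer
decrypt c = ((powMod c lam n2 - 1) `div` n) * mu `mod` n

main :: IO ()
main = do
  let c1 = encrypt 4 2   -- a fresh blinding r each time varies the ciphertext
      c2 = encrypt 9 3
  print (decrypt (c1 * c2 `mod` n2))   -- 13: the addition happened on ciphertexts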
The situation is clarified by a particular machine code instruction set. Instruction sets are conventionally made up of instructions that (a) perform a relatively simple binary arithmetic operation, such as 'addition', on data in registers or in memory and return a result to registers or memory, instructions that (b) perform a binary comparison operation such as 'less than' between two values in registers or memory, setting or clearing a flag, plus (c) control instructions that alter which program instruction is executed next. By default that is the one with the next higher address in memory, but a jump instruction unconditionally alters the sequence, while a branch instruction alters the sequence conditionally on the value of the flag set by a previous comparison.

However, AMD and Intel in 2011-2013 introduced instruction sets that contain instructions (aa) that carry out a combination of two arithmetic operations at once, in a so-called fused instruction. They were introduced in the context of newer processors such as Intel's 'Knights Landing' Xeon Phi that contain many (72) cores on the one chip, each running with very wide integers (512 bits). While addition takes one cycle to complete on such processors, multiplication takes much longer (about ten cycles). Moreover, the repeating subunit that forms the multiplication logic multiplies two short integers and adds in two short incoming 'carry' integers from subunits 'right' and 'below' in a 2-dimensional array. The column and row of subunits at extreme 'right' and 'bottom' respectively may be used to feed two full integer addends into the calculation at no extra cost. Thus a 'fused multiply and add' (FMA) instruction was introduced in AMD's and Intel's FMA3 and FMA4 instruction sets for reasons of efficiency. Compilers emit FMA instructions instead of a single multiplication followed by an add, particularly in connection with parallel matrix- and tensor-oriented computation.

Denote by a fused anything and add (FxA) instruction architecture one in which the arithmetic instructions subtract constants k1, k2 from the operands x1, x2 and add a constant k3 into the result. So FxA multiplication does:
(x1 − k1) ∗ (x2 − k2) + k3
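The fused shape can be sketched concretely. The following toy Haskell model (names and types invented here; plaintext 32-bit wrap-around arithmetic standing in for what happens underneath the encryption) computes the FxA multiply above, and the FxA add with its single folded constant:

import Data.Int (Int32)

-- Fused multiply with offsets: subtract k1, k2 from the operands,
-- add k0 into the result (all arithmetic wraps mod 2^32).
fxaMul :: Int32 -> Int32 -> (Int32, Int32, Int32) -> Int32
fxaMul x1 x2 (k0, k1, k2) = (x1 - k1) * (x2 - k2) + k0

-- Fused add needs only one constant, since the three offsets
-- fold into k = k3 - k1 - k2.
fxaAdd :: Int32 -> Int32 -> Int32 -> Int32
fxaAdd x1 x2 k = x1 + x2 + k

main :: IO ()
main = do
  print (fxaMul 10 20 (5, 3, 4))   -- (10-3)*(20-4)+5 = 117
  print (fxaAdd 10 20 (-2))        -- 10+20-2 = 28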
A concrete example of an FxA instruction set dealing with encrypted data is illustrated in Table 1. Some instructions, the addition instruction for example, need only admit one constant addend, as
(x1 − k1) + (x2 − k2) + k3 = x1 + x2 + k    where k = k3 − k1 − k2
The constants (the k in the above) appear in encrypted form in the instruction, so cannot be read by the operator, who does not have the encryption key. The instructions manipulate encrypted data either, as in Ascend, by decrypting input and encrypting output, or, as in HEROIC, by making use of an encryption E with homomorphic properties such that the arithmetic may be carried out on the encrypted data as is. It only matters that the instructions are atomic. For FxA instructions to work securely, the hardware should also ensure requirement (3) below.
Table 1. An FxA machine code instruction set for working with encrypted data
fields                               kind  semantics
add  r0 r1 r2 [k]E                   (aa)  add               r0 ← [[r1]D + [r2]D + k]E
sub  r0 r1 r2 [k]E                   (aa)  subtract          r0 ← [[r1]D − [r2]D + k]E
mul  r0 r1 r2 [k0]E [k1]E [k2]E      (aa)  multiply          r0 ← [([r1]D − k1) ∗ ([r2]D − k2) + k0]E
div  r0 r1 r2 [k0]E [k1]E [k2]E      (aa)  divide            r0 ← [([r1]D − k1) / ([r2]D − k2) + k0]E
cmov r0 r1 r2                        (a)   conditional move  r0 ← flag ? r1 : r2
sfeq r1 r2 [k]E                      (b)   set flag if [r1]D = [r2]D + k, else clear it
sfne r1 r2 [k]E                      (b)   set flag if [r1]D ≠ [r2]D + k,  ”
sflt r1 r2 [k]E                      (b)   set flag if [r1]D < [r2]D + k,  ”
sfgt r1 r2 [k]E                      (b)   set flag if [r1]D > [r2]D + k,  ”
sfle r1 r2 [k]E                      (b)   set flag if [r1]D ≤ [r2]D + k,  ”
sfge r1 r2 [k]E                      (b)   set flag if [r1]D ≥ [r2]D + k,  ”
bf   j                               (c)   skip j instructions if flag set, else continue
bnf  j                               (c)   skip j instructions if flag not set, else continue
b    j                               (c)   unconditional skip of j instructions

Legend: the r are register indexes or memory locations, the k are 32-bit integers, the j are instruction address increments, and '←' is assignment. The function [·]E represents encryption and [·]D decryption of a value or register/memory content. Kind (a) are arithmetic instructions, (aa) fused arithmetic, (b) comparators, (c) control.
(3) There are no collisions between (i) encrypted constants [k]E that appear in instructions and (ii) runtime encrypted data values in registers or memory.
In practice that is done by introducing different padding or blinding factors into the encryptions for the two, and checking that in the processor pipeline. The idea is that the hardware should not allow constants that appear in instructions to work correctly when used as data inputs for arithmetic, and data generated at runtime should not work correctly as constants in instructions. That makes it impossible for a ‘spy in the computer room’ to pass the encrypted constants seen in other programs through the processor arithmetic, patching the results back into code snippets by way of experiment or tampering.
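One way the provenance check might be modelled is sketched below; the tagging scheme and all names are invented for illustration, with an integer payload standing in for the real (blinded) ciphertext and plaintext addition standing in for the atomic hardware operation:

-- Ciphertexts are tagged by provenance: constants embedded in
-- instructions versus data generated at runtime. The toy ALU
-- refuses to use one in the role of the other.
data Provenance = InstrConst | RuntimeData deriving (Eq, Show)

data Cipher = Cipher { tag :: Provenance, payload :: Int }
  deriving Show

-- FxA add: two runtime operands and one instruction constant.
-- Any collision of roles is rejected, closing the mix-and-match
-- experiments described in the text.
fxaAdd :: Cipher -> Cipher -> Cipher -> Either String Cipher
fxaAdd x1 x2 k
  | tag x1 /= RuntimeData || tag x2 /= RuntimeData = Left "operand is not runtime data"
  | tag k /= InstrConst                            = Left "addend is not an instruction constant"
  | otherwise = Right (Cipher RuntimeData (payload x1 + payload x2 + payload k))

main :: IO ()
main = do
  let a = Cipher RuntimeData 30
      b = Cipher RuntimeData 12
      k = Cipher InstrConst (-2)
  print (fxaAdd a b k)   -- Right: legitimate use
  print (fxaAdd a k b)   -- Left: a constant smuggled in as data is refused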
Theorem 1. There is no method by which the privileged operator can read a program C constructed using FxA instructions, nor deliberately alter it using those instructions to give an intended encrypted output.
Proof. Suppose for contradiction that the operator has a method f(T, C) = y of knowing that the output [y]E of C encrypts y, having observed the trace T. Now imagine that every number has 7 added to it under the encryption. Replace every FxA instruction of the form r0 ← (r1 − k1) Θ (r2 − k2) + k0 with r0 ← (r1 − k′1) Θ (r2 − k′2) + k′0, where k′i = ki + 7 for i = 0, 1, 2. Then the operations still make sense, operating on numbers that are 7 more than they used to be to produce a number that is 7 more than it used to be. The comparisons r1 R r2 + k in C still make sense, because they compare numbers that are 7 more than they used to be on both sides, so the offset cancels and every flag, hence the control flow, is exactly as before. The new code and trace differ from the old only in encrypted constants and data, which the operator cannot read, so the method f must return the same answer y as before, yet the output now encrypts y + 7: a contradiction.
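The proof's substitution can be played out numerically. In this sketch (plaintext arithmetic standing in for values underneath the encryption; names invented), shifting all three constants by 7 shifts the result by exactly 7, while the comparison, and hence the control flow, is unchanged:

import Data.Int (Int32)

-- One FxA multiply step, parameterised by its three constants.
step :: (Int32, Int32, Int32) -> Int32 -> Int32 -> Int32
step (k0, k1, k2) x1 x2 = (x1 - k1) * (x2 - k2) + k0

main :: IO ()
main = do
  let ks       = (11, 3, 5)
      ks'      = (11 + 7, 3 + 7, 5 + 7)   -- the k'_i = k_i + 7 of the proof
      (x1, x2) = (100, 200)               -- values under one compilation
      (y1, y2) = (x1 + 7, x2 + 7)         -- the same values, 7 more, under the other
  print (step ks x1 x2)                -- nominal result: 18926
  print (step ks' y1 y2)               -- exactly 7 more: 18933
  print (x1 < x2 + 40, y1 < y2 + 40)   -- comparisons agree, so flow is identical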
In compiling the conjunction C = A&&B, the relevant compiler coin tosses are:

(a) whether it lied or told the truth for A;
(b) whether it lied or told the truth for B;
(c) whether it will lie or tell the truth for C = A&&B.
The choice (a) corresponds to whether ∆A = 0 (truth) or ∆A = 1 (liar) is added mod 2 into the result for A. Similarly (b) corresponds to whether ∆B = 0 (truth) or ∆B = 1 (liar) is added into the result for B. Let a be true when (a) is set to tell the truth (∆A = 0), and false when (a) is set to lie (∆A = 1). Similarly for b with respect to (b), and c with respect to (c). Then what should be computed at runtime is:
c ↔ ((a ↔ A)&&(b ↔ B))
where the two-sided arrow stands for the boolean biconditional operator, the complement of exclusive or. That is
if a b c then A&&B        if a b c̄ then ¬(A&&B)
if a b̄ c then A&&B̄       if a b̄ c̄ then ¬(A&&B̄)
if ā b c then Ā&&B        if ā b c̄ then ¬(Ā&&B)
if ā b̄ c then Ā&&B̄       if ā b̄ c̄ then ¬(Ā&&B̄)
where the overline (or ¬) means boolean negation. The compiler knows a and b and chooses c with 50/50 probability, deciding which of A&&B, ¬(A&&B), etc., it will generate machine code for. All the generated codes look the same, modulo the encrypted constants, which are unreadable by the operator. If [A] is the compiled code for A and [B] is the compiled code for B, then the compiler produces in every case a machine code sequence [C]:
[A]; ia; bnf l; [B]; ib; l: ic
where if a is true (‘truth teller’) then for ia a machine code sequence is generated by the compiler that maintains the flag set by A that the bnf instruction tests. It would suffice to emit nothing, but it is required that the sequence look the same for all possible cases, and ‘nothing’ would be a give-away that that portion of the compilation has been carried out honestly. If a is false (‘liar’) then ia is a machine code sequence of the same length that flips the flag set by A, the compilation of A having been such that it deliberately gives the ‘wrong’ result. Whatever the details of the truth/liar compilation decisions in the internals of A and B, the code produced has the same length, so the length of the jump in the branch instruction (here represented via the assembler label l, not the numerical value later inserted) is always the same. Apart from the possibly flag-flipping inclusions ia, ib, ic, the sequence has the classical form that a compiler should emit for A&&B. In particular, the branch takes the ‘short circuit’ route to an early out if A (possibly flipped by ia) fails. The machine code for the ia, ib, ic is in each case one of the two possibilities:
bf l1; snf; b l2; l1: sf;  l2:   # keep flag
bf l1; sf;  b l2; l1: snf; l2:   # flip flag
where sf and snf are equal-length codes that look alike apart from encrypted constants. They respectively set and clear the flag that is tested by conditional branch instructions. They are:
sflt r0 r0 [1]E   # sf (set flag)
sflt r0 r0 [0]E   # snf (clear flag)
where r0 can be any register. The ‘1’ in sf can be any positive value, and the 0 in snf may be any non-positive value. Encryption in any case produces different values for [0]E and [1]E at every invocation, because of random padding/blinding. The way to compile the computational disjunction A || B is similarly
[A]; ia; bf l; [B]; ib; l: ic
replacing the bnf of the conjunction compilation with a bf instruction. To compile the source code constant expressions 'true' and 'false' the compiler arbitrarily emits the sf or the snf code, remembering its truth-teller/liar choice for the compilation of the expression of which those form a part. This compiler emits code that generates runtime values that are arbitrarily different in each register and memory location for each distinct compilation, yet the compiled codes (and the traces) look exactly the same, up to the encrypted constants (and runtime data), between one compilation and the next.
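The truth-teller/liar bookkeeping can be checked exhaustively at the boolean level. The hedged sketch below is a direct transcription of the biconditional formula, not the paper's compiler: for every one of the 2^5 combinations of lie bits and operand values, decoding the stored bit with c recovers A&&B.

-- Biconditional: the complement of exclusive or.
iff :: Bool -> Bool -> Bool
iff = (==)

-- Given lie bits a, b, c and true values vA, vB, the stored
-- subresults are a<->vA and b<->vB; the emitted code recovers the
-- real conjunction and re-encodes it under c.
stored :: Bool -> Bool -> Bool -> Bool -> Bool -> Bool
stored a b c vA vB =
  let sA = a `iff` vA   -- value the code for A actually leaves behind
      sB = b `iff` vB
  in c `iff` ((a `iff` sA) && (b `iff` sB))   -- what runs at runtime

-- The user, knowing c, decodes the stored bit back to vA && vB
-- for every one of the compiler's choices.
main :: IO ()
main = print (and [ (c `iff` stored a b c vA vB) == (vA && vB)
                  | a <- bs, b <- bs, c <- bs, vA <- bs, vB <- bs ])
  where bs = [False, True]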
The compiler works with a database D : DB = Loc → Int containing (32-bit) integer offsets indexed per register or memory location. As the compiler works through the source code, the offset represents by how much the data underneath the encryption is to vary from nominal at runtime at that point in the program. The compiler also maintains a database L : Var → Loc of the locations (registers, memory) for the source variable placements:
CL[· : ·] : DB × source code → DB × machine code
To simplify the presentation here, details of the management of database L are omitted. It is used to look up the location to which a source code variable corresponds. A pair in the cross product is written D : s here.
Sequence: The compiler works left-to-right through a source code sequence:
CL[D0 : s1; s2] = D2 : m1; m2
    where D1 : m1 = CL[D0 : s1]
          D2 : m2 = CL[D1 : s2]
The database D 1 that results from compiling the left sequent s 1 in the source code, emitting machine code m 1 , is passed in to the subsequent compilation of the right sequent s 2 , emitting machine code m 2 that follows on directly from m 1 in the object code file and its memory image when loaded.
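A skeletal Haskell rendering of the sequence rule is sketched below; the types are guesses at the shape of the real compiler (see footnote 2), with Data.Map standing in for the offset database D, a toy statement type for the source language, and a fixed offset where the real compiler would draw a fresh random one:

import qualified Data.Map as Map
import Data.Int (Int32)

type Loc  = String                -- register or memory location
type DB   = Map.Map Loc Int32     -- per-location offsets from nominal
data Stmt = Assign Loc Int32 | Seq Stmt Stmt   -- toy source language
type Code = [String]              -- stand-in for machine code

-- compile threads the offset database left to right, exactly as in
-- the rule CL[D0 : s1; s2] = D2 : m1; m2.
compile :: DB -> Stmt -> (DB, Code)
compile d (Seq s1 s2) =
  let (d1, m1) = compile d  s1
      (d2, m2) = compile d1 s2
  in (d2, m1 ++ m2)
compile d (Assign l v) =
  let delta = 7   -- in reality: a fresh random offset per compilation
      d'    = Map.insert l delta d
  in (d', ["store " ++ l ++ " <- [" ++ show (v + delta) ++ "]E"])

main :: IO ()
main = print (compile Map.empty (Seq (Assign "t0" 1) (Assign "t1" 2)))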
probability is p(y=Y) = Σ_{Y′} p(f(x+∆x)=Y′ ∧ ∆y=Y−Y′). The probabilities are independent (because I is only generated once by the compiler and ∆y is newly introduced for it), so that sum is p(y=Y) = Σ_{Y′} p(f(x+∆x)=Y′) p(∆y=Y−Y′). That is p(y=Y) = (1/2^32) Σ_{Y′} p(f(x+∆x)=Y′). Since the sum is over all possible Y′, the total of the summed probabilities is 1, and p(y=Y) = 1/2^32. The distribution of x+∆x in other locations is unchanged.
Another intuition is that ∆y has maximal entropy, so adding it in within an instruction swamps any other information the instruction might expose. The theorem states that code is obfuscated: having the machine code in hand does not tell the operator which of the many compilations of the program it might be, and the data under the encryption at runtime can vary arbitrarily and uniformly across recompilations without changing the trace or the code, as far as a malfeasant who does not have the encryption key is concerned.

Could FxA fused arithmetic machine code instructions, which may be costly at runtime, be done without? No; a single classical arithmetic machine code instruction without the addend would introduce a weakness: the operator has access – via a debugger, for example – to running code, so every machine instruction can be observed and experimented with. The action may be repeatedly observed to build up an encrypted arithmetic table. Then the operator might come across, for example, two encrypted additions 'a+b=c' and 'c+b=a'; substituting the first into the second gives a+2b=a, so 2b=0 (mod 2^32), and b=0 or b=2^31. Observing instead an FxA addition instruction only permits 2(b+k)=0 to be deduced, where k is an unknown extra addend, which says nothing about b. The argument of Theorem 2 works in a more general context:
Theorem 3. A correctly compiled program, in an instruction set managing encrypted data and satisfying requirement (1) 'obfuscated instructions' in context in the program, is such that the probability across different compilations that any particular runtime 32-bit value x for [x]E is in location l at any given point in the program is uniformly 1/2^32.
Proof. The argument of Theorem 2 may be repeated. Obfuscation ‘in context in the program’ means that an instruction I that nominally writes a value y to location l can by that hypothesis be varied by the compiler to write a value y + ∆y instead that is uniformly distributed across recompilations (while retaining correctness of the compilation).
Resuming the argument, the compiler is required to collaborate in producing the obfuscation by the hypothesis that each instruction in the program is obfuscated in context in the program. That means the output of each instruction cannot be predicted or guessed with any statistical accuracy, so it must vary uniformly across recompilations. Since processors are deterministic, that means the instruction must vary comprehensively but invisibly to the observer, by dint of some encrypted, hence effectively hidden, parameters that control its behaviour and which the compiler varies. The compiler defined in this section does that by controlling the offset ∆y in the result y of the instruction. By the well-known 'Shannon inequality' of information theory, when ∆y has maximal entropy the sum y+∆y has maximal entropy too, which is the desired result, as maximal entropy means a uniform probability distribution across the range.
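The uniformity behind the entropy argument can be seen on a toy scale: adding an offset that is independent and uniform on the whole range makes the sum uniform on that range, whatever the distribution of y. A minimal Haskell check, with mod 8 standing in for mod 2^32:

import Data.List (sort, group)

-- Count how often each residue of (y + dy) mod 8 occurs when dy
-- ranges uniformly over 0..7, for an arbitrarily skewed y.
main :: IO ()
main = do
  let ys     = [0, 0, 0, 1, 5]   -- any skewed distribution of y
      sums   = [ (y + dy) `mod` 8 | y <- ys, dy <- [0 .. 7] ]
      counts = map length (group (sort sums))
  print counts   -- all equal (5 each): the uniform offset swamps the skew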
The Ackermann function [15] written in C is as follows:
int A(int m, int n) {
    if (m == 0) return n+1;
    if (n == 0) return A(m-1, 1);
    return A(m-1, A(m, n-1));
}
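As a cross-check on the traced results quoted below, a direct Haskell transliteration (nothing here is from the paper's toolchain) gives A(2, 1) = 5 and A(3, 1) = 13:

-- Ackermann, transliterated from the C above.
ack :: Integer -> Integer -> Integer
ack 0 n = n + 1
ack m 0 = ack (m - 1) 1
ack m n = ack (m - 1) (ack m (n - 1))

main :: IO ()
main = print (ack 2 1, ack 3 1)   -- (5, 13), matching the traced runs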
It is a classic function for studying computationally complex behaviour. Every increment in its first argument produces an exponential increase in complexity with respect to the second argument: A(m, n) = 2^(2^(···^(2^(n+3))···)) − 3, a tower containing m − 2 exponentiations. The first exponential case is A(3, n) = 2^(n+3) − 3.

Compilation² produces the machine code in Table 2. Places where a 0 constant is nominal (instructions 5, 8, 14, 26, 29, 35) do not all contain a 0 (the literal 0s and 1s in the keep and flip flag macros are for recognisability in testing). Similarly for places where a 1 constant is nominal. At instruction 38, for example, the offset from the nominal value 1 in register t1 is −587705998 − 1 = −587705999, −587705998 being written, but two instructions later, at instruction 40, when 1 is written into the same register again, the offset from nominal is 1111468055 − 1 = 1111468054, 1111468055 being written. For testing, the offset of the return value from functions and the offsets for function parameters have been set at 0, but they may be arbitrary random values known only to the compiler. The entry and exit offsets for a complete executable program, on the other hand, must be known to the remote user who compiled the code and commissioned its execution.

The trace of a run for A(2, 1) = 5 is shown in Table 3, with the result in register v0 after 1104 steps. The A(3, 1) computation reaches the result 13 in 8288 steps. Although both the stack pointer (register sp) and the base addresses held in registers for load and store memory instructions (not shown in Table 2) are also obfuscated, so might possibly access invalid RAM addresses, that is not a problem in practice, because processors that work with encrypted data are engineered to cope with exactly that. Data addresses are data too and, being encrypted on those platforms, are distributed over the whole of the possible range, which might be only partially backed by physical RAM. RAM, on the other hand, like any other external local hardware device, must remain ignorant of the data it handles, which it sees only in encrypted form.
² Haskell source code for the compiler and a virtual machine, including this example, may be downloaded from http://nbd.it.uc3m.es/~ptb/obfusc comp-0 7.hs.
Table 3. Runtime trace (abridged) for the Ackermann function on (2,1), result 5.
PC   instruction                         update                 flag
...
4    add t0 a0 zer [-1704185953]E        t0 = [-1704185951]E    0
5    add t1 zer zer [2104023132]E        t1 = [2104023132]E     0
6    sfeq t0 t1 [486758211]E                                    0
7    bf 2                                                       0
8    sflt zer zer [0]E                                          0
9    b 1                                                        0
11   bf 2                                                       0
12   sflt zer zer [0]E                                          0
13   b 1                                                        0
15   bnf 9                                                      0
25   add t0 a1 zer [-989123886]E         t0 = [-989123885]E     0
26   add t1 zer zer [-580299623]E        t1 = [-580299623]E     0
27   sfeq t0 t1 [-408824263]E                                   0
28   bf 2                                                       0
...
102  add v0 t0 zer [-1566613208]E        v0 = [5]E              1
...
106  jr ra                                                      1
STOP
This paper has considered the privacy of remote encrypted computation with respect to the operator/administrator of the cloud-based server as adversary. A minimally encrypted machine code instruction set for computation on the server has been defined that fuses arithmetic operations with the addition of one or more constants. It has been shown that the instruction set architecture allows code, program traces and the (encrypted) data circulating in the processor to be accessible to the operator in the conventional way, while being formally private for the remote user. An 'obfuscating compiler' has been defined that provides uniformly distributed runtime variations across different recompilations at every point in the program trace. It supports the formal assurances of privacy by eliminating the possibility of attacks based on the likely use by a human author of small numbers in program or data. It does not contradict Barak et al.'s famous result that program obfuscation is impossible; rather, it constructs obfuscation for a reduced class of machine code programs: those that can be compiled for the special target architecture from the same high-level source by this compiler.