
High Efficiency Counter Mode Security Architecture via Prediction and Precomputation

Weidong Shi Hsien-Hsin (Sean) Lee Mrinmoy Ghosh Chenghuai Lu Alexandra Boldyreva School of Electrical and Computer Engineering Georgia Institute of Technology

Outline

- Motivation
- Related Work
- Counter / Decryption Pad Prediction
- Profiling Prediction Failures
- Two-Level Prediction
- Context-Based Prediction
- Conclusions

Why Encrypt System Memory?


- Protect sensitive data stored in RAM (many simple devices can bypass OS memory protection and directly access physical memory)
- Digital Rights Management (the industry has gradually added encryption to each platform component: encrypted PCI-E, encrypted disks, encrypted flash memory, and now a move toward encrypted RAM)
- Anti-reverse-engineering (most software licenses forbid reverse engineering, but today vendors can only count on users keeping that promise)
- Military use (militaries are customers of encrypted FPGA chips, and a large amount of military software is embedded)
- Program randomization (intrusion prevention, CCS 2003)



Different Solutions
- A crypto engine, flash, and microcontroller: creates a small secure world with limited application scenarios (code signing, BIOS signature verification)
- A system-on-chip with processor core, cache, and crypto engine, where memory is on-chip: applies only to limited platforms such as small embedded systems (cell phones)
- Configurable system RAM encryption: supports more usage models



Related Work
- A dedicated cache (sequence number cache) to reduce the latency overhead of memory decryption (MICRO 2003)
- Prefetch-based memory pre-decryption (WASSA 2004)
- Prediction-based memory decryption (this paper): fully exploits the pre-computation capability enabled by counter-mode encryption, uses otherwise wasted idle crypto-engine pipeline stages for prediction and pre-computation, and has less area overhead than caching and less memory pressure than prefetch-based pre-decryption

Counter Mode - Encryption


[Figure: counter-mode encryption datapath. On a write-back from the processor's cache, the crypto engine feeds the key, the line's virtual address (VAddr), and the line's counter into the AES block cipher to produce encryption pads; each pad is XORed with a 16B chunk of the cache line to yield the encrypted 16B blocks stored in memory. Each line keeps its own counter (Counter, Counter+1, Counter+2 in the figure).]

Each memory line has its own counter; each time the line is updated (written back to memory), its counter is incremented.
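To make the datapath concrete, the following is a minimal Python sketch of the scheme above. The AES block cipher is stood in for by SHA-256 purely for illustration, and the helper names (pad_block, xor_line) are hypothetical; the point it demonstrates is that each pad depends only on the key, the line address, and the counter, never on the data, and that encryption and decryption are the same XOR.

import hashlib

BLOCK = 16  # bytes per cipher block; a 64B cache line would use four pads

def pad_block(key: bytes, vaddr: int, counter: int, block_idx: int) -> bytes:
    # Stand-in for one AES block-cipher invocation. The pad depends only on
    # the key, the line's virtual address, the per-line counter, and the
    # block index, and never on the data, so it can be computed in advance.
    seed = (key + vaddr.to_bytes(8, "little")
                + counter.to_bytes(8, "little")
                + block_idx.to_bytes(2, "little"))
    return hashlib.sha256(seed).digest()[:BLOCK]

def xor_line(key: bytes, vaddr: int, counter: int, line: bytes) -> bytes:
    # Counter-mode encryption and decryption are the same operation:
    # XOR each 16B chunk of the line with its pad.
    out = bytearray()
    for i in range(0, len(line), BLOCK):
        pad = pad_block(key, vaddr, counter, i // BLOCK)
        out += bytes(a ^ b for a, b in zip(line[i:i + BLOCK], pad))
    return bytes(out)

key = b"\x01" * 32
line = b"sensitive cache line contents..."                              # 32B example line
ciphertext = xor_line(key, 0x0000ff00, 0xabcddcba12344321, line)        # write-back: encrypt
plaintext = xor_line(key, 0x0000ff00, 0xabcddcba12344321, ciphertext)   # cache fill: decrypt
assert plaintext == line

Because the counter is incremented on every update, the same (address, counter) pair, and therefore the same pad, is never reused for different data.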

Counter Mode - Decryption

[Figure: counter-mode decryption datapath. On an L2 miss, the crypto engine feeds the key, the line's virtual address, and the line's fetched counter into the AES block cipher to regenerate the same encryption pads, which are XORed with the encrypted 16B blocks to recover the plaintext cache line.]

The counter has to be fetched for any memory line that misses the L2, so without prediction the decryption pad cannot be generated until the counter arrives.
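A sketch of this non-speculative path, reusing the hypothetical xor_line helper and example values from the previous sketch: pad generation is serialized behind the counter fetch, which is exactly the latency the prediction scheme targets.

def decrypt_on_l2_miss(key, vaddr, fetch_counter, fetch_encrypted_line):
    # Without prediction, no cryptographic work can start until the line's
    # counter has been read (from memory or a counter cache).
    counter = fetch_counter(vaddr)                         # counter fetch
    encrypted_line = fetch_encrypted_line(vaddr)           # data fetch (can overlap)
    return xor_line(key, vaddr, counter, encrypted_line)   # AES latency paid on top

# Toy memory model (hypothetical): dictionaries standing in for DRAM.
counter_store = {0x0000ff00: 0xabcddcba12344321}
data_store = {0x0000ff00: ciphertext}
assert decrypt_on_l2_miss(key, 0x0000ff00, counter_store.get, data_store.get) == line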

Counter Prediction
Counters exhibit both spatial and temporal coherence. To exploit spatial coherence, all memory blocks in the same page start counting from the same initial value, the page root counter.
[Figure: a TLB/page-table entry maps Page Base Addr 0x0000ff00 to a 64-bit Page Root Counter 0xabcddcba12344321. The per-line counters in the page diverge from the root according to update frequency: static data stays at 0xabcddcba12344321, infrequently updated data sits close by at 0xabcddcba12344325, and frequently updated data drifts far away (0xabcddcba12344e0a, 0xabcddcba123443f1).]
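The prediction itself is then just a small window of candidate counters anchored at the page root. A sketch in Python (hypothetical names; the depth of 5 matches the configuration evaluated later):

PREDICTION_DEPTH = 5

def predict_counters(page_root_counter: int, depth: int = PREDICTION_DEPTH):
    # Candidate counter values for any line in the page: lines never written
    # since the root was set still sit at the root, and lines written a few
    # times sit a small distance above it.
    return [page_root_counter + i for i in range(depth + 1)]

root = 0xabcddcba12344321
candidates = predict_counters(root)
print(0xabcddcba12344321 in candidates)   # True: static data
print(0xabcddcba12344325 in candidates)   # True: infrequently updated data
print(0xabcddcba12344e0a in candidates)   # False: frequently updated data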

Use Free Idle Pipeline Stages for Prediction

[Figure: timeline without prediction. While the memory pipeline is retrieving the counter value and the encrypted line, the AES pipeline sits idle; only after the counter arrives can it generate the decryption pad, and only then is the decrypted line available.]

The unrolled and pipelined AES decryption logic often stays idle for tens to hundreds of cycles while data that missed the L2 is fetched.

Use Free Idle Pipeline Stages for Prediction


Use the idle pipeline stages to generate decryption pads based on predicted counter values: a small window of look-ahead counter values derived from the page root counter. In the figure, E(K, G4) marks the pad for the correctly guessed counter.

[Figure: timeline with prediction. While the memory pipeline is retrieving the counter value and the encrypted line, the AES pipeline speculatively produces decryption pads for the predicted counters, starting with the first speculated decryption pad; when the counter arrives and one guess matches, the decrypted line is available almost immediately.]
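A sketch of this overlap, reusing the hypothetical pad_block, predict_counters, and example values from the earlier sketches: pads for every predicted counter are produced while the memory request is outstanding, and when the real counter arrives a matching pad is consumed immediately, otherwise the engine falls back to the non-speculative path.

def speculative_pads(key, vaddr, page_root_counter, blocks_per_line):
    # Work done during the memory-access latency: one full set of pads per
    # predicted counter value.
    return {guess: [pad_block(key, vaddr, guess, b) for b in range(blocks_per_line)]
            for guess in predict_counters(page_root_counter)}

def fill_line(key, vaddr, page_root_counter, actual_counter, encrypted_line):
    n_blocks = len(encrypted_line) // BLOCK
    pads = speculative_pads(key, vaddr, page_root_counter, n_blocks)
    if actual_counter in pads:
        line_pads = pads[actual_counter]          # prediction hit: pads already banked
    else:
        line_pads = [pad_block(key, vaddr, actual_counter, b)   # miss: pay AES latency
                     for b in range(n_blocks)]
    out = bytearray()
    for b, pad in enumerate(line_pads):
        chunk = encrypted_line[b * BLOCK:(b + 1) * BLOCK]
        out += bytes(x ^ y for x, y in zip(chunk, pad))
    return bytes(out)

assert fill_line(key, 0x0000ff00, root, 0xabcddcba12344321, ciphertext) == line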

Handle Frequent Updates


[Figure: each TLB entry is extended with a 16-bit Prediction History Vector alongside the Page Base Addr (0x0000ff00) and the 64-bit Page Root Counter (0xabcddcba12344321). The vector is a shift register fed by the counter value prediction logic, recording one bit per prediction (miss = 1, hit = 0), e.g., 0001001001100000. If total(miss) > threshold, the corresponding Page Root Counter is reset to a new number.]

This provides window-based dynamic tracking of the prediction rate for each page. For pages holding frequently updated memory blocks, as indicated by the prediction history vector, the root counter is reset to a new number, and all future write-backs count from that new number (a sketch follows below).
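A minimal sketch of this tracking as per-page state; the miss threshold and the choice of the new root value are assumptions made for illustration, since the slide only specifies that the root is reset to a new number.

HISTORY_BITS = 16
MISS_THRESHOLD = 8          # assumed value, treated here as a tuning knob

class PagePredictorState:
    def __init__(self, root_counter: int):
        self.root_counter = root_counter     # 64-bit page root counter
        self.history = 0                     # 16-bit shift register, miss = 1, hit = 0

    def record_prediction(self, was_miss: bool, observed_counter: int):
        self.history = ((self.history << 1) | int(was_miss)) & ((1 << HISTORY_BITS) - 1)
        if bin(self.history).count("1") > MISS_THRESHOLD:
            # Too many misses in the window: move the root forward so that this
            # page's frequently updated lines land back inside the prediction
            # window; future write-backs count from the new root.
            self.root_counter = max(self.root_counter, observed_counter) + 1
            self.history = 0

state = PagePredictorState(0xabcddcba12344321)
for miss in [True] * 9:                      # a burst of prediction misses
    state.record_prediction(miss, observed_counter=0xabcddcba12344e0a)
print(hex(state.root_counter))               # root has been reset past the hot counter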

Experiment Setup
Parameters and values:
- L1 I/D Cache: direct-mapped, 8KB
- L2 Cache: 4-way, unified, 256KB / 1MB
- Memory Bus: 200MHz, 8B wide
- CPU Clock: 1GHz
- AES Latency (256-bit): 64 pipeline stages in total, 1ns each
- Prediction Depth: 5
- Prediction History Window: 16 bits

SimpleScalar 3.0; SPEC2000 INT/FP benchmarks with high L2 miss rates. Prediction hit rates are studied over 8 billion instructions; IPC performance over 400 million instructions on a representative window.

Prediction Rate
[Charts: Prediction Hit Rate with a 256K L2 and with a 1M L2 across SPEC2000 benchmarks (ammp, applu, art, bzip2, gcc, gzip, mcf, mgrid, parser, swim, twolf, vortex, vpr, wupwise) and their average, comparing a 128K counter-number cache, a 512K counter-number cache, and prediction (Pred).]
Prediction hit rate over 8 billion instructions; no counter-number cache is used when prediction is enabled; prediction depth = 5. The average prediction hit rate is about 82-83%.


IPC
[Charts: Normalized IPC with a 256K L2 and with a 1M L2 across the same SPEC2000 benchmarks and their average, comparing 4K, 128K, and 512K counter caches against prediction (Pred).]
IPC is normalized to the scenario without decryption. Prediction generally outperforms the 128K counter cache and is, on average, on par with the 512K counter cache.


Prediction Miss
Reasons for prediction misses:
- The prediction depth is too small.
- The page root counter number was reset: memory lines whose counter values are based on the old page root counter cannot be predicted correctly using the new page root counter.

Solutions (details in the next few slides):
- Two-level prediction: divide the prediction depth into sub-ranges, increasing the effective prediction depth without issuing more predictions.
- Page root counter history memorization: predict using both the current page root counter and the previous root counter (only a marginal improvement).
- Context-based prediction: exploit the temporal coherence of accesses to memory locations with similar update frequencies.

Two-level Prediction

[Figure: the counter-number space in natural order is divided into four consecutive prediction windows, tagged 00, 01, 10, 11.]

Divide the prediction window into a power-of-two number of ranges. With 2 bits stored per line, this effectively quadruples the prediction depth. The overhead is about 2KB of on-chip memory for a 64-entry TLB. A sketch follows below.
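A sketch of the two-level scheme under the same hypothetical setup (PREDICTION_DEPTH and predict_counters from the earlier sketch): the 2 bits kept per line select one of four sub-ranges of a window four times as deep, so the predictor still issues the same number of guesses per access.

GUESSES_PER_PREDICTION = PREDICTION_DEPTH + 1   # same number of guesses as before

def two_level_candidates(page_root_counter: int, line_range_bits: int):
    # line_range_bits (2 bits per line, values 0b00..0b11) records which
    # quarter of the enlarged window this line's counter currently lives in.
    base = page_root_counter + line_range_bits * GUESSES_PER_PREDICTION
    return [base + i for i in range(GUESSES_PER_PREDICTION)]

root = 0xabcddcba12344321
# A counter 14 above the root misses the flat depth-5 window, but hits once
# the line's 2-bit tag says it is in the third quarter (0b10).
print(root + 14 in predict_counters(root))             # False
print(root + 14 in two_level_candidates(root, 0b10))   # True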

Context Based Prediction

[Figure: a single prediction window positioned within the counter-number space in natural order.]

Store the previous line's counter depth value (its offset above the page root counter) in a global context register. Generate new predictions based on the Page Root Counter plus the value in the Context Register. This can be combined with regular and two-level predictions, feeding all of the predictions into the decryption pipeline. A sketch follows below.
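A sketch of context-based prediction (hypothetical names, reusing PREDICTION_DEPTH and predict_counters from the earlier sketches): a single global context register remembers how far above its page root the previously accessed line's counter sat, and the next set of candidates is anchored at that offset.

context_register = 0     # offset (counter - page root) observed on the previous line

def context_candidates(page_root_counter: int, depth: int = PREDICTION_DEPTH):
    # Candidates anchored at the previous line's observed offset; these are fed
    # into the decryption pipeline alongside the regular (root-anchored) and
    # two-level candidates.
    base = page_root_counter + context_register
    return [base + i for i in range(depth + 1)]

def update_context(page_root_counter: int, observed_counter: int):
    # Called once the real counter value arrives with the memory line.
    global context_register
    context_register = observed_counter - page_root_counter

root = 0xabcddcba12344321
update_context(root, root + 640)                 # previous line had been written 640 times
print(root + 641 in context_candidates(root))    # True: similarly hot lines hit far from the root
print(root + 641 in predict_counters(root))      # False: far outside the regular window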


Why Does It Work?


A memory page of 128 lines, accessed by:

    while (1) {
        for all lines of the page: write to the line;
        for all lines of the page: read the line;
    }

Prediction miss rate on memory reads:
- Regular prediction (prediction depth = 4): 20% (for each line, 1 miss in every 5 reads)
- Context-based prediction: 0.1% (1 miss for every 128*5 reads)



Prediction Rate
[Charts: Prediction Hit Rate with a 256K L2 and with a 1M L2 across the SPEC2000 benchmarks and their average, comparing regular prediction, two-level prediction, and context + regular prediction.]
Over an 8-billion-instruction window, two-level prediction achieves about a 93% prediction hit rate, and context-based + regular prediction achieves almost 99%.


IPC
[Charts: Normalized IPC with a 256K L2 and with a 1M L2 across the SPEC2000 benchmarks and their average, comparing regular prediction, two-level prediction, and context + regular prediction.]

IPC is normalized to the scenario with no decryption; the best prediction scheme loses only 1-3% of performance.



Conclusions
- Counter value prediction allows decryption pads to be pre-computed speculatively without caching counter values.
- The spatial and temporal coherence of memory update frequencies enables effective counter value prediction.
- Prediction uses otherwise idle cycles of the pipelined decryption engine.
- Counter prediction achieves better performance than some of the large counter-cache settings.
- The technique is complementary to caching.


Questions

