Outline
- Motivation
- Design
- Implementation
- Case Study
- Other Results
- Summary
Motivation

Web servers on the SPECWeb99 benchmark yield very low scores:
- 220: standard Apache
- 400: customized Apache
- 575: best published (Tux) on comparable hardware

However, these servers were designed to be fast.
Motivation

Web servers (+ full SPECWeb99) stress many subsystems:
- High throughput: CPU/network
- Large working sets: disk activity
- Dynamic content: multiple programs
- QoS requirements: latency-sensitive
- Workload scaling: overhead-sensitive
Motivation

Current Tradeoffs
- Statistical sampling (DCPI, VTune, OProfile): fast, online, but guesswork and inaccuracy
- Measurement calls (getrusage(), gettimeofday()): complete and detailed, but high overhead (> 40%)
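The measurement-call approach above can be sketched in a few lines; the bracketing pattern itself is what drives the overhead, since every measured call adds two more system calls. A minimal sketch (the timed call is an arbitrary choice):

```c
#include <assert.h>
#include <sys/time.h>
#include <unistd.h>

/* Microseconds elapsed between two gettimeofday() samples. */
static long long elapsed_usec(struct timeval start, struct timeval end) {
    return (end.tv_sec - start.tv_sec) * 1000000LL
         + (end.tv_usec - start.tv_usec);
}

/* The measurement-call pattern: bracket the call of interest with
 * gettimeofday().  Detailed and complete, but every measured call
 * now costs two additional system calls -- the source of the
 * > 40% overhead cited above. */
static long long timed_getpid(void) {
    struct timeval t0, t1;
    gettimeofday(&t0, NULL);
    (void)getpid();               /* the call being measured */
    gettimeofday(&t1, NULL);
    return elapsed_usec(t0, t1);
}
```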
Motivation

[Figure: measuring a system call with gettimeofday() pairs in user space vs. in-kernel measurement]
Design

Design Goals
- Correlate kernel information with application-level information
- Low overhead on useless data
- High detail on useful data
- Allow application to control profiling and programmatically react to information
Design
- Add profiling primitives into the kernel
- Return feedback with each system call
Design

DeBox Architecture

[Figure: the application issues res = read(fd, buffer, count); the kernel returns the filled buffer, errno, and DeBox Info with the call, and the application consumes the DeBox Info online or stores it for offline processing]
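The shape of this interface can be approximated in user space. The DeBoxInfo fields and read_with_debox() below are hypothetical names based on the slides; the real structure is filled by the kernel during the call, whereas this stand-in only wraps read() with wall-clock timing to show the intended call shape:

```c
#include <assert.h>
#include <sys/time.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical stand-in for the kernel-filled record; field names
 * follow the slides' "Basic DeBox Info" but are assumptions. */
typedef struct {
    int  syscall_num;
    long call_time_usec;
    int  num_page_faults;   /* the kernel fills this; always 0 here */
} DeBoxInfo;

/* Real DeBox returns this info in-band with the system call result;
 * here we approximate with user-space timing around the call. */
static ssize_t read_with_debox(int fd, void *buf, size_t count,
                               DeBoxInfo *info) {
    struct timeval t0, t1;
    gettimeofday(&t0, NULL);
    ssize_t res = read(fd, buf, count);
    gettimeofday(&t1, NULL);
    info->syscall_num = 3;  /* read() on many ABIs; illustrative only */
    info->call_time_usec = (t1.tv_sec - t0.tv_sec) * 1000000L
                         + (t1.tv_usec - t0.tv_usec);
    info->num_page_faults = 0;
    return res;
}
```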
Implementation

Sample DeBox Info for a single write() call (reconstructed from the slide figure):

Basic DeBox Info:
- 4: system call # (write())
- 3591064: call time (usec)
- 0: # of page faults
- 2: # of PerSleepInfo used

PerSleepInfo[0]:
- 1270: occurrences
- 723903: time blocked (usec)
- biowr: resource label
- kern/vfs_bio.c: file where blocked
- 2727: line where blocked
- 1 process on entry, 0 processes on exit

CallTrace (depth, time in usec):
- 0 3591064 write()
- 1 3590326 dofilewrite()
- 2 3586472 fo_write()
(the trace also shows holdfp() and fhold())
Implementation

In-kernel Implementation
- About 600 lines of code
- Instrument the scheduler: time, location, and reason for blocking; identify dynamic resource contention
- Full call trace with timings
Implementation
Case Study

SPECWeb99 Results

Experimental setup:
- Mostly 933MHz PIII machines
- 1GB memory
- Netgear GA621 gigabit NIC
- Ten PII 300MHz clients

[Figure: SPECWeb99 scores (up to 800-900) for the orig, VM Patch, and sendfile servers]
Case Study

[Figure: DeBox blocking counts by resource; inode: 28]
Case Study

Blocking is mostly locking in metadata operations:
- Direct all metadata calls to name-conversion helpers
- Pass open FDs using sendmsg()

[Figure: SPECWeb99 score vs. dataset size (0.12-2.64 GB) for orig, VM Patch, sendfile, and FD Passing]
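Passing an open descriptor from a helper back to the server uses sendmsg() with SCM_RIGHTS ancillary data. A minimal self-contained sketch (send_fd/recv_fd are illustrative names):

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <unistd.h>

/* Send one open file descriptor across a Unix-domain socket, the way
 * an FD-passing helper would hand a freshly opened file to the server. */
static int send_fd(int sock, int fd) {
    char byte = 0;
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union { char buf[CMSG_SPACE(sizeof(int))]; struct cmsghdr align; } u;
    struct msghdr msg;
    memset(&msg, 0, sizeof msg);
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = u.buf;
    msg.msg_controllen = sizeof u.buf;
    struct cmsghdr *cm = CMSG_FIRSTHDR(&msg);
    cm->cmsg_level = SOL_SOCKET;
    cm->cmsg_type = SCM_RIGHTS;          /* ancillary payload is an fd */
    cm->cmsg_len = CMSG_LEN(sizeof(int));
    memcpy(CMSG_DATA(cm), &fd, sizeof(int));
    return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
}

/* Receive a descriptor sent by send_fd(); returns the new local fd. */
static int recv_fd(int sock) {
    char byte;
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union { char buf[CMSG_SPACE(sizeof(int))]; struct cmsghdr align; } u;
    struct msghdr msg;
    memset(&msg, 0, sizeof msg);
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = u.buf;
    msg.msg_controllen = sizeof u.buf;
    if (recvmsg(sock, &msg, 0) != 1)
        return -1;
    struct cmsghdr *cm = CMSG_FIRSTHDR(&msg);
    if (cm == NULL || cm->cmsg_type != SCM_RIGHTS)
        return -1;
    int fd;
    memcpy(&fd, CMSG_DATA(cm), sizeof(int));
    return fd;
}
```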
Case Study

When blocking happens, call abort() (or fork + abort) to record the call path:
1. Which call caused the problem
2. Why this path is executed (application + kernel call trace)
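The fork + abort idea can be sketched as follows, assuming a hypothetical num_sleeps field that mirrors the "# of PerSleepInfo used" counter from the slides: the child's core dump preserves the exact call path while the parent keeps running.

```c
#include <assert.h>
#include <signal.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical slice of the per-call feedback record; num_sleeps
 * stands in for the "# of PerSleepInfo used" counter. */
typedef struct {
    int syscall_num;
    long call_time_usec;
    int num_sleeps;
} DeBoxInfo;

/* If the last call blocked, fork a child that aborts: the child's core
 * dump captures the full application call path at this exact point,
 * while the parent continues serving requests undisturbed. */
static void record_call_path(const DeBoxInfo *info) {
    if (info->num_sleeps > 0 && fork() == 0)
        abort();
}
```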
Case Study

Cold-cache miss path:
- Allow read helpers to open cache-missed files
- More benefit in latency

[Figure: performance vs. dataset size (0.12-2.64 GB) for orig, VM Patch, sendfile, and FD Passing]
Case Study

Call trace indicates where fork() spends its time:
- File descriptor copy: fd_copy()
- VM map entries copy: vm_copy()

[Figure: call time of fork() as a function of invocation]
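The fd_copy() cost can be demonstrated from user space: fork() must duplicate the descriptor table, so its latency grows as the process accumulates open descriptors. A rough illustration (timings vary by system; this is not the paper's measurement):

```c
#include <assert.h>
#include <sys/time.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Wall-clock cost of one fork()+waitpid(), in microseconds. */
static long fork_cost_usec(void) {
    struct timeval t0, t1;
    gettimeofday(&t0, NULL);
    pid_t pid = fork();
    if (pid == 0)
        _exit(0);                  /* child exits immediately */
    waitpid(pid, NULL, 0);
    gettimeofday(&t1, NULL);
    return (t1.tv_sec - t0.tv_sec) * 1000000L + (t1.tv_usec - t0.tv_usec);
}

/* fork() duplicates the descriptor table (fd_copy()), so its cost
 * grows with the number of open descriptors; inflate the table to
 * observe the effect.  Returns the cost with `extra` extra fds open. */
static long fork_cost_with_fds(int extra) {
    for (int i = 0; i < extra; i++)
        (void)dup(0);              /* leaked on purpose: a busy server */
    return fork_cost_usec();
}
```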
Case Study
- Similar
Case Study

sendfile() modifications:
- Cache pmap/sfbuf entries
- Return a special error for cache misses
- Pack header + data into one packet
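The "special error for cache misses" lets an event-driven server avoid blocking: if the data is not resident, the call fails immediately and the request is handed to a read helper (FreeBSD's sendfile() has a flag in this spirit, SF_NODISKIO, which returns EBUSY when data is not in memory). A stub-based sketch of the caller logic, with cached_sendfile() standing in for the real call:

```c
#include <assert.h>
#include <errno.h>
#include <string.h>

/* Stand-in for a sendfile() that returns a special error instead of
 * blocking when the file's data is not resident; cached_sendfile()
 * is a hypothetical stub, not the real system call. */
static int cached_sendfile(int data_resident) {
    if (!data_resident) {
        errno = EBUSY;             /* "data missing, would have blocked" */
        return -1;
    }
    return 0;                      /* sent without blocking */
}

/* Event-driven caller: serve from memory if possible, otherwise hand
 * the request to a read helper instead of stalling the main process. */
static const char *serve_request(int data_resident) {
    if (cached_sendfile(data_resident) == 0)
        return "sent";
    if (errno == EBUSY)
        return "queued-to-helper";
    return "error";
}
```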
Case Study

SPECWeb99 Scores

[Figure: SPECWeb99 score (0-900) vs. dataset size (0.12-2.95 GB) for orig, VM Patch, sendfile, FD Passing, Fork helper, No mmap(), New CGI Interface, and New sendfile()]
Case Study

Application changes:
- FD passing helpers
- Move fork into a helper process
- Eliminate the mmap cache
- New CGI interface

Kernel sendfile changes:
- Reduce pmap/TLB operations
- New flag to return if data is missing
- Send fewer network packets for small files
Other Results

[Figure: 3.3 GB dataset]
[Figure: 47x improvement]
[Figure: throughput (Mb/s), 3.3 GB dataset]
[Figure: 3.7x improvement]
Summary

DeBox:
- Low overhead on real workloads
- Fine detail on real bottlenecks
- Flexibility for application programmers

Case study:
- SPECWeb99 score quadrupled
- Up to 36% throughput gain on static workload
- Up to 112x latency improvement
- Results are portable
Thank you
SpecWeb99 Scores
- Standard Flash
- Standard Apache
- Apache + special module
- Highest 1GB/1GHz score
- Improved Flash
- Flash + dynamic request module
SpecWeb99 on Linux
- Standard Flash: 600
- Improved Flash: 1000
- Flash + dynamic request module: 1350
- 3.0GHz hardware