March 2014
Outline Part 1
Motivation - Why main memory processing
What is main memory computing
SAP HANA overview
Architecture
Usage ( SQL, Hana studio)
Distribution
Scale out vs scale up
Data split
Parallelization
Outline Part 2
The insert/update problem : Delta table
data model
Data access ( insert only) / cost model
Data visibility
L2D the state of the art approach for a delta table
L1D - discussions
Transaction management
UDIV handling
MVCC
Tx lists
Distributed transactions
Consistency models incl. eventual consistency
Central operators
Joins ( i.e. semi join reducer)
Parallel aggregation
Sort
2013 SAP AG or an SAP affiliate company. All rights reserved.
Outline Part 3
Optimizer and query execution
Execution plans
Plan generation
Execution engine
Optimizer models
SQL versions ( SQL 92, 99 )
HW Technology Innovations
Multi-Core Architecture
(8 CPU x 15 Cores per blade)
Massive parallel scaling with many
blades
Compression
Partitioning
64bit address space 3-6 TB in
current servers
Dramatic decline in
price/performance
No Aggregate Tables
In-Memory computing
Use cache-conscious data-structures and algorithms
Programming against a new scarce resource
CPU
Core
CPU Cache
Performance bottleneck today:
CPU waiting for data to be
loaded from memory into cache
Main Memory
Disk
In-Memory computing
Challenges of In-memory Computing
Challenge 1: Parallelism! Take advantage of tens, hundreds of cores
Challenge 2: Data locality! Yes, DRAM is 100,000 times faster than disk
But DRAM access is still 4 to 60 times slower than on-chip caches
In-Memory computing
Delegation of data intense operations to the in-memory
computing
Today's applications
execute many data
intense operations in
the application layer
Application Layer
Data Layer
In-Memory computing
Delegation of data intense operations to the in-memory
computing
Traditional
In-Memory Computing
Application
Mass data
Database
Mass data
In memory
computing
Engine developments
SAP HANA
Software component view
SQLScript
SQL
Text Analytics
MDX
Other
Planning + Consolidation
Application Function
Libraries
Enterprise search
Relational Stores
Row based
Columnar
Managed Appliance
Appliance Packaging
SAP HANA
Deployment view
Single host configuration
Multi-node cluster configuration
Name Server
Index Server
Statistics Server
Preprocessor
Node 2: Index Server, Preprocessor, SAP Host Agent
Node n: Index Server, Preprocessor, SAP Host Agent
XS Engine
SAP HANA Studio Repository
SAP Host Agent
Software Update Manager
Shared persistency for fail-over and recovery
In-Memory computing
Security implications
Traditional: 3 tier architecture (Client, Application Server, Database)
Authorization is handled by application server
DB is accessed with technical user
Security is handled by application server
In-Memory Computing: 2 tier architecture (Client, HANA)
Authorization is handled by HANA
Security is handled by database
Applications writing
directly into SAP HANA
Row
Column
Data Stores
Persistency Layer
Savepoint
Logs
Backup/Restore
Backup
> SELECT a
FROM test.myTable;
Attribute
View
Calculation View
Table
Analytic View
Query
Statement Processor
Execution plan
SELECT FROM WHERE
Calculation Engine (operators, Op)
Data Stores
Views
Persistency Layer
Savepoint
Logs
Set Operations
Operation
Calculations on Data
R Procedure Calls
Column Store
stores tables by column
(Figure: a table with attributes Att1 to Att5 and tuples Tuple 1 to Tuple n.)
Database Technology
Row and column based storage for a table (principle)
Table
Country   Product   Sales
US        Alpha     3.000
US        Beta      1.250
JP        Alpha     700
UK        Alpha     450
Row Store
Row 1: US, Alpha, 3.000
Row 2: US, Beta, 1.250
Row 3: JP, Alpha, 700
Row 4: UK, Alpha, 450
Column Store
Country: US, US, JP, UK
Product: Alpha, Beta, Alpha, Alpha
Sales: 3.000, 1.250, 700, 450
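The difference between the two layouts can be sketched in a few lines of Python. This is only an illustration of the principle, not HANA's storage code: the flat list stands in for a contiguous memory region, the function names are made up for this example, and the sales figures are read as integers on the assumption that "3.000" uses a thousands separator.

```python
# Table from the slide: (Country, Product, Sales).
rows = [
    ("US", "Alpha", 3000),
    ("US", "Beta", 1250),
    ("JP", "Alpha", 700),
    ("UK", "Alpha", 450),
]

# Row store: all attributes of one tuple sit next to each other.
row_store = [value for row in rows for value in row]

# Column store: all values of one attribute sit next to each other.
column_store = {
    "Country": [row[0] for row in rows],
    "Product": [row[1] for row in rows],
    "Sales": [row[2] for row in rows],
}


def total_sales_row_store(flat, width=3, sales_offset=2):
    """Sum the Sales attribute with a strided walk over the row layout."""
    return sum(flat[i] for i in range(sales_offset, len(flat), width))


def total_sales_column_store(columns):
    """Sum the Sales attribute with one sequential scan of a single column."""
    return sum(columns["Sales"])


print(total_sales_row_store(row_store), total_sales_column_store(column_store))
```

Both functions return the same total, but the column layout touches only the Sales values, which is why aggregations over few attributes favor it.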
Database Technology
Multiple data storage methods: Column Store I
Classical DB (columns Company [CHAR50], Region [CHAR30], Group [CHAR5]; rows shown as Company, Region)
INTEL, USA
Siemens, Europe
Siemens, Europe
SAP, Europe
SAP, Europe
IBM, USA
Column store dictionaries
Company: 0 INTEL, 1 Siemens, 2 SAP, 3 IBM
Region: 0 Europe, 1 USA
Group: 0 A, 1 B, 2 C
Index Vector
Stored in one memory chunk
=> data locality for fast scans
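A minimal Python sketch of dictionary encoding for the Company column follows. It assigns value IDs in order of first occurrence, which reproduces the IDs on the slide; the function names are illustrative, and a production dictionary would typically be sorted, as the later string dictionary slides assume.

```python
def dictionary_encode(values):
    """Return (dictionary, index_vector) for a column of values.

    Value IDs are assigned in order of first occurrence (an assumption
    made here to match the IDs shown on the slide).
    """
    dictionary = []      # value ID -> value
    value_ids = {}       # value -> value ID
    index_vector = []    # one value ID per row
    for value in values:
        if value not in value_ids:
            value_ids[value] = len(dictionary)
            dictionary.append(value)
        index_vector.append(value_ids[value])
    return dictionary, index_vector


def dictionary_decode(dictionary, index_vector):
    """Materialize the original column from dictionary and index vector."""
    return [dictionary[value_id] for value_id in index_vector]


company = ["INTEL", "Siemens", "Siemens", "SAP", "SAP", "IBM"]
dictionary, index_vector = dictionary_encode(company)
print(dictionary)
print(index_vector)
```

The index vector holds small integers only, so it can be stored in one contiguous chunk and scanned without touching the strings.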
Database Technology
How Data is Mapped to Memory
conceptual view (table with rows A 10, B 35, ...)
mapping to memory
1. organize by row: A 10 B 35 ...
2. organize by column: A B C D E 10 35 40 12 ...
(values are laid out along increasing memory addresses)
Column C1, Column C2, Column C3, Column C4; each column consists of:
Data Array (Index Vector)
Dictionary (Dict Value Vector, Dict Index)
Inverted Index
Data array (value IDs by implicit row position)
Row position: 1 2 3 4 5 6 7 8 9 10 11 12 13 14
Value ID:     5 3 2 3 4 1 6 2 0 4  1  2  2  0
Dictionary (Value Vector with Dictionary Index) and Inverted Index
Value ID   Value           Row positions
0          Cupertino       9, 14
1          San Jose        6, 11
2          Palo Alto       3, 8, 12, 13
3          Dublin          2, 4
4          Fremont         5, 10
5          Oakland         1
6          San Francisco   7
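The relationship between data array, dictionary and inverted index can be sketched in Python as follows. The data is taken from the figure; the function names are assumptions for this example, and the linear `dictionary.index` call stands in for a real dictionary lookup.

```python
from collections import defaultdict

# Dictionary: value ID -> value.
dictionary = ["Cupertino", "San Jose", "Palo Alto", "Dublin",
              "Fremont", "Oakland", "San Francisco"]
# Data array: value ID per row, row positions are implicit and 1-based.
data_array = [5, 3, 2, 3, 4, 1, 6, 2, 0, 4, 1, 2, 2, 0]


def build_inverted_index(value_ids):
    """Map each value ID to the sorted list of row positions holding it."""
    index = defaultdict(list)
    for position, value_id in enumerate(value_ids, start=1):
        index[value_id].append(position)
    return dict(index)


inverted_index = build_inverted_index(data_array)


def rows_with_value(value):
    """Answer an equality predicate without scanning the data array."""
    value_id = dictionary.index(value)
    return inverted_index.get(value_id, [])


print(rows_with_value("Palo Alto"))
```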
Database Technology
Column Store: Dictionary Compression
References (value IDs by row position 1 to 14): 5 3 2 3 4 1 6 2 0 4 1 2 2 0
Dictionary (value IDs 0 to 6): Cupertino, San Jose, Palo Alto, Dublin, Fremont, Oakland, San Francisco
Bitwise decompression
Native C and SSSE / AVX
Performance / core over time (figure; timeline 2011, 2012, 2013, 20??)
Since 1999: 128-bit vectors
Intel AVX2: 256-bit wide integer vectors, FMA (2x peak flops), gather instructions
Future extensions
All products, computer systems, dates and figures specified are preliminary based on current expectations, and are subject to change without notice.
Intel microarchitectures code name: Sandy Bridge, Ivy Bridge and Haswell
Intel AVX2 completes the 256-bit extensions started with Intel AVX: 256-bit
integer, cross-lane permutes, gather, FMA
Unpack (figure): bit-packed values of a column C (..., 65537, 31455, 6, 128, 4, 4711, 100000, 0, 42, ...) stored with 17 bits each are unpacked into 32-bit integers (e.g. 128, 4711, 100000, 42)
31 different implementations for each n from 1 to 32
Source: Lemke, et al. Speeding up queries in column stores: a case for compression, DaWaK'10
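The unpack step can be illustrated with a scalar Python sketch. This is not the vectorized SSE/AVX2 kernel the slides discuss; it only shows the bit layout, under the assumption that values are packed back to back with the least significant bits first. The function names `pack` and `unpack` are chosen for this example.

```python
def pack(values, n):
    """Pack non-negative integers into n bits each, back to back."""
    buffer = 0
    for i, value in enumerate(values):
        if value >> n:
            raise ValueError(f"{value} does not fit into {n} bits")
        buffer |= value << (i * n)
    size_in_bytes = (len(values) * n + 7) // 8
    return buffer.to_bytes(size_in_bytes, "little")


def unpack(packed, n, count):
    """Extract `count` n-bit values with shift and mask."""
    buffer = int.from_bytes(packed, "little")
    mask = (1 << n) - 1
    return [(buffer >> (i * n)) & mask for i in range(count)]


values = [65537, 31455, 6, 128, 4, 4711, 100000, 0, 42]
packed = pack(values, 17)
print(len(packed), "bytes packed versus", 4 * len(values), "bytes as 32-bit integers")
print(unpack(packed, 17, len(values)))
```

A vectorized kernel performs the same shift-and-mask on several values per instruction, which is why one specialized implementation per bit width pays off.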
Assembly
vector load v from input array
byte shuffle v
vector shift v
vector and v
vector store v in output array
New variable shift instruction
(Figure: decoded integers/cycle, 0 to 3.5, plotted against Bit-Case #, 0 to 30, for Intel Streaming SIMD Extensions (Intel SSE) and Intel Advanced Vector Extensions 2 (Intel AVX2).)
Source: Willhalm et al. Vectorizing Database Column Scans with Complex Predicates. ADMS 2013
(Figure: decoded integers/cycle, 0 to 1.8, plotted against Bit-Case #, 0 to 30, for scalar code and Intel Advanced Vector Extensions 2 (Intel AVX2).)
Source: Willhalm et al. Vectorizing Database Column Scans with Complex Predicates. ADMS 2013
Intel Transactional Synchronization Extensions (Intel TSX)
Application with Coarse Grain Lock
scaling
Inverted index
Architecture & Technology
Inverted Index (value: row positions)
Cupertino: 9,14
San Jose: 6,11
Palo Alto: 3,8,12,13
Dublin: 2,4
Fremont: 5,10
Oakland: 1
San Francisco: 7
References (value IDs by row position 1 to 14): 5 3 2 3 4 1 6 2 0 4 1 2 2 0
Dictionary (value IDs 0 to 6): Cupertino, San Jose, Palo Alto, Dublin, Fremont, Oakland, San Francisco
1) Golomb coding is a lossless data compression method using a family of data compression codes invented by Solomon W. Golomb in the 1960s. Alphabets following a geometric distribution will have a Golomb code as an optimal prefix code, making Golomb coding highly suitable for situations in which the occurrence of small values in the input stream is significantly more likely than large values.
2) Rice coding (invented by Robert F. Rice) denotes using a subset of the family of Golomb codes to produce a simpler (but possibly suboptimal) prefix code. Rice used this set of codes in an adaptive coding scheme; "Rice coding" can refer either to that adaptive scheme or to using that subset of Golomb codes. Whereas a Golomb code has a tunable parameter that can be any positive integer value, Rice codes are those in which the tunable parameter is a power of two. This makes Rice codes convenient for use on a computer, since multiplication and division by a power of two can be implemented efficiently with bit shifts.
Value   Code (unary quotient, 2-bit remainder)
0       1 00
1       1 01
2       1 10
3       1 11
4       0 1 00
5       0 1 01
6       0 1 10
7       0 1 11
8       00 1 00
9       00 1 01
10      00 1 10
11      00 1 11
12      000 1 00
13      000 1 01
14      000 1 10
15      000 1 11
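The code table can be reproduced with a short Python sketch of Rice coding with k remainder bits (k = 2 here, i.e. a divisor of 4). It follows the convention visible in the table, where the quotient is written as that many zeros followed by a terminating one. The function names are illustrative, and the sketch assumes k >= 1 and non-negative input values.

```python
def rice_encode(value, k):
    """Encode a non-negative integer as a Rice code with k remainder bits."""
    quotient = value >> k                 # division by 2**k as a shift
    remainder = value & ((1 << k) - 1)    # modulo 2**k as a mask
    return "0" * quotient + "1" + format(remainder, "0{}b".format(k))


def rice_decode(bits, k):
    """Decode a single Rice code word produced by rice_encode."""
    quotient = bits.index("1")            # number of leading zeros
    remainder = int(bits[quotient + 1:quotient + 1 + k], 2)
    return (quotient << k) | remainder


for value in range(16):
    print(value, rice_encode(value, 2))
```

Small values get short codes, which suits the gaps between sorted row positions in an inverted index.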
Better compression of the inverted index (still 30-50% of complete column size)
Inverted index with block IDs (blocks of 4 row positions) instead of row positions:
Cupertino: 3,4
San Jose: 2,3
Palo Alto: 1,2,3,4
Dublin: 1
Fremont: 2,3
Oakland: 1
San Francisco: 2
References (value IDs by row position 1 to 14): 5 3 2 3 4 1 6 2 0 4 1 2 2 0
Dictionary (value IDs 0 to 6): Cupertino, San Jose, Palo Alto, Dublin, Fremont, Oakland, San Francisco
But: higher CPU costs and worse performance because of the additional block scan
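A hedged Python sketch of the block-based variant follows. The block size of 4 row positions is an assumption inferred from the example (positions 9 and 14 fall into blocks 3 and 4); the function names are made up for this illustration. The lookup shows where the additional block scan comes from.

```python
BLOCK_SIZE = 4  # assumed block size, inferred from the example data

data_array = [5, 3, 2, 3, 4, 1, 6, 2, 0, 4, 1, 2, 2, 0]


def block_of(position):
    """1-based block ID for a 1-based row position."""
    return (position - 1) // BLOCK_SIZE + 1


def build_block_index(value_ids):
    """Map each value ID to the distinct blocks in which it occurs."""
    index = {}
    for position, value_id in enumerate(value_ids, start=1):
        blocks = index.setdefault(value_id, [])
        block = block_of(position)
        if not blocks or blocks[-1] != block:
            blocks.append(block)
    return index


def lookup(value_ids, block_index, value_id):
    """Find exact row positions by scanning only the candidate blocks."""
    positions = []
    for block in block_index.get(value_id, []):
        start = (block - 1) * BLOCK_SIZE  # 0-based offset of the block
        for offset, candidate in enumerate(value_ids[start:start + BLOCK_SIZE]):
            if candidate == value_id:
                positions.append(start + offset + 1)
    return positions


block_index = build_block_index(data_array)
print(block_index[2], lookup(data_array, block_index, 2))
```

The index stores fewer and smaller entries, at the price of a scan over every candidate block per lookup.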
Database Technology
Dictionary definition
DEFINITION (STRING DICTIONARY).
A string dictionary is a read-only data structure that implements at least the following two
functions:
1) Given a value ID id, extract(id) returns the corresponding string in the dictionary.
2) Given a string str, locate(str) returns the unique value ID of str if str is in the
dictionary or the value ID of the first string greater than str otherwise.
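The definition can be sketched as a small Python class, assuming the dictionary is kept as a sorted array and value IDs are array positions. The class name and the uncompressed list are illustrative choices; real implementations combine this interface with the compression schemes on the following slides.

```python
from bisect import bisect_left


class SortedStringDictionary:
    """Read-only string dictionary backed by a sorted list (a sketch)."""

    def __init__(self, strings):
        self._values = sorted(set(strings))

    def extract(self, value_id):
        """Return the string stored under value_id."""
        return self._values[value_id]

    def locate(self, string):
        """Return the value ID of string, or the ID of the first greater string.

        If every stored string is smaller, the result equals the dictionary size.
        """
        return bisect_left(self._values, string)


cities = SortedStringDictionary(["Zagreb", "Abuja", "Accra", "Abu Dhabi", "Yerevan"])
print(cities.extract(0), cities.locate("Accra"), cities.locate("Berlin"))
```

Because the dictionary is sorted, value IDs preserve the order of the strings, so range predicates can be evaluated on IDs alone.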
Database Technology
Column Store: Delta Compression For Strings
Database Technology
Column Store: other string dictionary compression schemes
We can apply these compression schemes to two main dictionary data structures:
Database Technology
Column Store: string dictionary compression schemes
Inline Front Coding: In order to improve sequential access, a Front Coding variant stores the prefix lengths interleaved with the string suffixes.
Front Coding with Difference to First: In order to trade some space for speed, another Front Coding variant stores the suffixes differing from the first string of a block instead of the difference to the previous string. Hence decompression of a string essentially consists of two memcpys.
Fixed Length Array: For very fast access to small dictionaries, an array implementation that does not need pointers to the string data can be used. For each string, the same amount of space is allocated in a consecutive array.
Column-Wise Bit Compression: For columns with strings that all have the same length and a similar structure, a specific compression scheme can be used. Divide the dictionary into blocks. Then vertically partition each block into character columns, which are then bit compressed.
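As an illustration of Front Coding with Difference to First, the following Python sketch compresses one block of sorted strings. The tuple-based representation and the function names are assumptions made for readability; an actual implementation would store prefix lengths and suffixes in a byte array.

```python
import os


def compress_block(block):
    """Store the first string in full and the others as (prefix_len, suffix).

    The prefix length is always measured against the first string of the block.
    """
    first = block[0]
    entries = []
    for string in block[1:]:
        prefix_len = len(os.path.commonprefix([first, string]))
        entries.append((prefix_len, string[prefix_len:]))
    return first, entries


def extract(compressed, i):
    """Rebuild the i-th string of the block from two copies: prefix and suffix."""
    first, entries = compressed
    if i == 0:
        return first
    prefix_len, suffix = entries[i - 1]
    return first[:prefix_len] + suffix


block = ["Abu Dhabi", "Abuja", "Accra"]
compressed = compress_block(block)
print(compressed)
print([extract(compressed, i) for i in range(len(block))])
```

Any string in the block can be rebuilt without decoding its predecessors, which is the speed gained for the space given up.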
Sorted (String) Dictionary
0    Abu Dhabi
1    Abuja
2    Accra
...
n-2  Yerevan
n-1  Zagreb
CEFS: Cache-Efficient Function Store, a read-optimized data structure for dictionary indexing
Idea 1: When we use only hashed values, we can save space and comparisons
Idea 2: Tune for maximum cache-efficiency
Recursive data structure of multiple levels, each comprising an array of buckets
Perfect hash function
sigs: slot1, slot2, slot3, slot4
References Compression
Architecture & Technology
Database Technology
Compression of Value ID Sequence
Database Technology
Compression with run length encoding
Difficult to compress
Dictionary compressed
Table (columns Company [CHAR50], Region [CHAR30], Group [CHAR5]; rows shown as Company, Region)
INTEL, USA
Siemens, Europe
Siemens, Europe
SAP, Europe
SAP, Europe
IBM, USA
Dictionaries
Company: 0 INTEL, 1 Siemens, 2 SAP, 3 IBM
Region: 0 Europe, 1 USA
Group: 0 A, 1 B, 2 C
Run-length encoded value IDs (count x value ID)
Company: 1 x 0, 2 x 1, 2 x 2, 1 x 3
Region: 1 x 1, 4 x 0, 1 x 1
Group: 1 x 0, 1 x 1, 1 x 2, 3 x 0
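Run length encoding of a value ID sequence can be sketched in Python as below. The runs are written as (count, value ID) pairs to mirror the "count x value ID" notation of the slide; the function names are illustrative.

```python
from itertools import groupby


def rle_encode(value_ids):
    """Collapse consecutive equal value IDs into (count, value_id) pairs."""
    return [(len(list(run)), value_id) for value_id, run in groupby(value_ids)]


def rle_decode(runs):
    """Expand (count, value_id) pairs back into the value ID sequence."""
    return [value_id for count, value_id in runs for _ in range(count)]


company_ids = [0, 1, 1, 2, 2, 3]   # INTEL, Siemens, Siemens, SAP, SAP, IBM
region_ids = [1, 0, 0, 0, 0, 1]    # USA, Europe x 4, USA
print(rle_encode(company_ids))
print(rle_encode(region_ids))
```

The benefit depends on the row order: a column with long runs compresses well, while a column whose values alternate gains little.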
Decimal arithmetic
Some interesting algorithmic challenges
Decimal arithmetic
Decimals are defined as numbers with precision and scale
Decfloat can be implemented as either a floating-point number or as a fixed-point number. In the fixed-point case, the denominator would be set to a fixed power of ten. In the floating-point case, a variable exponent would represent the power of ten by which the mantissa of the number is multiplied.
With an integer representation, the big issues are rounding and overflow handling.
Example: a = 10.001 (with p=5, s=3)
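The fixed-point case can be sketched with plain integers in Python. The example value 10.001 with p=5 and s=3 is stored as the unscaled integer 10001. The function names and the round-half-up rule are assumptions chosen for this illustration, not a statement about how HANA rounds.

```python
def to_unscaled(text, scale):
    """Parse a decimal literal into an integer scaled by 10**scale."""
    sign = -1 if text.startswith("-") else 1
    whole, _, fraction = text.lstrip("+-").partition(".")
    if len(fraction) > scale:
        raise ValueError("more fractional digits than the scale allows")
    return sign * int(whole + fraction.ljust(scale, "0"))


def check_precision(unscaled, precision):
    """Overflow handling: the value must fit into `precision` digits."""
    if abs(unscaled) >= 10 ** precision:
        raise OverflowError("result needs more than %d digits" % precision)
    return unscaled


def add(a, b, precision):
    """Addition keeps the scale, only overflow must be checked."""
    return check_precision(a + b, precision)


def multiply(a, b, precision, scale):
    """The raw product has scale 2*s, so it is rounded back to scale s."""
    product = a * b
    divisor = 10 ** scale
    quotient, remainder = divmod(abs(product), divisor)
    if 2 * remainder >= divisor:      # round half up (assumed rounding mode)
        quotient += 1
    return check_precision(quotient if product >= 0 else -quotient, precision)


a = to_unscaled("10.001", 3)          # unscaled integer 10001
print(a, add(a, a, 5), multiply(a, to_unscaled("2", 3), 5, 3))
```

Squaring a already overflows: 10.001 * 10.001 = 100.020001 needs six digits at scale 3, so `multiply(a, a, 5, 3)` raises `OverflowError` under these settings.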
Parallelization / Distribution
Architecture & Technology
Vertical: concurrent processing on vertical partitions (disjoint set of columns)
Horizontal: concurrent processing on horizontal partitions (disjoint subset of rows)
Server 1
Server 2
Database Technology:
Parallelization and partitioning over multiple nodes
Full table (columns Product, Group, Color; two values per row survive)
10 red
20 blue
30 green
40 red
50 red
60 red
Partitions distributed over Node 1 and Node 2
Partition: 10 red, 20 blue, 30 green
Partition: 40 red, 50 red, 60 red
Database Technology:
Example for parallelization in a Column Store
Database Technology
Parallelization and horizontal partitioning over multiple nodes
Full table (columns Product, Group, Color; two values per row survive)
10 red
20 blue
30 green
40 red
50 red
60 red
Partitions distributed over Node 1 and Node 2
Partition: 10 red, 20 blue, 30 green
Partition: 40 red, 50 red, 60 red
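Horizontal partitioning with a parallel aggregation can be sketched in Python as follows. Threads stand in for the nodes of the slide, the row data is the part of the example table that survives, and all function names are assumptions for this illustration. Each partition is aggregated independently and the partial results are merged at the end.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

# (number, color) pairs from the example table.
rows = [(10, "red"), (20, "blue"), (30, "green"),
        (40, "red"), (50, "red"), (60, "red")]


def partition(table, parts):
    """Split the rows into `parts` contiguous horizontal partitions."""
    size = -(-len(table) // parts)  # ceiling division
    return [table[i:i + size] for i in range(0, len(table), size)]


def count_colors(part):
    """Local aggregation on one partition (one node in the slide)."""
    return Counter(color for _, color in part)


def parallel_count(table, parts):
    """Aggregate each partition concurrently, then merge the partial counts."""
    with ThreadPoolExecutor(max_workers=parts) as pool:
        partials = list(pool.map(count_colors, partition(table, parts)))
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total


print(parallel_count(rows, 2))
```

Only the small partial counts travel between workers, not the rows themselves, which is the point of pushing the aggregation down to each partition.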
Database Technology
Exploit multi-core architectures by parallelization of operations
Vertical: concurrent processing on vertical partitions (disjoint set of columns)
Horizontal: concurrent processing on horizontal partitions (disjoint subset of rows)
Server 1
Server 2
Alternative 1
Alternative 2
Serial Execution
Inter-Operator
Parallelism
Intra-Operator
Parallelism
Todo:
Range-aware algorithms
Parallelism controller
Adjust plan generator
Todo:
Parallel algorithms
New internal APIs
Parallelism controller
2)
function DICTIONARY:searchDict(value)
3)
4)
References
Ingo Müller, Peter Sanders, Robert Schulze, Wei Zhou. Retrieval and Perfect Hashing using Fingerprinting. SEA 2014, Copenhagen, Denmark, June/July 2014.
Ingo Müller, Cornelius Ratsch, Franz Faerber. Adaptive String Dictionary Compression in In-Memory Column-Store Database Systems. EDBT 2014, Athens, Greece, March 2014.
Thomas Willhalm, Ismail Oukid, Ingo Müller, Franz Faerber. Vectorizing Database Column Scans with Complex Predicates. ADMS 2013, Riva del Garda, Italy, August 2013.
Jonathan Dees, Peter Sanders. Efficient Many-Core Query Execution in Main Memory Column-Stores. ICDE 2013, Brisbane, Australia, April 8-12, 2013.
Sikka, V., Färber, F., Lehner, W., Cha, S. K., Peh, T., & Bornhövd, C. (2012). Efficient transaction processing in SAP HANA database. SIGMOD Conference (p. 731).
Färber, F., May, N., Lehner, W., Große, P., Müller, I., Rauhe, H., & Dees, J. (2012). The SAP HANA Database -- An Architecture Overview. IEEE Data Eng. Bull., 35(1), 28-33.
Färber, F., Cha, S. K., Primsch, J., Bornhövd, C., Sigg, S., & Lehner, W. (2011). SAP HANA Database - Data Management for Modern Business Applications. SIGMOD Record, 40(4), 45-51.
Lemke, C., Sattler, K.-U., Faerber, F., & Zeier, A. (2010). Speeding up queries in column stores: a case for compression. DaWaK 2010, 117-129.
More HANA publications at
http://scn.sap.com/docs/DOC-26787