
HANA database lectures

March 2014

Outline Part 1
Motivation - Why main memory processing
What is main memory computing
SAP HANA overview
 Architecture
 Usage (SQL, HANA Studio)

Main memory column store
 Row vs. column store
 Data model
 Basic operations (C++ scan, AVX/SSE scan)
 Compression (references, dictionary, index)

Distribution
 Scale out vs. scale up
 Data split
 Parallelization

2013 SAP AG or an SAP affiliate company. All rights reserved.

Outline Part 2
The insert/update problem: delta table
 Data model
 Data access (insert only) / cost model
 Data visibility
 L2D: the state-of-the-art approach for a delta table
 L1D: discussions

Transaction management
 UDIV handling
 MVCC
 Tx lists
 Distributed transactions
 Consistency models incl. eventual consistency

Central operators
 Joins (e.g., semi-join reducer)
 Parallel aggregation
 Sort

Outline Part 3
Optimizer and query execution
 Execution plans
 Plan generation
 Execution engine
 Optimizer models
 SQL versions (SQL-92, SQL:1999)

Persistency & delta
 Mapping from main memory structures to persistency pages (PAX)
 Logging
 Shadow page concept

Text & GIS extensions
 Text data model & operations
 GIS data model & operations

YediDB: a first prototype in Lua

Why main memory processing


Example 1: multithreading
 Generate multiple threads that all add to one global (atomic) variable
 Compare against thread-local counters that are summed up afterwards
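A minimal C++ sketch of Example 1, assuming C++11 threads (build with -pthread); the function names countShared and countLocal are hypothetical. Both functions return the same total; wrapping each call in a std::chrono::steady_clock measurement shows the cost of contention on the shared atomic.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Variant A: all threads increment one shared atomic counter, so every
// increment contends for the same cache line.
long long countShared(int numThreads, long long incrementsPerThread) {
    assert(numThreads > 0);
    std::atomic<long long> counter{0};
    std::vector<std::thread> workers;
    for (int t = 0; t < numThreads; ++t) {
        workers.emplace_back([&counter, incrementsPerThread] {
            for (long long i = 0; i < incrementsPerThread; ++i)
                counter.fetch_add(1, std::memory_order_relaxed);
        });
    }
    for (auto &w : workers) w.join();
    return counter.load();
}

// Variant B: each thread counts in a local variable and writes its partial
// result once; the partial results are summed up after all threads finish.
long long countLocal(int numThreads, long long incrementsPerThread) {
    assert(numThreads > 0);
    std::vector<long long> partial(numThreads, 0);
    std::vector<std::thread> workers;
    for (int t = 0; t < numThreads; ++t) {
        workers.emplace_back([&partial, t, incrementsPerThread] {
            long long local = 0;
            for (long long i = 0; i < incrementsPerThread; ++i) ++local;
            partial[t] = local;   // a single shared write per thread
        });
    }
    for (auto &w : workers) w.join();
    long long total = 0;
    for (long long p : partial) total += p;
    return total;
}
```

An optimizing compiler may fold the loop in countLocal into a single addition, so a real benchmark should do some per-iteration work in both variants.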

Example 2: cache line effects
 Generate multiple threads that each update their own variable, but where these variables share cache lines
 Compare against the same variables placed on separate cache lines
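A sketch of Example 2 under two stated assumptions: 64-byte cache lines (kCacheLine) and C++17, so that std::vector honours alignas on its elements. PackedCounter places several per-thread counters in one cache line (false sharing), while PaddedCounter gives each counter its own line. All names are hypothetical; timing runCounters<PackedCounter> against runCounters<PaddedCounter> shows the effect.

```cpp
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Assumption: 64-byte cache lines (typical for current x86 CPUs).
constexpr std::size_t kCacheLine = 64;

// Adjacent counters: several threads' counters end up in one cache line.
struct PackedCounter { volatile long long value = 0; };

// Each counter is aligned to, and padded up to, a full cache line.
struct alignas(kCacheLine) PaddedCounter { volatile long long value = 0; };

template <typename Counter>
long long runCounters(int numThreads, long long incrementsPerThread) {
    assert(numThreads > 0);
    std::vector<Counter> counters(numThreads);   // one private counter per thread
    std::vector<std::thread> workers;
    for (int t = 0; t < numThreads; ++t) {
        workers.emplace_back([&counters, t, incrementsPerThread] {
            for (long long i = 0; i < incrementsPerThread; ++i)
                counters[t].value = counters[t].value + 1;   // volatile: every store happens
        });
    }
    for (auto &w : workers) w.join();
    long long total = 0;
    for (const auto &c : counters) total += c.value;
    return total;
}
```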

Example 3: memory locality
 Create an array of fixed-size strings (e.g., 10 bytes each) and do a full table scan
 Build the array either with the strings stored in place or with pointers to separately allocated strings
 Compare the two versions
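A sketch of Example 3 (C++14 for std::make_unique); all names are hypothetical. buildInPlace stores the 10-byte strings contiguously, buildPointers allocates each string separately and keeps only pointers, and both scans count the rows equal to a search key.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <cstring>
#include <memory>
#include <string>
#include <vector>

constexpr std::size_t kLen = 10;                 // fixed string size in bytes
using FixedString = std::array<char, kLen>;

// Copy up to 10 bytes of s into a zero-padded fixed-size string.
FixedString toFixed(const std::string &s) {
    FixedString f{};
    std::memcpy(f.data(), s.data(), std::min(kLen, s.size()));
    return f;
}

// Variant A: strings stored in place, one contiguous block of n * 10 bytes.
std::vector<FixedString> buildInPlace(const std::vector<std::string> &values) {
    std::vector<FixedString> rows;
    rows.reserve(values.size());
    for (const auto &v : values) rows.push_back(toFixed(v));
    return rows;
}

// Variant B: an array of pointers; every string is a separate heap object.
std::vector<std::unique_ptr<FixedString>> buildPointers(const std::vector<std::string> &values) {
    std::vector<std::unique_ptr<FixedString>> rows;
    rows.reserve(values.size());
    for (const auto &v : values) rows.push_back(std::make_unique<FixedString>(toFixed(v)));
    return rows;
}

// Full table scans: count the rows equal to key.
std::size_t scanInPlace(const std::vector<FixedString> &rows, const FixedString &key) {
    std::size_t hits = 0;
    for (const auto &r : rows) hits += (r == key);    // sequential memory access
    return hits;
}

std::size_t scanPointers(const std::vector<std::unique_ptr<FixedString>> &rows,
                         const FixedString &key) {
    std::size_t hits = 0;
    for (const auto &p : rows) hits += (*p == key);   // one indirection per row
    return hits;
}
```

To make the locality difference measurable, use millions of rows and shuffle the pointer vector so that consecutive rows are no longer adjacent on the heap.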


What is main memory processing

What is In-Memory computing
Orchestrating technology innovations
Dramatically improved hardware economics and technology innovations in software have made it possible for SAP to deliver on its vision of the Real-Time Enterprise with in-memory business applications.

HW Technology Innovations
 Multi-core architecture (8 CPUs x 15 cores per blade)
 Massive parallel scaling with many blades
 64-bit address space: 3-6 TB in current servers
 Dramatic decline in price/performance

SAP SW Technology Innovations
 Row and column store
 Compression
 Partitioning
 No aggregate tables
 Insert only on delta

In-Memory computing
Use cache-conscious data structures and algorithms: programming against a new scarce resource
 Memory hierarchy: CPU (core, CPU cache), main memory, disk
 Performance bottleneck in the past: disk I/O
 Performance bottleneck today: the CPU waiting for data to be loaded from memory into cache, which requires cache-conscious data structures and algorithms

In-Memory computing
Challenges of in-memory computing
 Challenge 1: Parallelism! Take advantage of tens, hundreds of cores
 Challenge 2: Data locality! Yes, DRAM is 100,000 times faster than disk, but DRAM access is still 4-60 times slower than on-chip caches

In-Memory computing
Delegation of data-intensive operations to the in-memory computing layer

Today's applications execute many data-intensive operations in the application layer instead of the data layer.

In-Memory Computing Imperative:
 High-performance apps delegate data-intensive operations to the in-memory computing layer
 Avoid movement of detailed data
 Calculate first, then move results

In-Memory computing
Delegation of data-intensive operations to the in-memory computing layer
 Traditional: mass data is processed in the application
 In-Memory Computing: mass data is processed in the database

In memory computing - reasoning
Factors behind in-memory computing:
 Decrease of DRAM prices
 Increase of computing power (multicore)
 Upcoming NVM technologies
 Transactional memories
 Slow improvements of memory bandwidth
 Advances in network technologies
 Big data performance requirements
 Success of sensor technologies
 Expectations of mobile users
Main memory technology is the backbone of all future engine developments.

SAP HANA Overview

HANA Development
February 2014

SAP HANA software component view
 Analytical and special interfaces: SQL, SQLScript, MDX, Text Analytics, other
 Application Function Libraries: Planning + Consolidation, Enterprise Search, Business Function Library, Data Quality, Genome, Predictive Analysis Library
 Parallel calculation engine: application logic extensions, parallel data flow computing model
 Multiple in-memory stores: relational stores (row-based, columnar) and non-SQL stores (text, GIS, graph)
 Managed appliance: appliance packaging

SAP HANA deployment view
Single-host configuration or multi-node cluster configuration (nodes 1 to n)

SAP HANA Appliance
 SAP HANA Database
  Name Server: maintains landscape information
  Index Server: holds data and executes all operations (one per node)
  Statistics Server: collects performance data about HANA
  Preprocessor: text analysis pre-processor (one per node)
  XS Engine: Extended Application Services
 SAP HANA Studio Repository: repository for HANA Studio updates
 SAP Host Agent: enables remote start/stop (one per node)
 Software Update Manager: manages SW updates for HANA
 Shared persistency for fail-over and recovery

In-Memory computing
Security implications

Traditional (3-tier architecture: client, application server, database):
 Users exist in the application server only
 Authorization is handled by the application server
 The DB is accessed with a technical user
 Security is handled by the application server

In-Memory Computing (2-tier architecture: client, HANA):
 Users log on directly to HANA
 Users exist in HANA
 Authorization is handled by HANA
 Security is handled by the database

How do I use SAP HANA?
Following data down the rabbit hole

Storing data in SAP HANA
At its heart, SAP HANA is a SQL DBMS:

> CREATE SCHEMA test;
> CREATE TABLE test.myTable (a INT);
> INSERT INTO test.myTable VALUES (1);

Storing data in SAP HANA
 Applications writing directly into SAP HANA
 Real-time replication using SAP LT Replication Service
 Message queue integration with Sybase CEP
 Data loaded from files using IMPORT / INSERT
 Data loaded at certain events using Business Objects Data Services

Storing data in SAP HANA
SAP HANA uses a hybrid store (row store and column store) to combine the benefits of row-wise and column-wise data handling.

Storing data in SAP HANA
SAP HANA has a safety net, the persistency layer, which ensures the durability of all data.
 Data stores
 Persistency layer: savepoints and logs
 Backup/restore: backups

Using data in SAP HANA
SAP HANA speaks SQL and MDX; use Excel as your front end if you like.

> SELECT a FROM test.myTable;

Using data in SAP HANA
You define views to make data easily accessible to everyone.

Using data in SAP HANA
Views (attribute views, analytic views, calculation views) are defined on top of tables and enable real-time computing by transforming data on the fly.

Using data in SAP HANA
Query processing:
 A query (SELECT ... FROM ... WHERE ...) is passed to the statement processor, which creates an execution plan of operators (Op)
 The calculation engine executes these operators on the data stores and views
 The data stores are backed by the persistency layer (savepoints, logs)

Using data in SAP HANA
Operations can be all sorts of operations on data, not just basic SQL operations but also more complex logic:
 Set operations
 Calculations on data
 Business function calls
 Predictive analytics algorithms
 R procedure calls

Main memory column store

Architecture & Technology

Database Technology: Row Store vs. Column Store
 Row store: stores tables by row (tuple 1, tuple 2, ..., tuple n, each with attributes Att1 to Att5 side by side)
 Column store: stores tables by column (Att1, Att2, ..., Att5, each holding the values of tuples 1 to n)

A row store suits tables where:
 The application often processes single records at once (many selects and/or updates of single records)
 The application typically accesses the complete record
 The columns contain mainly distinct values
 Aggregations and fast searching are not required
 The number of rows is small (e.g., configuration tables)

A column store suits tables where:
 Search and calculation are done on the values of a few columns
 The number of columns is large
 The number of rows is large and columnar operations are required (aggregate, scan, etc.)
 High compression rates are possible because most columns contain only a few distinct values

Database Technology
 Row- and column-based storage for a table (principle)

Table:
 Country  Product  Sales
 US       Alpha    3,000
 US       Beta     1,250
 JP       Alpha    700
 UK       Alpha    450

Row store (row 1, row 2, row 3, row 4 one after another):
 US | Alpha | 3,000 | US | Beta | 1,250 | JP | Alpha | 700 | UK | Alpha | 450

Column store (columns Country, Product, Sales one after another):
 US | US | JP | UK | Alpha | Beta | Alpha | Alpha | 3,000 | 1,250 | 700 | 450
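A minimal C++ sketch of the two layouts for the Country/Product/Sales table above (type and function names are hypothetical). A SUM(Sales) over the row layout strides over complete rows, while the column layout reads one dense array of integers.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Row store: one struct per row; all attributes of a row lie next to each other.
struct SalesRow {
    std::string country;
    std::string product;
    long long sales;
};

// Column store: one contiguous array per attribute.
struct SalesColumns {
    std::vector<std::string> country;
    std::vector<std::string> product;
    std::vector<long long> sales;

    void append(const std::string &c, const std::string &p, long long s) {
        country.push_back(c);
        product.push_back(p);
        sales.push_back(s);
    }
};

// SUM(Sales) on rows: each step skips over the two string members of a row.
long long sumSalesRows(const std::vector<SalesRow> &rows) {
    long long total = 0;
    for (const auto &r : rows) total += r.sales;
    return total;
}

// SUM(Sales) on columns: one sequential pass over a dense integer array.
long long sumSalesColumns(const SalesColumns &cols) {
    long long total = 0;
    for (long long s : cols.sales) total += s;
    return total;
}
```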

Database Technology
 Multiple data storage methods: Column Store I

Classical DB: columns Company [CHAR50], Region [CHAR30], Group [CHAR5]; Company and Region values:
 INTEL    USA
 Siemens  Europe
 Siemens  Europe
 SAP      Europe
 SAP      Europe
 IBM      USA

HANA column store:
 Dictionary for attribute/column Company: 0 INTEL, 1 Siemens, 2 SAP, 3 IBM
 Dictionary for attribute/column Region: 0 Europe, 1 USA
 Dictionary for attribute/column Group: 0 A, 1 B, 2 C
 Index vector: stored in one memory chunk => data locality for fast scans
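A sketch of dictionary encoding in order of first appearance, which reproduces the Company dictionary above (INTEL = 0, Siemens = 1, SAP = 2, IBM = 3) and the index vector 0 1 1 2 2 3. This is an illustrative assumption, not HANA's implementation, and the names are hypothetical; a sorted dictionary would assign value IDs in sort order instead.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <vector>

// A dictionary-encoded column: each distinct value is stored once in the
// dictionary; the index vector holds one value ID per row.
struct EncodedColumn {
    std::vector<std::string> dictionary;   // value ID -> value
    std::vector<uint32_t> indexVector;     // row -> value ID
};

// Assign value IDs in order of first appearance.
EncodedColumn encodeColumn(const std::vector<std::string> &values) {
    EncodedColumn col;
    std::unordered_map<std::string, uint32_t> idOf;   // value -> value ID
    col.indexVector.reserve(values.size());
    for (const auto &v : values) {
        auto it = idOf.find(v);
        if (it == idOf.end()) {                        // new value: append to dictionary
            const uint32_t id = static_cast<uint32_t>(col.dictionary.size());
            it = idOf.emplace(v, id).first;
            col.dictionary.push_back(v);
        }
        col.indexVector.push_back(it->second);
    }
    return col;
}

// Reading a row back is a single dictionary lookup.
const std::string &valueAt(const EncodedColumn &col, std::size_t row) {
    assert(row < col.indexVector.size());
    return col.dictionary[col.indexVector[row]];
}
```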

Database Technology
 How data is mapped to memory
Conceptual view: a table with the rows (A, 10), (B, 35), (C, 40), (D, 12)
Mapping to memory (ascending memory addresses):
 1. Organize by row:    A 10 B 35 C 40 D 12
 2. Organize by column: A B C D 10 35 40 12

Column Table Structures
 A table is represented by one or more columns.
 Each table column is represented by a data array (aka index vector) containing the value IDs of its values, a dictionary (value vector), a dictionary index, and optionally an inverted index.
Figure: columns C1 to C4, each with its data array, dictionary value vector, dictionary index and inverted index

Column Structures and Terminology
 Dictionary and column data vector (aka index vector, n-bit compressed)
 Dictionary index (for unsorted dictionaries in the delta)
 Inverted index (optional; for fast lookups, e.g., for a primary key)

Example column data array (aka index vector); row positions are implicit:
 Row:       1  2  3  4  5  6  7  8  9 10 11 12 13 14
 Value ID:  5  3  2  3  4  1  6  2  0  4  1  2  2  0

 Value ID  Dictionary value  Inverted index (rows)
 0         Cupertino         9, 14
 1         San Jose          6, 11
 2         Palo Alto         3, 8, 12, 13
 3         Dublin            2, 4
 4         Fremont           5, 10
 5         Oakland           1
 6         San Francisco     7

Database Technology
 Column Store: Dictionary Compression


Bitwise / bytewise compression of references

Option 1: all references are 32-bit integers
 Non-CPU-intensive operations
 High memory consumption; high memory bandwidth needed
 Very simple algorithm

Option 2: references are byte-compressed, depending on the dictionary size (1 byte in this example)
 Minimal CPU-intensive operations
 Medium memory consumption

Option 3: references are bit-compressed, depending on the dictionary size (3 bits in this example)
 High CPU consumption
 Very good memory consumption
 Complex, effective algorithms

Example: the references 5 3 2 3 4 1 6 2 0 4 1 2 2 0 (row positions 1 to 14 are implicit) point into the dictionary 0 Cupertino, 1 San Jose, 2 Palo Alto, 3 Dublin, 4 Fremont, 5 Oakland, 6 San Francisco. With 7 dictionary entries, 3 bits per reference suffice.
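A C++ sketch of Option 3 on the example above, assuming that value IDs are packed LSB-first into 64-bit words (all names are hypothetical). For the 14 references, Option 1 needs 14 x 32 bits = 56 bytes, Option 2 needs 14 bytes, and Option 3 needs 14 x 3 = 42 bits, which fit into a single 64-bit word.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Number of bits needed for value IDs 0 .. dictSize-1 (at least 1).
unsigned bitsNeeded(std::size_t dictSize) {
    unsigned bits = 1;
    while (bits < 32 && (std::size_t{1} << bits) < dictSize) ++bits;
    return bits;
}

// Pack value IDs with bitWidth bits each, LSB-first, into 64-bit words.
std::vector<uint64_t> packBits(const std::vector<uint32_t> &ids, unsigned bitWidth) {
    assert(bitWidth >= 1 && bitWidth <= 32);
    std::vector<uint64_t> words((ids.size() * bitWidth + 63) / 64, 0);
    for (std::size_t i = 0; i < ids.size(); ++i) {
        const uint64_t bitPos = static_cast<uint64_t>(i) * bitWidth;
        const std::size_t word = bitPos / 64;
        const unsigned offset = bitPos % 64;
        const uint64_t v = ids[i];
        assert(v < (uint64_t{1} << bitWidth));      // ID must fit into bitWidth bits
        words[word] |= v << offset;
        if (offset + bitWidth > 64)                 // value straddles two words
            words[word + 1] |= v >> (64 - offset);
    }
    return words;
}

// Read back the value ID at position i.
uint32_t unpackBits(const std::vector<uint64_t> &words, std::size_t i, unsigned bitWidth) {
    const uint64_t bitPos = static_cast<uint64_t>(i) * bitWidth;
    const std::size_t word = bitPos / 64;
    const unsigned offset = bitPos % 64;
    uint64_t v = words[word] >> offset;
    if (offset + bitWidth > 64)                     // fetch the high bits from the next word
        v |= words[word + 1] << (64 - offset);
    return static_cast<uint32_t>(v & ((uint64_t{1} << bitWidth) - 1));
}
```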

Architecture & Technology

Bitwise decompression
Native C and SSE/AVX

C++ sample code for decompression of bitwise references (naive coding)

#include <cstdint>

// Returns the bitWidth-bit value ID at position arrayPos of a column whose
// value IDs are bit-packed LSB-first into 64-bit words (1 <= bitWidth <= 32).
uint32_t decompressValue(uint64_t arrayPos, unsigned bitWidth, const uint64_t *column)
{
    const uint64_t mask = (uint64_t{1} << bitWidth) - 1;   // replaces the masks[] table
    const uint64_t bitPos = arrayPos * bitWidth;
    const uint64_t integerPos = bitPos >> 6;                 // word index (bitPos / 64)
    const unsigned startBit = static_cast<unsigned>(bitPos % 64);   // bit offset in that word
    uint64_t valueId = column[integerPos] >> startBit;
    if (startBit + bitWidth > 64)
    {
        // The value straddles two words: take the remaining high bits from
        // the next word (startBit > 32 here, so the shift count is valid).
        valueId |= column[integerPos + 1] << (64 - startBit);
    }
    return static_cast<uint32_t>(valueId & mask);
}

C++ sample code for decompression of bitwise references (optimized version, 7-bit references)

// Excerpt: data points to the packed 64-bit words (uint64_t), outBuffer to the
// output array, outEnd to its end (outEnd is an assumed name). 64 references of
// 7 bits fill exactly 7 words (448 bits), so one loop iteration decodes 64 values;
// only the first 20 unrolled steps are shown.
while (outBuffer < outEnd) {
    outBuffer[0] = data[0] & 0x7ful;
    outBuffer[1] = (data[0] >> 7) & 0x7ful;
    outBuffer[2] = (data[0] >> 14) & 0x7ful;
    outBuffer[3] = (data[0] >> 21) & 0x7ful;
    outBuffer[4] = (data[0] >> 28) & 0x7ful;
    outBuffer[5] = (data[0] >> 35) & 0x7ful;
    outBuffer[6] = (data[0] >> 42) & 0x7ful;
    outBuffer[7] = (data[0] >> 49) & 0x7ful;
    outBuffer[8] = (data[0] >> 56) & 0x7ful;
    outBuffer[9] = ((data[0] >> 63) & 0x7ful) | ((data[1] & 0x3ful) << 1);   // spans words 0 and 1
    outBuffer[10] = (data[1] >> 6) & 0x7ful;
    outBuffer[11] = (data[1] >> 13) & 0x7ful;
    outBuffer[12] = (data[1] >> 20) & 0x7ful;
    outBuffer[13] = (data[1] >> 27) & 0x7ful;
    outBuffer[14] = (data[1] >> 34) & 0x7ful;
    outBuffer[15] = (data[1] >> 41) & 0x7ful;
    outBuffer[16] = (data[1] >> 48) & 0x7ful;
    outBuffer[17] = (data[1] >> 55) & 0x7ful;
    outBuffer[18] = ((data[1] >> 62) & 0x7ful) | ((data[2] & 0x1ful) << 2);  // spans words 1 and 2
    outBuffer[19] = (data[2] >> 5) & 0x7ful;
    // ... outBuffer[20] to outBuffer[63] follow the same pattern ...
    data += 7;
    outBuffer += 64;
}

Intel Advanced Vector Extensions (Intel AVX) growth
Performance per core over time:
 Since 1999: 128-bit vectors (Intel SSE)
 2011: Intel AVX, 2x peak FLOPS with 256-bit floating-point vectors (Sandy Bridge)
 2012: half-float support, random numbers (Ivy Bridge)
 2013: Intel AVX2, 256-bit wide integer vectors, FMA (2x peak FLOPS), gather instructions (Haswell)
 20??: future extensions
All products, computer systems, dates and figures specified are preliminary based on current expectations, and are subject to change without notice.
Intel microarchitecture code names: Sandy Bridge, Ivy Bridge and Haswell.

SSE/AVX implementation: see attached PDF

Next Gen Intel Xeon Processor E7 Family (Ivy Bridge-EX)
 3x memory capacity vs. prior generation: up to 12 TB in an 8-socket node
 SAP HANA proof of concept: CRM with 6 TB in a 4-socket node
All products, computer systems, dates and figures specified are preliminary based on current expectations, and are subject to change without notice.

Intel AVX2 with Haswell Architecture
 Extends 128-bit integer vector instructions to 256 bits, including Intel SSE2, Intel Supplemental SSE3 and Intel SSE4
 Floating-point fused multiply-add: double peak FLOPS
 Enhanced vectorization: gather, variable shifts, powerful permutes
Intel AVX2 completes the 256-bit extensions started with Intel AVX: 256-bit integer, cross-lane permutes, gather, FMA.
(Intel SSE: Intel Streaming SIMD Extensions; Intel AVX: Intel Advanced Vector Extensions; Intel AVX2: Intel Advanced Vector Extensions 2)

Compression Building Blocks: Packed Bit-Fields
 Large number of integers, each with n bits
 Example with 17 bits per entry: packed values such as 65537, 31455, 6, 128, 4, 4711, 100000, 0, 42 are unpacked into 32-bit integers
 31 different implementations, covering each n from 1 to 32
Source: Lemke et al. Speeding up queries in column stores: a case for compression. DaWaK'10

Intel AVX2 Unpacking of Bit-Fields

 Pseudo code                      Assembly
 vector load v from input array   vmovdqu xmm8, xmmword ptr [rax+rcx*1+0x11]
                                  vinserti128 ymm9, ymm8, xmmword ptr [rax+rcx*1+0x19], 0x01
 byte shuffle v                   vpshufb ymm10, ymm9, ymm1
 vector shift v                   vpsrlvd ymm11, ymm10, ymm0   (new variable shift instruction)
 vector and v                     vpand ymm12, ymm11, ymm2
 vector store v in output array   vmovdqu ymmword ptr [r8+0x20], ymm12

 Double the number of data elements vs. Intel SSE
 The implementation takes advantage of the new variable shift
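A scalar C++ model of what one iteration of the AVX2 sequence above computes for eight 32-bit lanes. It assumes n <= 25, so that every value plus its bit offset fits into one 4-byte window, and an LSB-first, little-endian layout; the function names are hypothetical. "byte shuffle" gathers the 4 bytes that contain value i, "vector shift" (vpsrlvd) shifts each lane by its own amount, and "vector and" masks to n bits. A real implementation would use intrinsics such as _mm256_shuffle_epi8, _mm256_srlv_epi32 and _mm256_and_si256.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Pack values LSB-first with n bits each into bytes; 4 zero bytes of padding
// let the last 4-byte window be read without running past the end.
std::vector<uint8_t> packBytes(const std::vector<uint32_t> &values, unsigned n) {
    std::vector<uint8_t> out((values.size() * n + 7) / 8 + 4, 0);
    for (std::size_t i = 0; i < values.size(); ++i)
        for (unsigned b = 0; b < n; ++b)
            if ((values[i] >> b) & 1u) {
                const std::size_t bit = i * n + b;
                out[bit / 8] |= static_cast<uint8_t>(1u << (bit % 8));
            }
    return out;
}

// Scalar model of one AVX2 iteration: decode 8 values starting at `first`,
// one 32-bit lane per value.
std::array<uint32_t, 8> unpack8(const std::vector<uint8_t> &packed, std::size_t first, unsigned n) {
    assert(n >= 1 && n <= 25);                            // bit offset (<= 7) + n must fit in 32 bits
    assert((first + 8) * n <= (packed.size() - 4) * 8);   // stay inside the packed data
    const uint32_t mask = (1u << n) - 1;                  // operand of "vector and"
    std::array<uint32_t, 8> lanes{};
    for (unsigned lane = 0; lane < 8; ++lane) {
        const std::size_t bitPos = (first + lane) * n;
        const std::size_t byte = bitPos / 8;
        // "byte shuffle": move the 4 bytes that contain the value into the lane
        const uint32_t window = static_cast<uint32_t>(packed[byte]) |
                                static_cast<uint32_t>(packed[byte + 1]) << 8 |
                                static_cast<uint32_t>(packed[byte + 2]) << 16 |
                                static_cast<uint32_t>(packed[byte + 3]) << 24;
        // "vector shift": each lane is shifted by its own amount
        const uint32_t shifted = window >> (bitPos % 8);
        lanes[lane] = shifted & mask;                     // "vector and"
    }
    return lanes;
}
```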

Intel AVX2 Unpacking - Performance
Figure: decoded integers per cycle (0 to 3.5) for bit cases 1 to 32, Intel AVX2 vs. Intel SSE 4.1
 Bit-field unpacking runs up to 1.6x faster on average with Intel AVX2
Source: Willhalm et al. Vectorizing Database Column Scans with Complex Predicates. ADMS 2013

SAP HANA Complex Scan with Intel AVX2
Figure: decoded integers per cycle (0 to 1.8) for bit cases 1 to 32, Intel AVX2 vs. scalar
 The complex scan operation in SAP HANA runs up to 1.9x faster on average with Intel AVX2
Source: Willhalm et al. Vectorizing Database Column Scans with Complex Predicates. ADMS 2013

Intel Transactional Synchronization Extensions (Intel TSX)
 Intel TSX: instruction set extensions for IA
  Transactionally execute lock-protected critical sections
  Execute without acquiring the lock -> expose hidden concurrency
  Hardware manages transactional updates: all or none
   Other threads can't observe intermediate transactional updates
   If lock elision cannot succeed, restart execution and acquire the lock
 Hardware support to enable lock elision
  Focus on lock granularity optimizations
  Fine-grain performance at coarse-grain effort
Intel TSX exposes concurrency through lock elision
Intel Architecture Instruction Set Extensions Programming Reference (http://software.intel.com/file/41604)

Intel TSX applied
 Left chart (scaling benefits of Intel TSX): scaling over the number of threads of an application with a coarse-grain lock, without and with Intel TSX
 Right chart (secondary benefits of Intel TSX): the same application with finer-grain locks, without and with Intel TSX
Fine-grain behavior at coarse-grain effort

Inverted index
Architecture & Technology

Inverted index
 Value ID  Dictionary value  Inverted index (rows)
 0         Cupertino         9, 14
 1         San Jose          6, 11
 2         Palo Alto         3, 8, 12, 13
 3         Dublin            2, 4
 4         Fremont           5, 10
 5         Oakland           1
 6         San Francisco     7
References (rows 1 to 14): 5 3 2 3 4 1 6 2 0 4 1 2 2 0
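A sketch of building the inverted index from the reference vector above, using 1-based row numbers as on the slide (the function name is hypothetical). Because rows are visited in ascending order, every row list comes out sorted, which suits delta encoding.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Inverted index: for every value ID, the 1-based rows in which it occurs.
std::vector<std::vector<uint32_t>> buildInvertedIndex(const std::vector<uint32_t> &indexVector,
                                                      std::size_t dictSize) {
    std::vector<std::vector<uint32_t>> rowLists(dictSize);
    for (std::size_t row = 0; row < indexVector.size(); ++row) {
        assert(indexVector[row] < dictSize);   // value ID must exist in the dictionary
        rowLists[indexVector[row]].push_back(static_cast<uint32_t>(row + 1));
    }
    return rowLists;   // rows were visited in ascending order, so every list is sorted
}
```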

Inverted index compression

Properties of the inverted index:
 The row list of one index item is read completely
 The order inside the list is irrelevant

1) Delta encoding on each row list
 Reason: make the row list numbers smaller (for better binary compression)
 Example: 3, 8, 12, 13 -> 3, 5, 4, 1

2) Golomb encoding on top of the delta lists
 Golomb coding is a lossless data compression method using a family of data compression codes invented by Solomon W. Golomb in the 1960s. Alphabets following a geometric distribution will have a Golomb code as an optimal prefix code, making Golomb coding highly suitable for situations in which the occurrence of small values in the input stream is significantly more likely than large values.
 Rice coding (invented by Robert F. Rice) denotes using a subset of the family of Golomb codes to produce a simpler (but possibly suboptimal) prefix code. Rice used this set of codes in an adaptive coding scheme; "Rice coding" can refer either to that adaptive scheme or to using that subset of Golomb codes. Whereas a Golomb code has a tunable parameter that can be any positive integer value, Rice codes are those in which the tunable parameter is a power of two. This makes Rice codes convenient for use on a computer, since multiplication and division by 2 can be implemented more efficiently in binary arithmetic.
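A sketch of the delta encoding step (function names are hypothetical). Because a row list is always read completely and its order is irrelevant, it can be kept sorted and stored as gaps; a running sum restores it.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Delta-encode a sorted row list: keep the first row, then store the gaps.
std::vector<uint32_t> deltaEncode(const std::vector<uint32_t> &rows) {
    std::vector<uint32_t> out;
    out.reserve(rows.size());
    uint32_t prev = 0;
    for (uint32_t r : rows) {
        assert(r >= prev);        // requires an ascending row list
        out.push_back(r - prev);
        prev = r;
    }
    return out;
}

// Inverse: a running prefix sum restores the original row numbers.
std::vector<uint32_t> deltaDecode(const std::vector<uint32_t> &gaps) {
    std::vector<uint32_t> out;
    out.reserve(gaps.size());
    uint32_t sum = 0;
    for (uint32_t g : gaps) out.push_back(sum += g);
    return out;
}
```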

Inverted index compression : Golomb Rice coding


To Golomb-code a number, find the quotient and remainder of division by the
divisor. Write the quotient in unary notation, then the remainder in truncated binary
notation. In practice, you need a stop bit after the quotient: if the quotient is written
as a sequence of zeroes, the stop bit is a one. The length of the remainder can be
determined from the divisor.
A Golomb-Rice code is a Golomb code where the divisor is a power of two,
enabling an efficient implementation using shifts and masks rather than division
and modulo.


Inverted index compression: Golomb coding

Example with m (divisor) = 4:
 Value  Quotient  Remainder  Code
 0      0         0          1 00
 1      0         1          1 01
 2      0         2          1 10
 3      0         3          1 11
 4      1         0          0 1 00
 5      1         1          0 1 01
 6      1         2          0 1 10
 7      1         3          0 1 11
 8      2         0          00 1 00
 9      2         1          00 1 01
 10     2         2          00 1 10
 11     2         3          00 1 11
 12     3         0          000 1 00
 13     3         1          000 1 01
 14     3         2          000 1 10
 15     3         3          000 1 11

Inverted index compression: Golomb-Rice coding

The Golomb code with parameter m for a positive integer n is given by coding n div m (quotient) in unary and n mod m (remainder) in binary. When m is a power of 2, there is a simple realization, also known as the Rice code.
Example: n = 22, k = 2 (m = 4).
 n = 22 = 10110 (binary). Shift n right by k (= 2) bits; we get 101.
 Output 5 (for 101) 0s followed by a 1. Then output the last k bits of n.
 So, the Golomb-Rice code for 22 is 00000110.
Decoding is simple: count up to the first 1. This gives us the number 5. Then read the next k (= 2) bits, 10, and n = m x 5 + 2 (for 10) = 20 + 2 = 22.
HANA Bluebook, p.53
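A sketch of Golomb-Rice coding with m = 2^k that writes bits as '0'/'1' characters for readability; a real implementation would use a packed bit stream. The function names are hypothetical. The encoder follows the convention above: the quotient in unary as zeros, a 1 as stop bit, then the k remainder bits.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Encode n with parameter k (divisor m = 2^k).
std::string riceEncode(uint32_t n, unsigned k) {
    assert(k < 32);
    std::string code(n >> k, '0');                       // quotient in unary
    code.push_back('1');                                 // stop bit
    for (int b = static_cast<int>(k) - 1; b >= 0; --b)   // remainder, MSB first
        code.push_back(((n >> b) & 1u) ? '1' : '0');
    return code;
}

// Decode one code word starting at pos; advances pos past it.
uint32_t riceDecode(const std::string &bits, std::size_t &pos, unsigned k) {
    uint32_t q = 0;
    while (bits.at(pos) == '0') { ++q; ++pos; }          // count zeros up to the first 1
    ++pos;                                               // skip the stop bit
    uint32_t r = 0;
    for (unsigned b = 0; b < k; ++b) r = (r << 1) | (bits.at(pos++) == '1');
    return (q << k) | r;                                 // n = m * q + r
}

// Decode a concatenation of code words, e.g. a compressed delta list.
std::vector<uint32_t> riceDecodeAll(const std::string &bits, unsigned k) {
    std::vector<uint32_t> out;
    std::size_t pos = 0;
    while (pos < bits.size()) out.push_back(riceDecode(bits, pos, k));
    return out;
}
```

Encoding the delta list 3, 5, 4, 1 from the previous slide with k = 2 gives the code words 111, 0101, 0100, 101 (14 bits in total).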

Block Inverted index

Idea:
 Better compression of the inverted index (which still takes 30-50% of the complete column size)
 Scans (at least of small blocks) are extremely fast
 Column encoding is good for small values

Inverted block index (block size = 4; blocks 1 to 4 cover rows 1-4, 5-8, 9-12 and 13-14):
 Cupertino      3, 4
 San Jose       2, 3
 Palo Alto      1, 2, 3, 4
 Dublin         1
 Fremont        2, 3
 Oakland        1
 San Francisco  2
References (rows 1 to 14): 5 3 2 3 4 1 6 2 0 4 1 2 2 0

Better compression because of:
 Smaller numbers
 Smaller lists (if there are multiple hits in one block)

But: higher CPU costs and worse performance because of the additional block scan
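A sketch of the block inverted index with 1-based block and row numbers as on the slide (names are hypothetical). findRows shows the extra cost mentioned above: the index only names candidate blocks, so every row inside them must still be compared.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using RowList = std::vector<uint32_t>;

// For each value ID, the numbers of the blocks (of blockSize rows) that
// contain it; each block appears at most once per value.
std::vector<RowList> buildBlockIndex(const std::vector<uint32_t> &indexVector,
                                     std::size_t dictSize, std::size_t blockSize) {
    assert(blockSize > 0);
    std::vector<RowList> blocks(dictSize);
    for (std::size_t row = 0; row < indexVector.size(); ++row) {
        const uint32_t block = static_cast<uint32_t>(row / blockSize + 1);
        RowList &list = blocks.at(indexVector[row]);
        if (list.empty() || list.back() != block)   // rows ascend, so duplicates are adjacent
            list.push_back(block);
    }
    return blocks;
}

// Lookup: scan every row of each candidate block (the additional block scan).
RowList findRows(const std::vector<uint32_t> &indexVector,
                 const std::vector<RowList> &blockIndex,
                 uint32_t valueId, std::size_t blockSize) {
    RowList rows;
    for (uint32_t block : blockIndex.at(valueId)) {
        const std::size_t begin = (block - 1) * blockSize;
        const std::size_t end = std::min(begin + blockSize, indexVector.size());
        for (std::size_t row = begin; row < end; ++row)
            if (indexVector[row] == valueId) rows.push_back(static_cast<uint32_t>(row + 1));
    }
    return rows;
}
```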

Dictionary compression & index


Architecture & Technology

Database Technology
 Dictionary definition
DEFINITION (STRING DICTIONARY).
A string dictionary is a read-only data structure that implements at least the following two
functions:
1) Given a value ID id, extract(id) returns the corresponding string in the dictionary.
2) Given a string str, locate(str) returns the unique value ID of str if str is in the
dictionary or the value ID of the first string greater than str otherwise.

An access to a string attribute value in a column-store database often corresponds to an extract operation in a string dictionary. Thus, it is important that extract operations can be performed very fast.
A typical use case for the locate operation is a WHERE clause in an SQL statement that compares a string attribute against a string value. Here, only one locate operation is needed to execute the statement. Hence, the performance of the locate operation is not as critical as the extract performance.
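A minimal sketch of both operations on a sorted, uncompressed dictionary (class and member names are hypothetical): extract is a plain array access, and locate is a binary search via std::lower_bound, which returns exactly the value ID of the first string greater than str when str is missing.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// Read-only sorted string dictionary; the value ID is the array position.
class SortedDictionary {
public:
    explicit SortedDictionary(std::vector<std::string> values) : values_(std::move(values)) {
        std::sort(values_.begin(), values_.end());
        values_.erase(std::unique(values_.begin(), values_.end()), values_.end());
    }

    // extract(id): the string with value ID id, a single array access.
    const std::string &extract(uint32_t id) const {
        assert(id < values_.size());
        return values_[id];
    }

    // locate(str): ID of str if present, otherwise ID of the first string
    // greater than str (size() if str is greater than every entry).
    uint32_t locate(const std::string &str) const {
        return static_cast<uint32_t>(
            std::lower_bound(values_.begin(), values_.end(), str) - values_.begin());
    }

    bool contains(const std::string &str) const {
        const uint32_t id = locate(str);
        return id < values_.size() && values_[id] == str;
    }

    std::size_t size() const { return values_.size(); }

private:
    std::vector<std::string> values_;
};
```

For WHERE city = 'Palo Alto', one locate call yields the value ID, and the scan then compares only integers in the index vector.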

Database Technology
 Column Store: Delta Compression For Strings


Database Technology
 Column Store: other string dictionary compression schemes

 Huffman / Hu-Tucker compression: Hu-Tucker compression is used only if the order-preserving property is needed.
 Bit compression: each character is represented by a (small) number of bits; used if just a subset of characters is needed in the dictionary.
 N-gram compression: frequent 2-grams or 3-grams are replaced by 12-bit codes.
 Re-Pair compression: Re-Pair compression uses either 12 bits or 16 bits to store a rule.

We can apply these compression schemes to two main dictionary data structures:
 Array: one class of dictionary implementations is based on a simple consecutive array containing the string data. Pointers to each string in this array are maintained in a separate array.
 Front coding: the strings of a dictionary are divided into blocks, which are encoded using front coding. The resulting blocks are then stored in a consecutive array. Pointers to each block are maintained in a separate array. The prefix length values of one block are stored in a header at the beginning of the block.
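A sketch of front coding for a single block, assuming a sorted input block (names are hypothetical). For simplicity it keeps each prefix length next to its suffix, which is closer to the inline front coding variant than to the header layout described above, and extract decodes sequentially from the block start.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// One front-coded entry: length of the prefix shared with the previous
// string, plus the remaining suffix.
struct FrontCodedEntry {
    std::size_t prefixLen;
    std::string suffix;
};

std::vector<FrontCodedEntry> frontEncode(const std::vector<std::string> &sortedBlock) {
    std::vector<FrontCodedEntry> out;
    std::string prev;
    for (const auto &s : sortedBlock) {
        std::size_t p = 0;
        while (p < prev.size() && p < s.size() && prev[p] == s[p]) ++p;   // shared prefix
        out.push_back({p, s.substr(p)});   // the first entry always has prefixLen 0
        prev = s;
    }
    return out;
}

// extract(i) within a block: rebuild strings 0..i one after another.
std::string frontDecode(const std::vector<FrontCodedEntry> &block, std::size_t i) {
    assert(i < block.size());
    std::string cur;
    for (std::size_t j = 0; j <= i; ++j)
        cur = cur.substr(0, block[j].prefixLen) + block[j].suffix;
    return cur;
}
```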

Database Technology
 Column Store: string dictionary compression schemes

 Inline front coding: to improve sequential access, a front coding variant stores the prefix lengths interleaved with the string suffixes.
 Front coding with difference to first: to trade some space for speed, another front coding variant stores the suffixes that differ from the first string of a block instead of the difference to the previous string. Hence, decompressing a string essentially consists of two memcpys.
 Fixed-length array: for very fast access to small dictionaries, an array implementation that does not need pointers to the string data can be used. For each string, the same amount of space is allocated in a consecutive array.
 Column-wise bit compression: for columns with strings that all have the same length and a similar structure, a specific compression scheme can be used. Divide the dictionary into blocks. Then vertically partition each block into character columns, which are then bit-compressed.

Dictionary Index: CEFS - Cache-Efficient Function Stores
Context and Idea
Wei Zhou, Ingo Müller, Robert Schulze

Sorted (string) dictionary: 0 Abu Dhabi, Abuja, Accra, ..., n-2 Yerevan, n-1 Zagreb

CEFS: Cache-Efficient Function Store, a read-optimized data structure for dictionary indexing
 Idea 1: when we use only hashed values, we can save space and comparisons
 Idea 2: tune for maximum cache efficiency
  Recursive data structure of multiple levels, each comprising an array of buckets

CEFS - Cache-Efficient Function Stores
Design and Conclusions
Wei Zhou, Ingo Müller, Robert Schulze
Figure: perfect hash function; bucket with encoded signatures (tag) "sigs" and slots slot1 to slot4


References Compression
Architecture & Technology

Database Technology
Compression of Value ID Sequence


Database Technology
Compression with run-length encoding

Classical row store: difficult to compress
 Columns Company [CHAR50], Region [CHAR30], Group [CHAR5]; Company and Region values:
 INTEL    USA
 Siemens  Europe
 Siemens  Europe
 SAP      Europe
 SAP      Europe
 IBM      USA

NewDB column store: dictionary-compressed
 Company: 0 INTEL, 1 Siemens, 2 SAP, 3 IBM
 Region: 0 Europe, 1 USA
 Group: 0 A, 1 B, 2 C

NewDB column store: run-length-compressed* (runs written as count x value ID)
 Company: 1 x 0, 2 x 1, 2 x 2, 1 x 3
 Region: 1 x 1, 4 x 0, 1 x 1

* Note that there is a variety of compression methods and algorithms like run-length compression (see "Comparison of Compression Algorithms").
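A sketch of run-length encoding on value IDs (function names are hypothetical). Applied to the Region value IDs 1 0 0 0 0 1 it yields the runs 1 x 1, 4 x 0, 1 x 1 shown above; RLE pays off when equal value IDs are adjacent, e.g. in sorted columns.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>
#include <vector>

using Run = std::pair<uint32_t, uint32_t>;   // (count, value ID)

// Run-length encode a sequence of value IDs.
std::vector<Run> rleEncode(const std::vector<uint32_t> &ids) {
    std::vector<Run> runs;
    for (uint32_t id : ids) {
        if (!runs.empty() && runs.back().second == id)
            ++runs.back().first;         // extend the current run
        else
            runs.push_back({1, id});     // start a new run
    }
    return runs;
}

// Expand the runs back into the original value ID sequence.
std::vector<uint32_t> rleDecode(const std::vector<Run> &runs) {
    std::vector<uint32_t> ids;
    for (const Run &run : runs) ids.insert(ids.end(), run.first, run.second);
    return ids;
}
```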

Decimal arithmetic
Some interesting algorithmic challenges

Decimal arithmetic
 Decimals are defined as numbers with precision and scale
  Precision is the number of decimal digits
  Scale is the number of fraction digits
  E.g., 965.23 can be represented as a decimal with precision = 5, scale = 2
 There are two principal ways of implementing decimals
  Internally using a decfloat type
  Internally using integers / long / very long integers
 Decfloat can be implemented as either a floating-point number or as a fixed-point number. In the fixed-point case, the denominator would be set to a fixed power of ten. In the floating-point case, a variable exponent would represent the power of ten by which the mantissa of the number is multiplied.
 With an integer representation, the big issues are rounding and overflow handling.
  Example: a = 10.001 (p = 5, s = 3) and b = 3.33 (p = 3, s = 2)
  We want to calculate a / b:
   What are the output type options?
   How do we calculate the division for an output type (p = 5, s = 3) with an internal integer representation?
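One possible answer to the second question, sketched in C++ under stated assumptions: decimals are stored as unscaled 64-bit integers (value = unscaled / 10^scale), the result is rounded half away from zero, and the intermediate uses the GCC/Clang extension __int128. The names are hypothetical. For a = 10.001 and b = 3.33 with result type (p = 5, s = 3): shift = 3 - 3 + 2 = 2, and 10001 x 10^2 / 333 = 3003 remainder 101, so the result is 3.003.

```cpp
#include <cassert>
#include <cstdint>
#include <stdexcept>

// Hypothetical decimal type: value = unscaled / 10^scale.
struct Decimal {
    int64_t unscaled;
    int precision;
    int scale;
};

// 128-bit intermediate (GCC/Clang extension, not standard C++).
__extension__ typedef __int128 Wide;

int64_t powerOf10(int e) {
    assert(e >= 0 && e <= 18);            // 10^18 still fits into int64_t
    int64_t r = 1;
    while (e-- > 0) r *= 10;
    return r;
}

// a / b for the result type (p, s), rounded half away from zero.
Decimal divide(const Decimal &a, const Decimal &b, int p, int s) {
    if (b.unscaled == 0) throw std::domain_error("division by zero");
    // result.unscaled = a.unscaled * 10^(s - a.scale + b.scale) / b.unscaled
    const int shift = s - a.scale + b.scale;
    Wide num = a.unscaled;
    Wide den = b.unscaled;
    if (shift >= 0) num *= powerOf10(shift);
    else den *= powerOf10(-shift);

    Wide q = num / den;                   // truncates toward zero
    const Wide r = num % den;
    const Wide absR = r < 0 ? -r : r;
    const Wide absD = den < 0 ? -den : den;
    if (2 * absR >= absD)                 // remainder >= half the divisor: round away from zero
        q += ((num < 0) != (den < 0)) ? -1 : 1;

    // Overflow handling: the result may have at most p digits.
    const Wide limit = powerOf10(p);
    if (q >= limit || q <= -limit) throw std::overflow_error("result exceeds precision");
    return Decimal{static_cast<int64_t>(q), p, s};
}
```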

Parallelization / Distribution
Architecture & Technology

Database Technology: Exploit multi-core architectures by parallelization of operations II
 Vertical: concurrent processing on vertical partitions (disjoint sets of columns)
 Horizontal: concurrent processing on horizontal partitions (disjoint subsets of rows)
Figure: table with the values 10, 35, 40, 12, split horizontally by blades across Server 1 and Server 2
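A sketch of horizontal partitioning within one server: each thread aggregates a disjoint range of rows of one column, and the partial sums are combined at the end (C++11 threads; the function name is hypothetical).

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <thread>
#include <vector>

// SUM over one column, processed concurrently on horizontal partitions.
int64_t parallelSum(const std::vector<int64_t> &column, unsigned numPartitions) {
    assert(numPartitions > 0);
    std::vector<int64_t> partial(numPartitions, 0);
    std::vector<std::thread> workers;
    const std::size_t chunk = (column.size() + numPartitions - 1) / numPartitions;
    for (unsigned p = 0; p < numPartitions; ++p) {
        workers.emplace_back([&column, &partial, chunk, p] {
            const std::size_t begin = p * chunk;                          // disjoint row range
            const std::size_t end = std::min(column.size(), begin + chunk);
            int64_t s = 0;
            for (std::size_t i = begin; i < end; ++i) s += column[i];
            partial[p] = s;                                               // one write per thread
        });
    }
    for (auto &w : workers) w.join();
    int64_t total = 0;
    for (int64_t s : partial) total += s;
    return total;
}
```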

Database Technology:
 Parallelization and (horizontal) partitioning over multiple nodes

Table (columns Product, Group, Color): 10 red, 20 blue, 30 green, 40 red, 50 red, 60 red
The SAP HANA management system distributes the rows over the nodes:
 Node 1: products 10 (red), 20 (blue), 30 (green)
 Node 2: products 40 (red), 50 (red), 60 (red)
Query: Select * from table where Group = A

Database Technology:
 Example for parallelization in a column store
HANA Bluebook, p.14

Distributed In-Memory Computing Engine



Alternatives for Parallel Processing
 Standard solution: serial execution
 Alternative 1: inter-operator parallelism
 Alternative 2: intra-operator parallelism

Todo: range-aware algorithms, parallelism controller, adjust plan generator
Todo: parallel algorithms, new internal APIs, parallelism controller

YediDB: very first steps I
 Class COLUMN
 Defines and holds the data for one column, which is mainly the dictionary and the reference vector. The dictionary is a separate class. The reference vector is blocked into chunks of 1 million references. Each block is stored in the class REFERENCE_BLOCK.
 Class REFERENCE_BLOCK
 Stores one block of references into the dictionary (one block has the default size of 1 million references).
 Class DICTIONARY
 Stores the dictionary of a column in the member sortedData_. Every column has a dictionary, independent of the type of the column.

YediDB: very first steps II
 Class TABLE
 Stores the set of columns of a table, the table name and the maximal row number of this table
 Class CSV
 Handles the insert of a CSV file into YediDB
 Test1.lua
 Test program which should run after this first exercise

YediDB: first exercise
Implement 4 functions in the YediDB skeleton:
1) function DICTIONARY:addValueToSortedDict(value, pos)
2) function DICTIONARY:searchDict(value)
3) function COLUMN:fillColumnInitial(columnValues)
4) function COLUMN:searchRowIdsByValueId(txId, valueId, highestRowNumber)

References
Müller, I., Sanders, P., Schulze, R., & Zhou, W. (2014). Retrieval and Perfect Hashing using Fingerprinting. SEA 2014, Copenhagen, Denmark, June/July 2014.
Müller, I., Ratsch, C., & Faerber, F. (2014). Adaptive String Dictionary Compression in In-Memory Column-Store Database Systems. EDBT 2014, Athens, Greece, March 2014.
Willhalm, T., Oukid, I., Müller, I., & Faerber, F. (2013). Vectorizing Database Column Scans with Complex Predicates. ADMS 2013, Riva del Garda, Italy, August 2013.
Dees, J., & Sanders, P. (2013). Efficient Many-Core Query Execution in Main Memory Column-Stores. ICDE 2013, Brisbane, Australia, April 8-12, 2013.
Sikka, V., Färber, F., Lehner, W., Cha, S. K., Peh, T., & Bornhövd, C. (2012). Efficient transaction processing in SAP HANA database. SIGMOD Conference (p. 731).
Färber, F., May, N., Lehner, W., Große, P., Müller, I., Rauhe, H., & Dees, J. (2012). The SAP HANA Database - An Architecture Overview. IEEE Data Eng. Bull., 35(1), 28-33.
Färber, F., Cha, S. K., Primsch, J., Bornhövd, C., Sigg, S., & Lehner, W. (2011). SAP HANA Database - Data Management for Modern Business Applications. SIGMOD Record, 40(4), 45-51.
Lemke, C., Sattler, K.-U., Faerber, F., & Zeier, A. (2010). Speeding up queries in column stores: a case for compression. DaWaK 2010, 117-129.
More HANA publications at http://scn.sap.com/docs/DOC-26787