HANA database lectures
March 2014
Outline Part 1

Motivation - Why main memory processing

What is main memory computing

SAP HANA overview
• Architecture
• Usage (SQL, HANA Studio)

Main memory Column Store
• Row vs column store
• Data model
• Basic operations (C++ scan, AVX/SSE scan)
• Compression (references, dictionary, index)

Distribution
• Scale out vs scale up
• Data split
• Parallelization

Outline Part 2

The insert/update problem: Delta table
• Data model
• Data access (insert only) / cost model
• Data visibility
• L2D – the state of the art approach for a delta table
• L1D - discussions

Transaction management
• UDIV handling
• MVCC
• Tx lists
• Distributed transactions
• Consistency models incl. eventual consistency

Central operators
• Joins (e.g. semi join reducer)
• Parallel aggregation
• Sort

Outline Part 3

Optimizer and query execution
• Execution plans
• Plan generation
• Execution engine
• Optimizer models
• SQL versions (SQL 92, 99 …)

Persistency & delta
• Mapping from main memory structures to persistency pages (PAX)
• Logging
• Shadow page concept

Text & GIS extensions
• Text data model & operations
• GIS data model & operations

YediDB – a first prototype in Lua


Why memory processing

Example 1: multi threading
• Generate multiple threads which add to one global (atomic) variable
• Compare against local counters which are summed up afterwards

Example 2: cache line effects
• Generate multiple threads which have local variables but share cache lines
• Compare against local variables with separate cache lines

Example 3: memory locality
• Create an array of fixed-size strings (e.g. 10 bytes) and do a full table scan
• Generate the array either with in-place strings or with pointers to strings
• Compare the 2 versions
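Example 3 can be tried with a small sketch like the one below. It is not taken from the lecture: the names `FixedString`, `buildInplace`, `buildPointers`, `scanInplace` and `scanPointers` are hypothetical, and the test data ("val0" to "val9", repeated) is an assumption. Both scans count the same hits; the only difference is whether the 10 bytes of each row sit contiguously in the array or behind a pointer.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstring>
#include <memory>
#include <string>
#include <vector>

// Fixed-size string of 10 bytes, zero padded (assumed layout for this sketch).
constexpr std::size_t kLen = 10;
using FixedString = std::array<char, kLen>;

// Copy at most kLen bytes of s into a zero-initialized fixed-size slot.
FixedString makeFixed(const std::string& s) {
    FixedString out{};
    std::memcpy(out.data(), s.data(), s.size() < kLen ? s.size() : kLen);
    return out;
}

// Version A: the strings are stored in place, one contiguous memory chunk.
std::vector<FixedString> buildInplace(std::size_t n) {
    std::vector<FixedString> table;
    table.reserve(n);
    for (std::size_t i = 0; i < n; ++i)
        table.push_back(makeFixed("val" + std::to_string(i % 10)));
    return table;
}

// Version B: the array holds pointers; every string is its own heap object.
std::vector<std::unique_ptr<FixedString>> buildPointers(std::size_t n) {
    std::vector<std::unique_ptr<FixedString>> table;
    table.reserve(n);
    for (std::size_t i = 0; i < n; ++i)
        table.push_back(std::make_unique<FixedString>(makeFixed("val" + std::to_string(i % 10))));
    return table;
}

// Full table scan over the in-place layout.
std::size_t scanInplace(const std::vector<FixedString>& table, const FixedString& needle) {
    std::size_t hits = 0;
    for (const FixedString& row : table)
        hits += (row == needle) ? 1 : 0;
    return hits;
}

// Full table scan over the pointer layout: one extra dereference per row.
std::size_t scanPointers(const std::vector<std::unique_ptr<FixedString>>& table,
                         const FixedString& needle) {
    std::size_t hits = 0;
    for (const auto& row : table) {
        assert(row != nullptr);
        hits += (*row == needle) ? 1 : 0;
    }
    return hits;
}
```

To compare the two versions, wrap each scan in a timer (for example `std::chrono::steady_clock`) and use a table large enough to exceed the CPU caches. Note that freshly allocated heap objects are often laid out almost contiguously, so shuffling the pointer array before the scan makes the locality effect easier to see.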

What is main memory processing
What is In-Memory computing

Orchestrating technology innovations

Dramatically improved hardware economics and technology innovations in software have made it possible for SAP to deliver on its vision of the Real-Time Enterprise with in-memory business applications

HW Technology Innovations
• Multi-Core Architecture (8 CPU x 15 Cores per blade)
• Massive parallel scaling with many blades
• 64bit address space – 3-6 TB in current servers
• Dramatic decline in price/performance

SAP SW Technology Innovations
• Row and Column Store
• Compression
• Partitioning
• No Aggregate Tables
• Insert Only on Delta
In-Memory computing

Use cache-conscious data structures and algorithms

Programming against a new scarce resource…

(Figure: memory hierarchy with CPU Core, CPU Cache, Main Memory, Disk)

Performance bottleneck today: CPU waiting for data to be loaded from memory into cache

Performance bottleneck in the past: Disk I/O

… requires cache-conscious data structures and algorithms.

In-Memory computing

Challenges of In-memory Computing

Challenge 1: Parallelism! Take advantage of tens, hundreds of cores

Challenge 2: Data locality!

Yes, DRAM is 100,000 times faster than disk…

But DRAM access is still 4-60 times slower than on-chip caches

In-Memory computing

Delegation of data intense operations to the in-memory computing

Today's applications execute many data intense operations in the application layer

(Figure: Application Layer on top of Data Layer)

High performance apps delegate data intense operations to the in-memory computing

In-Memory Computing Imperative:
• Avoid movement of detailed data
• Calculate first, then move results
In-Memory computing

Delegation of data intense operations to the in-memory computing

(Figure: two panels, "Traditional" and "In-Memory Computing", each showing Application and Database with "Mass data" labels)
In memory computing - reasoning

• Decrease of DRAM prices
• Increase of computing power (multicore)
• Upcoming NVM technologies
• Big data performance requirements
• Advances in network technologies
• Success of sensor technologies
• Expectations of mobile users
• Transactional memories
• Slow improvements of memory bandwidth

Main memory technology is the backbone of all future engine developments

SAP HANA Overview
HANA Development
February 2014
SAP HANA Software component view

Analytical and special interfaces
• SQL, SQL Script, MDX, Other

Application logic extensions
• Text Analytics
• Planning + Consolidation
• Enterprise search
• Data quality, Genome …
• Application Function Libraries: Business Function Library, Predictive Analysis Library

Parallel data flow computing model
• Parallel Calculation engine

Multiple in-memory stores
• Relational Stores: Row based, Columnar
• Text, GIS, Graph, non SQL stores

Appliance Packaging
• Managed Appliance

SAP HANA Deployment view

Single host configuration: SAP HANA Appliance

SAP HANA Database
• Name Server: maintains landscape information
• Index Server: holds data and executes all operations
• Statistics Server: collects performance data about HANA
• Preprocessor: text analysis pre-processor
• XS Engine: Extended Application Services

Further components
• SAP HANA Studio Repository: repository for HANA Studio updates
• SAP Host Agent: enables remote start/stop
• Software Update Manager: manages SW updates for HANA

Multi-node cluster configuration
• Node 2 … Node n: Index Server, Preprocessor, SAP Host Agent
• Shared persistency for fail-over and recovery

In-Memory computing

Security implications

Traditional (Client, Application Server, Database), 3 tier architecture:
• Users exist in application server only
• Authorization is handled by application server
• DB is accessed with technical user
• Security is handled by application server

In-Memory Computing (Client, HANA), 2 tier architecture:
• Users log on directly to HANA
• Users exist in HANA
• Authorization is handled by HANA
• Security is handled by database

How do I use SAP HANA?

Following data down the rabbit hole
Storing data in SAP HANA

At its heart, SAP HANA is a SQL DBMS…

> CREATE SCHEMA test;
> CREATE TABLE test.myTable (a int);
> INSERT INTO test.myTable VALUES (1);
Storing data in SAP HANA

• Applications writing directly into SAP HANA
• Real-time replication using SAP LT Replication Service
• Message queue integration with Sybase CEP
• Data loaded from files using IMPORT / INSERT
• Data loaded at certain events using Business Objects Data Service

Storing data in SAP HANA

SAP HANA uses a hybrid store to combine the benefits of row- and column-wise data handling.

(Figure: Row store and Column store)
Storing data in SAP HANA

SAP HANA has a safety net which ensures the durability of all data – the persistency layer.

(Figure: Data Stores on top of the Persistency Layer (Savepoints, Log), with Backup/Restore to a Backup)

Using data in SAP HANA

SAP HANA speaks SQL and MDX – use Excel as your frontend if you like.

> SELECT a
  FROM test.myTable;
Using data in SAP HANA

You define views, to make data easily accessible to everyone.
Using data in SAP HANA

(Figure: Attribute Views, Analytic Views and Calculation Views layered on top of Tables (T))

Views enable real real-time computing by transforming data on the fly.

Using data in SAP HANA

(Figure: a Query (SELECT … FROM … WHERE …), the Statement Processor, the Calculation Engine with an Execution plan made of operations (Op), Views, the Data Stores, and the Persistency Layer (Savepoints, Log))
Using data in SAP HANA

Operation:
• R Procedure Calls
• Set Operations
• Calculations on Data
• Business Function Calls
• Predictive Analytics Algorithms

Operations can be all sorts of operations on data – not just basic SQL operations but also more complex logic

Main memory column store

Architecture & Technology
Database Technology: Rowstore vs Columnstore

Row Store: stores tables by row (Tuple 1, Tuple 2, Tuple 3, … Tuple n, each with Att1 to Att5 stored together)
• Application often processes single records at once (many selects and/or updates of single records)
• Application typically accesses the complete record
• Columns contain mainly distinct values
• Aggregations and fast searching not required
• Small number of rows (e.g. configuration tables)

Column Store: stores tables by column (Att1 to Att5, each holding its values for Tuple 1 … Tuple n together)
• Search and calculation on values of a few columns
• Big number of columns
• Big number of rows and columnar operations (aggregate, scan, etc.)
• High compression rates possible
• Most columns contain only few distinct values

Database Technology: Row and column based storage for a table (principle)

Table

Country  Product  Sales
US       Alpha    3.000
US       Beta     1.250
JP       Alpha    700
UK       Alpha    450

Row Store

Row 1: US, Alpha, 3.000
Row 2: US, Beta, 1.250
Row 3: JP, Alpha, 700
Row 4: UK, Alpha, 450

Column Store

Country: US, US, JP, UK
Product: Alpha, Beta, Alpha, Alpha
Sales:   3.000, 1.250, 700, 450
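The two layouts above can be sketched in C++ as an array of structs (row store) versus a struct of arrays (column store). This is only an illustration under assumptions: the type and function names (`SalesRow`, `SalesColumns`, `toColumns`, and so on) are hypothetical, and the sales figures are read as whole numbers with "." as thousands separator (3.000 = 3000). Summing the Sales column touches every record in the row layout but only one contiguous vector in the column layout.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Row store: one record holds all attributes of a tuple.
struct SalesRow {
    std::string country;
    std::string product;
    int sales;
};

// Column store: one vector per attribute, positions line up across vectors.
struct SalesColumns {
    std::vector<std::string> country;
    std::vector<std::string> product;
    std::vector<int> sales;
};

// The example table from the slide (sales read as integers, an assumption).
std::vector<SalesRow> exampleTable() {
    return {{"US", "Alpha", 3000}, {"US", "Beta", 1250}, {"JP", "Alpha", 700}, {"UK", "Alpha", 450}};
}

// Re-organize a row-wise table into column-wise storage.
SalesColumns toColumns(const std::vector<SalesRow>& rows) {
    SalesColumns cols;
    for (const SalesRow& r : rows) {
        cols.country.push_back(r.country);
        cols.product.push_back(r.product);
        cols.sales.push_back(r.sales);
    }
    assert(cols.sales.size() == rows.size());
    return cols;
}

// Aggregation on the row store: strides over complete records.
long sumSalesRowStore(const std::vector<SalesRow>& rows) {
    long total = 0;
    for (const SalesRow& r : rows) total += r.sales;
    return total;
}

// Aggregation on the column store: scans one dense vector only.
long sumSalesColumnStore(const SalesColumns& cols) {
    long total = 0;
    for (int s : cols.sales) total += s;
    return total;
}
```

Both functions return the same total; the column version reads only the bytes of the Sales values, which is the data locality argument made for columnar scans.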

Database Technology

Multiple data storage methods: Column Store I

Classical DB

Company    Region    Group
[CHAR50]   [CHAR30]  [CHAR5]
INTEL      USA       A
Siemens    Europe    B
Siemens    Europe    C
SAP        Europe    A
SAP        Europe    A
IBM        USA       A

HANA Column Store

Dictionary for attribute/column "Company": 0 INTEL, 1 Siemens, 2 SAP, 3 IBM
Dictionary for attribute/column "Region": 0 Europe, 1 USA
Dictionary for attribute/column "Group": 0 A, 1 B, 2 C

Index Vector (one per column, each row stores value IDs instead of values):

Company  Region  Group
0        1       0
1        0       1
1        0       2
2        0       0
2        0       0
3        1       0

Stored in one memory chunk => data locality for fast scans
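A minimal sketch of dictionary encoding for one column, assuming value IDs are handed out in order of first appearance as in the "Company" example (INTEL = 0, Siemens = 1, SAP = 2, IBM = 3). The names `DictColumn`, `encodeColumn` and `valueAt` are hypothetical, and the index vector is kept as plain 32-bit integers here; bit compression of the references is a separate step.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <vector>

// One dictionary-encoded column.
struct DictColumn {
    std::vector<std::string> dictionary;     // value ID -> value
    std::vector<std::uint32_t> indexVector;  // row -> value ID
};

// Build dictionary and index vector; new values get the next free value ID.
DictColumn encodeColumn(const std::vector<std::string>& values) {
    DictColumn col;
    std::unordered_map<std::string, std::uint32_t> ids;  // value -> value ID
    for (const std::string& v : values) {
        auto it = ids.find(v);
        if (it == ids.end()) {
            it = ids.emplace(v, static_cast<std::uint32_t>(col.dictionary.size())).first;
            col.dictionary.push_back(v);
        }
        col.indexVector.push_back(it->second);
    }
    return col;
}

// Materialize the value of a row by looking its value ID up in the dictionary.
const std::string& valueAt(const DictColumn& col, std::size_t row) {
    assert(row < col.indexVector.size());
    return col.dictionary[col.indexVector[row]];
}
```

Encoding the Company column of the slide yields the dictionary INTEL, Siemens, SAP, IBM and the index vector 0, 1, 1, 2, 2, 3.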
Database Technology

How Data is Mapped to Memory

Conceptual view

A  10  €
B  35  $
C  2   €
D  40  €
E  12  $

1. Organize by row, mapping to memory (increasing memory address):

A 10 € | B 35 $ | C 2 € | D 40 € | E 12 $

2. Organize by column, mapping to memory (increasing memory address):

A B C D E | 10 35 2 40 12 | € $ € € $

© 2013 SAP AG or an SAP affiliate company. All rights reserved.
Column Table Structures

A table is represented by one or more columns

Each table column is represented by a data array (aka index vector) containing value IDs of values in a dictionary, the dictionary itself, a dictionary index, and optionally an inverted index

(Figure: columns C1 to C4, each with Data Array / Index Vector, Dictionary (Dict Value), Dict Index, and Inverted Index)
Column Structures and Terminology

• Dictionary and column data vector (aka index vector – n-bit compressed)
• Dictionary index (for unsorted dictionaries in delta)
• Inverted index (optional; for fast lookups, e.g., for primary key)

Column Data Array, aka Index Vector (row positions are implicit)

Row position  Value ID
1             5
2             3
3             2
4             3
5             4
6             1
7             6
8             2
9             0
10            4
11            1
12            2
13            2
14            0

Dictionary (Value Vector)

Value ID  Value
0         Cupertino
1         San Jose
2         Palo Alto
3         Dublin
4         Fremont
5         Oakland
6         San Francisco

Inverted Index

Value ID  Row positions
0         9, 14
1         6, 11
2         3, 8, 12, 13
3         2, 4
4         5, 10
5         1
6         7

Database Technology: Column Store: Dictionary Compression
Bitwise / bytewise compression of references

Option 1: all references are 32 bit integers
• Non CPU intensive operations
• High memory consumption – high memory bandwidth needed
• Very simple algorithm

Option 2: references are byte compressed depending on dictionary size (1 byte in this example)
• Minimal CPU intensive operations
• Medium memory consumption

Option 3: references are bit compressed depending on dictionary size (3 bit in this example)
• High CPU consumption
• Very good memory consumption
• Complex effective algorithms

References (Column Data Array, aka Index Vector; row positions are implicit)

Row position  Value ID
1             5
2             3
3             2
4             3
5             4
6             1
7             6
8             2
9             0
10            4
11            1
12            2
13            2
14            0

Dictionary (in value ID order, starting at 0)

Cupertino
San Jose
Palo Alto
Dublin
Fremont
Oakland
San Francisco
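Option 3 can be sketched as follows. This is an illustration, not the HANA implementation: `bitsNeeded`, `packBits` and `unpackAt` are hypothetical names, value IDs are packed little-endian into 64-bit words, and bit widths are limited to 1..32. With the 7-entry dictionary above, 3 bits per reference suffice, so the 14 references fit into a single 64-bit word instead of 14 x 32 bits.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Smallest bit width that can represent the value IDs 0 .. dictionarySize-1.
unsigned bitsNeeded(std::uint32_t dictionarySize) {
    assert(dictionarySize > 0);
    unsigned bits = 1;
    while (bits < 32 && (1ull << bits) < dictionarySize) ++bits;
    return bits;
}

// Pack value IDs back to back, bitWidth bits each, into 64-bit words.
std::vector<std::uint64_t> packBits(const std::vector<std::uint32_t>& ids, unsigned bitWidth) {
    assert(bitWidth >= 1 && bitWidth <= 32);
    std::vector<std::uint64_t> words((ids.size() * bitWidth + 63) / 64, 0);
    const std::uint64_t mask = (1ull << bitWidth) - 1;
    for (std::size_t i = 0; i < ids.size(); ++i) {
        const std::uint64_t value = ids[i] & mask;
        const std::size_t bitPos = i * bitWidth;
        const std::size_t word = bitPos / 64;
        const unsigned offset = static_cast<unsigned>(bitPos % 64);
        words[word] |= value << offset;
        // A value that crosses a word boundary puts its high bits into the next word.
        if (offset + bitWidth > 64) words[word + 1] |= value >> (64 - offset);
    }
    return words;
}

// Read the i-th value ID back out of the packed words.
std::uint32_t unpackAt(const std::vector<std::uint64_t>& words, std::size_t i, unsigned bitWidth) {
    assert(bitWidth >= 1 && bitWidth <= 32);
    const std::uint64_t mask = (1ull << bitWidth) - 1;
    const std::size_t bitPos = i * bitWidth;
    const std::size_t word = bitPos / 64;
    const unsigned offset = static_cast<unsigned>(bitPos % 64);
    assert(word < words.size());
    std::uint64_t value = words[word] >> offset;
    if (offset + bitWidth > 64) value |= words[word + 1] << (64 - offset);
    return static_cast<std::uint32_t>(value & mask);
}
```

The trade-off named on the slide is visible in the code: every access needs a shift, a mask and sometimes a second word, in exchange for roughly a tenth of the memory traffic compared with 32-bit references in this example.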
Architecture & Technology

Bitwise decompression

Native C and SSE / AVX
C++ sample code for decompression of bitwise references (naïve coding)

// Returns the value ID stored at position arrayPos of a bit-packed column.
// Values are packed back to back, bitWidth bits each (1..32), into 64-bit words.
unsigned int decompressValue(unsigned int arrayPos, unsigned int bitWidth, const unsigned long long *column)
{
    const unsigned long long mask = (1ULL << bitWidth) - 1;       // lowest bitWidth bits set
    const unsigned long long bitPos = (unsigned long long)arrayPos * bitWidth;
    const unsigned long long integerPos = bitPos >> 6;            // index of the 64-bit word (bitPos / 64)
    const unsigned int startBit = (unsigned int)(bitPos % 64);    // offset of the value inside that word

    unsigned long long valueId = column[integerPos] >> startBit;
    if (startBit + bitWidth > 64)
    {
        // The value straddles two words: take its high bits from the next word.
        valueId |= column[integerPos + 1] << (64 - startBit);
    }
    return (unsigned int)(valueId & mask);
}

C++ sample code for decompression of bitwise references (optimized version)

// Excerpt: unrolled unpacking of 7-bit references from 64-bit words.
// The loop condition is reproduced as on the slide; real code would compare
// outBuffer against the end of the output buffer and advance both pointers.
while (outBuffer < 100000) {
    outBuffer[0] = data[0] & 0x7ful;
    outBuffer[1] = (data[0] >> 7) & 0x7ful;
    outBuffer[2] = (data[0] >> 14) & 0x7ful;
    outBuffer[3] = (data[0] >> 21) & 0x7ful;
    outBuffer[4] = (data[0] >> 28) & 0x7ful;
    outBuffer[5] = (data[0] >> 35) & 0x7ful;
    outBuffer[6] = (data[0] >> 42) & 0x7ful;
    outBuffer[7] = (data[0] >> 49) & 0x7ful;
    outBuffer[8] = (data[0] >> 56) & 0x7ful;
    // value 9 crosses the word boundary: 1 bit from data[0], 6 bits from data[1]
    outBuffer[9] = ((data[0] >> 63) & 0x7ful) | ((data[1] & 0x3ful) << 1);
    outBuffer[10] = (data[1] >> 6) & 0x7ful;
    outBuffer[11] = (data[1] >> 13) & 0x7ful;
    outBuffer[12] = (data[1] >> 20) & 0x7ful;
    outBuffer[13] = (data[1] >> 27) & 0x7ful;
    outBuffer[14] = (data[1] >> 34) & 0x7ful;
    outBuffer[15] = (data[1] >> 41) & 0x7ful;
    outBuffer[16] = (data[1] >> 48) & 0x7ful;
    outBuffer[17] = (data[1] >> 55) & 0x7ful;
    // value 18 crosses the word boundary: 2 bits from data[1], 5 bits from data[2]
    outBuffer[18] = ((data[1] >> 62) & 0x7ful) | ((data[2] & 0x1ful) << 2);
    outBuffer[19] = (data[2] >> 5) & 0x7ful;
    // … (the excerpt on the slide ends here)

Intel® Advanced Vector Extensions (Intel® AVX) Growth

(Figure: performance / core over time)
• Since 1999: 128-bit vectors
• 2011: Intel® Advanced Vector Extensions, 2X peak flops: 256-bit floating-point vectors
• 2012: Half-float support, random numbers
• 2013: Intel® AVX2: 256-bit wide integer vectors, FMA (2x peak flops), "Gather" instructions
• 20??: Future extensions

Intel® microarchitectures code name: Sandy Bridge, Ivy Bridge and Haswell

All products, computer systems, dates and figures specified are preliminary based on current expectations, and are subject to change without notice.

SSE/AVX implementation: see attached PDF

Next Gen Intel® Xeon® Processor E7 Family

"Ivy Bridge-EX"

3X Memory Capacity vs Prior Gen: Up to 12TB in 8S node

SAP HANA Proof of concept: CRM with 6TB in 4S node

Intel® AVX2 with Haswell Architecture

Extends 128-bit integer vector instructions to 256-bit
• Including: Intel® SSE2, Intel Supplemental SSE3 and Intel SSE4

Floating Point Fused Multiply Add
• Double peak FLOPS

Enhanced vectorization
• Gather, Variable shifts, Powerful permutes

Intel AVX2 completes the 256-bit extensions started with Intel AVX: 256-bit integer, cross-lane permutes, gather, FMA

Abbreviations: Intel® Streaming SIMD Extensions (Intel® SSE), Intel® Advanced Vector Extensions (Intel® AVX), Intel® Advanced Vector Extensions 2 (Intel® AVX2)

Compression Building Blocks: Packed Bit-Fields

Large number of integers, each with n bits

Example: 17 bits per entry

(Figure: a 16-byte vector, bytes F down to 0, holding the packed 17-bit entries 65537, 31455, 128, 4711, 100000, 42; "Unpack" expands 128, 4711, 100000, 42 into 32-bit integers)

31 different implementations for the bit widths n from 1 to 32

Source: Lemke, et al. Speeding up queries in column stores: a case for compression, DaWaK'10
Intel® AVX2 Unpacking of Bit-Fields

Pseudo Code and Assembly

vector load v from input array
    vmovdqu xmm8, xmmword ptr [rax+rcx*1+0x11]
    vinserti128 ymm9, ymm8, xmmword ptr [rax+rcx*1+0x19], 0x01

byte shuffle v
    vpshufb ymm10, ymm9, ymm1

vector shift v (new variable shift instruction)
    vpsrlvd ymm11, ymm10, ymm0

vector and v
    vpand ymm12, ymm11, ymm2

vector store v in output array
    vmovdqu ymmword ptr [r8+0x20], ymm12

• Double the number of data elements vs. Intel® SSE
• Implementation takes advantage of new variable shift

Intel® AVX2 Unpacking - Performance

(Figure: decoded integers/cycle per bit case (0 to 30), Intel AVX2 vs. Intel SSE 4.1)

Bit-field unpacking runs up to 1.6x faster on average with Intel® AVX2

Source: Willhalm et al. Vectorizing Database Column Scans with Complex Predicates. ADMS 2013

SAP HANA Complex Scan with Intel® AVX2

(Figure: decoded integers/cycle per bit case (0 to 30), Intel AVX2 vs. scalar)

Complex scan operation in SAP HANA runs up to 1.9x faster on average with Intel® AVX2

Source: Willhalm et al. Vectorizing Database Column Scans with Complex Predicates. ADMS 2013

Intel® Transactional Synchronization Extensions

Intel® TSX: Instruction set extensions for IA

Transactionally execute lock-protected critical sections
• Execute without acquiring lock: expose hidden concurrency
• Hardware manages transactional updates – All or None
• Other threads can't observe intermediate transactional updates
• If lock elision cannot succeed, restart execution & acquire lock

Hardware support to enable lock elision
• Focus on lock granularity optimizations
• Fine grain performance at coarse grain effort

Intel® TSX Exposes Concurrency through Lock Elision

Reference: Intel® Architecture Instruction Set Extensions Programming Reference (http://software.intel.com/file/41604)

Intel® TSX applied

(Figure: scaling over number of threads, two panels. Panel 1, scaling benefits of Intel® TSX: application with Coarse Grain Lock vs. Coarse Grain Lock + Intel® TSX. Panel 2, secondary benefits of Intel® TSX: same application with Finer Grain Locks vs. Fine Grain Locks + Intel® TSX)

Fine Grain Behavior at Coarse Grain Effort

Inverted index

Architecture & Technology
Inverted index

Inverted Index

Value ID  Value          Row positions
0         Cupertino      9, 14
1         San Jose       6, 11
2         Palo Alto      3, 8, 12, 13
3         Dublin         2, 4
4         Fremont        5, 10
5         Oakland        1
6         San Francisco  7

References (row positions are implicit)

Row position  Value ID
1             5
2             3
3             2
4             3
5             4
6             1
7             6
8             2
9             0
10            4
11            1
12            2
13            2
14            0

Dictionary (in value ID order, starting at 0)

Cupertino
San Jose
Palo Alto
Dublin
Fremont
Oakland
San Francisco
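The inverted index above can be derived from the references with one pass over the index vector. A minimal sketch, assuming 1-based row positions as on the slide; `buildInvertedIndex` is a hypothetical name, not a HANA API.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// inverted[valueId] = ascending list of 1-based row positions holding valueId.
std::vector<std::vector<std::uint32_t>> buildInvertedIndex(
        const std::vector<std::uint32_t>& indexVector, std::size_t dictionarySize) {
    std::vector<std::vector<std::uint32_t>> inverted(dictionarySize);
    for (std::size_t row = 0; row < indexVector.size(); ++row) {
        const std::uint32_t valueId = indexVector[row];
        assert(valueId < dictionarySize);
        inverted[valueId].push_back(static_cast<std::uint32_t>(row + 1));
    }
    return inverted;
}
```

A lookup such as "all rows with Palo Alto" then becomes a dictionary search for the value ID (2) followed by reading `inverted[2]`, without scanning the column.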
Inverted index compression

Properties of inverted index:
• Row list of one index item is read completely
• Order inside the list is irrelevant

1) Delta encoding on each row list

Reason: make the row list numbers smaller (for better binary compression)

Example: 3, 8, 12, 13 -> 3, 5, 4, 1

2) Golomb encoding on top of the delta lists

Golomb coding is a lossless data compression method using a family of data compression codes invented by Solomon W. Golomb in the 1960s. Alphabets following a geometric distribution will have a Golomb code as an optimal prefix code, making Golomb coding highly suitable for situations in which the occurrence of small values in the input stream is significantly more likely than large values.

Rice coding (invented by Robert F. Rice) denotes using a subset of the family of Golomb codes to produce a simpler (but possibly suboptimal) prefix code. Rice used this set of codes in an adaptive coding scheme; "Rice coding" can refer either to that adaptive scheme or to using that subset of Golomb codes. Whereas a Golomb code has a tunable parameter that can be any positive integer value, Rice codes are those in which the tunable parameter is a power of two. This makes Rice codes convenient for use on a computer, since multiplication and division by 2 can be implemented more efficiently in binary arithmetic.
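Step 1, delta encoding of a row list, can be sketched as below. The function names `deltaEncode` and `deltaDecode` are hypothetical; the sketch assumes the row list is in ascending order and takes the first entry relative to 0, which reproduces the slide example 3, 8, 12, 13 -> 3, 5, 4, 1.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Replace each row position by its distance to the previous one (first: distance to 0).
std::vector<std::uint32_t> deltaEncode(const std::vector<std::uint32_t>& rows) {
    std::vector<std::uint32_t> deltas;
    deltas.reserve(rows.size());
    std::uint32_t previous = 0;
    for (std::uint32_t row : rows) {
        assert(row >= previous);  // the list must be ascending
        deltas.push_back(row - previous);
        previous = row;
    }
    return deltas;
}

// Restore the row positions with a running sum over the deltas.
std::vector<std::uint32_t> deltaDecode(const std::vector<std::uint32_t>& deltas) {
    std::vector<std::uint32_t> rows;
    rows.reserve(deltas.size());
    std::uint32_t current = 0;
    for (std::uint32_t d : deltas) {
        current += d;
        rows.push_back(current);
    }
    return rows;
}
```

Because a row list is always read completely, decoding front to back with a running sum costs nothing extra in access pattern, while the smaller numbers give the following Golomb step more to compress.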

Inverted index compression: Golomb Rice coding

To Golomb-code a number, find the quotient and remainder of division by the divisor. Write the quotient in unary notation, then the remainder in truncated binary notation. In practice, you need a stop bit after the quotient: if the quotient is written as a sequence of zeroes, the stop bit is a one. The length of the remainder can be determined from the divisor.

A Golomb-Rice code is a Golomb code where the divisor is a power of two, enabling an efficient implementation using shifts and masks rather than division and modulo.

Inverted index compression: Golomb coding

Example with m (divisor) = 4

Value  Quotient  Remainder  Code
0      0         0          1 00
1      0         1          1 01
2      0         2          1 10
3      0         3          1 11
4      1         0          0 1 00
5      1         1          0 1 01
6      1         2          0 1 10
7      1         3          0 1 11
8      2         0          00 1 00
9      2         1          00 1 01
10     2         2          00 1 10
11     2         3          00 1 11
12     3         0          000 1 00
13     3         1          000 1 01
14     3         2          000 1 10
15     3         3          000 1 11

Inverted index compression: Golomb-Rice coding

Golomb code of parameter m for a non-negative integer n is given by coding n div m (quotient) in unary and n mod m (remainder) in binary. When m is a power of 2, there is a simple realization, also known as Rice code.

Example: n = 22, k = 2 (m = 4).

n = 22 = ‘10110’. Shift n right by k (= 2) bits. We get ‘101’. Output 5 (for ‘101’) ‘0’s followed by a ‘1’. Then also output the last k bits of n (‘10’). So the Golomb-Rice code for 22 is ‘00000110’.

Decoding is simple: count the ‘0’s up to the first ‘1’. This gives us the number 5. Then read the next k (= 2) bits, ‘10’, and n = m x 5 + 2 (for ‘10’) = 20 + 2 = 22.
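The encoding and decoding steps above can be sketched in a few lines of Python. This is only an illustration under the conventions of these slides (quotient written as zeros, a single ‘1’ as stop bit, remainder in exactly k bits); the function names rice_encode and rice_decode are made up for this sketch, and bit strings are used instead of a packed bit stream to keep it readable.

```python
def rice_encode(n: int, k: int) -> str:
    """Golomb-Rice code of a non-negative n with divisor m = 2**k, as a bit string."""
    if n < 0 or k < 0:
        raise ValueError("n and k must be non-negative")
    quotient = n >> k                    # n div m, done with a shift
    remainder = n & ((1 << k) - 1)       # n mod m, done with a mask
    remainder_bits = format(remainder, "b").zfill(k) if k else ""
    # unary quotient (zeros), stop bit, then the k remainder bits
    return "0" * quotient + "1" + remainder_bits


def rice_decode(bits: str, k: int) -> int:
    """Inverse of rice_encode for a bit string holding exactly one code word."""
    quotient = bits.index("1")           # number of zeros before the stop bit
    remainder_bits = bits[quotient + 1 : quotient + 1 + k]
    remainder = int(remainder_bits, 2) if k else 0
    return (quotient << k) | remainder   # m * quotient + remainder


if __name__ == "__main__":
    code = rice_encode(22, 2)
    print(code, rice_decode(code, 2))
```

For n = 22 and k = 2 this reproduces the code word of the worked example, and for the values 0 to 15 it reproduces the codes of the m = 4 table (without the spaces).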

HANA Bluebook, p.53

© 2013 SAP AG or an SAP affiliate company. All rights reserved.


Block inverted index

Idea:

• Better compression of the inverted index (it still accounts for 30-50% of the complete column size)

• Scans (at least of small blocks) are extremely fast

• Column encoding works well for small values

Inverted Block Index (with blocksize = 4)

Value ID  Value          Blocks
0         Cupertino      3,4
1         San Jose       2,3
2         Palo Alto      1,2,3,4
3         Dublin         1
4         Fremont        2,3
5         Oakland        1
6         San Francisco  2

Better compression because of:

• Smaller numbers

• Smaller lists (if there are multiple hits in one block)

References

Row       1  2  3  4  5  6  7  8  9  10  11  12  13  14
Value ID  5  3  2  3  4  1  6  2  0  4   1   2   2   0

Dictionary

Cupertino
San Jose
Palo Alto
Dublin
Fremont
Oakland
San Francisco

But: higher CPU costs and worse performance because of the additional block scan
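The block inverted index of this slide can be reproduced with a small Python sketch. It assumes 1-based row and block numbers as in the example and plain Python lists as storage; the names build_block_index and lookup_rows are chosen for this illustration only. The lookup shows where the extra CPU cost comes from: every candidate block has to be scanned again.

```python
from collections import defaultdict


def build_block_index(references, block_size=4):
    """Map each value ID to the ascending list of blocks (1-based) that contain it."""
    index = defaultdict(list)
    for row, value_id in enumerate(references, start=1):
        block = (row - 1) // block_size + 1
        blocks = index[value_id]
        if not blocks or blocks[-1] != block:   # store each block only once
            blocks.append(block)
    return dict(index)


def lookup_rows(index, references, value_id, block_size=4):
    """Return the 1-based rows holding value_id by scanning only the listed blocks."""
    rows = []
    for block in index.get(value_id, []):
        first_row = (block - 1) * block_size + 1
        last_row = min(block * block_size, len(references))
        for row in range(first_row, last_row + 1):   # the additional block scan
            if references[row - 1] == value_id:
                rows.append(row)
    return rows


# Reference vector from the slide (rows 1 to 14).
REFERENCES = [5, 3, 2, 3, 4, 1, 6, 2, 0, 4, 1, 2, 2, 0]

if __name__ == "__main__":
    block_index = build_block_index(REFERENCES)
    for value_id in sorted(block_index):
        print(value_id, block_index[value_id], lookup_rows(block_index, REFERENCES, value_id))
```

Value ID 2 (Palo Alto) occurs in rows 3, 8, 12 and 13, so its list holds all four blocks, while value ID 3 (Dublin) needs only one entry for its two hits in block 1.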


Dictionary compression & index

Architecture & Technology

Database Technology: Dictionary definition

DEFINITION (STRING DICTIONARY).

A string dictionary is a read-only data structure that implements at least the following two functions:

1) Given a value ID id, extract(id) returns the corresponding string in the dictionary.

2) Given a string str, locate(str) returns the unique value ID of str if str is in the dictionary or the value ID of the first string greater than str otherwise.

An access to a string attribute value in a column-store database often corresponds to an extract-operation in a string dictionary. Thus, it is important that extract operations can be performed very fast.

A typical use case for the locate operation is a WHERE clause in an SQL statement that compares a string attribute against a string value. Here, only one locate operation is needed to execute the statement. Hence, the performance of the locate operation is not as critical as the extract performance.
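A minimal Python sketch of this definition, assuming the simplest possible layout: an uncompressed, sorted Python list in which the list position is the value ID. The class name SortedStringDictionary is an assumption of this example; extract and locate follow the definition above. What locate should return when no greater string exists is not specified in the definition; this sketch returns the dictionary size in that case.

```python
from bisect import bisect_left


class SortedStringDictionary:
    """Read-only string dictionary; the value ID is the position in sorted order."""

    def __init__(self, strings):
        self._values = sorted(set(strings))

    def __len__(self):
        return len(self._values)

    def extract(self, value_id):
        """Return the string stored under value_id."""
        if not 0 <= value_id < len(self._values):
            raise IndexError("unknown value ID")
        return self._values[value_id]

    def locate(self, s):
        """Return the value ID of s, or of the first string greater than s."""
        return bisect_left(self._values, s)   # binary search on the sorted list


if __name__ == "__main__":
    cities = SortedStringDictionary(
        ["Cupertino", "San Jose", "Palo Alto", "Dublin", "Fremont", "Oakland", "San Francisco"]
    )
    print(cities.locate("Fremont"), cities.extract(2))
    print(cities.locate("Edinburgh"))   # not contained: ID of the next greater string
```

A predicate such as WHERE city = 'Fremont' then needs one locate call, after which the scan compares integer value IDs only; extract is called once per result value, which is why it is the more performance-critical operation.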

Database Technology: Column Store: Delta Compression For Strings

Database Technology: Column Store: other string dictionary compression schemes

• Huffman / Hu-Tucker Compression: Hu-Tucker compression is used only if the order-preserving property is needed.

• Bit Compression: each character is represented by a small number of bits; used if only a subset of characters occurs in the dictionary.

• N-Gram Compression: frequent 2-grams or 3-grams are replaced by 12-bit codes.

• Re-Pair Compression: uses either 12 bits or 16 bits to store a rule.

We can apply these compression schemes to two main dictionary data structures:

• Array : One class of dictionary implementations is based on a simple consecutive array containing the string data. Pointers to each string in this array are maintained in a separate array.

• Front Coding : The strings of a dictionary are divided into blocks, which are encoded using Front Coding. The resulting blocks are then stored in a consecutive array. Pointers to each block are maintained in a separate array. The prefix length values of one block are stored in a header at the beginning of the block.
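The block-wise Front Coding layout described above can be sketched as follows. This is a simplified illustration, not the actual implementation: each block is a pair (prefix lengths, suffixes) held in Python lists, whereas a real dictionary would pack the blocks into one consecutive byte array with a separate pointer array. The names encode_block, encode and extract are assumptions of this sketch.

```python
def common_prefix_len(a: str, b: str) -> int:
    """Length of the longest common prefix of a and b."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def encode_block(strings):
    """Front-code one block: a header of prefix lengths plus the remaining suffixes."""
    prefix_lengths, suffixes = [], []
    previous = ""
    for s in strings:
        p = common_prefix_len(previous, s)   # shared with the previous string
        prefix_lengths.append(p)
        suffixes.append(s[p:])
        previous = s
    return prefix_lengths, suffixes


def encode(sorted_strings, block_size=4):
    """Split the sorted dictionary into blocks and front-code each block."""
    return [
        encode_block(sorted_strings[i : i + block_size])
        for i in range(0, len(sorted_strings), block_size)
    ]


def extract(blocks, value_id, block_size=4):
    """Rebuild one string by replaying its block from the start."""
    prefix_lengths, suffixes = blocks[value_id // block_size]
    current = ""
    for p, suffix in zip(prefix_lengths[: value_id % block_size + 1], suffixes):
        current = current[:p] + suffix
    return current


if __name__ == "__main__":
    names = ["Abu Dhabi", "Abuja", "Accra", "Yerevan", "Zagreb"]
    blocks = encode(names, block_size=2)
    print(blocks[0])
    print([extract(blocks, i, block_size=2) for i in range(len(names))])
```

The block size is the tuning knob: larger blocks share more prefixes and need fewer pointers, but extract has to replay more strings before it reaches the requested one.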

Database Technology: Column Store: string dictionary compression schemes

• Inline Front Coding : In order to improve sequential access, a Front Coding variant stores the prefix lengths interleaved with the string suffixes.

• Front Coding with Difference to First : In order to trade some space for speed, another Front Coding variant that stores the suffixes differing from the first string of a block instead of the difference to the previous string can be used. Hence decompression of a string essentially consists of two memcpys.

• Fixed Length Array: For very fast access to small dictionaries, an array implementation that does not need pointers to the string data can be used. For each string, the same amount of space is allocated in a consecutive array.

• Column-Wise Bit Compression: For columns with strings that all have the same length and a similar structure, a specific compression scheme can be used. Divide the dictionary into blocks. Then vertically partition each block into character columns, which are then bit compressed.
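To illustrate why "Front Coding with Difference to First" decompresses with two copies, here is a hedged Python sketch of a single block. The tuple layout and the function names encode_block_diff_to_first and extract_from_block are assumptions made for this example; the string concatenation stands in for the two memcpys.

```python
def common_prefix_len(a: str, b: str) -> int:
    """Length of the longest common prefix of a and b."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def encode_block_diff_to_first(strings):
    """Store the first string in full and every other string relative to it."""
    first = strings[0]
    entries = []
    for s in strings[1:]:
        p = common_prefix_len(first, s)   # prefix shared with the FIRST string
        entries.append((p, s[p:]))
    return first, entries


def extract_from_block(block, offset):
    """Rebuild the string at position offset without touching its neighbours."""
    first, entries = block
    if offset == 0:
        return first
    p, suffix = entries[offset - 1]
    return first[:p] + suffix             # copy 1: prefix of first, copy 2: suffix


if __name__ == "__main__":
    block = encode_block_diff_to_first(["Abu Dhabi", "Abuja", "Accra"])
    print(block)
    print([extract_from_block(block, i) for i in range(3)])
```

Compared with plain Front Coding the shared prefixes tend to be shorter (here "Accra" shares only "A" with "Abu Dhabi"), which costs some space, but no string depends on its predecessor any more.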

Dictionary Index: CEFS - Cache-Efficient Function Stores

Context and Idea

Wei Zhou, Ingo Müller, Robert Schulze

Sorted (String) Dictionary

0    Abu Dhabi
1    Abuja
2    Accra
…
n-2  Yerevan
n-1  Zagreb

CEFS: "Cache-Efficient Function Store", a read-optimized data structure for dictionary indexing

Idea 1: When we use only hashed values, we can save space and comparisons

Idea 2: Tune for maximum cache-efficiency

Recursive data structure of multiple levels, each comprising an array of buckets

CEFS - Cache-Efficient Function Stores

Design and Conclusions

Wei Zhou, Ingo Müller, Robert Schulze

Figure: perfect hash function; bucket layout with encoded signatures (tag), sigs, slot1, slot2, slot3, slot4

References Compression

Architecture & Technology

Database Technology: Compression of Value ID Sequence

Database Technology: Compression with run-length encoding

Classical Row Store

Difficult to compress

Company    Region     Group
[CHAR50]   [CHAR30]   [CHAR5]
INTEL      USA        A
Siemens    Europe     B
Siemens    Europe     C
SAP        Europe     A
SAP        Europe     A
IBM        USA        A

NewDB Column Store:

Dictionary compressed

Company dictionary: 0 INTEL, 1 Siemens, 2 SAP, 3 IBM
Region dictionary:  0 Europe, 1 USA
Group dictionary:   0 A, 1 B, 2 C

Row  Company  Region  Group
1    0        1       0
2    1        0       1
3    1        0       2
4    2        0       0
5    2        0       0
6    3        1       0

NewDB Column Store:

Run length compressed*

Company dictionary: 0 INTEL, 1 Siemens, 2 SAP, 3 IBM
Region dictionary:  0 Europe, 1 USA
Group dictionary:   0 A, 1 B, 2 C

Company references: 1 x "0", 2 x "1", 2 x "2", 1 x "3"
Region references:  1 x "1", 4 x "0", 1 x "1"
Group references:   1 x "0", 1 x "1", 1 x "2", 3 x "0"

* Note that there is a variety of compression methods and algorithms such as run-length compression (see Comparison of Compression Algorithms)
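The two steps of this example, dictionary compression followed by run-length encoding of the value ID sequence, can be sketched in Python as below. The dictionaries are passed in with the ID assignment shown on the slide rather than being derived from the data; the function names are assumptions of this sketch.

```python
from itertools import groupby


def dictionary_encode(column, dictionary):
    """Replace every value by its position (value ID) in the given dictionary."""
    value_ids = {value: i for i, value in enumerate(dictionary)}
    return [value_ids[value] for value in column]


def run_length_encode(references):
    """Collapse runs of equal value IDs into (run length, value ID) pairs."""
    return [(len(list(run)), value_id) for value_id, run in groupby(references)]


def run_length_decode(runs):
    """Expand (run length, value ID) pairs back into the reference vector."""
    return [value_id for count, value_id in runs for _ in range(count)]


COMPANY = ["INTEL", "Siemens", "Siemens", "SAP", "SAP", "IBM"]
REGION = ["USA", "Europe", "Europe", "Europe", "Europe", "USA"]
GROUP = ["A", "B", "C", "A", "A", "A"]

if __name__ == "__main__":
    columns = [
        (COMPANY, ["INTEL", "Siemens", "SAP", "IBM"]),
        (REGION, ["Europe", "USA"]),
        (GROUP, ["A", "B", "C"]),
    ]
    for column, dictionary in columns:
        refs = dictionary_encode(column, dictionary)
        print(refs, run_length_encode(refs))
```

Run-length encoding pays off only when equal value IDs are adjacent: the Region column shrinks from six references to three runs, while a column without repeated neighbours would get larger.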


Decimal arithmetic

Some interesting algorithmic challenges

Decimal arithmetic

Decimals are defined as numbers with precision and scale

• Precision is the number of decimal digits

• Scale is the number of fraction digits

e.g. 965.23 could be represented with a decimal of precision = 5, scale = 2

There are 2 principal ways of implementing decimals

• By internally using a decfloat type

• By internally using integers / long / very long integers

Decfloat can be implemented as either a floating-point number or as a fixed-point number. In the fixed-point case, the denominator would be set to a fixed power of ten. In the floating-point case, a variable exponent would represent the power of ten by which the mantissa of the number is multiplied.

With the integer representation, the big issues are rounding and overflow handling.

Example: a = 10.001 (with p = 5, s = 3) and b = 3.33 (with p = 3, s = 2)

We want to calculate a/b

• What are the output type options?

• How do we calculate the division for an output type (p = 5, s = 3) with the internal integer representation?
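One possible way to approach the last question is sketched below in Python; it is an illustration under stated assumptions, not the approach of any particular system. Each decimal is held as an unscaled integer (10.001 with s = 3 becomes 10001, 3.33 with s = 2 becomes 333), rounding is half away from zero, and overflow is checked against the output precision. The function name divide_decimal and the rounding rule are assumptions.

```python
def divide_decimal(a_int, a_scale, b_int, b_scale, out_precision, out_scale):
    """Divide two decimals stored as unscaled integers; return the unscaled result.

    value(a) = a_int / 10**a_scale, value(b) = b_int / 10**b_scale and the
    result r satisfies value(a) / value(b) ~ r / 10**out_scale.
    """
    if b_int == 0:
        raise ZeroDivisionError("decimal division by zero")
    # r = a_int * 10**(out_scale + b_scale - a_scale) / b_int
    shift = out_scale + b_scale - a_scale
    numerator, denominator = a_int, b_int
    if shift >= 0:
        numerator *= 10 ** shift
    else:
        denominator *= 10 ** (-shift)
    negative = (numerator < 0) != (denominator < 0)
    numerator, denominator = abs(numerator), abs(denominator)
    quotient, remainder = divmod(numerator, denominator)
    if 2 * remainder >= denominator:      # round half away from zero
        quotient += 1
    if quotient >= 10 ** out_precision:   # more digits than the output type allows
        raise OverflowError("result does not fit the output precision")
    return -quotient if negative else quotient


if __name__ == "__main__":
    result = divide_decimal(10001, 3, 333, 2, out_precision=5, out_scale=3)
    print(result)   # unscaled integer with scale 3
```

For the example, the numerator is scaled to 10001 x 10^2 = 1000100 before the integer division by 333, giving the unscaled result 3003, i.e. 3.003. Python integers never overflow; with 64-bit or 128-bit integers the scaled numerator itself may overflow before the division, so the intermediate needs a wider type or an explicit check.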


Parallelization / Distribution

Architecture & Technology

Database Technology: Exploit Multi-Core Architectures by parallelization of operations II

A  10  €
B  35  $
C   2  €
D  40  €
E  12  $

Vertical concurrent processing on vertical partitions (disjoint set of columns)

Horizontal concurrent processing on horizontal partitions (disjoint subset of rows)

Split horizontally by blades:

Server 1: A 10 €, B 35 $, C 2 €
Server 2: D 40 €, E 12 $

Database Technology: Parallelization and partitioning over multiple nodes

SAP HANA Management System

Query: Select * from table where Group = 'A'

Node 1
Product  Group  Color
10       A      red
20       B      blue
30       A      green

Node 2
Product  Group  Color
40       A      red
50       C      red
60       A      red
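The idea of this slide, running the same selection on every horizontal partition and merging the partial results, can be sketched with Python threads. This is only a toy model: the "nodes" are in-process lists, the names scan_partition and select_by_group are made up for this illustration, and a real system would ship the query to remote nodes instead.

```python
from concurrent.futures import ThreadPoolExecutor

# Horizontal partitions from the slide: (Product, Group, Color).
NODE_1 = [(10, "A", "red"), (20, "B", "blue"), (30, "A", "green")]
NODE_2 = [(40, "A", "red"), (50, "C", "red"), (60, "A", "red")]


def scan_partition(rows, group):
    """Local part of: Select * from table where Group = <group>."""
    return [row for row in rows if row[1] == group]


def select_by_group(partitions, group):
    """Scan all partitions concurrently and concatenate the partial results."""
    with ThreadPoolExecutor(max_workers=len(partitions)) as pool:
        # map() yields the results in partition order, so the merge is deterministic
        partial_results = pool.map(scan_partition, partitions, [group] * len(partitions))
        return [row for part in partial_results for row in part]


if __name__ == "__main__":
    for row in select_by_group([NODE_1, NODE_2], "A"):
        print(row)
```

Each partition is scanned independently, so no coordination is needed until the final merge; this is what makes horizontal partitioning attractive for scale-out.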
Database Technology: Example for parallelization in a Column Store

HANA Bluebook, p.14

Distributed In-Memory Computing Engine
Database Technology: Parallelization and horizontal partitioning over multiple nodes

SAP NewDB Management System

Query: Select * from table where Group = 'A'

Node 1
Product  Group  Color
10       A      red
20       B      blue
30       A      green

Node 2
Product  Group  Color
40       A      red
50       C      red
60       A      red

Alternatives for Parallel Processing

Standard Solution: Serial Execution

Alternative 1: Inter-Operator Parallelism

Todo:

• Range-aware algorithms

• Parallelism controller

• Adjust plan generator

Alternative 2: Intra-Operator Parallelism

Todo:

• Parallel algorithms

• New internal APIs

• Parallelism controller

YediDB: very first steps I

Class COLUMN

Defines and holds the data for one column, which is mainly the dictionary and the reference vector. The dictionary is a separate class. The reference vector is split into blocks of 1 million references. Each block is stored in the class REFERENCE_BLOCK.

Class REFERENCE_BLOCK

Stores one block of references into the dictionary (one block has the default size of 1 million references).

Class DICTIONARY

Stores the dictionary of a column in the member sortedData. Every column has a dictionary, independent of the type of the column.

YediDB: very first steps

Class TABLE

Stores the set of columns of a table, the table name and the maximal row number of this table.

Class CSV

Handles the insert of a CSV file into YediDB.

Test1.lua

Test program which should run after this first exercise.

YediDB: first exercise

Implement 4 functions in the YediDB skeleton:

1) function DICTIONARY:addValueToSortedDict(value, pos)

2) function DICTIONARY:searchDict(value)

3) function COLUMN:fillColumnInitial(columnValues)

4) function COLUMN:searchRowIdsByValueId(txId, valueId, highestRowNumber)

References

Ingo Müller, Peter Sanders, Robert Schulze, Wei Zhou. Retrieval and Perfect Hashing using Fingerprinting. SEA 2014, Copenhagen, Denmark, June/July 2014.

Ingo Müller, Cornelius Ratsch, Franz Faerber. Adaptive String Dictionary Compression in In-Memory Column-Store Database Systems. EDBT 2014, Athens, Greece, March 2014.

Thomas Willhalm, Ismail Oukid, Ingo Müller, Franz Faerber. Vectorizing Database Column Scans with Complex Predicates. ADMS 2013, Riva del Garda, Italy, August 2013.

Jonathan Dees, Peter Sanders. Efficient Many-Core Query Execution in Main Memory Column-Stores. ICDE 2013, Brisbane, Australia, April 8-12, 2013.

Sikka, V., Färber, F., Lehner, W., Cha, S. K., Peh, T., & Bornhövd, C. (2012). Efficient transaction processing in SAP HANA database. SIGMOD Conference (p. 731).

Färber, F., May, N., Lehner, W., Große, P., Müller, I., Rauhe, H., & Dees, J. (2012). The SAP HANA Database -- An Architecture Overview. IEEE Data Eng. Bull., 35(1), 28-33.

Färber, F., Cha, S. K., Primsch, J., Bornhövd, C., Sigg, S., & Lehner, W. (2011). SAP HANA Database - Data Management for Modern Business Applications. SIGMOD Record, 40(4), 45-51.

Lemke, C., Sattler, K.-U., Faerber, F., & Zeier, A. (2010). Speeding up queries in column stores: a case for compression, 117-129.

More HANA publications at http://scn.sap.com/docs/DOC-26787