
Inside PostgreSQL Shared Memory

Bruce Momjian, EnterpriseDB
October 2008

Abstract
PostgreSQL is an open-source, full-featured relational database. This
presentation gives an overview of the shared memory structures used
by Postgres.

Outline

1. File storage format

2. Shared memory creation

3. Shared buffers

4. Row value access

5. Locking

6. Other structures



File System /data

[Diagram: three Postgres processes, each accessing the /data directory]



File System /data/base

/data
    /base
    /global
    /pg_clog
    /pg_multixact
    /pg_subtrans
    /pg_tblspc
    /pg_twophase
    /pg_xlog

File System /data/base/db

/data/base
    /16385 (production)
    /1 (template1)
    /16821 (test)
    /17982 (devel)
    /21452 (marketing)



File System /data/base/db/table

/data/base/16385
    /24692 (customer)
    /27214 (order)
    /25932 (product)
    /25952 (employee)
    /27839 (part)



File System Data Pages

/data/base/16385/24692
    8k | 8k | 8k | 8k    (the table file is a sequence of 8k pages)
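
The 8k pages above imply a simple mapping between a page (block) number and a byte offset in the table file. The following is a minimal sketch, assuming an 8192-byte page size; `block_offset` and `offset_to_block` are hypothetical helper names, not PostgreSQL functions, and the sketch ignores that large tables are split across several segment files.

```c
#include <assert.h>
#include <stdint.h>

#define BLCKSZ 8192 /* assumed page size in bytes (the "8k" on the slide) */

/* Byte offset of page number blkno within a table file (hypothetical helper). */
static uint64_t block_offset(uint32_t blkno)
{
    return (uint64_t) blkno * BLCKSZ;
}

/* Page number that contains the given byte offset (hypothetical helper). */
static uint32_t offset_to_block(uint64_t offset)
{
    uint64_t blkno = offset / BLCKSZ;

    assert(blkno <= UINT32_MAX);
    return (uint32_t) blkno;
}
```

For example, under this assumption the fourth page of the file (page number 3) starts at byte 24576.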



Data Pages

/data/base/16385/24692
    8k | 8k | 8k | 8k

Layout of one 8K page:

    Page Header | Item | Item | Item | ... | Tuple | Tuple | Tuple | Special
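
The page layout above (header and items at the front, tuples and special space at the back) can be illustrated with a toy page. This is only a sketch under stated assumptions: `ToyHeader`, `ToyItem`, `ToyPage` and the `toy_page_*` functions are hypothetical simplifications and do not match PostgreSQL's real page structures. It assumes items are appended after the header while tuple data is placed from the end of the page backward.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define PAGE_SIZE 8192 /* 8K page, as on the slide */

/* Hypothetical item: records where one tuple lives inside the page. */
typedef struct { uint16_t off; uint16_t len; } ToyItem;

/* Hypothetical, simplified page header. */
typedef struct {
    uint16_t lower;   /* end of the item array (grows forward) */
    uint16_t upper;   /* start of tuple data (grows backward) */
    uint16_t special; /* start of the special space */
} ToyHeader;

typedef union { ToyHeader hdr; unsigned char bytes[PAGE_SIZE]; } ToyPage;

static void toy_page_init(ToyPage *p, uint16_t special_size)
{
    memset(p, 0, sizeof(*p));
    p->hdr.lower = (uint16_t) sizeof(ToyHeader);
    p->hdr.special = (uint16_t) (PAGE_SIZE - special_size);
    p->hdr.upper = p->hdr.special;
}

/* Store a tuple; returns its 1-based item number, or 0 if it does not fit. */
static int toy_page_add(ToyPage *p, const void *data, uint16_t len)
{
    ToyItem it;

    if ((size_t) (p->hdr.upper - p->hdr.lower) < len + sizeof(ToyItem))
        return 0;
    p->hdr.upper -= len;                        /* tuple goes at the back */
    memcpy(p->bytes + p->hdr.upper, data, len);
    it.off = p->hdr.upper;
    it.len = len;
    memcpy(p->bytes + p->hdr.lower, &it, sizeof(it)); /* item goes at the front */
    p->hdr.lower += (uint16_t) sizeof(it);
    return (int) ((p->hdr.lower - sizeof(ToyHeader)) / sizeof(ToyItem));
}

/* Look up a tuple through its item number. */
static const void *toy_page_get(const ToyPage *p, int itemno, uint16_t *len)
{
    ToyItem it;

    assert(itemno >= 1);
    memcpy(&it, p->bytes + sizeof(ToyHeader) + (size_t) (itemno - 1) * sizeof(ToyItem),
           sizeof(it));
    *len = it.len;
    return p->bytes + it.off;
}
```

The indirection through the item array is what lets a tuple be addressed as (page, item) without knowing its byte position in advance.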


File System Block Tuple

/data/base/16385/24692
    8k | 8k | 8k | 8k

Layout of one 8K page:

    Page Header | Item | Item | Item | ... | Tuple | Tuple | Tuple | Special

[Diagram: one Tuple from the page is singled out]

File System Tuple

Tuple: Header | Value | Value | Value | Value | Value | Value

Example values: int4in('9241') on input, textout() producing 'Martin' on output

Header fields:

    OID      - object id of tuple (optional)
    xmin     - creation transaction id
    xmax     - destruction transaction id
    cmin     - creation command id
    cmax     - destruction command id
    ctid     - tuple id (page / item)
    natts    - number of attributes
    infomask - tuple flags
    hoff     - length of tuple header
    bits     - bit map representing NULLs
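
The "bits" field can be read one attribute at a time. The sketch below assumes the convention that a set bit means the value is present and a clear bit means NULL; `toy_att_isnull` is a hypothetical name used here for illustration.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Returns 1 if attribute attnum (zero-based) is NULL according to the bitmap.
 * Assumed convention: bit set = value present, bit clear = NULL.
 */
static int toy_att_isnull(int attnum, const uint8_t *bits)
{
    assert(attnum >= 0);
    return !(bits[attnum >> 3] & (1 << (attnum & 0x07)));
}
```

With a bitmap byte of 0x05 (binary 00000101), attributes 0 and 2 are present and attribute 1 is NULL.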


Tuple Header C Structures

typedef struct HeapTupleFields
{
    TransactionId t_xmin;       /* inserting xact ID */
    TransactionId t_xmax;       /* deleting or locking xact ID */

    union
    {
        CommandId     t_cid;    /* inserting or deleting command ID, or both */
        TransactionId t_xvac;   /* VACUUM FULL xact ID */
    } t_field3;
} HeapTupleFields;

typedef struct HeapTupleHeaderData
{
    union
    {
        HeapTupleFields  t_heap;
        DatumTupleFields t_datum;
    } t_choice;

    ItemPointerData t_ctid;     /* current TID of this or newer tuple */

    /* Fields below here must match MinimalTupleData! */

    uint16 t_infomask2;         /* number of attributes + various flags */

    uint16 t_infomask;          /* various flag bits, see below */

    uint8  t_hoff;              /* sizeof header incl. bitmap, padding */

    /* ^ - 23 bytes - ^ */

    bits8  t_bits[1];           /* bitmap of NULLs -- VARIABLE LENGTH */

    /* MORE DATA FOLLOWS AT END OF STRUCT */
} HeapTupleHeaderData;
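
The "23 bytes" comment and the t_hoff field suggest a small piece of arithmetic: the header length is the fixed part, plus the NULL bitmap when one is present, rounded up to the platform's maximum alignment. The sketch below works this through under two assumptions that are not in the slides: an 8-byte maximum alignment, and no optional OID. All `toy_` names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define FIXED_HEADER_BYTES 23 /* from the "23 bytes" comment in the struct */
#define ASSUMED_MAXALIGN    8 /* assumption: 8-byte maximum alignment */

/* Bytes needed for a NULL bitmap with one bit per attribute. */
static size_t toy_bitmap_len(int natts)
{
    return (size_t) (natts + 7) / 8;
}

/* Header length (the value stored in t_hoff), ignoring the optional OID. */
static size_t toy_header_len(int natts, int has_nulls)
{
    size_t len = FIXED_HEADER_BYTES;

    assert(natts >= 0);
    if (has_nulls)
        len += toy_bitmap_len(natts);
    /* round up to the next multiple of the assumed alignment */
    return (len + ASSUMED_MAXALIGN - 1) & ~(size_t) (ASSUMED_MAXALIGN - 1);
}
```

Under these assumptions a tuple without NULLs has a 24-byte header, a tuple with NULLs and up to 8 attributes still fits in 24 bytes, and a ninth attribute pushes the header to 32 bytes.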

Shared Memory Creation

postmaster --fork()--> postgres, postgres

    postmaster        postgres          postgres
    Program (Text)    Program (Text)    Program (Text)
    Data              Data              Data
    Shared Memory     Shared Memory     Shared Memory
    Stack             Stack             Stack
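
The diagram above shows that memory set up as shared before fork() stays shared between parent and child, while the data and stack segments become private copies. The sketch below demonstrates only that idea on a POSIX system. It uses an anonymous shared mmap() as a simpler stand-in for however the server actually allocates its segment, and `shared_value_after_fork` is a hypothetical name.

```c
#define _DEFAULT_SOURCE /* for MAP_ANONYMOUS on glibc */
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* The child writes into shared memory; the parent reads the value back. */
static int shared_value_after_fork(void)
{
    int   status = 0;
    int   value;
    pid_t pid;
    int  *shared = mmap(NULL, sizeof(int), PROT_READ | PROT_WRITE,
                        MAP_SHARED | MAP_ANONYMOUS, -1, 0);

    if (shared == MAP_FAILED)
        return -1;
    *shared = 0;

    pid = fork();
    if (pid < 0)
    {
        munmap(shared, sizeof(int));
        return -1;
    }
    if (pid == 0)
    {
        *shared = 42; /* visible to the parent: the mapping is shared */
        _exit(0);
    }

    waitpid(pid, &status, 0);
    assert(WIFEXITED(status));
    value = *shared;
    munmap(shared, sizeof(int));
    return value;
}
```

Had the integer lived in an ordinary variable instead, the child's write would have changed only the child's private copy.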



Shared Memory

    PROC                Lightweight Locks    XLOG Buffers
    Proc Array          Lock Hashes          CLOG Buffers
                        LOCK                 Subtrans Buffers
    Auto Vacuum         PROCLOCK             Two-Phase Structs
    Btree Vacuum                             Multi-XACT Buffers
    Free Space Map      Statistics
    Background Writer   Synchronized Scan    Shared Invalidation

    Buffer Descriptors

    Shared Buffers

    Semaphores

Shared Buffers

Buffer Descriptors
    Pin Count - prevent page replacement
    LWLock    - for page changes

Shared Buffers
    8k | 8k | 8k

    read():  file page -> shared buffer
    write(): shared buffer -> file page

/data/base/16385/24692
    8k | 8k | 8k | 8k

Layout of one 8K page:

    Page Header | Item | Item | Item | ... | Tuple | Tuple | Tuple | Special
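
The role of the pin count can be sketched with a toy buffer descriptor: a buffer with a nonzero pin count is in use and must not be chosen for replacement. This is an illustration only. `ToyBufferDesc` and the `toy_` functions are hypothetical, the victim search is a plain linear scan rather than any real replacement policy, and the locking that real descriptors need is omitted.

```c
#include <assert.h>

/* Hypothetical, simplified buffer descriptor. */
typedef struct
{
    int page_no;   /* which file page currently occupies this buffer */
    int pin_count; /* number of users currently relying on this buffer */
} ToyBufferDesc;

static void toy_pin(ToyBufferDesc *buf)
{
    buf->pin_count++;
}

static void toy_unpin(ToyBufferDesc *buf)
{
    assert(buf->pin_count > 0);
    buf->pin_count--;
}

/* Index of the first buffer that may be replaced, or -1 if all are pinned. */
static int toy_find_victim(const ToyBufferDesc *bufs, int nbufs)
{
    for (int i = 0; i < nbufs; i++)
    {
        if (bufs[i].pin_count == 0)
            return i;
    }
    return -1;
}
```

A backend would pin a buffer before reading tuples from it and unpin it afterward, so the page cannot be swapped out from under a pointer into it.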



HeapTuples

Shared Buffers
    8k | 8k | 8k

Layout of one 8K page:

    Page Header | Item | Item | Item | ... | Tuple | Tuple | Tuple | Special

HeapTuple: a C pointer, held by a Postgres process, to a Tuple inside a shared buffer page

Tuple: Header | Value | Value | Value | Value | Value | Value

Example values: int4in('9241') on input, textout() producing 'Martin' on output

Header fields:

    OID      - object id of tuple (optional)
    xmin     - creation transaction id
    xmax     - destruction transaction id
    cmin     - creation command id
    cmax     - destruction command id
    ctid     - tuple id (page / item)
    natts    - number of attributes
    infomask - tuple flags
    hoff     - length of tuple header
    bits     - bit map representing NULLs



Finding A Tuple Value in C

Datum
nocachegetattr(HeapTuple tuple,
               int attnum,
               TupleDesc tupleDesc,
               bool *isnull)
{
    HeapTupleHeader tup = tuple->t_data;
    Form_pg_attribute *att = tupleDesc->attrs;

    /* The next declarations are elided on the slide; restored so the excerpt reads on its own */
    char       *tp = (char *) tup + tup->t_hoff;   /* ptr to data part of tuple */
    bits8      *bp = tup->t_bits;                  /* ptr to null bitmap in tuple */
    int         off;                               /* current offset within data */

    /* ... elided on the slide: attnum is made zero-based, fast paths ... */
    {
        int i;

        /*
         * Note - This loop is a little tricky. For each non-null attribute,
         * we have to first account for alignment padding before the attr,
         * then advance over the attr based on its length. Nulls have no
         * storage and no alignment padding either. We can use/set
         * attcacheoff until we reach either a null or a var-width attribute.
         */
        off = 0;
        for (i = 0;; i++)       /* loop exit is at "break" */
        {
            if (HeapTupleHasNulls(tuple) && att_isnull(i, bp))
                continue;       /* this cannot be the target att */
            if (att[i]->attlen == -1)
                off = att_align_pointer(off, att[i]->attalign, -1,
                                        tp + off);
            else
                /* not varlena, so safe to use att_align_nominal */
                off = att_align_nominal(off, att[i]->attalign);
            if (i == attnum)
                break;
            off = att_addlength_pointer(off, att[i]->attlen, tp + off);
        }
    }
    return fetchatt(att[attnum], tp + off);
}
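
The align-then-advance loop above can be followed on a small example. The sketch below handles only fixed-width, non-NULL attributes, which is a deliberate simplification of the code on the slide; `toy_align` and `toy_attr_offset` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

/* Round off up to the next multiple of align (align must be a power of two). */
static size_t toy_align(size_t off, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (off + align - 1) & ~(align - 1);
}

/*
 * Offset of attribute number target (zero-based) within the tuple data,
 * given each attribute's length and alignment. No NULLs and no
 * variable-length attributes are handled in this sketch.
 */
static size_t toy_attr_offset(const size_t *attlen, const size_t *attalign,
                              int target)
{
    size_t off = 0;

    for (int i = 0;; i++) /* loop exit is at "break", as on the slide */
    {
        off = toy_align(off, attalign[i]); /* padding before this attribute */
        if (i == target)
            break;
        off += attlen[i];                  /* advance over this attribute */
    }
    return off;
}
```

For attributes of widths 4, 1, 4, 2 and 8 bytes, each aligned to its own width, the offsets come out as 0, 4, 8, 12 and 16: three bytes of padding follow the 1-byte attribute, and two bytes follow the 2-byte one.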

Value Access in C

#define fetch_att(T,attbyval,attlen) \
( \
    (attbyval) ? \
    ( \
        (attlen) == (int) sizeof(int32) ? \
            Int32GetDatum(*((int32 *)(T))) \
        : \
        ( \
            (attlen) == (int) sizeof(int16) ? \
                Int16GetDatum(*((int16 *)(T))) \
            : \
            ( \
                AssertMacro((attlen) == 1), \
                CharGetDatum(*((char *)(T))) \
            ) \
        ) \
    ) \
    : \
    PointerGetDatum((char *) (T)) \
)
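
The macro above distinguishes values passed by value (copied into the Datum) from values passed by reference (the Datum holds a pointer into the tuple). A function-style sketch of the same decision follows; `ToyDatum` and `toy_fetch_att` are hypothetical stand-ins, with `uintptr_t` assumed as the Datum representation.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

typedef uintptr_t ToyDatum; /* assumption: a Datum is pointer-sized */

static ToyDatum toy_fetch_att(const void *p, int attbyval, int attlen)
{
    if (!attbyval)
        return (ToyDatum) p; /* by reference: keep pointing at the tuple data */

    if (attlen == 4)
    {
        int32_t v;

        memcpy(&v, p, sizeof(v));
        return (ToyDatum) (uint32_t) v;
    }
    if (attlen == 2)
    {
        int16_t v;

        memcpy(&v, p, sizeof(v));
        return (ToyDatum) (uint16_t) v;
    }
    assert(attlen == 1);
    return (ToyDatum) *(const unsigned char *) p;
}
```

A 4-byte integer such as 9241 is copied out of the tuple, while a text value such as 'Martin' is returned as a pointer with no copy made.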

Test And Set Lock
Can Succeed Or Fail

                                   Success              Failure
    Value written by exchange      1                    1
    Lock value before exchange     0                    1
    Value returned by exchange     0                    1
    Outcome                        Was 0 on exchange    Was 1 on exchange,
                                                        lock already taken

Test And Set Lock
x86 Assembler

static __inline__ int
tas(volatile slock_t *lock)
{
    register slock_t _res = 1;

    /*
     * Use a non-locking test before asserting the bus lock. Note that the
     * extra test appears to be a small loss on some x86 platforms and a small
     * win on others; it's by no means clear that we should keep it.
     */
    __asm__ __volatile__(
        "   cmpb    $0,%1   \n"
        "   jne     1f      \n"
        "   lock            \n"
        "   xchgb   %0,%1   \n"
        "1: \n"
:       "+q"(_res), "+m"(*lock)
:
:       "memory", "cc");
    return (int) _res;
}
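
For readers who do not follow x86 assembler, the same exchange can be written with C11 atomics. This is a portable sketch of the technique, not the code Postgres uses; `toy_slock_t`, `toy_tas` and `toy_unlock` are hypothetical names, and the non-locking pre-test from the assembler version is left out.

```c
#include <assert.h>
#include <stdatomic.h>

typedef atomic_uchar toy_slock_t;

/*
 * Atomically store 1 into the lock and return the previous value:
 * 0 means the lock was free and is now ours, 1 means it was already taken.
 */
static int toy_tas(toy_slock_t *lock)
{
    return atomic_exchange(lock, 1);
}

/* Release a lock we hold by storing 0. */
static void toy_unlock(toy_slock_t *lock)
{
    assert(atomic_load(lock) == 1);
    atomic_store(lock, 0);
}
```

As with the assembler version, a return value of 0 means success and a nonzero return means the caller has to retry or give up.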



Spin Lock
Always Succeeds

                                   Success              Failure
    Value written by exchange      1                    1
    Lock value before exchange     0                    1
    Value returned by exchange     0                    1
    Outcome                        Was 0 on exchange    Was 1 on exchange,
                                                        lock already taken;
                                                        sleep of increasing
                                                        duration, then retry

Spinlocks are designed for short-lived locking operations, like
access to control structures. They should not be used to protect
code that makes kernel calls or other heavy operations.
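
A spinlock built on test-and-set simply retries until the exchange succeeds, sleeping longer after each failure. The sketch below shows that loop with C11 `atomic_flag` and POSIX `nanosleep`. The delay constants and the doubling rule are assumptions chosen for illustration, not the values Postgres uses, and all `toy_` names are hypothetical.

```c
#define _POSIX_C_SOURCE 199309L /* for nanosleep */
#include <assert.h>
#include <stdatomic.h>
#include <time.h>

#define TOY_MIN_DELAY_NS 1000L    /* assumed starting delay */
#define TOY_MAX_DELAY_NS 1000000L /* assumed cap on the delay */

/* Double the delay after each failure, up to the cap. */
static long toy_next_delay(long delay_ns)
{
    long next;

    assert(delay_ns > 0);
    next = delay_ns * 2;
    return next > TOY_MAX_DELAY_NS ? TOY_MAX_DELAY_NS : next;
}

/* Acquire the lock; returns how many attempts failed before it succeeded. */
static int toy_spin_acquire(atomic_flag *lock)
{
    long delay_ns = TOY_MIN_DELAY_NS;
    int  failures = 0;

    while (atomic_flag_test_and_set(lock)) /* true means: was already set */
    {
        struct timespec ts = { 0, delay_ns };

        nanosleep(&ts, NULL);
        delay_ns = toy_next_delay(delay_ns);
        failures++;
    }
    return failures;
}

static void toy_spin_release(atomic_flag *lock)
{
    atomic_flag_clear(lock);
}
```

Because the caller may sleep and retry indefinitely, the protected section has to be short, which is the reason for the restriction stated above.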

Light Weight Locks

Sleep On Lock

    PROC                Lightweight Locks    XLOG Buffers
    Proc Array          Lock Hashes          CLOG Buffers
                        LOCK                 Subtrans Buffers
    Auto Vacuum         PROCLOCK             Two-Phase Structs
    Btree Vacuum                             Multi-XACT Buffers
    Free Space Map      Statistics
    Background Writer   Synchronized Scan    Shared Invalidation

    Buffer Descriptors

    Shared Buffers

    Semaphores

Lightweight locks attempt to acquire the lock, and go to
sleep on a semaphore if the lock request fails. Spinlocks
control access to the lightweight lock control structure.

Database Object Locks

PROC PROCLOCK LOCK

Lock Hashes



Proc

PROC

empty used used empty used empty

Proc Array



Other Shared Memory Structures

    PROC                Lightweight Locks    XLOG Buffers
    Proc Array          Lock Hashes          CLOG Buffers
                        LOCK                 Subtrans Buffers
    Auto Vacuum         PROCLOCK             Two-Phase Structs
    Btree Vacuum                             Multi-XACT Buffers
    Free Space Map      Statistics
    Background Writer   Synchronized Scan    Shared Invalidation

    Buffer Descriptors

    Shared Buffers

    Semaphores

Conclusion

Pink Floyd: Wish You Were Here

