
Having worked for several years as an Oracle DBA, I decided to have a look at the PostgreSQL database and see how it functions in comparison to the Oracle Database.

The “Enterprise DB” graphical installation of PostgreSQL 9.3 is quite easy and rather fast. Under Linux, you run the graphical installer, and dialog boxes guide you through the installation process. You enter the information specific to your system, and at the end of the PostgreSQL installation the Stack Builder package is launched in case you need to install applications, drivers, agents or utilities.

You can download the Enterprise DB utility using the following URL:
http://www.enterprisedb.com/downloads/postgres-postgresql-downloads

I installed PostgreSQL 9.3 with the Enterprise DB installer as follows:

1. Choose Next.
2. Specify the installation directory for PostgreSQL 9.3.
3. Select the directory that will store the data.
4. Provide a password for the database superuser (postgres).
5. Select a port number.
6. Choose the locale for the new database cluster.
7. PostgreSQL is now ready to be installed.
8. Choose whether or not to launch Stack Builder at the end; the installation process then begins.

If you encounter any problems during installation, the log files are written to /tmp.

Under Linux, a shell script named uninstall-postgresql is created in the PostgreSQL home
directory to de-install the software.

The installation is very quick, and your PostgreSQL database cluster is then ready to use. The Enterprise DB installation also creates the startup script /etc/init.d/postgresql-9.3, so that PostgreSQL starts automatically when the server reboots.

Once the Enterprise DB installation has completed, a database storage area (a database cluster) has been initialized on disk. After installation, this database cluster contains a database named postgres, intended for use by utilities and users:

postgres=# \l
                                  List of databases
   Name    |  Owner   | Encoding |  Collate   |   Ctype    |   Access privileges
-----------+----------+----------+------------+------------+-----------------------
 postgres  | postgres | UTF8     | en_US.utf8 | en_US.utf8 |
 template0 | postgres | UTF8     | en_US.utf8 | en_US.utf8 | =c/postgres          +
           |          |          |            |            | postgres=CTc/postgres
 template1 | postgres | UTF8     | en_US.utf8 | en_US.utf8 | postgres=CTc/postgres+
           |          |          |            |            | =c/postgres

By default, a new database is created by cloning the standard system database named template1. template0 lets you create a database containing only the predefined standard objects.

The PostgreSQL equivalent of Oracle's sqlplus is psql. As you will see in this document, psql's own commands begin with a backslash (\). The \? command lists all of them.

For example, the following command connects to the psi database:

-bash-3.2$ psql -d psi
Password:
psql.bin (9.3.4)
Type "help" for help.
No entry for terminal type "xterm";
using dumb terminal settings.
psi=# \q

If you do not want to be prompted for a password, create a .pgpass file in the home directory of the postgres user, with 0600 permissions and one entry per line in the format hostname:port:database:username:password:

-bash-3.2$ more .pgpass
localhost:5432:PSI:postgres:password
-bash-3.2$ su - postgres
Password:
-bash-3.2$ psql -d psi
psql.bin (9.3.4)
Type "help" for help.
No entry for terminal type "xterm";
using dumb terminal settings.
psi=# \q
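To make the lookup rule concrete, here is a minimal Python sketch of how a client could pick a password from a .pgpass file. It assumes the documented libpq format (hostname:port:database:username:password, one entry per line), where * matches any value, \: and \\ escape a literal colon or backslash, and the first matching line wins. The function names and sample entries are hypothetical. This illustrates the rule and is not libpq's code.

```python
from typing import Optional


def split_pgpass_line(line: str) -> list[str]:
    """Split a .pgpass line into at most 5 fields, handling \\: and \\\\ escapes."""
    fields: list[str] = []
    current: list[str] = []
    escaped = False
    for ch in line:
        if escaped:
            current.append(ch)                # escaped character is taken literally
            escaped = False
        elif ch == "\\":
            escaped = True
        elif ch == ":" and len(fields) < 4:
            fields.append("".join(current))   # end of host/port/database/user field
            current = []
        else:
            current.append(ch)                # after the 4th colon: the password
    fields.append("".join(current))
    return fields


def find_password(pgpass_text: str, host: str, port: str,
                  database: str, user: str) -> Optional[str]:
    """Return the password from the first matching entry, or None."""
    wanted = (host, port, database, user)
    for line in pgpass_text.splitlines():
        fields = split_pgpass_line(line)
        if len(fields) != 5:
            continue                          # blank or malformed line: skip it
        # "*" matches any value; any other field must match exactly
        if all(f == "*" or f == w for f, w in zip(fields[:4], wanted)):
            return fields[4]                  # first match wins
    return None
```

Because the first match wins, specific entries should be listed before wildcard entries.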

The first thing you will probably want to do is create a database. As an Oracle DBA, I wondered about typical issues such as the character set or the default tablespace. With PostgreSQL, creating a database is quite easy.

Because the en_US.utf8 locale was chosen for the database cluster during installation, every database you create will use it by default.

When you create a database, you can specify a default tablespace and an owner. First, we create a tablespace:

postgres=# create tablespace psi location '/u01/postgres/data/psi';
CREATE TABLESPACE

The tablespace data is located in /u01/postgres/data/psi:


-bash-3.2$ ls
PG_9.3_201306121
-bash-3.2$ ls PG_9.3_201306121/
16526
-bash-3.2$ ls PG_9.3_201306121/16526/
12547      12587_vm  12624  12663     12728      12773
12547_fsm  12589     12625  12664     12728_fsm  12774
12664_vm   12730     12774_vm  12627  12666      12731  12776

Then we create the database:

postgres=# create database psi owner postgres tablespace psi;
CREATE DATABASE

We can list all databases with the \l command:

postgres=# \l
                                  List of databases
   Name    |  Owner   | Encoding |  Collate   |   Ctype    |   Access privileges
-----------+----------+----------+------------+------------+-----------------------
 postgres  | postgres | UTF8     | en_US.utf8 | en_US.utf8 |
 psi       | postgres | UTF8     | en_US.utf8 | en_US.utf8 |
 template0 | postgres | UTF8     | en_US.utf8 | en_US.utf8 | =c/postgres          +
           |          |          |            |            | postgres=CTc/postgres
 template1 | postgres | UTF8     | en_US.utf8 | en_US.utf8 | postgres=CTc/postgres+
           |          |          |            |            | =c/postgres

Now we can connect to the psi database and create objects; the syntax is quite similar to Oracle's:

postgres=# \c psi
You are now connected to database "psi" as user "postgres".

We create a table and an index:

psi=# create table employe (name varchar);
CREATE TABLE
psi=# create index employe_ix on employe (name);
CREATE INDEX

We insert a row into it:

psi=# insert into employe values ('bill');
INSERT 0 1

We reconnect to the psi database:


-bash-3.2$ psql -d psi
Password:
psql.bin (9.3.4)
Type "help" for help.
No entry for terminal type "xterm";
using dumb terminal settings.

The following command lists the tables:

psi=# \dt+
                     List of relations
 Schema |  Name   | Type  |  Owner   | Size  | Description
--------+---------+-------+----------+-------+-------------
 public | employe | table | postgres | 16 kB |
(1 row)

psi=# select * from employe;
 name
------
 bill
(1 row)

The psql \d+ command is the equivalent of Oracle's DESC command:

psi=# \d+ employe
                               Table "public.employe"
 Column |       Type        | Modifiers | Storage  | Stats target | Description
--------+-------------------+-----------+----------+--------------+-------------
 name   | character varying |           | extended |              |
Indexes:
    "employe_ix" btree (name)
Has OIDs: no

Of course, we can also create a schema and create objects in it.

Let’s create a schema:

psi=# create schema psi;
CREATE SCHEMA

Let’s create a table, insert a row into it and create a view:

psi=# create table psi.salary (val integer);
CREATE TABLE
psi=# insert into psi.salary values (10000);
INSERT 0 1
psi=# select * from psi.salary;
  val
-------
 10000
psi=# create view psi.v_employe as select * from psi.salary;
CREATE VIEW

If we list the relations, we only see the objects in the public schema:

psi=# \d
         List of relations
 Schema |  Name   | Type  |  Owner
--------+---------+-------+----------
 public | employe | table | postgres
(1 row)

If we modify the search path, the objects in both schemas become visible:

psi=# set search_path to psi,public;
SET
psi=# \d
         List of relations
 Schema |  Name   | Type  |  Owner
--------+---------+-------+----------
 psi    | salary  | table | postgres
 public | employe | table | postgres
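Unqualified names are looked up in the schemas of search_path in order (the default is "$user", public), and the first schema containing the name wins. The Python sketch below models that lookup in a simplified way. It ignores pg_catalog, temporary schemas and the "$user" entry, and the catalog dictionary is a hypothetical in-memory stand-in for the objects created above.

```python
from typing import Optional


def resolve(name: str, search_path: list[str],
            catalog: dict[str, set[str]]) -> Optional[str]:
    """Return 'schema.name' for the first schema in search_path holding name."""
    if "." in name:
        return name                      # already schema-qualified (not checked here)
    for schema in search_path:
        if name in catalog.get(schema, set()):
            return f"{schema}.{name}"
    return None                          # PostgreSQL would report "does not exist"


# Hypothetical model of the objects created earlier in this article
catalog = {"public": {"employe"}, "psi": {"salary", "v_employe"}}
```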

Oracle DBAs are used to querying the data dictionary, for example typing select table_name from user_tables to get the list of tables in their schema. What is the equivalent query in PostgreSQL?

PostgreSQL provides a schema named information_schema in every database. Its owner is the initial database user of the cluster. You can drop this schema, but the space savings are negligible.

You can easily query the views in this schema to get useful information about your database objects. Here is a list of the tables in our two schemas:

psi=# select table_name, table_schema from information_schema.tables where
table_schema in ('public','psi');
 table_name | table_schema
------------+--------------
 employe    | public
 salary     | psi

We can display the database character set:


psi=# select character_set_name from information_schema.character_sets;
 character_set_name
--------------------
 UTF8

We can list the views of a schema:

psi=# select table_name from information_schema.views where
table_schema='psi';
 table_name
------------
 v_employe

The information_schema views let us display information about many different kinds of database objects (tables, constraints, sequences, triggers, table_privileges, and so on).

As in Oracle, you can run a query from the SQL prompt or from the Unix prompt. For example, to find the index names of the employe table, you can use the following index.sql script:

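-- index.sql: list the indexes (and indexed columns) of the employe table
select
    t.relname as table_name,
    i.relname as index_name,
    a.attname as column_name
from
    pg_class t, pg_class i,
    pg_index ix, pg_attribute a
where t.oid = ix.indrelid           -- ix is an index defined on table t
  and i.oid = ix.indexrelid         -- i is the pg_class row of the index itself
  and a.attrelid = t.oid            -- a is a column of table t ...
  and a.attnum = ANY(ix.indkey)     -- ... that is part of the index key
  and t.relkind = 'r'               -- ordinary tables only
  and t.relname = 'employe'
order by t.relname, i.relname;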

To display the employe index from the SQL prompt, you run:

psi=# \i index.sql
 table_name | index_name | column_name
------------+------------+-------------
 employe    | employe_ix | name

To run the same query from the Unix prompt:

-bash-3.2$ psql -d psi -a -f index.sql
Password:
 table_name | index_name | column_name
------------+------------+-------------
 employe    | employe_ix | name

Typing SQL queries has its uses, but like many Oracle DBAs, I like using an administration console because I find it more efficient.

I discovered pgAdmin, an administration tool for Unix and Windows systems. pgAdmin is easy to install in a PostgreSQL environment and supports many operations for administering a database cluster.

pgAdmin3 is installed in the home directory of the postgres user, in my case /opt/postgres/9.3.

To run pgAdmin3 successfully, you must set the LD_LIBRARY_PATH variable correctly:

export LD_LIBRARY_PATH=/opt/PostgreSQL/9.3/lib:/opt/PostgreSQL/9.3/pgAdmin3/lib

The pgAdmin3 console:

As you can see, you can administer every database object (tables, functions, sequences, triggers,
views…).

You can view the table creation scripts:

You can edit the privileges of an object:

You can also generate the database creation script:

You can even back up the database:


This tool seems very powerful, even though, for the moment, I have not found a performance tool comparable to the one in Cloud Control 12c.

Conclusion

Discovering PostgreSQL as an Oracle DBA, I realized how close the two products are. The PostgreSQL database has many advantages, such as its easy installation, its general ease of use and its price (it’s free!).

For processing huge amounts of data, Oracle certainly has advantages; nevertheless, the choice of an RDBMS always depends on your application's business needs.

===========================================*====================

PostgreSQL

Database Server Processes


All of the server processes run the same database server program, postgres. Unlike Oracle, there are no separately named processes for the different duties within the database environment. If you looked at the process list (ps), every process would be named postgres. However, on most platforms, PostgreSQL modifies its command title so that individual server processes can readily be identified. You may need to adjust the options used with commands such as ps and top to show these updated titles instead of the process name ("postgres").

The processes seen in a process list can include the following:

• Master process - launches the other processes, both background and session processes.
• Writer process - background process that coordinates database writes, log writes and checkpoints.
• Stats collector process - background process that collects information about server activity.
• User session processes.

The server processes communicate with each other using semaphores and shared memory to
ensure data integrity throughout concurrent data access.

PostgreSQL Database Cluster


Within a server, one or more Oracle instances can be built. The databases are separate from one another, usually sharing only the Oracle listener process. PostgreSQL has the concept of a database cluster: a collection of databases stored at a common file system location (the "data area"). It is possible to have multiple database clusters, as long as they use different data areas and different communication ports.

The processes along with the file system components are all shared within the database cluster.
All the data needed for a database cluster is stored within the cluster's data directory, commonly
referred to as PGDATA (after the name of the environment variable that can be used to define it).
The PGDATA directory contains several subdirectories and configuration files.

The following are some of the cluster configuration files:

• postgresql.conf - Parameter or main server configuration file.
• pg_hba.conf - Client authentication configuration file.
• pg_ident.conf - Map from OS accounts to PostgreSQL accounts.

The cluster subdirectories:

• base - Subdirectory containing per-database subdirectories
• global - Subdirectory containing cluster-wide tables
  o pg_auth - Authorization file containing user and role definitions.
  o pg_control - Control file.
  o pg_database - Information on the databases within the cluster.
• pg_clog - Subdirectory containing transaction commit status data
• pg_multixact - Subdirectory containing multitransaction status data (used for shared row locks)
• pg_subtrans - Subdirectory containing subtransaction status data
• pg_tblspc - Subdirectory containing symbolic links to tablespaces
• pg_twophase - Subdirectory containing state files for prepared transactions
• pg_xlog - Subdirectory containing WAL (Write Ahead Log) files

By default, for each database in the cluster there is a subdirectory within PGDATA/base, named
after the database's OID (object identifier) in pg_database. This subdirectory is the default
location for the database's files; in particular, its system catalogs are stored there. Each table and
index is stored in a separate file, named after the table or index's filenode number, which can be
found in pg_class.relfilenode.

Several components that Oracle DBAs usually equate to one database are shared between
databases within a PostgreSQL cluster, including the parameter file, control file, redo logs,
tablespaces, accounts, roles, and background processes.

Tablespaces and Object Data Files


PostgreSQL introduced tablespace management in version 8.0. The physical representation of a
tablespace within PostgreSQL is simple: it is a directory on the file system, and the mapping is
done via symbolic links.

When a database is created, its default tablespace is where all of the database's objects are stored unless specified otherwise. In Oracle this would be similar to the System, User, and Temporary tablespaces. If no default tablespace is defined at creation time, the data files go into a subdirectory of PGDATA/base. Ideally, the system catalog and the application data structures would reside in separately managed tablespaces, and PostgreSQL makes this possible.

As in Oracle, the definition of a PostgreSQL table determines the tablespace in which the object resides. However, a tablespace has no size limit other than the physical limits imposed on the device by the OS.

Each table's data is stored in a file within the tablespace (that is, within its directory). PostgreSQL splits the table across multiple data files once the table's data exceeds 1 GB.
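As a rough illustration, the Python sketch below lists the files a table of a given size would occupy. It assumes the default 1 GB segment size (a build-time option) and the usual naming convention: the first file is named after the relfilenode, and the following segments get .1, .2 and so on as a suffix. The relfilenode value in the example is hypothetical.

```python
SEGMENT_SIZE = 1024 ** 3  # default 1 GB segment size (chosen when PostgreSQL is built)


def segment_files(relfilenode: int, size_bytes: int,
                  segment_size: int = SEGMENT_SIZE) -> list[str]:
    """Return the names of the files holding size_bytes of table data."""
    # Even an empty table has its first (zero-length) file.
    count = max(1, -(-size_bytes // segment_size))   # ceiling division
    names = [str(relfilenode)]
    names += [f"{relfilenode}.{i}" for i in range(1, count)]
    return names


# A hypothetical 2.5 GB table with relfilenode 16384
print(segment_files(16384, int(2.5 * 1024 ** 3)))
```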

Since version 8.1, it's possible to partition a table over separate (or the same) tablespaces. This is
based on PostgreSQL's table inheritance feature, using a capability of the query planner referred
to as constraint exclusion.
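Constraint exclusion works roughly as follows. Each child table carries a CHECK constraint describing the values it may contain, and the planner skips every child whose constraint contradicts the query's WHERE clause (provided the constraint_exclusion parameter allows it). The Python sketch below models only half-open range constraints on an integer key. The partition names and ranges are hypothetical, and the real planner reasons about the CHECK expressions themselves.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Partition:
    name: str
    low: int    # models CHECK (key >= low AND key < high)
    high: int


def partitions_to_scan(partitions: list[Partition],
                       query_low: int, query_high: int) -> list[str]:
    """Keep only children whose [low, high) range overlaps [query_low, query_high)."""
    return [p.name for p in partitions
            if p.low < query_high and query_low < p.high]


# Hypothetical yearly children of a sales table, keyed by year
parts = [Partition("sales_2012", 2012, 2013),
         Partition("sales_2013", 2013, 2014),
         Partition("sales_2014", 2014, 2015)]
print(partitions_to_scan(parts, 2013, 2014))   # WHERE year >= 2013 AND year < 2014
```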

It is not possible to place specific columns (such as LOBs) in separately defined tablespaces. However, in addition to the data files that represent the table (in multiples of 1 GB), the data of columns that are TOASTed is kept in separate data files. The PostgreSQL storage technique called TOAST (The Oversized-Attribute Storage Technique) automatically compresses and/or moves large field values into a secondary storage area per table; by default this happens when a row is wider than about 2 kB. TOAST allows individual column values of up to 1 GB.

As in Oracle, the definition of an index determines the tablespace in which it resides. Therefore, you can place a table's data and its indexes on separate disks, relieving I/O contention during data manipulation.

Oracle has temporary tablespaces, which hold sort data and the temporary working space needed for DISTINCT and similar operations. PostgreSQL originally had no concept of a temporary tablespace, although it still needs storage to perform these operations. Within the default tablespace of the database (defined at database creation), there is a directory called pgsql_tmp that holds the temporary storage needed for these evaluations. The files created in this directory exist only while the SQL statement is executing. They grow very fast and are most likely designed for speed rather than space efficiency. Be aware that this could fragment the disk, and that there must be enough free space on the disk to support the user queries. Since release 8.3, temporary tablespaces can be defined with the temp_tablespaces parameter.

REDO and Archiving


PostgreSQL uses Write-Ahead Logging (WAL) as its approach to transaction logging. WAL's
central concept is that changes to data files (where tables and indexes reside) must be written
only after those changes have been logged, that is, when log records describing the changes have
been flushed to permanent storage. If we follow this procedure, we do not need to flush data
pages to disk on every transaction commit, because we know that in the event of a crash we will
be able to recover the database using the log: any changes that have not been applied to the data
pages can be redone from the log records. (This is roll-forward recovery, also known as REDO.)
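The write-ahead rule can be illustrated with a toy model. The Python sketch below is a simplification that does not reflect PostgreSQL's actual record or page format. Every change is first appended to a durable log, pages are written to disk only at checkpoint time, and after a simulated crash the lost changes are rebuilt by replaying (redoing) the log. The class and method names are hypothetical.

```python
class ToyWalDatabase:
    """Toy model of write-ahead logging: log first, flush data pages later."""

    def __init__(self) -> None:
        self.wal: list[tuple[int, str, str]] = []      # durable: (lsn, page, value)
        self.disk: dict[str, tuple[str, int]] = {}     # durable: page -> (value, page_lsn)
        self.buffers: dict[str, tuple[str, int]] = {}  # volatile: dirty pages in memory

    def write(self, page: str, value: str) -> None:
        lsn = len(self.wal) + 1
        self.wal.append((lsn, page, value))   # 1. the log record is made durable first
        self.buffers[page] = (value, lsn)     # 2. the page is only changed in memory

    def checkpoint(self) -> None:
        self.disk.update(self.buffers)        # flush dirty pages to the data files
        self.buffers.clear()

    def crash(self) -> None:
        self.buffers.clear()                  # unflushed page changes are lost

    def recover(self) -> None:
        # Roll forward (REDO): re-apply every record newer than the page's LSN.
        for lsn, page, value in self.wal:
            _, page_lsn = self.disk.get(page, ("", 0))
            if lsn > page_lsn:
                self.disk[page] = (value, lsn)
```

Comparing each record with the LSN stored on the page makes replay idempotent: records already reflected on disk are skipped.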

PostgreSQL maintains its WAL in the pg_xlog subdirectory of the cluster's data directory.

WAL was introduced into PostgreSQL in version 7.1. To maintain database consistency in case of a failure, previous releases forced all data modifications to disk before each transaction commit. With WAL, only the log file needs to be flushed to disk at commit, which greatly improves performance while adding capabilities like Point-In-Time Recovery and transaction archiving.

A PostgreSQL system theoretically produces an indefinitely long sequence of WAL records. The
system physically divides this sequence into WAL segment files, which are normally 16MB
apiece. The system normally creates a few segment files and then "recycles" them by renaming
no-longer-needed segment files to higher segment numbers. If you were to perform a listing of
the pg_xlog directory there would always be a handful of files changing names over time.
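Segment file names encode the timeline and the position in the WAL stream. The Python sketch below derives the 24-hex-digit name of the segment that contains a given WAL location (an LSN, written as two hex numbers separated by a slash). It assumes 16 MB segments and the numbering used since 9.3. Older releases skipped the last segment of every 4 GB range, so some of their file names differ.

```python
WAL_SEGMENT_SIZE = 16 * 1024 * 1024                    # default 16 MB segments
SEGMENTS_PER_XLOGID = 0x100000000 // WAL_SEGMENT_SIZE  # 256 segments per 4 GB


def parse_lsn(text: str) -> int:
    """Convert an LSN written as 'X/Y' (two hex numbers) to a 64-bit integer."""
    high, low = text.split("/")
    return (int(high, 16) << 32) | int(low, 16)


def wal_file_name(timeline: int, lsn: int) -> str:
    """Name of the WAL segment file containing byte position lsn."""
    segno = lsn // WAL_SEGMENT_SIZE
    # timeline, then the high and low parts of the segment number, 8 hex digits each
    return "%08X%08X%08X" % (timeline,
                             segno // SEGMENTS_PER_XLOGID,
                             segno % SEGMENTS_PER_XLOGID)


print(wal_file_name(1, parse_lsn("0/3000028")))   # 000000010000000000000003
```

In the server itself, the pg_xlogfile_name() function (renamed pg_walfile_name() in version 10) performs this conversion.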

To enable archiving of the WAL files, you set the archive_command parameter in the parameter file (postgresql.conf) to the command that performs the archival (since 8.3, archive_mode must also be on). Once this is done, online operating-system-level backups become available: you call pg_start_backup before copying the data files and pg_stop_backup afterwards. The data files keep being written during the copy. PostgreSQL continues to write transactions to the WAL files and to archive them, and replaying that archived WAL at restore time makes the copied files consistent.
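In archive_command, %p is replaced by the path of the WAL file to archive and %f by its file name. The server only treats the file as archived when the command exits with status zero. The Python script below is a hypothetical example of such a command, for instance archive_command = 'python3 /path/to/archive_wal.py %p %f /path/to/archive', where both paths are placeholders. Following the PostgreSQL documentation's advice, it refuses to overwrite a file that is already in the archive.

```python
import os
import shutil
import sys


def archive_wal(source_path: str, file_name: str, archive_dir: str) -> int:
    """Copy one WAL segment into archive_dir; return 0 on success, 1 on failure."""
    target = os.path.join(archive_dir, file_name)
    if os.path.exists(target):
        return 1                          # never overwrite an already archived segment
    tmp = target + ".tmp"
    try:
        shutil.copyfile(source_path, tmp)
        with open(tmp, "r+b") as f:
            os.fsync(f.fileno())          # make sure the copy has reached the disk
        os.rename(tmp, target)            # publish the file only once it is complete
    except OSError:
        return 1                          # non-zero status: the server retries later
    return 0


if __name__ == "__main__" and len(sys.argv) == 4:
    # Invoked by the server as: archive_wal.py %p %f <archive directory>
    sys.exit(archive_wal(sys.argv[1], sys.argv[2], sys.argv[3]))
```

Writing to a temporary name and renaming it at the end means a half-copied file is never mistaken for a complete segment.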

WAL archiving and the online backup commands were added in version 8.0.

Rollback or Undo

The way disk space is dynamically allocated for storing and processing a table's records is interesting. The files that represent a table grow as the table grows, and they also grow with the transactions performed against it. Oracle has rollback or undo segments, which hold the information needed to roll back a transaction. In PostgreSQL, this data is stored in the file that represents the table. So when deletes and updates are performed on a table, the file that represents the object still contains the previous row versions. This space can be reused, but only after a maintenance process called vacuum has reclaimed it.
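The following Python sketch is a deliberately simplified model of this behavior. An UPDATE marks the old row version as dead and stores a new one, a DELETE only marks the version as dead, and only a (plain) VACUUM makes the dead versions' space available again. It ignores transaction visibility and the real page layout, and the class name is hypothetical.

```python
from typing import Optional


class ToyHeapTable:
    """Simplified heap: row versions stay in the file until vacuum reclaims them."""

    def __init__(self) -> None:
        self.slots: list[Optional[dict]] = []   # each slot holds one row version
        self.free: list[int] = []               # slots made reusable by vacuum

    def _store(self, row: dict) -> int:
        if self.free:
            slot = self.free.pop()
            self.slots[slot] = row              # reuse space reclaimed by vacuum
        else:
            self.slots.append(row)              # otherwise the file grows
            slot = len(self.slots) - 1
        return slot

    def insert(self, key: str, value: str) -> int:
        return self._store({"key": key, "value": value, "dead": False})

    def update(self, slot: int, value: str) -> int:
        old = self.slots[slot]
        old["dead"] = True                      # the old version remains in the file
        return self._store({"key": old["key"], "value": value, "dead": False})

    def delete(self, slot: int) -> None:
        self.slots[slot]["dead"] = True

    def vacuum(self) -> int:
        """Make dead versions' slots reusable; return how many were reclaimed."""
        reclaimed = 0
        for i, row in enumerate(self.slots):
            if row is not None and row["dead"]:
                self.slots[i] = None
                self.free.append(i)
                reclaimed += 1
        return reclaimed

    def live_rows(self) -> list[dict]:
        return [r for r in self.slots if r is not None and not r["dead"]]
```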

Server Log File


Oracle has the alert log file; PostgreSQL has the server log file. A configuration option (log_connections) can even make the connection information we normally see in Oracle's listener.log appear in PostgreSQL's server log. Parameters in the server configuration file (postgresql.conf) determine the level, location, and name of the log file.

To help manage the server log file, which grows rapidly, PostgreSQL can rotate it. Parameters (log_rotation_size and log_rotation_age) determine when to rotate the file based on its size or age. Managing the old files is then left to the administrator.

Applications
The command initdb creates a new PostgreSQL database cluster.

The command psql starts the terminal-based front end to PostgreSQL (the SQL command prompt). Queries and commands can be executed interactively or from files. The psql command prompt has several attractive features:

• Thorough online help for both the psql commands and the SQL syntax.
• Command history and line editing.
• SQL commands can span multiple lines and are executed only when terminated by a semicolon (;).
• Several SQL commands separated by semicolons can be entered on a single line.
• Flexible output formatting.
• Multiple object description commands that are superior to Oracle's DESCRIBE.

Depending on the security configuration of the environment (pg_hba.conf), connections can be established locally or remotely through TCP/IP. Because each kind of connection can use a different authentication method, a password may or may not be required to connect.
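pg_hba.conf records are checked from top to bottom. The first record whose connection type, database, user and (for TCP/IP) client address match the incoming connection decides the authentication method; if no record matches, access is denied. The Python sketch below models that rule for plain local and host records only (no hostssl, replication or group entries). The sample records are hypothetical.

```python
import ipaddress
from typing import NamedTuple, Optional


class HbaRecord(NamedTuple):
    conn_type: str          # "local" (Unix-domain socket) or "host" (TCP/IP)
    database: str           # database name or "all"
    user: str               # role name or "all"
    address: Optional[str]  # CIDR block for "host" records, None for "local"
    method: str             # e.g. "trust", "peer", "md5", "reject"


def auth_method(records: list[HbaRecord], conn_type: str, database: str,
                user: str, client_ip: Optional[str] = None) -> Optional[str]:
    """Return the method of the first matching record, or None (access denied)."""
    for rec in records:
        if rec.conn_type != conn_type:
            continue
        if rec.database not in ("all", database) or rec.user not in ("all", user):
            continue
        if conn_type == "host":
            if client_ip is None or rec.address is None:
                continue
            if ipaddress.ip_address(client_ip) not in ipaddress.ip_network(rec.address):
                continue
        return rec.method       # first match wins; later records are not consulted
    return None
```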

The command pg_ctl is a utility for displaying the status of, starting, stopping, or restarting the PostgreSQL database server (postgres). Although the server can be started directly through the postgres executable, pg_ctl takes care of tasks such as redirecting log output, properly detaching from the terminal and process group, and providing options for a controlled shutdown.

The commands pg_dump and pg_restore are utilities for exporting and importing the contents of a PostgreSQL database. Dumps can be written in either script or archive file formats. The script format produces plain-text files containing the SQL commands required to reconstruct the database in the state it was in when the dump was taken. The archive format produces a file to be used with pg_restore to rebuild the database.

The archive file formats are designed to be portable across architectures. Historically, any upgrade of the PostgreSQL software required a pg_dump of the database before the upgrade and a pg_restore after it. Now, minor releases (those that change only the third number, e.g. 8.2.x) can be upgraded in place. However, changing the first or second number still requires a pg_dump/pg_restore.
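Under the numbering scheme described here, the first two numbers identify the major release; from version 10 onward, the first number alone does. The hypothetical Python helper below applies that rule to tell whether moving between two versions is a minor upgrade or, following the text above, one that calls for a pg_dump/pg_restore.

```python
def major_version(version: str) -> tuple[int, ...]:
    """Return the part of a version string that identifies the major release."""
    parts = tuple(int(p) for p in version.split("."))
    # Before 10, "9.3" is the major release; from 10 on, "10" alone is.
    return parts[:1] if parts[0] >= 10 else parts[:2]


def needs_dump_restore(old: str, new: str) -> bool:
    """True when the upgrade crosses a major release."""
    return major_version(old) != major_version(new)


print(needs_dump_restore("8.2.4", "8.2.6"))   # minor upgrade: False
```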

There is a separately developed graphical tool called pgAdmin III. It is distributed with the Linux and Windows versions of PostgreSQL and can connect to a database server remotely to perform administrative duties. Because the tool is designed to manage all aspects of the database environment, you need to connect through a superuser account to use all of its features.

The pgAdmin III tool has the following attractive standard features:

• Intuitive layout
• Tree structure for creating and modifying database objects
• Reviewing and saving of SQL when altering or creating objects

===========================================*+++++++=============

PostgreSQL Basics by Example


Connecting to a database

$ psql postgres # the default database


$ psql database_name

Connecting as a specific user

$ psql postgres john


$ psql -U john postgres

Connecting to a host/port (by default psql uses a unix socket)

$ psql -h localhost -p 5432 postgres

You can also explicitly specify whether psql should prompt for a password (-W) or never prompt for one (-w):

$ psql -w postgres
$ psql -W postgres
Password:
Once you’re inside psql, you can control the database. Here are a few handy commands:

postgres=# \h # help on SQL commands
postgres=# \? # help on psql commands, such as \? and \h
postgres=# \l # list databases
postgres=# \c database_name # connect to a database
postgres=# \d # list of tables
postgres=# \d table_name # schema of a given table
postgres=# \du # list roles
postgres=# \e # edit in $EDITOR

At this point you can just type SQL statements and they’ll be executed on the database you’re
currently connected to.

User Management
Once your application goes into production, or basically anywhere outside of your dev machine,
you’re going to want to create some users and restrict access.

We have two options for creating users: from the shell via createuser, or via the SQL command CREATE ROLE:

$ createuser john
postgres=# CREATE ROLE john;

One thing to note here is that, by default, roles created with CREATE ROLE can’t log in (createuser, in contrast, grants LOGIN by default). To allow login, you need to provide the LOGIN attribute:

postgres=# CREATE ROLE john LOGIN;
postgres=# CREATE ROLE john WITH LOGIN; -- the same as above
postgres=# CREATE USER john; -- alternative to CREATE ROLE which adds the LOGIN attribute

You can also add or remove the LOGIN attribute with ALTER ROLE:

postgres=# ALTER ROLE john LOGIN;
postgres=# ALTER ROLE john NOLOGIN; -- remove login

You can also specify multiple attributes when using CREATE ROLE or ALTER ROLE, but bear in mind that ALTER ROLE changes only the attributes you specify; every attribute you leave out keeps its current value.

postgres=# CREATE ROLE deploy SUPERUSER LOGIN;
CREATE ROLE
postgres=# ALTER ROLE deploy NOSUPERUSER CREATEDB; -- the LOGIN attribute is not touched here
ALTER ROLE
postgres=# \du deploy
List of roles
Role name | Attributes | Member of
-----------+------------+-----------
deploy | Create DB | {}

There’s an alternative to CREATE ROLE john WITH LOGIN: CREATE USER, which automatically adds the LOGIN attribute. It is important to understand that users and roles are the same thing. In fact, there’s no such thing as a user in PostgreSQL, only a role with the LOGIN attribute.

postgres=# CREATE USER john;
CREATE ROLE
postgres=# CREATE ROLE kate;
CREATE ROLE
postgres=# \du
List of roles
Role name | Attributes | Member of
-----------+------------------------------------------------+-----------
darth | Superuser, Create role, Create DB, Replication | {}
john | | {}
kate | Cannot login | {}
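The attribute rules above can be summarized in a small Python model: CREATE ROLE starts from NOLOGIN, CREATE USER adds LOGIN, and ALTER ROLE changes only the attributes it names. This is a hypothetical sketch of the semantics that covers four attributes only. It does not show how PostgreSQL stores roles; the real flags are kept in the pg_authid catalog.

```python
DEFAULTS = {"LOGIN": False, "SUPERUSER": False, "CREATEDB": False, "CREATEROLE": False}


def alter_role(role: dict[str, bool], *attrs: str) -> dict[str, bool]:
    """ALTER ROLE: set FOO or NOFOO for the named attributes, keep all others."""
    updated = dict(role)
    for attr in attrs:
        if attr.startswith("NO") and attr[2:] in updated:
            updated[attr[2:]] = False
        elif attr in updated:
            updated[attr] = True
        else:
            raise ValueError(f"unknown attribute: {attr}")
    return updated


def create_role(*attrs: str, user: bool = False) -> dict[str, bool]:
    """CREATE ROLE (or CREATE USER when user=True) with the given attributes."""
    role = dict(DEFAULTS)
    if user:
        role["LOGIN"] = True          # CREATE USER implies LOGIN
    return alter_role(role, *attrs)


deploy = create_role("SUPERUSER", "LOGIN")
deploy = alter_role(deploy, "NOSUPERUSER", "CREATEDB")   # LOGIN is left untouched
```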

You can also create groups via CREATE GROUP (now an alias for CREATE ROLE), and then grant membership in them to other roles or revoke it:

postgres=# CREATE GROUP admin LOGIN;
CREATE ROLE
postgres=# GRANT admin TO john;
GRANT ROLE
postgres=# \du
List of roles
Role name | Attributes | Member of
-----------+------------------------------------------------+-----------
admin | | {}
darth | Superuser, Create role, Create DB, Replication | {}
john | | {admin}
kate | Cannot login | {}
postgres=# REVOKE admin FROM john;
REVOKE ROLE
postgres=# \du
List of roles
Role name | Attributes | Member of
-----------+------------------------------------------------+-----------
admin | | {}
darth | Superuser, Create role, Create DB, Replication | {}
john | | {}
kate | Cannot login | {}
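Role membership forms a graph: GRANT admin TO john adds an edge from john to admin, and REVOKE removes it. The \du "Member of" column shows only direct memberships, while a role with the default INHERIT attribute also uses the privileges of roles it reaches indirectly. The Python sketch below models both views and assumes every role keeps INHERIT. The class name is hypothetical.

```python
from collections import defaultdict


class RoleGraph:
    """Role membership model: role -> set of roles it has been granted."""

    def __init__(self) -> None:
        self.member_of: defaultdict[str, set[str]] = defaultdict(set)

    def grant(self, group: str, role: str) -> None:
        self.member_of[role].add(group)        # GRANT group TO role

    def revoke(self, group: str, role: str) -> None:
        self.member_of[role].discard(group)    # REVOKE group FROM role

    def direct(self, role: str) -> list[str]:
        """What \\du prints in the 'Member of' column."""
        return sorted(self.member_of[role])

    def effective(self, role: str) -> set[str]:
        """All roles whose privileges 'role' inherits (assumes INHERIT everywhere)."""
        seen: set[str] = set()
        stack = list(self.member_of[role])
        while stack:
            group = stack.pop()
            if group not in seen:
                seen.add(group)
                stack.extend(self.member_of[group])
        return seen
```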

===========================================---===================
PostgreSQL is probably the most advanced database in the open source relational database market. It was first released in 1989, and since then there have been many enhancements. According to DB-Engines, it is the fourth most popular database at the time of writing.

In this blog, we will discuss PostgreSQL internals, its architecture, and how the various
components of PostgreSQL interact with one another. This will serve as a starting point and
building block for the remainder of our Become a PostgreSQL DBA blog series.

PostgreSQL Architecture
The physical structure of PostgreSQL is very simple. It consists of shared memory, a few background processes, and data files (see Figure 1-1).

Figure 1-1. PostgreSQL structure

Shared Memory
Shared memory refers to the memory reserved for database caching and transaction log caching. The most important elements in shared memory are the shared buffer and the WAL buffer.

Shared Buffer

The purpose of the shared buffer is to minimize disk I/O. To achieve this, the following principles must be met:

• You need to access very large buffers (tens or hundreds of gigabytes) quickly.
• You should minimize contention when many users access it at the same time.
• Frequently used blocks must stay in the buffer for as long as possible.
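PostgreSQL implements this eviction policy with a clock-sweep algorithm. Every buffer has a usage count that is incremented (up to a small cap) when the buffer is used and decremented when the clock hand passes over it, and the first unpinned buffer found with a count of zero is recycled. Frequently used blocks therefore survive longer. The Python sketch below is a simplified model of that idea without pinning, locking or a free list. The cap of 5 mirrors PostgreSQL's BM_MAX_USAGE_COUNT, and the class name is hypothetical.

```python
from typing import Optional

MAX_USAGE = 5  # cap on the usage count (BM_MAX_USAGE_COUNT in PostgreSQL)


class ClockSweepCache:
    def __init__(self, size: int) -> None:
        self.blocks: list[Optional[int]] = [None] * size  # disk block held by each buffer
        self.usage = [0] * size                           # usage count per buffer
        self.hand = 0                                     # position of the clock hand
        self.where: dict[int, int] = {}                   # block number -> buffer index

    def access(self, block: int) -> bool:
        """Read a block through the cache; return True on a cache hit."""
        if block in self.where:
            i = self.where[block]
            self.usage[i] = min(self.usage[i] + 1, MAX_USAGE)
            return True
        i = self._victim()
        old = self.blocks[i]
        if old is not None:
            del self.where[old]           # evict the previous block
        self.blocks[i], self.usage[i] = block, 1
        self.where[block] = i
        return False

    def _victim(self) -> int:
        """Advance the hand, decrementing counts, until a count of zero is found."""
        while True:
            i = self.hand
            self.hand = (self.hand + 1) % len(self.blocks)
            if self.usage[i] == 0:
                return i
            self.usage[i] -= 1
```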

WAL Buffer

The WAL buffer is a buffer that temporarily stores changes to the database. The contents stored
in the WAL buffer are written to the WAL file at a predetermined point in time. From a backup
and recovery point of view, WAL buffers and WAL files are very important.

PostgreSQL Process Types


PostgreSQL has four process types:

1. Postmaster (Daemon) Process
2. Background Process
3. Backend Process
4. Client Process

Postmaster Process

The Postmaster process is the first process started when you start PostgreSQL. At startup, it performs recovery, initializes shared memory, and starts the background processes. It also creates a backend process whenever a client process requests a connection (see Figure 1-2).

Figure 1-2. Process relationship diagram

If you check the relationships between processes with the pstree command, you can see that the Postmaster process is the parent of all the other processes. (For clarity, I added the process name and argument after the process ID.)

Background Process

The background processes required for PostgreSQL operation are as follows (see Table 1-1).

Process              | Role
---------------------+------------------------------------------------------------------------
logger               | Writes error messages to the log file.
checkpointer         | When a checkpoint occurs, writes dirty buffers to the data files.
writer               | Periodically writes dirty buffers to the data files.
wal writer           | Writes the WAL buffer to the WAL file.
autovacuum launcher  | Forks autovacuum workers when autovacuum is enabled. The autovacuum
                     | daemon is responsible for running vacuum operations on bloated tables on demand.
archiver             | When archive mode is on, copies WAL files to the specified directory.
stats collector      | Collects DBMS usage statistics, such as session execution information
                     | (pg_stat_activity) and table usage statistics (pg_stat_all_tables).

Backend Process

The maximum number of backend processes is set by the max_connections parameter, whose default value is 100. A backend process executes the queries requested by the user process and then transmits the results. Query execution requires some memory structures, collectively called local memory. The main parameters associated with local memory are:

1. work_mem - Space used for sorting, bitmap operations, hash joins, and merge joins. The default setting is 4 MB.
2. maintenance_work_mem - Space used for VACUUM and CREATE INDEX. The default setting is 64 MB.
3. temp_buffers - Space used for temporary tables. The default setting is 8 MB.
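Because work_mem applies to each sort or hash operation rather than to each session, one complex query can use several multiples of it. The Python sketch below gives a rough worst-case estimate of backend-local memory under stated assumptions: every connection is busy, each query runs a chosen number of sort or hash operations at once, and temp_buffers is fully used. The defaults are the values quoted above; the number of operations per query is an assumption you should adapt.

```python
MB = 1024 * 1024


def worst_case_local_memory(max_connections: int = 100,
                            work_mem: int = 4 * MB,
                            ops_per_query: int = 2,
                            temp_buffers: int = 8 * MB) -> int:
    """Rough upper bound on backend-local memory, in bytes.

    Assumes every connection runs a query with ops_per_query concurrent
    sort/hash operations and fills its temp_buffers. maintenance_work_mem
    is left out because only maintenance commands use it.
    """
    per_backend = ops_per_query * work_mem + temp_buffers
    return max_connections * per_backend


print(worst_case_local_memory() // MB, "MB")
```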

Client Process
Client Process refers to the client application that connects to the database server. For every client connection, the postmaster process forks a child (backend) process dedicated to serving that connection.

Database Structure
Here are some things that are important to know when attempting to understand the database
structure of PostgreSQL.

Items related to the database

1. A PostgreSQL server manages several databases, which together form a database cluster.
2. When initdb is executed, the template0, template1, and postgres databases are created.
3. The template0 and template1 databases are template databases for user database creation and contain the system catalog tables.
4. Immediately after initdb, template0 and template1 contain the same list of tables. However, you can create the objects you need in template1, and they are then copied into every database created from it.
5. A user database is created by cloning the template1 database.

Items related to the tablespace

1. The pg_default and pg_global tablespaces are created immediately after initdb.
2. If you do not specify a tablespace when creating a table, the table is stored in the
pg_default tablespace.
3. Tables managed at the database cluster level are stored in the pg_global tablespace.
4. The physical location of the pg_default tablespace is $PGDATA/base.
5. The physical location of the pg_global tablespace is $PGDATA/global.
6. One tablespace can be used by multiple databases. In this case, a database-specific
subdirectory is created in the tablespace directory.
7. Creating a user tablespace creates a symbolic link to the user tablespace in the
$PGDATA/pg_tblspc directory.
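The layout in this list can be summarized as a small path-building function. This is a sketch modeled on the relative paths that pg_relation_filepath() returns, such as base/16384/16385. All OIDs are hypothetical. PG_9.3_CATVERSION is a placeholder for the version-specific subdirectory, named PG_<major version>_<catalog version>, that PostgreSQL creates inside a user tablespace directory.

```python
from typing import Optional


def relation_filepath(tablespace: str, db_oid: int, relfilenode: int,
                      ts_oid: Optional[int] = None,
                      version_dir: Optional[str] = None) -> str:
    """Path of a relation's main data file, relative to $PGDATA."""
    if tablespace == "pg_global":
        # Cluster-wide objects: no per-database subdirectory
        return f"global/{relfilenode}"
    if tablespace == "pg_default":
        # One subdirectory per database, named after the database OID
        return f"base/{db_oid}/{relfilenode}"
    # User tablespace: symlink in pg_tblspc, then the version-specific
    # subdirectory, then the database OID subdirectory
    if ts_oid is None or version_dir is None:
        raise ValueError("a user tablespace needs ts_oid and version_dir")
    return f"pg_tblspc/{ts_oid}/{version_dir}/{db_oid}/{relfilenode}"


# All OIDs below are hypothetical
in_default = relation_filepath("pg_default", 16384, 16385)   # 'base/16384/16385'
in_global = relation_filepath("pg_global", 0, 12000)         # 'global/12000'
in_user_ts = relation_filepath("myts01", 16384, 16391, ts_oid=16390,
                               version_dir="PG_9.3_CATVERSION")
```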

Items related to the table

1. There are three files per table.
2. One file stores the table data. Its file name is the OID of the table.
3. One file manages the table's free space. Its file name is OID_fsm.
4. One file manages the visibility of the table's blocks. Its file name is OID_vm.
5. Indexes do not have a _vm file, so an index consists of two files: OID and OID_fsm.
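The naming rule in this list can be expressed as a tiny function keyed by relfilenode, which equals the OID when the relation is first created. It is a simplified sketch under these assumptions:
- It ignores the extra 1 GB segment files (for example 16385.1) that large relations get.
- The _fsm and _vm files may not exist until they are first needed.
- The relfilenode values are hypothetical.

```python
from typing import List


def relation_files(relfilenode: int, is_index: bool) -> List[str]:
    """File names for one relation, following the list above.

    Tables: main data file, free space map (_fsm) and visibility map (_vm).
    Indexes: main data file and _fsm only (no _vm).
    """
    names = [str(relfilenode), f"{relfilenode}_fsm"]
    if not is_index:
        names.append(f"{relfilenode}_vm")
    return names


# Hypothetical relfilenode values
table_files = relation_files(16385, is_index=False)  # ['16385', '16385_fsm', '16385_vm']
index_files = relation_files(16388, is_index=True)   # ['16388', '16388_fsm']
```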

Other Things to Remember...

When a table or index is created, its file name is its OID, and the OID and pg_class.relfilenode
are the same at this point. However, when a rewrite operation (TRUNCATE, CLUSTER, VACUUM
FULL, REINDEX, etc.) is performed, the relfilenode value of the affected object changes, and
the file name changes to the new relfilenode value. You can easily check the file location and
name by using pg_relation_filepath('<object name>').

template0, template1, postgres database

Running Tests
If you query the pg_database system catalog after initdb, you can see that the template0,
template1, and postgres databases have been created.

• The datistemplate column shows that the template0 and template1 databases are
templates for creating user databases.
• The datallowconn column indicates whether the database accepts connections. Since the
template0 database does not accept connections, its contents cannot be changed
either.
• Two template databases are provided because template0 is the pristine initial-state
template, while template1 is the template to which users can add their own objects.
• The postgres database is a default database created from the template1 database. If you
connect as the postgres user without specifying a database, you are connected to the
postgres database, because the default database name is the same as the user name.
• Each database is located under the $PGDATA/base directory, in a subdirectory named
after the database OID.

Create User Database


A user database is created by cloning the template1 database. To verify this, create a user table
T1 in the template1 database. Then create the mydb01 database and check that the T1 table exists
in it. (See Figure 1-3.)
Figure 1-3. Relationship between Template Database and User Database
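The test above can be pictured with a toy model, sketched under simplifying assumptions. A database is just a set of table names. CREATE DATABASE copies the set from the chosen template, template1 by default. A database whose datallowconn is false cannot be connected to, so objects cannot be created in it. The Cluster and Database classes and their methods are hypothetical names, not PostgreSQL APIs.

```python
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass
class Database:
    name: str
    is_template: bool   # like pg_database.datistemplate
    allow_conn: bool    # like pg_database.datallowconn
    tables: Set[str] = field(default_factory=set)


class Cluster:
    """Toy model of a database cluster right after initdb (hypothetical names)."""

    def __init__(self) -> None:
        catalogs = {"pg_class", "pg_attribute"}  # stand-in for the system catalogs
        self.dbs: Dict[str, Database] = {
            "template0": Database("template0", True, False, set(catalogs)),
            "template1": Database("template1", True, True, set(catalogs)),
            "postgres": Database("postgres", False, True, set(catalogs)),
        }

    def create_table(self, db: str, table: str) -> None:
        # Creating a table requires connecting to the database first
        if not self.dbs[db].allow_conn:
            raise PermissionError(f'database "{db}" is not accepting connections')
        self.dbs[db].tables.add(table)

    def create_database(self, name: str, template: str = "template1") -> None:
        # The new database starts as a copy of the template's contents
        src = self.dbs[template]
        self.dbs[name] = Database(name, False, True, set(src.tables))


cluster = Cluster()
cluster.create_table("template1", "t1")                  # user object added to template1
cluster.create_database("mydb01")                        # cloned from template1
cluster.create_database("mydb02", template="template0")  # cloned from the pristine template

template0_blocked = False
try:
    cluster.create_table("template0", "t2")
except PermissionError:
    template0_blocked = True
```

mydb01 inherits T1, while mydb02, cloned from template0, does not. The postgres database was created before T1 was added, so it does not contain T1 either.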

pg_default tablespace

If you query pg_tablespace after initdb, you can see that the pg_default and pg_global
tablespaces have been created.

The location of the pg_default tablespace is $PGDATA/base. This directory contains one
subdirectory per database, named after the database OID. (See Figure 1-4.)
Figure 1-4. pg_default tablespace and database relationships from a physical configuration
perspective

pg_global tablespace

The pg_global tablespace stores data that is managed at the database cluster level.

• For example, shared tables such as pg_database return the same information no matter
which database they are accessed from. (See Figure 1-5.)
• The location of the pg_global tablespace is $PGDATA/global.

Figure 1-5. Relationship between pg_global tablespace and database

Create User Tablespace


postgres=# create tablespace myts01 location '/data01';

Querying pg_tablespace shows that the myts01 tablespace has been created.
Symbolic links in the $PGDATA/pg_tblspc directory point to the tablespace directories.

Connect to the postgres and mydb01 databases and create a T1 table in the myts01 tablespace
in each.

If you look under the /data01 directory after creating the tables, you will see that a directory
named after the OID of each of the postgres and mydb01 databases has been created, and that
each directory contains a file with the same OID as its T1 table.
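The on-disk effect described above can be mimicked with Python's os module. This is a minimal sketch that assumes a POSIX system. A symbolic link named after the tablespace OID in pg_tblspc points at the LOCATION directory, so files created through the link end up in /data01. The function name and OIDs are hypothetical, and a temporary directory stands in for both $PGDATA and /data01. The version-specific subdirectory that PostgreSQL creates inside the tablespace directory is omitted for brevity.

```python
import os
import tempfile


def create_tablespace_link(pgdata: str, ts_oid: int, location: str) -> str:
    """Mimic the on-disk effect of CREATE TABLESPACE: a symbolic link
    $PGDATA/pg_tblspc/<tablespace OID> that points at the LOCATION directory."""
    tblspc_dir = os.path.join(pgdata, "pg_tblspc")
    os.makedirs(tblspc_dir, exist_ok=True)
    link = os.path.join(tblspc_dir, str(ts_oid))
    os.symlink(location, link)
    return link


with tempfile.TemporaryDirectory() as root:
    pgdata = os.path.join(root, "pgdata")
    data01 = os.path.join(root, "data01")                  # stands in for /data01
    os.makedirs(data01)
    link = create_tablespace_link(pgdata, 16390, data01)   # hypothetical OID

    # Create a database directory and a table file *through* the link
    db_dir = os.path.join(link, "16384")                   # hypothetical database OID
    os.makedirs(db_dir)
    with open(os.path.join(db_dir, "16391"), "w"):         # hypothetical T1 file
        pass

    link_is_symlink = os.path.islink(link)
    file_lands_in_data01 = os.path.isfile(os.path.join(data01, "16384", "16391"))
```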
How to Change Tablespace Location

In PostgreSQL, you specify a directory when creating a tablespace. Therefore, if the file system
where that directory is located becomes full, no more data can be stored in the tablespace. To
solve this problem, you can use a volume manager to extend the file system. However, if you
can’t use a volume manager, you can consider changing the tablespace location. The order of
operations is as follows.
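One way to relocate a tablespace is to move its directory and then repoint the link in $PGDATA/pg_tblspc. This is a minimal sketch under the following assumptions:
- PostgreSQL 9.2 or later, where pg_tablespace no longer stores the location and pg_tablespace_location() reads it from the symbolic link.
- A POSIX file system.
- A server that has been stopped cleanly beforehand.

The function name, paths and OIDs are hypothetical. Back up the cluster before trying this on real data, and start the server again afterwards.

```python
import os
import shutil
import tempfile


def relocate_tablespace(pgdata: str, ts_oid: int, new_location: str) -> None:
    """Move a tablespace directory and repoint its pg_tblspc symlink.

    Assumes the PostgreSQL server is stopped and new_location does not exist yet.
    """
    link = os.path.join(pgdata, "pg_tblspc", str(ts_oid))
    old_location = os.readlink(link)
    shutil.move(old_location, new_location)  # rename, or copy+delete across file systems
    os.remove(link)                          # removes the link itself, not its target
    os.symlink(new_location, link)


with tempfile.TemporaryDirectory() as root:
    pgdata = os.path.join(root, "pgdata")
    os.makedirs(os.path.join(pgdata, "pg_tblspc"))
    old_dir = os.path.join(root, "data01")        # current tablespace location
    new_dir = os.path.join(root, "data02")        # hypothetical new location
    os.makedirs(os.path.join(old_dir, "16384"))   # hypothetical database OID directory
    os.symlink(old_dir, os.path.join(pgdata, "pg_tblspc", "16390"))

    relocate_tablespace(pgdata, 16390, new_dir)

    link_target = os.readlink(os.path.join(pgdata, "pg_tblspc", "16390"))
    moved = os.path.isdir(os.path.join(new_dir, "16384")) and not os.path.exists(old_dir)
```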
Note: Tablespaces are also very useful in environments that use partitioned tables. Because you
can use a different tablespace for each partition, you can cope with file system capacity
problems more flexibly.

What is Vacuum?
Vacuum does the following:

1. Gathers table and index statistics (with the ANALYZE option)
2. Reorganizes the table (VACUUM FULL)
3. Cleans up dead tuples in tables and indexes
4. Freezes record XIDs to prevent XID wraparound

Items 1 and 2 are generally required in any DBMS, but items 3 and 4 are necessary because of
PostgreSQL's MVCC implementation.
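Item 4 exists because PostgreSQL XIDs are 32-bit values compared modulo 2^32. From any given XID, about two billion XIDs count as "in the past" and about two billion as "in the future". The sketch below follows the arithmetic of TransactionIdPrecedes() in the PostgreSQL source, simplified. It ignores the special XIDs 0 to 2, including the frozen XID, which the server handles separately. It shows how an unfrozen row whose xmin falls more than 2^31 transactions behind suddenly looks like it belongs to the future.

```python
MASK32 = 0xFFFFFFFF


def as_int32(x: int) -> int:
    """Interpret the low 32 bits of x as a signed 32-bit integer."""
    x &= MASK32
    return x - (1 << 32) if x >= (1 << 31) else x


def xid_precedes(xid1: int, xid2: int) -> bool:
    """Modulo-2^32 'is xid1 older than xid2?' for normal XIDs."""
    return as_int32(xid1 - xid2) < 0


simple = xid_precedes(100, 200)                  # True
across_wrap = xid_precedes(4_294_967_200, 50)    # True: 50 was assigned after the counter wrapped

row_xmin = 50                                    # an old, never-frozen row
still_past = xid_precedes(row_xmin, row_xmin + 2**31)      # True
now_future = xid_precedes(row_xmin, row_xmin + 2**31 + 1)  # False: the old row looks "in the future"
```

Freezing replaces such old xmin values with a marker meaning "visible to everyone" before that distance is reached.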


Differences Between Oracle and PostgreSQL

In my view, the biggest differences are the MVCC model and whether a shared pool exists. These
differences are also considered characteristic features of PostgreSQL. (See Table 1-2.)

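Table 1-2. Differences between ORACLE and PostgreSQL

Item                             | ORACLE       | PostgreSQL
---------------------------------+--------------+-------------------------------------
MVCC model implementation method | UNDO segment | Stores previous records within block
Shared Pool                      | Exists       | Does not exist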

Differences in the MVCC Model

To increase concurrency, you must follow the principle that "read operations do not block write
operations and write operations should not block read operations". Implementing this principle
requires Multi Version Concurrency Control (MVCC). ORACLE uses UNDO segments to implement
MVCC. PostgreSQL, on the other hand, stores previous versions of records within the table block
itself. It uses the transaction ID (XID) together with the xmin and xmax pseudo columns for row
versioning.
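The xmin/xmax mechanism can be illustrated with a deliberately simplified visibility check. A row version is visible to a reader if the transaction that created it (xmin) had committed when the reader's snapshot was taken. The transaction that deleted or updated it (xmax) must be absent or not yet committed. The committed set stands in for a real snapshot. Among other things, the model ignores:
- a transaction seeing its own changes
- subtransactions
- hint bits
- aborted-transaction bookkeeping

All names are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Set


@dataclass
class TupleVersion:
    value: str
    xmin: int            # XID of the transaction that created this version
    xmax: Optional[int]  # XID of the transaction that deleted/updated it, or None


def is_visible(t: TupleVersion, committed: Set[int]) -> bool:
    """`committed` = XIDs that had committed when the reader's snapshot was taken."""
    if t.xmin not in committed:
        return False                       # creator not committed yet: invisible
    return t.xmax is None or t.xmax not in committed


def read(page: List[TupleVersion], committed: Set[int]) -> List[str]:
    return [t.value for t in page if is_visible(t, committed)]


# XID 100 inserted 'v1'; XID 101 then updated it to 'v2'. Both versions stay
# in the same block, and the old one is marked with xmax = 101.
page = [TupleVersion("v1", xmin=100, xmax=101),
        TupleVersion("v2", xmin=101, xmax=None)]

before_commit = read(page, committed={100})       # ['v1']
after_commit = read(page, committed={100, 101})   # ['v2']
```

The reader never waits for the writer. It simply picks whichever version its snapshot allows. The old version left behind in the block is what VACUUM later cleans up.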

Shared Pool

PostgreSQL does not provide a shared pool. This can be disconcerting for users familiar with
ORACLE, where the Shared Pool is a very important and essential component. Instead of a Shared
Pool, PostgreSQL shares SQL information at the process level. In other words, if you execute the
same prepared SQL statement several times in one process, it is hard-parsed only once.
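The per-process behavior can be pictured with a toy model in which each backend keeps a private dictionary of parsed statements. Backend and execute_prepared are hypothetical names. Real PostgreSQL may still re-plan a cached statement, for example when choosing between custom and generic plans, and this model ignores that. It only counts parsing.

```python
from typing import Dict


class Backend:
    """Toy backend process with a private statement cache (hypothetical model).

    Nothing is shared between Backend instances, mirroring the absence of a
    shared pool: each process parses a given statement once for itself.
    """

    def __init__(self) -> None:
        self.cache: Dict[str, str] = {}
        self.parse_count = 0

    def execute_prepared(self, sql: str) -> str:
        plan = self.cache.get(sql)
        if plan is None:
            self.parse_count += 1          # parse once per process
            plan = f"parsed: {sql}"
            self.cache[sql] = plan
        return plan


sql = "SELECT * FROM t1 WHERE id = $1"
b1, b2 = Backend(), Backend()
for _ in range(3):
    b1.execute_prepared(sql)   # parsed on the first call only
b2.execute_prepared(sql)       # a different process parses it again
```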

Thanks for reading and stay tuned for the next installment in this blog series.
