The Design of POSTGRES — A research report

based on a review of the article by Stonebraker and Rowe (1986)

Arnold Doba

Abstract

The POSTGRES database management system came about as an improvement to INGRES, a legendary
database management system. The motivation for its introduction lies in the shortcomings of
INGRES, particularly its limited support for complex objects, alerts and rule processing, and
extensibility, its inefficient storage system, and the absence of some basic and advanced data
types. This report reviews the design and implementation of POSTGRES as proposed by Stonebraker
and Rowe, and shows how that design addresses the shortcomings of INGRES to produce a better
system.

Introduction

The INGRES database management system (DBMS) was one of the first relational database
systems, having been implemented in the 1975-1977 era. While the DBMS implemented the
relational model well, it fell short when applied to complex disciplines such as
engineering and computer graphics, where complex objects need to be modeled and
represented. At the same time, the code for the system was so congested and unclean that
it was difficult and time-consuming to modify in order to achieve reasonable improvements.
Consequently, Stonebraker and Rowe (1986) proposed a whole new, improved system,
POSTGRES (PostINGRES), to counter those shortcomings and to introduce new and
better features. This report summarizes that design and how it was implemented.

The first part of this report provides a background, going into detail on how POSTGRES
was designed and implemented and the rationale behind those design and implementation
decisions. Next is the methodology, which covers the tools and techniques used for
conducting this research and the justification for them. After that, we look at the results
of this research, dwelling mainly on how the design influenced other researchers and
contemporary DBMSs. This is followed by a discussion of the findings and a possible way
forward. Lastly, we present the overall summary of the research report.

Background

The next sub-section covers the shortcomings of INGRES and the improvements brought about
by the design of POSTGRES. After that, we look at the implementation decisions behind the
design.

Key Design Decisions for POSTGRES


Better support for complex objects
The first challenge with INGRES relates to complex objects. INGRES implemented a
conventional relational model, and in such a model the storage of geometric objects such as
polygons, lines and circles would require a number of tables to store the
properties of each object. In addition, displaying an object requires further information
such as position, colour, scale factor and other attributes. This makes access inefficient,
since a number of tables have to be joined, which reduces speed. To counter that, POSTGRES,
through POSTQUEL, its new and improved query language,
provides a way to store a geometric object and all its properties in a single table, leading to
better and faster access.
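
The contrast can be illustrated with a minimal Python sketch. It is not POSTQUEL and not
how POSTGRES stores data internally; the table layouts, the `Shape` class and both loader
functions are hypothetical names chosen only to show why reassembling an object from
several tables costs more than reading one record.

```python
from dataclasses import dataclass

@dataclass
class Shape:
    kind: str
    vertices: list
    colour: str
    scale: float

# Conventional relational layout (hypothetical): one object spread over three tables.
shapes = {1: {"kind": "polygon"}}
points = [
    {"shape_id": 1, "seq": 1, "x": 4.0, "y": 0.0},
    {"shape_id": 1, "seq": 0, "x": 0.0, "y": 0.0},
    {"shape_id": 1, "seq": 2, "x": 4.0, "y": 3.0},
]
display = {1: {"colour": "red", "scale": 2.0}}

def load_normalized(shape_id):
    """Rebuild one object by joining the three tables on shape_id."""
    own_points = sorted(
        (p for p in points if p["shape_id"] == shape_id),
        key=lambda p: p["seq"],
    )
    attrs = display[shape_id]
    return Shape(
        kind=shapes[shape_id]["kind"],
        vertices=[(p["x"], p["y"]) for p in own_points],
        colour=attrs["colour"],
        scale=attrs["scale"],
    )

# Single-table layout (hypothetical): the whole object is one record.
shape_table = {
    1: Shape("polygon", [(0.0, 0.0), (4.0, 0.0), (4.0, 3.0)], "red", 2.0),
}

def load_single(shape_id):
    """Fetch the object with a single lookup, no joins."""
    return shape_table[shape_id]
```

Both loaders return the same `Shape`, but `load_normalized` has to scan and join three
structures while `load_single` performs one lookup.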
Alerts, triggers and rule processing
POSTGRES was also designed to support active databases and rules, which form the basis of
many applications. A database alerter is needed to bring attention to a
certain state of the data. Triggers, on the other hand, can be used to propagate updates in
the database to maintain consistency. Many expert systems are also essentially a set of rules
rather than data values. A point of sale system, for example, can be set to provide a discount
based on quantity bought and price. This means the discount is not kept as data values but as
a set of rules which are fired as necessary when triggered by a certain action, in this case
the sale of a particular item. POSTGRES was built to support this.
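
The point of sale example above can be sketched in Python as follows. This is an
illustration of the idea only, not POSTGRES rule syntax; the `Sale` and `Rule` classes,
the `on_sale` function and the thresholds and rates are all assumptions made for the
example.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Sale:
    item: str
    quantity: int
    unit_price: float

@dataclass
class Rule:
    name: str
    condition: Callable[[Sale], bool]
    discount_rate: float

# The discount lives in rules, not in stored data values (hypothetical thresholds).
RULES = [
    Rule("bulk", lambda s: s.quantity >= 10, 0.10),
    Rule("premium", lambda s: s.unit_price >= 100.0, 0.05),
]

def on_sale(sale, rules=RULES):
    """Fired by a sale event: apply the best discount among the rules that match."""
    fired = [r for r in rules if r.condition(sale)]
    rate = max((r.discount_rate for r in fired), default=0.0)
    total = round(sale.quantity * sale.unit_price * (1 - rate), 2)
    return total, [r.name for r in fired]
```

Calling `on_sale(Sale("pen", 12, 2.0))` fires only the hypothetical "bulk" rule, while a
sale that matches no condition is charged the undiscounted total.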
Improved storage system
POSTGRES uses both optical and magnetic disks. The design recognises that optical disks
have slower access characteristics than magnetic disks. However, optical disks are cheaper.
To balance cost and performance, optical disks are used in an archival
role to store historic or more permanent data, while magnetic disks are used for storing
indexes and more recent data. A daemon process then runs in the background to move data
between the two disks as required.
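
A rough Python sketch of this two-tier arrangement is given below. The `TieredStore`
class, the age-based `archive_after` policy and the abstract integer timestamps are
assumptions for illustration; the source does not specify the policy the daemon uses.

```python
from dataclasses import dataclass, field

@dataclass
class TieredStore:
    archive_after: int  # hypothetical age threshold, in abstract time units
    magnetic: dict = field(default_factory=dict)  # key -> (written_at, value)
    optical: dict = field(default_factory=dict)   # archival tier, same layout

    def write(self, key, value, now):
        """New and recent data always lands on the fast tier."""
        self.magnetic[key] = (now, value)

    def read(self, key):
        """Try the fast tier first, then fall back to the archive."""
        if key in self.magnetic:
            return self.magnetic[key][1]
        return self.optical[key][1]

    def archive_pass(self, now):
        """One sweep of the background mover; returns the keys it moved."""
        old = [k for k, (t, _) in self.magnetic.items()
               if now - t >= self.archive_after]
        for k in old:
            self.optical[k] = self.magnetic.pop(k)
        return old
```

In a real system `archive_pass` would run periodically in a separate process; here it is
called explicitly so that its effect is easy to follow.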
Improved query language (POSTQUEL)
The design of POSTGRES also ships with POSTQUEL, an improvement on QUEL, which is
the query language for INGRES. POSTQUEL includes everything in QUEL but also goes a
step further to achieve flexibility with respect to support for the following:
• more predefined data types
• complex objects
• time varying data/snapshots
• iterative queries, alerters, triggers and rules
Time-varying data, or snapshots, support the retrieval of historical data. Iterative queries
and rules make it possible to model real business scenarios without necessarily having to
store the actual data values.
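
Retrieval of historical data can be pictured with the following Python sketch, in which
every update is kept as a new version and a query may ask for the value as of an earlier
time. The `VersionedTable` class and its `put` and `get` methods are hypothetical; this is
not POSTQUEL syntax.

```python
import bisect

class VersionedTable:
    def __init__(self):
        # key -> list of (valid_from, value), kept in ascending time order
        self._history = {}

    def put(self, key, value, at):
        """Record a new version instead of overwriting the old one."""
        versions = self._history.setdefault(key, [])
        if versions and at <= versions[-1][0]:
            raise ValueError("timestamps must increase for each key")
        versions.append((at, value))

    def get(self, key, as_of=None):
        """Return the current value, or the value that was valid at time as_of."""
        versions = self._history.get(key, [])
        if not versions:
            return None
        if as_of is None:
            return versions[-1][1]
        times = [t for t, _ in versions]
        i = bisect.bisect_right(times, as_of)
        return versions[i - 1][1] if i else None
```

A query without `as_of` sees the latest version, while `as_of` selects the version that
was valid at that moment, or nothing if the key did not yet exist.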
Programming interface
The interface was designed to allow flexibility and improve performance. All the
POSTQUEL commands, for instance, are accessible from code. In addition, POSTGRES
employs the concept of a portal, which is used to retrieve data or execute commands. A portal
acts as a data buffer, facilitating quicker access. POSTQUEL queries are translated into
portal commands for accessing or updating data as required. On top of that, POSTGRES also
supports query compilation and fast-path access. Compilation allows queries to be pre-compiled,
leading to better speeds, but it is not enough on its own, hence fast-path access. Fast path
avoids overhead by sending only the essential part of the query to the backend in binary format.
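
The buffering role of a portal can be sketched as a named handle over a query result from
which the program fetches several records at a time. The `Portal` class and its `fetch`
method below are hypothetical names and do not reproduce the actual POSTGRES interface.

```python
from itertools import islice

class Portal:
    """Hypothetical named buffer over the rows produced by a query."""

    def __init__(self, name, rows):
        self.name = name
        self._rows = iter(rows)

    def fetch(self, n):
        """Return up to n further rows; an empty list means the result is exhausted."""
        return list(islice(self._rows, n))
```

The application pulls rows in batches rather than one record per call, which is the sense
in which the buffer facilitates quicker access.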
Better crash recovery code
Unlike most systems, the design of POSTGRES reduces the amount of code in the DBMS written
to support crash recovery. POSTGRES allows user-defined access methods, which makes it
imperative to have a simple and easily extendible model for crash recovery. To achieve that,
the POSTGRES log is treated as normal data managed by the DBMS, thereby simplifying the
recovery code and simultaneously providing support for access to historical data.
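
One way to picture how treating the log as ordinary data simplifies recovery is the
following Python sketch. It assumes, beyond what the text above states, that records are
only ever appended, that each record carries the identifier of the transaction that wrote
it, and that a set of committed identifiers survives a crash; the `NoOverwriteStore` class
and its methods are hypothetical.

```python
class NoOverwriteStore:
    def __init__(self):
        self.records = []       # append-only list of (xid, key, value)
        self.committed = set()  # identifiers of committed transactions
        self._next_xid = 1

    def begin(self):
        xid = self._next_xid
        self._next_xid += 1
        return xid

    def write(self, xid, key, value):
        """Append a new version; older versions are never overwritten."""
        self.records.append((xid, key, value))

    def commit(self, xid):
        self.committed.add(xid)

    def read(self, key):
        """Latest committed version; records of unfinished transactions are ignored."""
        for xid, k, v in reversed(self.records):
            if k == key and xid in self.committed:
                return v
        return None

    def history(self, key):
        """All committed versions in the order they were written."""
        return [v for xid, k, v in self.records
                if k == key and xid in self.committed]
```

Under these assumptions a crash needs no separate undo pass: records written by a
transaction that never committed are simply skipped by `read`, and the same stored
versions serve historical queries through `history`.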

Methodology
This report is primarily a product of secondary data, particularly research papers published
on ResearchGate, an online digital library where the paper under review is also published.
Firstly, a thorough analysis of the paper under review was done, taking note of the key
concepts raised. Other papers citing or referencing this paper were then systematically
reviewed. The papers to review were selected by taking at least 5 papers from each of the
periods 1986-1995, 1996-2005, 2006-2015 and 2016-2020, taking into account that the paper
under review was published in 1986. Those papers were then reviewed to get an idea of how the
paper influenced other DBMSs and other ideas just after release and some years after
release, as well as how it is currently influencing research and ideas in the database field.

Results

A review of literature citing this paper led to a number of discoveries. Firstly, it was
discovered that this design of POSTGRES, albeit more than 30 years old, is still in use today,
both as a database for real-world applications and as a supporting database for other research.
However, the design, as expected, has seen some evolution. POSTQUEL, the query language,
for instance, has since been replaced by SQL, and the system is now known as PostgreSQL.
In 2019, Miranda et al. came up with an open-source framework for geospatial feature
engineering which relied on Postgres for managing data and patterns. Yıldırım and Watson
(2016) capitalised on the ability of Postgres to store complex objects in their proposal for
a Cloud-Aware Distributed Object Storage System to Retrieve Large Data via HTML5-Enabled
Web Browsers (CADOS). The database was used to store object and stripe information.

The research has also influenced a number of relational database concepts and techniques,
mostly as we know them today and in some cases with minor to medium alterations.
Versioning, for instance, as we know it in almost all relational databases, POSTGRES
included, is a result of this research, as noted by Yue et al. (2020) and Shen et al. (2003).
Yu and Sarwat (2016) also built on the indexing power and design of POSTGRES to propose
their own fast and scalable database indexing approach. Transaction control also builds on
the transaction management proposed in this research (Sippu and Soisalon-Soininen, 2014).

It was also evident that the research influenced other relational and non-relational DBMSs
and models. In 2003, for instance, Viqueira proposed an extension of the relational model for
the management of spatial and spatio-temporal data which was effectively built on top of the
design of POSTGRES. Shen et al. (2003) also drew on the POSTGRES design to come up
with an environment for large-scale scientific computations.

Lastly, it was noted that researchers are still taking a leaf from the POSTGRES design and
applying it to contemporary technologies. Wang et al.
(2018), for instance, made heavy use of POSTGRES transaction management, version management
and indexing techniques to come up with a storage engine for blockchain and forkable
applications. Javed et al. (2017) capitalised on POSTGRES alert and trigger ideas to develop a
big data processing pipeline based on Apache Kafka and Flink. The trigger would determine
how and when to move to the next step in the pipeline.

Discussion

The findings above show that the design of POSTGRES still has a huge bearing on today's
relational DBMSs. The relational model itself, for example, has not really changed. However,
it is important to note that not every application today requires the relational model and
its concepts. Big data, for instance, which makes up a large share of the data we have today,
tends to be unstructured or semi-structured, making the relational model a poor fit for
modeling it. We can still benefit from indexing and version management ideas. However, for
features such as transaction management, we derive little benefit, since big data does not
necessarily have to be transactional but only eventually consistent. Judging from this, a
reasonable way forward is to enjoy the relational concepts whilst also finding ways to tailor
some of them to suit non-relational applications.

Summary

The design of POSTGRES by Stonebraker and Rowe (1986), the successor to INGRES, one of the
first relational database systems, greatly influenced relational databases as we know them
today. POSTGRES itself is still widely used, with POSTQUEL having given way to SQL in what
is now PostgreSQL. The design also influenced and improved version and transaction
management, indexing, and storage design, both for the POSTGRES database and for many other
relational databases today. These changes and improvements were achieved without
changing the traditional relational model, which has been proven to work. Lastly,
contemporary researchers are trying to combine some of these concepts with current
technologies such as blockchain and streaming technologies like Apache Kafka to solve
problems.

References

Aberer, K. and Che, D. (1997). Rule-Based Generation of Logical Query Plans with
Controlled Complexity.
Aghayev, A.; Weil, S.; Kuchnik, M.; Nelson, M.; Ganger, G. and Amvrosiadis, G. (2019).
File systems unfit as distributed storage backends: lessons from 10 years of Ceph
evolution, pp. 353-369.
Javed, M.; Lu, X. and Panda, D. (2017). Characterization of Big Data Stream Processing
Pipeline: A Case Study using Flink and Kafka, pp. 1-10.
Miranda, L.; Samson, M.; Orden, I.; Silmaro, B.; Guzman, I. and Sy, S. (2019). Geomancer:
An Open-Source Framework for Geospatial Feature Engineering.
Shen, X.; Liao, W.; Choudhary, A.; Memik, G.; Thiruvathukal, G. and Singh, A. (2003). A
Novel Application Development Environment for Large-Scale Scientific Computations.
Sippu, S. and Soisalon-Soininen, E. (2014a). Concurrency Control by Versioning.
Sippu, S. and Soisalon-Soininen, E. (2014b). Transaction Rollback and Restart Recovery.
Stonebraker, M. and Rowe, L. (1986). The design of POSTGRES, ACM SIGMOD Record 15.
Viqueira, J. (2003). Formal extension of the relational model for the management of spatial
and spatio-temporal data.
Wang, S.; Dinh, T.; Lin, Q.; Xie, Z.; Cai, Q.; Chen, G.; Fu, W.; Ooi, B. and Ruan, P. (2018).
ForkBase: An Efficient Storage Engine for Blockchain and Forkable Applications,
Proceedings of the VLDB Endowment 11.
Yu, J. and Sarwat, M. (2016). Hippo: A Fast, yet Scalable, Database Indexing Approach.
Yue, C.; Xie, Z.; Chen, G.; Ooi, B.; Wang, S. and Xiao, X. (2020). Analysis of Indexing
Structures for Immutable Data (Full Version).
Yıldırım, A. and Watson, D. (2016). A Cloud-Aware Distributed Object Storage
System to Retrieve Large Data via HTML5-Enabled Web Browsers.
