
BST 3107: DATABASE SYSTEMS II
Topic 2
Distributed Database
A distributed database is a database that consists of two or more data files located
at different sites on a computer network. It is alternatively described as a collection
of multiple logically related databases distributed over a computer network. A
distributed database management system (DDBMS) is a software system that manages a
distributed database while making the distribution transparent to the user.
Note: The distributed database management system is the software used to manage a
distributed database (DDB), and it makes the distribution transparent to the user.
In a pure distributed database, the system manages a single copy of all data and
supporting database objects. A key feature of the distributed database is that
different users can access the data without interfering with one another.
Unlike a parallel system in which the processors are tightly coupled and constitute a
single database system, a distributed database system consists of loosely-coupled
sites that share no physical components. The database systems that run on each
site are independent of each other, and transactions within the database may
access data at one or more sites. The key features of a distributed database system
include:
(i) They assume the relational data model.
(ii) There is replication: the system maintains multiple copies of data, stored in
different sites, for faster retrieval and fault tolerance.
(iii) There is fragmentation: a relation is partitioned into several fragments stored
in distinct sites.
NB: A relation represents a table or an entity that contains different attributes. An
entity constructs a relation (table).
To ensure that databases in distributed systems remain up-to-date, two processes
are employed, replication and duplication.
In replication, specialized software looks for changes in the distributed
database. Once the changes have been identified, the replication process
makes all the databases look the same. Replication is alternatively described as the
operation of copying and maintaining database objects in multiple databases
belonging to a distributed system. Replication can be complex, and it requires a lot
of time and computer resources.
Advantages of replication include:
(i) Availability - Failure of site containing a given relation does not result in
unavailability of the relation, as replicas exist.
(ii) Parallelism - Queries on a relation may be processed by several nodes in
parallel.
(iii) Reduced data transfer - A relation is available locally at each site
containing its replica.
The disadvantages of replication are:
(iv) Increased cost of updates - Each replica of a given relation must be
updated.
(v) Increased complexity of concurrency control - Concurrent updates to
distinct replicas may lead to inconsistent data unless special concurrency
control mechanisms are implemented. A solution to this is choosing one copy
as the primary copy and applying concurrency control operations on it.
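The primary-copy idea and the update cost it implies can be illustrated with a minimal sketch. The class names (`Replica`, `PrimaryCopyRelation`), the site names, and the message counter are hypothetical and chosen only for illustration; real systems add locking, logging and failure handling that are omitted here.

```python
class Replica:
    """One site's copy of a relation, keyed by primary key."""

    def __init__(self, site):
        self.site = site
        self.rows = {}


class PrimaryCopyRelation:
    """Routes every update through one designated primary replica (assumed design)."""

    def __init__(self, sites, primary_site):
        self.replicas = {site: Replica(site) for site in sites}
        self.primary = self.replicas[primary_site]
        self.update_messages = 0  # counts how many replicas each write touches

    def write(self, key, row):
        # The update is applied to the primary copy first ...
        self.primary.rows[key] = dict(row)
        self.update_messages += 1
        # ... and then propagated to every other replica.
        for replica in self.replicas.values():
            if replica is not self.primary:
                replica.rows[key] = dict(row)
                self.update_messages += 1

    def read(self, site, key):
        # Reads are served locally from whichever site is asked.
        return self.replicas[site].rows.get(key)


customers = PrimaryCopyRelation(["Nairobi", "Nakuru", "Kisumu"], primary_site="Nairobi")
customers.write(1, {"name": "Okello", "town": "Nairobi"})
local_copy = customers.read("Kisumu", 1)
```

A single logical write here touches all three replicas, which is the "increased cost of updates" listed above, while the read at Kisumu is answered without contacting another site.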

In duplication, one database is identified as a master, and this is then duplicated
to every other location. The duplication process is carried out at set times to ensure
that each distributed location has the same data. In duplication, users may only
change the master database, to ensure that local data will not be overwritten.
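As a rough sketch of duplication, assuming a hypothetical `DuplicatedDatabase` class in which only the master accepts changes and local copies are refreshed when `duplicate()` is called (standing in for the "set times" mentioned above):

```python
import copy


class DuplicatedDatabase:
    """A master database plus read-only local copies (illustrative only)."""

    def __init__(self, locations):
        self.master = {}
        self.copies = {location: {} for location in locations}

    def update_master(self, key, row):
        # Users may only change the master database.
        self.master[key] = dict(row)

    def duplicate(self):
        # At a scheduled time, every location receives a full copy of the master.
        for location in self.copies:
            self.copies[location] = copy.deepcopy(self.master)

    def read_local(self, location, key):
        return self.copies[location].get(key)


db = DuplicatedDatabase(["Nakuru", "Kisumu"])
db.update_master(2, {"name": "Kamau", "town": "Nakuru"})
before = db.read_local("Nakuru", 2)   # change not yet visible locally
db.duplicate()
after = db.read_local("Nakuru", 2)    # visible after the scheduled copy
```

Between two duplication runs the local copies can lag behind the master, which is the price paid for the simpler update path.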
Homogeneous and Heterogeneous Database Management Systems
A homogeneous distributed database has identical software and hardware
running all database instances, and may appear through a single interface as if it
were a single database. In a homogeneous distributed database, all sites have
identical software and are aware of each other. They also agree to cooperate in
processing user requests. A homogeneous distributed database management
system appears to the user as a single system.
Homogeneous systems are much easier to design and manage. The approach
provides incremental growth, making the addition of a new site to the DDBMS easy,
and allows increased performance by exploiting the parallel processing capability of
multiple sites. In a homogeneous distributed database:
(i) All sites have identical software. The operating systems used at each location
are the same or compatible.
(ii) All sites are aware of each other and agree to cooperate in processing user
requests.
(iii) The database appears to the user as a single system.
(iv) The data structures used at each location must be the same or compatible.
(v) The database application (or database management system) used at each
location is the same or compatible.
A heterogeneous distributed database may have different hardware, operating
systems, database management systems, and even data models for different
databases. Different computers and operating systems, database applications or
data models may be used at each of the locations. In a heterogeneous distributed
database, different sites may use different schema and software. Sites in a
heterogeneous system may not be aware of each other, and may provide only
limited facilities for cooperation in transaction processing. One location may, for
example, have the latest relational database management technology, while another
location may store data using conventional files or an older version of a database
management system. A heterogeneous system is often not technically or
economically feasible.

Heterogeneous systems usually result when individual sites have implemented their
own databases and integration is considered at a later stage. In a heterogeneous
system, translations are required to allow communication between different DBMSs.
To provide DBMS transparency, users must be able to make requests in the
language of the DBMS at their local site. The system then has the task of locating
the data and performing any necessary translation. Data may be required from
another site that may have:
(i) Different hardware.
(ii) Different DBMS products.
(iii) Different hardware and different DBMS products.

Federated Databases
A federated database is a system in which several databases appear to function as
a single entity. Each component database in the system is completely self-sustained
and functional. When an application queries the federated database, the system
figures out which of its component databases contains the data being requested
and passes the request to it. A federated database may be composed of a
heterogeneous collection of databases. In a homogeneous environment, federated
databases can help distribute the load of very large databases.

The federated database system distributes queries to the appropriate component
database; the goal of the system is to ensure that a typical query will need to use
only one component, thus drastically reducing the number of rows that need to be
searched.
Federated databases have several drawbacks. Each component database is a
potential point of failure, and latency from any one server will delay an entire call.
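The routing behaviour described above can be sketched as follows. The `FederatedDatabase` class, the component names ("central", "western") and the town-to-component mapping are assumptions made for this example, not part of any particular product.

```python
class FederatedDatabase:
    """Passes each query to the one component database that holds the data."""

    def __init__(self, components, town_to_component):
        self.components = components                # component name -> list of rows
        self.town_to_component = town_to_component  # routing table (assumed)
        self.rows_scanned = 0

    def customers_in(self, town):
        # Work out which component contains the requested data ...
        component_name = self.town_to_component[town]
        matches = []
        # ... and search only that component.
        for row in self.components[component_name]:
            self.rows_scanned += 1
            if row["town"] == town:
                matches.append(row)
        return matches


components = {
    "central": [
        {"no": 1, "name": "Okello", "town": "Nairobi"},
        {"no": 2, "name": "Kamau", "town": "Nakuru"},
    ],
    "western": [
        {"no": 3, "name": "Chepyegon", "town": "Kisumu"},
    ],
}
routing = {"Nairobi": "central", "Nakuru": "central", "Kisumu": "western"}
federation = FederatedDatabase(components, routing)
kisumu_customers = federation.customers_in("Kisumu")
```

Only the single row held by the "western" component is examined for this query. If that component were down or slow, the whole call would fail or wait, which is the drawback noted above.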

Data Fragmentation
In general computing, data fragmentation occurs when a piece of data in memory is
broken up into many pieces that are not close together. In a distributed database,
data fragmentation (also known as sharding or partitioning) involves splitting a data
set into smaller fragments (or shards) and distributing them across a large number of
machines. Data fragmentation is carried out by specialized software, which
automatically breaks data up into fragments for storage in different storage
equipment, possibly in different locations, based on the sharding policies in place.
Fragments are logical data units stored at various sites in a distributed database
system.

Fragmentation enhances availability and performance: it is much faster to retrieve
small data fragments than larger ones, which significantly improves response times. It
permits a number of transactions to execute concurrently, since they will access
different portions of a relation. Above all, the process facilitates parallel execution of
a single query (intra-query concurrency). The challenge, however, is that semantic data
control (especially integrity enforcement) becomes more difficult.
Fragmentation aims to improve reliability, performance, balanced storage capacity
and costs, communication costs, and security.
There are three types of fragmentation:
(i) Horizontal: partitions a relation along its tuples
(ii) Vertical: partitions a relation along its attributes
(iii) Mixed/hybrid: a combination of horizontal and vertical
Horizontal and Vertical Fragmentation in Distributed Database Management Systems
Consider the following relation:
No.  Customer Name  Town     Payment Type  Gender
1    Okello         Nairobi  Credit Card   Male
2    Kamau          Nakuru   Cash          Male
3    Chepyegon      Kisumu   Cash          Female

Horizontal fragmentation divides the relation by tuples (rows): each fragment holds
a subset of the rows and keeps every attribute.
Fragment 1:
No.  Customer Name  Town     Payment Type  Gender
1    Okello         Nairobi  Credit Card   Male
2    Kamau          Nakuru   Cash          Male
Fragment 2:
No.  Customer Name  Town     Payment Type  Gender
3    Chepyegon      Kisumu   Cash          Female
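A minimal sketch of horizontal fragmentation on the customer relation follows. The notes do not state which condition separates the two fragments, so the split on `No` used here is an assumption; the helper name `horizontal_fragment` is also hypothetical.

```python
CUSTOMERS = [
    {"No": 1, "Customer Name": "Okello", "Town": "Nairobi",
     "Payment Type": "Credit Card", "Gender": "Male"},
    {"No": 2, "Customer Name": "Kamau", "Town": "Nakuru",
     "Payment Type": "Cash", "Gender": "Male"},
    {"No": 3, "Customer Name": "Chepyegon", "Town": "Kisumu",
     "Payment Type": "Cash", "Gender": "Female"},
]


def horizontal_fragment(relation, predicate):
    """Keep whole rows that satisfy the predicate (a selection)."""
    return [dict(row) for row in relation if predicate(row)]


# Assumed fragmentation condition: rows 1-2 in one fragment, row 3 in the other.
fragment_1 = horizontal_fragment(CUSTOMERS, lambda row: row["No"] <= 2)
fragment_2 = horizontal_fragment(CUSTOMERS, lambda row: row["No"] > 2)

# The original relation is recovered by taking the union of the fragments.
reconstructed = sorted(fragment_1 + fragment_2, key=lambda row: row["No"])
```

Because the two conditions are disjoint and together cover every row, no tuple is lost or duplicated when the fragments are combined.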

Vertical fragmentation divides the relation by attributes (columns): each fragment
holds a subset of the columns, and the key attribute (No.) is repeated in every
fragment so that the relation can be reconstructed.
Fragment 1:
No.  Customer Name  Town     Gender
1    Okello         Nairobi  Male
2    Kamau          Nakuru   Male
3    Chepyegon      Kisumu   Female
Fragment 2:
No.  Customer Name  Payment Type
1    Okello         Credit Card
2    Kamau          Cash
3    Chepyegon      Cash
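Vertical fragmentation can be sketched in the same spirit, using the attribute lists of the two fragments shown above. The helper names `vertical_fragment` and `join_on_key` are hypothetical; the join on `No` illustrates why the key is kept in every fragment.

```python
CUSTOMERS = [
    {"No": 1, "Customer Name": "Okello", "Town": "Nairobi",
     "Payment Type": "Credit Card", "Gender": "Male"},
    {"No": 2, "Customer Name": "Kamau", "Town": "Nakuru",
     "Payment Type": "Cash", "Gender": "Male"},
    {"No": 3, "Customer Name": "Chepyegon", "Town": "Kisumu",
     "Payment Type": "Cash", "Gender": "Female"},
]


def vertical_fragment(relation, attributes):
    """Keep only the listed columns of every row (a projection)."""
    return [{name: row[name] for name in attributes} for row in relation]


def join_on_key(left, right, key="No"):
    """Rebuild full rows by matching fragments on the shared key."""
    right_by_key = {row[key]: row for row in right}
    return [{**row, **right_by_key[row[key]]} for row in left]


fragment_1 = vertical_fragment(CUSTOMERS, ["No", "Customer Name", "Town", "Gender"])
fragment_2 = vertical_fragment(CUSTOMERS, ["No", "Customer Name", "Payment Type"])
reconstructed = join_on_key(fragment_1, fragment_2)
```

Without the shared `No` column there would be no reliable way to tell which payment type belongs to which customer.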

Advantages of Fragmentation
Usage
Applications generally work with views rather than entire relations. Therefore, for
data distribution, it is appropriate to work with subsets of relations as the unit of
distribution.

Efficiency
Fragmentation ensures data is stored close to where it is most frequently used. In
addition, data that is not needed by local applications is not stored.

Parallelism
With fragments as the unit of distribution, a transaction can be divided into several
sub-queries that operate on fragments. This increases the degree of concurrency or
parallelism in the system.
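The idea of splitting one query into sub-queries that run on separate fragments can be sketched with Python's standard `concurrent.futures` module. Threads stand in for sites here, and the fragment contents, site names and function names are assumptions made for the example.

```python
from concurrent.futures import ThreadPoolExecutor

# Two horizontal fragments of the customer relation, one per (simulated) site.
FRAGMENTS = {
    "site_a": [
        {"No": 1, "Payment Type": "Credit Card"},
        {"No": 2, "Payment Type": "Cash"},
    ],
    "site_b": [
        {"No": 3, "Payment Type": "Cash"},
    ],
}


def count_cash(fragment):
    """Sub-query: count cash customers within a single fragment."""
    return sum(1 for row in fragment if row["Payment Type"] == "Cash")


def parallel_count_cash(fragments):
    """Run the sub-query on every fragment concurrently, then combine the results."""
    with ThreadPoolExecutor(max_workers=len(fragments)) as pool:
        partial_counts = list(pool.map(count_cash, fragments.values()))
    return sum(partial_counts)


total_cash = parallel_count_cash(FRAGMENTS)
```

Each sub-query sees only its own fragment; the global answer is obtained by combining the partial results, in this case by adding them.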

Security
Data not required by local applications is not stored, and consequently not available
to unauthorized users.

Disadvantages of Fragmentation
Performance
The performance of a global application that requires data from several fragments
located at different sites may be slower.

Integrity
Integrity control may be more difficult if data and functional dependencies are
fragmented and located at different sites.

Data Transparency
Data transparency is the degree to which a system user may remain unaware of the
details of how and where the data items are stored in a distributed system.
Important aspects of transparency with regard to distributed systems include
fragmentation transparency, replication transparency and location transparency.
The levels of transparency featured in a distributed database system include:
o Distribution or network transparency
   - Location transparency
   - Naming transparency
o Replication transparency
o Fragmentation transparency
   - Vertical fragmentation
   - Horizontal fragmentation
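Location and fragmentation transparency can be illustrated with a small sketch in which the caller names only a relation, never a site. The `DistributedDatabase` class, the catalog layout and the site names are hypothetical.

```python
class DistributedDatabase:
    """Answers queries by relation name; the catalog hides where fragments live."""

    def __init__(self, sites, catalog):
        self.sites = sites      # site name -> {relation name -> rows}
        self.catalog = catalog  # relation name -> sites holding its fragments

    def select(self, relation, predicate=lambda row: True):
        rows = []
        # The system, not the user, looks up which sites to visit.
        for site_name in self.catalog[relation]:
            for row in self.sites[site_name][relation]:
                if predicate(row):
                    rows.append(row)
        return rows


sites = {
    "site_nairobi": {"customers": [{"No": 1, "Town": "Nairobi"},
                                   {"No": 2, "Town": "Nakuru"}]},
    "site_kisumu": {"customers": [{"No": 3, "Town": "Kisumu"}]},
}
catalog = {"customers": ["site_nairobi", "site_kisumu"]}
db = DistributedDatabase(sites, catalog)

all_numbers = [row["No"] for row in db.select("customers")]
kisumu = db.select("customers", lambda row: row["Town"] == "Kisumu")
```

The calls to `select` would look the same if the relation were stored at a single site, which is what transparency means from the user's point of view.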
Advantages and Disadvantages of Distributed Database Systems
Advantages
Reflects Organizational Structure
Many organisations are naturally distributed over several locations. It is natural for
databases used in such an application to be distributed over these locations. The
company headquarters may wish to make global inquiries involving the access of
data at all or a number of branches.

Improved Share-ability and Local Autonomy


The geographical distribution of an organisation can be reflected in the distribution
of the data - users at one site can access data stored at other sites. Data can be
placed at the site close to the users who normally use that data. In this way, users
have local control of the data, and they can consequently establish and enforce
local policies regarding the use of this data. A global database administrator (DBA)
is responsible for the entire system.

Improved Availability
In a centralized DBMS, a computer failure terminates the applications of the DBMS.
However, a failure at one site of a DDBMS, or a failure of a communication link
making some sites inaccessible, does not make the entire system inoperable. If a
single node fails, the system may be able to reroute the failed node's requests to
another site.
Improved Reliability
As data may be replicated so that it exists at more than one site, the failure of a
node or a communication link does not necessarily make the data inaccessible.

Improved Performance
As the data is located near the site of 'greatest demand', and given the inherent
parallelism of distributed DBMSs, speed of database access may be better than that
achievable from a remote centralized database. Furthermore, since each site
handles only a part of the entire database, there may not be the same contention
for CPU and I/O services as characterized by a centralized DBMS.

Economics
It is generally accepted that it costs much less to create a system of smaller
computers with the equivalent power of a single large computer. This makes it more
cost-effective for corporate divisions and departments to obtain separate
computers. It is also much more cost-effective to add workstations to a network
than to update a mainframe system.

The second potential cost saving occurs where databases are geographically remote
and the applications require access to distributed data. In such cases, owing to the
relative expense of data being transmitted across the network as opposed to the
cost of local access, it may be much more economical to partition the application
and perform the processing locally at each site.

Modular Growth
In a distributed environment, it is much easier to handle expansion. New sites can
be added to the network without affecting the operations of other sites. This
flexibility allows an organisation to expand relatively easily. Adding processing and
storage power to the network can usually handle the increase in database size. In a
centralized DBMS, growth may entail changes to both hardware (the procurement of
a more powerful system) and software (the procurement of a more powerful or more
configurable DBMS).

To achieve the advantages of a distributed database, the database management
system must have these additional functions:
(i) Keeping track of data distribution, fragmentation and replication.
(ii) Distributed query processing.
(iii) Distributed transaction management.
(iv) Replicated data management.
(v) Distributed data recovery.
(vi) Security.
(vii) Distributed catalog management.
Disadvantages of DDBMS
Complexity
A distributed DBMS that hides the distributed nature from the user and provides an
acceptable level of performance, reliability, and availability is inherently more
complex than a centralized DBMS.

Cost
Increased complexity means that we can expect the procurement and maintenance
costs for a DDBMS to be higher than those for a centralized DBMS. Furthermore, a
distributed DBMS requires additional hardware to establish a network between sites.
There are
ongoing communication costs incurred with the use of this network. There are also
additional labor costs to manage and maintain the local DBMSs and the underlying
network.

Security
In a centralised system, access to the data can be easily controlled. However, in a
distributed DBMS not only does access to replicated data have to be controlled in
multiple locations, but also the network itself has to be made secure. In the past,
networks were regarded as an insecure communication medium. Although this is
still partially true, significant developments have been made to make networks
more secure.

Integrity Control More Difficult
Database integrity refers to the validity and consistency of stored data. Integrity is
usually expressed in terms of constraints, which are consistency rules that the
database is not permitted to violate. Enforcing integrity constraints generally
requires access to a large amount of data that defines the constraints. In a
distributed DBMS, the communication and processing costs that are required to
enforce integrity constraints are high compared with a centralized system.
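To illustrate why enforcement is costlier, consider a uniqueness constraint on the customer number when the relation is fragmented across sites. The sketch below is an assumption-laden illustration: the function name, the site names and the `remote_checks` counter (a stand-in for network round trips) are all hypothetical.

```python
def insert_with_unique_key(sites, target_site, row, key="No"):
    """Insert a row only if its key is unused at every site.

    Returns (inserted, remote_checks), where remote_checks counts the
    other sites that had to be consulted before the decision was made.
    """
    remote_checks = 0
    for site_name, rows in sites.items():
        if site_name != target_site:
            remote_checks += 1  # consulting another site costs a network call
        if any(existing[key] == row[key] for existing in rows):
            return False, remote_checks
    sites[target_site].append(row)
    return True, remote_checks


sites = {
    "Nairobi": [{"No": 1, "Customer Name": "Okello"},
                {"No": 2, "Customer Name": "Kamau"}],
    "Kisumu": [{"No": 3, "Customer Name": "Chepyegon"}],
}

duplicate_result = insert_with_unique_key(
    sites, "Nairobi", {"No": 3, "Customer Name": "Otieno"})
fresh_result = insert_with_unique_key(
    sites, "Nairobi", {"No": 4, "Customer Name": "Otieno"})
```

In a centralized system the same check would touch only local storage; here even a successful insert at Nairobi requires asking Kisumu first.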

Lack of Standards
Although distributed DBMSs depend on effective communication, standard
communication and data access protocols are only beginning to appear. This lack of
standards has significantly limited the potential of distributed DBMSs. There are also
no tools or methodologies to help users convert a centralized DBMS into a
distributed DBMS.
Database Design More Complex
Besides the normal difficulties of designing a centralised database, the design of a
distributed database has to take account of fragmentation of data, allocation of
fragments to specific sites, and data replication.
