
A.1.1 Outline the differences between data and information.

What is Data: Data is a raw, unorganized fact that requires processing to become meaningful. Generally, data comprises facts, observations, perceptions, numbers, characters, symbols, images, etc.

Data must be interpreted, by a human or machine, before it carries any meaning; on its own, data is meaningless. Data consists of numbers, statements, and characters in a raw form.

What is Information: Information is a set of data which is processed in a meaningful way according to the given requirement. Information is data that has been processed, structured, or presented in a given context so as to make it meaningful and useful.

It is processed data that possesses context, relevance, and purpose, and it involves the manipulation of raw data.

Information assigns meaning and improves the reliability of the data. It reduces uncertainty: when data is transformed into information, the useless details are removed.

Data vs. Information

• Description. Data: qualitative or quantitative variables which help to develop ideas or conclusions. Information: a group of data which carries news and meaning.
• Etymology. Data comes from the Latin word datum, which means "to give something"; over time "data" has become the plural of datum. The word information has Old French and Middle English origins and referred to the "act of informing"; it is mostly used in the context of education or other known communication.
• Format. Data: numbers, letters, or a set of characters. Information: ideas and inferences.
• Represented in. Data: structured or tabular data, graphs, data trees, etc. Information: language, ideas, and thoughts based on the given data.
• Meaning. Data does not have any specific purpose. Information carries meaning that has been assigned by interpreting data.
• Interrelation. Data is what is collected; information is what results when that data is processed.
• Feature. Data is a single unit and is raw; it alone doesn't have any meaning. Information is the product: a group of data which jointly carries a logical meaning.
• Dependence. Data never depends on information; information depends on data.
• Measuring unit. Data is measured in bits and bytes; information is measured in meaningful units like time, quantity, etc.
• Support for decision making. Data by itself can't be used for decision making; information is widely used for decision making.
• Contains. Data: unprocessed raw facts. Information: facts processed in a meaningful way.
• Knowledge level. Data is low-level knowledge; information is the second level of knowledge.
• Availability. Data is the property of an organization and is not available for sale to the public; information is available for sale to the public.
• Source dependency. Data depends upon the sources from which it is collected; information depends upon data.
• Example. Data: ticket sales of a band on tour. Information: a sales report by region and venue, which shows which venue is profitable for the business.
• Significance. Data alone has no significance; information is significant by itself.
• Reliability. Data is based on records and observations, which are stored in computers or remembered by a person; information is considered more reliable than data and helps the researcher conduct a proper analysis.
• Usefulness. The data collected by a researcher may or may not be useful; information is useful and valuable as it is readily available to the researcher for use.
• Specificity. Data is never designed to the specific needs of the user; information is always specific to the requirements and expectations, because all irrelevant facts and figures are removed during the transformation process.

DIKW (Data Information Knowledge Wisdom)

DIKW is the model used for discussion of data, information, knowledge, wisdom and their
interrelationships. It represents structural or functional relationships between data, information,
knowledge, and wisdom.

Example: the number 40 on its own is data; "the patient's temperature is 40 °C" is information; "a temperature of 40 °C indicates a dangerously high fever" is knowledge; deciding how to treat that fever is wisdom.

KEY DIFFERENCE

• Data is a raw and unorganized fact that is required to be processed to make it meaningful, whereas information is a set of data that is processed in a meaningful way according to the given requirement.
• Data does not have any specific purpose, whereas information carries a meaning that has been assigned by interpreting data.
• Data alone has no significance, while information is significant by itself.
• Data never depends on information, while information is dependent on data.
• Data is measured in bits and bytes; information is measured in meaningful units like time, quantity, etc.
• Data can be structured, tabular data, graphs, or data trees, whereas information is language, ideas, and thoughts based on the given data.

A.1.2 Outline the differences between an information system and a database.

Define Database
A database is a collection of inter-related data from which records can be retrieved, inserted, and deleted efficiently. It also organizes the data in the form of tables, schemas, views, reports, etc.
For example: a college database organizes data about the admin, staff, students, faculty, etc. Using the database, you can easily retrieve, insert, and delete information.

Database Management System

A database management system is software which is used to manage a database. For example, MySQL and Oracle are very popular commercial DBMSs used in different applications.
A DBMS provides an interface to perform various operations like creating the database, storing data in it, updating data, creating tables in the database, and a lot more.
It provides protection and security to the database. In the case of multiple users, it also maintains data consistency.
A DBMS allows users to perform the following tasks:
• Data definition: creating, modifying, and removing the definitions that describe the organization of data in the database.
• Data updating: inserting, modifying, and deleting the actual data in the database.
• Data retrieval: retrieving data from the database for use by applications for various purposes.
• User administration: registering and monitoring users, maintaining data integrity, enforcing data security, dealing with concurrency control, monitoring performance, and recovering information corrupted by unexpected failures.
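The first three task types map directly onto SQL statement classes. A minimal sketch using Python's built-in sqlite3 module (user administration is omitted, since SQLite has no user accounts); the student table and its columns are invented for illustration:

    import sqlite3

    conn = sqlite3.connect("school.db")  # creates the database file if absent
    cur = conn.cursor()

    # Data definition: create the structures that organize the data
    cur.execute("CREATE TABLE IF NOT EXISTS student ("
                "id INTEGER PRIMARY KEY, name TEXT, grade TEXT)")

    # Data updating: insert, modify, and delete actual rows
    cur.execute("INSERT INTO student (name, grade) VALUES (?, ?)", ("Rick", "A"))
    cur.execute("UPDATE student SET grade = ? WHERE name = ?", ("B", "Rick"))

    # Data retrieval: fetch stored data for use by the application
    cur.execute("SELECT id, name, grade FROM student")
    print(cur.fetchall())

    conn.commit()
    conn.close()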
Characteristics of DBMS
• It uses a digital repository established on a server to store and manage the information.
• It can provide a clear and logical view of the processes that manipulate data.
• It contains automatic backup and recovery procedures.
• It provides ACID properties, which keep data in a healthy state in case of failure.
• It can reduce the complexity of relationships between data.
• It supports the manipulation and processing of data.
• It provides security for data.
• It can present the database from different viewpoints according to the requirements of the user.

Advantages of DBMS
• Controls database redundancy: all the data is stored in one single database file, so duplication can be controlled.
• Data sharing: in a DBMS, the authorized users of an organization can share data among multiple users.
• Easy maintenance: the database is easily maintainable due to the centralized nature of the database system.
• Reduced time: it reduces development time and maintenance needs.
• Backup: it provides backup and recovery subsystems which create automatic backups of data, protecting against hardware and software failures and restoring the data if required.
• Multiple user interfaces: it provides different types of user interfaces, like graphical user interfaces and application program interfaces.
Disadvantages of DBMS
• Cost of hardware and software: running DBMS software requires a high-speed processor and a large memory.
• Size: it occupies a large amount of disk space and memory to run efficiently.
• Complexity: a database system creates additional complexity and requirements.
• Higher impact of failure: failures have a high impact because, in most organizations, all the data is stored in a single database; if the database is damaged due to a power failure or corruption, the data may be lost forever.

Differences between DBMS and File System

• DBMS is a collection of data in which the user is not required to write procedures for managing the data. A file system is a collection of data in which the user has to write the procedures for managing it.
• DBMS gives an abstract view of data that hides the details. A file system exposes the details of data representation and storage.
• DBMS provides a crash recovery mechanism, i.e., it protects the user from system failure. A file system doesn't have a crash recovery mechanism; if the system crashes while some data is being entered, the content of the file may be lost.
• DBMS provides a good protection mechanism. It is very difficult to protect a file under a file system.
• DBMS contains a wide variety of sophisticated techniques to store and retrieve data. A file system can't store and retrieve data efficiently.
• DBMS takes care of concurrent access to data using some form of locking. In a file system, concurrent access causes many problems, such as one user reading a file while another is deleting or updating some of its information.

DBMS Architecture
• The design of a DBMS depends upon its architecture. The basic client/server architecture is used to deal with a large number of PCs, web servers, database servers, and other components that are connected over networks.
• The client/server architecture consists of many PCs and workstations which are connected via the network.
• DBMS architecture depends upon how users are connected to the database to get their requests done.

Types of DBMS Architecture

Database architecture can be seen as single-tier or multi-tier. But logically, database architecture is of two types: 2-tier architecture and 3-tier architecture.
1-Tier Architecture
• In this architecture, the database is directly available to the user: the user sits directly at the DBMS and uses it.
• Any changes done here are made directly on the database itself. It doesn't provide a handy tool for end users.
• The 1-tier architecture is used for the development of local applications, where programmers can communicate directly with the database for a quick response.
2-Tier Architecture
• The 2-tier architecture is the same as a basic client-server setup. In the two-tier architecture, applications on the client end communicate directly with the database on the server side. For this interaction, APIs like ODBC and JDBC are used.
• The user interfaces and application programs run on the client side.
• The server side is responsible for providing functionalities like query processing and transaction management.
• To communicate with the DBMS, the client-side application establishes a connection with the server side.
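For instance, a 2-tier application might look like the following minimal Python sketch, assuming a PostgreSQL server and the psycopg2 driver; the host, credentials, and product table are placeholders:

    import psycopg2  # client-side driver, playing the same role as ODBC/JDBC

    # Client tier: user interface and application logic run here
    conn = psycopg2.connect(host="db.example.com", dbname="shop",
                            user="app_user", password="secret")
    cur = conn.cursor()

    # Server tier: query processing and transaction management happen on the
    # database server; the client only sends SQL and receives rows
    cur.execute("SELECT name, price FROM product WHERE price < %s", (100,))
    for name, price in cur.fetchall():
        print(name, price)

    conn.close()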

3-Tier Architecture
• The 3-tier architecture contains another layer between the client and the server. In this architecture, the client can't communicate directly with the server.
• The application on the client end interacts with an application server, which in turn communicates with the database system.
• The end user has no idea about the existence of the database beyond the application server, and the database has no idea about any user beyond the application.
• The 3-tier architecture is used for large web applications.

Three Schema Architecture
The three schema architecture is also called the ANSI/SPARC architecture or three-level architecture.
This framework is used to describe the structure of a specific database system.
The three schema architecture is also used to separate the user applications from the physical database.
It breaks the database down into three levels, as shown in the usual three-schema diagram (external, conceptual, and internal levels connected by mappings):
• Mapping is used to transform requests and responses between the levels of the architecture.
• Mapping is not good for small DBMSs because it takes extra time.
• In external/conceptual mapping, the DBMS transforms a request at the external level into a request on the conceptual schema.
• In conceptual/internal mapping, the DBMS transforms a request from the conceptual level to the internal level.
1. Internal Level
The internal level has an internal schema, which describes the physical storage structure of the database.
The internal schema is also known as the physical schema.
It uses the physical data model and defines how the data will be stored in blocks.
The internal level is used to describe complex low-level data structures in detail.
2. Conceptual Level
The conceptual schema describes the design of the database at the conceptual level.
The conceptual level is also known as the logical level.
The conceptual schema describes the structure of the whole database: what data are to be stored and what relationships exist among those data.
At the conceptual level, internal details such as the implementation of the data structures are hidden.
Programmers and database administrators work at this level.

3. External Level
At the external level, a database contains several schemas, sometimes called subschemas. A subschema describes a particular view of the database.
An external schema is also known as a view schema.
Each view schema describes the part of the database that a particular user group is interested in and hides the rest of the database from that user group.
The view schema describes the end-user interaction with the database system.
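In SQL, an external schema is commonly realized with a view, so each user group sees only the part of the database it needs. A small sketch using Python's sqlite3 module; the employee table and its columns are invented for illustration:

    import sqlite3

    conn = sqlite3.connect(":memory:")
    cur = conn.cursor()

    # Conceptual level: the full logical structure of the data
    cur.execute("CREATE TABLE employee ("
                "id INTEGER PRIMARY KEY, name TEXT, dept TEXT, salary REAL)")
    cur.execute("INSERT INTO employee VALUES (1, 'Maggie', 'D890', 52000)")

    # External level: a view schema that hides salary from general users
    cur.execute("CREATE VIEW employee_public AS "
                "SELECT id, name, dept FROM employee")

    print(cur.execute("SELECT * FROM employee_public").fetchall())
    # [(1, 'Maggie', 'D890')] -- the salary column is not visible
    conn.close()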

Data Model: Schema and Instance

The data stored in the database at a particular moment in time is called an instance of the database.
The overall design of the database is called the schema.
A database schema is the skeleton structure of the database. It represents the logical view of the entire database.
A schema contains schema objects like tables, foreign keys, primary keys, views, columns, data types, stored procedures, etc.
A database schema can be represented using a visual diagram which shows the database objects and their relationships with each other.
A database schema is designed by the database designers to help the programmers whose software will interact with the database. The process of creating a data model is called data modeling.

A schema diagram can display only some aspects of a schema, like the names of record types, data types, and constraints; other aspects can't be specified through the schema diagram. For example, a typical schema diagram shows neither the data type of each data item nor the relationships among the various files.
In the database, the actual data changes quite frequently: for example, the database changes whenever we add a new grade or a new student. The data at a particular moment in time is called the instance of the database.

Data Independence
Data independence can be explained using the three-schema architecture.
Data independence refers to the characteristic of being able to modify the schema at one level of the database system without altering the schema at the next higher level.

There are two types of data independence:

1. Logical Data Independence
Logical data independence is the ability to change the conceptual schema without having to change the external schemas.
It is used to separate the external level from the conceptual view.
If we make changes in the conceptual view of the data, the user's view of the data is not affected.
Logical data independence occurs at the user interface level.

2. Physical Data Independence

Physical data independence is the capacity to change the internal schema without having to change the conceptual schema.
If we make changes in the storage size of the database system server, the conceptual structure of the database is not affected.
It is used to separate the conceptual level from the internal level.
Physical data independence occurs at the logical interface level.

A.1.5: Define the term “database transaction”
A transaction can be defined as a group of tasks. A single task is the minimum processing unit
which cannot be divided further.
Let’s take an example of a simple transaction. Suppose a bank employee transfers Rs 500 from
A's account to B's account. This very simple and small transaction involves several low-level
tasks.

A's Account (debit A by 500):
    Open_Account(A)
    Old_Balance = A.balance
    New_Balance = Old_Balance - 500
    A.balance = New_Balance
    Close_Account(A)
B's Account (credit B by 500):
    Open_Account(B)
    Old_Balance = B.balance
    New_Balance = Old_Balance + 500
    B.balance = New_Balance
    Close_Account(B)
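In SQL these low-level tasks are grouped into a single transaction so they succeed or fail together. A sketch of the same transfer using Python's sqlite3 module; the account table is invented to mirror the example above:

    import sqlite3

    conn = sqlite3.connect(":memory:")
    cur = conn.cursor()
    cur.execute("CREATE TABLE account (name TEXT PRIMARY KEY, balance INTEGER)")
    cur.executemany("INSERT INTO account VALUES (?, ?)",
                    [("A", 1000), ("B", 1000)])
    conn.commit()

    try:
        # One transaction: both updates are applied together or not at all
        cur.execute("UPDATE account SET balance = balance - 500 "
                    "WHERE name = 'A'")
        cur.execute("UPDATE account SET balance = balance + 500 "
                    "WHERE name = 'B'")
        conn.commit()      # the transaction reaches its commit point
    except sqlite3.Error:
        conn.rollback()    # any failure undoes the partial transfer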
A.1.7. Explain the importance of the ACID properties of a database transaction
ACID Properties
A transaction is a very small unit of a program, and it may contain several low-level tasks. A
transaction in a database system must maintain Atomicity, Consistency, Isolation, and
Durability − commonly known as ACID properties − in order to ensure accuracy,
completeness, and data integrity.

Atomicity − This property states that a transaction must be treated as an atomic unit: either all of its operations are executed or none of them are. There must be no state in the database where a transaction is left partially completed; the database state must be defined either as it was before the execution of the transaction or as it is after the transaction's execution, abortion, or failure.

Consistency − The database must remain in a consistent state after any transaction. No
transaction should have any adverse effect on the data residing in the database. If the database
was in a consistent state before the execution of a transaction, it must remain consistent after
the execution of the transaction as well.

Durability − The database should be durable enough to hold all its latest updates even if the system fails or restarts. If a transaction updates a chunk of data in a database and commits, then the database will hold the modified data. If a transaction commits but the system fails before the data can be written to disk, that data will be updated once the system comes back into action.

Isolation − In a database system where more than one transaction is being executed simultaneously and in parallel, the property of isolation states that all the transactions will be carried out and executed as if each were the only transaction in the system. No transaction will affect the existence of any other transaction.
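Atomicity and consistency can be observed directly: declare a consistency rule as a constraint, attempt a transfer that would violate it, and the whole transaction rolls back. A sketch in Python's sqlite3; the table and balances are invented:

    import sqlite3

    conn = sqlite3.connect(":memory:")
    cur = conn.cursor()
    # Consistency rule: no account balance may go negative
    cur.execute("CREATE TABLE account (name TEXT PRIMARY KEY, "
                "balance INTEGER CHECK (balance >= 0))")
    cur.executemany("INSERT INTO account VALUES (?, ?)",
                    [("A", 100), ("B", 0)])
    conn.commit()

    try:
        cur.execute("UPDATE account SET balance = balance + 500 "
                    "WHERE name = 'B'")          # credit applied first
        cur.execute("UPDATE account SET balance = balance - 500 "
                    "WHERE name = 'A'")          # violates the CHECK rule
        conn.commit()
    except sqlite3.IntegrityError:
        conn.rollback()  # atomicity: the already-applied credit is undone too

    print(conn.execute("SELECT * FROM account").fetchall())
    # [('A', 100), ('B', 0)] -- the database stays in its consistent state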

What is a Database?
A database is a collection of related data which represents some aspect of the real world. A database system is designed to be built and populated with data for a certain task.

What is DBMS?
A Database Management System (DBMS) is software for storing and retrieving users' data while applying appropriate security measures. It consists of a group of programs which manipulate the database. The DBMS accepts requests for data from an application and instructs the operating system to provide the specific data. In large systems, a DBMS helps users and other third-party software to store and retrieve data.

A DBMS allows users to create their own databases as per their requirements. The term "DBMS" covers the users of the database and the other application programs; it provides an interface between the data and the software applications.

Define Normalisation
Normalization is the process of organizing the data in a database to avoid data redundancy and the insertion, update, and deletion anomalies.
Anomalies in DBMS
There are three types of anomalies that occur when the database is not normalized:
• Insertion anomaly
• Update anomaly
• Deletion anomaly
Example: Suppose a manufacturing company stores employee details in a table named employee that has four attributes: emp_id for the employee's id, emp_name for the employee's name, emp_address for the employee's address, and emp_dept for the department in which the employee works. At some point in time the table looks like this:
emp_id emp_name emp_address emp_dept

101 Rick Delhi D001

101 Rick Delhi D002

123 Maggie Agra D890

166 Glenn Chennai D900

166 Glenn Chennai D004


The above table is not normalized. Let us look at the problems we face when a table is not normalized.
Update anomaly: In the above table we have two rows for employee Rick, as he belongs to two departments of the company. If we want to update Rick's address, we have to update it in both rows, or the data will become inconsistent. If the correct address gets updated in one department but not in the other, then, as far as the database is concerned, Rick has two different addresses, which is incorrect and leads to inconsistent data.
Insert anomaly: Suppose a new employee joins the company who is under training and is currently not assigned to any department. We would not be able to insert the data into the table if the emp_dept field doesn't allow nulls.
Delete anomaly: Suppose that at some point the company closes department D890. Deleting the rows that have emp_dept D890 would also delete the information about employee Maggie, since she is assigned only to this department.
To overcome these anomalies we need to normalize the data.
Normalization
Here are the most commonly used normal forms:
• First normal form (1NF)
• Second normal form (2NF)
• Third normal form (3NF)
• Boyce-Codd normal form (BCNF)
First normal form (1NF)
As per the rule of first normal form, an attribute (column) of a table cannot hold multiple values. It should hold only atomic values.
Example: Suppose a company wants to store the names and contact details of its employees. It creates a table that looks like this:
emp_id  emp_name  emp_address  emp_mobile
101     Herschel  New Delhi    8912312390
102     Jon       Kanpur       8812121212, 9900012222
103     Ron       Chennai      7778881212
104     Lester    Bangalore    9990000123, 8123450987

Two employees (Jon and Lester) have two mobile numbers each, so the company stored them in the same field, as you can see in the table above.
This table is not in 1NF: the rule says "each attribute of a table must have atomic (single) values", and the emp_mobile values for Jon and Lester violate that rule.
To make the table comply with 1NF, the data should look like this:
emp_id emp_name emp_address emp_mobile

101 Herschel New Delhi 8912312390

102 Jon Kanpur 8812121212

102 Jon Kanpur 9900012222

103 Ron Chennai 7778881212

104 Lester Bangalore 9990000123

104 Lester Bangalore 8123450987

Second normal form (2NF)

A table is said to be in 2NF if both of the following conditions hold:
• The table is in 1NF (first normal form).
• No non-prime attribute is dependent on a proper subset of any candidate key of the table.
An attribute that is not part of any candidate key is known as a non-prime attribute.
Example: Suppose a school wants to store the data of teachers and the subjects they teach. Since a teacher can teach more than one subject, the table can have multiple rows for the same teacher:
teacher_id subject teacher_age

111 Maths 38

111 Physics 38

222 Biology 38

333 Physics 40

333 Chemistry 40

Candidate Keys: {teacher_id, subject}
Non-prime attribute: teacher_age
The table is in 1NF because each attribute has atomic values. However, it is not in 2NF because the non-prime attribute teacher_age is dependent on teacher_id alone, which is a proper subset of the candidate key. This violates the rule for 2NF, which says "no non-prime attribute is dependent on a proper subset of any candidate key of the table".
To make the table comply with 2NF, we can break it into two tables like this:
teacher_details table:
teacher_id teacher_age

111 38

222 38

333 40
teacher_subject table:
teacher_id subject

111 Maths

111 Physics

222 Biology

333 Physics

333 Chemistry
Now the tables comply with Second normal form (2NF).
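The same decomposition can be written as SQL table definitions, here via Python's sqlite3: teacher_age now depends on the whole key of its own table, and {teacher_id, subject} becomes a composite primary key:

    import sqlite3

    conn = sqlite3.connect(":memory:")
    cur = conn.cursor()

    # teacher_age depends on teacher_id alone, so it moves to its own table
    cur.execute("CREATE TABLE teacher_details ("
                "teacher_id INTEGER PRIMARY KEY, teacher_age INTEGER)")

    # the original candidate key {teacher_id, subject} is now the whole key
    cur.execute("CREATE TABLE teacher_subject ("
                "teacher_id INTEGER REFERENCES teacher_details(teacher_id), "
                "subject TEXT, "
                "PRIMARY KEY (teacher_id, subject))")
    conn.close()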
Transitive dependency in DBMS
A functional dependency is said to be transitive if it is formed indirectly by two other functional dependencies. For example,
X -> Z is a transitive dependency if the following three functional dependencies hold:
X -> Y
Y does not determine X
Y -> Z
Note: A transitive dependency can only occur in a relation of three or more attributes. This dependency helps us in normalizing the database to 3NF (third normal form).
Book Author Author_age

Game of Thrones George R. R. Martin 66

Harry Potter J. K. Rowling 49

Dying of the Light George R. R. Martin 66

{Book} -> {Author} (if we know the book, we know the author's name)

{Author} does not determine {Book}

{Author} -> {Author_age}

Therefore, as per the rule of transitive dependency, {Book} -> {Author_age} should hold, which makes sense: if we know the book name, we can know the author's age.

Third normal form (3NF)

A table design is said to be in 3NF if both of the following conditions hold:
• The table is in 2NF.
• Every transitive functional dependency of a non-prime attribute on any super key has been removed.
An attribute that is not part of any candidate key is known as a non-prime attribute.
In other words, 3NF can be explained like this: a table is in 3NF if it is in 2NF and, for each functional dependency X -> Y, at least one of the following conditions holds:
• X is a super key of the table
• Y is a prime attribute of the table
An attribute that is part of one of the candidate keys is known as a prime attribute.
Example: Suppose a company wants to store the complete address of each employee. They create a table named employee_details that looks like this:
emp_id emp_name emp_zip emp_state emp_city emp_district

1001 John 282005 UP Agra Dayal Bagh

1002 Ajeet 222008 TN Chennai M-City

1006 Lora 282007 TN Chennai Urrapakkam

1101 Lilly 292008 UK Pauri Bhagwan

1201 Steve 222999 MP Gwalior Ratan

Super keys: {emp_id}, {emp_id, emp_name}, {emp_id, emp_name, emp_zip}…so on


Candidate Keys: {emp_id}

Non-prime attributes: all attributes except emp_id are non-prime, as they are not part of any candidate key.
Here, emp_state, emp_city, and emp_district depend on emp_zip, and emp_zip depends on emp_id. This makes the non-prime attributes (emp_state, emp_city, and emp_district) transitively dependent on the super key (emp_id), which violates the rule of 3NF.
To make this table comply with 3NF, we have to break it into two tables to remove the transitive dependency:
employee table:
emp_id emp_name emp_zip

1001 John 282005

1002 Ajeet 222008

1006 Lora 282007

1101 Lilly 292008

1201 Steve 222999


employee_zip table:
emp_zip emp_state emp_city emp_district

282005 UP Agra Dayal Bagh

222008 TN Chennai M-City

282007 TN Chennai Urrapakkam

292008 UK Pauri Bhagwan

222999 MP Gwalior Ratan
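Expressed as SQL, the transitive dependency disappears because emp_zip becomes a foreign key and the address attributes depend only on the key of their own table. A sketch via Python's sqlite3:

    import sqlite3

    conn = sqlite3.connect(":memory:")
    cur = conn.cursor()

    # emp_state, emp_city, and emp_district now depend directly on emp_zip
    cur.execute("CREATE TABLE employee_zip ("
                "emp_zip TEXT PRIMARY KEY, "
                "emp_state TEXT, emp_city TEXT, emp_district TEXT)")

    # employee keeps only emp_zip, breaking emp_id -> emp_zip -> emp_city
    cur.execute("CREATE TABLE employee ("
                "emp_id INTEGER PRIMARY KEY, emp_name TEXT, "
                "emp_zip TEXT REFERENCES employee_zip(emp_zip))")
    conn.close()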

Boyce-Codd normal form (BCNF)

BCNF is an advanced version of 3NF, which is why it is also referred to as 3.5NF. BCNF is stricter than 3NF. A table complies with BCNF if it is in 3NF and, for every functional dependency X -> Y, X is a super key of the table.
Example: Suppose there is a company where employees work in more than one department. They store the data like this:
emp_id emp_nationality emp_dept dept_type dept_no_of_emp

1001 Austrian Production and planning D001 200

1001 Austrian stores D001 250

1002 American design and technical support D134 100

1002 American Purchasing department D134 600

Functional dependencies in the table above:
emp_id -> emp_nationality
emp_dept -> {dept_type, dept_no_of_emp}
Candidate key: {emp_id, emp_dept}
The table is not in BCNF, as neither emp_id nor emp_dept alone is a key.
To make the table comply with BCNF, we can break it into three tables like this:
emp_nationality table:
emp_id emp_nationality

1001 Austrian

1002 American
emp_dept table:
emp_dept dept_type dept_no_of_emp

Production and planning D001 200

stores D001 250

design and technical support D134 100

Purchasing department D134 600


emp_dept_mapping table:
emp_id emp_dept

1001 Production and planning

1001 stores

1002 design and technical support

1002 Purchasing department


Functional dependencies:
emp_id -> emp_nationality
emp_dept -> {dept_type, dept_no_of_emp}
Candidate keys:
For first table: emp_id
For second table: emp_dept
For third table: {emp_id, emp_dept}
These tables are now in BCNF: in both functional dependencies, the left-hand side is a key.

A.2.1: Define the terms: database management system (DBMS) and relational database
management system (RDBMS).

DBMS:

RDBMS:

A.2.2: Outline the functions and tools of a DBMS.

A.2.3: Describe how a DBMS can be used to promote data security.

A.2.4: Define the term schema.

A.2.5: Identify the characteristics of the three levels of the schema: conceptual, logical,
physical.

A.2.6: Outline the nature of the data dictionary.

A.2.7: Explain the importance of a data definition language in implementing a data model.

A.2.8: Explain the importance of data modelling in the design of a database.

A.2.9: Define the following database terms: table, record, field, primary key, secondary
key, foreign key, candidate key, composite primary key, join.

A.3 Further aspects of database management (10 hours)


A.3.1 Explain the role of a database administrator.
Typically there are three types of users of a DBMS:
1. The End User, who uses the application. Ultimately, this is the user who actually puts the data in the system to use in business. This user need not know anything about the organization of data at the physical level, and need not be aware of the complete data in the system: she needs access to and knowledge of only the data she is using.
2. The Application Programmer, who develops the application programs. She has more knowledge about the data and its structure, since she has to manipulate the data using her programs. She also need not have access to and knowledge of the complete data in the system.
3. The Database Administrator (DBA), who is like the super-user of the system. The individual in the organization who controls the design and the use of the database is referred to as the DBA.
The role of the DBA is very important and is defined by the following functions.
• Defining the Schema: The DBA defines the schema, which contains the structure of the data in the application. The DBA determines what data needs to be present in the system and how this data has to be represented and organized. As per this design, the database is created to store the required information for the organization.
• Defining Storage Structure and Access Method: The DBA decides how the information is to be represented in the stored database.
• Liaising with Users: The DBA needs to interact continuously with the users to understand the data in the system and its use. The DBA determines which user needs access to which part of the database.
• Defining Security & Integrity Checks: The DBA decides which access restrictions need to be defined and defines security checks accordingly. Data integrity checks are also defined by the DBA.
• Defining Backup/Recovery Procedures: The DBA also defines procedures for backup and recovery. Defining backup procedures includes specifying what data is to be backed up, the periodicity of taking backups, and also the medium and storage place for the backup data.
• Monitoring Performance: The DBA has to continuously monitor the performance of the queries and take measures to optimize all the queries in the application.
• Assistance to Application Programmers: The DBA provides help to application programmers in developing application programs.
The roles listed above are the ones we learn from textbooks. In practice, the role of a DBA differs from organization to organization, but the essential duties that many DBAs are charged with are listed below (a sketch of duty 5 follows this list):
1. Install and configure the DBMS software (e.g., Oracle).
2. Take backups and plan for recovery of the database.
3. Design the database schema and create any necessary database objects.
4. Tune application SQL for optimal performance.
5. Enforce database security to safeguard the information.
6. Work closely with application engineers and system administrators to ensure all database needs are being met.
7. Apply patches and upgrades to the database as required.
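As a sketch of duty 5, access restrictions are usually defined with SQL privilege statements. The following assumes a PostgreSQL server reached through psycopg2; the role name, orders table, and credentials are invented:

    import psycopg2

    # A DBA connection; host and credentials are placeholders
    conn = psycopg2.connect(host="localhost", dbname="company",
                            user="dba", password="secret")
    cur = conn.cursor()

    # Clerks may read and insert orders, but never delete them
    cur.execute("GRANT SELECT, INSERT ON orders TO clerk_role")
    cur.execute("REVOKE DELETE ON orders FROM clerk_role")
    conn.commit()
    conn.close()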

A.3.2 Explain how end-users can interact with a database


Categories of End Users in DBMS
End users are the people whose jobs require access to the database for querying, updating, and generating reports. The database primarily exists for their use. There are several categories of end users, as follows:
1. Casual end users – These are users who access the database occasionally but require different information each time. They use a sophisticated database query language to specify their requests and are typically middle- or high-level managers or other occasional browsers. These users learn only the few facilities that they use repeatedly out of the many provided by the DBMS.
2. Naive or parametric end users – These users make up a sizeable portion of database end users. Their main job function revolves around constantly querying and updating the database, using a standard type of query known as a canned transaction that has been programmed and tested. These users need to learn very little about the facilities provided by the DBMS; they only have to understand the user interfaces of the standard transactions designed and implemented for their use. The following tasks are typically performed by naive end users:
• Bank tellers check account balances and post withdrawals and deposits.
• Reservation clerks for airlines, railways, hotels, and car rental companies check availability for a given request and make reservations.
• Clerks at receiving stations for shipping companies enter package identifications via barcodes and descriptive information through buttons to update a central database of received and in-transit packages.
3. Sophisticated end users – These include engineers, scientists, business analysts, and others who thoroughly familiarize themselves with the facilities of the DBMS in order to implement applications that meet their complex requirements. They try to learn most of the DBMS facilities in order to achieve this.
4. Standalone users – These users maintain personal databases by using ready-made program packages that provide easy-to-use menu-based or graphics-based interfaces. An example is the user of a tax package that stores a variety of personal financial data for tax purposes. These users become very proficient in using a specific software package.
A.3.3 Describe different methods of database recovery.
Database Recovery Techniques in DBMS

Database systems, like any other computer systems, are subject to failures, but the data stored in them must be available as and when required. When a database fails, it must possess facilities for fast recovery. It must also provide atomicity: either transactions are completed successfully and committed (their effect is recorded permanently in the database) or the transaction has no effect at all on the database.

There are both automatic and non-automatic ways of backing up data and recovering from failure situations. Database recovery techniques are the techniques used to recover data lost due to system crashes, transaction errors, viruses, catastrophic failures, incorrect command execution, etc. To prevent data loss, recovery techniques based on deferred update, immediate update, or backing up data can be used.

Recovery techniques are heavily dependent upon the existence of a special file known as a system log. It contains information about the start and end of each transaction and any updates which occur during the transaction. The log keeps track of all transaction operations that affect the values of database items; this information is needed to recover from transaction failure. The log is kept on disk and contains the following kinds of entries:
• start_transaction(T): records that transaction T starts execution.
• read_item(T, X): records that transaction T reads the value of database item X.
• write_item(T, X, old_value, new_value): records that transaction T changes the value of database item X from old_value to new_value. The old value is sometimes known as the before-image of X, and the new value as the after-image of X.
• commit(T): records that transaction T has completed all accesses to the database successfully and its effect can be committed (recorded permanently) to the database.
• abort(T): records that transaction T has been aborted.
• checkpoint: a mechanism where all the previous logs are removed from the system and stored permanently on a storage disk. A checkpoint declares a point before which the DBMS was in a consistent state and all the transactions were committed.
A transaction T reaches its commit point when all of its operations that access the database have been executed successfully, i.e., the transaction has reached the point at which it will not abort (terminate without completing). Once committed, the transaction is permanently recorded in the database. Commitment always involves writing a commit entry to the log and writing the log to disk. At the time of a system crash, the log is searched backwards for all transactions T that have written a start_transaction(T) entry but no commit(T) entry; these transactions may have to be rolled back to undo their effect on the database during the recovery process.
Undoing – If a transaction crashes, the recovery manager may undo the transaction, i.e., reverse its operations. This involves examining the log for each entry write_item(T, X, old_value, new_value) belonging to the transaction and setting the value of item X in the database back to old_value.
There are two major techniques for recovery from non-catastrophic transaction failures: deferred updates and immediate updates.
deferred updates and immediate updates.
Deferred update – This technique does not physically update the database on disk until a transaction has reached its commit point. Before committing, all transaction updates are recorded in a local transaction workspace. If a transaction fails before reaching its commit point, it will not have changed the database in any way, so UNDO is not needed. It may be necessary to REDO the effects of the operations recorded in the local transaction workspace, because they may not yet have been written to the database. Hence, deferred update is also known as the NO-UNDO/REDO algorithm.
Immediate update – In the immediate update technique, the database may be updated by some operations of a transaction before the transaction reaches its commit point. However, these operations are recorded in a log on disk before they are applied to the database, making recovery still possible. If a transaction fails to reach its commit point, the effects of its operations must be undone, i.e., the transaction must be rolled back; hence we require both UNDO and REDO. This technique is known as the UNDO/REDO algorithm.
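The write-ahead idea behind both techniques, appending the log record before touching the data, can be simulated in a few lines. A toy Python sketch, not a real recovery manager; the names follow the log entries described above:

    # Toy immediate-update log: each change is logged (with old and new
    # values) before the in-memory "database" is touched, so a failed
    # transaction can be undone from the log.
    log = []           # stands in for the system log on stable storage
    db = {"X": 100}    # stands in for the database on disk

    def write_item(txn, item, new_value):
        log.append(("write_item", txn, item, db[item], new_value))
        db[item] = new_value       # immediate update: applied before commit

    def undo(txn):
        # Roll back by restoring old values, scanning the log in reverse
        for rec in reversed(log):
            if rec[0] == "write_item" and rec[1] == txn:
                db[rec[2]] = rec[3]

    log.append(("start_transaction", "T1"))
    write_item("T1", "X", 150)
    undo("T1")         # T1 failed before commit: X is restored to 100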
Caching/Buffering – In this technique, one or more disk pages that include the data items to be updated are cached into main-memory buffers and updated in memory before being written back to disk. A collection of in-memory buffers, called the DBMS cache, is kept under the control of the DBMS for holding these buffers. A directory is used to keep track of which database items are in the buffers. A dirty bit is associated with each buffer: it is 0 if the buffer has not been modified, and 1 if it has.
Shadow paging – This technique provides atomicity and durability. A directory with n entries is constructed, where the i-th entry points to the i-th database page on disk. When a transaction begins executing, the current directory is copied into a shadow directory. When a page is to be modified, a shadow page is allocated in which the changes are made, and when the changes are ready to become durable, all pages that referred to the original page are updated to refer to the new replacement page.
Some of the backup techniques are as follows:
Full database backup – The full database, including the data and the database meta-information needed to restore the whole database (including full-text catalogues), is backed up on a predefined schedule.
Differential backup – This stores only the data changes that have occurred since the last full database backup. When the same data has changed many times since the last full backup, a differential backup stores only the most recent version of the changed data. To restore from it, we first need to restore the last full database backup.
Transaction log backup – All events that have occurred in the database, i.e., a record of every single statement executed, are backed up. It is a backup of the transaction log entries and contains all the transactions that have happened to the database. Through this, the database can be recovered to a specific point in time. It is even possible to perform a backup from the transaction log when the data files are destroyed, without losing a single committed transaction.

Crash Recovery
A DBMS is a highly complex system with hundreds of transactions being executed every second. The durability and robustness of a DBMS depend on its complex architecture and its underlying hardware and system software. If it fails or crashes amid transactions, the system is expected to follow some sort of algorithm or technique to recover the lost data.
Failure Classification
To see where a problem has occurred, we generalize failures into various categories, as follows −
Transaction failure
A transaction has to abort when it fails to execute or when it reaches a point from which it can't go any further. This is called transaction failure, where only a few transactions or processes are affected.
Reasons for a transaction failure could be −
• Logical errors − where a transaction cannot complete because it has a code error or an internal error condition.
• System errors − where the database system itself terminates an active transaction because the DBMS is not able to execute it, or has to stop it because of some system condition. For example, in case of deadlock or resource unavailability, the system aborts an active transaction.

System Crash: There are problems, external to the system, that may cause the system to stop abruptly and crash. For example, interruptions in the power supply may cause the failure of underlying hardware or software. Examples may also include operating system errors.
Disk Failure: In the early days of technology evolution, it was a common problem that hard-disk drives or storage drives failed frequently.
Disk failures include the formation of bad sectors, unreachability of the disk, disk head crashes, and other failures which destroy all or part of the disk storage.
Storage Structure
We have already described the storage system. In brief, storage can be divided into two categories −
• Volatile storage − As the name suggests, volatile storage cannot survive system crashes. Volatile storage devices are placed very close to the CPU; normally they are embedded on the chipset itself. For example, main memory and cache memory are examples of volatile storage. They are fast but can store only a small amount of information.
• Non-volatile storage − These memories are made to survive system crashes. They are huge in data storage capacity but slower in access. Examples include hard disks, magnetic tapes, flash memory, and non-volatile (battery-backed) RAM.
Recovery and Atomicity: When a system crashes, it may have several transactions being executed and various files opened for them to modify data items. Transactions are made of various operations, which are atomic in nature. But according to the ACID properties of a DBMS, the atomicity of a transaction as a whole must be maintained: either all of its operations are executed or none.
When a DBMS recovers from a crash, it should do the following −
• Check the states of all the transactions which were being executed.
• A transaction may have been in the middle of some operation; the DBMS must ensure the atomicity of the transaction in this case.
• Check whether each transaction can be completed now or needs to be rolled back.
• No transaction should be allowed to leave the DBMS in an inconsistent state.
There are two types of techniques, which can help a DBMS in recovering as well as maintaining
the atomicity of a transaction −
• Maintaining the logs of each transaction, and writing them onto some stable storage
before actually modifying the database.
• Maintaining shadow paging, where the changes are done on a volatile memory, and
later, the actual database is updated.
Log-based Recovery
A log is a sequence of records which maintains a record of the actions performed by a transaction. It is important that the log entries are written prior to the actual modification and stored on a stable storage medium, which is failsafe.
Log-based recovery works as follows −
• The log file is kept on a stable storage medium.
• When a transaction enters the system and starts execution, it writes a log entry about it:
<Tn, Start>
• When the transaction modifies an item X, it writes log entries as follows:
<Tn, X, V1, V2>
This reads as: Tn has changed the value of X from V1 to V2.
• When the transaction finishes, it logs:
<Tn, commit>
The database can be modified using two approaches −
• Deferred database modification − All logs are written on to the stable storage and the database is updated when a transaction commits.
• Immediate database modification − Each log entry is followed by the actual database modification; that is, the database is modified immediately after every operation.
Recovery with Concurrent Transactions: When more than one transaction is being executed in parallel, the logs are interleaved. At the time of recovery, it would be hard for the recovery system to backtrack through all the logs and then start recovering. To ease this situation, most modern DBMSs use the concept of 'checkpoints'.
Checkpoint: Keeping and maintaining logs in real time and in a real environment may fill all the memory space available in the system. As time passes, the log file may grow too big to be handled at all. A checkpoint, as described earlier, removes all the previous logs from the system and stores them permanently on a storage disk, declaring a point before which the DBMS was in a consistent state and all the transactions were committed.
Recovery: When a system with concurrent transactions crashes and recovers, it behaves in the following manner −

• The recovery system reads the logs backwards from the end to the last checkpoint.
• It maintains two lists, an undo-list and a redo-list.
• If the recovery system sees a log with <Tn, Start> and <Tn, Commit>, or just <Tn, Commit>, it puts the transaction in the redo-list.
• If the recovery system sees a log with <Tn, Start> but no commit or abort entry, it puts the transaction in the undo-list.
All the transactions in the undo-list are then undone and their logs are removed. All the transactions in the redo-list are redone and their logs are saved.
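The procedure above is easy to state as code: scan back to the last checkpoint, then classify each transaction into the undo-list or redo-list. A simplified Python sketch over log records shaped like those in this section (abort records are ignored for brevity):

    def classify(log):
        # Keep only the records after the last checkpoint
        for i in range(len(log) - 1, -1, -1):
            if log[i][0] == "checkpoint":
                log = log[i + 1:]
                break
        started = {rec[1] for rec in log if rec[0] == "start"}
        committed = {rec[1] for rec in log if rec[0] == "commit"}
        redo_list = committed            # a commit record was found
        undo_list = started - committed  # started but never committed
        return undo_list, redo_list

    log = [("start", "T1"), ("commit", "T1"),
           ("checkpoint",),
           ("start", "T2"), ("commit", "T2"),
           ("start", "T3")]              # T3 never committed
    print(classify(log))                 # ({'T3'}, {'T2'})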
