
A database is a collection of related data, and data is a collection of facts and figures that can be processed to produce information.

Mostly, data represents recordable facts. Data aids in producing information, which is based on facts. For example, if we have data about the marks obtained by all students, we can then draw conclusions about toppers and average marks.

A database management system stores data in such a way that it becomes easier to retrieve, manipulate, and produce information.

Characteristics

Traditionally, data was organized in file formats. DBMS was a new concept then, and all the research was done to make it overcome the deficiencies in traditional style of data management. A modern DBMS has the following characteristics −

Real-world entity − A modern DBMS is more realistic and uses real-world entities to design its architecture. It uses the behaviour and attributes too. For example, a school database may use students as an entity and their age as an attribute.

Relation-based tables − DBMS allows entities and relations among them to form tables. A user can understand the architecture of a database just by looking at the table names.

Isolation of data and application − A database system is entirely different from its data. A database is an active entity, whereas data is said to be passive, on which the database works and organizes. DBMS also stores metadata, which is data about data, to ease its own process.

Less redundancy − DBMS follows the rules of normalization, which split a relation when any of its attributes has redundant values. Normalization is a mathematically rich and scientific process that reduces data redundancy.

Consistency − Consistency is a state where every relation in a database remains consistent. There exist methods and techniques that can detect an attempt to leave the database in an inconsistent state. A DBMS can provide greater consistency as compared to earlier forms of data-storing applications like file-processing systems.

Query Language − DBMS is equipped with a query language, which makes it more efficient to retrieve and manipulate data. A user can apply as many and as different filtering options as required to retrieve a set of data. Traditionally this was not possible where a file-processing system was used.

ACID Properties − DBMS follows the concepts of Atomicity, Consistency, Isolation, and Durability (normally shortened as ACID). These concepts are applied on transactions, which manipulate data in a database. ACID properties help the database stay healthy in multi-transactional environments and in case of failure.

Multiuser and Concurrent Access − DBMS supports a multi-user environment and allows users to access and manipulate data in parallel. Though there are restrictions on transactions when users attempt to handle the same data item, users are always unaware of them.

Multiple views − DBMS offers multiple views for different users. A user who is in the Sales department will have a different view of the database than a person working in the Production department. This feature enables users to have a focused view of the database according to their requirements.

Security − Features like multiple views offer security to some extent where users are unable to access data of other users and departments. DBMS offers methods to impose constraints while entering data into the database and retrieving the same at a later stage. DBMS offers many different levels of security features, which enable multiple users to have different views with different features. For example, a user in the Sales department cannot see the data that belongs to the Purchase department. Additionally, it can also be managed how much data of the Sales department should be displayed to the user. Since a DBMS is not saved on the disk in the way traditional file systems are, it is very hard for miscreants to break into it.

Users

A typical DBMS has users with different rights and permissions who use it for different purposes. Some users retrieve data and some back it up. The users of a DBMS can be broadly categorized as follows −


Administrators − Administrators maintain the DBMS and are responsible for administering the database. They look after its usage and decide who should use it. They create access profiles for users and apply limitations to maintain isolation and enforce security. Administrators also look after DBMS resources like the system license, required tools, and other software- and hardware-related maintenance.

Designers − Designers are the group of people who actually work on the designing part of the database. They keep a close watch on what data should be kept and in what format. They identify and design the whole set of entities, relations, constraints, and views.

End Users − End users are those who actually reap the benefits of having a DBMS. End users can range from simple viewers who pay attention to the logs or market rates to sophisticated users such as business analysts.

3-tier Architecture

A 3-tier architecture separates its tiers from each other based on the complexity of the users and how they use the data present in the database. It is the most widely used architecture to design a DBMS.

Database (Data) Tier − At this tier, the database resides along with its query processing languages. We also have the relations that define the data and their constraints at this level.

Application (Middle) Tier − At this tier reside the application server and the programs that access the database. For a user, this application tier presents an abstracted view of the database. End-users are unaware of any existence of the database beyond the application. At the other end, the database tier is not aware of any other user beyond the application tier. Hence, the application layer sits in the middle and acts as a mediator between the end-user and the database.

User (Presentation) Tier − End-users operate on this tier and they know nothing about any existence of the database beyond this layer. At this layer, multiple views of the database can be provided by the application. All views are generated by applications that reside in the application tier.

Data models define how the logical structure of a database is modelled. Data Models are fundamental entities to introduce abstraction in a DBMS. Data models define how data is connected to each other and how they are processed and stored inside the system.

Entity-Relationship Model

Entity-Relationship (ER) Model is based on the notion of real-world entities and relationships among them. While formulating real-world scenario into the database model, the ER Model creates entity set, relationship set, general attributes and constraints.

ER Model is best used for the conceptual design of a database.

ER Model is based on −

Entities and their attributes.

Relationships among entities.

These concepts are explained below.


Entity

An entity in an ER Model is a real-world entity having properties called attributes. Every attribute is defined by its set of values, called a domain. For example, in a school database, a student is considered as an entity. A student has various attributes like name, age, class, etc.

Relationship − The logical association among entities is called relationship. Relationships are mapped with entities in various ways. Mapping cardinalities define the number of associations between two entities.

Mapping cardinalities −

o one to one

o one to many

o many to one

o many to many

Relational Model

The most popular data model in DBMS is the Relational Model. It is a more scientific model than others. This model is based on first-order predicate logic and defines a table as an n-ary relation.


The main highlights of this model are −

Data is stored in tables called relations.

Relations can be normalized.

In normalized relations, values saved are atomic values.

Each row in a relation contains a unique value.

Each column in a relation contains values from the same domain.

Database Schema

A database schema is the skeleton structure that represents the logical view of the entire database. It defines how the data is organized and how the relations among them are associated. It formulates all the constraints that are to be applied on the data.

A database schema defines its entities and the relationship among them. It contains a descriptive detail of the database, which can be depicted by means of schema diagrams. It’s the database designers who design the schema to help programmers understand the database and make it useful.

A database schema can be divided broadly into two categories −

Physical Database Schema − This schema pertains to the actual storage of data and its form of storage like files, indices, etc. It defines how the data will be stored in a secondary storage.

Logical Database Schema − This schema defines all the logical constraints that need to be applied on the data stored. It defines tables, views, and integrity constraints.

Database Instance

A database instance is a state of an operational database with data at any given time. It contains a snapshot of the database. Database instances tend to change with time. A DBMS ensures that its every instance (state) is in a valid state by diligently following all the validations, constraints, and conditions that the database designers have imposed.

A database schema is the skeleton of a database. It is designed when the database doesn't exist at all. Once the database is operational, it is very difficult to make any changes to it.

A database schema does not contain any data or information.

The ER model defines the conceptual view of a database. It works around real-world entities and the associations among them. At the view level, the ER model is considered a good option for designing databases.

Entity

An entity can be a real-world object, either animate or inanimate, that is easily identifiable. For example, in a school database, students, teachers, classes, and courses offered can be considered as entities. All these entities have some attributes or properties that give them their identity.

An entity set is a collection of similar types of entities. An entity set may contain entities with attributes sharing similar values. For example, a Students set may contain all the students of a school; likewise, a Teachers set may contain all the teachers of a school from all faculties. Entity sets need not be disjoint.

Attributes

Entities are represented by means of their properties, called attributes. All attributes have values. For example, a student entity may have name, class, and age as attributes.

There exists a domain or range of values that can be assigned to attributes. For example, a student's name cannot be a numeric value. It has to be alphabetic. A student's age cannot be negative, etc.

Types of Attributes

Simple attribute − Simple attributes are atomic values, which cannot be divided further. For example, a student's phone number is an atomic value of 10 digits.

Composite attribute − Composite attributes are made of more than one simple attribute. For example, a student's complete name may have first name and last name.

Derived attribute − Derived attributes are the attributes that do not exist in the physical database, but their values are derived from other attributes present in the database. For example, average salary in a department should not be saved directly in the database, instead it can be derived. For another example, age can be derived from date of birth.

Single-value attribute − Single-value attributes contain a single value. For example, Social Security Number.

Multi-value attribute − Multi-value attributes may contain more than one value. For example, a person can have more than one phone number, email address, etc.

These attribute types can be combined as follows −

simple single-valued attributes

simple multi-valued attributes

composite single-valued attributes

composite multi-valued attributes

Entity-Set and Keys

Key is an attribute or a collection of attributes that uniquely identifies an entity within an entity set.

For example, the roll number of a student makes him/her identifiable among students.

Super Key − A set of attributes (one or more) that collectively identifies an entity in an entity set.

Candidate Key − A minimal super key is called a candidate key. An entity set may have more than one candidate key.

Primary Key − A primary key is one of the candidate keys chosen by the database designer to uniquely identify the entity set.

Relationship

The association among entities is called a relationship. For example, an employee works at a department, a student enrols in a course. Here, Works at and Enrols are called relationships.

Relationship Set

A set of relationships of similar type is called a relationship set. Like entities, a relationship too can have attributes. These attributes are called descriptive attributes.

Degree of Relationship

The number of participating entities in a relationship defines the degree of the relationship.

Binary = degree 2

Ternary = degree 3

n-ary = degree n

Mapping Cardinalities

Cardinality defines the number of entities in one entity set which can be associated with entities of the other set via a relationship set.

One-to-one − One entity from entity set A can be associated with at most one entity of entity set B and vice versa.

One-to-many − One entity from entity set A can be associated with more than one entity of entity set B; however, an entity from entity set B can be associated with at most one entity of entity set A.

Many-to-one − More than one entity from entity set A can be associated with at most one entity of entity set B; however, an entity from entity set B can be associated with more than one entity from entity set A.

Many-to-many − One entity from A can be associated with more than one entity from B and vice versa.
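These cardinalities eventually shape the table design. As a minimal relational sketch (the department, employee, and project tables here are assumed purely for illustration), a one-to-many relationship is usually realized with a foreign key on the "many" side, while a many-to-many relationship needs a separate mapping table:

CREATE TABLE department (
  dept_id   number(5) PRIMARY KEY,
  dept_name char(20)
);

-- One-to-many: one department has many employees, and each
-- employee row points to at most one department.
CREATE TABLE employee (
  emp_id   number(5) PRIMARY KEY,
  emp_name char(20),
  dept_id  number(5) REFERENCES department(dept_id)
);

CREATE TABLE project (
  proj_id number(5) PRIMARY KEY
);

-- Many-to-many: an employee may work on many projects and a project
-- may have many employees, so the pairs live in their own table.
CREATE TABLE works_on (
  emp_id  number(5) REFERENCES employee(emp_id),
  proj_id number(5) REFERENCES project(proj_id),
  PRIMARY KEY (emp_id, proj_id)
);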


Entity

Entities are represented by means of rectangles. Rectangles are named with the entity set they represent.


Attributes

Attributes are the properties of entities. Attributes are represented by means of ellipses. Every ellipse represents one attribute and is directly connected to its entity (rectangle).


If the attributes are composite, they are further divided in a tree-like structure. Every node is then connected to its attribute. That is, composite attributes are represented by ellipses that are connected to an ellipse.

Multivalued attributes are depicted by a double ellipse. Derived attributes are depicted by a dashed ellipse.


A weak entity set is one which does not have any primary key associated with it.


Relationship

Relationships are represented by a diamond-shaped box. The name of the relationship is written inside the diamond box. All the entities (rectangles) participating in a relationship are connected to it by a line.

Binary Relationship and Cardinality

A relationship where two entities are participating is called a binary relationship. Cardinality is the number of instances of an entity from a relation that can be associated with the relationship.

One-to-one − When only one instance of an entity is associated with the relationship, it is marked as '1:1'. Only one instance of each entity can be associated with the relationship; this depicts a one-to-one relationship.

One-to-many − When more than one instance of an entity is associated with a relationship, it is marked as '1:N'. Only one instance of the entity on the left and more than one instance of the entity on the right can be associated with the relationship; this depicts a one-to-many relationship.

Many-to-one − When more than one instance of an entity is associated with the relationship, it is marked as 'N:1'. More than one instance of the entity on the left and only one instance of the entity on the right can be associated with the relationship; this depicts a many-to-one relationship.

Many-to-many − More than one instance of the entity on the left and more than one instance of the entity on the right can be associated with the relationship; this depicts a many-to-many relationship.

Generalization

As mentioned above, the process of generalizing entities, where the generalized entity contains the properties of all the entities it generalizes, is called generalization. In generalization, a number of entities are brought together into one generalized entity based on their similar characteristics. For example, pigeon, house sparrow, crow, and dove can all be generalized as Birds.


Specialization

Specialization is the opposite of generalization. In specialization, a group of entities is divided into sub-groups based on their characteristics. Take a group ‘Person’ for example. A person has a name, date of birth, gender, etc. These properties are common to all persons. But in a company, persons can be identified as employee, employer, customer, or vendor, based on what role they play in the company.

Similarly, in a school database, persons can be specialized as teacher, student, or staff, based on what role they play in the school as entities.

Inheritance

We use all the above features of the ER Model in order to create classes of objects in object-oriented programming. The details of entities are generally hidden from the user; this process is known as abstraction.

Inheritance is an important feature of Generalization and Specialization. It allows lower-level entities to inherit the attributes of higher-level entities.


For example, the attributes of a Person class such as name, age, and gender can be inherited by lower-level entities such as Student or Teacher.

Relational data model is the primary data model, which is used widely around the world for data storage and processing. This model is simple and it has all the properties and capabilities required to process data with storage efficiency.

Concepts

Tables − In the relational data model, relations are saved in the format of tables. This format stores the relation among entities. A table has rows and columns, where rows represent records and columns represent attributes.

Tuple − A single row of a table, which contains a single record for that relation is called a tuple.

Relation instance − A finite set of tuples in the relational database system represents a relation instance. Relation instances do not have duplicate tuples.

Relation schema − A relation schema describes the relation name (table name), attributes, and their names.

Relation key − Each row has one or more attributes, known as relation key, which can identify the row in the relation (table) uniquely.

Attribute domain − Every attribute has some pre-defined value scope, known as attribute domain.
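As an illustrative sketch of these concepts (the student table and its columns are assumed here, not taken from a specific example above), a single table definition shows the relation schema, the attribute domains, and the relation key, while each inserted row is one tuple of the current relation instance:

-- Relation schema: student(roll_no, name, age)
CREATE TABLE student (
  roll_no number(5) PRIMARY KEY,  -- relation key: identifies each tuple uniquely
  name    char(20),               -- attribute whose domain is character strings
  age     number(2)               -- attribute whose domain is small numbers
);

-- One tuple; the set of all rows at any moment is the relation instance.
INSERT INTO student VALUES (1, 'Ravi', 17);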

Constraints

Every relation has some conditions that must hold for it to be a valid relation. These conditions are called Relational Integrity Constraints. There are three main integrity constraints −

Key constraints

Domain constraints

Referential integrity constraints

Key Constraints

There must be at least one minimal subset of attributes in the relation which can identify a tuple uniquely. This minimal subset of attributes is called the key for that relation. If there is more than one such minimal subset, they are called candidate keys.

Key constraints force that −

in a relation with a key attribute, no two tuples can have identical values for key attributes.

a key attribute cannot have NULL values.

Key constraints are also referred to as Entity Constraints.

Domain Constraints

Attributes have specific values in real-world scenarios. For example, age can only be a positive integer. Similar constraints are applied to the attributes of a relation. Every attribute is bound to have a specific range of values. For example, age cannot be less than zero and telephone numbers cannot contain a digit outside 0-9.

Referential integrity Constraints

Referential integrity constraints work on the concept of Foreign Keys. A foreign key is a key attribute of a relation that can be referred to in another relation.

Referential integrity constraint states that if a relation refers to a key attribute of a different or same relation, then that key element must exist.
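Anticipating the SQL syntax covered later, here is a minimal sketch of the three constraint types on the depts/employee tables used in the SQL examples below (column types are assumed):

CREATE TABLE depts (
  deptid number(5) PRIMARY KEY               -- key constraint: unique, never NULL
);

CREATE TABLE employee (
  id   number(5) PRIMARY KEY,                -- key constraint
  age  number(2) CHECK (age > 0),            -- domain constraint: restricts allowed values
  dept number(5) REFERENCES depts(deptid)    -- referential integrity: value must exist in depts
);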

SQL is a programming language for Relational Databases. It is designed over relational algebra and tuple relational calculus. SQL comes as a package with all major distributions of RDBMS.

Data Definition Language

CREATE - Creates new databases, tables, and views in an RDBMS.

ALTER - Modifies the database schema. ALTER can 1) add, drop, and modify table columns, 2) add and drop constraints, 3) enable and disable constraints.

DROP - Drops tables, views, and databases from an RDBMS.

RENAME - Renames databases, tables, and views in an RDBMS.

TRUNCATE - Removes all the data from a table but retains its structure.

DDL Command − Syntax and Example

CREATE TABLE

Syntax: CREATE TABLE <table_name> (column_name1 datatype, column_name2 datatype, ..., column_namen datatype);
Example: CREATE TABLE employee (id number(5), name char(20), dept char(10), age number(2), salary number(10), location char(10));

Syntax: CREATE TABLE <table_name> AS SELECT <columns_list> FROM <table_name> [WHERE condition];
Example: CREATE TABLE employee AS SELECT * FROM emp;

ALTER TABLE

Syntax: ALTER TABLE table_name ADD column_name datatype;
Example: ALTER TABLE employee ADD experience number(3);

Syntax: ALTER TABLE table_name DROP column_name;
Example: ALTER TABLE employee DROP location;

Syntax: ALTER TABLE table_name MODIFY column_name datatype;
Example: ALTER TABLE employee MODIFY salary number(15,2);

DROP TABLE

Syntax: DROP TABLE table_name;
Example: DROP TABLE employee;

RENAME TABLE

Syntax: RENAME old_table_name TO new_table_name;
Example: RENAME employee TO my_employee;

TRUNCATE TABLE

Syntax: TRUNCATE TABLE table_name;
Example: TRUNCATE TABLE employee;

Data Manipulation Language

SQL is equipped with data manipulation language (DML). DML modifies the database instance by inserting, updating, and deleting its data. DML is responsible for all forms of data modification in a database. SQL contains the following set of commands in its DML section −

INSERT INTO/VALUES - The INSERT Statement is used to add new rows of data to a table.

UPDATE/SET/WHERE - The UPDATE Statement is used to modify the existing rows in a table.

DELETE FROM/WHERE - The DELETE Statement is used to delete rows from a table.

DML Command − Syntax and Example

INSERT

Syntax: INSERT INTO table_name VALUES (value1, value2, ..., valueN);
Example: INSERT INTO employee (id, name, dept, age, salary) VALUES (105, 'ABC', 'MBA', 25, 33000);
Example: INSERT INTO employee VALUES (106, 'XYZ', 'CSE', 23, 30000);

Syntax: INSERT INTO table_name [(column1, column2, ..., columnN)] SELECT column1, column2, ..., columnN FROM table_name [WHERE condition];
Example: INSERT INTO employee SELECT * FROM temp_employee;

UPDATE

Syntax: UPDATE table_name SET column_name1 = value1, column_name2 = value2 [WHERE condition];
Example: UPDATE employee SET location = 'Mysore' WHERE id = 101;
Example: UPDATE employee SET salary = salary + (salary * 0.2);

DELETE

Syntax: DELETE FROM table_name [WHERE condition];
Example: DELETE FROM employee WHERE id = 100;
Example: DELETE FROM employee;

Data Retrieval Language

SELECT/FROM/WHERE - The SQL SELECT statement is used to query or retrieve data from a table in the database. A query may retrieve information from specified columns or from all of the columns in the table.

SELECT

Syntax: SELECT column_list FROM table_name [WHERE clause] [GROUP BY clause] [HAVING clause] [ORDER BY clause];

Example: SELECT * FROM employee;
Example: SELECT id, name, dept FROM employee WHERE dept = 'MBA';
Example: SELECT dept, SUM(salary) FROM employee GROUP BY dept;
Example: SELECT dept, SUM(salary) FROM employee GROUP BY dept HAVING SUM(salary) > 25000;
Example: SELECT name, salary FROM employee ORDER BY salary;

SQL AGGREGATE (GROUP) Functions: Group functions are built-in SQL functions that operate on groups of rows and return one value for the entire group. These functions are: COUNT, MAX, MIN, AVG, SUM, DISTINCT.

COUNT − SELECT COUNT(*) FROM employee WHERE dept = 'MBA';

DISTINCT − SELECT DISTINCT dept FROM employee;

MIN − SELECT MIN(salary) FROM employee;

MAX − SELECT MAX(salary) FROM employee;

SUM − SELECT SUM(salary) FROM employee;

AVG − SELECT AVG(salary) FROM employee;

Transaction Control Language

COMMIT − Makes the changes performed by the current transaction permanent.

ROLLBACK − Undoes all changes made by the current transaction since the last COMMIT.
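A small sketch of how these commands are used (the employee table and the values are only illustrative):

-- Changes become permanent only after COMMIT.
UPDATE employee SET salary = salary + 1000 WHERE id = 101;
COMMIT;

-- ROLLBACK undoes everything done since the last COMMIT.
DELETE FROM employee;
ROLLBACK;   -- the deleted rows are restored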

SQL Integrity Constraints

Integrity Constraints are used to apply business rules for the database tables.

Constraints can be defined in two ways: 1) The constraints can be specified immediately after the column definition. This is called column-level definition. 2) The constraints can be specified after all the columns are defined. This is called table-level definition.

The constraints available in SQL are Primary Key, Foreign Key, Unique, Not Null, and Check.

Syntax (column-level): column_name datatype [CONSTRAINT constraint_name] constraint_type

Example:

CREATE TABLE employee
(
  id number(5) PRIMARY KEY,
  name char(20) NOT NULL,
  dept char(10) REFERENCES depts(deptid),
  age number(2),
  salary number(10) CHECK (salary > 0),
  location char(10)
);

Example (table-level, using ALTER TABLE):

CREATE TABLE depts
(
  deptid number(5),
  dname char(20)
);

ALTER TABLE depts ADD CONSTRAINT dept_id_pk PRIMARY KEY (deptid);

A UNIQUE constraint can be attached to a column in the same way.

SQL JOINS

A SQL Join statement is used to combine data or rows from two or more tables based on a common field between them. Different types of Joins are:

SELF JOIN

INNER JOIN

LEFT JOIN

RIGHT JOIN

FULL JOIN

SELF JOIN: A self join is joining a table with itself.

Syntax:

SELECT A.column_name1, A.column_name2, ..., B.column_name_n
FROM table_name A, table_name B
WHERE A.column_name = B.column_name;

Example:

SELECT E.Employee_id, E.Name AS 'Employee Name', F.Name AS 'Manager Name'
FROM Employee E, Employee F
WHERE E.Emp_id = F.Mgr_id;

INNER JOIN: The INNER JOIN keyword selects all rows from both the tables as long as the condition is satisfied. This keyword creates the result-set by combining all rows from both the tables where the condition is satisfied, i.e., the value of the common field is the same.

LEFT JOIN: This join returns all the rows of the table on the left side of the join and the matching rows of the table on the right side of the join. For rows that have no matching row on the right side, the result-set will contain NULL. LEFT JOIN is also known as LEFT OUTER JOIN.

RIGHT JOIN: RIGHT JOIN is similar to LEFT JOIN. This join returns all the rows of the table on the right side of the join and the matching rows of the table on the left side of the join. For rows that have no matching row on the left side, the result-set will contain NULL. RIGHT JOIN is also known as RIGHT OUTER JOIN.

FULL JOIN: FULL JOIN creates the result-set by combining the results of both LEFT JOIN and RIGHT JOIN. The result-set will contain all the rows from both the tables. For rows that have no match, the result-set will contain NULL values.

SYNTAX

INNER JOIN:
SELECT table1.column1, table1.column2, table2.column1
FROM table1
INNER JOIN table2
ON table1.matching_column = table2.matching_column;

LEFT JOIN:
SELECT table1.column1, table1.column2, table2.column1
FROM table1
LEFT JOIN table2
ON table1.matching_column = table2.matching_column;

RIGHT JOIN:
SELECT table1.column1, table1.column2, table2.column1
FROM table1
RIGHT JOIN table2
ON table1.matching_column = table2.matching_column;

FULL JOIN:
SELECT table1.column1, table1.column2, table2.column1
FROM table1
FULL JOIN table2
ON table1.matching_column = table2.matching_column;

Example of JOINS:

SELF JOIN:
SELECT E.Employee_id, E.Name AS 'Employee Name', F.Name AS 'Manager Name'
FROM Employee E, Employee F
WHERE E.Emp_id = F.Mgr_id;

INNER JOIN:
Type 1: Using the WHERE clause
SELECT E.Employee_name, D.Department_name
FROM Employee E, Department D
WHERE E.Emp_no = D.Emp_no;

Type 2: Using the JOIN clause with an ON condition
SELECT E.Employee_name, D.Department_name
FROM Employee E JOIN Department D
ON E.Emp_no = D.Emp_no;

Type 3: Using NATURAL JOIN
SELECT E.Employee_name, D.Department_name, E.Salary
FROM Employee E NATURAL JOIN Department D;

Type 4: Using the JOIN clause with a USING condition
SELECT E.Employee_name, D.Department_name, E.Salary
FROM Employee E JOIN Department D
USING (Emp_no);

LEFT JOIN (LEFT OUTER JOIN):
SELECT a.Employee_name, b.Department_Name
FROM Employee a LEFT JOIN Department b
ON a.Department_ID = b.Department_ID;
(or)
SELECT a.Employee_name, b.Department_Name
FROM Employee a, Department b
WHERE a.Department_ID = b.Department_ID(+);

RIGHT JOIN (RIGHT OUTER JOIN):
SELECT a.Employee_name, b.Department_Name
FROM Employee a RIGHT JOIN Department b
ON a.Department_ID = b.Department_ID;
(or)
SELECT a.Employee_name, b.Department_Name
FROM Employee a, Department b
WHERE a.Department_ID(+) = b.Department_ID;

FULL JOIN (FULL OUTER JOIN):
SELECT a.Employee_name, b.Department_Name
FROM Employee a FULL JOIN Department b
ON a.Department_ID = b.Department_ID;

STORED PROCEDURES:

A stored procedure is a set of Structured Query Language (SQL) statements with an assigned name, which is stored in a relational database management system as a group so that it can be reused and shared by multiple programs.

Stored procedures can access or modify data in a database. A stored procedure in Oracle follows the basic PL/SQL block structure, which consists of declarative, executable, and exception-handling parts. The declarative part is an optional part which contains declarations of types, cursors, constants, variables, exceptions, and nested subprograms. The executable part is a mandatory part and contains statements that perform the designated action. The exception-handling part is an optional part that contains the code that handles run-time errors.

SYNTAX:

CREATE [OR REPLACE] PROCEDURE procedure_name
[(parameter_name [IN | OUT | IN OUT] type [, ...])]
{IS | AS}
BEGIN
   < procedure_body >
END procedure_name;

Where,

procedure-name specifies the name of the procedure.

The [OR REPLACE] option allows the modification of an existing procedure.

The optional parameter list contains the name, mode, and type of the parameters. IN represents a value that will be passed from outside and OUT represents a parameter that will be used to return a value outside of the procedure.

procedure-body contains the executable part.

The AS keyword is used instead of the IS keyword for creating a standalone procedure.

Example of Stored Procedure:

CREATE OR REPLACE PROCEDURE greetings
AS
BEGIN
   dbms_output.put_line('Hello World!');
END;
/

We execute the procedure as:

SQL> EXECUTE greetings;

The output is:

Hello World

PL/SQL procedure successfully completed.
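The greetings procedure takes no parameters. As a further sketch of the IN and OUT modes described above (the procedure name and the employee table/columns are assumed for illustration), a procedure can accept an input value and hand a result back through an OUT parameter:

CREATE OR REPLACE PROCEDURE get_salary (
   p_id     IN  number,    -- value supplied by the caller
   p_salary OUT number     -- value returned to the caller
)
AS
BEGIN
   SELECT salary INTO p_salary FROM employee WHERE id = p_id;
END get_salary;
/

It can then be called from an anonymous PL/SQL block:

DECLARE
   v_salary number(10);
BEGIN
   get_salary(101, v_salary);
   dbms_output.put_line(v_salary);
END;
/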

NORMALIZATION:

Normalization is a database design technique which organizes tables in a manner that reduces redundancy and dependency of data.

It divides larger tables into smaller tables and links them using relationships.

Database Normal Forms


1NF (First Normal Form) Rules

Each table cell should contain a single value.

Each record needs to be unique.

Before we proceed to 2NF, we need to know about primary keys, non-prime attributes, candidate keys, and functional dependency.

A KEY is a value used to identify a record in a table uniquely. A KEY could be a single column or combination of multiple columns. Columns in a table that are NOT used to identify a record uniquely are called non-key columns.

A Primary Key is a single column value used to identify a database record uniquely. It has the following attributes −

A primary key cannot be NULL

A primary key value must be unique

The primary key values should rarely be changed

The primary key must be given a value when a new record is inserted.

A Composite Key is a primary key composed of multiple columns used to identify a record uniquely.

Functional Dependency:

Functional dependency (FD) is a set of constraints between two attributes in a relation.

Functional dependency says that if two tuples have the same values for attributes A1, A2, ..., An, then those two tuples must have the same values for attributes B1, B2, ..., Bn.

Functional dependency is represented by an arrow sign (→); that is, X → Y, where X functionally determines Y. The left-hand side attributes determine the values of attributes on the right-hand side.

A transitive functional dependency is when changing a non-key column might cause any of the other non-key columns to change.

2NF (Second Normal Form) Rules

Rule 1- Be in 1NF

Rule 2- Single-column primary key, i.e., every non-prime attribute should be fully functionally dependent on the candidate key (no partial dependency on any proper subset of it).

3NF (Third Normal Form) Rules

Rule 1- Be in 2NF

Rule 2- Has no transitive functional dependencies- Transitive functional dependency of non-prime attribute on any super key should be removed.

A table is in 3NF if it is in 2NF and for each functional dependency X-> Y at least one of the following conditions hold:

X is a super key of table

Y is a prime attribute of table

Boyce-Codd Normal Form (BCNF)

Even when a database is in 3rd Normal Form, anomalies can still result if it has more than one candidate key. BCNF is sometimes also referred to as the 3.5 Normal Form.

4NF (Fourth Normal Form) Rules

If no database table instance contains two or more independent and multi-valued data describing the relevant entity, then it is in 4th Normal Form.

5NF (Fifth Normal Form) Rules

A table is in 5th Normal Form only if it is in 4NF and it cannot be decomposed into any number of smaller tables without loss of data.

6NF (Sixth Normal Form) Proposed

6th Normal Form is not yet standardized.

Examples of Normal Forms

Example of 1NF:

Suppose a company wants to store the names and contact details of its employees. It creates a table that looks like this:

emp_id   emp_name   emp_address   emp_mobile
101      Herschel   New Delhi     8912312390
102      Jon        Kanpur        8812121212, 9900012222
103      Ron        Chennai       7778881212
104      Lester     Bangalore     9990000123, 8123450987

This table is not in 1NF as the rule says “each attribute of a table must have atomic (single) values”; the emp_mobile values for employees Jon & Lester violate that rule.

To make the table comply with 1NF, we should have the data like this:

emp_id   emp_name   emp_address   emp_mobile
101      Herschel   New Delhi     8912312390
102      Jon        Kanpur        8812121212
102      Jon        Kanpur        9900012222
103      Ron        Chennai       7778881212
104      Lester     Bangalore     9990000123
104      Lester     Bangalore     8123450987

Example of 2NF:

Suppose a school wants to store the data of teachers and the subjects they teach. Since a teacher can teach more than one subject, the table can have multiple rows for the same teacher. They create a table that looks like this:

teacher_id   subject     teacher_age
111          Maths       38
111          Physics     38
222          Biology     38
333          Physics     40
333          Chemistry   40

Candidate Keys: {teacher_id, subject} Non prime attribute: teacher_age

The table is in 1NF because each attribute has atomic values. However, it is not in 2NF because the non-prime attribute teacher_age is dependent on teacher_id alone, which is a proper subset of the candidate key. This violates the rule for 2NF as the rule says “no non-prime attribute is dependent on the proper subset of any candidate key of the table”.

To make the table comply with 2NF, we can break it into two tables like this:

teacher_details table:

teacher_id   teacher_age
111          38
222          38
333          40

teacher_subject table:

teacher_id   subject
111          Maths
111          Physics
222          Biology
333          Physics
333          Chemistry

Now the tables comply with Second normal form (2NF).
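A sketch of the same decomposition in SQL (column types are assumed):

CREATE TABLE teacher_details (
  teacher_id  number(5) PRIMARY KEY,
  teacher_age number(3)
);

CREATE TABLE teacher_subject (
  teacher_id number(5) REFERENCES teacher_details(teacher_id),
  subject    char(20),
  PRIMARY KEY (teacher_id, subject)   -- the original candidate key
);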

Example of 3NF:

Suppose a company wants to store the complete address of each employee, they create a table named employee_details that looks like this:

emp_id   emp_name   emp_zip   emp_state   emp_city   emp_district
1001     John       282005    UP          Agra       Dayal Bagh
1002     Ajeet      222008    TN          Chennai    M-City
1006     Lora       282007    TN          Chennai    Urrapakkam
1101     Lilly      292008    UK          Pauri      Bhagwan
1201     Steve      222999    MP          Gwalior    Ratan

Super keys: {emp_id}, {emp_id, emp_name}, {emp_id, emp_name, emp_zip}…etc Candidate Keys: {emp_id} Non-prime attributes: all attributes except emp_id are non-prime as they are not part of any candidate keys.

Here, emp_state, emp_city & emp_district are dependent on emp_zip. And emp_zip is dependent on emp_id, which makes the non-prime attributes (emp_state, emp_city & emp_district) transitively dependent on the super key (emp_id). This violates the rule of 3NF.

To make this table comply with 3NF, we have to break it into two tables to remove the transitive dependency:

employee table:

emp_id   emp_name   emp_zip
1001     John       282005
1002     Ajeet      222008
1006     Lora       282007
1101     Lilly      292008
1201     Steve      222999

employee_zip table:

emp_zip   emp_state   emp_city   emp_district
282005    UP          Agra       Dayal Bagh
222008    TN          Chennai    M-City
282007    TN          Chennai    Urrapakkam
292008    UK          Pauri      Bhagwan
222999    MP          Gwalior    Ratan
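Expressed in SQL, the decomposition might look like this (column types are assumed; emp_zip becomes the key of the new table and a foreign key in employee):

CREATE TABLE employee_zip (
  emp_zip      char(6) PRIMARY KEY,
  emp_state    char(20),
  emp_city     char(20),
  emp_district char(20)
);

CREATE TABLE employee (
  emp_id   number(5) PRIMARY KEY,
  emp_name char(20),
  emp_zip  char(6) REFERENCES employee_zip(emp_zip)
);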

Boyce Codd normal form (BCNF)

It is an advanced version of 3NF, which is why it is also referred to as 3.5NF. BCNF is stricter than 3NF. A table complies with BCNF if it is in 3NF and, for every functional dependency X -> Y, X is a super key of the table.

Example: Suppose there is a company wherein employees work in more than one department. They store the data like this:

emp_id   emp_nationality   emp_dept                       dept_type   dept_no_of_emp
1001     Austrian          Production and planning        D001        200
1001     Austrian          stores                         D001        250
1002     American          design and technical support   D134        100
1002     American          Purchasing department          D134        600

Functional dependencies in the table above:

emp_id -> emp_nationality
emp_dept -> {dept_type, dept_no_of_emp}

Candidate key: {emp_id, emp_dept}

The table is not in BCNF as neither emp_id nor emp_dept alone are keys. To make the table comply with BCNF we can break the table in three tables as:

emp_nationality table:

emp_id   emp_nationality
1001     Austrian
1002     American

emp_dept table:

emp_dept                       dept_type   dept_no_of_emp
Production and planning        D001        200
stores                         D001        250
design and technical support   D134        100
Purchasing department          D134        600

emp_dept_mapping table:

emp_id   emp_dept
1001     Production and planning
1001     stores
1002     design and technical support
1002     Purchasing department

Functional dependencies:

emp_id -> emp_nationality
emp_dept -> {dept_type, dept_no_of_emp}

Candidate keys:

For first table: emp_id
For second table: emp_dept
For third table: {emp_id, emp_dept}

This is now in BCNF as, in both the functional dependencies, the left-hand side is a key.

TRANSACTION MANAGEMENT

A transaction can be defined as a group of tasks. A single task is the minimum processing unit which cannot be divided further.

ACID Properties

A transaction is a very small unit of a program and it may contain several low-level tasks. A transaction in a database system must maintain Atomicity, Consistency, Isolation, and Durability − commonly known as ACID properties − in order to ensure accuracy, completeness, and data integrity.

Atomicity − This property states that a transaction must be treated as an atomic unit, that is, either all of its operations are executed or none. There must be no state in a database where a transaction is left partially completed. States should be defined either before the execution of the transaction or after the execution/abortion/failure of the transaction.

Consistency − The database must remain in a consistent state after any transaction. No transaction should have any adverse effect on the data residing in the database. If the database was in a consistent state before the execution of a transaction, it must remain consistent after the execution of the transaction as well.

Isolation − In a database system where more than one transaction are being executed simultaneously and in parallel, the property of isolation states that all the transactions will be carried out and executed as if it is the only transaction in the system. No transaction will affect the existence of any other transaction.

Durability − The database should be durable enough to hold all its latest updates even if the system fails or restarts. If a transaction updates a chunk of data in a database and commits, then the database will hold the modified data. If a transaction commits but the system fails before the data could be written on to the disk, then that data will be updated once the system springs back into action.
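A classic illustration of atomicity and durability is a funds transfer: both updates must succeed or fail together, and once committed they must survive a crash. The accounts table here is assumed only for the sketch:

-- Move 500 from account 1 to account 2 as one atomic unit.
UPDATE accounts SET balance = balance - 500 WHERE acc_no = 1;
UPDATE accounts SET balance = balance + 500 WHERE acc_no = 2;
COMMIT;   -- from this point the change is durable
-- If anything fails before COMMIT, a ROLLBACK (or crash recovery) undoes
-- both updates, so a partial transfer is never visible.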

States of Transactions

A transaction in a database can be in one of the following states −


Active − In this state, the transaction is being executed. This is the initial state of every transaction.

Partially Committed − When a transaction executes its final operation, it is said to be in a partially committed state.

Failed − A transaction is said to be in a failed state if any of the checks made by the database recovery system fails. A failed transaction can no longer proceed further.

Aborted − If any of the checks fails and the transaction has reached a failed state, then the recovery manager rolls back all its write operations on the database to bring the database back to its original state where it was prior to the execution of the transaction. Transactions in this state are called aborted. The database recovery module can select one of the two operations after a transaction aborts −

o Re-start the transaction
o Kill the transaction

Committed − If a transaction executes all its operations successfully, it is said to be committed. All its effects are now permanently established on the database system.

Deadlock :

In a multi-process system, deadlock is an unwanted situation that arises in a shared resource environment, where a process indefinitely waits for a resource that is held by another process.

For example, assume a set of transactions {T0, T1, T2, ..., Tn}. T0 needs a resource X to complete its task. Resource X is held by T1, and T1 is waiting for a resource Y, which is held by T2. T2 is waiting for resource Z, which is held by T0. Thus, all the processes wait for each other to release resources. In this situation, none of the processes can finish their task. This situation is known as a deadlock.
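In SQL terms, a deadlock between two transactions can arise as follows (the accounts table and values are assumed; the statements run in two separate sessions):

-- Session 1 (T1):
UPDATE accounts SET balance = balance - 100 WHERE acc_no = 1;  -- locks row 1

-- Session 2 (T2):
UPDATE accounts SET balance = balance - 100 WHERE acc_no = 2;  -- locks row 2

-- Session 1 (T1):
UPDATE accounts SET balance = balance + 100 WHERE acc_no = 2;  -- waits for T2's lock on row 2

-- Session 2 (T2):
UPDATE accounts SET balance = balance + 100 WHERE acc_no = 1;  -- waits for T1's lock on row 1: deadlock
-- The DBMS detects the cycle and rolls back one of the two transactions.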

Two-Phase Locking 2PL

This locking protocol divides the execution phase of a transaction into three parts. In the first part, when the transaction starts executing, it seeks permission for the locks it requires. The second part is where the transaction acquires all the locks. As soon as the transaction releases its first lock, the third phase starts. In this phase, the transaction cannot demand any new locks; it only releases the acquired locks.


Two-phase locking has two phases, one is growing, where all the locks are being acquired by the transaction; and the second phase is shrinking, where the locks held by the transaction are being released.

To claim an exclusive (write) lock, a transaction must first acquire a shared (read) lock and then upgrade it to an exclusive lock.
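In lock-based SQL databases a strict variant of two-phase locking is common: locks are acquired as statements execute (the growing phase) and all of them are released together at COMMIT or ROLLBACK (the shrinking phase). A rough sketch, again with an assumed accounts table:

-- Growing phase: each statement acquires the locks it needs.
SELECT balance FROM accounts WHERE acc_no = 1 FOR UPDATE;      -- locks row 1 for the coming update
UPDATE accounts SET balance = balance - 100 WHERE acc_no = 1;
UPDATE accounts SET balance = balance + 100 WHERE acc_no = 2;  -- locks row 2

-- Shrinking phase: all locks are released at once.
COMMIT;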

Data Warehousing - Overview

A data warehouse provides generalized and consolidated data in a multidimensional view. Along with this generalized and consolidated view of data, a data warehouse also provides Online Analytical Processing (OLAP) tools. These tools help us in interactive and effective analysis of data in a multidimensional space. This analysis results in data generalization and data mining.

A data warehouse is a database, which is kept separate from the organization's operational database.

There is no frequent updating done in a data warehouse.

It possesses consolidated historical data, which helps the organization to analyze its business.

A data warehouse helps executives to organize, understand, and use their data to take strategic decisions.

Data warehouse systems help in the integration of diversity of application systems.

A data warehouse system helps in consolidated historical data analysis.

Metadata

Metadata is simply defined as data about data. The data that are used to represent other data is known as metadata. For example, the index of a book serves as metadata for the contents of the book. In other words, we can say that metadata is the summarized data that leads us to the detailed data.

In terms of data warehouse, we can define metadata as following −

Metadata is a road-map to data warehouse.

Metadata in data warehouse defines the warehouse objects.

Metadata acts as a directory. This directory helps the decision support system to locate the contents of a data warehouse.

Data Warehouse Features

The key features of a data warehouse are discussed below −

Subject Oriented − A data warehouse is subject oriented because it provides information around a subject rather than the organization's ongoing operations. These subjects can be product, customers, suppliers, sales, revenue, etc. A data warehouse does not focus on the ongoing operations, rather it focuses on modelling and analysis of data for decision making.

Integrated − A data warehouse is constructed by integrating data from heterogeneous sources such as relational databases, flat files, etc. This integration enhances the effective analysis of data.

Time Variant − The data collected in a data warehouse is identified with a particular time period. The data in a data warehouse provides information from the historical point of view.

Non-volatile − Non-volatile means the previous data is not erased when new data is added to it. A data warehouse is kept separate from the operational database and therefore frequent changes in the operational database are not reflected in the data warehouse.

Data Warehouse Applications

Data warehouse helps business executives to organize, analyze, and use their data for decision making. Data warehouses are widely used in the following fields −

Financial services

Banking services

Consumer goods

Retail sectors

Controlled manufacturing

Types of Data Warehouse

Information processing, analytical processing, and data mining are the three types of data warehouse applications.

Information Processing − A data warehouse allows us to process the data stored in it. The data can be processed by means of querying, basic statistical analysis, and reporting using crosstabs, tables, charts, or graphs.

Analytical Processing − A data warehouse supports analytical processing of the information stored in it. The data can be analyzed by means of basic OLAP operations, including slice-and-dice, drill down, drill up, and pivoting.

Data Mining − Data mining supports knowledge discovery by finding hidden patterns and associations, constructing analytical models, performing classification and prediction. These mining results can be presented using the visualization tools.

Functions of Data Warehouse Tools and Utilities

The following are the functions of data warehouse tools and utilities −

Data Extraction − Involves gathering data from multiple heterogeneous sources.

Data Cleaning − Involves finding and correcting the errors in data.

Data Transformation − Involves converting the data from legacy format to warehouse format.

Data Loading − Involves sorting, summarizing, consolidating, checking integrity, and building indices and partitions.

Refreshing − Involves updating from data sources to warehouse.