Introduction to Databases
Name : Akanksha Sharma
Database Concepts
Data ?
Data refers to a collection of natural phenomena descriptors, including the
results of experience, observation or experiment, or a set of premises. This may
consist of numbers, words, or images, particularly as measurements or
observations of a set of variables.
Information ?
Information is a quality of a message from a sender to one or more receivers.
Information is always about something like size of a parameter, occurrence of
an event, etc. Information does not have to be accurate. It may be a truth or a
lie, or just the sound of a falling tree.
Knowledge?
Knowledge is the confident understanding of a subject, together with the
ability to use it for a specific purpose when appropriate.
DBMS
DBMS is a set of software programs that control the organization,
storage, management, and retrieval of data in a database.

DBMS includes:
A modeling language to define the schema of each database
hosted in the DBMS, according to the DBMS data model.
Data structures (fields, records, files and objects) optimized to deal
with very large amounts of data stored on a permanent data storage
device.
A database query language and report writer to allow users to
interactively interrogate the database, analyze its data and update it
according to the users' privileges on the data.
A transaction mechanism that guarantees the ACID properties
(atomicity, consistency, isolation, durability), in order to ensure data
integrity despite concurrent user accesses (concurrency control) and
faults (fault tolerance).
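A minimal sketch of the atomicity part of ACID, using Python's built-in sqlite3 module rather than a full DBMS server. The `account` table, the names and the `transfer` helper are hypothetical and exist only for illustration: a transfer either applies both updates or neither.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# The CHECK constraint is the integrity rule: a balance may never go negative.
conn.execute(
    "CREATE TABLE account (name TEXT PRIMARY KEY, "
    "balance INTEGER CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO account VALUES (?, ?)", [("alice", 100), ("bob", 50)])
conn.commit()


def transfer(conn, src, dst, amount):
    """Move amount from src to dst as one atomic unit of work."""
    try:
        with conn:  # commits if the block succeeds, rolls back if it raises
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        # The debit violated the CHECK constraint, so the credit was undone too.
        return False


ok = transfer(conn, "alice", "bob", 30)       # succeeds
failed = transfer(conn, "alice", "bob", 500)  # rolled back as a whole
balances = dict(conn.execute("SELECT name, balance FROM account"))
print(ok, failed, balances)
```

The second transfer credits bob first and then fails on the debit; because both statements run inside one transaction, the credit does not survive.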
Typical Database Applications
Traditional (Employee, student, product database)
Online Shopping
Search Engines
Data Warehousing (OLAP)
Data Mining
Geographical Information Systems

Data-Level Models
Flat File Structure
A database with a single table is called a flat file structure. A flat-file structure is good only
for extremely simple databases and not practical for most business applications. Many
spreadsheets include some database features like sorting entries and counting or
summarizing entries that meet certain criteria.
Hierarchical Data Model
The hierarchical data model is set up like a "forest" or collection of tree structures. The
hierarchical data model is a special case of the network data model. This data model is
very efficient for certain kinds of applications where the data to be modeled is also like a
tree. The best-known hierarchical database management system is IBM's IMS.
Network Data Model
The network data model is similar to the entity-relationship model with all relationships
restricted to be binary, many-one relationships. This restriction allows a simple directed
graph model to be used. The network data model is fast, but it is difficult to conceptualize
complex data structures using this model. An example of a network database management
system is IDMS.
Relational Data Model
The relational model is based on predicate logic and set theory. You have sets of statements
of fact, and the underlying system can derive new sets of facts from them. The real power comes
from your complete control over how new facts are derived. All relationships between facts are
explicit in the database, and the command language can use and manipulate them. The
mathematics behind the model makes this manipulation feasible.
RDBMS
Vendor          Global Revenue
Oracle          7,312
IBM             3,483
Microsoft       3,052
Sybase          524
NCR Teradata    457
Relation: Two dimensional table
The relation itself corresponds to our familiar notion of a table:
A relation is a collection of tuples, each of which contains
values for a fixed number of attributes. Relations are
sometimes referred to as flat files, because of their
resemblance to an unstructured sequence of records. Each
tuple in a relation must be unique -- that is, there can be no
duplicates.

Attribute: Table column
Other commonly used terms for attribute are 'property' and
'field.' The set of permissible values for each attribute is called
the domain for that attribute.

Tuple: Table row
A tuple is an instance of an entity or relationship or whatever
is represented by the relation.

Key: A single attribute or combination of attributes whose
values uniquely identify the tuples of the relation. That is,
each row has a different value for the key attribute(s). The
relational model requires that every relation have a key and
that for any tuple in the relation, the key fields have non-null
values -- no two tuples may have the same key value and
every tuple must have a value for the key attribute.
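The key rule above (unique, non-null values for the key attributes) can be checked mechanically. The following is a small sketch in plain Python; the `employee` rows and the `is_key` helper are hypothetical illustrations, not part of any DBMS.

```python
def is_key(rows, key_attrs):
    """Return True if key_attrs uniquely identify every row and are never null."""
    seen = set()
    for row in rows:
        value = tuple(row[attr] for attr in key_attrs)
        if any(part is None for part in value):
            return False  # a key attribute may not be null
        if value in seen:
            return False  # two tuples share the same key value
        seen.add(value)
    return True


# A relation represented as a list of tuples (dicts mapping attribute -> value).
employee = [
    {"emp_id": 1, "name": "Smith", "dept": 10},
    {"emp_id": 2, "name": "Jones", "dept": 10},
    {"emp_id": 3, "name": "Smith", "dept": 20},
]

print(is_key(employee, ["emp_id"]))        # single-attribute key
print(is_key(employee, ["name"]))          # not a key: "Smith" appears twice
print(is_key(employee, ["name", "dept"]))  # a combination of attributes
```

Note that such a check only shows a set of attributes is a key for the current rows; declaring a key in the schema is what makes the DBMS enforce it for all future rows.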
Case Study : Oracle
Oracle Database Fundamentals
Oracle stores each data item in its own field. The fields relating to a particular person, thing, or event are
bundled together to form a single, complete unit of data, called a record. Each record is made up of a number of
fields, and no two fields in a record can have the same field name. Oracle stores related records in a
table. A table consists of a number of records; each field occupies one column and each record occupies one row.
Different tables are created for the various groups of information. A table normally has a field, or a
combination of fields, that uniquely identifies each record in the table; this is the primary key. When a field in
one table matches the primary key of another table, the field is referred to as a foreign key. The table that the
foreign key refers to (the primary key's table) is sometimes called a lookup table.

Creating Database Tables
- create table tablename (columnname type, columnname type ...);
- describe department;
- alter table employee add ("Joining Date" date);
- alter table employee modify (Phone number);
- alter table tablename drop column columnname;

Inserting Data
- insert into tablename (columnname, columnname, ...) values (somevalue, somevalue, ...);

Selecting Data
- select columnname, columnname... from tablename;
- select "First Name"||' '||"Last Name" "Full Name" from employee where deptid=1 and salary>5000;

Updating and Deleting Data
- update tablename set columnname=somevalue where conditions;
- delete from tablename where conditions;
- drop table tablename;
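The statement templates above can be tried end to end with Python's sqlite3 module. This is only a sketch: SQLite's dialect differs from Oracle's (for example, it has no `describe` command), and the `employee` columns and sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# create table: column names containing spaces must be double-quoted.
conn.execute(
    'CREATE TABLE employee ("First Name" TEXT, "Last Name" TEXT, '
    "deptid INTEGER, salary INTEGER)"
)

# insert into ... values ...
conn.executemany(
    "INSERT INTO employee VALUES (?, ?, ?, ?)",
    [("Asha", "Rao", 1, 6000), ("Ben", "Lee", 1, 4000), ("Cara", "Diaz", 2, 7000)],
)

# select with concatenation, a column alias and a where clause.
FULL_NAME_SQL = """
    SELECT "First Name" || ' ' || "Last Name" AS "Full Name"
    FROM employee
    WHERE deptid = 1 AND salary > 5000
    ORDER BY "Last Name"
"""
before_update = [row[0] for row in conn.execute(FULL_NAME_SQL)]

# update ... set ... where ...
conn.execute('UPDATE employee SET salary = 5500 WHERE "First Name" = ?', ("Ben",))
after_update = [row[0] for row in conn.execute(FULL_NAME_SQL)]

# delete from ... where ...
conn.execute("DELETE FROM employee WHERE deptid = 2")
remaining = conn.execute("SELECT COUNT(*) FROM employee").fetchone()[0]

print(before_update, after_update, remaining)
```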
Case Study : Microsoft
RDBMS Concepts
Using Oracle PL/SQL
Basic Structure of PL/SQL
Variables and Types
Simple PL/SQL Programs
Control Flow in PL/SQL
Basic Structure
PL/SQL stands for Procedural Language/SQL. It extends SQL by adding constructs found
in procedural languages, resulting in a structured language that is more powerful than SQL.
The basic unit in PL/SQL is a block. All PL/SQL programs are made up of blocks, which
can be nested within each other, and each block performs a logical action in the program. A
block has the following structure:

DECLARE
/* Declarative section: variables, types, and local subprograms. */
BEGIN
/* Executable section: procedural and SQL statements go here. */
/* This is the only section of the block that is required. */
EXCEPTION
/* Exception handling section: error handling statements go here. */
END;

Only the executable section is required. The other sections are optional.
The only SQL statements allowed in a PL/SQL program are SELECT, INSERT, UPDATE,
DELETE and several other data manipulation statements plus some transaction control.
Variables
The DECLARE section defines and (optionally) initialises
variables. If not initialised specifically they default to NULL.

DECLARE
number1 NUMBER(2);
number2 NUMBER(2) := 17;
text VARCHAR2(12) := 'Hello world';
today DATE := SYSDATE; -- current date and time
BEGIN
SELECT street_number
INTO number1
FROM address
WHERE name = 'Smith';
END;

Symbol := is the assignment operator to store a value in a variable.
The major datatypes in PL/SQL include NUMBER, INTEGER,
CHAR, VARCHAR2, DATE, TIMESTAMP, BOOLEAN etc.
Simple Program in PL/SQL
CREATE TABLE T1(
e INTEGER,
f INTEGER
);

DELETE FROM T1;

INSERT INTO T1 VALUES(1, 3);

INSERT INTO T1 VALUES(2, 4);

/* Above is plain SQL; below is the PL/SQL program. */

DECLARE
a NUMBER;
b NUMBER;
BEGIN
SELECT e,f INTO a,b FROM T1 WHERE e>1;
INSERT INTO T1 VALUES(b,a);
END;
.
run;
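In PL/SQL, SELECT ... INTO must return exactly one row; otherwise the block raises NO_DATA_FOUND or TOO_MANY_ROWS. The sketch below mimics that rule in Python with sqlite3 so the program above can be traced step by step. The `select_into` helper is a hypothetical stand-in, not an Oracle API.

```python
import sqlite3


def select_into(conn, sql, params=()):
    """Return the single row of a query, like PL/SQL's SELECT ... INTO."""
    rows = conn.execute(sql, params).fetchall()
    if not rows:
        raise LookupError("no data found")
    if len(rows) > 1:
        raise LookupError("query returned more than one row")
    return rows[0]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE T1 (e INTEGER, f INTEGER)")
conn.executemany("INSERT INTO T1 VALUES (?, ?)", [(1, 3), (2, 4)])

# Same logic as the PL/SQL block: read the one row with e > 1 ...
a, b = select_into(conn, "SELECT e, f FROM T1 WHERE e > 1")
# ... and insert it back with the two values swapped.
conn.execute("INSERT INTO T1 VALUES (?, ?)", (b, a))

rows = conn.execute("SELECT e, f FROM T1 ORDER BY e").fetchall()
print(rows)
```

Running the same block a second time would fail, because two rows now satisfy e > 1.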
Control Flow in PL/SQL
An IF statement looks like:

IF <condition> THEN <statement_list> ELSE <statement_list> END IF;

The ELSE part is optional. If you want a multiway branch, use:
IF <condition_1> THEN ...
ELSIF <condition_2> THEN ...
... ...
ELSIF <condition_n> THEN ...
ELSE ...
END IF;

Loops are created with the following:

LOOP

<loop_body> /* A list of statements. */

END LOOP;

At least one of the statements in <loop_body> should be an EXIT
statement of the form
EXIT WHEN <condition>;
The loop breaks if <condition> is true.
Examples
LOOPING :

DECLARE
i NUMBER := 1;
BEGIN
LOOP
INSERT INTO T1 VALUES(i,i);
i := i+1;
EXIT WHEN i>100;
END LOOP;
END;
.
run;
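For readers more familiar with general-purpose languages, the LOOP ... EXIT WHEN pattern corresponds to an unconditional loop with a break placed after the body. A rough Python equivalent follows; the `loop_with_exit` function and its list of "inserted" rows are hypothetical stand-ins for the INSERT statement.

```python
def loop_with_exit(limit):
    """Mirror of: LOOP INSERT ...; i := i+1; EXIT WHEN i > limit; END LOOP;"""
    i = 1
    inserted = []
    while True:                   # LOOP
        inserted.append((i, i))   # INSERT INTO T1 VALUES(i, i);
        i += 1                    # i := i + 1;
        if i > limit:             # EXIT WHEN i > limit;
            break
    return inserted               # END LOOP;


rows = loop_with_exit(100)
print(len(rows), rows[0], rows[-1])
```

Because the exit test comes after the body, the body always runs at least once, even when the limit is 0.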
CONDITIONAL :

DECLARE
a NUMBER;
b NUMBER;
BEGIN
SELECT e,f INTO a,b FROM T1
WHERE e>1;
IF b=1 THEN
INSERT INTO T1 VALUES(b,a);
ELSE
INSERT INTO T1
VALUES(b+10,a+10);
END IF;
END;
.
run;
Joins
CROSS JOIN (Cartesian product) is the simplest join;
INNER JOIN (most often written as an "EQUI-JOIN")
where tables are combined based on matching values
in a common column;
OUTER JOIN which involves combining all rows of
one table with only matching rows from the other
table;
SELF JOIN which is a table joined to itself.
Cross Join
A cross join returns the Cartesian product of the sets of records
from the two joined tables. If A and B are two sets, then cross
join = A × B.

Examples :

Explicit
SELECT *
FROM employee CROSS JOIN department

Implicit
SELECT *
FROM employee, department;
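As a quick check of the "cross join = A × B" statement, the sketch below compares SQL's CROSS JOIN (run through Python's sqlite3 module) with `itertools.product`. The sample rows are hypothetical.

```python
import sqlite3
from itertools import product

employee = [("Rafferty", 31), ("Jones", 33), ("Smith", None)]
department = [(31, "Sales"), (33, "Engineering")]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (LastName TEXT, DepartmentID INTEGER)")
conn.execute("CREATE TABLE department (DepartmentID INTEGER, DepartmentName TEXT)")
conn.executemany("INSERT INTO employee VALUES (?, ?)", employee)
conn.executemany("INSERT INTO department VALUES (?, ?)", department)

# Every employee row paired with every department row.
sql_rows = conn.execute("SELECT * FROM employee CROSS JOIN department").fetchall()

# The same pairing computed as a Cartesian product of the two row sets.
set_rows = [e + d for e, d in product(employee, department)]

print(len(sql_rows), set(sql_rows) == set(set_rows))
```

With 3 employees and 2 departments the result has 3 × 2 = 6 rows, regardless of whether the DepartmentID values match.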


Inner Joins
An equi-join (also written equijoin) is a specific
type of comparator-based join, or theta join, that
uses only equality comparisons in the join predicate.
Using other comparison operators (such as <)
disqualifies a join as an equi-join.
Example
SELECT *
FROM employee
INNER JOIN department
ON employee.DepartmentID =
department.DepartmentID
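A small runnable sketch of the inner join above, using sqlite3 from Python with hypothetical sample rows. Rows without a partner on the other side (an employee with no department, a department with no employees) do not appear in the result.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (LastName TEXT, DepartmentID INTEGER)")
conn.execute("CREATE TABLE department (DepartmentID INTEGER, DepartmentName TEXT)")
conn.executemany(
    "INSERT INTO employee VALUES (?, ?)",
    [("Rafferty", 31), ("Jones", 33), ("Smith", None)],
)
conn.executemany(
    "INSERT INTO department VALUES (?, ?)",
    [(31, "Sales"), (33, "Engineering"), (35, "Marketing")],
)

inner_rows = conn.execute(
    """
    SELECT employee.LastName, department.DepartmentName
    FROM employee
    INNER JOIN department
        ON employee.DepartmentID = department.DepartmentID
    ORDER BY employee.LastName
    """
).fetchall()
print(inner_rows)
```

Smith (NULL DepartmentID) and Marketing (no employees) are both dropped, since NULL never compares equal to any value.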
Natural join


Outer Joins
Left outer join
The result of a left outer join for tables A and B always contains
all records of the "left" table (A), even if the join-condition does
not find any matching record in the "right" table (B). This means
that a left outer join returns all the values from the left table, plus
matched values from the right table (or NULL in case of no
matching join predicate).

Example

SELECT *
FROM employee LEFT OUTER JOIN department
ON employee.DepartmentID = department.DepartmentID

Right outer join
Every record from the "right" table (B) will appear in the joined
table at least once. If no matching row from the "left" table (A)
exists, NULL will appear in columns from A for those records that
have no match in A. A right outer join returns all the values from
the right table and matched values from the left table (NULL in
case of no matching join predicate).

Example

SELECT *
FROM employee RIGHT OUTER JOIN department
ON employee.DepartmentID = department.DepartmentID
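The difference between the two outer joins is easiest to see on data. The sketch below uses sqlite3 from Python with hypothetical rows. Since older SQLite versions do not support RIGHT OUTER JOIN, the right join is emulated here by swapping the table order in a LEFT OUTER JOIN, which gives the same rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (LastName TEXT, DepartmentID INTEGER)")
conn.execute("CREATE TABLE department (DepartmentID INTEGER, DepartmentName TEXT)")
conn.executemany(
    "INSERT INTO employee VALUES (?, ?)",
    [("Rafferty", 31), ("Jones", 33), ("Smith", None)],
)
conn.executemany(
    "INSERT INTO department VALUES (?, ?)",
    [(31, "Sales"), (33, "Engineering"), (35, "Marketing")],
)

# Left outer join: every employee, with NULL (None) where no department matches.
left_rows = conn.execute(
    """
    SELECT employee.LastName, department.DepartmentName
    FROM employee LEFT OUTER JOIN department
        ON employee.DepartmentID = department.DepartmentID
    ORDER BY employee.LastName
    """
).fetchall()

# Right outer join of employee with department, written as a left join
# with the tables swapped: every department, with NULL where no employee matches.
right_rows = conn.execute(
    """
    SELECT employee.LastName, department.DepartmentName
    FROM department LEFT OUTER JOIN employee
        ON employee.DepartmentID = department.DepartmentID
    ORDER BY department.DepartmentName
    """
).fetchall()

print(left_rows)
print(right_rows)
```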

Self Join
A self-join is simply a normal SQL join that joins one table to itself. This is
accomplished by using table name aliases to give each "instance" of the table
a separate name.


Example
SELECT E1.EmployeeName AS Employee,
E2.EmployeeName AS Manager
FROM Employees AS E1 INNER JOIN Employees AS E2
ON E1.ManagerID = E2.EmployeeID

Employees
EmployeeID   EmployeeName         ManagerID
61           Sue Smith            (null)
62           David Jones          61
63           Troy Parker          61
64           Claire Smith-Jones   63
65           Grover Rivers        63
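The self-join query can be run against the Employees table shown above. This sketch loads the same five rows into an in-memory SQLite database from Python; only the ORDER BY clause is added so the output order is predictable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Employees (EmployeeID INTEGER PRIMARY KEY, "
    "EmployeeName TEXT, ManagerID INTEGER)"
)
conn.executemany(
    "INSERT INTO Employees VALUES (?, ?, ?)",
    [
        (61, "Sue Smith", None),
        (62, "David Jones", 61),
        (63, "Troy Parker", 61),
        (64, "Claire Smith-Jones", 63),
        (65, "Grover Rivers", 63),
    ],
)

# E1 plays the role of "employee", E2 the role of "manager".
pairs = conn.execute(
    """
    SELECT E1.EmployeeName AS Employee, E2.EmployeeName AS Manager
    FROM Employees AS E1 INNER JOIN Employees AS E2
        ON E1.ManagerID = E2.EmployeeID
    ORDER BY E1.EmployeeID
    """
).fetchall()

for employee, manager in pairs:
    print(employee, "reports to", manager)
```

Sue Smith does not appear as an employee in the result because her ManagerID is null and an inner join drops unmatched rows; a left outer join would keep her with a null manager.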
Normalization
Introduction

Entity: The word entity is the general name for the information that is to be stored within a
single table. Information about the entities is known as attributes.

Primary key: A primary key uniquely identifies a row of data found within a table. When
multiple attributes are used to derive a primary key, this key is known as a concatenated
primary key.

Relationship:
one-to-one (1:1) - A one-to-one relationship signifies that each instance of a given entity
relates to exactly one instance of another entity.
one-to-many (1:M) - A one-to-many relationship signifies that each instance of a given
entity relates to one or more instances of another entity.
many-to-many (M:N) - A many-to-many relationship signifies that many instances of a
given entity relate to many instances of another entity.

Foreign key: A foreign key forms the basis of a 1:M relationship between two tables. The
foreign key can be found within the M table, and maps to the primary key found in the 1
table.
The Three Normal Forms
First Normal Form
A table is in first normal form (1NF) if there are no repeating
groups.

How to Normalize ?
- Remove the repeating group of attributes to form a new entity
- Add to it the original key
Second Normal Form
A table is in Second Normal Form(2NF) if it is in 1NF and each
non-key field is functionally dependent on the entire primary key.

How to Normalize ?
- Examine tables with a composite key (a key made up of two
parts)
- For each non-key attribute, determine whether it depends on the first part of the key,
on the second part, or (if on neither part alone) on both parts together
- Remove the partial key and its dependents to form a new table
Third Normal Form
A table is in Third Normal Form(3NF) if it is in 2NF and there are
no transitive dependencies.

How to Normalize ?
- Identify any dependencies between non-key attributes within
each table
- Remove them to form a new table
- Promote one of the attributes to be the key of the new table
- This becomes the Foreign Key link in the original table (shown
with a *).
Example
Department: ( DepartmentName, SupervisorNumber ) SupervisorNumber is a
foreign key
Supervisor: ( SupervisorNumber, SupervisorName )
EmployeeDepartment: ( DepartmentName, EmployeeNumber, StartDate )
Employee: ( EmployeeNumber, EmployeeName )
EmployeeProject: ( EmployeeNumber, ProjectNumber, StartDate )
Project: ( ProjectNumber, ProjectName )

To check whether these tables are in normal form, answer the following questions:
1. Does the table contain any repeating groups?
If not, and the table has a primary key, then it is in first normal form (1NF)
2. Does the table contain any partial dependencies?
If not, and it is in 1NF, then it is in 2NF
3. Does the table contain any transitive dependencies or derived attributes?
If not, and it is in 2NF, then it is in 3NF
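The third question can be illustrated with a small sketch in plain Python. It starts from a hypothetical unnormalized table that stores SupervisorName next to SupervisorNumber in every department row, detects the dependency between the two non-key attributes, and splits the table into the Department and Supervisor tables listed above. The sample rows and both helper functions are assumptions made for illustration.

```python
def fd_holds(rows, lhs, rhs):
    """True if the functional dependency lhs -> rhs holds in the given rows."""
    seen = {}
    for row in rows:
        left = tuple(row[a] for a in lhs)
        right = tuple(row[a] for a in rhs)
        if seen.setdefault(left, right) != right:
            return False  # same left-hand value maps to two different right-hand values
    return True


def project(rows, attrs):
    """Keep only attrs and drop duplicate rows (relational projection)."""
    result = []
    for row in rows:
        projected = {a: row[a] for a in attrs}
        if projected not in result:
            result.append(projected)
    return result


# Key: DepartmentName. SupervisorName depends on SupervisorNumber, a non-key
# attribute, so it depends on the key only transitively.
wide = [
    {"DepartmentName": "Sales", "SupervisorNumber": 7, "SupervisorName": "Lee"},
    {"DepartmentName": "Support", "SupervisorNumber": 7, "SupervisorName": "Lee"},
    {"DepartmentName": "R&D", "SupervisorNumber": 9, "SupervisorName": "Kaur"},
]

transitive = fd_holds(wide, ["SupervisorNumber"], ["SupervisorName"])

# Remove the dependent attribute into a new table keyed by SupervisorNumber;
# SupervisorNumber stays behind in Department as the foreign key.
department = project(wide, ["DepartmentName", "SupervisorNumber"])
supervisor = project(wide, ["SupervisorNumber", "SupervisorName"])

print(transitive, len(department), len(supervisor))
```

After the split, the name "Lee" is stored once instead of once per department, which is the redundancy 3NF is meant to remove.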
Indexing
Types of Single-level Ordered Indexes
Primary Indexes
Clustering Indexes
Secondary Indexes
Multilevel Indexes
Types of Single-Level Indexes
Primary Index

Defined on an ordered data file

The data file is ordered on a key field

Includes one index entry for each block in the data file; the
index entry has the key field value for the first record in the
block, which is called the block anchor

A similar scheme can use the last record in a block.

A primary index is a nondense (sparse) index, since it
includes one entry per disk block of the data file (the key of
the block's anchor record) rather than one entry per search value.

[Figure: Primary index on the ordering key field]
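A minimal sketch of a lookup through a sparse primary index, in plain Python. Disk blocks are modelled as small sorted lists and the index holds one (anchor key, block number) entry per block. The block contents and the `lookup` helper are hypothetical.

```python
from bisect import bisect_right

# Data file ordered on the key field, three records per block.
blocks = [
    [2, 5, 8],
    [12, 15, 19],
    [23, 27, 31],
]

# One index entry per block: (key of the first record = block anchor, block number).
index = [(block[0], block_no) for block_no, block in enumerate(blocks)]
anchors = [anchor for anchor, _ in index]


def lookup(key):
    """Return the block number holding key, or None if the key is absent."""
    # Binary search for the last anchor that is <= key.
    pos = bisect_right(anchors, key) - 1
    if pos < 0:
        return None  # key is smaller than every anchor
    block_no = index[pos][1]
    # Only this one block has to be read and searched.
    return block_no if key in blocks[block_no] else None


print(lookup(15), lookup(23), lookup(20), lookup(1))
```

The index has 3 entries for 9 records; this is what makes it sparse, and it works only because the data file itself is sorted on the key.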
Clustering Index

Defined on an ordered data file

The data file is ordered on a non-key field unlike primary
index, which requires that the ordering field of the data file
have a distinct value for each record.

Includes one index entry for each distinct value of the field;
the index entry points to the first data block that contains
records with that field value.

It is another example of a nondense index; insertion and
deletion are relatively straightforward with a clustering index.
[Figure: A clustering index on the DEPTNUMBER ordering nonkey field of an EMPLOYEE file.]
[Figure: Clustering index with a separate block cluster for each group of records that share the same value for the clustering field.]
Secondary Index
A secondary index provides a secondary means of accessing a
file for which some primary access already exists.
The secondary index may be on a field which is a candidate key
and has a unique value in every record, or a nonkey with
duplicate values.
The index is an ordered file with two fields.
The first field is of the same data type as some
nonordering field of the data file that is an indexing field.
The second field is either a block pointer or a record
pointer. There can be many secondary indexes (and
hence, indexing fields) for the same file.
Includes one entry for each record in the data file; hence, it is a
dense index

[Figure: A dense secondary index (with block pointers) on a nonordering key field of a file.]
[Figure: A secondary index (with record pointers) on a nonkey field,
implemented using one level of indirection so that index entries are
of fixed length and have unique field values.]
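A sketch of a dense secondary index on a key field, in plain Python. The data file is ordered by id, so a search by name cannot use that ordering; the index keeps one (name, record position) entry per record, sorted by name. The records and the `find_by_name` helper are hypothetical.

```python
from bisect import bisect_left

# Data file: ordered on id (the primary access path), not on name.
records = [(3, "Rao"), (7, "Lee"), (9, "Diaz"), (12, "Kaur")]

# Secondary index: an ordered file with two fields,
# the indexing field value and a pointer (here, the record position).
secondary_index = sorted((name, pos) for pos, (_, name) in enumerate(records))
index_keys = [name for name, _ in secondary_index]


def find_by_name(name):
    """Binary-search the index, then follow the pointer into the data file."""
    i = bisect_left(index_keys, name)
    if i < len(index_keys) and index_keys[i] == name:
        return records[secondary_index[i][1]]
    return None


print(find_by_name("Lee"), find_by_name("Zed"))
```

Unlike the sparse primary index, this index needs an entry for every record, because neighbouring records in the data file are not neighbours in name order.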
Multi-Level Indexes
Because a single-level index is an ordered file, we can
create a primary index to the index itself; in this case, the
original index file is called the first-level index and the
index to the index is called the second-level index.
We can repeat the process, creating a third, fourth, ..., top
level until all entries of the top level fit in one disk block.
A multi-level index can be created for any type of first-level
index (primary, secondary, clustering) as long as the first-level
index consists of more than one disk block.
[Figure: A two-level primary index resembling ISAM (Indexed Sequential Access Method) organization.]
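The "repeat until the top level fits in one disk block" rule can be turned into a short calculation. The sketch below assumes every index block holds the same number of entries (the fan-out); the function name and the figures in the example are illustrative, not taken from the slides.

```python
def index_levels(first_level_entries, fan_out):
    """Number of index levels needed so that the top level fits in one block.

    first_level_entries: number of entries in the first-level index
    fan_out: number of index entries that fit in one disk block
    """
    levels = 1
    # Blocks occupied by the first-level index (ceiling division).
    blocks = -(-first_level_entries // fan_out)
    while blocks > 1:
        # The next level has one entry per block of the level below it.
        blocks = -(-blocks // fan_out)
        levels += 1
    return levels


# Illustrative figures: 30,000 first-level entries, 68 entries per block.
# Level 1 needs 442 blocks, level 2 needs 7 blocks, level 3 needs 1 block.
print(index_levels(30000, 68))
```

Searching such an index reads one block per level, so the number of levels is also the number of index block accesses per lookup.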
Data Warehousing &
Business Intelligence
Introduction
A Data Warehouse (DW) is a
subject-oriented,
integrated,
nonvolatile,
time-variant
collection of data in support of management's decisions.

Business intelligence (BI)
-The knowledge derived from analyzing an organization's information
-Technologies for gathering, storing, analyzing and providing access to data
to help enterprise users make better business decisions

BI systems provide managers with
-Actionable information and knowledge
-At the right time
-At the right location
-In the right form


[Figure: Operational data (Quotes, Orders, Inventory, Leads) compared with data warehouse subject areas (Customers, Products, Regions, Time).]
Focus is on Subject Areas rather than Applications
Characteristics of a DW
On-Line Transaction Processing (OLTP)
Database management systems are typically used for on-line
transaction processing.
OLTP applications normally automate clerical data processing
tasks of an organization, like data entry and enquiry, transaction
handling, etc. (access, read, update)
Database is current, and consistency and recoverability are
critical. Records are accessed one at a time.
OLTP Operations:
- are structured and repetitive
- require detailed and up-to-date data
- are short, atomic and isolated transactions
On-Line Analytical Processing (OLAP)
On-line analytical processing is essential for decision support.
OLAP is supported by data warehouses.
A data warehouse is a consolidation of operational databases.
Owing to the hierarchical nature of the dimensions, OLAP
operations view the data flexibly from different perspectives
(different levels of abstractions).
OLAP operations:
- Roll-up: Increase the level of abstraction
- Drill-down: Decrease the level of abstraction
- Slice and dice: Selection and projection
- Pivot: Re-orient the multi-dimensional view
- Drill-through: Links to the raw data
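A toy illustration of roll-up, drill-down and slice on a tiny fact table, in plain Python. The sales figures, the dimension names (city, country, quarter) and the `aggregate` helper are hypothetical; a real OLAP engine would do the same grouping over far larger data.

```python
from collections import defaultdict

# Fact rows: dimensions (city, country, quarter) and one measure (amount).
sales = [
    {"city": "Pune", "country": "India", "quarter": "Q1", "amount": 100},
    {"city": "Delhi", "country": "India", "quarter": "Q1", "amount": 150},
    {"city": "Pune", "country": "India", "quarter": "Q2", "amount": 120},
    {"city": "Lyon", "country": "France", "quarter": "Q1", "amount": 80},
]


def aggregate(facts, dims):
    """Sum the measure for every combination of the given dimensions."""
    totals = defaultdict(int)
    for fact in facts:
        totals[tuple(fact[d] for d in dims)] += fact["amount"]
    return dict(totals)


# Detailed view: city level (the drill-down end of the hierarchy).
by_city = aggregate(sales, ["city", "quarter"])

# Roll-up: move from city to country, a higher level of abstraction.
by_country = aggregate(sales, ["country", "quarter"])

# Slice: fix one dimension value (quarter = Q1), then aggregate the rest.
q1_only = [fact for fact in sales if fact["quarter"] == "Q1"]
q1_by_country = aggregate(q1_only, ["country"])

print(by_country)
print(q1_by_country)
```

Drill-down is the reverse of the roll-up shown here: going back from `by_country` to `by_city`.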
DW - Benefits
Increase customer profitability
Cost effective decision making
Manage customer and business partner relationships
Manage risk, assets and liabilities
Integrate inventory, operations and manufacturing
Reduction in time to locate, access, and analyze information
(Link multiple locations and geographies)
Identify developing trends and reduce time to market
Strategic advantage over competitors
Warehouse Architecture
[Figure: Warehouse architecture. Labels: Operational Systems/Data; Data Preparation (Select, Extract, Transform, Integrate, Maintain); Middleware/API; Data Warehouse; Metadata; EIS/DSS; Query Tools; OLAP/ROLAP; Web Browsers.]
Enterprise Data Warehouse
[Figure: Enterprise data warehouse. Labels: Source Databases; Data Cleansing Tools; ETL Tool; Data Modeling Tool; Central Metadata; Central Warehouse (RDBMS); Warehouse Admin Tool; Local Metadata; Architected Datamarts (RDBMS, ROLAP Engine, MDDB); Warehouse Databases; Data Access and Analysis Tools (Managed Query, Desktop OLAP, ROLAP, MOLAP, Data Mining).]
DW Architecture Components
Data Warehouse Is Not Just About Data... But Tools Too
DW/BI Tools
ETL Tools
Extract, Transform, and Load (ETL) is a process in data
warehousing that involves -
1. Extracting data from outside sources,
2. Transforming it to fit business needs (which can include
quality levels), and ultimately
3. Loading it into the end target, i.e. the data warehouse.

Data warehouses are typically fed asynchronously by a variety of sources
that each serve a different purpose, which results in, for example, different
reference data. ETL is the key process for bringing heterogeneous and
asynchronous source extracts into a homogeneous environment.
The well-known ETL tools are typically intended for use in batch mode:
they pull large volumes of data from different platforms and systems at
scheduled times, then transform and integrate the data until it
fits the format to be loaded into a (corporate) multi-dimensional
data warehouse.
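The three ETL steps can be sketched in a few lines of Python. Everything here is hypothetical: two tiny "source systems" with different layouts (a CSV export in dollars and a list of records in cents), a transform step that maps both to one format, and an in-memory SQLite table standing in for the warehouse.

```python
import csv
import io
import sqlite3

# Two heterogeneous sources with different field names, units and formatting.
crm_csv = "customer,amount_usd\nAcme,100.50\nGlobex,20\n"
erp_rows = [{"cust_name": "ACME ", "amount_cents": 2500}]


def extract():
    """Step 1: pull raw rows from each source, tagged with their origin."""
    for row in csv.DictReader(io.StringIO(crm_csv)):
        yield "crm", row
    for row in erp_rows:
        yield "erp", row


def transform(source, row):
    """Step 2: map every source layout to (CUSTOMER, amount in cents)."""
    if source == "crm":
        return row["customer"].strip().upper(), round(float(row["amount_usd"]) * 100)
    return row["cust_name"].strip().upper(), int(row["amount_cents"])


def load(conn, rows):
    """Step 3: write the homogeneous rows into the target table."""
    conn.executemany("INSERT INTO sales_fact VALUES (?, ?)", rows)
    conn.commit()


warehouse = sqlite3.connect(":memory:")
warehouse.execute("CREATE TABLE sales_fact (customer TEXT, amount_cents INTEGER)")
load(warehouse, [transform(source, row) for source, row in extract()])

totals = warehouse.execute(
    "SELECT customer, SUM(amount_cents) FROM sales_fact "
    "GROUP BY customer ORDER BY customer"
).fetchall()
print(totals)
```

Only after the transform step do "Acme" and "ACME " count as the same customer, which is the kind of reference-data mismatch the text describes.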

Data mart - a subset of the data warehouse in which only a focused
portion of the data warehouse information is kept
Other technical components of business intelligence include tools
such as
Data mining
Automatic exception detection with proactive alerting and
automatic recipient determination
Automatic learning
Data-mining tool - a software tool you use to query information in a
data warehouse

BO-Designer
It lets you create the semantic layer, i.e. the UNIVERSE, which isolates end
users from the technical issues of the database structure.
It is used to create, manage and distribute universes for a particular group of BO and
WebIntelligence users.

The Building Blocks of a Designer are -
Classes: Logical grouping of objects

Objects: Most refined component of the Universe. An Object maps to Data
or a derivation of data in the database
Dimension: Parameters for the analysis (Ex. City)
Detail: Description of a dimension (Ex. Phone #)
Measure: Numeric information by which Dimension object can be
measured (Ex. Sales Revenue)

ODS Development Case Study
[Figure: End-to-End Process Diagram. Source Systems (Oracle Apps, Siebel, Excel Files, Flat Files); ETL Process (ODS, Teradata Intermediate Tables); Target DW (EDW).]
Thanks