
What is DBMS?

Database Management System is a set of computer programs that controls the creation,
maintenance, and the use of a database.

What is a Schema?
A description of data in terms of a data model is called a schema. In the relational model,
the schema for a relation specifies its name, the name of each field (or attribute or
column), and the type of each field. For example, student information in a university
database may be stored in a relation with the following schema:

Student (Sid: string, name: string, login: string, age: integer, gpa: real)

What is DDL?
A data definition language (DDL) is used to define the external and conceptual schemas.
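
As a minimal sketch of DDL in practice, the Student schema above can be declared with a
CREATE TABLE statement. The example uses Python's built-in sqlite3 module with an in-memory
database. Mapping string to TEXT and real to REAL, and the sample row, are assumptions made
only for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database

# DDL: declare the relation name, each field name, and each field type.
conn.execute("""
    CREATE TABLE Student (
        sid   TEXT,
        name  TEXT,
        login TEXT,
        age   INTEGER,
        gpa   REAL
    )
""")

# Hypothetical sample row, only to show that matching data is accepted.
conn.execute("INSERT INTO Student VALUES ('s1', 'Ann', 'ann@cs', 19, 3.4)")

# Read the declared schema back from SQLite's catalog: (column name, column type).
columns = [(row[1], row[2]) for row in conn.execute("PRAGMA table_info(Student)")]
print(columns)
```

The catalog query shows that the DBMS itself stores the schema, separately from the data rows.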

What is a Database?
A database is a collection of data.
Data in the database:

• Is integrated
• Can be shared
• Can be concurrently accessed

The database systems are designed to:

• Define structures for the storage of data


• Provide mechanisms for the manipulation of data
• Ensure the safety of the data stored, despite system crashes or attempts at
unauthorized access
• Share data among the different users

In short, database systems are designed to manage large volumes of data.

The first general-purpose DBMS, designed by Charles Bachman at General
Electric in the early 1960s, was called the Integrated Data Store. In the late 1960s, IBM
developed the Information Management System (IMS) DBMS.

File System Interface versus DBMS Interface


In the traditional file approach, data is stored in flat files which are maintained by
the file system, under the operating system's control. The end users use the application
programs to perform specific tasks. All application programs go through the file system
to access the data stored in these flat files.

In the DBMS approach, all requests to use the data stored in the database are
handled by the DBMS. The end user can use either the application programs or the
standard SQL to access the data.

Flat Files: A flat file is a file containing records that have no structured interrelationship.
Files used in programming fundamentals projects were essentially flat files.

SQL: (Structured Query Language). A language used by relational databases to query,
update and manage data.

The data in the database can be shared. Sharing means individual pieces of data in the
database can be shared among different users.

Points to Remember:
Disadvantages of the traditional file approach:

• Data Security – Data is easily accessible by all and therefore not secure.
• Data Redundancy – The same data is duplicated in two or more files, which may lead
to update anomalies.
• Data Isolation – Related data is scattered across several files, so writing a
new application program is difficult.
• Program / Data Dependence – Application programs are data dependent. It is
impossible to change the physical representation (how the data is physically
represented in storage) or access technique (how it is physically accessed) without
affecting the application.
• Lack of Flexibility – Only pre-determined requests for information can be met. The
approach is not flexible enough to satisfy unanticipated queries.
• Concurrent Access Anomalies – The same piece of data is allowed to be updated
simultaneously, which leads to inconsistencies.

DBMS ensures the following:

• Application programs and queries are data-independent. They do not depend on
any one particular physical representation of data in secondary storage or access
technique.
• Allows for sharing of data among different users. Users are also able to access the
database concurrently without facing the issues of inconsistent data.
• Controls redundancy and inconsistency.
• Provides secure access to the database.
• Enforces integrity constraints (also known as business rules) by preventing the
entry of invalid information into the database.
• Enables backup and recovery from system crashes.

Queries: - A query is essentially a request that a user makes on the database.

Integrity Constraints: A set of rules to ensure the correctness and accuracy of data.
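
The following is a small sketch of how a DBMS refuses invalid data once integrity constraints
are declared. It uses Python's built-in sqlite3 module. The student table, its columns and the
three rules (CHECK, UNIQUE, NOT NULL) are hypothetical and chosen only to illustrate the idea.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical rules: age must be non-negative, login must be present and unique.
conn.execute("""
    CREATE TABLE student (
        sid   INTEGER PRIMARY KEY,
        login TEXT NOT NULL UNIQUE,
        age   INTEGER CHECK (age >= 0)
    )
""")
conn.execute("INSERT INTO student (sid, login, age) VALUES (1, 'ann', 19)")

rejected = []
bad_rows = [
    (2, "bob", -5),  # violates the CHECK constraint
    (3, "ann", 20),  # violates UNIQUE on login
    (4, None, 21),   # violates NOT NULL on login
]
for row in bad_rows:
    try:
        conn.execute("INSERT INTO student (sid, login, age) VALUES (?, ?, ?)", row)
    except sqlite3.IntegrityError as exc:
        rejected.append(row[0])
        print("rejected", row, "->", exc)

# Only the valid row was stored.
count = conn.execute("SELECT COUNT(*) FROM student").fetchone()[0]
print(count)
```

The checks live in the database, so every application that writes to the table is held to the
same rules.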

Types of Databases
There are two generic database architectures: centralized and distributed.

Centralized: All data is located at a single site. Allows for greater control over accessing
and updating data.

Distributed: The database is stored on several computers from personal computers up to
mainframe systems. Computers in a distributed system communicate with one another
through various communication media such as high speed networks or telephone lines.
Distributed databases are geographically separated and managed.

DBMS Architecture
Most commercial databases are based on the three-level architecture model called
the ANSI/SPARC (American National Standards Institute/Standards Planning and
Requirements Committee) model.
Database architecture has three levels. Those are

1. External/View Level
2. Conceptual Level
3. Internal Level

The overall design of the database is called the database schema. Schemas are not changed
frequently. In general, database systems support one internal schema, one conceptual
schema and several external schemas.

External / View Level: Many users of the database system are not concerned with all the
information in the database. Instead, they need to access only a part of the database. The
external level of abstraction simplifies the end user's interaction with the system. The
system may provide many views for the same database.

Conceptual / Logical Level: The conceptual level describes what data are stored in the
database, and what relationships exist among those data. This level is used by the
Database Administrator, who in turn decides what information must be kept in the
database.

Internal / Physical Level: The internal level is the lowest level of abstraction and
describes the data storage and access methods. The Database Administrator may be aware of
certain details of the physical organization of the data.
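
One way to picture the external and conceptual levels is a view defined over a base table.
The sketch below uses Python's built-in sqlite3 module. The employee table, its columns and
the employee_directory view are hypothetical names used only for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Conceptual level: the full logical table (hypothetical columns).
conn.execute(
    "CREATE TABLE employee (emp_id INTEGER PRIMARY KEY, name TEXT, dept TEXT, salary REAL)"
)
conn.executemany(
    "INSERT INTO employee VALUES (?, ?, ?, ?)",
    [(1, "Joe", "Sales", 4000.0), (2, "Jill", "HR", 4500.0)],
)

# External level: a view exposing only the part a directory user needs.
conn.execute("CREATE VIEW employee_directory AS SELECT name, dept FROM employee")

directory = conn.execute("SELECT * FROM employee_directory ORDER BY name").fetchall()
print(directory)
```

Several such views can be defined over the same table, which matches the idea of one
conceptual schema with several external schemas. How the rows are laid out on disk (the
internal level) is left entirely to the DBMS.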

Guidelines to select a primary key:


• Give preference to numeric column(s). The search algorithm performs better
when the primary key is numeric
• Give preference to a single attribute. The search algorithm gives better output with
a single attribute primary key than with a composite attribute primary key
• Give preference to the minimal composite key. A composite key is a collection of
two or more attributes.
• Primary keys are chosen according to business convenience.

DBMS Users
End Users: Work at the external level and generally make updates to the database or
execute queries on the database.
Application Programmer: Writes application programs.
Database Administrator: Defines the conceptual, internal and external schemas, controls
users' access privileges and ensures the consistency of the database.

Different Types of Keys


Candidate/Primary Key: - A Primary key is a set of one or more attributes that can
uniquely identify a row in a given table.

Foreign Key: - A foreign key is a set of attributes the values of which are required to
match the values of a candidate key in the same or another table. The foreign key
attributes can have duplicate or null values.

Self Referencing: - A table might include a foreign key, the values of which are required
to match the value of a candidate key in the same table. This is known as self referencing.

Non-Key Attributes: The attributes other than the primary key attributes in a
table/relation are called non-key attributes.
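
The key types above can be sketched with Python's built-in sqlite3 module. The department and
employee tables are hypothetical. Note that SQLite only enforces foreign keys after
PRAGMA foreign_keys = ON, which is an SQLite-specific detail and not part of the definitions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled

conn.execute("CREATE TABLE department (dept_id INTEGER PRIMARY KEY, name TEXT)")
# manager_id refers back to employee.emp_id: a self-referencing foreign key.
conn.execute("""
    CREATE TABLE employee (
        emp_id     INTEGER PRIMARY KEY,
        name       TEXT,
        dept_id    INTEGER REFERENCES department (dept_id),
        manager_id INTEGER REFERENCES employee (emp_id)
    )
""")

conn.execute("INSERT INTO department VALUES (10, 'Sales')")
conn.execute("INSERT INTO employee VALUES (1, 'Joe', 10, NULL)")  # null foreign key is allowed
conn.execute("INSERT INTO employee VALUES (2, 'Jill', 10, 1)")    # duplicate dept_id is allowed

try:
    conn.execute("INSERT INTO employee VALUES (3, 'Sam', 99, 1)")  # there is no department 99
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

print(orphan_rejected)
```

Here emp_id and dept_id are primary keys, employee.dept_id is a foreign key with duplicate
values, and manager_id shows both a null foreign key and self referencing. The columns name,
dept_id and manager_id of employee are non-key attributes.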

Data Models
A data model is a conceptual tool to describe data, data relationships, data semantics and
consistency constraints. Two of the widely used data models are

1) Object Based Logical Model
   a) E-R Model
2) Record Based Logical Model
   a) Hierarchical Data Model
   b) Network Data Model
   c) Relational Data Model
   d) Structural Terminology

RDBMS
Relational Database Management System is a type of DBMS that stores data in the
form of related tables.

Databases are widely used in real life applications such as:

1) Airlines: for reservations and schedule information.


2) Banking: for customer information, accounts, loans and banking transactions
3) Universities: For student information, course registrations and grades.
4) Telecommunications: For keeping records of calls made, generating monthly bills,
maintaining balances on prepaid calling cards and storing information about the
communication networks.
5) Sales: For customer, product and purchase information in any industry.

Entity Relationship model (E-R Model)


The Entity Relationship Diagram (ERD) was first defined in 1976 by Peter Chen. Since
then Charles Bachman and James Martin have added some small refinements to the basic
ERD principles.

Entity: An entity is anything, real or abstract, about which we want to store
data. Entity types fall into five categories: roles, events, locations, tangible things or
concepts.

Attribute: An attribute is a characteristic property of an entity. An entity could have
multiple attributes.

Example: For an entity car, the attributes would be the color, model number, number of
doors, right or left hand drive etc.

Relationship: A relationship is a natural association that exists between one or more
entities.

Cardinality of a Relationship: The cardinality of a relationship defines the type of
relationship between two participating entities.

Example: One employee can take many books from library. One book can be taken by
only one employee. Cardinality of relationship between employee and book is “one to
many”.

There are four types of cardinality relationship.

i) One to One Relationship
ii) One to Many Relationship
iii) Many to One Relationship
Example: Many employees can work for only one department, but one
department can have many employees.
iv) Many to Many Relationship
Example: One student is enrolled in many courses, and one course has
many students enrolled.

E-R Diagram Notations
Entity: an Entity is an object or concept about which business user wants to store
information

Weak entity: A weak entity is dependent on another entity to exist. For example, a bank
branch depends upon the bank for its existence. Without the bank name it is impossible to
identify the branch uniquely.

Attributes: Attributes are the properties or characteristics of an entity.

Key attribute: A key attribute is the unique (primary key), distinguishing characteristic
of the entity.

Multi valued attribute: A multi valued attribute can have more than one value. For
example, an employee entity can have multiple skill values.

Derived attribute: A derived attribute is based on another attribute. For example, an
employee’s monthly salary is based on the employee’s basic salary and house rent
allowance.
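
A derived attribute is usually computed when it is needed rather than stored. The sketch below
uses Python's built-in sqlite3 module. It assumes, purely for illustration, that monthly salary
is the plain sum of basic salary and house rent allowance. The table, view and column names are
hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employee ("
    "emp_id INTEGER PRIMARY KEY, name TEXT, basic_salary REAL, house_rent_allowance REAL)"
)
conn.execute("INSERT INTO employee VALUES (1, 'Joe', 3000.0, 500.0)")

# monthly_salary is derived on demand from the two stored attributes.
conn.execute("""
    CREATE VIEW employee_pay AS
    SELECT emp_id, name, basic_salary + house_rent_allowance AS monthly_salary
    FROM employee
""")

pay = conn.execute("SELECT name, monthly_salary FROM employee_pay").fetchall()
print(pay)
```

Because the value is recomputed from the stored attributes, it cannot drift out of step with
them.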

Relationships: Relationships illustrate how two entities share information in the database
structure.

A model is an abstract form of any system or process that hides the unnecessary
details, while highlighting those details important to the application. This helps the
business users to visualize the application before it is developed and suggest changes if it
does not meet their requirements.

Modeling databases using E-R diagrams is called E-R Modeling. This
technique is also called the Top-Down approach, because one need not identify all the
attributes to model the system using this technique.

Steps in E-R Modeling


Usually the following six steps are followed to generate E-R Models.

a. Identify the entities: Look for general nouns in the requirements specification document
that are of business interest to business users.
b. Find relationships: Identify the natural relationships between the entities and their
cardinalities.
c. Identify the key attributes for every entity: Identify the attribute or set of attributes
that can identify an instance of the entity uniquely.
d. Identify other relevant attributes: Identify other attributes that are of interest to
business users and whose values they want to store in the database.
e. Complete the E-R diagram: Draw the complete E-R diagram with all attributes, including
the primary key.
f. Review your results with your business users: Look at the list of attributes
associated with each entity to see if anything has been omitted.

Advantages of E-R Modeling


1. Easy to understand. Represented in the business users' language. Can be understood by
non-technical specialists.
2. Intuitive and helps in physical database creation.
3. Can be generalized and specialized based on needs.
4. Can help in database design
5. Gives a higher level abstraction of the system.

What is normalization?
Normalization is the process of efficiently organizing data in a database. There are
two goals of the normalization process:

1. Eliminating redundant data (for example, storing the same data in more than one
table)
2. Ensuring data dependencies make sense (only storing related data in a table). OR
Organize data into an efficient and logical structure.

Both of these are worthy goals as they reduce the amount of space a database consumes
and ensure that data is logically stored.

First Normal Form (1NF)


First Normal form sets the very basic rules for an organized database:

• Eliminate duplicate columns from the same table.


• Create separate tables for each group of related data and identify each row with a
unique column or set of columns (the primary key).

Second Normal Form (2 NF)


Second normal form further addresses the concept of removing duplicative data:

• Meet all the requirements of the first normal form.


• Remove subsets of data that apply to multiple rows of a table and place them in
separate tables.
• Create relationships between these new tables and their predecessors through the
use of foreign keys.

Third Normal Form (3 NF)


Third normal form goes one large step further:

• Meet all the requirements of the second normal form.


• Remove columns that are not dependent upon the primary key.

Boyce Codd Normal Form (BCNF)


A relation is said to be in Boyce Codd Normal Form if and only if all the determinants
are candidate keys. BCNF relation is a strong 3NF, but not every 3NF relation is BCNF.

Let us understand this concept using the structure of a Result table whose attributes
include Student#, Emailid, Course# and Marks.

In this table we have two candidate keys, namely (Student#, Course#) and (Course#,
Emailid). Course# is common to both candidate keys. Hence these candidate
keys are called “overlapping candidate keys”.

The non-key attribute Marks is non-transitively and fully functionally dependent on the key
attributes. Hence the table is in 3NF. But it is not in BCNF, because there are four
determinants in this relation, namely:
• Student# (Student# decides Emailid)
• Emailid (Emailid decides Student#)
• Student# Course# (decides the rest of the attributes in the Result table)
• Course# Emailid (decides the rest of the attributes in the Result table)

Not all of the above determinants are candidate keys. Emailid decides Student#, but
Emailid on its own is not a candidate key. Similarly, Student# decides the Emailid of a
student, but Student# alone is not a candidate key. Only the combinations Student# Course#
and Course# Emailid are candidate keys.

To make this table BCNF, we need to split this table into the following structure:
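
The original table layout is not reproduced in these notes, so the following is only one
possible decomposition, sketched with Python's built-in sqlite3 module. It assumes Result holds
just the four attributes named in the text. The snake_case column names and all sample rows are
hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Student# <-> Emailid is stored once per student; each is a key of this table.
conn.execute(
    "CREATE TABLE student (student_no INTEGER PRIMARY KEY, email_id TEXT NOT NULL UNIQUE)"
)
# Marks depend on the whole key (Student#, Course#).
conn.execute("""
    CREATE TABLE result (
        student_no INTEGER REFERENCES student (student_no),
        course_no  TEXT,
        marks      INTEGER,
        PRIMARY KEY (student_no, course_no)
    )
""")

conn.executemany("INSERT INTO student VALUES (?, ?)", [(1, "a@x.org"), (2, "b@x.org")])
conn.executemany(
    "INSERT INTO result VALUES (?, ?, ?)",
    [(1, "C1", 80), (1, "C2", 70), (2, "C1", 90)],
)

# Joining the two tables rebuilds the original Result rows without loss.
rebuilt = conn.execute("""
    SELECT r.student_no, s.email_id, r.course_no, r.marks
    FROM result AS r JOIN student AS s ON s.student_no = r.student_no
    ORDER BY r.student_no, r.course_no
""").fetchall()
print(rebuilt)
```

In each of the two tables every determinant is a candidate key, which is the BCNF condition
stated above.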

Fourth Normal Form (4 NF)


Finally, fourth normal form has one additional requirement:

• Meet all the requirements of the third normal form.


• A relation is in 4NF if it has no multi-valued dependencies.

Explanation with Example


Let's say we want to create a table of user information, and we want to store each
user’s Name, Company, Company Address, and some personal bookmarks, or urls. You
might start by defining a table structure like this:
Zero Form
users
name   company   company_address   url1      url2
Joe    ABC       1 Work Lane       abc.com   xyz.com
Jill   XYZ       1 Job Street      abc.com   xyz.com

We would say this table is in Zero Form because none of our rules of normalization
have been applied yet. Notice the url1 and url2 fields -- what do we do when our
application needs to ask for a third url? Do you want to keep adding columns to your
table and hard-coding that form input field into your HTML code? Obviously not, you
would want to create a functional system that could grow with new development
requirements. Let's look at the rules for the First Normal Form, and then apply them to
this table.

First Normal Form

• Eliminate repeating groups in individual tables.


• Create a separate table for each set of related data.
• Identify each set of related data with a primary key.

Notice how we're breaking that first rule by repeating the url1 and url2 fields? And
what about Rule Three, primary keys? Rule Three basically means we want to put some
form of unique, auto-incrementing integer value into every one of our records. Otherwise,
what would happen if we had two users named Joe and we wanted to tell them apart?
When we apply the rules of the First Normal Form we come up with the following table:
users
userId   name   company   company_address   url
1        Joe    ABC       1 Work Lane       abc.com
1        Joe    ABC       1 Work Lane       xyz.com
2        Jill   XYZ       1 Job Street      abc.com
2        Jill   XYZ       1 Job Street      xyz.com

Now our table is said to be in the First Normal Form. We've solved the problem of
url field limitation, but look at the headache we've now caused ourselves. Every time we
input a new record into the users table, we've got to duplicate all that company and user
name data. Not only will our database grow much larger than we'd ever want it to, but we
could easily begin corrupting our data by misspelling some of that redundant information.
Let's apply the rules of Second Normal Form:

Second Normal Form

• Create separate tables for sets of values that apply to multiple records.
• Relate these tables with a foreign key.

We break the url values into a separate table so we can add more in the future without
having to duplicate data. We'll also want to use our primary key value to relate these
fields:
users
userId   name   company   company_address
1        Joe    ABC       1 Work Lane
2        Jill   XYZ       1 Job Street

urls
urlId   relUserId   url
1       1           abc.com
2       1           xyz.com
3       2           abc.com
4       2           xyz.com

Ok, we've created separate tables and the primary key in the users table, userId, is
now related to the foreign key in the urls table, relUserId. We're in much better shape.
But what happens when we want to add another employee of company ABC? Or 200
employees? Now we've got company names and addresses duplicating themselves all
over the place, a situation just ripe for introducing errors into our data. So we'll want to
look at applying the Third Normal Form:

Third Normal Form

• Eliminate fields that do not depend on the key.

Our Company Name and Address have nothing to do with the User Id, so they should
have their own Company Id:
users
userId   name   relCompId
1        Joe    1
2        Jill   2

urls
urlId   relUserId   url
1       1           abc.com
2       1           xyz.com
3       2           abc.com
4       2           xyz.com

companies
compId   company   company_address
1        ABC       1 Work Lane
2        XYZ       1 Job Street

Now we've got the primary key compId in the companies table related to the foreign
key in the users table called relCompId, and we can add 200 users while still only
inserting the name "ABC" once. Our users and urls tables can grow as large as they want
without unnecessary duplication or corruption of data. Most developers will say the Third
Normal Form is far enough, and our data schema could easily handle the load of an entire
enterprise, and in most cases they would be correct.
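
As a rough sketch, the Third Normal Form tables above can be created and loaded with Python's
built-in sqlite3 module. Table and column names follow the example. The column types, the
REFERENCES clauses and the 200 generated user names are assumptions added for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite-specific: turn on foreign key checks

conn.execute(
    "CREATE TABLE companies (compId INTEGER PRIMARY KEY, company TEXT, company_address TEXT)"
)
conn.execute(
    "CREATE TABLE users (userId INTEGER PRIMARY KEY, name TEXT, "
    "relCompId INTEGER REFERENCES companies (compId))"
)
conn.execute(
    "CREATE TABLE urls (urlId INTEGER PRIMARY KEY, "
    "relUserId INTEGER REFERENCES users (userId), url TEXT)"
)

conn.executemany(
    "INSERT INTO companies VALUES (?, ?, ?)",
    [(1, "ABC", "1 Work Lane"), (2, "XYZ", "1 Job Street")],
)
conn.executemany("INSERT INTO users VALUES (?, ?, ?)", [(1, "Joe", 1), (2, "Jill", 2)])
conn.executemany(
    "INSERT INTO urls VALUES (?, ?, ?)",
    [(1, 1, "abc.com"), (2, 1, "xyz.com"), (3, 2, "abc.com"), (4, 2, "xyz.com")],
)

# Add 200 more ABC employees (hypothetical names); the company row is not repeated.
conn.executemany(
    "INSERT INTO users (name, relCompId) VALUES (?, 1)",
    [(f"user{i}",) for i in range(200)],
)

abc_rows = conn.execute("SELECT COUNT(*) FROM companies WHERE company = 'ABC'").fetchone()[0]
abc_users = conn.execute("SELECT COUNT(*) FROM users WHERE relCompId = 1").fetchone()[0]
print(abc_rows, abc_users)
```

The company name and address exist in exactly one row no matter how many users point at it.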

But look at our url fields - do you notice the duplication of data? This is perfectly
acceptable if we are not pre-defining these fields. If the HTML input page which our
users are filling out to input this data allows a free-form text input there's nothing we can
do about this, and it's just a coincidence that Joe and Jill both input the same bookmarks.
But what if it's a drop-down menu which we know only allows those two urls, or maybe
20 or even more. We can take our database schema to the next level, the Fourth Form,
one which many developers overlook because it depends on a very specific type of
relationship, the many-to-many relationship, which we have not yet encountered in our
application.

Data Relationships

Before we define the Fourth Normal Form, let's look at the three basic data relationships:
one-to-one, one-to-many, and many-to-many. Look at the users table in the First
Normal Form example above. For a moment let's imagine we put the url fields in a
separate table, and every time we input one record into the users table we would input
one row into the urls table. We would then have a one-to-one relationship: each row in
the users table would have exactly one corresponding row in the urls table. For the
purposes of our application this would neither be useful nor normalized.

Now look at the tables in the Second Normal Form example. Our tables allow one user to
have many urls associated with his user record. This is a one-to-many relationship, the
most common type, and until we reached the dilemma presented in the Third Normal
Form, the only kind we needed.

The many-to-many relationship, however, is slightly more complex. Notice in our Third
Normal Form example we have one user related to many urls. As mentioned, we want to
change that structure to allow many users to be related to many urls, and thus we want a
many-to-many relationship. Let's take a look at what that would do to our table structure
before we discuss it:

users
userId   name   relCompId
1        Joe    1
2        Jill   2

companies
compId   company   company_address
1        ABC       1 Work Lane
2        XYZ       1 Job Street

urls
urlId   url
1       abc.com
2       xyz.com

url_relations
relationId   relatedUrlId   relatedUserId
1            1              1
2            1              2
3            2              1
4            2              2

In order to decrease the duplication of data (and in the process bring ourselves to
the Fourth Form of Normalization), we've created a table full of nothing but primary and
foreign keys in url_relations. We've been able to remove the duplicate entries in the urls
table by creating the url_relations table. We can now accurately express the
relationship that both Joe and Jill are related to each one of, and both of, the urls. So let's
see exactly what the Fourth Form of Normalization entails:

Fourth Normal Form

• In a many-to-many relationship, independent entities cannot be stored in the same
table.

Since it only applies to the many-to-many relationship, most developers can rightfully
ignore this rule. But it does come in handy in certain situations, such as this one. We've
successfully streamlined our urls table to remove duplicate entries and moved the
relationships into their own table.

Just to give you a practical example, now we can select all of Joe's urls by performing
the following SQL call:
SELECT users.name, urls.url
FROM users
JOIN url_relations ON url_relations.relatedUserId = users.userId
JOIN urls ON urls.urlId = url_relations.relatedUrlId
WHERE users.userId = 1;

And if we wanted to loop through everybody's User and Url information, we'd do
something like this:
SELECT users.name, urls.url
FROM users
JOIN url_relations ON url_relations.relatedUserId = users.userId
JOIN urls ON urls.urlId = url_relations.relatedUrlId;
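
To try both queries, the Fourth Normal Form tables can be loaded into an in-memory database
with Python's built-in sqlite3 module. This is a minimal sketch: the column types, the
REFERENCES clauses and the ORDER BY clauses (added so the output order is predictable) are
assumptions and not part of the example above. The companies table is left out because the
queries do not use it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (userId INTEGER PRIMARY KEY, name TEXT, relCompId INTEGER);
    CREATE TABLE urls (urlId INTEGER PRIMARY KEY, url TEXT);
    CREATE TABLE url_relations (
        relationId    INTEGER PRIMARY KEY,
        relatedUrlId  INTEGER REFERENCES urls (urlId),
        relatedUserId INTEGER REFERENCES users (userId)
    );
    INSERT INTO users VALUES (1, 'Joe', 1), (2, 'Jill', 2);
    INSERT INTO urls VALUES (1, 'abc.com'), (2, 'xyz.com');
    INSERT INTO url_relations VALUES (1, 1, 1), (2, 1, 2), (3, 2, 1), (4, 2, 2);
""")

# Shared join: users -> url_relations -> urls.
join_sql = """
    SELECT users.name, urls.url
    FROM users
    JOIN url_relations ON url_relations.relatedUserId = users.userId
    JOIN urls ON urls.urlId = url_relations.relatedUrlId
"""

# All of Joe's urls (userId 1), passed as a bound parameter.
joes_urls = conn.execute(join_sql + " WHERE users.userId = ? ORDER BY urls.url", (1,)).fetchall()
# Everybody's user and url information.
everyone = conn.execute(join_sql + " ORDER BY users.name, urls.url").fetchall()
print(joes_urls)
print(everyone)
```

Each url string is stored once in urls, and the four rows of url_relations carry the
many-to-many links.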

What is the difference between a “where” clause and a “having” clause?


“Where” is a restriction clause. You use the where clause to restrict the rows read
from the database: it is applied to individual rows, before any grouping takes place. The
having clause is applied after the rows have been retrieved and grouped: it filters the
groups produced by GROUP BY, usually by testing an aggregate value.
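
The difference is easiest to see by running both clauses over the same data. The sketch below
uses Python's built-in sqlite3 module. The orders table and its rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer TEXT, amount INTEGER)")  # hypothetical table
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("Joe", 50), ("Joe", 200), ("Jill", 30), ("Jill", 40), ("Sam", 500)],
)

# WHERE drops individual rows before grouping: only orders above 35 are summed.
where_rows = conn.execute("""
    SELECT customer, SUM(amount) FROM orders
    WHERE amount > 35
    GROUP BY customer
    ORDER BY customer
""").fetchall()

# HAVING drops whole groups after aggregation: only totals above 100 survive.
having_rows = conn.execute("""
    SELECT customer, SUM(amount) FROM orders
    GROUP BY customer
    HAVING SUM(amount) > 100
    ORDER BY customer
""").fetchall()

print(where_rows)
print(having_rows)
```

With WHERE, Jill still appears, but with a smaller total because her order of 30 was dropped
before summing. With HAVING, Jill's whole group disappears because her total of 70 does not
pass the test.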

What is de-normalization and when do we use De-normalization?


De-normalization is a technique to move from a higher normal form to a lower normal
form in order to speed up database access. De-normalization is done when fast retrieval
matters more than avoiding redundancy.

What is a Trigger?
A trigger is a SQL procedure that initiates an action when an event (INSERT,
DELETE or UPDATE) occurs. Triggers are stored in and managed by the DBMS.
Triggers are used to maintain the referential integrity of data by changing the data in a
systematic fashion. A trigger cannot be called or executed directly; the DBMS automatically
fires the trigger when the associated data is modified. Triggers are similar
to stored procedures in that both consist of procedural logic that is stored at the database
level.
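
A small sketch of a trigger, written in SQLite's trigger syntax and run through Python's
built-in sqlite3 module. Trigger syntax differs between database systems. The account and
audit_log tables and the trigger name are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE account (acct_id INTEGER PRIMARY KEY, balance INTEGER);
    CREATE TABLE audit_log (acct_id INTEGER, old_balance INTEGER, new_balance INTEGER);

    -- Fired by the DBMS after every UPDATE of account.balance; never called directly.
    CREATE TRIGGER log_balance_change
    AFTER UPDATE OF balance ON account
    BEGIN
        INSERT INTO audit_log VALUES (OLD.acct_id, OLD.balance, NEW.balance);
    END;

    INSERT INTO account VALUES (1, 100);
""")

# The application only updates the account; the audit row is written by the trigger.
conn.execute("UPDATE account SET balance = 150 WHERE acct_id = 1")
log = conn.execute("SELECT * FROM audit_log").fetchall()
print(log)
```

The application code never mentions audit_log, yet the change is recorded, because the trigger
logic is stored in the database and fired by the UPDATE.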

What is a cursor?
A cursor is a database object used by applications to manipulate data in a set on a
row-by-row basis, instead of the typical SQL commands that operate on all the rows in
the set at one time.

In order to work with a cursor we need to perform some steps in the following order:

1. Declare cursor
2. Open cursor
3. Fetch row from the cursor
4. Process fetched row
5. Close cursor
6. De-allocate cursor
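
The steps above describe server-side SQL cursors. As a loose client-side analogy, and not the
same mechanism, Python's built-in sqlite3 module exposes a cursor object that follows a similar
life cycle. The employee table and the yearly-salary calculation are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (emp_id INTEGER PRIMARY KEY, name TEXT, salary INTEGER)")
conn.executemany("INSERT INTO employee VALUES (?, ?, ?)", [(1, "Joe", 1000), (2, "Jill", 2000)])

cur = conn.cursor()  # roughly "declare": create the cursor object
cur.execute("SELECT name, salary FROM employee ORDER BY emp_id")  # roughly "open"

processed = []
while True:
    row = cur.fetchone()  # fetch one row at a time
    if row is None:       # no more rows in the result set
        break
    name, salary = row
    processed.append(f"{name}: {salary * 12}")  # process the fetched row

cur.close()  # close the cursor and release its resources
print(processed)
```

A set-based statement, such as a single SELECT that computes salary * 12 for every row, is
usually the simpler choice. Row-by-row processing is for logic that cannot be expressed that way.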

What is the difference between clustered and non-Clustered Index?


A clustered index is a special type of index that reorders the way records in the
table are physically stored. Therefore a table can have only one clustered index. The leaf
nodes of a clustered index contain the data pages.
A non-clustered index is a special type of index in which the logical order of the
index does not match the physical stored order of the rows on disk. The leaf nodes of a
non-clustered index do not consist of the data pages. Instead, the leaf nodes contain
index rows.

What is the difference between a primary key and a unique key?


Both a primary key and a unique key enforce uniqueness of the column on which they
are defined. But by default a primary key creates a clustered index on the column,
whereas a unique key creates a non-clustered index by default. Another major difference is
that a primary key doesn’t allow NULLs, but a unique key allows one NULL only. (The index
defaults and the single-NULL rule describe SQL Server; other database systems may differ.)
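
The uniqueness part can be checked with a short sketch using Python's built-in sqlite3 module.
It runs on SQLite, which does not have clustered index defaults and which treats NULLs in a
UNIQUE column as distinct, so the NULL result below differs from the one-NULL behaviour
described above. The member table is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE member (member_id INTEGER PRIMARY KEY, email TEXT UNIQUE)")
conn.execute("INSERT INTO member VALUES (1, 'joe@abc.com')")


def rejected(sql):
    """Return True if the statement violates a constraint."""
    try:
        conn.execute(sql)
        return False
    except sqlite3.IntegrityError:
        return True


dup_pk = rejected("INSERT INTO member VALUES (1, 'other@abc.com')")    # duplicate primary key
dup_unique = rejected("INSERT INTO member VALUES (2, 'joe@abc.com')")  # duplicate unique value

# SQLite treats NULLs as distinct, so several NULL emails are accepted here.
conn.execute("INSERT INTO member VALUES (3, NULL)")
conn.execute("INSERT INTO member VALUES (4, NULL)")
null_count = conn.execute("SELECT COUNT(*) FROM member WHERE email IS NULL").fetchone()[0]
print(dup_pk, dup_unique, null_count)
```

Both constraints reject duplicates. How NULLs are treated in a unique column depends on the
database system, so it is worth checking on the system actually in use.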
SQL Statements:

Statement         Syntax

AND / OR          SELECT column_name(s)
                  FROM table_name
                  WHERE condition
                  AND|OR condition

ALTER TABLE       ALTER TABLE table_name
                  ADD column_name datatype
                  or
                  ALTER TABLE table_name
                  DROP COLUMN column_name

AS (alias)        SELECT column_name AS column_alias
                  FROM table_name
                  or
                  SELECT column_name
                  FROM table_name AS table_alias

BETWEEN           SELECT column_name(s)
                  FROM table_name
                  WHERE column_name
                  BETWEEN value1 AND value2

CREATE DATABASE   CREATE DATABASE database_name

CREATE TABLE      CREATE TABLE table_name
                  (
                  column_name1 data_type,
                  column_name2 data_type,
                  column_name3 data_type,
                  ...
                  )

CREATE INDEX      CREATE INDEX index_name
                  ON table_name (column_name)
                  or
                  CREATE UNIQUE INDEX index_name
                  ON table_name (column_name)

CREATE VIEW       CREATE VIEW view_name AS
                  SELECT column_name(s)
                  FROM table_name
                  WHERE condition

DELETE            DELETE FROM table_name
                  WHERE some_column=some_value
                  or
                  DELETE FROM table_name
                  (Note: Deletes every row in the table; the table itself remains)
                  or
                  DELETE * FROM table_name
                  (Note: MS Access form, not standard SQL; deletes every row in the table)

DROP DATABASE     DROP DATABASE database_name

DROP INDEX        DROP INDEX table_name.index_name (SQL Server)
                  DROP INDEX index_name ON table_name (MS Access)
                  DROP INDEX index_name (DB2/Oracle)
                  ALTER TABLE table_name
                  DROP INDEX index_name (MySQL)

DROP TABLE        DROP TABLE table_name

GROUP BY          SELECT column_name, aggregate_function(column_name)
                  FROM table_name
                  WHERE column_name operator value
                  GROUP BY column_name

HAVING            SELECT column_name, aggregate_function(column_name)
                  FROM table_name
                  WHERE column_name operator value
                  GROUP BY column_name
                  HAVING aggregate_function(column_name) operator value

IN                SELECT column_name(s)
                  FROM table_name
                  WHERE column_name
                  IN (value1,value2,..)

INSERT INTO       INSERT INTO table_name
                  VALUES (value1, value2, value3,....)
                  or
                  INSERT INTO table_name
                  (column1, column2, column3,...)
                  VALUES (value1, value2, value3,....)

INNER JOIN        SELECT column_name(s)
                  FROM table_name1
                  INNER JOIN table_name2
                  ON table_name1.column_name=table_name2.column_name

LEFT JOIN         SELECT column_name(s)
                  FROM table_name1
                  LEFT JOIN table_name2
                  ON table_name1.column_name=table_name2.column_name

RIGHT JOIN        SELECT column_name(s)
                  FROM table_name1
                  RIGHT JOIN table_name2
                  ON table_name1.column_name=table_name2.column_name

FULL JOIN         SELECT column_name(s)
                  FROM table_name1
                  FULL JOIN table_name2
                  ON table_name1.column_name=table_name2.column_name

LIKE              SELECT column_name(s)
                  FROM table_name
                  WHERE column_name LIKE pattern

ORDER BY          SELECT column_name(s)
                  FROM table_name
                  ORDER BY column_name [ASC|DESC]

SELECT            SELECT column_name(s)
                  FROM table_name

SELECT *          SELECT *
                  FROM table_name

SELECT DISTINCT   SELECT DISTINCT column_name(s)
                  FROM table_name

SELECT INTO       SELECT *
                  INTO new_table_name [IN externaldatabase]
                  FROM old_table_name
                  or
                  SELECT column_name(s)
                  INTO new_table_name [IN externaldatabase]
                  FROM old_table_name

SELECT TOP        SELECT TOP number|percent column_name(s)
                  FROM table_name

TRUNCATE TABLE    TRUNCATE TABLE table_name

UNION             SELECT column_name(s) FROM table_name1
                  UNION
                  SELECT column_name(s) FROM table_name2

UNION ALL         SELECT column_name(s) FROM table_name1
                  UNION ALL
                  SELECT column_name(s) FROM table_name2

UPDATE            UPDATE table_name
                  SET column1=value, column2=value,...
                  WHERE some_column=some_value

WHERE             SELECT column_name(s)
                  FROM table_name
                  WHERE column_name operator value
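
As a brief illustration of two of the join forms listed above, the sketch below runs INNER JOIN
and LEFT JOIN over the same data with Python's built-in sqlite3 module. The tables reuse the
users and companies names from the normalization example. The user Sam, who has no company, is
a hypothetical row added to make the difference visible.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE companies (compId INTEGER PRIMARY KEY, company TEXT);
    CREATE TABLE users (userId INTEGER PRIMARY KEY, name TEXT, relCompId INTEGER);
    INSERT INTO companies VALUES (1, 'ABC'), (2, 'XYZ');
    INSERT INTO users VALUES (1, 'Joe', 1), (2, 'Jill', 2), (3, 'Sam', NULL);
""")

# INNER JOIN keeps only users that have a matching company row.
inner = conn.execute("""
    SELECT users.name, companies.company
    FROM users INNER JOIN companies ON users.relCompId = companies.compId
    ORDER BY users.userId
""").fetchall()

# LEFT JOIN keeps every user; company is NULL (None in Python) where no match exists.
left = conn.execute("""
    SELECT users.name, companies.company
    FROM users LEFT JOIN companies ON users.relCompId = companies.compId
    ORDER BY users.userId
""").fetchall()

print(inner)
print(left)
```

Sam is missing from the INNER JOIN result and appears with an empty company in the LEFT JOIN
result.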
