
SECTION-A

NOTES

1. Data

Data is the raw material that is processed to produce information. It consists of unorganized
facts, such as figures, dates and numbers, that have not yet been processed or interpreted. Data
on its own carries little meaning: it becomes useful only once it has been processed or
interpreted. Data is the input that a computer works on.

Examples of Data

• Student Data on Admission Forms: When students are admitted to a college, they fill in an
admission form. This form contains raw facts (student data) such as name, father's name and
address.
• Data of Citizens: During a census, data about all citizens is collected.
• Survey Data: Companies collect data through surveys to learn people's opinions about
their products.
• Student Examination Data: In examinations, the marks obtained in each subject are
collected for all students.

2. Information

Information is processed data: data that has been organized and interpreted so that it is useful.
Information is essentially the data plus the meaning for which the data was collected. Data does
not depend on information, but information depends on data and cannot be generated without
it. Information is what is conveyed once data has been gathered and given meaning, so it
always comes in a meaningful form.

Examples of Information

• Student Address Labels: Stored student data can be used to print address labels for
students.
• Census Report: Census data is used to produce reports about the total population of a
country, its literacy rate, etc.
• Survey Reports and Results: Survey data is summarized into reports that are presented to
the management of the company.
• Result Cards of Individual Students: In an examination system, the collected data (marks
obtained in each subject) is processed to get the total marks obtained by a student. The
total marks obtained are information. The total is also used to prepare the student's result card.
• Merit List: After admission forms are collected from candidates, merit is calculated on the
basis of the marks obtained by each candidate. Normally, the percentage of marks obtained is
calculated for each candidate. The candidates' names are then arranged in descending
order by percentage. This makes a merit list. The merit list is used to decide whether a
candidate will get admission to the college or not.
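
The merit list example above can be sketched in a few lines of Python. This is only an illustration: the candidate names, the marks and the total of 500 are made-up sample values, and `build_merit_list` is a hypothetical helper, not part of any system described in these notes.

```python
# Raw data: marks obtained by each candidate (hypothetical sample values).
TOTAL_MARKS = 500
obtained_marks = {"Ayesha": 410, "Bilal": 455, "Chen": 380}


def build_merit_list(marks, total):
    """Turn raw marks (data) into a ranked merit list (information)."""
    percentages = {name: 100 * m / total for name, m in marks.items()}
    # Arrange candidates in descending order by percentage.
    return sorted(percentages.items(), key=lambda item: item[1], reverse=True)


merit_list = build_merit_list(obtained_marks, TOTAL_MARKS)
for rank, (name, pct) in enumerate(merit_list, start=1):
    print(f"{rank}. {name}: {pct:.1f}%")
```

The dictionary of marks is the data; the ranked list that comes out is the information used for the admission decision.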

Key Differences

• Data is the input to a computer, and information is the output presented to
humans.
• Data is unprocessed facts or mere figures, but information is processed data that has
been made sense of.
• Data does not depend on information, but information depends on data; without data,
information cannot be produced.
• Data is not specific, but information is specific enough to convey meaning.
• Data is the raw material that is collected, but information is the detailed meaning generated
from the data.

Database (DB)

A database is a collection of information that is organized so that it can be easily accessed,
managed and updated. Data is organized into rows, columns and tables, and it is indexed to make
it easier to find relevant information. Data is updated, expanded and deleted as new
information is added. Databases can be both created and updated by their users.

Computer databases typically contain aggregations of data records or files, such as sales
transactions, product catalogs, inventories and customer profiles. A database manager provides
users with the ability to control read/write access, specify report generation and analyze usage.
Some databases offer ACID (atomicity, consistency, isolation and durability) compliance to
guarantee that data is consistent and that transactions are complete.

There are many different kinds of databases:

1. Relational database

A relational database, invented by E.F. Codd at IBM in 1970, is a tabular database in which data
is defined so that it can be reorganized and accessed in a number of different ways.
Relational databases are made up of a set of tables with data that fits into a predefined category.
Each table has at least one data category in a column, and each row has a certain data instance
for the categories which are defined in the columns.

The Structured Query Language (SQL) is the standard user and application program interface for
a relational database. Relational databases are easy to extend, and a new data category can be
added after the original database creation without requiring that you modify all the existing
applications.
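
As a small illustration of this extensibility, the sketch below uses Python's built-in sqlite3 module with an in-memory database. The `customer` table, its columns and the `customer_names` helper are hypothetical names chosen for the example; the point is only that a query written before the new column was added keeps working afterwards.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database
conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO customer (name) VALUES ('Asha')")


def customer_names(connection):
    """An 'existing application' query that names its columns explicitly."""
    rows = connection.execute("SELECT name FROM customer ORDER BY id")
    return [row[0] for row in rows]


before = customer_names(conn)
# Add a new data category (column) after the table was created.
conn.execute("ALTER TABLE customer ADD COLUMN email TEXT")
after = customer_names(conn)
print(before, after)
```

Queries that use `SELECT *` or rely on column positions are more fragile, which is one reason to name columns explicitly.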

2. Distributed database

A distributed database is a database in which portions of the database are stored in multiple
physical locations, and in which processing is dispersed or replicated among different points in a
network.

Distributed databases can be homogeneous or heterogeneous. All the physical locations in a
homogeneous distributed database system have the same underlying hardware and run the same
operating systems and database applications. The hardware, operating systems or database
applications in a heterogeneous distributed database may be different at each of the locations.

3. Cloud database

A cloud database is a database that has been optimized or built for a virtualized environment,
either in a hybrid cloud, public cloud or private cloud. Cloud databases provide benefits such as
the ability to pay for storage capacity and bandwidth on a per-use basis, and they provide
scalability on demand, along with high availability.

A cloud database also gives enterprises the opportunity to support business applications in a
software-as-a-service deployment.

4. NoSQL database

NoSQL databases are useful for large sets of distributed data.

NoSQL databases are effective for big data performance issues that relational databases aren't
built to solve. They are most effective when an organization must analyze large chunks of
unstructured data or data that's stored across multiple virtual servers in the cloud.

5. Object-oriented database

Items created using object-oriented programming languages are often stored in relational
databases, but object-oriented databases are well suited for those items.

An object-oriented database is organized around objects rather than actions, and data rather than
logic. For example, a multimedia record in a relational database can be a definable data object, as
opposed to an alphanumeric value.

6. Graph database

A graph-oriented database, or graph database, is a type of NoSQL database that uses graph
theory to store, map and query relationships. Graph databases are essentially collections of nodes
and edges, where each node represents an entity and each edge represents a connection between
nodes.

Graph databases are growing in popularity for analyzing interconnections. For example,
companies might use a graph database to mine data about customers from social media.
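
The node-and-edge idea can be sketched without any database product at all. The snippet below is a minimal in-memory illustration, not how a real graph database is implemented; the node names, the `FOLLOWS` relationship and both helper functions are assumptions made for the example.

```python
# Edges of a tiny social graph: (from_node, relationship, to_node).
edges = [
    ("alice", "FOLLOWS", "bob"),
    ("bob", "FOLLOWS", "carol"),
    ("alice", "FOLLOWS", "carol"),
]


def neighbours(node, relationship):
    """Return nodes reachable from `node` over one edge of the given type."""
    return sorted(dst for src, rel, dst in edges
                  if src == node and rel == relationship)


def two_hops(node, relationship):
    """Return nodes reachable over exactly two edges of the given type."""
    found = set()
    for middle in neighbours(node, relationship):
        found.update(neighbours(middle, relationship))
    found.discard(node)
    return sorted(found)


print(neighbours("alice", "FOLLOWS"))
print(two_hops("alice", "FOLLOWS"))
```

Queries such as "who is two connections away" follow edges directly, which is the kind of interconnection analysis the paragraph above refers to.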

Accessing the database: DBMS and RDBMS

A database management system (DBMS) is a type of software that allows us to define,
manipulate, retrieve and manage data stored within a database.

A relational database management system (RDBMS) is a type of database management
software that was developed in the 1970s, based on the relational model, and remains one of
the most widely used ways to manage a database.

Conventional File Processing Vs DBMS

What Is a File Processing System? What Are Its Disadvantages?

Before the DBMS was invented, information was stored using a file processing system. In this
system, data is stored in permanent system files (secondary storage). Different application
programs are written to extract data from these files and to add records to them. However, there
are a number of disadvantages to using a file processing system to store data.

1. Data Redundancy and Inconsistency:

Since the data files and application programs are created by different programmers over a long
period:

• The data files are likely to have different formats.
• Programs may be written in several programming languages.
• The same information may be duplicated in several places.

This results in data redundancy and inconsistency.


Consider the following two data files:

• Saving account data file: stores information about customers.
{acc_no, name, social_security, addr, teleph_no};
• Checking account data file: stores information about customers.
{acc_no, name, social_security, addr, teleph_no};

The fields {name, social_security, addr, teleph_no} are the same in both files, so the data is
duplicated, which results in data redundancy. Data redundancy increases the cost of storing and
retrieving the data.
If the values of these common fields do not match for some record in the two files, the data is
inconsistent.
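
To make the inconsistency concrete, here is a rough sketch that compares the duplicated fields of the two files. The record values are invented, and `find_inconsistencies` is a hypothetical helper that matches customers on `social_security`; a real file processing system would need a program like this written for every pair of files.

```python
# Two hypothetical "files", each holding its own copy of customer details.
saving_accounts = [
    {"acc_no": "S-1", "name": "R. Khan", "social_security": "111",
     "addr": "12 Lake Rd", "teleph_no": "555-0101"},
]
checking_accounts = [
    {"acc_no": "C-9", "name": "R. Khan", "social_security": "111",
     "addr": "48 Hill St", "teleph_no": "555-0101"},
]
SHARED_FIELDS = ("name", "addr", "teleph_no")


def find_inconsistencies(file_a, file_b):
    """Report shared fields whose values differ for the same customer."""
    by_ssn = {rec["social_security"]: rec for rec in file_b}
    problems = []
    for rec in file_a:
        other = by_ssn.get(rec["social_security"])
        if other is None:
            continue  # customer appears in only one file
        for field in SHARED_FIELDS:
            if rec[field] != other[field]:
                problems.append(
                    (rec["social_security"], field, rec[field], other[field]))
    return problems


inconsistencies = find_inconsistencies(saving_accounts, checking_accounts)
print(inconsistencies)
```

Here the address was updated in one file but not the other, so the same customer now has two different addresses.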

2. Difficulty in Accessing the Data:

A conventional file processing system does not allow needed data to be retrieved in a convenient
and efficient manner. For example, consider a saving account data file with the fields

{acc_no, name, social_security, addr, balance}.

Application programs are written to access the data. But if a user wants to display only those
records for which the balance is greater than $10,000, and no program has been written for that
request, then it is difficult to access that data.
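
For contrast, the same request is a one-line query once the data sits in a database. The sketch below uses Python's sqlite3 module with an in-memory table; the table name `saving_account`, the sample rows and the `accounts_above` helper are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE saving_account (acc_no TEXT, name TEXT, balance REAL)")
conn.executemany(
    "INSERT INTO saving_account VALUES (?, ?, ?)",
    [("S-1", "Khan", 2500.0), ("S-2", "Lee", 18000.0), ("S-3", "Okoro", 10000.0)],
)


def accounts_above(connection, threshold):
    """Return accounts whose balance is strictly greater than `threshold`."""
    rows = connection.execute(
        "SELECT acc_no, name, balance FROM saving_account "
        "WHERE balance > ? ORDER BY acc_no",
        (threshold,),
    )
    return rows.fetchall()


high_balance = accounts_above(conn, 10000)
print(high_balance)
```

Changing the threshold or the filter condition needs no new program, only a different query.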

3. Data Isolation:
Because data is scattered in various files, and the files may be in different formats, it is
difficult to write new application programs to retrieve the appropriate data.

4. Integrity Problems:
The data values stored in the database must satisfy certain types of consistency constraints.
Application programmers enforce these consistency constraints by adding appropriate code to
the various application programs. When a new constraint is to be added, it is difficult to change
the programs to enforce it.

5. Atomicity Problem:
A failure in the middle of an operation can leave the stored data in an inconsistent state. An
operation must happen in its entirety or not at all, and this atomicity property is difficult to
guarantee in a conventional file-processing system.
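
A DBMS addresses the atomicity problem with transactions. The following is a minimal sketch using Python's sqlite3 module; the `account` table, the CHECK constraint and the `transfer` helper are illustrative assumptions. The second transfer fails halfway, and the first half is undone rather than left in place.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account ("
    "acc_no TEXT PRIMARY KEY, balance INTEGER CHECK (balance >= 0))")
conn.executemany("INSERT INTO account VALUES (?, ?)", [("A", 100), ("B", 50)])
conn.commit()


def transfer(connection, src, dst, amount):
    """Move `amount` from src to dst; both updates apply or neither does."""
    try:
        with connection:  # commits on success, rolls back on an exception
            connection.execute(
                "UPDATE account SET balance = balance + ? WHERE acc_no = ?",
                (amount, dst))
            connection.execute(
                "UPDATE account SET balance = balance - ? WHERE acc_no = ?",
                (amount, src))
        return True
    except sqlite3.IntegrityError:
        return False  # the CHECK constraint rejected an overdraft


def balances(connection):
    """Return the current balances as a {acc_no: balance} dictionary."""
    return dict(connection.execute("SELECT acc_no, balance FROM account"))


ok = transfer(conn, "A", "B", 30)       # succeeds
failed = transfer(conn, "A", "B", 500)  # would overdraw A, so it is undone
print(ok, failed, balances(conn))
```

With plain files, the credit to B would already have been written when the debit from A failed, and the application itself would have to notice and repair that.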

6. Concurrent-Access Anomalies:
If multiple users update the same data simultaneously, the result can be an inconsistent data
state. In a file processing system it is very difficult to handle this in program code. This results
in concurrent-access anomalies.

7. Security Problems:
Not every user of the database system should be able to access all the data. For example, in a
banking system, payroll personnel need to see only that part of the database that has information
about the various bank employees. They do not need access to information about customer
accounts. Since application programs are added to the system in an ad hoc manner, it is difficult
to enforce such security constraints.

Drawbacks of File System:

• Data isolation: Because data is scattered in various files, and the files may be in different
formats, writing new application programs to retrieve the appropriate data is difficult.
• Duplication of data: The same data is stored redundantly.
• Dependency on application programs: Changing the files would require changes to the
application programs.

Advantages and Disadvantages of File-Oriented System

The advantages and disadvantages of a file-oriented system are given below.

Advantages of File-Oriented System:

1. Backup:

• It is possible to take fast, automatic backups of a database stored in the files of a
computer-based system.
• Computer systems provide functionality for this purpose. It is also possible to develop a
specific application program for it.

2. Compactness:

• It is possible to store data compactly.

3. Data Retrieval:

• Computer-based systems provide enhanced data retrieval techniques to retrieve data stored in
files in an easy and efficient way.

4. Editing:

• It is easy to edit any information stored in computers in the form of files.
• Specific application programs or editing software can be used for this purpose.

5. Remote Access:

• In computer-based systems, it is possible to access data remotely.
• So, to access data, a user does not need to be present at the location where the data is
kept.

6. Sharing:

• Data stored in the files of computer-based systems can be shared among multiple users at the
same time.

Disadvantages of File-Oriented System:

1. Data Redundancy:

• The same information may be duplicated in different files. This leads to data redundancy,
which wastes storage.

2. Data Inconsistency:

• Because of data redundancy, the copies of the data may not agree, leaving the data in an
inconsistent state.

3. Difficulty in Accessing Data:

• Accessing data is neither convenient nor efficient in a file processing system.

4. Limited Data Sharing:

• Data is scattered in various files. Different files may have different formats, and they
may be stored in different folders belonging to different departments.
• Due to this data isolation, it is difficult to share data among different applications.

5. Integrity Problems:

• Data integrity means that the data contained in the database is both correct and consistent.
For this purpose, the stored data must satisfy certain constraints, which a file system leaves
to each application program to enforce.

6. Atomicity Problems:

• Any operation on the database must be atomic.
• This means it must happen in its entirety or not at all, which is difficult to guarantee with
plain files.

7. Concurrent Access Anomalies:

• Multiple users are allowed to access data simultaneously for the sake of better
performance and faster response. Without coordination, simultaneous updates can leave the
data inconsistent.

8. Security Problems:

• A database should be accessible to users only in a limited way.
• Each user should be allowed to access only the data relevant to their requirements, which is
hard to enforce in a file system.

Advantages of DBMS over File System

There are several advantages of a database management system over a file system. A few of
them are as follows:

• Reduced redundancy: Redundant data is removed by data normalization.
• Data consistency and integrity: Data normalization helps with these too.
• Security: Each user has a different set of access rights.
• Privacy: Access is limited.
• Easy access to data
• Easy recovery
• Flexibility
Disadvantages of DBMS:

• Cost: DBMS implementation cost is high compared to a file system.
• Complexity: Database systems are complex to understand.
• Performance: Database systems are generic, which makes them suitable for various
applications. However, this generality can reduce their performance for some applications.

Database is a collection of related data and data is a collection of facts and figures that can be
processed to produce information.

Data mostly represents recordable facts. Data aids in producing information, which is based on
facts. For example, if we have data about the marks obtained by all students, we can determine
the top scorers and the average marks.

A database management system stores data in such a way that it becomes easier to retrieve,
manipulate, and produce information.

Characteristics

Traditionally, data was organized in file formats. The DBMS was a new concept then, and
research was directed at overcoming the deficiencies of the traditional style of data management.
A modern DBMS has the following characteristics:

• Real-world entity: A modern DBMS is more realistic and uses real-world entities to
design its architecture. It uses their behavior and attributes too. For example, a school
database may use students as an entity and their age as an attribute.

• Relation-based tables: A DBMS allows entities and the relations among them to form tables.
A user can understand the architecture of a database just by looking at the table names.

• Isolation of data and application: A database system is entirely different from its data.
A database is an active entity, whereas data is passive: it is what the database works on
and organizes. A DBMS also stores metadata, which is data about data, to ease its
own processing.

• Less redundancy: A DBMS follows the rules of normalization, which split a relation
when any of its attributes has redundant values. Normalization is a mathematically
rigorous process that reduces data redundancy.

• Consistency: Consistency is a state in which every relation in a database remains
consistent. There are methods and techniques that can detect an attempt to leave the
database in an inconsistent state. A DBMS can provide greater consistency than
earlier forms of data storage such as file-processing systems.

• Query language: A DBMS is equipped with a query language, which makes it more
efficient to retrieve and manipulate data. A user can apply as many different
filtering options as required to retrieve a set of data. This was traditionally not possible
where a file-processing system was used.

• ACID properties: A DBMS follows the concepts of Atomicity, Consistency, Isolation
and Durability (normally shortened to ACID). These concepts are applied to transactions,
which manipulate data in a database. ACID properties help the database stay healthy in
multi-transactional environments and in case of failure.

• Multiuser and concurrent access: A DBMS supports a multi-user environment and
allows users to access and manipulate data in parallel. There are restrictions on
transactions when users attempt to handle the same data item, but users are generally
unaware of them.

• Multiple views: A DBMS offers multiple views for different users. A user in the
Sales department will have a different view of the database than a person working in the
Production department. This feature gives users a focused view of the
database according to their requirements.

• Security: Features like multiple views offer security to some extent, since users are
unable to access the data of other users and departments. A DBMS offers methods to impose
constraints while entering data into the database and while retrieving it at a later stage.
It offers many levels of security features, which enable different users to
have different views with different features. For example, a user in the Sales department
cannot see the data that belongs to the Purchase department. It is also possible to
control how much of the Sales department's data is displayed to that user. Because a
DBMS does not store its data as plainly readable files in the way a traditional file system
does, it is harder for unauthorized users to read the data directly.
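
The "multiple views" characteristic can be illustrated with SQL views. This is a sketch using Python's sqlite3 module; the `product` table, its columns and the two view names are assumptions for the example. Note that SQLite itself has no user accounts, so the sketch shows only how each view exposes a different subset of columns, not how access is restricted per user.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE product (
    name TEXT, unit_cost REAL, sale_price REAL, units_in_stock INTEGER);
INSERT INTO product VALUES ('Widget', 4.0, 9.5, 120);

-- Sales staff see prices; production staff see costs and stock levels.
CREATE VIEW sales_view AS
    SELECT name, sale_price FROM product;
CREATE VIEW production_view AS
    SELECT name, unit_cost, units_in_stock FROM product;
""")


def columns_of(connection, relation):
    """Return the column names exposed by a table or view."""
    cursor = connection.execute(f"SELECT * FROM {relation}")
    return [description[0] for description in cursor.description]


sales_columns = columns_of(conn, "sales_view")
production_columns = columns_of(conn, "production_view")
print(sales_columns, production_columns)
```

In a multi-user DBMS, each department would be granted access to its own view rather than to the underlying table.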

Users

A typical DBMS has users with different rights and permissions who use it for different
purposes. Some users retrieve data and some back it up. The users of a DBMS can be broadly
categorized as follows:

• Administrators: Administrators maintain the DBMS and are responsible for
administering the database. They oversee its usage and decide who should use it.
They create access profiles for users and apply limitations to maintain
isolation and enforce security. Administrators also look after DBMS resources such as the
system license, required tools, and other software- and hardware-related maintenance.

• Designers: Designers are the people who actually work on the design
of the database. They keep a close watch on what data should be kept and in what format.
They identify and design the whole set of entities, relations, constraints and views.

• End users: End users are those who actually reap the benefits of having a DBMS. End
users can range from simple viewers who pay attention to logs or market rates to
sophisticated users such as business analysts.

DBMS - Data Models
Data models define how the logical structure of a database is modeled. Data models are
fundamental to introducing abstraction in a DBMS. They define how data items are
connected to each other and how they are processed and stored inside the system.

The earliest data models were flat data models, in which all the data is kept in the
same plane.

Entity-Relationship Model

The Entity-Relationship (ER) Model is based on the notion of real-world entities and the
relationships among them. The ER Model creates entity sets, relationship sets, general attributes
and constraints.

The ER Model is best used for the conceptual design of a database.

The ER Model is based on:

• Entities and their attributes.
• Relationships among entities.

These concepts are explained below.

• Entity: An entity in an ER Model is a real-world entity having properties called
attributes. Every attribute is defined by its set of values, called its domain. For example, in
a school database, a student is considered an entity. A student has various attributes such as
name, age and class.
• Relationship: The logical association among entities is called a relationship.
Relationships are mapped with entities in various ways. Mapping cardinalities define the
number of associations between two entities.

Mapping cardinalities:

o one to one
o one to many
o many to one
o many to many
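
One common way to realize a one-to-many relationship in a relational database is a foreign key on the "many" side. The sketch below uses Python's sqlite3 module and assumes hypothetical `class` and `student` tables in which one class has many students; the table layout is an illustration, not something prescribed by the ER Model itself.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only if asked
conn.executescript("""
CREATE TABLE class (class_id INTEGER PRIMARY KEY, title TEXT);
CREATE TABLE student (
    student_id INTEGER PRIMARY KEY,
    name TEXT,
    class_id INTEGER REFERENCES class(class_id)  -- the "many" side
);
INSERT INTO class VALUES (1, 'Grade 9');
INSERT INTO student VALUES (10, 'Asha', 1), (11, 'Ravi', 1);
""")

# One class, many students.
students_in_class = [row[0] for row in conn.execute(
    "SELECT name FROM student WHERE class_id = 1 ORDER BY student_id")]

# A student pointing at a class that does not exist is rejected.
try:
    conn.execute("INSERT INTO student VALUES (12, 'Mei', 99)")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

print(students_in_class, orphan_rejected)
```

A many-to-many relationship would instead need a third table holding pairs of keys, one row per association.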

Relational Model

The most popular data model in DBMS is the Relational Model. It is more formally grounded
than other models: it is based on first-order predicate logic and defines a table as an n-ary
relation.

The main highlights of this model are:

• Data is stored in tables called relations.
• Relations can be normalized.
• In normalized relations, the values saved are atomic values.
• Each row in a relation is unique.
• Each column in a relation contains values from the same domain.


Data Schemas
Database Schema

A database schema is the skeleton structure that represents the logical view of the entire
database. It defines how the data is organized and how the relations among them are associated.
It formulates all the constraints that are to be applied on the data.

A database schema defines its entities and the relationship among them. It contains a descriptive
detail of the database, which can be depicted by means of schema diagrams. It’s the database
designers who design the schema to help programmers understand the database and make it
useful.

A database schema can be divided broadly into two categories:

• Physical Database Schema: This schema pertains to the actual storage of data and its
form of storage, such as files and indices. It defines how the data will be stored in
secondary storage.
• Logical Database Schema: This schema defines all the logical constraints that need to
be applied to the stored data. It defines tables, views and integrity constraints.

Database Instance

It is important to distinguish between a database schema and a database instance. A database
schema is the skeleton of the database. It is designed before the database exists. Once the
database is operational, it is very difficult to make changes to the schema. A database schema
does not contain any data or information.

A database instance is the state of an operational database, with its data, at a given point in
time. It contains a snapshot of the database. Database instances tend to change with time. A
DBMS ensures that every instance (state) is valid by diligently following all the validations,
constraints and conditions that the database designers have imposed.
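
The difference between schema and instance can be observed directly. In this sketch, which uses Python's sqlite3 module, the schema is read from SQLite's `sqlite_master` catalog table and the instance is simply the current rows; the `student` table and the two helper functions are assumptions for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (student_id TEXT, student_name TEXT)")


def schema(connection):
    """Return the stored table definitions (the schema)."""
    rows = connection.execute(
        "SELECT sql FROM sqlite_master WHERE type = 'table'")
    return [row[0] for row in rows]


def instance(connection):
    """Return the rows currently held in the table (the instance)."""
    return connection.execute(
        "SELECT student_id, student_name FROM student ORDER BY student_id"
    ).fetchall()


schema_before, instance_before = schema(conn), instance(conn)
conn.execute("INSERT INTO student VALUES ('S1', 'Asha')")
schema_after, instance_after = schema(conn), instance(conn)

print(schema_before == schema_after)    # the schema did not change
print(instance_before, instance_after)  # the instance did
```

Inserting a row changes the instance but leaves the schema exactly as it was.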

Data Independence
If a database system is not multi-layered, then it becomes difficult to make any changes in the
database system. Database systems are designed in multi-layers as we learnt earlier.

Data Independence

A database system normally contains a lot of data in addition to users’ data. For example, it
stores data about data, known as metadata, to locate and retrieve data easily. It is rather difficult
to modify or update a set of metadata once it is stored in the database. But as a DBMS expands,
it needs to change over time to satisfy the requirements of the users. If the entire data is
dependent, it would become a tedious and highly complex job.
Metadata itself follows a layered architecture, so that when we change data at one layer, it does
not affect the data at another level. This data is independent but mapped to each other.

Logical Data Independence

Logical data is data about the database; that is, it stores information about how data is managed
inside it. An example is a table (relation) stored in the database together with all the constraints
applied to that relation.

Logical data independence is a mechanism that decouples this logical description from the actual
data stored on the disk. If we make changes to the table format, the data residing on the disk
should not change.

Physical Data Independence

All the schemas are logical, and the actual data is stored in bit format on the disk. Physical data
independence is the power to change the physical data without impacting the schema or logical
data.

For example, in case we want to change or upgrade the storage system itself − suppose we want
to replace hard-disks with SSD − it should not have any impact on the logical data or schemas.

Types of Database languages

We need a method to create all the logical objects like tables, views, procedures and packages in
the database and we need some interface between the user and the database, so that we can
access the data stored in it. We also need a standardized method to organize these tables and
views in the database.
A DBMS is software that defines the operations that can be carried out on the database. These
range from creating a database, tables, indexes and constraints to manipulating the data in the
database, such as inserting, deleting, updating, retrieving and sorting. To perform all these
operations, a DBMS provides the following database languages:

DDL: Data Definition Language

DML: Data Manipulation Language

DCL: Data Control Language

TCL: Transaction Control Language

Data Definition Language - DDL

DDL statements are used to define the metadata of the database. We can use them to create
schemas, tables, constraints and indexes in the database, and also to modify schemas, tables,
indexes, etc. In short, DDL statements create the skeleton of the database. They store metadata
such as the number of schemas and tables, their names, the columns in each table, indexes and
constraints in the database.

Some of the DDL commands are described below.

Create

Create is used to create schemas, tables, indexes and constraints in the database. The basic syntax
to create a table is as follows.

CREATE TABLE tablename (Column1 DATATYPE, Column2 DATATYPE, … ColumnN DATATYPE);

CREATE TABLE STUDENT (STUDENT_ID CHAR (10), STUDENT_NAME CHAR (10));

Alter

Suppose we have created a STUDENT table with the student's ID and name. Later we realize that
this table should also hold the student's address and age. What do we do at this stage? We add
the columns to the existing table using the ALTER command.

This command is used to modify the structure of a schema or table. It can also be used to add,
modify or delete columns in a table. The syntax for the ALTER statement is as follows (the exact
keywords, such as MODIFY, vary between database products).

To add a new column: ALTER TABLE table_name ADD column_name datatype;
To delete a column: ALTER TABLE table_name DROP COLUMN column_name;
To modify a column: ALTER TABLE table_name MODIFY column_name datatype;
To rename a table: ALTER TABLE table_name RENAME TO new_table_name;
To rename a column: ALTER TABLE table_name RENAME COLUMN old_column_name TO new_column_name;

Suppose we want to add an Address column to the STUDENT table.

ALTER TABLE STUDENT ADD Address varchar2 (100);

Drop

The DROP statement is used to remove a table or index from the database. It can even be
used to remove the database itself. Once the DROP statement is executed, the object is no longer
available for use.

DROP TABLE table_name;
DROP INDEX index_name;
DROP DATABASE database_name;

DROP TABLE STUDENT;

Truncate

The TRUNCATE statement removes the contents of a table but keeps the structure of the table.
It removes all the records from the table; no partial removal of data is possible. It
also releases the space allocated for the data.

TRUNCATE TABLE table_name;

TRUNCATE TABLE STUDENT;
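
The DDL commands above can be tried end to end with Python's sqlite3 module. This sketch makes two substitutions that are specific to SQLite and are assumptions of the example: SQLite has no TRUNCATE statement, so an unqualified DELETE is used to empty the table, and VARCHAR is written in place of the Oracle-style varchar2. The helper functions are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")


def table_names(connection):
    """List the tables currently defined in the database."""
    rows = connection.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")
    return [row[0] for row in rows]


def column_names(connection, table):
    """List the columns of a table, in definition order."""
    return [row[1] for row in connection.execute(f"PRAGMA table_info({table})")]


# CREATE: define the table structure.
conn.execute("CREATE TABLE STUDENT (STUDENT_ID CHAR(10), STUDENT_NAME CHAR(10))")
# ALTER: add a column to the existing table.
conn.execute("ALTER TABLE STUDENT ADD COLUMN ADDRESS VARCHAR(100)")
columns_after_alter = column_names(conn, "STUDENT")

conn.execute("INSERT INTO STUDENT VALUES ('S1', 'Asha', '12 Lake Rd')")
# Stand-in for TRUNCATE: remove every row but keep the structure.
conn.execute("DELETE FROM STUDENT")
rows_after_empty = conn.execute("SELECT COUNT(*) FROM STUDENT").fetchone()[0]
tables_before_drop = table_names(conn)

# DROP: remove the table itself.
conn.execute("DROP TABLE STUDENT")
tables_after_drop = table_names(conn)

print(columns_after_alter, rows_after_empty, tables_before_drop, tables_after_drop)
```

After the rows are emptied the table still exists with all three columns; after DROP it is gone entirely.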

Data Manipulation Language - DML

When we have to insert records into a table, get a specific record from a table, change or
delete a record, or perform any other action on records in the database, we need some means
of doing so. DML handles these user requests. It is used to insert, delete, update and retrieve
data in the database. Let us look at some of its commands.

Select

Select command helps to pull the records from the tables or views in the database. It either pulls
the entire data from the table/view or pulls specific records based on the condition. It can even
retrieve the data from one or more tables/view in the database.

The basic SELECT command is

SELECT * FROM table_name; -- retrieves all the rows and columns from table_name and displays them in tabular form

SELECT COLUMN1, COLUMN2, COLUMN3 FROM table_name; -- retrieves only 3 columns from table_name

SELECT t1.COLUMN1, t2.COLUMN1
FROM table_name1 t1, table_name2 t2
WHERE t1.COLUMN2 = t2.COLUMN2; -- combines rows from two tables whose COLUMN2 values match
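
The two-table form of SELECT is easier to follow with concrete rows. Below is a sketch using Python's sqlite3 module; the `student` and `marks` tables and their contents are invented for the example, and the query has the same shape as the last SELECT above, with `student_id` playing the role of COLUMN2.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE student (student_id INTEGER, student_name TEXT);
CREATE TABLE marks (student_id INTEGER, subject TEXT, obtained INTEGER);
INSERT INTO student VALUES (1, 'Asha'), (2, 'Ravi');
INSERT INTO marks VALUES (1, 'Maths', 88), (2, 'Maths', 71), (1, 'Physics', 79);
""")

# Pair each marks row with the student whose student_id matches.
joined = conn.execute("""
    SELECT s.student_name, m.subject, m.obtained
    FROM student s, marks m
    WHERE s.student_id = m.student_id
    ORDER BY s.student_name, m.subject
""").fetchall()

for row in joined:
    print(row)
```

Without the WHERE clause, every student row would be paired with every marks row, so the matching condition is what makes the result meaningful.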

Insert

Insert statement is used to insert new records into the table. The general syntax for insert is as
follows:

INSERT INTO TABLE_NAME (col1, col2, col3,...colN)
VALUES (value1, value2, value3,...valueN);

Update

The UPDATE statement is used to modify data values in a table. If the WHERE clause is omitted,
every row in the table is updated. The general syntax for UPDATE is as below:

UPDATE table_name
SET column_name1 = value1,
column_name2 = value2,
...
column_nameN = valueN
[WHERE condition];

Delete

Using Delete statement, we can delete the records in the entire table or specific record by
specifying the condition.

DELETE FROM table_name [WHERE condition];

Suppose we have to delete an employee with id 110 from Employee table. Then the delete
statement would be

DELETE FROM EMPLOYEE WHERE EMP_ID = 110;
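
The INSERT, UPDATE and DELETE statements can be run together as a quick check. This sketch uses Python's sqlite3 module; the EMPLOYEE table is taken from the example above, but its EMP_NAME and SALARY columns and the sample rows are assumptions added for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE EMPLOYEE (EMP_ID INTEGER PRIMARY KEY, EMP_NAME TEXT, SALARY INTEGER)")

# INSERT: add two records.
conn.executemany(
    "INSERT INTO EMPLOYEE (EMP_ID, EMP_NAME, SALARY) VALUES (?, ?, ?)",
    [(100, "Asha", 4000), (110, "Ravi", 3500)],
)
# UPDATE: change one value in the row selected by the WHERE clause.
conn.execute("UPDATE EMPLOYEE SET SALARY = ? WHERE EMP_ID = ?", (4200, 100))
# DELETE: remove the employee with id 110.
conn.execute("DELETE FROM EMPLOYEE WHERE EMP_ID = 110")

remaining = conn.execute(
    "SELECT EMP_ID, EMP_NAME, SALARY FROM EMPLOYEE ORDER BY EMP_ID").fetchall()
print(remaining)
```

The `?` placeholders pass values separately from the SQL text, which is the usual way to supply data from a program.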

Data Control Language - DCL


DCL statements are used to control user access to the database and its tables, views, procedures,
functions and packages. They give different levels of access to the objects in the database.

GRANT

GRANT gives users privileges on database objects. The privileges can be
select, delete, update and insert on tables and views. On procedures, functions and
packages, the privilege granted is execute. We can grant all the privileges, or any one or
more of them, on an object. The syntax of GRANT is as below:

GRANT privilege_name
ON object_name
TO {user_name |PUBLIC |role_name}
[WITH GRANT OPTION];

REVOKE

REVOKE removes the privileges given on database objects. We can remove all the privileges
or remove one or more privileges from the objects.

REVOKE privilege_name
ON object_name
FROM {user_name |PUBLIC |role_name};

For example, to remove the insert privilege on the STUDENT table from the user Mathew:

REVOKE INSERT ON STUDENT FROM Mathew;

Transaction Control Language - TCL


COMMIT

COMMIT saves the transaction to the database. The transaction can consist of insert, delete or
update operations. Once COMMIT is issued, the changes are saved permanently in the database
and cannot be undone with ROLLBACK.

UPDATE STUDENT SET STUDENT_NAME = 'Mathew' WHERE STUDENT_NAME = 'Mahtwe';
COMMIT;

ROLLBACK

The ROLLBACK command is used to undo an uncommitted insert, delete or update transaction in
the database. It undoes the changes made to the table and restores the previously stored values.

UPDATE STUDENT SET STUDENT_NAME = 'Stewart' WHERE STUDENT_NAME = 'Mathew';
ROLLBACK;
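
The two TCL examples can be replayed with Python's sqlite3 module, where `conn.commit()` and `conn.rollback()` issue COMMIT and ROLLBACK. The STUDENT_ID value 'S1' and the `current_name` helper are assumptions added so the result can be read back.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE STUDENT (STUDENT_ID TEXT, STUDENT_NAME TEXT)")
conn.execute("INSERT INTO STUDENT VALUES ('S1', 'Mahtwe')")
conn.commit()


def current_name(connection):
    """Read back the name stored for student S1."""
    row = connection.execute(
        "SELECT STUDENT_NAME FROM STUDENT WHERE STUDENT_ID = 'S1'").fetchone()
    return row[0]


# Fix the misspelt name and make the change permanent.
conn.execute(
    "UPDATE STUDENT SET STUDENT_NAME = 'Mathew' WHERE STUDENT_NAME = 'Mahtwe'")
conn.commit()
name_after_commit = current_name(conn)

# Change the name again, then undo the uncommitted change.
conn.execute(
    "UPDATE STUDENT SET STUDENT_NAME = 'Stewart' WHERE STUDENT_NAME = 'Mathew'")
name_before_rollback = current_name(conn)
conn.rollback()
name_after_rollback = current_name(conn)

print(name_after_commit, name_before_rollback, name_after_rollback)
```

The rollback restores the last committed value, so the committed correction survives while the later change does not.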
