
Question Bank for Database Management Techniques

What are the Drawbacks of traditional file systems for large data storage?
A file processing system works well only when the number of files is small and the data volume is low. As the data and files in
the system grow, handling them becomes difficult.

Data Mapping and Access: - Although related information is grouped and stored in different files, there
is no mapping between any two files; that is, two dependent files are not linked. Even though the Student file and
the Student_Report file are related, they are two separate files and are not linked in any way. Hence, if we
need to display a student's details along with his report, we cannot pick them directly from those two files. We have to write
a lengthy program that first searches the Student file for his details, then goes to the Student_Report file and searches for his report.

When there is a huge amount of data, searching a file system for a particular piece of information
is always time-consuming and inefficient.
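The "lengthy program" can be sketched in a few lines of Python: with no mapping between the two files, the join logic has to be written by hand, scanning each file separately. (File contents and field names here are hypothetical, and the files are simulated in memory.)

```python
# A minimal sketch of the problem: Student and Student_Report live in
# two unlinked flat files, so combining them means scanning both by hand.
import csv
import io

# Two unrelated flat files (simulated in memory for the example).
student_file = "roll,name\n1,Asha\n2,Ravi\n"
report_file = "roll,grade\n1,A\n2,B\n"

def student_with_report(roll):
    """Scan the Student file, then scan the Student_Report file."""
    student = next(r for r in csv.DictReader(io.StringIO(student_file))
                   if int(r["roll"]) == roll)
    report = next(r for r in csv.DictReader(io.StringIO(report_file))
                  if int(r["roll"]) == roll)
    return {**student, **report}   # the "join" we had to code ourselves

print(student_with_report(2))  # {'roll': '2', 'name': 'Ravi', 'grade': 'B'}
```

In a DBMS, the same result is a single join query; here every new report needs a fresh hand-written scan.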

Data Redundancy: - A file system has no mechanism to prevent the insertion of duplicate data. Any user can
enter any data: the file system validates neither the kind of data being entered nor whether the same data already
exists in the same file. Duplicate data wastes space and leads to confusion and mishandling of data. If there are
duplicate records in a file and we need to update or delete one, we might end up updating/deleting only one of them,
leaving the other in the file. Again, the file system does not check this, so the purpose of storing the data is defeated.

Though the file is named Student, nothing prevents staff information or report information from being entered into
it. A file system allows any information to be entered into any file; it does not isolate the data being entered
from the group it belongs to.

Data Dependence: - In files, data is stored in a specific format, say tab-, comma- or semicolon-separated. If the format of
a file is changed, every program that processes that file must be changed as well. There may be many such
programs, so we need to know in advance every program that uses the file and change them all; missing even one
place will break the whole application. Similarly, changes in the storage structure, or in the way the data is
accessed, affect every place the file is used. In short, the smallest change in the file affects all the programs and requires changes in all of them.
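The coupling can be sketched in a few lines: a reader that hard-codes the comma format fails as soon as the file switches to tabs, and every program like it has to change. (The record layout here is hypothetical.)

```python
# Sketch: every program that reads the file hard-codes its delimiter,
# so a format change breaks them all at once.
def read_record_csv(line):
    name, age = line.split(",")        # assumes comma-separated format
    return name, int(age)

line_v1 = "Asha,20"                    # original comma format
line_v2 = "Asha\t20"                   # format changed to tabs

print(read_record_csv(line_v1))        # works: ('Asha', 20)
try:
    read_record_csv(line_v2)           # every such program now fails
except ValueError as e:
    print("parser broke:", e)
```

A DBMS hides the storage format behind the schema, so this class of breakage disappears.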

Data inconsistency: - Imagine the Student and Student_Report files both hold a student's address, and there is a
change request for one particular student's address. A program searches only the Student file and updates the
address correctly there. Another program prints the student's report and mails it to the address mentioned
in the Student_Report file. What happens to the report of the student whose address was changed?
There is a mismatch between the actual address and the one on file, and his report is sent to his old address. This mismatch between different copies
of the same data is called data inconsistency. It occurred here because there is no proper record of which files
hold copies of the same data.

Data Isolation: - Imagine we have to generate a single report for a student: the class he studies in, his
study report, his library book details, and his hostel information. All this information is stored in different files.
How do we get all these details into one report? We have to write a program. But before writing the program, the
programmer must find out which files hold the needed information, what the format of each file is, how to
search data in each file, and so on. Only once all this analysis is done can he write the program. With 2-3 files involved,
the programming is fairly simple; with many files involved, it requires a great deal of effort
from the programmer. Since all the data is isolated in different files, programming becomes
difficult.
Security: - Each file can be password protected. But what if we have to give access to only a few records in the file? For
example, a user should be able to view only his own bank account information in the file. This is very difficult in
a file system.

Integrity: - If we need to enforce certain insertion criteria while entering data into a file, it is not possible
directly; we can only do it by writing programs. Say we have to restrict enrolment to students above age 18: that can be
done only in program code, as there is no built-in checking facility in the file system. Hence such integrity checks are not
easy in a file system.
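For contrast, a DBMS performs such a check declaratively. A minimal sketch with SQLite, using a CHECK constraint for the age-18 rule (table and column names are illustrative):

```python
# Sketch: the same integrity check done declaratively in a DBMS.
# A CHECK constraint rejects under-age rows with no application code.
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE student (name TEXT, age INTEGER CHECK (age >= 18))")
con.execute("INSERT INTO student VALUES ('Asha', 21)")      # accepted
try:
    con.execute("INSERT INTO student VALUES ('Ravi', 16)")  # rejected by the DBMS
except sqlite3.IntegrityError as e:
    print("rejected:", e)
print(con.execute("SELECT COUNT(*) FROM student").fetchone()[0])  # 1
```

The constraint lives in the schema, so every program that inserts rows is checked automatically.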

Atomicity: - If an insert, update or delete fails partway in the file system, there is no mechanism to switch
back to the previous state. Imagine marks for one particular subject need to be entered into the Report file and then
the total needs to be recalculated. Suppose the total is updated but the file is closed before the new marks are saved:
the whole required transaction has not been performed, only part of it, and the recorded total is now wrong.
Atomicity refers to completing the whole transaction or not completing it at all. Partial completion of any
transaction leads to incorrect data in the system. The file system does not guarantee atomicity. It may be achievable
with complex programs, but writing such a program for every transaction is costly.

Concurrent Access: - Accessing the same data from the same file at the same time is called concurrent access. In a file system,
concurrent access can lead to incorrect data. For example, a student wants to borrow a book from the library. He
searches for the book in the library file and sees that only one copy is available. At the same time another student
also wants to borrow the same book and sees that one copy is available. The first student opts to borrow and gets the book.
But before the count is updated to zero in the file, the second student also opts to borrow, even though no books are
available. This is the problem of concurrent access in a file system.
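The two-students scenario can be sketched directly: both decide using the stale count they read before either write happens, so one more loan is granted than there are books. (The file is simulated with an in-memory value; names are illustrative.)

```python
# Sketch of the library example: both students read "1 copy available"
# before either writes back, so the second borrow also succeeds and the
# count goes negative.
copies = {"DBMS Book": 1}

def borrow(snapshot):
    """Each student decides using the stale value they read earlier."""
    if snapshot > 0:            # both saw 1 copy, so both pass this check
        copies["DBMS Book"] -= 1
        return True
    return False

seen_by_s1 = copies["DBMS Book"]   # student 1 reads: 1 copy
seen_by_s2 = copies["DBMS Book"]   # student 2 reads the same stale value
borrow(seen_by_s1)
borrow(seen_by_s2)
print(copies["DBMS Book"])         # -1: one more loan than books
```

A DBMS prevents this lost update with locking or transaction isolation; a plain file offers neither.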

Describe levels of abstraction in database systems in detail.

A database is full of data and records. What we see as rows and columns is quite different from what reaches the memory: when
data is stored on disks or tapes it is stored in the form of bits, which no user can read directly. A user needs to see
the actual data to understand it, and he needs only the small part of it that he is interested in or wants to work
with; all the details about how the data is stored in memory are unnecessary for him. Hiding unwanted data from users
happens at different levels in the database, and this masking of data is called data abstraction. There are 4 levels of data abstraction.

External level - This is the highest level of data abstraction. At this level users see the data in the form of rows and
columns; it shows users the data in terms of tables and relations. Users view full or
partial data based on the business requirement, and different users have different views here based on their levels of
access rights. For example, a student will not have access to see lecturers' salary details, and one employee will not have
access to another employee's details unless he is a manager. At this level, one can also access the data from the database
and perform calculations on it: for example, calculating the tax from an employee's salary,
the CGPA of a student, or the age of a person from his date of birth. These users can be real users or
programs.

Any changes or computations done at this level do not affect the other levels of data. That means that retrieving a few
columns of the STUDENT table does not change the whole table, and calculating a student's CGPA does not
change or update the table. This level is built on the levels below it, but it does not alter the data at those
levels.

Logical/Conceptual level - This is the next level of abstraction. It describes the actual data stored in the database
in the form of tables and relates them by means of mappings. This level has no information about what a
user views at the external level, but it contains all the data in the database.

Any changes made at this level do not affect the external or physical levels of data. That is, a change to a
table structure or relation does not modify what the user is viewing at the external level or how the data is stored at
the physical level. For example, suppose we add a new column 'Skills': this does not modify an external
view in which the user was viewing the ages of students. Similarly, space will be allocated for 'Skills' in
physical memory, but the space or address of Date of Birth (from which Age is derived) is not modified
in memory. Hence external and physical independence are achieved.

Internal level - This is an intermediary level. In most descriptions this level is not mentioned separately, and it
is usually said that there are 3 levels of data abstraction. This level depends on the DBMS software: it is how the
database is seen from within the DBMS. It can even be combined with the logical level.

Physical level - This is the lowest level of data abstraction. It describes how the data is actually stored in
physical storage such as magnetic tapes and hard disks. At this level, file organization methods like hashing,
sequential files and B+ trees come into the picture. The developer knows the requirements, the size and the access
frequency of the records clearly, so designing this level is not very complex for him.
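One concrete way the external level is realized is through views: a view exposes only the columns a user is allowed to see, while the conceptual table underneath keeps all the data. A minimal sketch with SQLite (table and column names are illustrative):

```python
# Sketch of the external level: a view masks the salary column from
# students while the underlying table keeps everything.
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE lecturer (name TEXT, dept TEXT, salary INTEGER)")
con.execute("INSERT INTO lecturer VALUES ('Dr. Rao', 'CS', 90000)")

# External view for students: salary is masked out entirely.
con.execute("CREATE VIEW lecturer_public AS SELECT name, dept FROM lecturer")

print(con.execute("SELECT * FROM lecturer_public").fetchall())
# [('Dr. Rao', 'CS')] - the salary column simply does not exist here
```

Querying the view does not alter the table beneath it, which is exactly the independence between levels described above.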

What are the Schema and Instance? Explain in detail.

Instances in DBMS
In simple words, an instance is a snapshot of the database taken at a particular moment. It can also be described
more precisely as the collection of information stored in the database at that particular moment.
An instance is also called the database state or the current set of occurrences, since it is the information
present in the current state.
Every time we update the state - insert, delete or modify the value of a data item in a record - the database
changes from one state to another. At any given time, each schema has its own set of instances.
Let's take an example to understand this better:
an organization with an employees database might keep three different instances: production, used
to monitor the live data; pre-production, used to test new functionality prior to
release to production; and development, used by database developers to create new functionality.

Schema in DBMS
A schema is the overall description or design of the database, specified during database design.
An important point to remember is that it should not be changed frequently. Basically, it lists the
record types (entities) and the names of data items (attributes), but not the relations among the files.
Interestingly, the values in a schema may change, but not the structure of the schema.

To understand this well, a schema can be thought of as a framework into which the values of data items are
fitted; these values can change, but not the frame/format of the schema.
Consider the two example schemas below, for stores and discounts:

STORES (store_name, store_id, store_address, city, state, zip_code)

DISCOUNTS (discount_type, store_id, lowqty, highqty, discount)

The former example shows the schema for stores, listing the name of the store, the store id, its address, the city and
state in which it is located, and the zip code of the location.

The latter example shows the schema for discounts, listing the discount type, store id and quantity range. Thus we can
see that a schema shows only the record types (entities) and the names of data
items (attributes), but does not show the relations among the files.
A schema can be partitioned into a logical schema and a physical schema.
The logical schema is concerned with the data structures offered to the DBMS, so that the schema is
easy for the computer to understand. The physical schema is concerned with the way or manner in which the conceptual
database is actually represented and stored in the computer. The physical schema is hidden behind the
logical schema and can therefore be modified without affecting the application programs.

A database management system provides a data definition language (DDL) and a data storage definition
language (DSDL) to specify the logical and physical schemas respectively.

Sub schema in DBMS


A subschema can be defined as a subset of the schema that has the same properties as the schema itself. In simple
words, it is the plan or schema for a view. It is interesting to note that it provides a
user a window through which he can view only the part of the database that is of interest to
him. It identifies the subset of areas, sets, records and data names defined in the database that concern him. Thus
a portion of the database can be seen by each application program, and different application programs have different
views of the data.

To summarize quickly: the information/data in the database at a particular moment is the instance; the overall
arrangement of the data as it appears in the database is the schema; and the logical
view of the data as it appears to a particular application is the subschema.
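The distinction can be sketched with SQLite: the schema reported for a table stays fixed while its instance (the rows at a moment) changes with every update. Column names follow the STORES example above; this is an illustrative sketch.

```python
# Sketch: the schema (structure) stays fixed while the instance (the
# current rows) changes with every insert, delete or update.
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE stores (store_name TEXT, store_id INTEGER)")

def schema():
    # Column names from the catalog: the schema.
    return [c[1] for c in con.execute("PRAGMA table_info(stores)")]

def instance():
    # The rows present right now: the instance.
    return con.execute("SELECT * FROM stores").fetchall()

before = (schema(), instance())
con.execute("INSERT INTO stores VALUES ('Main St', 1)")   # state change
after = (schema(), instance())

print(before[0] == after[0])          # True: schema unchanged
print(before[1], "->", after[1])      # instance changed
```

The insert moved the database from one state (instance) to another without touching the schema at all.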

State and explain any 6 out of 12 of Codd's Rules for an RDBMS.

Rules
Rule 0: The foundation rule:
For any system that is advertised as, or claimed to be, a relational data base management system, that system must be
able to manage data bases entirely through its relational capabilities.
Rule 1: The information rule:
All information in a relational data base is represented explicitly at the logical level and in exactly one way – by values
in tables.
Rule 2: The guaranteed access rule:
Each and every datum (atomic value) in a relational data base is guaranteed to be logically accessible by resorting to a
combination of table name, primary key value and column name.
Rule 3: Systematic treatment of null values:
Null values (distinct from the empty character string or a string of blank characters and distinct from zero or any other
number) are supported in fully relational DBMS for representing missing information and inapplicable information in a
systematic way, independent of data type.
Rule 4: Dynamic online catalog based on the relational model:
The data base description is represented at the logical level in the same way as ordinary data, so that authorized users
can apply the same relational language to its interrogation as they apply to the regular data.
Rule 5: The comprehensive data sublanguage rule:
A relational system may support several languages and various modes of terminal use (for example, the fill-in-the-
blanks mode). However, there must be at least one language whose statements are expressible, per some well-defined
syntax, as character strings and that is comprehensive in supporting all of the following items:
1. Data definition.
2. View definition.
3. Data manipulation (interactive and by program).
4. Integrity constraints.
5. Authorization.
6. Transaction boundaries (begin, commit and rollback).
Rule 6: The view updating rule:
All views that are theoretically updatable are also updatable by the system.
Rule 7: High-level insert, update, and delete:
The capability of handling a base relation or a derived relation as a single operand applies not only to the retrieval of
data but also to the insertion, update and deletion of data.
Rule 8: Physical data independence:
Application programs and terminal activities remain logically unimpaired whenever any changes are made in either
storage representations or access methods.
Rule 9: Logical data independence:
Application programs and terminal activities remain logically unimpaired when information-preserving changes of any
kind that theoretically permit unimpairment are made to the base tables.
Rule 10: Integrity independence:
Integrity constraints specific to a particular relational data base must be definable in the relational data sublanguage and
storable in the catalog, not in the application programs.
Rule 11: Distribution independence:
The end-user must not be able to see that the data is distributed over various locations. Users should always get the
impression that the data is located at one site only.
Rule 12: The non subversion rule:
If a relational system has a low-level (single-record-at-a-time) language, that low level cannot be used to subvert or
bypass the integrity rules and constraints expressed in the higher level relational language (multiple-records-at-a-time).

Explain ACID properties for applied Database Systems.

ACID Properties in DBMS


A transaction is a single logical unit of work which accesses and possibly modifies the contents of a database. Transactions
access data using read and write operations.
To maintain consistency in a database before and after each transaction, certain properties are followed. These are called the ACID
properties.

Atomicity

By this, we mean that either the entire transaction takes place at once or it doesn't happen at all. There is no midway, i.e. transactions
do not occur partially. Each transaction is considered as one unit and either runs to completion or is not executed at all. It involves
the following two operations.

—Abort: If a transaction aborts, changes made to database are not visible.

—Commit: If a transaction commits, changes made are visible.

Atomicity is also known as the ‘All or nothing rule’.

Consider a transaction T consisting of two operations, T1 and T2: a transfer of 100 from account X to account Y, where T1 deducts
100 from X and T2 adds 100 to Y. If the transaction fails after T1 completes but before T2 completes (say, after write(X) but before
write(Y)), then the amount has been deducted from X but not added to Y. This leaves the database in an inconsistent state.
Therefore, the transaction must be executed in its entirety to ensure the correctness of the database state.
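The transfer can be sketched with SQLite, whose rollback supplies exactly the "switch back to the previous state" that atomicity demands. The crash between T1 and T2 is simulated; balances follow the example (X = 500, Y = 200).

```python
# Sketch: a transfer of 100 from X to Y that fails between T1 and T2.
# Rollback undoes T1 as well, so the transfer happens "not at all".
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE account (name TEXT PRIMARY KEY, balance INTEGER)")
con.executemany("INSERT INTO account VALUES (?, ?)", [("X", 500), ("Y", 200)])
con.commit()

con.execute("UPDATE account SET balance = balance - 100 WHERE name = 'X'")  # T1: write(X)
# Suppose the system fails here, before T2 (write(Y)) ever runs.
con.rollback()   # recovery: atomicity undoes T1 too

print(con.execute("SELECT name, balance FROM account ORDER BY name").fetchall())
# [('X', 500), ('Y', 200)] - total is still 700, nothing half-done
```

A file system offers no equivalent of this rollback; once the partial write happens, it stays.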

Consistency

This means that integrity constraints must be maintained so that the database is consistent before and after the transaction. It refers
to correctness of a database. Referring to the example above,

The total amount before and after the transaction must be maintained.

Total before T occurs = 500 + 200 = 700.

Total after T occurs = 400 + 300 = 700.

Therefore, the database is consistent. Inconsistency occurs if T1 completes but T2 fails, leaving T incomplete.

Isolation

This property ensures that multiple transactions can occur concurrently without leading to an inconsistent database state.
Transactions occur independently, without interference. Changes made by a particular transaction are not visible to any
other transaction until that change has been written to memory or committed. This property ensures
that concurrent execution of transactions results in a state equivalent to one achieved had they been executed
serially in some order.

Let X = 50,000 and Y = 500.

Consider two transactions T and T''. T transfers 50 from X to Y; T'' simply reads both values and computes X + Y.

Suppose T has executed up to write(X) (so X is now 49,950) when T'' starts. As a result of this interleaving of
operations, T'' reads the new value of X but the old value of Y, and the sum computed by
T'': (X + Y = 49,950 + 500 = 50,450)

is thus not consistent with the true total that holds both before and after T:

T: (X + Y = 50,000 + 500 = 50,500).

The apparent loss of 50 units exists only because T'' observed an intermediate state. Hence, transactions must take place in
isolation, and changes should be visible to other transactions only after they have been written to main memory (committed).

Durability:

This property ensures that once a transaction has completed execution, its updates and modifications to the database are
written to disk and persist even if a system failure occurs. These updates become permanent, stored in
non-volatile memory, so the effects of the transaction are never lost.

The ACID properties, in totality, provide a mechanism to ensure the correctness and consistency of a database: each
transaction is a group of operations that acts as a single unit, produces consistent results, acts in isolation from other
operations, and makes updates that are durably stored.


What are different types of Data Models available for expressing data in DBMS?
PDF file

Explain what DDL is with suitable example.


Data Definition Language (DDL) in DBMS with examples: Data Definition Language can be defined as a
standard for commands through which data structures are defined. It is a computer language used for
creating and modifying the structure of database objects, such as schemas, tables, views, indexes, etc.
Additionally, it stores the metadata details in the database.
Some of the common Data Definition Language commands are:

● CREATE
● ALTER
● DROP
● TRUNCATE

1. CREATE- Data Definition language(DDL)


The main use of the CREATE command is to build a new table, and it comes with a predefined syntax. It creates
a component in a relational database management system. Many implementations extend the
syntax of the command to create additional elements, such as user profiles and indexes.
The general syntax of the CREATE command in Data Definition Language is:
CREATE TABLE table_name (Column1 DATATYPE, Column2 DATATYPE, Column3 DATATYPE, ..., ColumnN
DATATYPE);
For Example

CREATE TABLE PUPIL (PUPIL_ID CHAR(10), PUPIL_NAME CHAR(10));

This DDL statement creates a PUPIL table with an ID and a name.

Generally, the data types used when creating a table include strings and dates. Every system varies
in how the data type is specified.

2. ALTER- Data Definition language(DDL)

An existing database object can be modified with the ALTER statement. Using this command, users can
add additional columns and drop existing ones. Additionally, the data type of a column
in a database table can be changed with the ALTER command.
The general syntax of the ALTER command is:
ALTER TABLE table_name ADD column_name datatype; (to add a new column)

ALTER TABLE table_name RENAME TO new_table_name; (to rename a table)

ALTER TABLE table_name MODIFY column_name datatype; (to modify a column)

ALTER TABLE table_name DROP COLUMN column_name; (to delete a column)

For example, to add a column to the PUPIL table:
ALTER TABLE PUPIL ADD PHONE_NUMBER VARCHAR(10);

Before adding the column:

PUPIL_ID   PUPIL_NAME
97         Albert
98         Sameer

After adding the column:

PUPIL_ID   PUPIL_NAME   PHONE_NUMBER
97         Albert
98         Sameer

3. DROP- Data Definition language(DDL)


With this command, users can delete an index, table or view. A DROP statement in SQL removes a component from a
relational database management system. Many systems allow DROP and some other Data Definition Language
commands to occur inside a transaction, which can then be rolled back.
The object is no longer available for use once the DROP statement has executed.
The general syntax of the DROP command is:
DROP TABLE table_name;

DROP DATABASE database_name;

DROP INDEX index_name;

For example:
DROP TABLE Student;

4. TRUNCATE- Data Definition language(DDL)

Using the TRUNCATE command, users can remove the table's contents while the structure of the table is
kept. In simple language, it removes all the records but leaves the table structure intact. Users cannot remove data
partially with this command. In addition, all the space allocated for the data is released by the
TRUNCATE command.
The syntax of the Truncate command is mentioned below:
TRUNCATE TABLE table_name;

TRUNCATE TABLE Student;
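The commands above can be tried end to end. A minimal sketch using Python's sqlite3 (SQLite has no TRUNCATE statement, so an unqualified DELETE stands in for it; table and column names are illustrative):

```python
# DDL commands sketched against SQLite: CREATE, ALTER, a TRUNCATE
# equivalent, and DROP.
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE pupil (pupil_id TEXT, pupil_name TEXT)")   # CREATE
con.execute("ALTER TABLE pupil ADD COLUMN phone_number TEXT")        # ALTER
con.execute("INSERT INTO pupil VALUES ('97', 'Albert', NULL)")

con.execute("DELETE FROM pupil")   # TRUNCATE stand-in: rows gone, structure kept
print(con.execute("SELECT COUNT(*) FROM pupil").fetchone())          # (0,)

con.execute("DROP TABLE pupil")                                      # DROP
```

After the DROP, the table no longer exists in the catalog at all, unlike the TRUNCATE step, which kept the empty structure.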

Explain what DML is with suitable example.


Data Manipulation Language (DML) with examples in DBMS: Data Manipulation Language
(DML) can be defined as a set of syntax elements used to manage the data in the database. DML
commands are not auto-committed, and modifications made by them are not permanent until committed. It is the part of
the language used to select, insert, delete and update data in a database. User requests are served by
the Data Manipulation Language, which is responsible for all forms of data modification in a database.

Types of Data Manipulation Languages

1. Procedural DML
The user specifies both what data is required and how to get it.

2. Declarative (non-procedural) DML
The user specifies only what data is required.

Commands
The DML section of SQL consists of the following sets of commands:

● Select/ From/ Where


● Insert/ Into/ Values
● Update/ Set/ Where
● Delete/ From/ Where
These commands allow database programmers and users to enter data and information into the
database and then retrieve it using any number of filter options.

1. Select/ From/ Where


● Select
One of the basic query commands in SQL, it works in the same way as the projection
operation of relational algebra: attributes are selected on the basis of the condition
defined by the WHERE clause.

● From
This clause takes a relation name as an argument, from which attributes are to be projected or selected.

● Where
This clause defines the predicates, i.e. the conditions that must hold for an attribute value to qualify for
projection.
For example:
SELECT author_name
FROM book_set
WHERE age > 40;
This command yields the names of the authors from the relation book_set whose age is greater
than 40.

2. Insert/ Into/ Values


Using this command, programmers can insert rows into a table. The general syntax for this
command is:

INSERT INTO table_name (column1, column2, column3, ..., columnN)
VALUES (value1, value2, value3, ..., valueN);

For example, consider a table Pupil with three fields:

INSERT INTO Pupil VALUES (78, 'Nicole', 8);

The above command inserts a record into the Pupil table.

3. Update/ Set/ Where


The values of columns in a table can be modified or updated with this command. The general syntax for
this command is:

UPDATE table_name
SET column_name1 = value1,
    column_name2 = value2,
    ...
    column_nameN = valueN
WHERE condition;

For example:
UPDATE tutorialspoint SET Author = 'webmaster' WHERE Author = 'anonymous';

4. Delete/ From/ Where


Programmers can remove one or more rows from a table with this command. The general syntax of
this command is:
DELETE FROM table_name WHERE condition;
For example, consider the Pupil table again:

DELETE FROM Pupil WHERE P_id = 80;

The above command deletes the record whose P_id is 80 from the Pupil table.
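The four command groups above can be run end to end. A minimal sketch using Python's sqlite3, with illustrative table and data:

```python
# DML commands run end to end on SQLite: INSERT, UPDATE, DELETE, SELECT.
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE pupil (p_id INTEGER, p_name TEXT, grade INTEGER)")

con.execute("INSERT INTO pupil (p_id, p_name, grade) VALUES (78, 'Nicole', 8)")  # INSERT
con.execute("INSERT INTO pupil (p_id, p_name, grade) VALUES (80, 'Sameer', 7)")
con.execute("UPDATE pupil SET grade = 9 WHERE p_id = 78")                        # UPDATE
con.execute("DELETE FROM pupil WHERE p_id = 80")                                 # DELETE

print(con.execute("SELECT p_name, grade FROM pupil").fetchall())                 # SELECT
# [('Nicole', 9)]
```

Note that each statement changes only the rows (the instance), never the table structure; structural changes are the job of DDL.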


Explain significance of Database Design. Discuss Logical and Physical types of Database Design.

What are Design approaches in DBMS?

Database Approach
In order to remove the limitations of the file-based approach, a new and more effective approach was required,
known as the database approach.
A database is a shared collection of logically related data, designed to meet the information needs of an
organization. It is a computer-based record-keeping system whose overall purpose is to record and
maintain information. The database is a single, large repository of data, which can be used simultaneously
by many departments and users. Instead of disconnected files with redundant data, all data items are
integrated with a minimum amount of duplication.
The database is no longer owned by one department but is a shared corporate resource. The database holds
not only the organization's operational data but also a description of this data. For this reason, a database is
also defined as a self-describing collection of integrated records. The description of the data is known as the
Data Dictionary or Meta Data (the 'data about data'). It is the self-describing nature of a database that
provides program-data independence.
A database implies separation of physical storage from use of the data by an application program to achieve
program/data independence. Using a database system, the user or programmer or application specialist need
not know the details of how the data are stored and such details are "transparent to the user". Changes (or
updating) can be made to data without affecting other components of the system. These changes include, for
example, change of data format or file structure or relocation from one device to another.
In the DBMS approach, application programs written in programming languages such as Java, Visual
Basic.NET or Developer 2000 use database connectivity to access the database stored on disk, with
the help of the operating system's file management system.

Comment on the Entity Relationship Model with the help of a suitable example. Explain Entity, Relationship and its
attributes, Roles, Cardinality constraints, Total and Partial Participation, and Weak entity sets with good
examples.

Database Management System | ER Model


The ER Model is used to model the logical view of the system from a data perspective, and consists of these components:
Entity, Entity Type, Entity Set –

An Entity may be an object with a physical existence – a particular person, car, house, or employee – or it may be an object with a
conceptual existence – a company, a job, or a university course.
An Entity is an object of an Entity Type, and the set of all entities of that type is called an Entity Set. E.g., E1 is an entity of Entity Type Student,
and the set of all students is the Entity Set. In an ER diagram, an Entity Type is represented by a rectangle.
Attribute(s):

Attributes are the properties which define the entity type. For example, Roll_No, Name, DOB, Age, Address and Mobile_No are
the attributes which define the entity type Student. In an ER diagram, an attribute is represented by an oval.

Key Attribute –
The attribute which uniquely identifies each entity in the entity set is called the key attribute. For example, Roll_No
will be unique for each student. In an ER diagram, a key attribute is represented by an oval with its name underlined.

Composite Attribute –
An attribute composed of several other attributes is called a composite attribute. For example, the Address attribute of
the Student entity type consists of Street, City, State, and Country. In an ER diagram, a composite attribute is represented by an
oval connected to its component ovals.

Multivalued Attribute –
An attribute that can hold more than one value for a given entity. For example, Phone_No (a student can have more than
one). In an ER diagram, a multivalued attribute is represented by a double oval.

Derived Attribute –
An attribute which can be derived from other attributes of the entity type is known as a derived attribute, e.g.
Age (which can be derived from DOB). In an ER diagram, a derived attribute is represented by a dashed oval.

The complete entity type Student can then be drawn with all of these attribute ovals attached to it.
Relationship Type and Relationship Set:

A relationship type represents an association between entity types. For example, 'Enrolled in' is a relationship type that exists
between the entity types Student and Course. In an ER diagram, a relationship type is represented by a diamond connected to the entities
with lines.

A set of relationships of the same type is known as a relationship set. For example, a relationship set might record that S1 is enrolled in C2, S2 is
enrolled in C1, and S3 is enrolled in C3.

Degree of a relationship set:


The number of different entity sets participating in a relationship set is called the degree of the relationship set.
Unary Relationship –
When only ONE entity set participates in a relationship, the relationship is called a unary relationship.
For example, a person is married to another person, both drawn from the same Person entity set.

Binary Relationship –
When TWO entity sets participate in a relationship, the relationship is called a binary relationship. For
example, Student is enrolled in Course.
n-ary Relationship –
When n entity sets participate in a relationship, the relationship is called an n-ary relationship.

Cardinality:

The number of times an entity of an entity set participates in a relationship set is known as cardinality. Cardinality can be of
different types:

One to one – When each entity in each entity set can take part only once in the relationship, the cardinality is one to one. Let us
assume that a male can marry one female and a female can marry one male. So the relationship will be one to one.

Using Sets, it can be represented as:

Many to one – When entities in one entity set can take part only once in the relationship set and entities in the other entity set
can take part more than once in the relationship set, the cardinality is many to one. Let us assume that a student can take only one
course but one course can be taken by many students. So the cardinality will be n to 1: for one course there can be n
students, but for one student there will be only one course.

Using Sets, it can be represented as:


In this case, each student is taking only 1 course but 1 course has been taken by many students.
Many to many – When entities in all entity sets can take part more than once in the relationship, the cardinality is
many to many. Let us assume that a student can take more than one course and one course can be taken by many
students. So the relationship will be many to many.

Using sets, it can be represented as:

In this example, student S1 is enrolled in C1 and C3, and course C3 is taken by S1, S3 and S4, so it is a many to
many relationship.
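The set representation of the ‘Enrolled in’ relationship can be sketched in code. The pairs below follow the example above (S1 in C1 and C3; C3 taken by S1, S3 and S4); all names are illustrative.

```python
# The relationship set 'Enrolled in' as a set of (student, course) pairs.
enrolled = {("S1", "C1"), ("S1", "C3"), ("S3", "C3"), ("S4", "C3")}

def courses_of(student):
    """All courses a given student is enrolled in."""
    return {c for s, c in enrolled if s == student}

def students_of(course):
    """All students enrolled in a given course."""
    return {s for s, c in enrolled if c == course}

# S1 takes two courses and C3 is taken by three students,
# so the cardinality of this relationship set is many to many.
print(sorted(courses_of("S1")))   # ['C1', 'C3']
print(sorted(students_of("C3")))  # ['S1', 'S3', 'S4']
```

Because a student can map to several courses and a course to several students, neither direction is a function, which is exactly what “many to many” means.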
Participation Constraint:

Participation Constraint is applied on the entity participating in the relationship set.

Total Participation – Each entity in the entity set must participate in the relationship. If every student must enroll in a
course, the participation of Student will be total. Total participation is shown by a double line in the ER diagram.
Partial Participation – An entity in the entity set may or may NOT participate in the relationship. If some courses
are not enrolled in by any student, the participation of Course will be partial.
The diagram depicts the ‘Enrolled in’ relationship set with Student Entity set having total participation and Course
Entity set having partial participation.

Using set, it can be represented as,

Every student in Student Entity set is participating in relationship but there exists a course C4 which is not taking part
in the relationship.
Weak Entity Type and Identifying Relationship:

As discussed before, an entity type has a key attribute which uniquely identifies each entity in the entity set. But there exist some
entity types for which a key attribute cannot be defined. These are called Weak Entity types.

For example, a company may store the information of dependents (Parents, Children, Spouse) of an Employee. But the
dependents have no existence without the employee. So Dependent will be the weak entity type and Employee will be the Identifying
Entity type for Dependent.

A weak entity type is represented by a double rectangle. The participation of weak entity type is always total. The relationship
between weak entity type and its identifying strong entity type is called identifying relationship and it is represented by double
diamond.
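One way to see the identifying relationship in practice is to sketch it in SQL, here via Python’s sqlite3 (all table and column names are illustrative, not from a real schema). The weak entity Dependent gets a composite primary key built from the owner’s key plus its partial key, and a foreign key ensures a dependent cannot exist without its identifying employee:

```python
import sqlite3

# Hypothetical sketch of the Employee / Dependent example above.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # enable FK enforcement in SQLite
conn.execute("CREATE TABLE Employee (EmpId TEXT PRIMARY KEY, Name TEXT)")
conn.execute("""
    CREATE TABLE Dependent (
        EmpId    TEXT NOT NULL REFERENCES Employee(EmpId),
        DepName  TEXT NOT NULL,          -- partial key of the weak entity
        Relation TEXT,
        PRIMARY KEY (EmpId, DepName)     -- owner's key + partial key
    )""")
conn.execute("INSERT INTO Employee VALUES ('E1', 'ABC')")
conn.execute("INSERT INTO Dependent VALUES ('E1', 'Riya', 'Child')")

# A dependent cannot exist without its identifying employee:
try:
    conn.execute("INSERT INTO Dependent VALUES ('E9', 'Sam', 'Spouse')")
except sqlite3.IntegrityError as e:
    print("rejected:", e)
```

The composite primary key mirrors the fact that a weak entity is identified only in combination with its owner, and the rejected insert mirrors its total participation in the identifying relationship.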

Comment on Normalization Theory Model with the help of suitable example.


Introduction to Normalization
● Normalization is a process of organizing the data in the database.
● It is a systematic approach of decomposing tables to eliminate data redundancy.
● It was developed by E. F. Codd.
● Normalization is a multi-step process that puts the data into a tabular form by removing the duplicate
data from the relation tables.
● It is a step by step decomposition of complex records into simple records.
● It is also called canonical synthesis.
● It is the technique of building database structures to store data.
Definition of Normalization
“Normalization is a process of designing a consistent database by minimizing redundancy and ensuring
data integrity through decomposition which is lossless.”
Features of Normalization
● Normalization avoids the data redundancy.
● It is a formal process of developing data structures.
● It promotes the data integrity.
● It ensures data dependencies make sense, i.e. data is logically stored.
● It eliminates undesirable characteristics such as insertion, update and deletion anomalies.

Types of Normalization
Following are the types of Normalization:
1. First Normal Form (1NF)
2. Second Normal Form (2NF)
3. Third Normal Form (3NF)
4. BCNF (Boyce–Codd Normal Form)
5. Fourth Normal Form (4NF)
6. Fifth Normal Form (5NF)
7. DKNF (Domain Key Normal Form)
1. First Normal Form (1NF)

● First Normal Form (1NF) is the simplest form of normalization.

● It requires every attribute in a relation to hold a simple, indivisible value.

● In 1NF, there should not be any repeating groups of data.

● Every cell of the table must hold a single, atomic value; a cell cannot hold multiple values.
Example: Employee Table

ECode  Employee_Name  Department_Name
1      ABC            Sales, Production
2      PQR            Human Resource
3      XYZ            Quality Assurance, Marketing

Employee Table using 1NF

ECode  Employee_Name  Department_Name
1      ABC            Sales
1      ABC            Production
2      PQR            Human Resource
3      XYZ            Quality Assurance
3      XYZ            Marketing
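The same 1NF split can be sketched in a few lines of Python: the comma-separated Department_Name values are flattened so each row holds a single atomic value (data as in the table above).

```python
# Unnormalized rows: Department_Name holds multiple values per cell.
unnormalized = [
    (1, "ABC", "Sales, Production"),
    (2, "PQR", "Human Resource"),
    (3, "XYZ", "Quality Assurance, Marketing"),
]

# 1NF: one atomic department value per row.
first_nf = [
    (code, name, dept.strip())
    for code, name, depts in unnormalized
    for dept in depts.split(",")
]

for row in first_nf:
    print(row)  # five rows, e.g. (1, 'ABC', 'Sales'), (1, 'ABC', 'Production'), ...
```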

2. Second Normal Form (2NF)

● A table in 2NF must already be in 1NF.

● The main rule of 2NF is, 'No non-prime attribute is dependent on the proper subset of any candidate key
of the table.'

● An attribute which is not part of candidate key is known as non-prime attribute.

● Example: Employee Table in 1NF

ECode  Employee_Name  Employee_Age
1      ABC            38
1      ABC            38
2      PQR            38
3      XYZ            40
3      XYZ            40


● Candidate Key: {ECode, Employee_Name}

● Non prime attribute: Employee_Age

● The above table is in 1NF: each attribute has atomic values. However, it is not in 2NF, because the non-prime
attribute Employee_Age is dependent on ECode alone, which is a proper subset of the candidate key.
This violates the rule for 2NF, which says 'No non-prime attribute is dependent on a proper subset
of any candidate key of the table'.

● 2NF (Second Normal Form): Employee1 Table

ECode  Employee_Age
1      38
2      38
3      40

● Employee2 Table

ECode  Employee_Name
1      ABC
2      PQR
3      XYZ

● Now, the above tables comply with the Second Normal Form (2NF).
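The 2NF decomposition is simply a projection of the 1NF table onto {ECode, Employee_Age} and {ECode, Employee_Name}. A rough sketch in Python, keying each projection by ECode (which also removes the duplicate rows):

```python
# The 1NF Employee table from the example, including its duplicate rows.
employee_1nf = [
    (1, "ABC", 38), (1, "ABC", 38),
    (2, "PQR", 38),
    (3, "XYZ", 40), (3, "XYZ", 40),
]

# 2NF projections: each non-prime attribute now depends on ECode alone.
employee1 = {code: age for code, name, age in employee_1nf}   # ECode -> Age
employee2 = {code: name for code, name, age in employee_1nf}  # ECode -> Name

print(employee1)  # {1: 38, 2: 38, 3: 40}
print(employee2)  # {1: 'ABC', 2: 'PQR', 3: 'XYZ'}
```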

3. Third Normal Form (3NF)

● Third Normal Form (3NF) is used to minimize transitive redundancy.

● A table in 3NF must already be in 2NF.

● In addition, there should not be any transitive dependency: no non-prime attribute may depend on another non-prime attribute.

● 3NF reduces the duplication of data and also helps achieve data integrity.

● Example: <Employee> Table

EId  Ename  DOB         City    State        Zip
001  ABC    10/05/1990  Pune    Maharashtra  411038
002  XYZ    11/05/1988  Mumbai  Maharashtra  400007

● In the above <Employee> table, EId is the primary key, but City and State depend upon Zip.

● The dependency chain EId → Zip → City, State is called a Transitive Dependency.
● Therefore we apply 3NF: we move City and State to a new <Employee_Table2> table,
with Zip as its primary key.

● <Employee_Table1> Table

EId  Ename  DOB         Zip
001  ABC    10/05/1990  411038
002  XYZ    11/05/1988  400007

● <Employee_Table2> Table

City    State        Zip
Pune    Maharashtra  411038
Mumbai  Maharashtra  400007

● The advantage of removing the transitive dependency is that it reduces the amount of data dependency and
helps achieve data integrity.

● In the above example, with 3NF there is no redundancy of data when inserting new
records.

● City and State are stored once per Zip code in a separate table, and therefore updates become
easier because there is no data redundancy.
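A key property of this 3NF split is that it is lossless: joining <Employee_Table1> and <Employee_Table2> back on Zip recovers exactly the original rows. A minimal sketch, using the data from the example above:

```python
# Original <Employee> rows: (EId, Ename, DOB, City, State, Zip).
employee = [
    ("001", "ABC", "10/05/1990", "Pune", "Maharashtra", "411038"),
    ("002", "XYZ", "11/05/1988", "Mumbai", "Maharashtra", "400007"),
]

# 3NF decomposition: Zip stays in table 1; City/State move to table 2.
table1 = [(eid, name, dob, z) for eid, name, dob, city, state, z in employee]
table2 = {z: (city, state) for eid, name, dob, city, state, z in employee}

# Natural join on Zip reconstructs the original table.
rejoined = [(eid, name, dob, *table2[z], z) for eid, name, dob, z in table1]
assert rejoined == employee
print("lossless join verified")
```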

4. BCNF (Boyce–Codd Normal Form)

● BCNF, which stands for Boyce–Codd Normal Form, was developed by Raymond F. Boyce and E. F. Codd
in 1974.

● BCNF is a stricter version of 3NF.

● It deals with a certain type of anomaly which is not handled by 3NF.

● A table is in BCNF if it is in 3NF and, for every functional dependency A → B, the determinant A
is a super key.

● Equivalently: if every determinant is a candidate key, the table is in BCNF.

● A candidate key is a column (or combination of columns) with the ability to become the primary key.

● Example: <EmployeeMain> Table

Empid  Ename  DeptName    DeptType
E001   ABC    Production  D001
E002   XYZ    Sales       D002

● The functional dependencies are:

● Empid → EmpName
● DeptName → DeptType

● Candidate Key: {Empid, DeptName}

● The above table is not in BCNF, because the determinants Empid and DeptName are not keys of the table on their own.

● We can break the table into three tables to make it comply with BCNF.

● <Employee> Table

Empid  EmpName
E001   ABC
E002   XYZ

● <Department> Table

DeptName    DeptType
Production  D001
Sales       D002

● <Emp_Dept> Table

Empid  DeptName
E001   Production
E002   Sales

● Now, the functional dependencies are:

● Empid → EmpName
● DeptName → DeptType

● Candidate Keys:
● <Employee> Table: Empid
● <Department> Table: DeptName
● <Emp_Dept> Table: {Empid, DeptName}

● The left-hand side of each functional dependency is now a key in its own table, so the decomposition is in BCNF.
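The BCNF condition for the decomposed tables can be spot-checked mechanically: in each table, the determinant column must take a distinct value in every row, i.e. it functionally determines the whole row. A small sketch, using the data above:

```python
def determinant_is_key(rows, det_index):
    """True if the column at det_index has no repeated values,
    i.e. it uniquely identifies each row of the table."""
    values = [row[det_index] for row in rows]
    return len(values) == len(set(values))

employee = [("E001", "ABC"), ("E002", "XYZ")]             # Empid -> EmpName
department = [("Production", "D001"), ("Sales", "D002")]  # DeptName -> DeptType

print(determinant_is_key(employee, 0))    # True: Empid is a key
print(determinant_is_key(department, 0))  # True: DeptName is a key
```

In the original <EmployeeMain> table the same check on Empid alone would fail as soon as an employee appears with two departments, which is exactly the BCNF violation the decomposition removes.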

5. Fourth Normal Form (4NF)


● A table is in Fourth Normal Form (4NF) if it has no non-trivial multivalued dependencies other than those
implied by a candidate key.

● 4NF builds on the first three normal forms (1NF, 2NF and 3NF) and on BCNF.

● A table in 4NF does not contain more than one independent multivalued dependency.

● This normal form is rarely used outside of academic circles.


● For example: a table contains a list of three things, 'Student', 'Teacher' and 'Book'. A teacher is in
charge of a student, and a book is recommended for each student. These facts are
independent of one another: changing the student's recommended book, for instance, has no
effect on the teacher. This is an example of multivalued dependency, where one item (the student)
independently determines more than one set of values (teachers and books).

● Therefore, 4NF states that a table should not have more than one such independent multivalued dependency.

6. Fifth Normal Form (5NF)

● 5NF is also known as Project-Join Normal Form (PJ/NF).

● It is designed for reducing the redundancy in relational databases.

● 5NF requires semantically related multiple relationships, which are rare.

● In 5NF, if an attribute is multivalued attribute, then it must be taken out as a separate entity.

● A table in 5NF must already be in 4NF.

7. DKNF (Domain Key Normal Form)

● DKNF (Domain Key Normal Form) requires that the database contain no constraints other than
domain constraints and key constraints.

● In DKNF, it is easy to build a database.

● It avoids general constraints in the database which are not clear domain or key constraints.

● 3NF, 4NF, 5NF and BCNF are special cases of DKNF.

● DKNF is achieved when every constraint on the relation is a logical consequence of the definition of its keys and domains.

Explain Specialization, Generalization and Aggregation Extended ER features with suitable examples.
DBMS | ER Model: Generalization, Specialization and Aggregation

Prerequisite – Introduction of ER Model


Generalization, Specialization and Aggregation in the ER model are used for data abstraction, in which an
abstraction mechanism hides the details of a set of objects.

Generalization –
Generalization is the process of extracting common properties from a set of entities and creating a
generalized entity from them. It is a bottom-up approach in which two or more entities can be
generalized to a higher-level entity if they have some attributes in common. For example,
STUDENT and FACULTY can be generalized to a higher-level entity called PERSON as shown in
Figure 1. In this case, common attributes like P_NAME and P_ADD become part of the higher entity
(PERSON), and specialized attributes like S_FEE become part of the specialized entity (STUDENT).
Specialization –
In specialization, an entity is divided into sub-entities based on their characteristics. It is a top-down
approach in which a higher-level entity is specialized into two or more lower-level entities. For example,
the EMPLOYEE entity in an employee management system can be specialized into DEVELOPER,
TESTER, etc. as shown in Figure 2. In this case, common attributes like E_NAME, E_SAL etc.
become part of the higher entity (EMPLOYEE), and specialized attributes like TES_TYPE become part
of the specialized entity (TESTER).

Aggregation –
An ER diagram is not capable of representing a relationship between an entity and a relationship,
which may be required in some scenarios. In those cases, a relationship with its participating
entities is aggregated into a higher-level entity. For example, an employee working on a project may
require some machinery, so a REQUIRE relationship is needed between the relationship WORKS_FOR
and the entity MACHINERY. Using aggregation, the WORKS_FOR relationship with its entities
EMPLOYEE and PROJECT is aggregated into a single entity, and the relationship REQUIRE is created
between the aggregated entity and MACHINERY.

Explain significance of Super key, Candidate key, Primary Key and Foreign Key with example.
Various Keys in Database Management System (DBMS)

Keys are an important part of the arrangement of a table in a DBMS: they make sure each
record of a table can be uniquely identified by a field or a combination of fields. A database is made up of
tables, tables are made up of records, and records in turn are made up of fields. Let us take an example to
illustrate what keys are in a database management system. This article is about the different keys in a
database management system (DBMS).

Example: a table of student records.

In such a table, each column is a field and each row is a record.

Types of Keys in Database Management System: The following kinds of keys are distinguished:
1. Super key
2. Candidate key
3. Primary key
4. Composite key
5. Secondary or alternative key
6. Non-key attribute
7. Non-prime attribute
8. Foreign key
9. Simple key
10. Compound key
11. Surrogate/Artificial key
12. Natural key
The detailed explanation of all the mentioned keys is as follows:
1. Super key

A super key is a set of attributes of a table that uniquely identifies each record in the table. A candidate
key is a special case of a super key.
● For example: the roll no. of a student is unique in the relation, so the set of attributes roll no., name,
class, age, sex is a super key for the relation Student.

2. Candidate keys

Candidate keys are the sets of fields from which the primary key can be selected. Every table must have at
least one candidate key, and it may have several. A candidate key is a minimal super key: a super key from
which no attribute can be removed without losing uniqueness.

● Example: roll no. alone uniquely identifies a student, so it is a candidate key for the Student relation.

● A candidate key has two properties: uniqueness and irreducibility.
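Uniqueness and irreducibility can be checked by brute force on a small table: a candidate key is a minimal set of columns whose projected values are distinct for every row. The data below is hypothetical, chosen so that several candidate keys emerge.

```python
from itertools import combinations

attrs = ("roll_no", "name", "class", "age")
rows = [
    (1, "Asha", "TY", 20),
    (2, "Ravi", "TY", 20),
    (3, "Asha", "SY", 19),
]

def is_super_key(cols):
    """A set of columns is a super key if its projection is unique per row."""
    seen = [tuple(row[c] for c in cols) for row in rows]
    return len(seen) == len(set(seen))

super_keys = [
    cols
    for r in range(1, len(attrs) + 1)
    for cols in combinations(range(len(attrs)), r)
    if is_super_key(cols)
]
# Candidate keys are the minimal super keys (no proper subset is a super key).
candidate_keys = [
    cols for cols in super_keys
    if not any(set(other) < set(cols) for other in super_keys)
]
print([tuple(attrs[c] for c in key) for key in candidate_keys])
# [('roll_no',), ('name', 'class'), ('name', 'age')]
```

This exponential search is only a teaching device; real schemas derive keys from declared functional dependencies rather than from the data.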

3. Primary key

The candidate key chosen as the main key of the table is the primary key.

● A primary key is compulsory in every table.


● The properties of a primary key are:
○ Model stability
○ A minimum number of fields
○ A defined value for every record, i.e. it is definitive
○ Accessibility
● Example: roll no. can serve as the primary key of the Student table.

4. Composite key
A composite key combines two or more attributes to uniquely identify the occurrence of an entity.
● Example: the customer ID and the order ID must be combined to uniquely identify the
customer's order details.

5. Secondary or Alternative key

The candidate keys that are not chosen as the primary key are called secondary or alternative keys.


6. Non-key Attribute

The attributes excluding the candidate keys are called non-key attributes.

7. Non-prime Attribute

The attributes that are not part of any candidate key are called non-prime attributes.

8. Foreign key

A foreign key is a field (or combination of fields) in one table that references the primary key of another
table, establishing a relationship between the two tables.

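A minimal sketch of a foreign key in SQL, via Python’s sqlite3 (hypothetical Customer/Orders tables; all names are illustrative): Orders.CustId references the primary key of Customer, and the relationship lets the two tables be joined back together.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.execute("CREATE TABLE Customer (CustId INTEGER PRIMARY KEY, Name TEXT)")
conn.execute("""
    CREATE TABLE Orders (
        OrderId INTEGER PRIMARY KEY,
        CustId  INTEGER REFERENCES Customer(CustId),  -- foreign key
        Item    TEXT
    )""")
conn.execute("INSERT INTO Customer VALUES (1, 'ABC')")
conn.execute("INSERT INTO Orders VALUES (10, 1, 'Pen')")

# Join the child table back to its parent through the foreign key.
row = conn.execute("""
    SELECT c.Name, o.Item
    FROM Orders o JOIN Customer c ON c.CustId = o.CustId
""").fetchone()
print(row)  # ('ABC', 'Pen')
```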

9. Simple key

A simple key has a single field that uniquely identifies a record; the field cannot be divided
into further fields. A single-field primary key is also a simple key.

● Example: a student ID is a single field, and no two students share the same
ID; therefore it is a simple key.

10. Compound key
A compound key combines several fields to uniquely identify a record. A compound key differs from a
composite key in that each part of a compound key can be a foreign key, whereas a part of a composite key
may or may not be a foreign key.

11. Surrogate/Artificial key

A surrogate key is an artificially generated key whose main purpose is to serve as the primary key of a table.
Artificial keys have no meaning for the data in the table. Surrogate or artificial keys have a few characteristic properties:
They are unique, and are created when no natural primary key is available.
They are integer values.
Their values carry no meaning within the table.
End users do not work with surrogate key values directly.
Surrogate keys are appropriate when:

● No attribute has the properties of a primary key.

● The primary key is huge and complex.

Example: a table holding student details has a primary key, but it is large and complex. Adding a
row-id column and using it as the primary key is the DBA's decision.
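In SQLite, for example, declaring a column INTEGER PRIMARY KEY makes it an auto-assigned row id, which is exactly a surrogate key: generated by the database and carrying no business meaning (table and column names below are illustrative).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE Student (
        RowId INTEGER PRIMARY KEY,  -- surrogate key, generated by the DB
        Name  TEXT,
        Addr  TEXT
    )""")
cur = conn.execute("INSERT INTO Student (Name, Addr) VALUES ('ABC', 'Pune')")
print(cur.lastrowid)  # 1
cur = conn.execute("INSERT INTO Student (Name, Addr) VALUES ('XYZ', 'Mumbai')")
print(cur.lastrowid)  # 2
```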

12. Natural/Domain/Business Key

A natural key is a key that is naturally suited to be declared the primary key. Natural keys are sometimes
called business or domain keys because they are based on real-world observation: their attribute values
exist in the real world and have a logical relationship with the table.
For example, a Social Security Number (SSN) is a natural key that can be declared as the primary key.
