A DBMS is a set of software programs that controls the organization, storage, management, and retrieval of data in a
database. DBMSs are categorized according to their data structures or types. The DBMS accepts requests for data from
an application program and instructs the operating system to transfer the appropriate data. The queries and responses
must be submitted and received according to a format that conforms to one or more applicable protocols. When a
DBMS is used, information systems can be changed more easily as the organization's information requirements change.
New categories of data can be added to the database without disruption to the existing system. The DBMS also manages the data and the information related to the data.
Database servers are dedicated computers that hold the actual databases and run only the DBMS and related software.
Database servers are usually multiprocessor computers, with generous memory and RAID disk arrays used for stable
storage. Hardware database accelerators, connected to one or more servers via a high-speed channel, are also used in
large volume transaction processing environments. DBMSs are found at the heart of most database applications.
DBMSs may be built around a custom multitasking kernel with built-in networking support, but modern DBMSs
typically rely on a standard operating system to provide these functions.
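The interaction described above, in which an application program hands a request to the DBMS and the DBMS takes care of storage and retrieval, can be sketched minimally with SQLite, an embedded DBMS available in Python's standard library (the table and data here are made up for illustration):

```python
import sqlite3

# The application program submits SQL requests; the DBMS handles
# organization, storage, and retrieval on its behalf.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (name TEXT, dept TEXT)")
conn.executemany("INSERT INTO employees VALUES (?, ?)",
                 [("Ann", "Sales"), ("Raj", "IT")])

# A new category of data can be added without disrupting existing rows.
conn.execute("ALTER TABLE employees ADD COLUMN phone TEXT")

rows = conn.execute("SELECT name FROM employees WHERE dept = 'IT'").fetchall()
print(rows)  # [('Raj',)]
```

The application never touches the database files directly; it only describes *what* data it wants, which is what lets the underlying structures change without breaking the program.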
Optimizer Operations
A SQL statement can be executed in many different ways, such as full table scans, index
scans, nested loops, and hash joins. The query optimizer determines the most efficient way to
execute a SQL statement after considering many factors related to the objects referenced and
the conditions specified in the query. This determination is an important step in the processing
of any SQL statement and can greatly affect execution time.
Note:
The optimizer might not make the same decisions from one version of Oracle Database to the
next. In recent versions, the optimizer might make different decisions, because better
information is available.
The output from the optimizer is an execution plan that describes an optimum method of execution. The plan shows the combination of steps Oracle Database uses to execute a SQL statement. Each step either retrieves rows of data physically from the database or prepares them in some way for the user issuing the statement.
For any SQL statement processed by Oracle, the optimizer performs the operations listed in
Table 11-1.
Table 11-1 Optimizer Operations

• Statement transformation: For complex statements involving, for example, correlated subqueries or views, the optimizer might transform the original statement into an equivalent join statement.
• Choice of optimizer goals: The optimizer determines the goal of optimization. See "Choosing an Optimizer Goal".
• Choice of access paths: For each table accessed by the statement, the optimizer chooses one or more of the available access paths to obtain table data. See "Understanding Access Paths for the Query Optimizer".
• Choice of join orders: For a join statement that joins more than two tables, the optimizer chooses which pair of tables is joined first, and then which table is joined to the result, and so on. See "How the Query Optimizer Chooses Execution Plans for Joins".
Query Plan
A query plan (or query execution plan) is an ordered set of steps used to access or modify information in a SQL
relational database management system. This is a specific case of the relational model concept of access plans.
Since SQL is declarative, there are typically a large number of alternative ways to execute a given query, with
widely varying performance. When a query is submitted to the database, the query optimizer evaluates some of the
different, correct possible plans for executing the query and returns what it considers the best alternative. Because
query optimizers are imperfect, database users and administrators sometimes need to manually examine and tune
the plans produced by the optimizer to get better performance.
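Most databases let you inspect the plan the optimizer chose. As a small illustration (using SQLite from Python rather than a commercial optimizer; the table and index names are made up), EXPLAIN QUERY PLAN shows how the chosen access path changes once an index becomes available:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (id INTEGER, branch TEXT, balance REAL)")

def plan(sql):
    # EXPLAIN QUERY PLAN rows are (id, parent, notused, detail);
    # the detail column describes each step the optimizer chose.
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]

query = "SELECT balance FROM accounts WHERE branch = 'Port Chester'"

before = plan(query)   # with no index, a full table scan
conn.execute("CREATE INDEX idx_branch ON accounts (branch)")
after = plan(query)    # now the optimizer picks the index

print(before)
print(after)
```

This is exactly the kind of before/after comparison administrators make when tuning: the query text is unchanged, yet the plan (and hence the cost) differs.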
Implementation
Implementation is the carrying out, execution, or practice of a plan, a method, or any design for doing something.
As such, implementation is the action that must follow any preliminary thinking in order for something to actually
happen. In an information technology context, implementation encompasses all the processes involved in getting
new software or hardware operating properly in its environment, including installation, configuration, running,
testing, and making necessary changes. The word deployment is sometimes used to mean the same thing.
Equivalent expressions
We often want to replace a complicated expression with a simpler one that means the same thing. For example, the
expression x + 4 + 2 obviously means the same thing as x + 6, since 4 + 2 = 6. More interestingly, the expression x +
x + 4 means the same thing as 2x + 4, because 2x is x + x when you think of multiplication as repeated addition. (Which
of these is simpler depends on your point of view, but usually 2x + 4 is more convenient in Algebra.)
Two algebraic expressions are equivalent if they always lead to the same result when you evaluate them, no matter
what values you substitute for the variables. For example, if you substitute x := 3 in x + x + 4, then you get 3 + 3 + 4,
which works out to 10; and if you substitute it in 2x + 4, then you get 2(3) + 4, which also works out to 10. There's
nothing special about 3 here; the same thing would happen no matter what value we used, so x + x + 4 is equivalent to
2x + 4. (That's really what I meant when I said that they mean the same thing.)
When I say that you get the same result, this includes the possibility that the result is undefined. For example, 1/x + 1/x
is equivalent to 2/x; even when you substitute x := 0, they both come out the same (in this case, undefined). In contrast,
x²/x is not equivalent to x; they usually come out the same, but they are different when x := 0. (Then x²/x is undefined, but x is 0.) To deal with this situation, there is a sort of trick you can play, forcing the second expression to be undefined in certain cases. Just add the words ‘for x ≠ 0’ at the end of the expression to make a new expression; then the new expression is undefined unless x ≠ 0. (You can put any other condition you like in place of x ≠ 0, whatever is appropriate in a given situation.) So x²/x is equivalent to x for x ≠ 0.
To symbolise equivalent expressions, people often simply use an equals sign. For example, they might say ‘x + x + 4 =
2x + 4’. The idea is that this is a statement that is always true, no matter what x is. However, it isn't really correct to
write ‘1/x + 1/x = 2/x’ to indicate an equivalence of expressions, because this statement is not correct when x := 0. So
instead, I will use the symbol ‘≡’, which you can read ‘is equivalent to’ (instead of ‘is equal to’ for ‘=’). So I'll say, for
example,
• x + x + 4 ≡ 2x + 4,
• 1/x + 1/x ≡ 2/x, and
• x²/x ≡ x for x ≠ 0.
The textbook, however, just uses ‘=’ for everything, so you can too, if you want.
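The three equivalences above can be spot-checked numerically (a sketch, not a proof: evaluating at sample points can refute an equivalence but never establish it for all values). Treating a division by zero as the result "undefined" mirrors the convention used above:

```python
def equivalent_at(f, g, x):
    """Compare two expressions at one value, treating an exception
    (e.g. division by zero) as the result 'undefined'."""
    def run(h):
        try:
            return h(x)
        except ZeroDivisionError:
            return "undefined"
    return run(f) == run(g)

# x + x + 4 vs 2x + 4: agree at every value we try.
same = all(equivalent_at(lambda x: x + x + 4, lambda x: 2*x + 4, x)
           for x in range(-5, 6))
print(same)  # True

# 1/x + 1/x vs 2/x: agree even at x = 0, where both are undefined.
print(equivalent_at(lambda x: 1/x + 1/x, lambda x: 2/x, 0))  # True

# x²/x vs x: disagree at x = 0 (undefined vs 0), so NOT equivalent.
print(equivalent_at(lambda x: x**2 / x, lambda x: x, 0))     # False
```

The last check is precisely why the qualifier ‘for x ≠ 0’ is needed before x²/x ≡ x can be asserted.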
Selection Operation
1. Consider the query to find the assets and branch-names of all banks who have depositors living in Port Chester. In relational algebra, this is
   Π branch-name, assets (σ customer-city = "Port Chester" (CUSTOMER ⋈ DEPOSIT ⋈ BRANCH))
   o This expression constructs a huge relation, CUSTOMER ⋈ DEPOSIT ⋈ BRANCH, of which we are only interested in a few tuples.
2. It is advantageous to perform the selection as early as possible. Since it involves only attributes of CUSTOMER, we can rewrite the query as
   Π branch-name, assets (σ customer-city = "Port Chester" (CUSTOMER) ⋈ DEPOSIT ⋈ BRANCH)
3. It is also advantageous to apply projections early. We can eliminate several attributes from the intermediate schemes; the only ones we need to retain are those that
   o appear in the result of the query, or
   o are needed in subsequent operations.
4. Note that there is no advantage in doing an early project on a relation before it is needed for some other operation:
   o We would access every block for the relation to remove attributes.
   o Then we access every block of the reduced-size relation when it is actually needed.
   o We do more work in total, rather than less!
5. Consider the order of the joins:
   o We could compute DEPOSIT ⋈ BRANCH first and then join the result with the selected part of CUSTOMER.
   o However, DEPOSIT ⋈ BRANCH is likely to be a large relation as it contains one tuple for every account.
   o The other part, σ customer-city = "Port Chester" (CUSTOMER), is likely to be small.
   o So, if we compute σ customer-city = "Port Chester" (CUSTOMER) ⋈ DEPOSIT first, we get a temporary relation with one tuple for each account held by a resident of Port Chester.
   o This temporary relation is much smaller than DEPOSIT ⋈ BRANCH.
6. Natural join is commutative and associative:
   o Thus we could rewrite our relational algebra expression as
     (σ customer-city = "Port Chester" (CUSTOMER) ⋈ BRANCH) ⋈ DEPOSIT
   o But there are no common attributes between CUSTOMER and BRANCH, so this is a Cartesian product. Lots of tuples!
   o If a user entered this expression, we would want to use the associativity and commutativity of natural join to transform it into the more efficient expression derived above (join with DEPOSIT first, then with BRANCH).
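The equivalence that licenses pushing the selection inward can be demonstrated with a toy natural-join evaluator (the bank relations below are hypothetical sample data in the spirit of the example; only the attribute names follow the text):

```python
def natural_join(r, s):
    """Natural join of two relations (lists of dicts): combine every
    pair of tuples that agree on all common attribute names."""
    out = []
    for a in r:
        for b in s:
            common = set(a) & set(b)
            if all(a[k] == b[k] for k in common):
                out.append({**a, **b})
    return out

CUSTOMER = [{"customer-name": "Hayes", "customer-city": "Port Chester"},
            {"customer-name": "Jones", "customer-city": "Harrison"}]
DEPOSIT  = [{"customer-name": "Hayes", "branch-name": "Brighton"},
            {"customer-name": "Jones", "branch-name": "Redwood"}]
BRANCH   = [{"branch-name": "Brighton", "assets": 7100000},
            {"branch-name": "Redwood", "assets": 2100000}]

select = lambda r: [t for t in r if t["customer-city"] == "Port Chester"]

# Selection last: build the full join, then filter.
late = select(natural_join(natural_join(CUSTOMER, DEPOSIT), BRANCH))
# Selection first: filter CUSTOMER, then join the (smaller) result.
early = natural_join(natural_join(select(CUSTOMER), DEPOSIT), BRANCH)

print(late == early)                      # True: the expressions agree
print([t["branch-name"] for t in early])  # ['Brighton']
```

Both orders produce the same answer, but the early-selection version never materializes tuples for customers outside Port Chester, which is the whole point of the rewriting rules above.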
The advantage to this approach is that it is quick, easy, and simple to understand. Putting in a few records, however, quickly reveals why this kind of flat-file database is now reserved for non-programmers creating simple projects such as an inventory of their music: even with just a few records, a great deal of data is duplicated in every record.
This duplication can make the database very hard to maintain and subject to corruption. To prevent this, databases are
‘normalized’ – a complex subject that boils down to eliminating the duplication by factoring out common elements into
separate tables and then reaching into those tables by using unique values (foreign keys). Thus, without bogging
ourselves down in db theory, we can quickly redesign this flat database into three tables:
In this normalized database, each entry has only the information that is unique to the particular BlogEntry:
Title
URL
Date Created
Date Modified
Short Description
The entry also has the ID of the Blogger who wrote the entry and the ID of the Blog that the entry belongs to.
A second table holds the information about each Blog, and a third table holds the information about each Blogger.
Thus, a given Blogger's first and last name, alias (email address at Microsoft.com) and phone number are entered only ONCE for each blogger, and referenced in each entry by ID (known as a foreign key).
The diagram also shows that the two foreign-key relationships are named
FK_Blog_Entries_Blogs
FK_BlogEntries_Bloggers
These simple names indicate that there is a foreign key relationship from BlogEntries to Blogs and another from
BlogEntries to Bloggers. The data is now much cleaner:
Notice that the BlogEntries table now does not duplicate the first and last name and phone number for each Blogger,
but rather just refers to that Blogger's ID. For example, entry #3 was written by Blogger 4 (see ID in the middle image) and was placed in Blog 3 (see ID in the top image).
Creating A Join Query (T-SQL)
You can easily write a query that recreates the complete set of information by “joining” the tables:
The query matches each entry's Blogger value to the ID in the Bloggers table, and likewise joins the entry's Blog value to the ID in the Blogs table, giving us the ability to create a virtual flat table without the duplicated values.
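The original screenshots of the T-SQL are not reproduced here, but a join of the shape described can be sketched as follows (run against SQLite via Python for portability; the sample names and blog titles are invented, while the IDs match the entry #3 / Blogger 4 / Blog 3 example in the text):

```python
import sqlite3

# Normalized blog schema from the text, with hypothetical sample data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Blogs (ID INTEGER PRIMARY KEY, BlogName TEXT);
CREATE TABLE Bloggers (ID INTEGER PRIMARY KEY, FirstName TEXT, LastName TEXT);
CREATE TABLE BlogEntries (
    ID INTEGER PRIMARY KEY,
    Title TEXT,
    Blog INTEGER REFERENCES Blogs (ID),       -- FK_BlogEntries_Blogs
    Blogger INTEGER REFERENCES Bloggers (ID)  -- FK_BlogEntries_Bloggers
);
INSERT INTO Blogs VALUES (3, 'Data Access');
INSERT INTO Bloggers VALUES (4, 'Ann', 'Smith');
INSERT INTO BlogEntries VALUES (3, 'Normalization', 3, 4);
""")

# Joining the tables recreates the flat view without storing duplicates.
row = conn.execute("""
    SELECT e.Title, b.FirstName, b.LastName, g.BlogName
    FROM BlogEntries AS e
    JOIN Bloggers AS b ON e.Blogger = b.ID
    JOIN Blogs AS g ON e.Blog = g.ID
""").fetchone()
print(row)  # ('Normalization', 'Ann', 'Smith', 'Data Access')
```

The blogger's name and the blog's title are stored once each, yet the join result reads exactly like the original flat-file record.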
Q.5 Describe the Structural Semantic Data Model (SSM) with relevant examples.
ANS.
Modelling Complex and Multimedia Data
13
Data modelling addresses a need in information system analysis and design to develop a model of the information requirements as well as a set of viable database structure proposals.
A fundamental tool used in this process is the data model, which is used both for specification of the information
requirements at the user level and for specification of the data structure for the database. During implementation of a
database, the data model guides construction of the schema or data catalog which contains the metadata that describe
the DB structure and data semantics that are used to support database implementation and data retrieval.
Data modelling, using a specific data model type, and as a unique activity during information system design, is
commonly attributed to Charles Bachman (1969) who presented the Data Structure Diagram as one of the first, widely
used data models for network database design. Several alternative data model types were proposed shortly thereafter, the best known of which are the relational model (Codd, 1970) and the entity-relationship model (Chen, 1976).
The relational model was quickly criticized for being 'flat' in the sense that all information is represented as a set of
tables with atomic cell values. The definition of well-formed relational models requires that complex attribute types
(hierarchic, composite, multi-valued, and derived) be converted to atomic attributes and that relations be normalized.
Inter-entity (inter-relation) relationships are difficult to visualize in the resulting set of relations, making control of the
completeness and correctness of the model difficult. The relational model maps easily to the physical characteristics of
electronic storage media, and as such, is a good tool for design of the physical database.
The entity-relationship approach to modelling, proposed by Chen (1976), had two primary objectives: first to visualize
inter-entity relationships and second to separate the DB design process into two phases:
1. Record, in an ER model, the entities and inter-entity relationships required "by the enterprise", i.e. by the
owner/user of the information system or application. This phase and its resulting model should be independent
of the DBMS tool that is to be used for realizing the DB.
2. Translate the ER model to the data model supported by the DBMS to be used for implementation.
This two-phase design supports modification at the physical level without requiring changes to the enterprise or user
view of the DB content.
Chen's ER model also quickly came under criticism, particularly for its lack of ability to model classification structures. In 1977, Smith & Smith presented a method for modelling generalization and aggregation hierarchies that underlie the many extended/enhanced entity-relationship (EER) model types proposed and in use today.
Q.6 Explain:
A) Data Dredging B) Data Mining Techniques
Data Dredging
Definition: Data dredging, sometimes referred to as "data fishing", is a data mining practice in which large volumes of data are analyzed seeking any possible relationships between the data. The traditional scientific method, in contrast, begins with a
hypothesis and follows with an examination of the data. Sometimes conducted for unethical purposes, data dredging
hypothesis and follows with an examination of the data. Sometimes conducted for unethical purposes, data dredging
often circumvents traditional data mining techniques and may lead to premature conclusions. Data dredging is
sometimes described as "seeking more information from a data set than it actually contains."
Data dredging sometimes results in relationships between variables announced as significant when, in fact, the data
require more study before such an association can legitimately be determined. Many variables may be related through
chance alone; others may be related through some unknown factor. To make a valid assessment of the relationship
between any two variables, further study is required in which isolated variables are contrasted with a control group.
Data dredging is sometimes used to present an unexamined concurrence of variables as if they led to a valid conclusion,
prior to any such study.
Although data dredging is often used improperly, it can be a useful means of finding surprising relationships that might
not otherwise have been discovered. However, because the concurrence of variables does not constitute information
about their relationship (which could, after all, be merely coincidental), further analysis is required to yield any useful
conclusions.
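Why a concurrence of variables proves so little can be shown with a small simulation (a sketch using only the standard library; the column count and sample size are arbitrary choices): among many columns of pure noise, some pair will look strongly correlated by chance alone.

```python
import random
import statistics

random.seed(42)  # fixed seed so the run is repeatable

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length samples."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

# 40 completely unrelated variables, 20 observations each.
columns = [[random.random() for _ in range(20)] for _ in range(40)]

# Dredge: test every pair and report the strongest correlation found.
best = max(abs(pearson(a, b))
           for i, a in enumerate(columns)
           for b in columns[i + 1:])
print(round(best, 2))  # a strong-looking correlation, from pure noise
```

Testing hundreds of pairs all but guarantees an "impressive" coefficient somewhere, which is exactly why a relationship found by dredging needs confirmation on fresh data before it counts as a conclusion.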
Data Mining Techniques
Data mining is sorting through data to identify patterns and establish relationships. Common techniques include:
• Association - looking for patterns where one event is connected to another event
• Sequence or path analysis - looking for patterns where one event leads to another later event
• Classification - looking for new patterns (May result in a change in the way the data is organized but that's ok)
• Clustering - finding and visually documenting groups of facts not previously known
• Forecasting - discovering patterns in data that can lead to reasonable predictions about the future (This area of
data mining is known as predictive analytics.)
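The first technique in the list, association, can be illustrated with a toy co-occurrence count (the "shopping basket" transactions below are invented sample data; real systems use algorithms such as Apriori over far larger inputs):

```python
from itertools import combinations
from collections import Counter

# Hypothetical transactions: which items were bought together.
baskets = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"beer", "chips"},
    {"bread", "milk"},
    {"beer", "chips", "bread"},
]

# Count every pair of items that appears in the same basket.
pair_counts = Counter()
for basket in baskets:
    for pair in combinations(sorted(basket), 2):
        pair_counts[pair] += 1

# Pairs seen in at least 2 of the 5 baskets count as associations here.
frequent = {pair for pair, n in pair_counts.items() if n >= 2}
print(sorted(frequent))
# [('beer', 'chips'), ('bread', 'butter'), ('bread', 'milk')]
```

Even this tiny example surfaces the "one event is connected to another" patterns the bullet describes, such as beer and chips being bought together.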
Data mining techniques are used in many research areas, including mathematics, cybernetics, genetics and marketing.
Web mining, a type of data mining used in customer relationship management (CRM), takes advantage of the huge
amount of information gathered by a Web site to look for patterns in user behavior.