
SSEC\MCA\DBMS Question Bank\2010-2013 Batch

Two Marks
Unit III
1. What do you mean by an index?
An index can be viewed as a collection of data entries, with an efficient way to locate all data
entries with search key value k. Each such data entry, k∗, contains enough information to
enable us to retrieve (one or more) data records with search key value k.

2. What are the different ways in which an entry can be made in an index?
A data entry k* allows us to retrieve one or more data records with key value k. There are three main
alternatives:
• A data entry k* is an actual data record (with search key value k).
• A data entry is a <k, rid> pair, where rid is the record id of a data record with
search key value k.
• A data entry is a <k, rid-list> pair, where rid-list is a list of record ids of data
records with search key value k.

3. Differentiate between clustered and unclustered index.


When a file is organized so that the ordering of data records is the same as or close to the
ordering of data entries in some index, the index is clustered. An index that is not clustered is
called an unclustered index.

4. Differentiate between dense and sparse index.

An index is said to be dense if it contains (at least) one data entry for every search key value
that appears in a record in the indexed file. A sparse index contains one entry for each page
of records in the data file.

5. What do you mean by fully inverted and inverted file?


A data file is said to be inverted on a field if there is a dense secondary index on this field. A
fully inverted file is one in which there is a dense secondary index on each field that does not
appear in the primary key.

6. Define primary and secondary indices.


An index on a set of fields that includes the primary key is called a primary index. An index
that is not a primary index is called a secondary index. A primary index is guaranteed not to
contain duplicates, but an index on other (collections of) fields can contain duplicates. Thus, in
general, a secondary index contains duplicates.

7. What do you mean by a composite search key or concatenated keys?


The search key for an index can contain several fields; such keys are called composite search
keys or concatenated keys. As an example, consider a collection of employee records, with
fields name, age, and sal, stored in sorted order by name. Composite indexes on this file could
use the key <age, sal> or the key <sal, age>; by contrast, an index with key age or an index
with key sal uses a single-field search key.

V. R. Kanagavalli Page 1

8. Differentiate between a range query and an equality query.

If the search key is composite, an equality query is one in which each field in the search key
is bound to a constant. For example, retrieving all data entries with age = 20 and sal = 10. The
hashed file organization supports only equality queries, since a hash function identifies the
bucket containing desired records only if a value is specified for each field in the search key.
A range query is one in which not all fields in the search key are bound to constants.
An example of a range query is retrieving all data entries with age < 30 and sal > 40.
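
A minimal sketch of the difference, assuming a tree index on the composite key <age, sal> whose data entries are just (age, sal) pairs kept in sorted order (a stand-in for the leaf level). The function names are hypothetical. The equality query binds both fields, so a single search locates its entries. In the range query neither field is bound, so only the bound on the leading field age narrows the scan and sal has to be tested entry by entry.

```python
from bisect import bisect_left, bisect_right

# Hypothetical data entries of a tree index on <age, sal>, sorted as in the leaf level.
entries = sorted([(25, 10), (20, 10), (20, 50), (30, 45), (22, 60), (20, 10)])

def equality_query(entries, age, sal):
    """age = value AND sal = value: every field of the search key is bound."""
    lo = bisect_left(entries, (age, sal))
    hi = bisect_right(entries, (age, sal))
    return entries[lo:hi]

def range_query(entries, max_age, min_sal):
    """age < max_age AND sal > min_sal: no field is bound to a constant."""
    # The sort order on the leading field limits the scan to entries with age < max_age;
    # the condition on sal must be tested on each of those entries.
    hi = bisect_left(entries, (max_age,))
    return [e for e in entries[:hi] if e[1] > min_sal]

print(equality_query(entries, 20, 10))  # [(20, 10), (20, 10)]
print(range_query(entries, 30, 40))     # [(20, 50), (22, 60)]
```

A hash index on <age, sal> could answer the first query but not the second, because it needs a value for every field of the key before it can compute a bucket.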

9. What is the advantage of tree structured indexes?


Tree-structured indexes are ideal for range selections, and also support equality selections quite
efficiently.

10. Define ISAM. What is the disadvantage of ISAM?

ISAM is a static tree-structured index in which only leaf pages are modified by inserts and
deletes. If a leaf page is full, an overflow page is added. Unless the size of the dataset and the
data distribution remain approximately the same, overflow chains could become long and
degrade performance.

11. Define a B+ Tree and its order

A B+ tree is a dynamic, height-balanced index structure that adapts gracefully to changing data
characteristics. Each node except the root has between d and 2d entries. The number d is called
the order of the tree. Each non-leaf node with m index entries has m+1 children pointers. The
leaf nodes contain data entries. Leaf pages are chained in a doubly linked list.

12. How does the B+ Tree handle insertion and deletion of data?
During insertion, nodes that are full are split to avoid overflow pages. Thus, an insertion might
increase the height of the tree.
During deletion, a node might go below the minimum occupancy threshold. In this case, the
entries can be either redistributed from adjacent siblings, or the node can be merged with a
sibling node. A deletion might decrease the height of the tree.

13. What is the purpose of key compression in B+ Tree?


Key compression is a technique used in B+ Trees in which the search key values in index nodes are
shortened to ensure a high fan-out.
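
One common way to shorten keys in index nodes is sketched below, assuming string search keys. Instead of storing a full key as the separator between two child pages, the node stores the shortest prefix of the smallest key on the right that still sorts after the largest key on the left. The function name is hypothetical.

```python
def shortest_separator(left_max: str, right_min: str) -> str:
    """Shortest prefix of right_min that still sorts after left_max.

    Assumes left_max < right_min. Searches then send keys < separator to the
    left child and keys >= separator to the right child, exactly as they would
    with the full key right_min, but the index entry is shorter.
    """
    for i in range(1, len(right_min) + 1):
        prefix = right_min[:i]
        if prefix > left_max:
            return prefix
    return right_min  # not reached when left_max < right_min

print(shortest_separator("David Smith", "Devarakonda Murthy"))  # De
```

Shorter separators let more index entries fit on one page, which is what raises the fan-out.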

14. List out some of the characteristics of a B+ Tree.

• Operations (insert, delete) on the tree keep it balanced.
• A minimum occupancy of 50 percent is guaranteed for each node except the root.
• Searching for a record requires just a traversal from the root to the appropriate leaf.

15. What do you mean by the height of the B+ Tree?


The length of a path from the root to any leaf (because the B+ tree is balanced) is referred to as
the height of the tree.
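
The height follows from the fan-out. If every index node has F children, then h levels of index nodes above the leaves can reach $F^h$ leaf pages. The hypothetical helper below computes the smallest such h. It makes the simplifying assumption that every index node has exactly F children, whereas real nodes are anywhere between half full and full.

```python
def bplus_tree_height(num_leaf_pages: int, fan_out: int) -> int:
    """Length of the root-to-leaf path needed to reach num_leaf_pages leaves.

    Assumes every non-leaf node has exactly fan_out children, so h levels of
    index nodes can address fan_out ** h leaf pages.
    """
    height, reachable = 0, 1
    while reachable < num_leaf_pages:
        reachable *= fan_out  # one more index level multiplies the reach by fan_out
        height += 1
    return height

print(bplus_tree_height(1_000_000, 100))  # 3
```

This is why B+ trees stay shallow in practice: with a fan-out of 100, a path of length 3 reaches a million leaf pages.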

16. Define a B+ Tree.


The B+ tree search structure is a balanced tree in which the internal nodes direct the search and
the leaf nodes contain the data entries. In order to retrieve all leaf pages efficiently, they are
linked using page pointers. The leaf pages can be traversed in either direction as they are
organized as a doubly linked list.

17. Give the format of an index page.

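An index page holds m search key values and m + 1 pointers in the order $P_0, K_1, P_1, K_2, P_2, \ldots, K_m, P_m$.
Each $P_i$ ($1 \le i < m$) points to the data entries (or the subtree containing them) whose key values k satisfy
$K_i \le k < K_{i+1}$. $P_0$ points to the data entries with key values less than $K_1$, and $P_m$ points to the data
entries with key values greater than or equal to $K_m$.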
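
Searching within one index page means picking the pointer whose key range contains the search key. The sketch below assumes the page is held as a sorted list keys = [K1, ..., Km] plus a list pointers = [P0, ..., Pm]; both names are hypothetical. `bisect_right` returns the number of keys that are ≤ the search key, and that number is exactly the position of the pointer to follow.

```python
from bisect import bisect_right

def choose_child(keys, pointers, search_key):
    """Return the pointer to follow in an index page <P0, K1, P1, ..., Km, Pm>.

    keys     -- sorted keys K1..Km
    pointers -- pointers P0..Pm (one more than the number of keys)
    P0 covers key < K1, Pi covers Ki <= key < Ki+1, Pm covers key >= Km.
    """
    assert len(pointers) == len(keys) + 1
    return pointers[bisect_right(keys, search_key)]

keys, pointers = [10, 20, 30], ["P0", "P1", "P2", "P3"]
print([choose_child(keys, pointers, k) for k in (5, 10, 25, 30, 99)])
# ['P0', 'P1', 'P2', 'P3', 'P3']
```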

18. Give the format of a one level index structure.

19. Give the format of an ISAM Index structure.

20. Explain the page allocation in ISAM

21. What is the drawback of ISAM


ISAM is a static structure and suffers from the problem that long overflow chains can develop
as the file grows, leading to poor performance.

22. Explain hash based indexes.


Hash-based indexes are designed for equality queries. A hashing function is applied to a search
field value and returns a bucket number. The bucket number corresponds to a page on disk that
contains all possibly relevant records.

23. Explain static hashing technique.


A Static Hashing index has a fixed number of primary buckets. During insertion, if the
primary bucket for a data entry is full, an overflow page is allocated and linked to the primary
bucket. The list of overflow pages at a bucket is called its overflow chain. Static Hashing can
answer equality queries with a single disk I/O, in the absence of overflow chains. As the file
grows, however, Static Hashing suffers from long overflow chains and performance
deteriorates.
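
Below is a minimal in-memory sketch of Static Hashing. It assumes integer keys, Python's built-in `hash` as the hashing function and a made-up page capacity. The class and method names are hypothetical. Each bucket is a chain of pages, with the primary page first and any overflow pages after it. The search also reports how many pages of the chain it read, so you can see the cost grow as an overflow chain lengthens.

```python
class StaticHashIndex:
    """Static Hashing: a fixed number of primary buckets with overflow chains."""

    def __init__(self, num_buckets=4, page_capacity=2):
        self.num_buckets = num_buckets        # never changes after creation
        self.page_capacity = page_capacity    # data entries per page
        # bucket number -> list of pages; pages[0] is the primary page
        self.buckets = [[[]] for _ in range(num_buckets)]

    def _bucket(self, key):
        return hash(key) % self.num_buckets

    def insert(self, key, rid):
        chain = self.buckets[self._bucket(key)]
        if len(chain[-1]) == self.page_capacity:
            chain.append([])                  # allocate and link an overflow page
        chain[-1].append((key, rid))

    def search(self, key):
        """Equality search: return (matching rids, number of pages read)."""
        chain = self.buckets[self._bucket(key)]
        rids = [rid for page in chain for k, rid in page if k == key]
        return rids, len(chain)

idx = StaticHashIndex(num_buckets=4, page_capacity=2)
for key in (0, 4, 8, 12, 1):                  # 0, 4, 8, 12 all hash to bucket 0
    idx.insert(key, f"r{key}")
print(idx.search(1))   # (['r1'], 1)  primary page only
print(idx.search(8))   # (['r8'], 2)  primary page plus one overflow page
```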

24. Define dynamic hashing. (Or) Define Extendible hashing.


Extendible Hashing is a dynamic index structure that extends Static Hashing by introducing a
level of indirection in the form of a directory. Usually the size of the directory is $2^d$ for some
d, which is called the global depth of the index. The correct directory entry is found by looking
at the first d bits of the result of the hashing function. The directory entry points to the page on
disk with the actual data entries. If a page is full and a new data entry falls into that page, data
entries from the full page are redistributed according to the first l bits of the hashed values. The
value l is called the local depth of the page. If the split makes the local depth of the page exceed
the global depth, the directory is doubled and d increases by one.
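
Here is a minimal in-memory sketch of Extendible Hashing. All class and method names are hypothetical. It uses the last (least significant) d bits of Python's `hash` value, which is a common convention; using the first d bits works the same way. It also assumes distinct keys, because entries with identical hash values could never be separated by a split.

```python
class Bucket:
    def __init__(self, local_depth):
        self.local_depth = local_depth   # l: number of hash bits this page depends on
        self.keys = []

class ExtendibleHashIndex:
    """Extendible Hashing sketch using the last global_depth bits of hash(key)."""

    def __init__(self, capacity=2):
        self.capacity = capacity          # data entries per page
        self.global_depth = 1             # d: the directory has 2**d entries
        self.directory = [Bucket(1), Bucket(1)]

    def _slot(self, key):
        return hash(key) & ((1 << self.global_depth) - 1)

    def lookup(self, key):
        return key in self.directory[self._slot(key)].keys

    def insert(self, key):
        bucket = self.directory[self._slot(key)]
        if len(bucket.keys) < self.capacity:
            bucket.keys.append(key)
            return
        # Page is full. If l == d, double the directory: entry i + 2**d starts
        # out pointing to the same page as entry i.
        if bucket.local_depth == self.global_depth:
            self.directory = self.directory + self.directory
            self.global_depth += 1
        # Split the page on one more bit; directory entries with that bit set
        # are redirected to the new "split image" page.
        bucket.local_depth += 1
        image = Bucket(bucket.local_depth)
        bit = 1 << (bucket.local_depth - 1)
        for i, b in enumerate(self.directory):
            if b is bucket and i & bit:
                self.directory[i] = image
        old_keys, bucket.keys = bucket.keys, []
        for k in old_keys:                # redistribute on l bits
            self.directory[self._slot(k)].keys.append(k)
        self.insert(key)                  # retry; may trigger another split

index = ExtendibleHashIndex(capacity=2)
for key in range(8):
    index.insert(key)
print(index.global_depth)                       # 2
print(all(index.lookup(k) for k in range(8)))   # True
```

Only the page that overflowed is split. The directory doubles only when that page already uses all d bits, which is how the structure avoids long overflow chains without rehashing the whole file.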

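25. What do you mean by skewed data and collision in hashing? (Or) What are the drawbacks of the
hashing technique?
If the data values are not distributed uniformly over the domain of the data (for example, many
data entries share a few values), the data is said to be skewed. Collisions are data entries with
the same hash value. Both cause some buckets to receive many more data entries than others,
which leads to long overflow chains.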

26. Explain the linear hashing.

Linear Hashing avoids a directory by splitting the buckets in a round-robin fashion. Linear
Hashing proceeds in rounds. At the beginning of each round there is an initial set of buckets.
Insertions can trigger bucket splits, but buckets are split sequentially in order. Overflow pages
are required, but overflow chains are unlikely to be long because each bucket will be split at
some point.
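
Linear Hashing needs only a few counters instead of a directory. The sketch below shows how a key's bucket is computed partway through a round. It assumes the family of hash functions $h_i(k) = h(k) \bmod (2^i N_0)$, where $N_0$ is the number of initial buckets. It also assumes two counters: level, the current round, and next_split, the number of buckets already split in this round. Both names are hypothetical. A bucket numbered below next_split has already been split, so the next hash function, which uses one more bit, decides between that bucket and its split image.

```python
def linear_hash_bucket(key, n0, level, next_split):
    """Bucket number for key in Linear Hashing.

    n0         -- number of buckets at the start of round 0
    level      -- current round; the round starts with n0 * 2**level buckets
    next_split -- buckets 0 .. next_split-1 have already been split this round
    """
    b = hash(key) % (n0 * 2 ** level)              # h_level
    if b < next_split:                             # bucket already split:
        b = hash(key) % (n0 * 2 ** (level + 1))    # use h_(level+1) instead
    return b

# Round 0 with 4 initial buckets, after bucket 0 has been split into 0 and 4.
print([linear_hash_bucket(k, 4, 0, 1) for k in (8, 4, 5, 6)])  # [0, 4, 1, 2]
```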

27. Compare Extendible hashing and linear hashing.


Extendible and Linear Hashing are closely related. Linear Hashing avoids a directory structure
by having a predefined order of buckets to split. The disadvantage of Linear Hashing relative
to Extendible Hashing is that space utilization could be lower, especially for skewed
distributions, because the bucket splits are not concentrated where the data density is highest,
as they are in Extendible Hashing. A directory-based implementation of Linear Hashing can
improve space occupancy, but it is still likely to be inferior to Extendible Hashing in extreme
cases.

Unit IV

28. What do you mean by an external sorting algorithm?


An external sorting algorithm sorts a file of arbitrary length using only a limited amount of
main memory.

29. Briefly explain the two-way merge sort algorithm.


The two-way merge sort algorithm is an external sorting algorithm that uses only three buffer
pages at any time. Initially, the file is broken into small sorted files called runs of the size of
one page. The algorithm then proceeds in passes. In each pass, runs are paired and merged into
sorted runs twice the size of the input runs. In the last pass, the merge of two runs results in a
sorted instance of the file. The number of passes is $\lceil \log_2 N \rceil + 1$, where N is the number of pages
in the file.
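
You can simulate the pass structure in memory. In the sketch below, the function name is hypothetical and the file is a list of pages, each page a list of records. Pass 0 sorts every page into a one-page run, and each later pass merges runs in pairs with `heapq.merge`. The sketch ignores buffer management. It only counts passes, and the count should match $\lceil \log_2 N \rceil + 1$.

```python
import heapq

def two_way_merge_sort(pages):
    """Return (sorted records, number of passes) for a file given as pages."""
    runs = [sorted(page) for page in pages]   # pass 0: N runs of one page each
    passes = 1
    while len(runs) > 1:                      # each pass halves the number of runs
        runs = [list(heapq.merge(*runs[i:i + 2])) for i in range(0, len(runs), 2)]
        passes += 1
    return (runs[0] if runs else []), passes

pages = [[3, 1], [2, 9], [5, 4], [8, 7], [6, 0], [11, 10], [12]]   # N = 7 pages
records, passes = two_way_merge_sort(pages)
print(records == list(range(13)), passes)     # True 4
```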

30. Briefly explain the external merge sort algorithm.


The external merge sort algorithm improves upon the two-way merge sort if there are B > 3
buffer pages available for sorting. The algorithm writes initial runs of B pages each instead of
only one page. In addition, the algorithm merges B−1 runs instead of two runs during the
merge step. The number of passes is reduced to $\lceil \log_{B-1} N_1 \rceil + 1$, where $N_1 = \lceil N/B \rceil$
is the number of initial runs. If replacement sort is used to create the initial runs, the average
length of the initial runs can be increased to 2 * B pages, reducing $N_1$ to $N_1 = \lceil N/(2B) \rceil$.
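
A few lines of integer arithmetic are enough to check the pass count. The hypothetical helper below first computes the number of initial runs $N_1$, then counts (B−1)-way merge passes until only one run remains. Setting replacement_selection=True models initial runs of average length 2B. The helper assumes B ≥ 3.

```python
def external_sort_passes(n_pages, buffers, replacement_selection=False):
    """Number of passes of external merge sort with `buffers` buffer pages (B >= 3)."""
    def ceil_div(a, b):
        return -(-a // b)

    run_length = 2 * buffers if replacement_selection else buffers
    runs = ceil_div(n_pages, run_length)    # N1: runs produced by pass 0
    passes = 1
    while runs > 1:
        runs = ceil_div(runs, buffers - 1)  # each pass merges B-1 runs at a time
        passes += 1
    return passes

print(external_sort_passes(108, 5))         # 4: 22 runs -> 6 -> 2 -> 1
print(external_sort_passes(108, 5, True))   # 3: 11 runs -> 3 -> 1
```

Each pass reads and writes every page, so the I/O cost is 2 · N · (number of passes). In the first case that is 2 · 108 · 4 = 864 page I/Os.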

31. What is the advantage of blocked I/O?


In blocked I/O, several consecutive pages (called a buffer block) are read or written through a
single request. Blocked I/O is usually much cheaper than reading or writing the same number
of pages through independent I/O requests.

32. What do you mean by double buffering?


In double buffering, each buffer is duplicated. While the CPU processes tuples in one buffer,
an I/O request for the other buffer is issued.

33. Differentiate between external sorting and sorting with a clustered B+ tree index.


If the file to be sorted has a clustered B+ tree index with a search key equal to the fields to be
sorted by, then we can simply scan the sequence set (the leaf level of the tree) and retrieve the
records in sorted order. This technique is clearly superior to using an external sorting algorithm.
If the index is unclustered, an external sorting algorithm will almost certainly be cheaper than
using the index, because fetching the records in index order can cost up to one I/O per record.

34. What is the need for sorting records? (or) explain the advantage of sorting the records.
• Users may want answers in some order; for example, by increasing age.
• Sorting records is the first step in bulk loading a tree index.
• Sorting is useful for eliminating duplicate copies in a collection of records.
• A widely used algorithm for performing a very important relational algebra operation,
called join, requires a sorting step.

35. What do you mean by a run?


In external merge sort, each sorted subfile is called a run.

36. Define access path.


The alternative ways to retrieve tuples from a relation are called access paths. An access path
is either (1) a file scan or (2) an index plus a matching selection condition.

37. When does a general selection condition match an index? What is a primary term in a selection
condition with respect to a given index?
An index is said to match a selection condition if the index can be used to retrieve just the
tuples that satisfy the condition. The general format of a term is attr op value, where op is one of the
comparison operators <, ≤, =, ≠, ≥, or >. An index matches such a selection if the index search
key is attr and either (1) the index is a tree index or (2) the index is a hash index and op is
equality. The terms of a selection condition that a given index matches are called the primary
terms with respect to that index.

38. Define selectivity of an access path.


The selectivity of an access path is the number of pages retrieved (index pages plus data
pages) if this access path is used to retrieve all desired tuples. If a relation contains an index
that matches a given selection, there are at least two access paths, namely, the index and a scan
of the data file.

39. Define most selective access path of a query.


The most selective access path is the one that retrieves the fewest pages; using the most
selective access path minimizes the cost of data retrieval.

40. Write the index nested loops join algorithm.


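In an index nested loops join, if there is an index on the join attribute of one relation (say S), S is
made the inner relation. For each tuple r of the outer relation R, the index is used to retrieve only
the tuples of S that match r, so S is never scanned in full. In a simple nested loops join, by contrast,
the join condition is evaluated between each pair of tuples from R and S.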
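
Below is a minimal in-memory sketch of index nested loops join. A Python dict from join value to S tuples stands in for the index on S. That is an assumption: a real DBMS would probe a B+ tree or hash index on disk. Tuples are dicts, and all names are hypothetical.

```python
from collections import defaultdict

def build_index(S, attr):
    """Stand-in for an index on S.attr: maps each value to the matching S tuples."""
    index = defaultdict(list)
    for s in S:
        index[s[attr]].append(s)
    return index

def index_nested_loops_join(R, r_attr, s_index):
    result = []
    for r in R:                                # scan the outer relation once
        for s in s_index.get(r[r_attr], []):   # probe the index: only matching S tuples
            result.append((r, s))
    return result

R = [{"sid": 22, "bid": 101}, {"sid": 58, "bid": 103}, {"sid": 99, "bid": 104}]
S = [{"sid": 22, "name": "a"}, {"sid": 58, "name": "b"}]
out = index_nested_loops_join(R, "sid", build_index(S, "sid"))
print([(r["bid"], s["name"]) for r, s in out])   # [(101, 'a'), (103, 'b')]
```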

41. Write the block nested loops join algorithm.

42. Define conjunctive and disjunctive selections in a query.


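General selection conditions can be expressed in conjunctive normal form, where each
conjunct consists of one or more terms connected by ∨ (OR). Conjuncts that contain ∨ are called
disjunctive; a selection condition whose conjuncts are all single terms joined by ∧ (AND) is a
conjunctive selection.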

43. What do you mean by an index only scan?


In the context of projection, a hash-based implementation first partitions the file according to a
hash function on the output attributes. Two tuples that belong to different partitions are guaranteed
not to be duplicates because they have different hash values. In a subsequent step each partition is
read into main memory and within-partition duplicates are eliminated. Alternatively, if an index
contains all output attributes, tuples can be retrieved solely from the index, without accessing the
data file. This technique is called an index-only scan.

44. Compare and contrast nested loops join, block nested loops join operations
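In a nested loops join, the join condition is evaluated between each pair of tuples from R and S.
A block nested loops join performs the pairing in a way that minimizes the number of disk
accesses: it reads the outer relation R in blocks of as many pages as fit in the available buffers
and, for each block, scans the inner relation S once, instead of once per tuple or page of R.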
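
Here is a sketch of block nested loops join over pages. It assumes that pages are Python lists of values, that the join predicate is arbitrary, and that block_size is the number of outer pages held in memory at once. With B buffers this is typically B − 2, leaving one page for reading S and one for output. The sketch counts page reads of S to show the saving: S is scanned once per block of R. All names are hypothetical. For equality joins, real implementations usually build an in-memory hash table on the R block instead of using the innermost loop.

```python
def block_nested_loops_join(r_pages, s_pages, block_size, pred):
    """Return (joined pairs, number of S pages read)."""
    result, s_pages_read = [], 0
    for start in range(0, len(r_pages), block_size):
        # Load one block of the outer relation R into memory.
        block = [r for page in r_pages[start:start + block_size] for r in page]
        for s_page in s_pages:                 # one full scan of S per R block
            s_pages_read += 1
            for s in s_page:
                for r in block:
                    if pred(r, s):
                        result.append((r, s))
    return result, s_pages_read

r_pages = [[1, 2], [3, 4], [5, 6]]
s_pages = [[2, 3], [6, 7]]
pairs, reads = block_nested_loops_join(r_pages, s_pages, 2, lambda r, s: r == s)
print(pairs, reads)   # [(2, 2), (3, 3), (6, 6)] 4
```

With block_size = 1, the same input would scan S three times (6 page reads). Larger blocks cut the number of scans of S.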

45. Write in short about the difference between sort-merge join and hash join.
A sort-merge join sorts R and S on the join attributes using an external merge sort and
performs the pairing during the final merge step. A hash join first partitions R and S using a
hash function on the join attributes. Only partitions with the same hash values need to be
joined in a subsequent step. A hybrid hash join extends the basic hash join algorithm by
making more efficient use of main memory when more buffer pages are available.
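
Both joins can be sketched in memory. In the sketch below, tuples are dicts, and in-memory sorting and Python's `hash` stand in for external sorting and for the partitioning hash function. num_partitions is an arbitrary choice, and all names are hypothetical. The hash join follows the two phases described above. First it partitions both inputs. Then, for each partition, it builds a hash table on the R partition and probes it with the matching S partition.

```python
from collections import defaultdict

def sort_merge_join(R, S, key):
    """Sort both inputs on the join attribute, then pair tuples in one merge pass."""
    r, s = sorted(R, key=lambda t: t[key]), sorted(S, key=lambda t: t[key])
    result, i, j = [], 0, 0
    while i < len(r) and j < len(s):
        if r[i][key] < s[j][key]:
            i += 1
        elif r[i][key] > s[j][key]:
            j += 1
        else:
            # Find the group of tuples with this key value on each side and pair them.
            value, i_end, j_end = r[i][key], i, j
            while i_end < len(r) and r[i_end][key] == value:
                i_end += 1
            while j_end < len(s) and s[j_end][key] == value:
                j_end += 1
            result.extend((rt, st) for rt in r[i:i_end] for st in s[j:j_end])
            i, j = i_end, j_end
    return result

def hash_join(R, S, key, num_partitions=4):
    """Partition both inputs with a hash function, then build and probe per partition."""
    r_parts, s_parts = defaultdict(list), defaultdict(list)
    for t in R:                                   # partitioning phase
        r_parts[hash(t[key]) % num_partitions].append(t)
    for t in S:
        s_parts[hash(t[key]) % num_partitions].append(t)
    result = []
    for p, r_part in r_parts.items():             # probing phase
        table = defaultdict(list)                 # in-memory hash table on R partition p
        for rt in r_part:
            table[rt[key]].append(rt)
        for st in s_parts.get(p, []):             # only S partition p can match
            result.extend((rt, st) for rt in table.get(st[key], []))
    return result

R = [{"sid": 1, "n": "a"}, {"sid": 2, "n": "b"}, {"sid": 2, "n": "c"}]
S = [{"sid": 2, "bid": 10}, {"sid": 3, "bid": 11}, {"sid": 2, "bid": 12}]
for join in (sort_merge_join, hash_join):
    print(sorted((rt["n"], st["bid"]) for rt, st in join(R, S, "sid")))
# both print [('b', 10), ('b', 12), ('c', 10), ('c', 12)]
```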

46. How does hybrid hash join improve upon the basic hash join algorithm?
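The idea behind hybrid hash join is that, when enough buffer pages are available, we avoid writing the
first partitions of R and S to disk during the partitioning phase and reading them in again during the
probing phase: the first partition of R is kept in memory as a hash table and is probed directly by the
tuples of S that hash to it.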

47. Define histogram and its variants


A histogram is a data structure that approximates a data distribution by dividing the value
range into buckets and maintaining summarized information about each bucket. In an
equiwidth histogram, the value range is divided into subranges of equal size. In an equidepth
histogram, the range is divided into subranges such that each subrange contains the same
number of tuples.
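
The small sketch below contrasts the two variants on skewed integer data. The values, the bucket count and the value range 0..14 are made up for illustration, and the function names are hypothetical. The equiwidth version fixes the subranges and counts the tuples in each. The equidepth version fixes the count per bucket and lets the subranges vary, so dense regions of the data get narrower buckets.

```python
def equiwidth_histogram(values, num_buckets, low, high):
    """Counts per bucket when [low, high] (integers) is split into equal-width subranges."""
    counts = [0] * num_buckets
    for v in values:
        counts[(v - low) * num_buckets // (high - low + 1)] += 1
    return counts

def equidepth_histogram(values, num_buckets):
    """(min, max, count) per bucket when each bucket holds about the same number of values.

    Assumes len(values) >= num_buckets.
    """
    s = sorted(values)
    n = len(s)
    buckets = []
    for i in range(num_buckets):
        chunk = s[i * n // num_buckets:(i + 1) * n // num_buckets]
        buckets.append((chunk[0], chunk[-1], len(chunk)))
    return buckets

values = [0, 1, 1, 2, 2, 2, 3, 3, 13, 14, 14, 14]   # skewed data over the range 0..14
print(equiwidth_histogram(values, 3, 0, 14))   # [8, 0, 4]
print(equidepth_histogram(values, 3))          # [(0, 2, 4), (2, 3, 4), (13, 14, 4)]
```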

48. When do we say two algebraic expressions to be equivalent?


Two relational algebra expressions are equivalent if they produce the same output for all
possible input instances. Several relational algebra equivalences allow a relational algebra
expression to be modified to obtain an expression with a cheaper plan.

49. Enumerate the steps in optimizing a relational algebra expression.


Optimizing a relational algebra expression involves two basic steps:
• Enumerating alternative plans for evaluating the expression. Typically, an optimizer
considers a subset of all possible plans because the number of possible plans is very
large.
• Estimating the cost of each enumerated plan, and choosing the plan with the least
estimated cost.

50. How is the cost estimated for an evaluation plan?


There are two parts to estimating the cost of an evaluation plan for a query block:
1. For each node in the tree, we must estimate the cost of performing the corresponding
operation. Costs are affected significantly by whether pipelining is used or temporary relations
are created to pass the output of an operator to its parent.
2. For each node in the tree, we must estimate the size of the result, and whether it is sorted.

51. Define Reduction Factor.


A reduction factor is associated with each term in the WHERE clause. It is the ratio of
the (expected) result size to the input size, considering only the selection represented by the
term. The actual size of the result can be estimated as the maximum size (the product of the
cardinalities of the relations in the FROM clause) times the product of the reduction factors
for the terms in the WHERE clause. Of course, this estimate reflects the unrealistic, but
simplifying, assumption that the conditions tested by each term are statistically independent.

52. How do we estimate the size of the final result of a query?


The size of the final result of a query is estimated by taking the product of the sizes of the
relations in the FROM clause and the reduction factors for the terms in the WHERE clause.
Similarly, the size of the result of each operator in a plan tree is estimated by using reduction
factors, since the subtree rooted at that operator’s node is itself a query block.
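
A small sketch of the estimate follows. It assumes two reduction-factor heuristics that are commonly used in textbooks: 1/max(NKeys(I1), NKeys(I2)) for an equality join, and (High(I) − value)/(High(I) − Low(I)) for a term of the form column > value. The cardinalities, key counts and rating range are made-up numbers, and the function names are hypothetical. `Fraction` keeps the arithmetic exact.

```python
from fractions import Fraction
from math import prod

def rf_equijoin(nkeys1, nkeys2):         # column1 = column2
    return Fraction(1, max(nkeys1, nkeys2))

def rf_greater_than(value, low, high):   # column > value, values spread over [low, high]
    return Fraction(high - value, high - low)

def estimate_result_size(cardinalities, reduction_factors):
    """Maximum size (product of FROM-clause cardinalities) times the product of the
    reduction factors, assuming the WHERE-clause terms are independent."""
    return prod(cardinalities) * prod(reduction_factors)

# Hypothetical query: FROM R, S WHERE R.sid = S.sid AND S.rating > 5
size = estimate_result_size(
    [100_000, 40_000],
    [rf_equijoin(40_000, 40_000), rf_greater_than(5, 1, 10)],
)
print(size)   # 500000/9 (about 55,556 tuples)
```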

53. What does the rule of cascading projections state?


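The rule for cascading projections says that successively eliminating columns from a relation
is equivalent to simply eliminating all but the columns retained by the final projection:

$$\pi_{a_1}(R) \equiv \pi_{a_1}(\pi_{a_2}(\cdots \pi_{a_n}(R) \cdots))$$

where each $a_i$ is a set of attributes of R and $a_i \subseteq a_{i+1}$ for $i = 1, \ldots, n-1$.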

54. Differentiate between rule based optimizers and randomized plan generators
Rule-based optimizers use a set of rules to guide the generation of candidate plans. Randomized
plan generators use probabilistic algorithms, such as simulated annealing, to explore a large space
of plans quickly, with a reasonable likelihood of finding a good plan.

55. Write in short about parametric query optimization and multiple-query optimization.
Parametric query optimization seeks to find good plans for a given query for each of several
different conditions that might be encountered at run-time. Multiple-query optimization is an
approach in which the optimizer takes the concurrent execution of several queries into account.

56. What are the problems caused by redundancy?


Redundant storage: Some information is stored repeatedly.
Update anomalies: If one copy of such repeated data is updated, an inconsistency is created
unless all copies are similarly updated.
Insertion anomalies: It may not be possible to store some information unless some other
information is stored as well.
Deletion anomalies: It may not be possible to delete some information without losing some
other information as well.

57. Define lossless join and dependency preservation property.


The lossless-join property enables us to recover any instance of the decomposed relation from
corresponding instances of the smaller relations. The dependency preservation property
enables us to enforce any constraint on the original relation by simply enforcing some
constraints on each of the smaller relations.
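
You can check on a single instance whether a particular decomposition loses information: project the instance, join the projections back, and compare the result with the original. The sketch below does exactly that; its names are hypothetical, and tuples are Python tuples laid out according to an explicit attribute list. It tests only one instance. The lossless-join property itself must hold for every legal instance, and that is decided from the FDs, not from data.

```python
def project(rows, schema, attrs):
    """Projection with set semantics; rows are tuples laid out according to schema."""
    idx = [schema.index(a) for a in attrs]
    return {tuple(row[i] for i in idx) for row in rows}

def natural_join(r1, attrs1, r2, attrs2):
    common = [a for a in attrs1 if a in attrs2]
    extra = [a for a in attrs2 if a not in attrs1]
    result = set()
    for t1 in r1:
        for t2 in r2:
            if all(t1[attrs1.index(a)] == t2[attrs2.index(a)] for a in common):
                result.add(t1 + tuple(t2[attrs2.index(a)] for a in extra))
    return attrs1 + extra, result

def lossless_on_instance(rows, schema, attrs1, attrs2):
    """True if joining the two projections gives back exactly the original rows."""
    joined_attrs, joined = natural_join(project(rows, schema, attrs1), attrs1,
                                        project(rows, schema, attrs2), attrs2)
    return joined == project(rows, schema, joined_attrs)

schema = ["A", "B", "C"]
rows = {(1, 2, 3), (4, 2, 5)}
print(lossless_on_instance(rows, schema, ["A", "B"], ["B", "C"]))  # False: spurious tuples
print(lossless_on_instance(rows, schema, ["A", "B"], ["A", "C"]))  # True
```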

58. Define functional dependency


A functional dependency (FD) is a kind of integrity constraint (IC) that generalizes the concept of a key. Let R be a
relation schema and let X and Y be nonempty sets of attributes in R. We say that an instance r
of R satisfies the FD X → Y if the following holds for every pair of tuples t1 and t2 in r:
If t1.X = t2.X, then t1.Y = t2.Y
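
The definition translates directly into a check on an instance. Remember the Y-value seen for each X-value, and report a violation if the same X-value appears with a different Y-value. The sketch below uses hypothetical names and represents rows as dicts. Note that an instance can only show that an FD is violated; whether the FD holds is a statement about all legal instances of the schema.

```python
def satisfies_fd(rows, X, Y):
    """True if the instance `rows` (list of dicts) satisfies the FD X -> Y."""
    y_for_x = {}
    for t in rows:
        x_val = tuple(t[a] for a in X)
        y_val = tuple(t[a] for a in Y)
        # setdefault stores y_val the first time x_val is seen, else returns the stored value
        if y_for_x.setdefault(x_val, y_val) != y_val:
            return False          # t1.X = t2.X but t1.Y != t2.Y
    return True

rows = [{"ssn": 1, "name": "A", "rating": 8},
        {"ssn": 2, "name": "B", "rating": 8},
        {"ssn": 1, "name": "A", "rating": 8}]
print(satisfies_fd(rows, ["ssn"], ["name"]))     # True
print(satisfies_fd(rows, ["rating"], ["name"]))  # False
```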

59. Define super key.

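A set of attributes X is a superkey of R if X → Y holds, where Y is the set of all attributes of R. A
superkey need not be minimal: if there is some proper subset V of X such that V → Y also holds,
then X is a superkey but not a key.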

60. Define Armstrong’s Axioms.


The following three rules, called Armstrong’s Axioms, can be applied repeatedly to infer all
FDs implied by a set F of FDs. We use X, Y, and Z to denote sets of attributes over a relation
schema R:

Reflexivity: If X ⊇ Y, then X → Y.
Augmentation: If X → Y, then XZ → YZ for any Z.
Transitivity: If X → Y and Y → Z, then X → Z.

61. Define Attribute Closure of an attribute X


Attribute closure X+ with respect to F (a set of FDs) is defined as the set of attributes A such
that X → A can be inferred using Armstrong's Axioms.

62. Write the algorithm for finding Attribute Closure of an attribute X.
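
The standard fixpoint algorithm starts with closure = X. It then repeatedly adds the right-hand side of any FD U → V whose left-hand side U is already contained in the closure, and it stops when a full pass over F adds nothing. The sketch below uses a hypothetical function name. For brevity, it represents attributes as single characters and F as a list of (U, V) string pairs; that representation is an assumption.

```python
def attribute_closure(X, fds):
    """Compute X+ with respect to fds, a list of (lhs, rhs) attribute collections."""
    closure = set(X)
    changed = True
    while changed:                       # repeat until no FD adds a new attribute
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= closure and not set(rhs) <= closure:
                closure |= set(rhs)      # U is in X+, so V is in X+ (transitivity)
                changed = True
    return closure

F = [("A", "B"), ("B", "C"), ("CD", "E")]
print(sorted(attribute_closure("A", F)))    # ['A', 'B', 'C']
print(sorted(attribute_closure("AD", F)))   # ['A', 'B', 'C', 'D', 'E']
```

X is a superkey exactly when X+ contains every attribute of R, so the same function can also be used to test superkeys.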

63. When do we say a relation is in first normal form?


A relation is in first normal form if every field contains only atomic values, that is, not lists or
sets.

64. Define full functional dependency and partial dependency.


Full functional dependency indicates that if A and B are attributes of a relation, B is fully
functionally dependent on A if B is functionally dependent on A, but not on any proper subset
of A.
A functional dependency A → B is a partial dependency if some attribute can be removed from
A and the dependency still holds.

65. Define second normal form.


A relation is in second normal form (2NF) if it is in first normal form and every non-primary-key
attribute is fully functionally dependent on the primary key.

66. Define Third normal form.


Third normal form (3NF) requires that there are no functional dependencies of non-key
attributes on something other than a candidate key. A table is in 3NF if it is in 2NF and none of
its non-primary-key attributes is transitively dependent on the primary key.

67. Differentiate between BCNF and third normal form.


A relation is in BCNF, if and only if, every determinant is a candidate key. The difference
between 3NF and BCNF is that for a functional dependency A → B, 3NF allows this dependency in a
relation if B is a primary-key attribute and A is not a candidate key, whereas BCNF insists that for this
dependency to remain in a relation, A must be a candidate key.

68. Define Multi Valued Dependency.


Multi-valued dependency (MVD) represents a dependency between attributes (for example, A,
B and C) in a relation, such that for each value of A there is a set of values for B and a set of
values for C. However, the sets of values for B and C are independent of each other.
A multi-valued dependency can be further defined as being trivial or nontrivial. An MVD A →→
B in relation R is defined as being trivial if

• B is a subset of A

or

• A ∪ B = R

An MVD is defined as being nontrivial if neither of the above two conditions is satisfied.

69. Define fourth normal form (4NF).


A relation is in fourth normal form (4NF) if it is in Boyce-Codd normal form and contains no
nontrivial multi-valued dependencies.

70. Define fifth normal form.


A relation is in fifth normal form (5NF) if it has no nontrivial join dependency, that is, every join
dependency in it is implied by its candidate keys. Informally, fifth normal form is satisfied when
tables have been decomposed as far as possible in order to avoid redundancy: once a relation is in
fifth normal form, it cannot be decomposed into smaller relations without changing the facts or
the meaning.

71. Define DKNF


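A relation is in domain-key normal form (DKNF) if every constraint on it is a logical consequence
of its domain constraints and key constraints; such a relation can have no insertion or deletion
anomalies.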

72. Define database workload


A database workload description includes the following elements:
1. A list of queries and their frequencies, as a fraction of all queries and updates.
2. A list of updates and their frequencies.
3. Performance goals for each type of query and update.

73. What are the details to be collected for queries and updates in a database workload?
For each query in the workload,
• Which relations are accessed.
• Which attributes are retained (in the SELECT clause).
• Which attributes have selection or join conditions expressed on them (in the WHERE
clause) and how selective these conditions are likely to be.
Similarly, for each update in the workload,
• Which attributes have selection or join conditions expressed on them (in the WHERE
clause) and how selective these conditions are likely to be.
• The type of update (INSERT, DELETE, or UPDATE) and the updated relation.
• For UPDATE commands, the fields that are modified by the update.

74. What are the guidelines regarding indices in physical database design?
There are guidelines that help to decide whether to index, what to index, whether to use a
multiple-attribute index, whether to create an unclustered or a clustered index, and whether to
use a hash or a tree index. Indexes can speed up queries but can also slow down update
operations.
75. What is a DBMS benchmark? Give examples.
A DBMS benchmark tests the performance of a class of applications or specific aspects of a
DBMS to help users evaluate system performance. Well-known benchmarks include TPC-A,
TPC-B, TPC-C, and TPC-D.

76. What do you mean by physical database tuning?(Or) State the need for database tuning?
After an initial physical design, continuous database tuning is important to obtain best possible
performance. Using the observed workload over time, we can reconsider our choice of indexes
and our relation schema. Other tasks include periodic reorganization of indexes and updating
the statistics in the system catalogs.

77. Write a short note on co clustering.


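Co-clustering stores the records of two relations interleaved on the same pages, for example
each Parts record followed by the Assembly records that refer to it. Its effects are:
• It can speed up joins, in particular key–foreign key joins corresponding to 1:N
relationships.
• A sequential scan of either relation becomes slower; for example, a sequential scan of all
Assembly tuples is slower because those tuples are spread over pages that also hold the other
relation's records.
• Inserts, deletes, and updates that alter record lengths all become slower.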

78. Define access control mechanism. Mention the types of the same.
An access control mechanism is a way to control the data that is accessible to a given user.
The two different types are discretionary and mandatory access control.

79. What are the main objectives of DBMS Security? Explain with example.
1. Secrecy: Information should not be disclosed to unauthorized users. For example, a student
should not be allowed to examine other students’ grades.
2. Integrity: Only authorized users should be allowed to modify data. For example, students
may be allowed to see their grades, yet not allowed (obviously!) to modify them.
3. Availability: Authorized users should not be denied access. For example, an instructor who
wishes to change a grade should be allowed to do so.

80. Define Discretionary Access Control.


Discretionary access control is based on the concept of access rights, or privileges, and
mechanisms for giving users such privileges. A privilege allows a user to access some data
object in a certain manner (e.g., to read or to modify).

81. Define Mandatory Access Control.


Mandatory access control is based on systemwide policies that cannot be changed by
individual users. In this approach each database object is assigned a security class, each user is
assigned clearance for a security class, and rules are imposed on reading and writing of
database objects by users. The DBMS determines whether a given user can read or write a
given object based on certain rules that involve the security level of the object and the
clearance of the user. These rules seek to ensure that sensitive data can never be ‘passed on’ to
a user without the necessary clearance.

82. Write the general format of GRANT Command. What is the use of GRANT Option in the
format?
The GRANT command gives users privileges to base tables and views. The syntax of this command is
as follows: GRANT privileges ON object TO users [ WITH GRANT OPTION ]. If a user has a
privilege with the grant option, he or she can pass it to another user (with or without the grant
option) by using the GRANT command.

83. What are the privileges granted to the user through a GRANT Command?
SELECT: The right to access (read) all columns of the table specified as the object, including
columns added later through ALTER TABLE commands.
INSERT (column-name): The right to insert rows with (non-null or nondefault) values in the
named column of the table named as object.
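UPDATE (column-name) and UPDATE: UPDATE (column-name) is the right to update the named column
of the table named as object; UPDATE without a column name denotes this right for all columns.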
DELETE: The right to delete rows from the table named as object.
REFERENCES (column-name): The right to define foreign keys (in other tables) that refer to
the specified column of the table object. REFERENCES without a column name specified
denotes this right with respect to all columns, including any that are added later.

84. Define Privilege Descriptor.


The DBMS maintains a table of privilege descriptors, one for each privilege granted. A privilege
descriptor specifies the following: the grantor of the privilege, the grantee who receives the
privilege, the granted privilege (including the name of the object involved), and whether the
grant option is included. When a user creates a table or view and 'automatically' gets certain
privileges, a privilege descriptor with system as the grantor is entered into this table.

85. Define authorization Graph.


The authorization graph is a graph in which the nodes are users (technically, authorization ids)
and the arcs indicate how privileges are passed. There is an arc from (the node for) user 1 to
user 2 if user 1 executed a GRANT command giving a privilege to user 2; the arc is labeled with
the descriptor for the GRANT command. A GRANT command has no effect if the same privileges
have already been granted to the same grantee by the same grantor.

86. Define multilevel and polyinstantiation.


The presence of data objects that appear to have different values to users with different
clearances is called polyinstantiation. A multilevel table is a table with the surprising
property that users with different security clearances see a different collection of rows
when they access the same table.

87. Describe Covert Channel.


Even if a DBMS enforces the mandatory access control scheme discussed above, information
can flow from a higher classification level to a lower classification level through indirect
means, called covert channels.

88. What is the responsibility of the database administrator?


1. Creating new accounts: Each new user or group of users must be assigned an authorization
id and a password. Note that application programs that access the database have the same
authorization id as the user executing the program.
2. Mandatory control issues: If the DBMS supports mandatory control (some customized
systems for applications with very high security requirements, for example military data,
provide such support), the DBA must assign security classes to each database object and
assign security clearances to each authorization id in accordance with the chosen security
policy.
3. The DBA is also responsible for maintaining the audit trail, which is essentially the log of
updates with the authorization id (of the user who is executing the transaction) added to each
log entry.

89. Define Audit Trail.


Audit trail is essentially the log of updates with the authorization id (of the user who is
executing the transaction) added to each log entry. This log is just a minor extension of the log
mechanism used to recover from crashes.

90. Define Statistical Databases and what is the security issue in the statistical databases?
A statistical database is one that contains specific information on individuals or events but is
intended to permit only statistical queries. Security in such databases poses problems because
it is possible to infer protected information (such as an individual sailor’s rating) from answers
to permitted statistical queries. Such inference opportunities represent covert channels that can
compromise the security policy of the database.
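The inference problem can be illustrated with a minimal Python sketch. The sailor names, ratings and the minimum query-set-size policy (MIN_SET_SIZE) are hypothetical; the point is that two individually permitted aggregate queries can be subtracted to reveal one individual's rating.

```python
# Hypothetical policy: refuse any statistical query over fewer than 3 sailors.
MIN_SET_SIZE = 3

# Hypothetical data: sailor name -> rating (the protected information).
ratings = {"Alice": 7, "Bob": 9, "Carol": 5, "Dave": 8}


def sum_ratings(predicate):
    """Permitted statistical query: SUM(rating) over sailors matching predicate."""
    matching = [r for name, r in ratings.items() if predicate(name)]
    if len(matching) < MIN_SET_SIZE:
        raise PermissionError("query set too small")
    return sum(matching)


# Each query covers at least 3 sailors, so both are allowed...
total = sum_ratings(lambda name: True)                 # 29
all_but_bob = sum_ratings(lambda name: name != "Bob")  # 20
# ...yet their difference is exactly Bob's rating.
inferred_bob_rating = total - all_but_bob
print(inferred_bob_rating)  # 9
```

A minimum query-set size alone therefore does not close this covert channel.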

Unit V
91. Define transaction.
A transaction is defined as any one execution of a user program in a DBMS and differs from an
execution of a program outside the DBMS (e.g., a C program executing on Unix) in important
ways. (Executing the same program several times will generate several transactions.)

92. State the ACID properties. (Or) Explain ACID.


Atomicity. Either all operations of the transaction are properly reflected in the database or
none are.
Consistency. Execution of a transaction in isolation preserves the consistency of the database.
Isolation. Although multiple transactions may execute concurrently, each transaction must be
unaware of other concurrently executing transactions. Intermediate transaction results must be
hidden from other concurrently executed transactions.
That is, for every pair of transactions Ti and Tj, it appears to Ti that either Tj finished
execution before Ti started, or Tj started execution after Ti finished.
Durability. After a transaction completes successfully, the changes it has made to the
database persist, even if there are system failures.
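Atomicity can be observed with Python's built-in sqlite3 module. The sketch below assumes an in-memory database and a hypothetical account table: a transfer that fails between the debit and the credit is rolled back, so neither update is reflected in the database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (id TEXT PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO account VALUES (?, ?)", [("A", 100), ("B", 50)])
conn.commit()


def transfer(amount, fail_midway=False):
    # Using the connection as a context manager commits on success
    # and rolls back if the block raises an exception.
    with conn:
        conn.execute("UPDATE account SET balance = balance - ? WHERE id = 'A'", (amount,))
        if fail_midway:
            raise RuntimeError("simulated failure after the debit")
        conn.execute("UPDATE account SET balance = balance + ? WHERE id = 'B'", (amount,))


def balances():
    return dict(conn.execute("SELECT id, balance FROM account ORDER BY id"))


try:
    transfer(30, fail_midway=True)
except RuntimeError:
    pass
print(balances())  # {'A': 100, 'B': 50}: the debit was rolled back
transfer(30)
print(balances())  # {'A': 70, 'B': 80}
```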
93. What are the different states of a transaction?
Active – the initial state; the transaction stays in this state while it is executing.
Partially committed – after the final statement has been executed.
Failed – after the discovery that normal execution can no longer proceed.
Aborted – after the transaction has been rolled back and the database restored to its state prior
to the start of the transaction. Two options after it has been aborted:
 restart the transaction (possible only if there was no internal logical error), or
 kill the transaction.
Committed – after successful completion.
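The states and the transitions allowed between them can be sketched as a small state machine. The transition table follows the usual textbook state diagram and is an assumption of this sketch; restarting an aborted transaction is modelled as creating a new transaction rather than as a transition.

```python
from enum import Enum


class TxState(Enum):
    ACTIVE = "active"
    PARTIALLY_COMMITTED = "partially committed"
    FAILED = "failed"
    ABORTED = "aborted"
    COMMITTED = "committed"


# Allowed transitions (assumed textbook state diagram).
TRANSITIONS = {
    TxState.ACTIVE: {TxState.PARTIALLY_COMMITTED, TxState.FAILED},
    TxState.PARTIALLY_COMMITTED: {TxState.COMMITTED, TxState.FAILED},
    TxState.FAILED: {TxState.ABORTED},
    TxState.ABORTED: set(),    # terminal: a restart starts a new transaction
    TxState.COMMITTED: set(),  # terminal
}


class Transaction:
    def __init__(self):
        self.state = TxState.ACTIVE
        self.history = [self.state]

    def move_to(self, new_state):
        if new_state not in TRANSITIONS[self.state]:
            raise ValueError(f"illegal transition {self.state.value} -> {new_state.value}")
        self.state = new_state
        self.history.append(new_state)


t = Transaction()
t.move_to(TxState.PARTIALLY_COMMITTED)
t.move_to(TxState.COMMITTED)
print([s.value for s in t.history])  # ['active', 'partially committed', 'committed']
```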

94. Explain the relationship between the different states of a transaction.

95. What is the function of the recovery management of the database?


The recovery-management component of a database system implements the support for
atomicity and durability.
96. Explain the shadow-database scheme in short.
The shadow-database scheme:
a. assume that only one transaction is active at a time.
b. a pointer called db_pointer always points to the current consistent copy of the database.
c. all updates are made on a shadow copy of the database, and db_pointer is made to
point to the updated shadow copy only after the transaction reaches partial commit and
all updated pages have been flushed to disk.
d. in case transaction fails, old consistent copy pointed to by db_pointer can be used, and
the shadow copy can be deleted.
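The scheme can be sketched in a few lines of Python. This is a toy, in-memory illustration under assumption (a): one transaction at a time. The class and method names are hypothetical; only db_pointer comes from the description above, and flushing pages to disk is represented by a comment.

```python
import copy


class ShadowDatabase:
    """Toy shadow-database scheme; only one transaction is active at a time."""

    def __init__(self, initial):
        self.db_pointer = initial  # always refers to the current consistent copy
        self.shadow = None

    def begin(self):
        # All updates are made on a separate (shadow) copy of the database.
        self.shadow = copy.deepcopy(self.db_pointer)

    def write(self, key, value):
        self.shadow[key] = value

    def commit(self):
        # In a real system all updated pages are flushed to disk first;
        # switching db_pointer is the single step that makes them current.
        self.db_pointer = self.shadow
        self.shadow = None

    def abort(self):
        # The old consistent copy is untouched; discard the shadow copy.
        self.shadow = None


db = ShadowDatabase({"A": 100, "B": 50})
db.begin()
db.write("A", 70)
db.abort()
print(db.db_pointer)  # {'A': 100, 'B': 50}
db.begin()
db.write("A", 70)
db.write("B", 80)
db.commit()
print(db.db_pointer)  # {'A': 70, 'B': 80}
```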

97. Define Schedule.


A schedule is a list of actions (reading, writing, aborting, or committing) from a set of
transactions, and the order in which two actions of a transaction T appear in a schedule must
be the same as the order in which they appear in T. Intuitively, a schedule represents an actual
or potential execution sequence.

98. Define complete schedule.


A schedule that contains either an abort or a commit for each transaction whose actions are
listed in it is called a complete schedule. A complete schedule must contain all the actions of
every transaction that appears in it.

99. Define Serial schedule.


If the actions of different transactions are not interleaved—that is, transactions are executed
from start to finish, one by one—the schedule is called a serial schedule.
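The definitions of complete and serial schedules can be checked mechanically. In this sketch a schedule is assumed to be a list of (transaction, action, object) triples with actions "R", "W", "A" (abort) and "C" (commit); is_complete checks only the abort-or-commit condition, since the full action lists of the transactions are not part of the representation.

```python
def is_complete(schedule):
    """Every transaction that appears has an abort or a commit."""
    txns = {t for t, _, _ in schedule}
    finished = {t for t, a, _ in schedule if a in ("A", "C")}
    return txns == finished


def is_serial(schedule):
    """Actions of different transactions are not interleaved."""
    done, current = set(), None
    for t, _, _ in schedule:
        if t != current:
            if t in done:
                return False  # t resumes after another transaction ran
            if current is not None:
                done.add(current)
            current = t
    return True


serial = [("T1", "R", "A"), ("T1", "W", "A"), ("T1", "C", None),
          ("T2", "R", "A"), ("T2", "C", None)]
interleaved = [("T1", "R", "A"), ("T2", "R", "A"), ("T1", "W", "A"),
               ("T1", "C", None), ("T2", "C", None)]
print(is_complete(serial), is_serial(serial))            # True True
print(is_complete(interleaved), is_serial(interleaved))  # True False
```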

100. What is the advantage of concurrent execution?


Multiple transactions are allowed to run concurrently in the system. Advantages are:
a. increased processor and disk utilization, leading to better transaction throughput: one
transaction can be using the CPU while another is reading from or writing to the disk.
b. reduced average response time for transactions: short transactions need not wait behind
long ones.

101. Define Concurrency Control Schemes.

Concurrency control schemes – mechanisms to achieve isolation; that is, to control the
interaction among the concurrent transactions in order to prevent them from destroying the
consistency of the database

102. What is the meaning of a serializable schedule?


A serializable schedule over a set S of committed transactions is a schedule whose
effect on any consistent database instance is guaranteed to be identical to that of some
complete serial schedule over S.

103. What are the different types of anomalies or conflicts that can occur while interleaving
the transactions?
a. Reading Uncommitted Data (WR Conflicts)

b. Unrepeatable Reads (RW Conflicts)

c. Overwriting Uncommitted Data (WW Conflicts)

104. What do you mean by a recoverable schedule? What is the advantage of the same?
A recoverable schedule is one in which transactions commit only after (and if!) all
transactions whose changes they read commit. If transactions read only the changes of
committed transactions, not only is the schedule recoverable, but also aborting a
transaction can be accomplished without cascading the abort to other transactions. Such a
schedule is said to avoid cascading aborts.
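Both properties can be checked with a short sketch. It assumes schedules are lists of (transaction, action, object) triples with actions "R", "W", "C" and "A", and for simplicity it ignores the effect an abort has on values read later.

```python
def analyze(schedule):
    """Return (recoverable, avoids_cascading_aborts) for a schedule."""
    last_writer = {}   # object -> transaction that wrote it most recently
    committed = set()
    reads_from = set()  # (reader, writer) pairs
    recoverable = aca = True
    for t, action, obj in schedule:
        if action == "W":
            last_writer[obj] = t
        elif action == "R":
            w = last_writer.get(obj)
            if w is not None and w != t:
                reads_from.add((t, w))
                if w not in committed:
                    aca = False  # dirty read of uncommitted data
        elif action == "C":
            # t may commit only after every transaction it read from has committed
            if any(r == t and w not in committed for r, w in reads_from):
                recoverable = False
            committed.add(t)
    return recoverable, aca


s1 = [("T1", "W", "A"), ("T2", "R", "A"), ("T2", "C", None), ("T1", "C", None)]
s2 = [("T1", "W", "A"), ("T2", "R", "A"), ("T1", "C", None), ("T2", "C", None)]
s3 = [("T1", "W", "A"), ("T1", "C", None), ("T2", "R", "A"), ("T2", "C", None)]
print(analyze(s1), analyze(s2), analyze(s3))
# (False, False) (True, False) (True, True)
```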

105. What do you mean by conflict serializable schedules?


Two schedules are said to be conflict equivalent if they involve the (same set of) actions
of the same transactions and they order every pair of conflicting actions of two committed
transactions in the same way. A schedule is conflict serializable if it is conflict
equivalent to some serial schedule.

106. What do you mean by a precedence graph? Where is it used?


The precedence graph for a schedule S contains:
- A node for each committed transaction in S.
- An arc from Ti to Tj if an action of Ti precedes and conflicts with one of Tj’s actions.
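It is used to test conflict serializability: a schedule is conflict serializable if and only if
its precedence graph is acyclic. Strict 2PL allows only schedules whose precedence graph is
acyclic.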
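The precedence-graph test can be sketched as follows, assuming schedules are lists of (transaction, action, object) triples with "R", "W" and "C" actions. Two actions conflict when they belong to different transactions, touch the same object, and at least one is a write.

```python
from itertools import combinations


def precedence_graph(schedule):
    """Arc Ti -> Tj when an action of Ti precedes and conflicts with one of Tj's.
    Only committed transactions are included."""
    committed = {t for t, a, _ in schedule if a == "C"}
    ops = [(t, a, o) for t, a, o in schedule if a in ("R", "W") and t in committed]
    graph = {t: set() for t in committed}
    # combinations() keeps schedule order, so the first action precedes the second.
    for (ti, ai, oi), (tj, aj, oj) in combinations(ops, 2):
        if ti != tj and oi == oj and "W" in (ai, aj):
            graph[ti].add(tj)
    return graph


def has_cycle(graph):
    """Depth-first search; reaching a node still on the stack means a cycle."""
    on_stack, finished = set(), set()

    def visit(n):
        on_stack.add(n)
        for m in graph[n]:
            if m in on_stack or (m not in finished and visit(m)):
                return True
        on_stack.discard(n)
        finished.add(n)
        return False

    return any(n not in finished and visit(n) for n in graph)


def conflict_serializable(schedule):
    return not has_cycle(precedence_graph(schedule))


serial = [("T1", "R", "A"), ("T1", "W", "A"), ("T1", "C", None),
          ("T2", "W", "A"), ("T2", "C", None)]
bad = [("T1", "R", "A"), ("T2", "W", "A"), ("T2", "C", None),
       ("T1", "W", "A"), ("T1", "C", None)]
print(conflict_serializable(serial), conflict_serializable(bad))  # True False
```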

107. What are the rules of strict 2Phase locking?


STRICT 2 PHASE LOCKING Rules

(1) If a transaction T wants to read (respectively, modify) an object, it first requests a
shared (respectively, exclusive) lock on the object.

(2) All locks held by a transaction are released when the transaction is completed.
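The two rules can be sketched as a toy lock table. The class and method names are hypothetical, and a conflicting request simply returns False where a real lock manager would block the transaction until the lock is released.

```python
from collections import defaultdict


class StrictTwoPhaseLockManager:
    """Toy lock table: shared (S) and exclusive (X) locks, all released at completion."""

    def __init__(self):
        self.holders = defaultdict(dict)  # object -> {txn: "S" or "X"}
        self.held_by = defaultdict(set)   # txn -> objects it has locked

    def request(self, txn, obj, mode):
        """Rule (1): S before reading, X before modifying."""
        others = {t: m for t, m in self.holders[obj].items() if t != txn}
        if mode == "S":
            ok = all(m == "S" for m in others.values())
        else:  # X is compatible with no lock held by another transaction
            ok = not others
        if ok:
            current = self.holders[obj].get(txn)
            # keep the stronger mode if txn already holds a lock on obj
            self.holders[obj][txn] = "X" if "X" in (mode, current) else "S"
            self.held_by[txn].add(obj)
        return ok

    def complete(self, txn):
        """Rule (2): release every lock when the transaction commits or aborts."""
        for obj in self.held_by.pop(txn, set()):
            del self.holders[obj][txn]


lm = StrictTwoPhaseLockManager()
print(lm.request("T1", "A", "S"))  # True
print(lm.request("T2", "A", "S"))  # True: shared locks are compatible
print(lm.request("T2", "A", "X"))  # False: T1 still holds S on A
lm.complete("T1")
print(lm.request("T2", "A", "X"))  # True: the upgrade now succeeds
```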

108. What is the difference between strict 2phase locking and 2 phase locking?
Strict 2PL releases locks only when the transaction is completed, whereas in 2PL the second
rule is replaced by “A transaction cannot request additional locks once it releases any lock.”

Thus, under 2PL every transaction has a ‘growing’ phase in which it acquires locks, followed by
a ‘shrinking’ phase in which it releases locks.

109. Define View serializable schedules.


Two schedules S1 and S2 over the same set of transactions are view equivalent if:
1. If Ti reads the initial value of object A in S1, it must also read the initial value of A in S2.
2. If Ti reads a value of A written by Tj in S1, it must also read the value of A written by
Tj in S2.
3. For each data object A, the transaction (if any) that performs the final write on A in S1
must also perform the final write on A in S2.
A schedule is view serializable if it is view equivalent to some serial schedule.
Every conflict serializable schedule is view serializable, although the converse is not
true.
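View equivalence can be tested by comparing reads-from facts and final writes. This sketch assumes every transaction commits and tries all serial orders by brute force, which is only practical for tiny examples. The first schedule below is view serializable (equivalent to T1, T2, T3) yet not conflict serializable, since R1(A)/W2(A) gives an arc T1 → T2 and W2(A)/W1(A) gives T2 → T1.

```python
from collections import Counter
from itertools import permutations


def view_summary(schedule):
    """Reads-from facts and final writers of a schedule of (txn, action, obj) triples."""
    last_writer = {}
    reads = Counter()  # (reader, obj, writer) with writer None = initial value
    for t, action, obj in schedule:
        if action == "R":
            reads[(t, obj, last_writer.get(obj))] += 1
        elif action == "W":
            last_writer[obj] = t
    return reads, last_writer  # last_writer now holds the final write of each object


def serial_order(schedule, order):
    """The serial schedule running each transaction's actions in the given order."""
    return [step for t in order for step in schedule if step[0] == t]


def view_serializable(schedule):
    target = view_summary(schedule)
    txns = list(dict.fromkeys(t for t, _, _ in schedule))
    return any(view_summary(serial_order(schedule, order)) == target
               for order in permutations(txns))


# R1(A) W2(A) W1(A) W3(A): blind writes, view equivalent to T1, T2, T3.
s1 = [("T1", "R", "A"), ("T2", "W", "A"), ("T1", "W", "A"), ("T3", "W", "A")]
# R1(A) R2(A) W1(A) W2(A): a lost update, not view serializable.
s2 = [("T1", "R", "A"), ("T2", "R", "A"), ("T1", "W", "A"), ("T2", "W", "A")]
print(view_serializable(s1), view_serializable(s2))  # True False
```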

110. Define latches


Latches:

- Short duration locks that are set before reading or writing a page to ensure atomic
operation

- Unset immediately after the physical read/write operation is completed

111. Define Convoys.


Convoys: a queue of transactions that forms while they wait for a lock held by a transaction
that a preemptive operating system has suspended (put on hold) during process scheduling.

112. Differentiate between lock upgrading and downgrading.


Lock upgrade request – to upgrade shared lock to exclusive lock (e.g., UPDATE
operation)
Downgrading – The transaction initially obtains exclusive locks and then downgrades to
shared locks.

113. What do you mean by an update lock?

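Update lock – a lock that is compatible with shared locks but not with other update and
exclusive locks. By setting an update lock initially instead of an exclusive lock, a transaction
avoids conflicts with other readers; if the object turns out not to need updating, the lock is
downgraded to a shared lock.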

114. Define Deadlock. How can it be prevented?


A cycle of transactions waiting for locks to be released is called a deadlock.

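Deadlocks can be prevented by giving each transaction a priority and ensuring that lower
priority transactions are not allowed to wait for higher priority transactions (or vice versa).
One way is to give each transaction a timestamp when it starts up. The lower the timestamp,
the higher the transaction’s priority.

If a transaction Ti requests a lock and transaction Tj holds a conflicting lock, the lock
manager can use one of the following policies: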

• Wait-die: If Ti has higher priority, it is allowed to wait; otherwise it is aborted.

• Wound-wait: If Ti has higher priority, abort Tj; otherwise Ti waits.

In the wait-die scheme, lower priority transactions can never wait for higher priority
transactions. In the wound-wait scheme, higher priority transactions never wait for lower
priority transactions. In either case no deadlock cycle can develop.
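The two policies reduce to a small decision function. In this sketch (function name and return strings are hypothetical) a lower timestamp means an older, higher-priority transaction.

```python
def resolve(policy, requester_ts, holder_ts):
    """Decide what happens when the requester (Ti) asks for a lock that the
    holder (Tj) has in a conflicting mode. Lower timestamp = higher priority."""
    requester_older = requester_ts < holder_ts
    if policy == "wait-die":
        return "requester waits" if requester_older else "requester aborts"
    if policy == "wound-wait":
        return "holder aborts" if requester_older else "requester waits"
    raise ValueError(f"unknown policy: {policy}")


print(resolve("wait-die", 1, 2))    # requester waits
print(resolve("wait-die", 2, 1))    # requester aborts
print(resolve("wound-wait", 1, 2))  # holder aborts
print(resolve("wound-wait", 2, 1))  # requester waits
```

An aborted transaction should be restarted with its original timestamp, so that it eventually becomes the oldest transaction and cannot be aborted forever.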

115. What is a waits-for graph? Give examples.


It is a graph maintained by the lock manager to detect deadlock cycles. The nodes denote
active transactions, and an arc from Ti to Tj denotes that Ti is waiting for Tj to release a
lock. A cycle in the waits-for graph indicates a deadlock.
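As an example, a waits-for graph can be kept as a dictionary from each transaction to the set of transactions it waits for; a depth-first search then reports a cycle. The representation and function name are assumptions of this sketch.

```python
def find_deadlock(waits_for):
    """Return the transactions on a cycle of the waits-for graph, or None.
    waits_for maps Ti -> set of Tj that Ti is waiting for."""
    visiting, done = set(), set()
    path = []

    def dfs(t):
        visiting.add(t)
        path.append(t)
        for u in waits_for.get(t, ()):
            if u in visiting:
                return path[path.index(u):]  # back edge: cycle found
            if u not in done:
                cycle = dfs(u)
                if cycle:
                    return cycle
        visiting.discard(t)
        path.pop()
        done.add(t)
        return None

    for t in list(waits_for):
        if t not in done:
            cycle = dfs(t)
            if cycle:
                return cycle
    return None


# T1 waits for T2, T2 waits for T3, T3 waits for T1: a deadlock.
print(find_deadlock({"T1": {"T2"}, "T2": {"T3"}, "T3": {"T1"}}))  # ['T1', 'T2', 'T3']
# T1 waits for T2, T2 waits for T3: no deadlock.
print(find_deadlock({"T1": {"T2"}, "T2": {"T3"}}))                # None
```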

116. How is a deadlock resolved?


A deadlock is resolved by aborting a transaction that is on a cycle and releasing its
locks. There are various choices for deciding which transaction has to be aborted. A
transaction

- with the fewest locks

- that has done the least work

- that is farthest from its completion

117. Define timeout mechanism.


If a transaction has been waiting too long for a lock, it is assumed to be in a deadlock
cycle and so aborted.

118. What do you mean by conservative 2PL?

Conservative 2PL can also prevent deadlocks. Under Conservative 2PL, a transaction
obtains all the locks that it will ever need when it begins, or blocks waiting for these locks
to become available.

119. What are the rules to be followed in implementing concurrency control in B+ Trees?

1. The higher levels of the tree only serve to direct searches, and all the ‘real’ data is in
the leaf levels (in the format of one of the three alternatives for data entries).

2. For inserts, a node must be locked (in exclusive mode, of course) only if a split can
propagate up to it from the modified leaf.

120. Define multiple-granularity locking.

Multiple-granularity locking allows us to efficiently set locks on objects that contain
other objects. The idea is to exploit the hierarchical nature of the ‘contains’
relationship. A database contains a set of files, each file contains a set of pages, and
each page contains a set of records. This containment hierarchy can be thought of as a
tree of objects, where each node contains all its children.

121. Define Intention shared and intention exclusive locks.

Intention shared (IS) and intention exclusive (IX) locks.

• To lock a node in S (respectively X) mode, a transaction must first lock all its
ancestors in IS (respectively IX) mode.

• A SIX lock is logically equivalent to holding an S lock and an IX lock. A transaction
can obtain a single SIX lock (which conflicts with any lock that conflicts with either S
or IX) instead of an S lock and an IX lock.
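The compatibility of the five modes and the rule for locking ancestors can be sketched as follows. The matrix is the standard one for IS, IX, S, SIX and X; the node names in the example path are hypothetical.

```python
MODES = ["IS", "IX", "S", "SIX", "X"]
_MATRIX = [
    # IS     IX     S      SIX    X
    [True,  True,  True,  True,  False],  # IS
    [True,  True,  False, False, False],  # IX
    [True,  False, True,  False, False],  # S
    [True,  False, False, False, False],  # SIX
    [False, False, False, False, False],  # X
]
COMPATIBLE = {(r, h): _MATRIX[i][j]
              for i, r in enumerate(MODES) for j, h in enumerate(MODES)}


def lock_plan(path, mode):
    """Locks to request top-down to lock the last node of path in mode:
    ancestors get IS for S/IS requests and IX for X/IX/SIX requests."""
    intention = "IS" if mode in ("S", "IS") else "IX"
    return [(node, intention) for node in path[:-1]] + [(path[-1], mode)]


print(lock_plan(["db", "file1", "page7", "rec42"], "X"))
# [('db', 'IX'), ('file1', 'IX'), ('page7', 'IX'), ('rec42', 'X')]
print(COMPATIBLE[("IX", "IS")], COMPATIBLE[("IX", "S")])  # True False
```

The SIX row is exactly the conjunction of the S and IX rows, matching the statement that SIX conflicts with any lock that conflicts with either S or IX.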

122. Define Lock Escalation.

Lock escalation is an approach to deciding the granularity of locking: a transaction first
obtains fine granularity locks (e.g., at the record level) and, after it has requested a certain
number of locks at that granularity, starts obtaining locks at the next higher granularity
(e.g., at the page level).

123. What are the basic premises of optimistic concurrency control?

The basic premise is that most transactions will not conflict with other transactions, and
the idea is to be as permissive as possible in allowing transactions to execute.

Transactions proceed in three phases:

1. Read: The transaction executes, reading values from the database and writing to a
private workspace.

2. Validation: If the transaction decides that it wants to commit, the DBMS checks
whether the transaction could possibly have conflicted with any other concurrently
executing transaction. If there is a possible conflict, the transaction is aborted; its private
workspace is cleared and it is restarted.

3. Write: If validation determines that there are no possible conflicts, the changes to
data objects made by the transaction in its private workspace are copied into the database.
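The Validation phase can be sketched with the classic validation conditions: for every transaction Ti that validated before Tj, at least one of three conditions must hold. The field names and integer clock values are assumptions of this sketch.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Txn:
    name: str
    start: int                    # Read phase begins
    validate_at: int              # Read phase ends and validation happens
    finish: Optional[int] = None  # Write phase completed (None = still writing)
    reads: set = field(default_factory=set)
    writes: set = field(default_factory=set)


def validate(tj, earlier):
    """Return True if tj may commit, given transactions that validated before it."""
    for ti in earlier:
        done = ti.finish is not None
        if done and ti.finish < tj.start:
            continue  # 1: ti completed before tj began
        if done and ti.finish < tj.validate_at and not (ti.writes & tj.reads):
            continue  # 2: ti completed before tj writes, and wrote nothing tj read
        if ti.validate_at < tj.validate_at and not (ti.writes & (tj.reads | tj.writes)):
            continue  # 3: ti's Read phase ended first, and it wrote nothing tj touches
        return False  # possible conflict: tj is aborted and restarted
    return True


t1 = Txn("T1", start=1, validate_at=4, finish=5, reads={"A"}, writes={"A"})
t2 = Txn("T2", start=2, validate_at=6, reads={"A"}, writes={"B"})
t3 = Txn("T3", start=2, validate_at=6, reads={"C"}, writes={"B"})
print(validate(t2, [t1]), validate(t3, [t1]))  # False True
```

T2 fails because it may have read A before T1 wrote it; T3 passes because it touched nothing T1 wrote.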

124. Define timestamp based concurrency control.

Each transaction can be assigned a timestamp at startup, and it is ensured, at execution
time, that if action ai of transaction Ti conflicts with action aj of transaction Tj, ai occurs
before aj if TS(Ti) < TS(Tj). If an action violates this ordering, the transaction is aborted
and restarted.

125. Define Thomas write rule and justify the same.

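Suppose transaction T wants to write object O. If TS(T) < RTS(O), T is aborted, because a
transaction with a later timestamp has already read O. Otherwise, if TS(T) < WTS(O), the write
is outdated: a transaction with a later timestamp has already written O, and no transaction
with a later timestamp has read the value T would write, so T's write can safely be ignored.
Ignoring outdated writes is called the Thomas Write Rule. If the Thomas Write Rule is not
used, that is, T is aborted in this case, the timestamp protocol, like 2PL, allows only
conflict serializable schedules.

If the Thomas Write Rule is used, some serializable schedules are permitted that are not
conflict serializable. For example, with TS(T1) < TS(T2), the schedule R1(A), W2(A),
Commit(T2), W1(A), Commit(T1) is allowed: T1's write of A is ignored, so the effect is that
of the serial schedule T1, T2, although the schedule is not conflict serializable.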
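Basic timestamp ordering with an optional Thomas Write Rule can be sketched as below. The class is hypothetical, timestamps are assumed to be positive integers (0 stands for "never read/written"), and commit handling and recoverability are ignored.

```python
class TimestampScheduler:
    """Basic timestamp ordering, optionally with the Thomas Write Rule.
    Each action returns "ok", "ignored" (outdated write skipped) or "abort"."""

    def __init__(self, thomas_write_rule=False):
        self.twr = thomas_write_rule
        self.rts = {}  # object -> largest timestamp of a transaction that read it
        self.wts = {}  # object -> timestamp of the transaction that last wrote it

    def read(self, ts, obj):
        if ts < self.wts.get(obj, 0):
            return "abort"  # obj was already overwritten by a younger transaction
        self.rts[obj] = max(self.rts.get(obj, 0), ts)
        return "ok"

    def write(self, ts, obj):
        if ts < self.rts.get(obj, 0):
            return "abort"  # a younger transaction already read the old value
        if ts < self.wts.get(obj, 0):
            # Outdated write: a younger transaction has already written obj.
            return "ignored" if self.twr else "abort"
        self.wts[obj] = ts
        return "ok"


# R1(A), W2(A), W1(A) with TS(T1) = 1 and TS(T2) = 2.
for twr in (False, True):
    s = TimestampScheduler(thomas_write_rule=twr)
    print([s.read(1, "A"), s.write(2, "A"), s.write(1, "A")])
# ['ok', 'ok', 'abort']
# ['ok', 'ok', 'ignored']
```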

126. What is the purpose of multiversion concurrency control?

This protocol represents yet another way of using timestamps, assigned at startup time, to
achieve serializability.

The goal is to ensure that a transaction never has to wait to read a database object, and
the idea is to maintain several versions of each database object, each with a write
timestamp, and to let transaction Ti read the most recent version whose timestamp
precedes TS(Ti).
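The read rule can be sketched by keeping an object's versions sorted by write timestamp and binary-searching them. The class is hypothetical, the initial version is assumed to have timestamp 0 with transaction timestamps starting at 1, and the write-side checks of the protocol are omitted.

```python
import bisect


class MultiversionObject:
    """Versions of one database object, sorted by write timestamp."""

    def __init__(self, initial_value):
        self.wts = [0]  # write timestamps, ascending
        self.values = [initial_value]

    def add_version(self, ts, value):
        i = bisect.bisect_right(self.wts, ts)
        self.wts.insert(i, ts)
        self.values.insert(i, value)

    def read(self, ts):
        """Most recent version whose write timestamp precedes ts; never waits."""
        i = bisect.bisect_left(self.wts, ts) - 1
        return self.values[i]


A = MultiversionObject("a0")
A.add_version(5, "a5")
A.add_version(10, "a10")
print(A.read(3), A.read(7), A.read(11))  # a0 a5 a10
```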

127. Describe the responsibilities of a transaction manager.

The recovery manager of a DBMS is responsible for ensuring transaction atomicity and
durability. It ensures atomicity by undoing the actions of transactions that do not commit and
durability by making sure that all actions of committed transactions survive system
crashes (e.g., a core dump caused by a bus error) and media failures (e.g., a disk is
corrupted).

128. Explain the various conflicts in short.

Three types of conflicting actions lead to three different anomalies. In a write-read (WR)
conflict, one transaction could read uncommitted data from another transaction. Such a read is
called a dirty read. In a read-write (RW) conflict, a transaction could read a data object twice
with different results. Such a situation is called an unrepeatable read. In a write-write (WW)
conflict, a transaction overwrites a data object written by another transaction. If the first
transaction subsequently aborts, the change made by the second transaction could be lost
unless a complex recovery mechanism is used.

129. What are the functions of recovery manager?

The recovery manager of a DBMS is responsible for ensuring two important properties of
transactions: atomicity and durability. It ensures atomicity by undoing the actions of
transactions that do not commit and durability by making sure that all actions of committed
transactions survive system crashes, (e.g., a core dump caused by a bus error) and media
failures (e.g., a disk is corrupted).

130. What is the meaning of steal and force approaches?

If changes made by a transaction can be propagated to disk before the transaction has
committed, then a steal approach is used. If all changes made by a transaction are immediately
forced to disk after the transaction commits, a force approach is said to be used.

131. Define ARIES recovery algorithm.

ARIES is a recovery algorithm that is designed to work with a steal, no-force approach.
When the recovery manager is invoked after a crash, restart proceeds in three phases:
1. Analysis: Identifies dirty pages in the buffer pool (i.e., changes that have not been written to
disk) and active transactions at the time of the crash.
2. Redo: Repeats all actions, starting from an appropriate point in the log, and restores the
database state to what it was at the time of the crash.
3. Undo: Undoes the actions of transactions that did not commit, so that the database reflects
only the actions of committed transactions.

132. What are the three main principles of ARIES?

There are three main principles behind the ARIES recovery algorithm:
Write-ahead logging: Any change to a database object is first recorded in the log; the record
in the log must be written to stable storage before the change to the database object is written
to disk.
Repeating history during Redo: Upon restart following a crash, ARIES retraces all actions of
the DBMS before the crash and brings the system back to the exact state that it was in at the
time of the crash. Then, it undoes the actions of transactions that were still active at the time of
the crash (effectively aborting them).
Logging changes during Undo: Changes made to the database while undoing a transaction
are logged in order to ensure that such an action is not repeated in the event of repeated
(failures causing) restarts.

133. Define a log.


The log, sometimes called the trail or journal, is a history of actions executed by the DBMS.
Physically, the log is a file of records stored in stable storage, which is assumed to survive
crashes; this durability can be achieved by maintaining two or more copies of the log on
different disks (perhaps in different locations), so that the chance of all copies of the log being
simultaneously lost is negligibly small.

134. What do you mean by a log tail?


The most recent portion of the log, called the log tail, is kept in main memory and is
periodically forced to stable storage. This way, log records and data records are written to disk
at the same granularity (pages or sets of pages).

135. What is the purpose of log record?


Every log record is given a unique id called the log sequence number (LSN). As with any
record id, we can fetch a log record with one disk access given the LSN. Further, LSNs should
be assigned in monotonically increasing order; this property is required for the ARIES
recovery algorithm. If the log is a sequential file, in principle growing indefinitely, the LSN
can simply be the address of the first byte of the log record.

136. Define the rules of Write Ahead Logging.

The Write-Ahead Logging Protocol:

1. Must force the log record for an update before the corresponding data page gets to
disk.

2. Must write all log records for a transaction before commit.

#1 guarantees Atomicity.

#2 guarantees Durability.
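The two rules can be sketched as a toy buffer manager. All names are hypothetical, the "disk" is in memory, and LSNs are simply record positions rather than byte offsets; the sketch only shows where the log must be forced.

```python
class WalBufferManager:
    """Toy enforcement of the two Write-Ahead Logging rules."""

    def __init__(self):
        self.log_tail = []      # log records still in main memory
        self.stable_log = []    # log records forced to stable storage
        self.flushed_lsn = -1   # largest LSN on stable storage
        self.page_lsn = {}      # page -> LSN of the latest update to it
        self.disk_pages = set() # pages written to disk

    def log(self, record):
        lsn = len(self.stable_log) + len(self.log_tail)
        self.log_tail.append((lsn, record))
        return lsn

    def force_log(self, up_to_lsn):
        while self.log_tail and self.log_tail[0][0] <= up_to_lsn:
            self.stable_log.append(self.log_tail.pop(0))
        if self.stable_log:
            self.flushed_lsn = self.stable_log[-1][0]

    def update(self, txn, page):
        self.page_lsn[page] = self.log(("update", txn, page))

    def write_page(self, page):
        # Rule 1: force the log record for the page's latest update first.
        self.force_log(self.page_lsn[page])
        self.disk_pages.add(page)

    def commit(self, txn):
        # Rule 2: all of txn's log records, up to its commit record, are forced.
        self.force_log(self.log(("commit", txn)))


bm = WalBufferManager()
bm.update("T1", "P1")  # LSN 0
bm.update("T2", "P2")  # LSN 1
bm.write_page("P1")    # forces the log up to LSN 0 before P1 reaches disk
print(bm.flushed_lsn)  # 0
bm.commit("T2")        # commit record LSN 2; forces the log up to LSN 2
print(bm.flushed_lsn)  # 2
```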

137. What are the contents of the update log record?

In addition to the fields common to all log records (prevLSN, transID, and type), an update
log record contains the following fields. The pageID indicates the page id of the modified page;
the length in bytes and the offset of the change are also included. The before-image is the
value of the changed bytes before the change; the after-image is the value after the change. An
update log record that contains both before- and after-images can be used to redo the change
and to undo it.
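How both images support redo and undo can be sketched with a page modelled as a bytearray. The class is hypothetical and covers only the page-specific fields; the length is derived from the images.

```python
from dataclasses import dataclass


@dataclass
class UpdateLogRecord:
    """Physical update log record (page-specific fields only)."""
    page_id: int
    offset: int
    before_image: bytes
    after_image: bytes

    def __post_init__(self):
        # The single length field covers both images, so they must match.
        if len(self.before_image) != len(self.after_image):
            raise ValueError("before- and after-images must have equal length")

    @property
    def length(self):
        return len(self.after_image)

    def redo(self, page: bytearray):
        """Reapply the change by writing the after-image at the offset."""
        page[self.offset:self.offset + self.length] = self.after_image

    def undo(self, page: bytearray):
        """Reverse the change by writing the before-image back."""
        page[self.offset:self.offset + self.length] = self.before_image


page = bytearray(b"rating=07;")
rec = UpdateLogRecord(page_id=3, offset=7, before_image=b"07", after_image=b"09")
rec.redo(page)
print(page)  # bytearray(b'rating=09;')
rec.undo(page)
print(page)  # bytearray(b'rating=07;')
```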

138. Define Compensation Log Record. (Or) What is the purpose of Compensation Log
Record? (Or) What are the contents of compensation Log Record?

A compensation log record (CLR) is written just before the change recorded in an update log
record U is undone. (Such an undo can happen during normal system execution when a
transaction is aborted or during recovery from a crash.) A compensation log record C describes
the action taken to undo the actions recorded in the corresponding update log record and is
appended to the log tail just like any other log record. The compensation log record C also
contains a field called undoNextLSN, which is the LSN of the next log record that is to be
undone for the transaction that wrote update record U; this field in C is set to the value of
prevLSN in U.

139. Differentiate between Compensation Log Record and Update Record.

Unlike an update log record, a CLR describes an action that will never be undone, that is, we
never undo an undo action. The reason is simple: an update log record describes a change
made by a transaction during normal execution and the transaction may subsequently be
aborted, whereas a CLR describes an action taken to rollback a transaction for which the
decision to abort has already been made. Thus, the transaction must be rolled back, and the
undo action described by the CLR is definitely required.
140. What are the contents of Transaction Table?

Transaction table: This table contains one entry for each active transaction. In general, the
entry contains the transaction id, the relations accessed by the transaction, attributes related
to the transaction, the list of locks held by the transaction, the type of the transaction, the
status, and a field called lastLSN, which is the LSN of the most recent log record for this
transaction. The status of a transaction can be that it is in progress, is committed, or is aborted.

141. What do you mean by a Dirty page table?


Dirty page table contains one entry for each dirty page in the buffer pool, that is, each page
with changes that are not yet reflected on disk. The entry contains a field recLSN, which is
the LSN of the first log record that caused the page to become dirty. This LSN identifies
the earliest log record that might have to be redone for this page during restart from a
crash.

142. What are the phases of restart in ARIES Recovery algorithm?

143. What do you mean by a checkpoint? (Or) How does the ARIES recovery algorithm use
the checkpoints? What is the purpose of checkpoint?

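A checkpoint is like a snapshot of the DBMS state; by taking checkpoints periodically, the DBMS
reduces the amount of work to be done during restart after a crash. Checkpointing in ARIES has
three steps. First, a begin checkpoint record is written to indicate when the checkpoint starts. Second, an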
First, a begin checkpoint record is written to indicate when the checkpoint starts. Second, an
end checkpoint record is constructed, including in it the current contents of the transaction
table and the dirty page table, and appended to the log. The third step is carried out after the
end checkpoint record is written to stable storage: A special master record containing the
LSN of the begin checkpoint log record is written to a known place on stable storage. While
the end checkpoint record is being constructed, the DBMS continues executing transactions
and writing other log records; the only guarantee we have is that the transaction table and dirty
page table are accurate as of the time of the begin checkpoint record.

144. What are the steps in analysis phase of crash recovery?

The Analysis phase performs three tasks:


1. It determines the point in the log at which to start the Redo pass.
2. It determines (a conservative superset of the) pages in the buffer pool that were dirty at the
time of the crash.
3. It identifies transactions that were active at the time of the crash and must be undone.

145. What do you mean by the repeating history paradigm? (Or) How does ARIES differ from other
crash recovery algorithms?

During the Redo phase, ARIES reapplies the updates of all transactions, committed or
otherwise. Further, if a transaction was aborted before the crash and its updates were undone,
as indicated by CLRs, the actions described in the CLRs are also reapplied. This repeating
history paradigm distinguishes ARIES from other proposed WAL based recovery algorithms
and causes the database to be brought to the same state that it was in at the time of the crash.

146. What are the steps in redo phase of crash recovery?


The Redo phase begins with the log record that has the smallest recLSN of all pages in the
dirty page table constructed by the Analysis pass because this log record identifies the oldest
update that may not have been written to disk prior to the crash. Starting from this log record,
Redo scans forward until the end of the log. For each redoable log record (update or CLR)
encountered, Redo checks whether the logged action must be redone. The action must be
redone unless one of the following conditions holds:
• The affected page is not in the dirty page table, or
• The affected page is in the dirty page table, but the recLSN for the entry is greater than
the LSN of the log record being checked, or
• The pageLSN (stored on the page, which must be retrieved to check this condition) is
greater than or equal to the LSN of the log record being checked.
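The starting point and the three skip conditions translate directly into code. In this sketch (hypothetical names) dirty_page_table maps page ids to recLSN, and page_lsn_on_disk stands in for fetching a page to read its pageLSN, which is why that test comes last.

```python
def redo_start(dirty_page_table):
    """Redo begins at the smallest recLSN in the dirty page table."""
    return min(dirty_page_table.values())


def must_redo(lsn, page_id, dirty_page_table, page_lsn_on_disk):
    """Decide whether a redoable log record (update or CLR) is reapplied."""
    if page_id not in dirty_page_table:
        return False  # page is not in the dirty page table
    if dirty_page_table[page_id] > lsn:
        return False  # recLSN is greater: page became dirty after this record
    if page_lsn_on_disk(page_id) >= lsn:
        return False  # pageLSN shows the page already reflects this update
    return True


dpt = {"P1": 10, "P2": 40}
on_disk = {"P1": 25, "P2": 0, "P3": 50}.get
print(redo_start(dpt))                    # 10
print(must_redo(20, "P1", dpt, on_disk))  # False: pageLSN 25 >= 20
print(must_redo(30, "P1", dpt, on_disk))  # True
print(must_redo(30, "P2", dpt, on_disk))  # False: recLSN 40 > 30
print(must_redo(45, "P3", dpt, on_disk))  # False: P3 is not dirty
```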

147. What is the purpose of the Undo phase of crash recovery?


The goal of this phase is to undo the actions of all transactions that were active at the time of
the crash, that is, to effectively abort them. This set of transactions is identified in the
transaction table constructed by the Analysis phase.

148. What do you mean by Loser Transactions?


Transactions that were active at the time of crash are called as loser transactions. All actions of
losers must be undone, and further, these actions must be undone in the reverse of the order in
which they appear in the log.

149. What is the sequence of actions in Undo phase of crash recovery?


The set of lastLSN values for all loser transactions is called as ToUndo. Undo repeatedly
chooses the largest (i.e., most recent) LSN value in this set and processes it, until ToUndo is
empty. To process a log record:
1. If it is a CLR, and the undoNextLSN value is not null, the undoNextLSN value is added to
the set ToUndo; if the undoNextLSN is null, an end record is written for the transaction
because it is completely undone, and the CLR is discarded.
2. If it is an update record, a CLR is written and the corresponding action is undone, as
described in Section 20.1.1, and the prevLSN value in the update log record is added to the set
ToUndo.
When the set ToUndo is empty, the Undo phase is complete. Restart is now complete, and the
system can proceed with normal operations.
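The Undo loop can be sketched with a max-heap over LSNs. Log records are plain dictionaries keyed by LSN (a hypothetical layout), the actual page change is not performed, and writing an end record when an undone update has a null prevLSN is an assumption made so that every loser transaction terminates.

```python
import heapq


def undo_phase(log, loser_last_lsns):
    """log: dict LSN -> record with 'type' ('update' or 'CLR'), 'txn', and
    'prevLSN' (update) or 'undoNextLSN' (CLR). Returns the actions taken."""
    to_undo = [-lsn for lsn in loser_last_lsns]  # max-heap via negation
    heapq.heapify(to_undo)
    actions = []
    next_lsn = max(log) + 1
    while to_undo:
        lsn = -heapq.heappop(to_undo)  # largest (most recent) LSN
        rec = log[lsn]
        if rec["type"] == "CLR":
            if rec["undoNextLSN"] is not None:
                heapq.heappush(to_undo, -rec["undoNextLSN"])
            else:
                actions.append(("end", rec["txn"]))
        else:  # update record: write a CLR, undo the change, follow prevLSN
            log[next_lsn] = {"type": "CLR", "txn": rec["txn"],
                             "undoNextLSN": rec["prevLSN"]}
            next_lsn += 1
            actions.append(("undo", lsn))
            if rec["prevLSN"] is not None:
                heapq.heappush(to_undo, -rec["prevLSN"])
            else:
                actions.append(("end", rec["txn"]))  # assumption: fully undone
    return actions


def update(txn, prev):
    return {"type": "update", "txn": txn, "prevLSN": prev}


# T2 had already undone LSN 25 before the crash (CLR at LSN 40).
log = {10: update("T1", None), 20: update("T2", None), 25: update("T2", 20),
       30: update("T1", 10), 40: {"type": "CLR", "txn": "T2", "undoNextLSN": 20}}
actions = undo_phase(log, loser_last_lsns=[30, 40])
print(actions)
# [('undo', 30), ('undo', 20), ('end', 'T2'), ('undo', 10), ('end', 'T1')]
```

Note that LSN 25 is not undone a second time: the CLR at LSN 40 sends Undo straight to LSN 20.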

150. Write in short the working sequence of ARIES algorithm.


After a system crash, the Analysis, Redo, and Undo phases are executed. The Redo phase
repeats history by bringing the database back to its state at the time of the crash. The Undo
phase undoes the actions of loser transactions, that is, transactions that are aborted because
they were active at the time of the crash. ARIES handles subsequent crashes during system
restart by writing compensation log records (CLRs) when undoing actions of aborted
transactions. CLRs indicate which actions have already been undone and prevent undoing the
same action twice.

151. How do we recover from media failure?


To be able to recover from media failure without reading the complete log, a copy of the
database is taken periodically. The procedure of copying the database is similar to creating a
checkpoint.

Big Questions

Unit III
1. Define Index. Explain in detail about various types of indices, with example along with the
advantage and disadvantage of each type.
2. Explain the ISAM index structure in detail with examples illustrating insertion and deletion
with suitable diagrams
3. Explain how the data is inserted into the ISAM index structure with the algorithm and a neat
diagram
4. Explain how the ISAM index structure handles the deletion of a data item with the algorithm
and example.
5. Differentiate between static and dynamic index structures with suitable examples.
6. Explain the linear hashing in detail with example and neat diagrams.
7. Explain in detail static hashing with example and neat diagrams.
8. Explain the extendible hashing method with example and neat diagrams.
Unit IV
9. Explain two way merge sort algorithm with neat diagram and an example.
10. Explain the algorithm for implementing external merge sort with neat diagram and an example.
11. Explain the working of replacement sort with neat diagram and example.
12. Explain the effect of double buffering and blocked I/O in the performance of sorting
algorithms.
13. Explain the nested join operation in detail.

14. Explain the sort-merge join in detail with cost analysis.

15. Explain hash join in detail with neat diagram


16. Write in detail about query optimization.
17. Write in detail about normalization and various normal forms in DBMS.
18. What are the factors to be considered for physical database design?
19. Write a detailed note on DBMS Benchmarks.
20. What are the operations involved in physical database tuning?
21. Explain in detail about the discretionary access control.
22. Explain in detail about the mandatory access control.
23. Write in detail about the encryption technology and how it can be used for database security?

Unit V
24. Explain the ACID properties of transaction. Explain the usefulness of each.
25. Draw the state diagram of a transaction and explain
26. Explain the concept of deadlock handling with deadlock prevention, detection and recovery.
27. Describe the concurrency control based on locking.
28. Discuss the concurrency control without locking.
29. Write in detail about schedules and their significance in concurrency control.
30. Write in detail about the working principle of ARIES algorithm.
31. Write in detail about the Log Table and its significance in crash recovery.
32. Explain the importance of checkpoints in crash recovery.
33. Explain the sequence of actions in redo and undo phases of crash recovery.
34. Explain the crash recovery process in detail.
