
Introduction: Various SQL queries are used to search for records in a database. Ex: Find all records at the Perryridge branch.

Disadvantage: We have to search all the records. Is this efficient? Definitely not.

Efficient solution: Indexing & Hashing
Practical usage: the index of a book, card catalogs in a library, Google (uses efficient hashing techniques).
An index file is of the form: Search Key | Pointer

Types of Indices
Ordered Indices
Hash Indices

Simple Ex: The index of a book uses terms as the search key, kept in sorted (alphabetical) order, with page numbers as the pointers.
Primary Index: An index whose search key also defines the sequential order of the file. Also called a clustering index.

[Figure: primary index on the account file; search key: branch name]

Secondary Index: An index whose search key defines an order different from the sequential order of the file. Also called a non-clustering index.

Several techniques are available for both indexing and hashing. Which one to select? The choice is made on the following criteria:
1. Access types
2. Access time
3. Insertion time
4. Deletion time
5. Space overhead

Search Key: An attribute or set of attributes used to look up records in a file.
Ordered Indices: An ordered index stores the values of the search keys in sorted order and associates with each search key the records that contain it.

[Figure: secondary index on the account file; search key: account balance]

Types of Primary Index: Dense and Sparse

Dense Indices: An index record appears for every search-key value in the file.

Sparse Indices: An index record appears for only some of the search-key values. (Refer to the figure under Primary Index.)

Dense Indices
1. Faster, i.e. less access time.
2. More space overhead.

Sparse Indices
1. Slower, i.e. more access time.
2. Less space overhead.
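To make the dense/sparse tradeoff concrete, here is a minimal Python sketch of a primary index on branch name. The account data, the block size, and the helper names (`build_dense_index`, `sparse_lookup`, etc.) are hypothetical; list positions stand in for record pointers, and a real system would work with disk blocks rather than individual records.

```python
from bisect import bisect_left

# Hypothetical sample account file (account_number, branch_name, balance),
# stored in sequential order of branch_name, so an index on branch_name
# is a primary (clustering) index.
ACCOUNTS = [
    ("A-217", "Brighton", 750),
    ("A-101", "Downtown", 500),
    ("A-110", "Downtown", 600),
    ("A-215", "Mianus", 700),
    ("A-102", "Perryridge", 400),
    ("A-201", "Perryridge", 900),
    ("A-218", "Perryridge", 700),
    ("A-222", "Redwood", 700),
    ("A-305", "Round Hill", 350),
]


def build_dense_index(records):
    """One (key, position) entry per distinct branch name, pointing at its first record."""
    index = []
    for pos, (_, branch, _) in enumerate(records):
        if not index or index[-1][0] != branch:
            index.append((branch, pos))
    return index


def build_sparse_index(records, block_size):
    """One (key, position) entry per block: the key of the block's first record."""
    return [(records[pos][1], pos) for pos in range(0, len(records), block_size)]


def dense_lookup(records, index, key):
    keys = [k for k, _ in index]
    i = bisect_left(keys, key)
    if i == len(keys) or keys[i] != key:
        return []  # dense: a missing index entry means no such record
    pos, result = index[i][1], []
    while pos < len(records) and records[pos][1] == key:
        result.append(records[pos][0])
        pos += 1
    return result


def sparse_lookup(records, index, key):
    keys = [k for k, _ in index]
    # Start from the last entry whose key is strictly smaller than the search key,
    # so a run of equal keys that crosses a block boundary is not missed.
    i = bisect_left(keys, key)
    pos, result = (index[i - 1][1] if i > 0 else 0), []
    while pos < len(records) and records[pos][1] <= key:
        if records[pos][1] == key:
            result.append(records[pos][0])
        pos += 1
    return result
```

With `block_size=3` the sparse index has 3 entries against the dense index's 6, but a sparse lookup for Perryridge starts scanning at Mianus, so it examines more records. Stepping back one entry is a design choice made here so that duplicates spanning two blocks are still found.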

[Figure: multilevel index (two-level sparse index)]

Index Update: Deletion and Insertion
Regardless of what form of index is used, every index must be updated whenever a record is either inserted into or deleted from the file.
Insertion: First, a lookup is performed using the search-key value of the record being inserted. The next step depends on whether the index is dense or sparse. (Similarly for deletion.)

Secondary Indices
Frequently, one wants to find all the records whose values in a certain field (which is not the search key of the primary index) satisfy some condition.
We can have a secondary index with an index record for each search-key value. The index record points to a bucket that contains pointers to all the actual records with that particular search-key value.
Secondary indices have to be dense.
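The index-update step that follows the lookup can be sketched as below, as a minimal Python illustration under stated assumptions: the dense index keeps a record count per key in place of real pointers, and the sparse index keeps the first key of each block of a sorted file. The class names and the split-in-half policy for a full block are hypothetical choices for illustration, not a prescribed algorithm.

```python
from bisect import bisect_left, bisect_right, insort


class DenseIndex:
    """Dense index: one entry per distinct search-key value (a count stands in for pointers)."""

    def __init__(self):
        self.keys = []    # sorted distinct search-key values
        self.counts = {}  # key -> number of records with that key

    def insert(self, key):
        # Add an index entry only if this search-key value is not already indexed.
        if key not in self.counts:
            insort(self.keys, key)
            self.counts[key] = 0
        self.counts[key] += 1

    def delete(self, key):
        # Drop the index entry only when the last record with this key is deleted.
        self.counts[key] -= 1
        if self.counts[key] == 0:
            del self.counts[key]
            self.keys.pop(bisect_left(self.keys, key))


class SparseFile:
    """Sorted file stored in blocks, with a sparse index holding each block's first key."""

    def __init__(self, capacity):
        self.capacity = capacity  # records per block
        self.blocks = []          # each block is a sorted list of search keys
        self.index = []           # invariant: index[i] == blocks[i][0]

    def _block_for(self, key):
        # Block whose first key is the largest value <= key (block 0 if key is smallest).
        return max(bisect_right(self.index, key) - 1, 0)

    def insert(self, key):
        if not self.blocks:
            self.blocks.append([key])
            self.index.append(key)
            return
        b = self._block_for(key)
        block = self.blocks[b]
        insort(block, key)
        self.index[b] = block[0]  # changes only if key became the block's smallest value
        if len(block) > self.capacity:
            # Block overflowed: split it; the new block gets its own index entry.
            half = len(block) // 2
            self.blocks.insert(b + 1, block[half:])
            del block[half:]
            self.index.insert(b + 1, self.blocks[b + 1][0])

    def delete(self, key):
        b = self._block_for(key)
        block = self.blocks[b]
        block.remove(key)  # raises ValueError if the key is absent
        if not block:
            # Block emptied: its index entry goes too.
            del self.blocks[b]
            del self.index[b]
        else:
            self.index[b] = block[0]  # changes only if the deleted key was the first


# Usage: dense index on branch names, sparse file with 2 keys per block.
dense = DenseIndex()
for name in ["Perryridge", "Brighton", "Perryridge"]:
    dense.insert(name)
dense.delete("Perryridge")  # one Perryridge record remains, so its entry is kept

sparse = SparseFile(capacity=2)
for k in [10, 20, 30, 5]:
    sparse.insert(k)  # 30 forces a split; 5 becomes block 0's new first key
```

Note that the sparse index changes only when a block is created or removed, or when a block's smallest key changes; most insertions leave it untouched.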

Time-space tradeoff: The choice depends on the programmer and the application.

Multilevel Indices
What happens if the primary index is too large, say more than 100,000 records? It won't be efficient to use a single level of sparse index, because the index no longer fits in main memory and even a binary search over it needs many disk block reads. So we can create multiple levels of index: treat the primary (inner) index as a sequential file and build a sparse (outer) index on it, whether the inner index is itself sparse or dense.
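A two-level index can be sketched as follows, assuming the inner index is a sorted list of (search key, pointer) entries with distinct keys, stored in blocks of `block_size` entries, and that the outer sparse index is small enough to keep in main memory. The function names, the sample keys, and the 100-entries-per-block figure are hypothetical.

```python
from bisect import bisect_right


def build_two_level(inner, block_size):
    """Split the sorted inner index into blocks; the outer (sparse) index keeps each block's first key."""
    blocks = [inner[i:i + block_size] for i in range(0, len(inner), block_size)]
    outer = [block[0][0] for block in blocks]
    return outer, blocks


def two_level_lookup(outer, blocks, key):
    """Return (record pointer or None, number of inner-index blocks read)."""
    b = bisect_right(outer, key) - 1  # largest outer entry <= key, searched in memory
    if b < 0:
        return None, 0                # key is smaller than every indexed key
    for k, ptr in blocks[b]:          # read exactly one inner-index block
        if k == key:
            return ptr, 1
    return None, 1


# Hypothetical inner index with 100,000 entries (even keys 0..199998, pointer = position).
inner = [(2 * i, i) for i in range(100_000)]
outer, blocks = build_two_level(inner, block_size=100)
```

With 100,000 inner entries and 100 entries per block, the outer index has only 1,000 entries, and each lookup reads a single inner-index block.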

Advantages: Secondary indices improve the performance of queries that use keys other than the search key of the primary index.
Disadvantages: However, they impose a significant overhead on modification of the database.
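The bucket structure of a secondary index can be sketched in Python as follows. The account data and the `SecondaryIndex` class are hypothetical, and list positions stand in for record pointers. The index is on balance, which is not the order in which the file is stored, and the range query `greater_than` illustrates the "satisfy some condition" case.

```python
from bisect import bisect_right, insort

# Hypothetical sample account file (account_number, branch_name, balance),
# stored in order of branch_name; balance is NOT the file's sequential order.
ACCOUNTS = [
    ("A-217", "Brighton", 750),
    ("A-101", "Downtown", 500),
    ("A-110", "Downtown", 600),
    ("A-215", "Mianus", 700),
    ("A-102", "Perryridge", 400),
    ("A-201", "Perryridge", 900),
    ("A-218", "Perryridge", 700),
    ("A-222", "Redwood", 700),
    ("A-305", "Round Hill", 350),
]


class SecondaryIndex:
    """Dense secondary index: each distinct key value points to a bucket of record pointers."""

    def __init__(self, records, field):
        self.keys = []     # sorted distinct search-key values
        self.buckets = {}  # key -> bucket: list of record positions (pointers)
        for pos, rec in enumerate(records):
            self.add(rec[field], pos)

    def add(self, key, pos):
        # Must be called for every inserted record: this is the modification overhead.
        if key not in self.buckets:
            insort(self.keys, key)
            self.buckets[key] = []
        self.buckets[key].append(pos)

    def lookup(self, key):
        return list(self.buckets.get(key, []))

    def greater_than(self, key):
        """Pointers to all records whose key value is strictly greater than key."""
        start = bisect_right(self.keys, key)
        return [pos for k in self.keys[start:] for pos in self.buckets[k]]


balance_index = SecondaryIndex(ACCOUNTS, field=2)
```

Each inserted or deleted record also requires a call that updates the buckets, which is where the cost of maintaining secondary indices comes from.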

Hashing
Hashing provides very fast access to records on certain search conditions.

Static Hashing
Bucket: a unit of storage containing one or more records.
We obtain the bucket of a record directly from its search-key value using a hash function.
Let K = set of all search-key values and B = set of all bucket addresses. A hash function h is a function from K to B.
Ex: To insert a record with search key K_i, we compute h(K_i), which gives the address of the bucket for that record.
Records with different search-key values may be mapped to the same bucket, so the entire bucket has to be searched sequentially to locate a record.

Hash File Organization
Let us see the hash file organization of the account file, with branch name as the search key. Assume the hash function returns the sum of the binary representations of the characters modulo 10. Then h(Perryridge) = 5, h(Round Hill) = 3, etc.

Hash Functions
Worst hash function: maps all the search-key values to the same bucket.
An ideal hash function must be uniform and random.

Bucket Overflows
Bucket overflows can occur because of:
1. Insufficient buckets
2. Skew in the distribution of records, because multiple records have the same search-key value or because the chosen hash function produces a non-uniform distribution of key values.
Bucket overflow is handled by using overflow buckets.
Overflow chaining: the overflow buckets of a given bucket are chained together in a linked list. This scheme is called closed hashing.
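The hash file organization and overflow chaining can be sketched as below. The phrase "sum of the binary representations of the characters" is interpreted here as an assumption: each letter maps to its position in the alphabet (a=1, ..., z=26), ignoring case and spaces, which does reproduce h(Perryridge) = 5 and h(Round Hill) = 3. The bucket capacity of 2 and the class names are hypothetical.

```python
NUM_BUCKETS = 10


def h(key):
    """Sum of letter positions (a=1, ..., z=26), ignoring case and non-letters, modulo 10."""
    return sum(ord(c) - ord("a") + 1 for c in key.lower() if "a" <= c <= "z") % NUM_BUCKETS


class Bucket:
    def __init__(self):
        self.records = []     # (search key, record) pairs
        self.overflow = None  # next overflow bucket in the chain (linked list)


class StaticHashFile:
    def __init__(self, capacity):
        self.capacity = capacity  # records per bucket
        self.buckets = [Bucket() for _ in range(NUM_BUCKETS)]  # fixed count: static hashing

    def insert(self, key, record):
        bucket = self.buckets[h(key)]
        # Walk the chain to the first bucket with room, adding an overflow bucket if needed.
        while len(bucket.records) >= self.capacity:
            if bucket.overflow is None:
                bucket.overflow = Bucket()
            bucket = bucket.overflow
        bucket.records.append((key, record))

    def lookup(self, key):
        # Different keys may share a bucket, so scan the whole chain and compare keys.
        result, bucket = [], self.buckets[h(key)]
        while bucket is not None:
            result += [rec for k, rec in bucket.records if k == key]
            bucket = bucket.overflow
        return result

    def chain_length(self, key):
        """Number of buckets (primary + overflow) in the chain for key's bucket."""
        n, bucket = 0, self.buckets[h(key)]
        while bucket is not None:
            n, bucket = n + 1, bucket.overflow
        return n


# Usage: accounts hashed on branch name, 2 records per bucket.
accounts = StaticHashFile(capacity=2)
for number, branch in [("A-102", "Perryridge"), ("A-201", "Perryridge"),
                       ("A-218", "Perryridge"), ("A-217", "Brighton"),
                       ("A-305", "Round Hill")]:
    accounts.insert(branch, number)
```

Perryridge's three records overflow its bucket into one chained overflow bucket, while Brighton and Round Hill share bucket 3, which is why `lookup` compares keys instead of returning the whole bucket.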

Disadvantages of Static Hashing
If the initial number of buckets is too small and the file grows, performance will degrade due to too many overflows.
If extra space is allocated in anticipation of growth and there is not much to store, that space is wasted; likewise, if the database shrinks, space is wasted.
Performance can be improved by periodically reorganizing the file with a new hash function, but this is expensive and disrupts normal operation.
Better solution: Dynamic Hashing

Hash Indices
Hashing is used not only for file organization but also for index-structure creation. A hash index organizes the search keys, with their associated record pointers, into a hash file structure.
Hash indices are always secondary indices.
[Figure: example of a hash index]
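A hash index can be sketched as a set of buckets holding (search key, record pointer) pairs rather than the records themselves. Here the file stays ordered by branch name while the hash index is on account number, so it acts as a secondary index. The hash function, the 7 buckets, and the sample data are hypothetical, and bucket overflow is ignored because Python lists grow without bound.

```python
NUM_BUCKETS = 7

# Hypothetical sample account file, organized sequentially by branch_name.
ACCOUNTS = [
    ("A-217", "Brighton", 750),
    ("A-101", "Downtown", 500),
    ("A-110", "Downtown", 600),
    ("A-215", "Mianus", 700),
    ("A-102", "Perryridge", 400),
    ("A-201", "Perryridge", 900),
    ("A-218", "Perryridge", 700),
    ("A-222", "Redwood", 700),
    ("A-305", "Round Hill", 350),
]


def bucket_of(key):
    """Hypothetical hash function: sum of character codes modulo NUM_BUCKETS."""
    return sum(ord(c) for c in key) % NUM_BUCKETS


def build_hash_index(records, field):
    """Each bucket stores (search key, record pointer) pairs; the records stay where they are."""
    buckets = [[] for _ in range(NUM_BUCKETS)]
    for pos, rec in enumerate(records):
        buckets[bucket_of(rec[field])].append((rec[field], pos))
    return buckets


def hash_index_lookup(records, buckets, key):
    # Hash to one bucket, then follow only the pointers whose key matches.
    return [records[pos] for k, pos in buckets[bucket_of(key)] if k == key]


account_index = build_hash_index(ACCOUNTS, field=0)  # index on account_number
```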