
B+ Trees

A B+ tree is a data structure you can use to implement a very efficient method for sorting large amounts of data; B+ trees enable a correspondingly efficient searching algorithm. You can think of a B+ tree as providing an index to a database, which is why B+ trees are sometimes referred to as indices.

In Visual Prolog, a B+ tree resides in an external database. Each entry in a B+ tree is a pair of values: a key string and an associated database reference number. When building your database, you first insert a record in the database and establish a key for that record. The Visual Prolog btree predicates may then be used to insert this key and the database reference number corresponding to this record into a B+ tree. When searching a database for a record, all you have to do is to obtain a key for that record, and the B+ tree will give you the corresponding reference number. Using this reference number, you can retrieve the record from the database.

As a B+ tree evolves, its entries are kept in key order. This means that you can easily obtain a sorted listing of the records.

A B+ tree is analogous to a binary tree, with the exception that in a B+ tree, more than one key string is stored at each node. B+ trees are also balanced; this means that the search paths to each key in the leaves of the tree have the same length. Because of this feature, a search for a given key among more than a million keys can be guaranteed, even in the worst case, to require accessing the disk only a few times, depending on how many keys are stored at each node.

Although B+ trees are placed in an external database, they don't need to point to terms in the same database. It is possible to have a database containing a number of chains, and another database with a B+ tree pointing to terms in those chains.
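The division of labor between the database and its index can be sketched in a few lines of Python. This is only a model of the idea, not Visual Prolog code: the names records, index, insert_record, find_ref and sorted_listing are hypothetical, a dictionary stands in for the external database, and a sorted list of (key, reference number) pairs stands in for the B+ tree.

```python
import bisect

# Hypothetical stand-ins: `records` plays the external database
# (reference number -> record) and `index` plays the B+ tree
# ((key, ref) pairs kept in key order).
records = {}
index = []

def insert_record(ref, key, record):
    """Store the record first, then enter its key and reference number in the index."""
    records[ref] = record
    bisect.insort(index, (key, ref))

def find_ref(key):
    """Return the reference number stored under key, or None if the key is absent."""
    pos = bisect.bisect_left(index, (key,))
    if pos < len(index) and index[pos][0] == key:
        return index[pos][1]
    return None

def sorted_listing():
    """Walk the index in key order and fetch each record by its reference number."""
    return [records[ref] for _, ref in index]

insert_record(1, "smith", "Smith, 12 High Street")
insert_record(2, "adams", "Adams, 3 Mill Lane")
insert_record(3, "jones", "Jones, 48 Park Road")

print(records[find_ref("jones")])   # look up via the index, then fetch the record
for line in sorted_listing():       # records come out in key order
    print(line)
```

The lookup never touches the records themselves until it has a reference number, which is the same two-step pattern the text describes: key to reference number, then reference number to record.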

1. Pages, Order, and Keylength

In a B+ tree, keys are grouped together in pages; each page has the same size, and all pages can contain the same number of keys, which means that all the stored keys for that B+ tree must be the same size. The size of the keys is determined by the KeyLen argument, which you must supply when creating a B+ tree. If you attempt to insert strings longer than KeyLen into a B+ tree, Visual Prolog will truncate them. In general, you should choose the smallest possible value for KeyLen in order to save space and maximize speed.

When you create a B+ tree, you must also give an argument called its Order. This argument determines how many keys should be stored in each tree node; usually, you must determine the best choice by trial and error. A good first try for Order is 4, which stores between 4 and 8 keys at each node. You must choose the value of Order by experimentation because the B+ tree's search speed depends on the values KeyLen and Order, the number of keys in the B+ tree, and your computer's hardware configuration.
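To get a feel for how KeyLen and Order interact, the following Python sketch truncates a key to a given length and estimates how many levels a tree needs. The functions fit_key and estimated_depth are hypothetical helpers, and the depth estimate rests on a simplifying assumption that is not taken from Visual Prolog: every leaf page holds exactly keys_in_page keys and every internal page has keys_in_page + 1 children.

```python
def fit_key(key, key_len):
    """Cut a key down to key_len characters, mirroring the truncation the text describes."""
    return key[:key_len]

def estimated_depth(num_keys, keys_in_page):
    """Estimate the number of levels under the ASSUMPTION that every leaf holds
    keys_in_page keys and every internal page has keys_in_page + 1 children."""
    pages = -(-num_keys // keys_in_page)           # ceiling division: leaf pages
    depth = 1
    while pages > 1:
        pages = -(-pages // (keys_in_page + 1))    # parent pages one level up
        depth += 1
    return depth

print(fit_key("Washington", 4))
# Order 4 stores between 4 and 8 keys per node, so try both extremes.
for keys_in_page in (4, 8, 64):
    print(keys_in_page, estimated_depth(1_000_000, keys_in_page))
```

Under these assumptions a larger Order gives a shallower tree, but each page is also bigger and takes longer to read and scan, which is one reason the best value has to be found by experiment.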

2. Duplicate Keys
When setting up a B+ tree, you must allow for all repeat occurrences of your key. For example, if you're setting up a B+ tree for a database of customers in which the key is the customer's last name, you need to allow for all those customers called Smith. For this reason, it is possible to have duplicate keys in a B+ tree. When you delete a term in the database, you must delete the corresponding entry in a B+ tree with duplicate keys by giving both the key and the database reference number.
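A small Python model shows why the reference number is needed when deleting from an index that allows duplicates. The class KeyIndex and its methods are hypothetical, and raising ValueError for a rejected duplicate is an assumption of this sketch, not a statement about how Visual Prolog reports the condition.

```python
import bisect

class KeyIndex:
    """Toy index of (key, ref) pairs kept in key order."""

    def __init__(self, allow_duplicates=True):
        self.allow_duplicates = allow_duplicates
        self.entries = []

    def refs_for(self, key):
        """All reference numbers stored under key."""
        return [ref for k, ref in self.entries if k == key]

    def insert(self, key, ref):
        # ASSUMPTION: a rejected duplicate is signaled with ValueError.
        if not self.allow_duplicates and self.refs_for(key):
            raise ValueError("duplicate key: " + key)
        bisect.insort(self.entries, (key, ref))

    def delete(self, key, ref):
        """Remove exactly one entry; the key alone would be ambiguous."""
        self.entries.remove((key, ref))

customers = KeyIndex(allow_duplicates=True)
customers.insert("Smith", 1)
customers.insert("Jones", 3)
customers.insert("Smith", 7)

customers.delete("Smith", 7)          # removes only the Smith with ref 7
print(customers.refs_for("Smith"))
```

With only the key "Smith" there would be no way to say which of the two entries should go; the pair (key, ref) identifies one entry uniquely.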

3. Multiple Scans
In order to have more than one internal pointer to the same B+ tree, you can open the tree more than once. Note, however, that if you update one copy of a B+ tree for which you have other copies currently open, the pointers for the other copies will be repositioned to the top of the tree.
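The benefit of several pointers into one index is easiest to see in a nested scan. The Python sketch below is a model only: Scan and same_key_pairs are hypothetical names, each Scan object stands for one opened copy of the tree, and the rule that an update repositions the other copies is deliberately not modeled.

```python
class Scan:
    """One opened copy of the index: shares the entries, owns its position."""

    def __init__(self, entries):
        self.entries = entries   # shared list of (key, ref), already in key order
        self.pos = 0

    def current(self):
        return self.entries[self.pos]

    def advance(self):
        """Move forward one entry; return False at the end of the index."""
        if self.pos + 1 < len(self.entries):
            self.pos += 1
            return True
        return False

def same_key_pairs(entries):
    """Pair up reference numbers that share a key, using an outer and an inner scan."""
    pairs = []
    if not entries:
        return pairs
    outer = Scan(entries)
    while True:
        inner = Scan(entries)    # a second pointer into the same index
        inner.pos = outer.pos    # start where the outer scan stands
        while inner.advance() and inner.current()[0] == outer.current()[0]:
            pairs.append((outer.current()[1], inner.current()[1]))
        if not outer.advance():
            return pairs

entries = [("adams", 2), ("smith", 1), ("smith", 5), ("smith", 9)]
print(same_key_pairs(entries))
```

The inner scan moves ahead without disturbing the outer one, which is what a single internal pointer could not provide.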

4. The B+ Tree Standard Predicates


Visual Prolog provides several predicates for handling B+ trees; these predicates work in a manner that parallels the corresponding db_... predicates.

(1) bt_create/5 and bt_create/6

You create new B+ trees by calling the bt_create predicate.

    bt_create(Dbase, BtreeName, Btree_Sel, KeyLen, Order)               /* (i,i,o,i,i) */
    bt_create(Dbase, BtreeName, Btree_Sel, KeyLen, Order, Duplicates)   /* (i,i,o,i,i,i) */

The BtreeName argument specifies the name for the new tree. You later use this name as an argument for bt_open. The arguments KeyLen and Order for the B+ tree are given when the tree is created and can't be changed afterwards. If you call bt_create/5, or bt_create/6 with the Duplicates argument set to 1, duplicates will be allowed in the B+ tree. If you call bt_create/6 with the Duplicates argument set to 0, you will not be allowed to insert duplicates in the B+ tree.

(2) bt_open/3

bt_open opens an already created B+ tree in a database, which is identified by the name given in bt_create.

    bt_open(Dbase, BtreeName, Btree_Sel)   /* (i,i,o) */

When you open or create a B+ tree, the call returns a selector (Btree_Sel) for that B+ tree. A B+ tree selector belongs to the predefined domain bt_selector and refers to the B+ tree whenever the system carries out search or positioning operations. The relationship between a B+ tree's name and its selector is exactly the same as the relationship between an actual file name and the corresponding symbolic file name.

You can open a given B+ tree more than once in order to handle several simultaneous scans. Each time a B+ tree is opened, a descriptor is allocated, and each descriptor maintains its own internal B+ tree pointer.

(3) bt_close/2 and bt_delete/2

You can close an open B+ tree with a call to bt_close or delete an entire B+ tree with bt_delete.

    bt_close(Dbase, Btree_Sel)     /* (i,i) */
    bt_delete(Dbase, BtreeName)    /* (i,i) */

Calling bt_close releases the internal buffers allocated for the open B+ tree with BtreeName.

(4) bt_copyselector

bt_copyselector gives you a new pointer for an already open B+ tree selector (a new scan).

    bt_copyselector(Dbase, OldBtree_sel, NewBtree_sel)   /* (i,i,o) */

The new selector will point to the same place in the B+ tree as the old selector. After the creation, the two B+ tree selectors can be repositioned freely without affecting each other.

(5) bt_statistics/8

bt_statistics returns statistical information for the B+ tree identified by Btree_Sel.

    bt_statistics(Dbase, Btree_Sel, NumKeys, NumPages,   /* (i,i,o,o, */
                  Depth, KeyLen, Order, PgSize)          /*  o,o,o,o) */

The arguments to bt_statistics represent the following:

    Dbase       is the db_selector identifying the database.
    Btree_Sel   is the bt_selector identifying the B+ tree.
    NumKeys     is bound to the total number of keys in the B+ tree Btree_Sel.
    NumPages    is bound to the total number of pages in the B+ tree.
    Depth       is bound to the depth of the B+ tree.
    KeyLen      is bound to the key length.
    Order       is bound to the order of the B+ tree.
    PgSize      is bound to the page size (in bytes).

(6) key_insert/4 and key_delete/4

You use the standard predicates key_insert and key_delete to update the B+ tree.

    key_insert(Dbase, Btree_Sel, Key, Ref)   /* (i,i,i,i) */
    key_delete(Dbase, Btree_Sel, Key, Ref)   /* (i,i,i,i) */

By giving both Key and Ref to key_delete, you can delete a specific entry in a B+ tree with duplicate keys.

(7) key_first/3, key_last/3, and key_search/4

Each B+ tree maintains an internal pointer to its nodes. key_first and key_last allow you to position the pointer at the first or last key in a B+ tree, respectively. key_search positions the pointer on a given key.

    key_first(Dbase, Btree_Sel, Ref)          /* (i,i,o) */
    key_last(Dbase, Btree_Sel, Ref)           /* (i,i,o) */
    key_search(Dbase, Btree_Sel, Key, Ref)    /* (i,i,i,o)(i,i,i,i) */

If the key is found, key_search will succeed; if it's not found, key_search will fail, but the internal B+ tree pointer will be positioned at the key immediately after where Key would have been located. You can then use key_current to return the key and database reference number for this key. If you want to position on an exact entry in a B+ tree with duplicates, you can also provide Ref as an input argument.

(8) key_next/3 and key_prev/3

You can use the predicates key_next and key_prev to move the B+ tree's pointer forward or backward in the sorted tree.

    key_next(Dbase, Btree_Sel, NextRef)   /* (i,i,o) */
    key_prev(Dbase, Btree_Sel, PrevRef)   /* (i,i,o) */

If the B+ tree pointer is at one of the ends, trying to move the pointer further will fail, but the B+ tree pointer will act as if it were placed one position outside the tree.

(9) key_current/4

key_current returns the key and database reference number for the current pointer in the B+ tree.

    key_current(Dbase, Btree_Sel, Key, Ref)   /* (i,i,o,o) */

key_current fails after a call to the predicates bt_open, bt_create, key_insert, or key_delete, or when the pointer is positioned before the first key (using key_prev) or after the last (with key_next).
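The pointer rules for key_search, key_next, key_prev and key_current are easy to misread, so here is a Python model of them. It is a sketch built only from the behavior described in this section: the class Cursor and its method names are hypothetical, a return value of None stands for a failing predicate, and the entries are a plain sorted list rather than a real B+ tree.

```python
import bisect

class Cursor:
    """Toy model of a B+ tree pointer over sorted (key, ref) entries."""

    def __init__(self, entries):
        self.entries = sorted(entries)
        self.pos = None   # no current key right after opening

    def _ref(self):
        if self.pos is not None and 0 <= self.pos < len(self.entries):
            return self.entries[self.pos][1]
        return None

    def first(self):
        self.pos = 0
        return self._ref()

    def last(self):
        self.pos = len(self.entries) - 1
        return self._ref()

    def search(self, key):
        """Return the ref on an exact match; otherwise return None but leave
        the pointer on the key just after where `key` would have been."""
        self.pos = bisect.bisect_left(self.entries, (key,))
        if self.pos < len(self.entries) and self.entries[self.pos][0] == key:
            return self.entries[self.pos][1]
        return None

    def next(self):
        if self.pos is None:
            return None
        self.pos = min(self.pos + 1, len(self.entries))   # at most one past the end
        return self._ref()

    def prev(self):
        if self.pos is None:
            return None
        self.pos = max(self.pos - 1, -1)                  # at most one before the start
        return self._ref()

    def current(self):
        """Return (key, ref) at the pointer, or None when there is no current key."""
        if self._ref() is None:
            return None
        return self.entries[self.pos]

c = Cursor([("smith", 1), ("jones", 2), ("adams", 3)])
print(c.search("king"))   # no such key, so the search "fails" ...
print(c.current())        # ... but the pointer now rests on the next key in order
```

A failed search followed by a call to current is a convenient way to find the nearest key at or after a given string, for example when only the beginning of a name is known.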

5. Example: Accessing a Database via B+ Trees

The following example program handles several text files in a single database file at once. You can select and edit the texts as though they were in different files. A corresponding B+ tree is set up for fast access to the texts and to produce a sorted list of the file names.

    /* Program ch14e04.pro */

    DOMAINS
      db_selector = dba

    PREDICATES
      % List all keys in an index
      list_keys(db_selector,bt_selector)

    CLAUSES
      % Write the current key, then fail to fall through to the next clause
      list_keys(dba,Bt_selector):-
        key_current(dba,Bt_selector,Key,_),
        write(Key,' '),
        fail.
      % Step to the next key and repeat until key_next fails
      list_keys(dba,Bt_selector):-
        key_next(dba,Bt_selector,_),!,
        list_keys(dba,Bt_selector).
      list_keys(_,_).

    PREDICATES
      open_dbase(bt_selector)
      main(db_selector,bt_selector)
      ed(db_selector,bt_selector,string)
      ed1(db_selector,bt_selector,string)

    CLAUSES
      % Loop until escape is pressed
      main(dba,Bt_select):-
        write("File Name: "),
        readln(Name),
        ed(dba,Bt_select,Name),!,
        main(dba,Bt_select).
      main(_,_).

      % The ed predicates ensure that the edition will never fail.
      ed(dba,Bt_select,Name):-
        ed1(dba,Bt_select,Name),!.
      ed(_,_,_).

      %* * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
      % There are three choices:
      %
      % a) The name is an empty string - list all the names
      % b) The name already exists - modify the contents of the file
      % c) The name is a new name - create a new file
      %* * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
      ed1(dba,Bt_select,""):-!,
        key_first(dba,Bt_select,_),
        list_keys(dba,Bt_select),
        nl.
      ed1(dba,Bt_select,Name):-
        key_search(dba,Bt_select,Name,Ref),!,
        ref_term(dba,string,Ref,Str),
        edit(Str,Str1,"Edit old",Name,"",0,"PROLOG.HLP",RET),
        clearwindow,
        Str><Str1, RET=0,
        term_replace(dba,string,Ref,Str1).
      ed1(dba,Bt_select,Name):-
        edit("",Str1,"Create New",Name,"",0,"PROLOG.HLP",RET),
        clearwindow,
        ""><Str1, RET=0,
        chain_insertz(dba,file_chain,string,Str1,Ref),
        key_insert(dba,Bt_select,Name,Ref).

      % Open the database and its index if the file exists; otherwise create both
      open_dbase(INDEX):-
        existfile("dd1.dat"),!,
        db_open(dba,"dd1.dat",in_file),
        bt_open(dba,"ndx",INDEX).
      open_dbase(INDEX):-
        db_create(dba,"dd1.dat",in_file),
        bt_create(dba,"ndx",INDEX,20,4).

    GOAL
      open_dbase(INDEX),
      main(dba,INDEX),
      bt_close(dba,INDEX),
      db_close(dba).