
A Concurrent Implementation of Linear Hashing

Huxia Shi

Department of Computer Science and Engineering, York University


4700 Keele Street, Toronto, Ontario, Canada, M3J 1P3

Abstract. Traditional hashing algorithms have fixed hash function ranges.
Although they provide efficient access to the indices of static databases,
they are not appropriate for dynamic databases because they cannot handle
growth and shrinkage of the data size. Linear hashing, a dynamic hashing
algorithm, addresses this main deficiency of traditional hashing. This
paper presents a concurrent access solution to linear hashing. The
efficiency of this solution is compared with that of a sequential access
solution. Both solutions are implemented in Java1. Some properties of
these implementations have been checked by means of the model checker
Java PathFinder2 (JPF for short).

1 Introduction
Linear hashing is a dynamic hashing algorithm which is of interest to the
database community. Hash tables have been widely used to save indices for
relatively static databases. However, they are not appropriate for dynamic databases.
If the hash table contains too few buckets, each bucket will hold many
indices when the database size grows dramatically, and the efficiency of
the hash table in retrieving data is impaired. Conversely, if the hash
table is created with a large number of buckets, memory is wasted if the
database always contains a small set of data or shrinks from a large size
to a small size. Dynamic hashing techniques address this problem of
traditional hash tables: the number of buckets is adjusted according to
changes in the data size. Therefore, dynamic hashing is well suited for
dynamic database applications.
Concurrent access is an important aspect for linear hashing. As the main
application of linear hashing is in databases, it is normal that one user is deleting
some indices and another user is adding some indices at the same time. The
requirement of providing multiple users with acceptable response times motivates
research on efficient concurrent algorithms for linear hashing. This paper
presents a concurrent solution [1]. Its performance is discussed and analyzed by
comparison with a sequential solution. Both solutions are implemented in Java.
These implementations and their verification using JPF are briefly explained.
The rest of the paper is organized as follows. In Section 2, we discuss related
work. Section 3 is a general introduction to the linear hashing technique and its
operations. Section 4 explains a concurrent solution [1] for linear hashing. In
Section 5, the Java implementations of this solution and of a sequential solution
are presented. Section 6 describes the verification of both implementations using
JPF. Section 7 discusses the experimental results on performance. We
summarize the paper in Section 8.

1 http://java.sun.com
2 http://javapathfinder.sourceforge.net/

2 Related work
Linear hashing is a dynamic hash table algorithm invented by Witold Litwin
in 1980 [2]. Other dynamic hashing algorithms include extendible hashing [3],
exponential hashing [4], and dynamic hashing [5]. The techniques used in the
concurrent solution studied here are similar to the algorithms investigated for
B-trees [6,7] and binary search trees [8,9].

3 Linear Hashing
In this section, we outline the data structure and the operations of a linear hash
table.
When a linear hash table is initialized, N primary buckets with contiguous
logical addresses are created. Each primary bucket has a maximal capacity of
b keys. The initial hash function h_0 : k → {0, 1, . . . , N − 1} maps a
key to the id of a primary bucket. When a new key is inserted into the linear
hash table, we first calculate the primary bucket id and then put the key into
the target bucket. When the number of keys in a primary bucket exceeds
its maximal capacity b, a bucket called an overflow bucket is created and linked
to the end of this primary bucket. More than one overflow bucket may be created
when more keys are added; each newly created overflow bucket is linked to the
last overflow bucket. We call a primary bucket together with all of its overflow
buckets a bucket chain. Primary buckets without overflow buckets are called
length-one bucket chains. All keys in a bucket chain are saved in order.
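The bucket-chain structure described above can be sketched as follows. The class names and the capacity value are illustrative assumptions for this sketch, not the implementation presented later in this paper.

```java
import java.util.ArrayList;
import java.util.List;

// Minimal sketch of a bucket chain: a primary bucket followed by a
// linked list of overflow buckets, each holding at most B keys.
class Bucket {
    static final int B = 4;                    // maximal capacity b of one bucket
    final List<Integer> keys = new ArrayList<>();
    Bucket next;                               // link to the next overflow bucket
}

class BucketChain {
    Bucket primary = new Bucket();             // the first bucket is the primary bucket

    // Insert a key, creating an overflow bucket when the current last bucket is full.
    void insert(int key) {
        Bucket b = primary;
        while (b.keys.size() == Bucket.B) {
            if (b.next == null) b.next = new Bucket();   // new overflow bucket
            b = b.next;
        }
        b.keys.add(key);
    }

    // Number of buckets in the chain (1 for a length-one bucket chain).
    int length() {
        int n = 0;
        for (Bucket b = primary; b != null; b = b.next) n++;
        return n;
    }
}
```

Inserting a fifth key into a chain with b = 4 creates the first overflow bucket, turning a length-one chain into a chain of length two.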
As more and more keys are inserted into an initialized linear hash table,
some bucket chains may contain a long list of overflow buckets, which is
inefficient for the find operation. The split operation is used at this point
to expand the number of bucket chains and thus reduce their length. The first
split operation is applied to the first bucket chain. We choose a new hash
function h_1(k) such that for any key value k, either h_1(k) = h_0(k) or
h_1(k) = h_0(k) + N. This hash function is applied to all keys in the first
bucket chain and maps them to one of two values, 0 or N. The keys with hash
value 0 remain in place; the other keys are put into a new bucket chain. The
number of bucket chains in the linear hash table is increased to N + 1 by
adding the new bucket chain at the end. Figure 1 shows the result of splitting
the first bucket chain.
Further split operations are applied to the following bucket chains one by one.
After the bucket chain at position N − 1 has been split, the next split operation
moves back to the first bucket chain with a higher-level hash function h_2. To
summarize, there are three areas in the linear hash table, as shown in Figure 1.

[Figure 1: a table of N + 1 bucket chains after the first split; chains 0 to
next − 1 and the new chain N are addressed with h_{level+1}(k), chains next to
N − 1 with h_level(k); the next pointer marks the chain to be split next]

Fig. 1. The split result of the first bucket chain

        Lower boundary   Upper boundary         Hash function
area1   0                next − 1               h_{level+1}
area2   next             2^level N − 1          h_level
area3   2^level N        2^level N + next − 1   h_{level+1}

Table 1. The properties of the three areas

The hash function h_level is used in the middle area; the other two areas use
the hash function h_{level+1}. Hash function h_level maps any key to a value in
the range 0 to 2^level N − 1. The relation between h_level and h_{level+1} is
h_{level+1}(k) = h_level(k) or h_{level+1}(k) = h_level(k) + 2^level N. A
variable named next saves the position of the bucket chain to be split next.
The boundaries of the three areas in a linear hash table can be determined from
the next and level variables, as shown in Table 1. The next and level variables
are called the root variables. They are updated by a split operation as follows:

next ← (next + 1) mod (N · 2^level)
if next = 0 then level ← level + 1 endif

The merge operation is the opposite of the split. It merges the bucket chain
at position next − 1 and the one at the end of the table into a new bucket
chain. The merged chain then replaces the one at position next − 1, and the
last bucket chain is deleted. If the next value is not zero, it is decreased
by 1; otherwise, the level value is decreased by 1 and next moves to
2^level N − 1. This is illustrated in the following pseudocode.

if next = 0 then level ← level − 1 endif
next ← (next − 1) mod (N · 2^level)
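The two root-variable updates above can be captured in a small sketch. The class is our own illustration; Java's Math.floorMod is used because the % operator would return a negative value for next = 0.

```java
// Sketch of the root-variable updates performed by split and merge,
// following the pseudocode above. n is the initial bucket count N.
class RootVariables {
    final int n;
    int level = 0;
    int next = 0;

    RootVariables(int n) { this.n = n; }

    // next ← (next + 1) mod (N · 2^level); if next = 0 then level ← level + 1
    void afterSplit() {
        next = (next + 1) % (n * (1 << level));
        if (next == 0) level++;
    }

    // if next = 0 then level ← level − 1; next ← (next − 1) mod (N · 2^level)
    void afterMerge() {
        if (next == 0) level--;
        // floorMod keeps the result non-negative, so next wraps to N·2^level − 1
        next = Math.floorMod(next - 1, n * (1 << level));
    }
}
```

With N = 2, two splits advance next to 1 and then wrap it back to 0 while raising level to 1; two merges undo exactly these steps.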

The other three operations of a linear hash table are find, insert and delete.
The find operation checks if a key exists in a linear hash table. It first locates the
target bucket chain by the hash function and then checks the key data in this
bucket chain. Because of the special three-area data structure explained above,
the location procedure here consists of two steps. In the first step, h_level is
used to calculate a bucket chain position. The next step compares this value
with the lower boundary of the middle area: if the calculated position is higher
than or equal to next, it is accepted as the valid target bucket id. Otherwise, h_{level+1}
is used to get the target bucket id. The delete operation removes a key from
a linear hash table. The insert operation adds a key. Both the delete and the
insert operations use the same location procedure as the find operation to get
the target bucket id.
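The two-step location procedure can be sketched as follows. The Locator class and its names are our own illustration, using h_level(k) = k mod 2^level·N, which is consistent with the parameters chosen in Section 5.

```java
// Sketch of the two-step target-chain location shared by find, insert,
// and delete. hLevel(k) = k mod (2^level * N); names are illustrative.
class Locator {
    final int n;                    // initial bucket count N
    int level = 0, next = 0;        // root variables

    Locator(int n) { this.n = n; }

    int hash(int key, int lvl) { return key % (n * (1 << lvl)); }

    int targetChain(int key) {
        int pos = hash(key, level);     // step 1: apply h_level
        if (pos < next)                 // step 2: chain already split this round,
            pos = hash(key, level + 1); //         so apply h_{level+1} instead
        return pos;
    }
}
```

For example, with N = 2, level = 0, and next = 1 (the first chain has been split), key 2 hashes to position 0 with h_0, which lies below next, so h_1 redirects it to chain 2; key 3 hashes to position 1 and stays there.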

4 Concurrent Solution

The concurrent solution of [1] is described in this section. It is based on the
three different locks shown in Table 2.

                 Existing lock
Lock request     Read lock   Selective lock   Exclusive lock
Read lock        yes         yes              no
Selective lock   yes         no               no
Exclusive lock   yes         no               no

Table 2. All locks used in the concurrent solution

This concurrent solution allows the find and the split operations to work
simultaneously, which introduces a data race problem: the find operation first
reads the level value and then uses it to locate the target bucket chain, but
the level value may be updated by a split process before it is used. To handle
this problem, a variable locallevel is added to each bucket chain. This
variable saves the correct level value used for this bucket chain; it would be
redundant information if all operations were serialized. How this variable is
used is illustrated in the following subsections. The linear hashing algorithm
now has three variables. Integers level and next are the root variables. The
variable bucketChainList is an array containing all bucket chains. A bucket
chain object contains an integer variable locallevel and a sequence of buckets.
level, next : Integer
bucketChainList : Array<BucketChain>

4.1 Find operation

In the find operation, a read lock is first added on the root variables. After this
read lock is successfully added, a second read lock is put on the target bucket
chain. Then the find operation releases the first lock on the root variables and
reads data in the target bucket chain. Finally it releases the second lock. The

level value used by the find operation may be changed by a simultaneous split
operation. Such an obsolete level can be identified by comparing it with the
locallevel variable in the target bucket chain. If the find procedure detects this
problem, it continues to increase the level value and locate the new bucket chain
until the correct one is found. This procedure of probing the correct target chain
is shared with the insert and delete operations.
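A minimal sketch of this probing loop is given below, with all locking omitted; the Chain class, the chains array, and the localLevel field stand in for the implementation's classes and are our own simplification.

```java
import java.util.HashSet;
import java.util.Set;

// Sketch of the stale-level probing used by find (locks omitted).
// chains[i].localLevel records the level actually in effect for chain i;
// the loop raises the observed level until the chain agrees with it.
class ProbingFind {
    static class Chain {
        int localLevel;
        Set<Integer> keys = new HashSet<>();
    }

    final int n;            // initial bucket count N
    final Chain[] chains;

    ProbingFind(int n, Chain[] chains) { this.n = n; this.chains = chains; }

    boolean find(int key, int observedLevel) {
        int lvl = observedLevel;                   // possibly obsolete level
        Chain c = chains[key % (n * (1 << lvl))];
        while (c.localLevel > lvl) {               // a split happened meanwhile:
            lvl++;                                 // retry with the next level
            c = chains[key % (n * (1 << lvl))];
        }
        return c.keys.contains(key);
    }
}
```

If a find thread observes level 0 while chain 0 has already been split to level 1, the comparison with localLevel redirects the probe to the correct new chain.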

4.2 Insert and delete operations


The insert and delete operations are similar. They first add a read lock on the
root variables and then a selective lock on the target bucket chain. After the
second lock is successfully added, the first lock is released and the target
bucket chain is updated. Finally, the second lock is released.
To allow find processes simultaneous access to the same target bucket chain,
the insert and delete operations do not change the bucket sequence in the
bucket chain directly. They create a new bucket sequence containing the updated
result, and the old sequence is then replaced with the new one. If there are
several insert and delete requests on the same bucket chain, these operations
must work serially: the first insert or delete process puts a selective lock on
the target bucket chain, so other insert or delete processes cannot
successfully add their selective locks before this one is released.
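The copy-then-replace idea can be sketched as follows. This simplified chain stores its keys in a single list and uses synchronized where the real implementation uses a selective lock; the class and its names are our own illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of the copy-then-replace update used by insert and delete: the
// visible bucket sequence is never modified in place, so a concurrent
// find always sees a complete, consistent sequence.
class CopyOnWriteChain {
    // volatile so a reading thread observes the replacement atomically
    private volatile List<Integer> keys = new ArrayList<>();

    boolean contains(int key) {          // find: reads the current sequence
        return keys.contains(key);
    }

    synchronized void insert(int key) {  // writers serialized (selective lock)
        List<Integer> copy = new ArrayList<>(keys);  // build the updated copy
        copy.add(key);
        keys = copy;                     // swap in the new sequence
    }

    synchronized void delete(int key) {
        List<Integer> copy = new ArrayList<>(keys);
        copy.remove(Integer.valueOf(key));
        keys = copy;
    }
}
```

A reader that obtained the old list reference before a swap simply finishes its scan on the old, still-consistent sequence.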

4.3 Split operation


The split operation adds a selective lock first on the root variables and then on
the bucket chain pointed to by next. After both locks are successfully added, the
bucket chain pointed to by next is split. Then, the value of the next variable is
increased. Finally, both selective locks added at the beginning are released.
The split operation and the find operation can work in parallel on the same bucket
chain. However, splitting, inserting, and deleting on the same bucket chain must
run serially because all of them put selective locks on the target bucket chain.

4.4 Merge operation


The merge operation is the only one which uses an exclusive lock. Consequently,
it cannot work concurrently with any other operation on the same bucket chain.
First, the root variables are exclusively locked and updated. Then the merge
operation adds exclusive locks on the two bucket chains which are going to be
merged. In the next step, the lock on the root variables is degraded to
selective so that other find processes, which do not access the same bucket
chains, can continue. Finally, the two bucket chains are merged, and all locks
are released.

5 Java Implementations
Two Java implementations are presented in this section. The first one is based
on the concurrent solution shown in previous section. It is called the concurrent

implementation. The other implementation is rather simple; it is used to
evaluate the performance of the former. All operations in the second
implementation are forced to be serialized, so it is called the sequential
implementation. Both implementations use the same parameters, shown in Table 3.

N            2
h_level(k)   k mod 2^(level+1)

Table 3. The parameters for the two implementations

5.1 Concurrent implementation

The classes of the concurrent implementation are shown in Figure 2.

[Class diagram: the LinearHashTable interface (find, insert, delete) and the
Serializable interface; the ThreeLockLinearHashTable class implements them and
holds the root variables level and next, a rootLock, and a bucketChainList of
Node elements; each Node pairs a LocalLevelBucketChain (a BucketChain subclass
adding localLevel, getLocalLevel, setLocalLevel) with a chainLock; a
BucketChain holds a primary Bucket, and each Bucket holds data, size, and a
next reference with addData, getData, setNext, getNext; the Lock class has
readLockNum, selectiveLockNum, and exclusiveLockNum attributes and the methods
requestLock, releaseLock, and degradeLock]

Fig. 2. The class diagram of the concurrent implementation

The Lock class encapsulates the logic of how these locks cooperate with
each other. The readLockN um, selectiveLockN um, and exclusiveLockN um
attributes record the number of the existing locks on a lockable object. The

Lock class has three methods. All of these methods are synchronized to avoid the
possible error caused by the concurrent accesses to the lock number attributes.
The requestLock method uses a loop to add a lock. If the lock is successfully
added, the requestLock method increases the lock number and exits the loop.
Otherwise, the request thread goes to the waiting queue. When this thread is
woken up later, the loop in the requestLock method will try to add a lock again.
The releaseLock method releases an added lock by reducing the lock number.
The degradeLock method has two parameters, f romT ype and toT ype. It reduces
the number of the lock whose type is f romT ype and increases the number of
the toT ype lock. A Lock object is associated with either the root variables or a
bucket chain.
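A plausible sketch of this Lock class using wait/notifyAll is given below. The method names follow the description above, the compatibility test follows Table 2, and the internals and the helper held are our own assumptions about the implementation.

```java
// Sketch of the three-mode Lock described above. Compatibility follows
// Table 2: a read lock coexists with read and selective locks; selective
// and exclusive requests are blocked by selective and exclusive locks.
class ThreeModeLock {
    enum Type { READ, SELECTIVE, EXCLUSIVE }

    private int readLockNum, selectiveLockNum, exclusiveLockNum;

    private boolean compatible(Type t) {
        if (t == Type.READ) return exclusiveLockNum == 0;
        return selectiveLockNum == 0 && exclusiveLockNum == 0;
    }

    synchronized void requestLock(Type t) {
        while (!compatible(t)) {
            try { wait(); }                         // go to the waiting queue
            catch (InterruptedException e) { Thread.currentThread().interrupt(); }
        }
        bump(t, +1);                                // lock granted: count it
    }

    synchronized void releaseLock(Type t) {
        bump(t, -1);
        notifyAll();                                // wake up waiting requesters
    }

    synchronized void degradeLock(Type fromType, Type toType) {
        bump(fromType, -1);
        bump(toType, +1);
        notifyAll();
    }

    synchronized int held(Type t) {                 // helper for inspection
        if (t == Type.READ) return readLockNum;
        if (t == Type.SELECTIVE) return selectiveLockNum;
        return exclusiveLockNum;
    }

    private void bump(Type t, int d) {
        if (t == Type.READ) readLockNum += d;
        else if (t == Type.SELECTIVE) selectiveLockNum += d;
        else exclusiveLockNum += d;
    }
}
```

In a single thread, two read locks and one selective lock can all be granted without blocking, and degrading the selective lock back to a read lock makes a new selective request grantable again.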
The LinearHashTable is an interface which defines the operations of a linear
hash table. The concurrent implementation of this interface is the ThreeLock-
LinearHashTable class. The integer attributes level and next in the ThreeLock-
LinearHashTable class are the root variables. The rootLock attribute is a Lock
object, and used to protect the root variables. An array named bucketChainList
in the ThreeLockLinearHashTable class saves a sequence of Node elements. Each
Node object has a LocalLevelBucketChain object and a Lock object associated
with this bucket chain. The find, insert, delete, split, and merge methods in the
ThreeLockLinearHashTable class implement the detailed logic of the concurrent
operations on a linear hash table.
The LocalLevelBucketChain class inherits from the BucketChain class. In
comparison with the BucketChain class, the LocalLevelBucketChain class has one
more attribute, named localLevel, which was introduced in the previous section.
A BucketChain object contains a sequence of Bucket objects, the first of which
is the primary bucket. A Bucket object is composed of an integer array with a
fixed size b and a reference to the next Bucket.
The ThreeLockLinearHashTable, Node, LocalLevelBucketChain, and BucketChain
classes implement the java.io.Serializable interface. Thus, a linear hash table
represented by the ThreeLockLinearHashTable class can be saved to the hard
disk, or loaded from hard disk files. The Lock class does not implement the
java.io.Serializable interface because it holds runtime information that does
not need to be persisted.

5.2 Sequential implementation

Figure 3 is the class diagram of the sequential implementation of a linear hash
table. The SequentialLinearHashTable class implements the LinearHashTable
interface. It contains the root variables and a bucket chain list. The find, insert,
delete, split, and merge methods in the SequentialLinearHashTable class are
synchronized. Therefore, they have to run serially. No Lock object is used in this
implementation.
Similar to the concurrent implementation, the SequentialLinearHashTable,
BucketChain, and Bucket classes implement the java.io.Serializable interface to
allow a linear hash table to be saved onto the hard disk.

[Class diagram: the SequentialLinearHashTable class implements the
LinearHashTable and Serializable interfaces and holds level, next, and a
bucketChainList of BucketChain objects, with the methods find, insert, delete,
split, and merge; each BucketChain holds a primary Bucket, and each Bucket
holds data, size, and a next reference]

Fig. 3. The class diagram of the sequential implementation

6 Verification

JPF is used to verify the above two Java implementations. Four properties of
these implementations are checked: freedom from deadlock, freedom from data
races, the importance of the locks, and the consistency of the number of
locks. The last two properties are only checked in the concurrent
implementation because they are related to the locks.
The verification was conducted on a Linux server with one dual-core CPU. The
JRE version is 1.6.0. The maximal memory assigned to JPF is 2.5 GB. The
maximal size of a bucket, b, is chosen to be 2 in the verification. Because
JPF consumes a lot of memory in the tests, we can only use a very limited
amount of test data. Only a small bucket size makes it possible to trigger the
split and merge operations with a tiny amount of test data.
Three different types of threads are used in the verification: insert, delete,
and find threads. The number of test data items manipulated by a thread is
fixed. It is chosen to be 6 because a smaller number cannot introduce any
interesting problem, and for a bigger number JPF runs out of memory. Threads
of different types share the same test data, so they have a chance to collide
at the same bucket chains. In order to cover a wider test range, threads of
the same type use different test data. For example, suppose there are two
insert threads and one delete thread in a verification test. One insert thread
adds the integers from 1 to 6, while the other insert thread works with the
integers from 7 to 12. The delete thread tries to remove the integers from 1
to 6 at the same time. If there is no insert thread in a verification test,
the linear hash table is initialized by adding all of the test data used in
the delete and find threads.

The reason for this special treatment is that the find and delete operations on
an empty linear hash table cannot generate any interesting problem.

6.1 Deadlock

Freedom from deadlock is verified in both the concurrent and the sequential
implementations. Different combinations of the three thread types are tested.
No code is changed, and the default JPF properties are used in this verification.
An array index out of bounds exception is found by JPF in the deadlock
verification. Analysis of the exception stack trace shows that it occurs when
a merge operation is performed on a linear hash table with the initial root
variables. The root variables next and level are both initialized to 0; after
the merge operation, next becomes −1, which points outside the bucket chain
array. This problem occurs in some rare cases. For example, when an insert
operation that creates an overflow event on a linear hash table with the
initial root variables finds it necessary to invoke a split operation, it sets
a local boolean variable to true and then releases all of the locks it holds.
When this thread continues, it checks the local variable and calls the split
operation. However, a delete thread can run on the same bucket chain before
the split operation is started. This delete thread can remove the overflow
bucket introduced by the above insert thread and thus generate an underflow
event. The consequent merge operation causes the array index out of bounds
problem. This problem is not mentioned in [1], and there is no handling of
this possible issue in the pseudocode presented in [1]. Our proposed solution
is to check the values of the root variables after the merge operation adds
the exclusive lock on the root variables. If both root variables are zero, the
merge action is not performed, and the operation finishes by releasing the
exclusive lock on the root variables. This solution is very simple, but it has
the flaw that some merge operations are discarded just because they are
scheduled at an inappropriate time. A possible complete solution is for the
merge thread to keep waiting on the lock of the root variables and retrying
the merge operation when it is notified, until it finds appropriate root
variables. The first approach is used in our work because the second one
brings a big change to the algorithm of [1].
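The proposed guard can be sketched as follows, with locking and the actual merging of the two bucket chains omitted; the class and its names are our own illustration.

```java
// Sketch of the proposed guard: after the exclusive lock on the root
// variables has been acquired, a merge on a table that is still in its
// initial state (next == 0 and level == 0) is discarded, since decrementing
// next would otherwise produce the out-of-bounds index -1.
class GuardedMerge {
    final int n;                 // initial bucket count N
    int level = 0, next = 0;     // root variables

    GuardedMerge(int n) { this.n = n; }

    // Returns true if the merge was performed, false if it was discarded.
    boolean merge() {
        if (next == 0 && level == 0) return false;   // initial state: skip
        if (next == 0) level--;
        next = Math.floorMod(next - 1, n * (1 << level));
        // ... merge the bucket chain at position next with the last one ...
        return true;
    }
}
```

On a freshly initialized table the merge request returns without touching the root variables; once at least one split has happened, the merge proceeds normally.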
After fixing this problem, no other error is found in the deadlock
verifications. The test results are listed in Table 4 and Table 5. The
verification of the concurrent implementation runs out of memory when the
number of threads is increased to three; therefore, this implementation is
only verified with a very limited number of threads. Figure 4 compares the
state space of the two Java implementations.

Insert threads  Delete threads  Find threads  Time      New state number
1               1               0             00:00:03  733
0               1               1             00:00:02  476
1               0               1             00:00:02  1159
2               0               0             00:00:02  655
0               2               0             00:00:03  1219
0               0               2             00:00:02  841
1               1               1             00:00:38  136972
0               2               1             00:00:11  29065
0               1               2             00:00:31  90986
2               1               0             00:00:14  35861
1               2               0             00:00:11  29862
1               0               2             00:01:30  328336
2               0               1             00:00:18  58655
2               2               0             00:11:35  1898391
0               2               2             00:18:17  3556731
2               0               2             00:11:20  2172770
2               2               2             10:01:53  Out of memory

Table 4. The deadlock verification of the sequential implementation

Insert threads  Delete threads  Find threads  Time      New state number
1               1               0             00:02:16  494229
0               1               1             00:00:56  203312
1               0               1             00:00:58  219901
2               0               0             00:02:32  535261
0               2               0             00:02:30  519308
0               0               2             00:05:33  1181728
1               1               1             12:21:31  Out of memory
0               2               1             12:42:30  Out of memory
0               1               2             13:54:28  Out of memory
2               1               0             12:45:10  Out of memory
1               2               0             11:36:48  Out of memory
1               0               2             12:27:10  Out of memory
2               0               1             13:12:00  Out of memory

Table 5. The deadlock verification of the concurrent implementation

6.2 Data Race


The data race problem is checked in both Java implementations. The test method
is the same as the deadlock verification. Different combinations of the different
thread types are tried. The jpf.listener = gov.nasa.jpf.tools.PreciseRaceDetector
attribute is added into the local jpf.properties file to enable the data race
detection.

[Figure 4: bar chart (logarithmic scale) of the new state number of the
sequential and the concurrent implementation for the thread combinations
1I/1D, 1D/1F, 1I/1F, 2I, 2D, and 2F]

Fig. 4. The state space of the two implementations
No data race is found in the tests of the sequential implementation. JPF
detects a data race on the root variable level when it verifies the concurrent
implementation. This is an expected result: the concurrent solution allows the
split operation to update the root variables while other operations read them
at the same time. The local level technique explained in Section 4 handles
this problem.

6.3 Importance of the locks

In this verification, all three locks are shown to be essential to the
concurrent implementation. Three new classes are added in this test, all of
which extend the ThreeLockLinearHashTable class. The LackReadLinearHashTable
class does not use the read lock; it overrides the find method of the
superclass. Similarly, the selective and exclusive locks are disabled in the
LackSelectiveLinearHashTable and LackExclusiveLinearHashTable classes,
respectively.
When a delete thread and a find thread run at the same time on the concur-
rent implementation without the read lock, JPF reports a null pointer exception.
Because the find thread does not add a read lock on the target bucket chain,
this bucket chain can be set to null by a parallel merge operation just before
the find thread accesses it. Similar null pointer exceptions are reported by JPF
when the selective and the exclusive locks are disabled.

6.4 Number of locks


We want to check that the numbers of the different locks are always correct
when multiple threads access the concurrent implementation in parallel. The
number of locks is verified after a lock is successfully acquired and before
it is released. The assert clauses used for the different locks are listed in
Table 6. If the number of a lock is found to be incorrect, JPF reports an
uncaught exception error thrown by the assert clauses.

Read lock        exclusiveLockNum == 0
Selective lock   selectiveLockNum == 1 && exclusiveLockNum == 0
Exclusive lock   readLockNum == 0 && selectiveLockNum == 0 && exclusiveLockNum == 1

Table 6. The assert clauses for the three locks
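The assert clauses of Table 6 can be expressed as small boolean helpers; the class and method names below are our own illustration, with the three counters corresponding to the lock-number attributes of the Lock class.

```java
// The invariants of Table 6 as boolean helpers over the three lock
// counters; each is checked after the corresponding lock is acquired.
class LockInvariants {
    // While a read lock is held: no exclusive lock may exist.
    static boolean readHeld(int readNum, int selNum, int exclNum) {
        return exclNum == 0;
    }
    // While a selective lock is held: it is the only one, and no exclusive lock.
    static boolean selectiveHeld(int readNum, int selNum, int exclNum) {
        return selNum == 1 && exclNum == 0;
    }
    // While an exclusive lock is held: it is the only lock of any kind.
    static boolean exclusiveHeld(int readNum, int selNum, int exclNum) {
        return readNum == 0 && selNum == 0 && exclNum == 1;
    }
}
```

For example, a read lock coexisting with one selective lock satisfies the read invariant, but two selective locks at once would violate the selective invariant.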

Two new classes are added in this verification. The readLockNum,
selectiveLockNum, and exclusiveLockNum attributes in the Lock2 class are
changed to public; other parts of the Lock2 class are the same as in the Lock
class. The ThreeLockWithAssert class is generally the same as the
ThreeLockLinearHashTable class, except that it uses the Lock2 class instead of
the Lock class and has the above assert clauses added.
JPF raises many warnings about unprotected field accesses on the readLockNum,
selectiveLockNum, and exclusiveLockNum attributes. Because these attributes
are private in the real implementation and all methods that access them are
synchronized, we are not interested in this possible problem. No other problem
is reported by JPF. Another issue in this test is the same as in the deadlock
verification: we can only run two threads on the concurrent implementation.
When the number of threads is increased to three or more, JPF runs out of
memory.

7 Experiments of Performance
The performance experiments were conducted on a Linux server with four
dual-core CPUs. The JRE version is 1.6.0. The maximal size of a bucket, b, is
chosen to be 400 to avoid creating too many files in the experiments involving
hard disk IO.
Figure 5 compares the performance of the two implementations. In this test,
all operations are in memory, and no hard disk IO is involved. The insert and
delete operations run simultaneously; a thread number t means there are t
insert threads and t delete threads at the same time. The total task of this
test is to insert ten million different integers and delete five million of
them. The workload is equally distributed over the insert and delete threads.
From this experimental result, we find that the performance of the two
implementations is similar, which means that the benefit we get from the
parallel accesses does not clearly exceed its overhead.

[Figure 5: bar chart of the time usage in minutes (0 to 0.9) of the concurrent
and the sequential implementation for 1, 3, 5, 10, 20, and 50 threads]

Fig. 5. The performance of the concurrent insert and delete operations without the
hard disk writing

In real applications, a hash table is saved on the hard disk. While a system
is running, changes in a hash table may need to be written back to the hard
disk files. Figures 6 and 7 give performance comparisons of the two
implementations in which hard disk IOs are added. The root variables and each
bucket chain have their own storage files; when there are changes in a linear
hash table, only the affected files are updated.
The test of Figure 6 only has insert operations. In total, four hundred
thousand integers are inserted, with the workload equally distributed over the
threads. The test of Figure 7 inserts the same number of integers and deletes
half of them in delete threads running at the same time. In the single-thread
case, the performance of the concurrent implementation is worse than that of
the sequential one. However, in the multi-thread cases, the concurrent
implementation is slightly better than the sequential one. Another feature of
the concurrent implementation is that its performance is rather stable as the
number of threads increases. In contrast, the performance of the sequential
implementation becomes worse as the number of threads increases. The disk IOs
are the main cost in these tests, and they work in serialized mode. In the
concurrent implementation, when one thread is doing disk IO, other threads can
continue to add locks and prepare their data in memory, and then wait for disk
IO. The sequential implementation works differently: when a thread is writing
to the hard disk, other threads cannot do any operations in memory, even if
these operations are not related to the part which is being saved.

[Figure 6: bar chart of the time usage in minutes (0 to 0.7) of the concurrent
and the sequential implementation for 1, 2, 3, 5, 10, 20, and 50 threads]

Fig. 6. The performance of the concurrent insert operations with the hard disk writing

[Figure 7: bar chart of the time usage in minutes (0 to 1.2) of the concurrent
and the sequential implementation for 1, 2, 3, 5, 10, 20, and 50 threads]

Fig. 7. The performance of the concurrent insert and delete operations with the hard
disk writing

8 Conclusion

Linear hashing is a dynamic hashing algorithm: it allows the range of the
hashing function to be adjusted according to the growth or shrinkage of the
stored data. This paper introduced a high-level concurrent access algorithm
for linear hashing. The Java implementations of this algorithm and of a
sequential solution were presented. Both implementations were checked by means
of JPF: freedom from deadlock and from data races was verified with a limited
number of threads, and the correctness of the number of locks was also checked
by JPF in the concurrent implementation. The performance of both
implementations was compared. The concurrent implementation does not show any
clear performance advantage when all operations are in memory. If the changes
in a linear hash table are written back to the hard disk, the performance of
the concurrent implementation is slightly better than that of the sequential
one in the multi-thread mode.

References
1. Ellis, C.S.: Concurrency in linear hashing. ACM Transactions on Database Systems
12(2) (June 1987) 195–217
2. Litwin, W.: Linear hashing: A new tool for file and table addressing. In: Proceedings
of the 6th Conference on Very Large Data Bases, Montreal, Canada, IEEE Computer
Society (October 1980) 213–223
3. Fagin, R., Nievergelt, J., Pippenger, N., Strong, H.R.: Extendible hashing - a fast
access method for dynamic files. ACM Transactions on Database Systems 4(3)
(September 1979) 315–344
4. Lomet, D.: Bounded index exponential hashing. ACM Transactions on Database
Systems 8(1) (March 1983) 136–165
5. Larson, P.: Dynamic hashing. BIT Numerical Mathematics 18(2) (June 1978)
184–201
6. Lehman, P., Yao, S.: Efficient locking for concurrent operations on B-trees. ACM
Transactions on Database Systems 6(4) (December 1981) 650–670
7. Kwong, Y., Wood, D.: A new method for concurrency in B-trees. IEEE Transactions
on Software Engineering 8(3) (May 1982) 211–222
8. Kung, H.T., Lehman, P.L.: Concurrent manipulation of binary search trees. ACM
Transactions on Database Systems 5(3) (September 1980) 354–382
9. Ellis, C.S.: Concurrent search and insertion in AVL trees. IEEE Transactions on
Computers C-29(9) (September 1980) 811–817
