
MAKING A HASH OF IT

by

SUNDARESH
Table of Contents

PREFACE................................................................................................................3
HASH TABLE MANAGEMENT.........................................................................13
PRELIMINARIES............................................................................................13
RE-HASHING..................................................................................................18
OPEN ADDRESSING......................................................................................21
UNORDERED PERFECT HASHING........................................................22
ORDERED PERFECT HASHING..............................................................31
HASH TABLE ORGANIZATION...............................................................35
HASH COMPRESSION........................................................................................41
HASHING BY PARTS.....................................................................................42
POSTFACE............................................................................................................62
APPENDIX............................................................................................................63
PROBABILITY OVERVIEW..........................................................63
CODE LISTING...............................................................................................65
TEST RESULTS...............................................................................................74
REHASHING...............................................................................................74
UNORDERED PERFECT HASHING........................................................79
ORDERED PERFECT HASHING..............................................................99
PREFACE

This book delves into the innards and the intricacies of hashing, both for
the purpose of managing hash tables and as a technique for the lossless
compression of information. The latter is along the lines of cryptographic
hashing, wherein the objective is to extract information about a message
and to condense and encode that information into a message digest for the
sake of later verification of the integrity of the original message, but it is
not exactly identical to it. A separate section is devoted to introducing
hash-based compression. I have tried to treat each of these two forms of
hashing with equal importance, but as the reader will quickly discern, only
on the surface does this topic seem to be a minor one; in reality it is
incredibly vast, offering huge scope and limitless possibilities, with
immense potential for improvement almost bordering on insanity.

This book is the outcome of my effort to partially document, and to make
notes of, as best I could, my own personal investigation into hashing, which
I conducted because it was something that always interested and intrigued
me. I began this somewhat protracted and prolonged investigation by taking
an informal practical approach, one that was desultory, spasmodic,
meandering and haphazard, without a clear goal, a vision, a perspective or
a sense of purpose. It therefore proved to be not a very fruitful or very
productive exercise, but an all the more exasperating and exhausting one;
yet, perhaps because my motive was right, not a futile one, since it led me
to question my own thought processes. To any aspiring
researcher, or inquirer into the truth, if there is one inviolable law or
principle or practice I will heartily vouch for and attest to, it is this: "Pay
attention to your own inner thoughts and to your own inner feelings, and
back your own instincts and your own hunches, and they will always be
confirmed." They have been in my case, which surprised and astonished me
at the same time, and eventually persuaded me to ask the hard questions
concerning the definition of good Hash functions, particularly the very
basic ones: "What purpose should such a function serve, which must never
be lost sight of?"; "What desirable characteristic properties must it possess
in order to carry out its intended function effectively and efficiently?"; and
"How could one systematically go about building or constructing a
mathematical expression which embodies and captures those desired
specific characteristics or properties in its form, and exhibits that desired
behavior?". One way to construct and build meaningful
expressions, which always works for me, rests on the observation that form
expresses meaning perfectly. Once the intended, if not also the innate,
meaning of what is desired for the purpose it is meant to serve has been
clearly conceived and clearly defined, if one now concentrates and focuses
on that meaning intently, clearly and extensively in the mind, the form
emerges, suggests and presents itself, and appears automatically on its
own out of nowhere, as if by magic. As evidenced, there is a small volitional
(voluntary) effort or action on one's part, which is always under one's
control, and there is the reciprocal non-volitional (involuntary),
automatically produced reaction or result or effect arising naturally as a
consequence of one's act, which is not under one's control. This clearly
drives home the point that there is both a free and a bound component to
action. Whereas the law is always bound, the agent is always free. Even the
slightest and simplest changes in the form of an expression drastically alter
its meaning, not to mention its use and its suitability, by introducing new
possibilities and eliminating existing ones. These considerations later
prompted me to take a more considered, more measured, more cautious
and circumspect, more formal and theoretically sound approach.
My discoveries are recorded in this book more or less in the order, and in
the manner, in which I came by them. I mean it when I say that I will not
diminish or undermine the value of either approach, only heartily applaud
and highly recommend both of them, since both approaches yield excellent
insights into building fairly good Hash functions. In my honest experience,
neither have I found the scenic route to be really long, nor have I found the
direct route to be really lackluster. I think each of these approaches is
reflective of one's choice of priorities in life. One can either choose to
cultivate good taste or simply cut to the chase and cultivate good character.
I prefer taste. Character has never been my forte. But the good news is that
taste eventually builds character, owing to inquisitiveness. One never feels
safe and content knowing that an idea always works or that an idea never
works, but is driven to find out why it always works or why it never works,
and tries to establish a sound theoretical basis, to supply a valid reason for
it, and to come up with a rational explanation for it, in order to acquire a
better appreciation of it.

As I just stated, one can either adopt a strict, formal, theoretical approach
to problem solving or a less formal practical approach. These two
approaches always go hand in hand, proceed alternately step in step, and
are mutually fulfilling and complementary. Problem solving is as much
practical as it is theoretical, but the style of working I prefer is the one in
which the practical leads the way and the theory inevitably follows suit,
although this approach calls for a bit more tenacity, grit and hardihood on
one's part; no one is ever exempt from having to take the apparent
failures in life in one's stride.

However, forewarned is forearmed. All scientists concur when they say
that problem solving is a goal-oriented activity, and that one should clearly
define one's objectives or goals before systematically working towards
achieving them. And work itself is a vector quantity, with both direction
(inspiration) and magnitude (effort) to it, of which direction, in the form of
the motive, is the foremost, critical and vital component upon which rests
the success or the failure of our endeavor. And as far as motives or desires
go, there can only be two: either to be virtuous, which is a selfless and
generous motivation, or to be wealthy, which is a greedy and selfish
motivation. Suffice it to say, the motive always determines the means,
which inexorably and inescapably determines the end. If the motive for
doing something, or for not doing something, is to make money, the means
will always be devious, and the end will always be disastrous. Philosophers
and psychologists concur when they say that the external phenomenal
reality of the body, called the macrocosm, is identical to the internal
noumenal reality of the spirit, which is the world of feelings, thoughts and
emotions, called the microcosm. They conclude that all life, all existence
and all being is a tripartite heart (spirit), mind, body complex. The (noble
or ignoble) desires of the heart constitute the genus; to cater to and satisfy
them, the mind furnishes and comes up with an (honest and straightforward,
or crooked and devious) plan or method, giving the spirit impetus, and the
body becomes the implement or the tool or the apparatus by which the
laudable or deplorable objectives wrought out by the spirit are sought to be
fulfilled. Thus the mind acts as the bridge connecting the invisible world of
the spirit to the visible world of the body. Being clever, shrewd and wise,
and working for others' spiritual, mental and physical upliftment, and being
crafty, cunning and deceptive, conspiring and plotting others' spiritual,
mental and physical downfall, are opposite ulterior motives. This is clearly
a point which I cannot stress enough, since I keep harping on it over and
over again.

Practically, problem solving is much like agriculture. The first step is
plowing or tilling the soil of our mind. Next is planting the seed of our good
idea in that fertile soil, and diligently caring for, nurturing and tending to
that seed, letting it gestate, germinate and grow to bear delicious,
nutritious, sumptuous fruit; finally the bird of our imagination eats of that
fruit, grows wings, takes off, soars into the heavenly heights and gains
its freedom. If there is anything my own efforts have convinced me of
beyond a shadow of doubt, it is that an ounce of healthy, liberating, fertile
and vivid imagination can accomplish infinitely more than tons and tons of
sickly, debilitating, morbid and sterile proof to the contrary. I say, dare to
dream or die trying.

Having said that, fair warning is always in order. The idea to make
money is never a good idea. If one chooses to be loyal and faithful to
one's natural desire to be virtuous, to abide by one's own moral and
ideological precepts, principles and values, which is the seed of virtue
sown in one's heart, one gets to eat the fruit of that seed, and one gains in
one's ability and in one's options and in one's opportunity to be more
virtuous, and at the same time one loses in one's ability and in one's
options and in one's opportunity to be wealthy. If, on the other hand, one
chooses to be loyal and faithful to the desire to be wealthy, which is the
seed of wealth sown in one's heart, then one gets to eat the fruit of that
seed, and one gains in one's ability and in one's options and in one's
opportunity to be more wealthy, and at the same time one loses in one's
ability and in one's options and in one's opportunity to be virtuous. Thus,
the ideal which one chooses to espouse, be it the ideal to be wealthy or the
ideal to be virtuous, will shape, form, build and develop one's character,
out of which will spring forth one's actions in thought, in word and in
deed, which will eventually lead to one's captivity and enslavement, or to
one's emancipation and freedom, as the case may be. After all, freedom is
the buzzword and the catchphrase of this modern era.

We are well acquainted with the fact that every system, be it a natural
system or an artificial one, has a life cycle, and we like to find solace by
contemplating this life cycle and reminding ourselves of how everything
must return to whence it came, and everyone must return to whence they
came. But what is the point of a cycle if nothing and no one is either the
better or the worse off for taking it? Even a simple current which flows
through a coil increases or decreases in at least one aspect of it, if not
more. When any system is revamped and overhauled, the new system that
is put in place is either better or worse than the existing one. At the end of
a software development life cycle, the resulting software will be either
better or worse than its preceding version. Thus life is really not a cycle
but a spiral, where if one chooses to be virtuous, one rapidly ascends the
spiral and one's life becomes blessed, but if one chooses to be wealthy,
one just as rapidly, if not more so, descends the spiral and one's life
becomes cursed. What, if I may be bold enough to ask, is the shape of our
DNA, if not a spiral in the form of a double helix, as the scientific pictures
and photos of it clearly depict it to be?

At the same time, I do not want to falsely downplay the benefits and the
value of adopting a systematic, theoretically sound approach either. At the
risk of being accused of being a utilitarian, I tend to regard the truth as an
abstract theoretical principle that must be consciously, correctly and
concertedly applied and put into practice by the individual, in order to solve
concrete practical problems for society and for the world at large. A true
scientist, or any individual who harbors a scientific temperament, whenever
his sense of inquisitiveness and wonder is aroused, never snuffs out that
flame; rather he feeds it and fuels it till it becomes a conflagration and
all of his ignorance and his doubts are dispelled, burnt, consumed and
reduced to ashes.

Fair warning is always in order here as well. There are three aspects to
every principle. At its heart and in its core, every principle is moral and
spiritual in its nature, its character and its constitution; next there is the
scientific or mental aspect of it; and finally there is the physical or material
aspect of it. I reiterate: the abstract, invisible ideals and principles that a
person consciously, deliberately, lovingly and of his own free will chooses
to serve, by believing in them, by valuing them, by cherishing them, by
deifying them, and by practicing them, by always striving to stand for, to
live by, and to uphold them, always shape and mold the concrete true
internal character as well as the true external conduct and behavior of that
person, and the true armor of an individual is always the moral fiber and
the moral fabric which that individual makes for himself. Therefore it is
always in the interest of both the individual and the institution to which the
individual belongs that the loyalty and the love of the individual should
first and foremost be to the underlying abstract principle, and only
subsequently to other individuals, who are supposedly diligent and wary to
apply these moral and spiritual, mental and scientific, and physical and
material principles in practice, always to help others and to be of service to
others, never to make money out of it, thus being an embodiment and a
personification of those principles; but never injudiciously and
indiscriminately to every individual, including those who do not care for
them, who flagrantly disregard them, or who misuse them by misapplying
them. Only the individual or the institution may forsake the truth; the truth
never forsakes either the individual or the institution, but is always loyal
and always faithful to every individual and to every institution. Everything
good exists to serve a noble and worthy purpose, and must be used for
good. It does not exist to serve an ignoble and unworthy purpose and be
used for evil. There is always a higher calling, there is always a loftier
ideal, there is always a greater purpose, there is always a grander design,
there is always a better deal.

I must hasten to point out that there is one major pitfall to the
mathematical method as it is currently practiced, which is that mathematical
symbolism is often regarded as being removed from reality, totally
detached, distinct from, even distant from it, rather than as a fairly accurate
expression of it. Even if the expression is not the most perfect and the most
definitive one, but only a symbolic and fairly limited expression, not
encompassing all aspects of reality, I nevertheless think it is still a very
valid representation, and a pertinent description, of it. This is exemplified
in the large number of mathematical derivations that we carry out which
only require manipulating the forms of these expressions without regard to
their actual meaning. In other words, as far as mathematical proofs go, we
merely note the changes in form between the given expression and the
desired expression, and blindly apply the relevant rules or postulates which
can bring about this change in form, while choosing to remain oblivious to
what those forms really mean. In a typical problem-solving scenario, the
strategy adopted involves performing the following steps:
1. Identify the known and the unknown quantities of the problem,
together with those quantities which may be tacitly assumed and
name them.
2. Express the relationship between the known and unknown
quantities mathematically, using conventional mathematical
notation.
3. Manipulate these expressions, according to and by applying the
well defined rules and postulates of the mathematical system, in
an effort to isolate the known from the unknown and express the
unknown in terms of the known.
4. Simplify and solve for the unknown, by substituting for the known
and evaluating.

Whilst this serves us well to an extent, particularly when we have an
arsenal of standard forms for substitution, I deem this a very bad practice
in general, mainly because every expression always has both a superficial
or shallow meaning and a deeper, more profound meaning, both of which it
always conveys and exhibits, and which we inadvertently tend to be blind
to and disregard, or deliberately and blissfully choose to ignore, either way
unfortunately failing to take them into account. Not to mention the many
obvious possibilities these forms very loudly and glaringly suggest, and the
comparatively fewer hidden possibilities they ever so silently and ever so
gently nudge and prod us with, point toward, and hint at. All design must
be undertaken with deliberate intentionality and with a clearly defined
purpose. A formal
theoretical approach immensely helps in this regard. As a case in point
consider the matter of compression coding itself, one minor form or branch
of which, namely hashing, is what is dwelt upon in this book. But that
aside, if we merely regard the problem or objective of (meaningless or
dumb) compression coding in its general form, which is to code a larger
number as a smaller number, we start off with the conventional
representation of a number with respect to a chosen base b, which as per
our usual understanding is
∑ d[i]·W[i]   where   0 ≤ d[i] < b   ∧   W[i+1] = W[i]·b = b^(i+1)

Here the successive smaller weights always decrease by a constant factor
b each time. But what if we wish to reduce the weights by a lesser constant
factor, say lg(b) = log2(b)? In other words, we need W[i+1] = W[i]·lg(b).
This means we have to define a function f(d) such that

W · ( f(d+1) − f(d) ) = W / lg(b) ,   assuming W[i+1] = W

which clearly yields

f(d) = d / lg(b)
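As a quick numerical sanity check of this derivation, the sketch below (my own, with an arbitrarily chosen base and weight, not values from the book) verifies that f(d) = d/lg(b) does satisfy the defining relation above:

```python
import math

# Hypothetical sample values; any base b > 1 and weight W would do.
b = 16
W = 1000.0

def f(d: float) -> float:
    """The coding function derived above: f(d) = d / lg(b)."""
    return d / math.log2(b)

# The defining relation W * (f(d+1) - f(d)) = W / lg(b) holds for every d.
for d in range(b):
    assert abs(W * (f(d + 1) - f(d)) - W / math.log2(b)) < 1e-9
```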

Other relations which are far more effective, potent and far more useful,
where for instance the reduction factor is variable, can always be defined
and formulated. Not to mention that the simple technique of pulsing,
described later, can be employed to good effect.

Clearly this means there is no active and conscious use and application
of our intelligence and our faculties; our abilities are not challenged and
put to good use, nor are our reasoning capabilities sharpened and made
more acute, more responsive and more vigorous, nor do we even become
aware of the enormous number of possibilities and potentialities that do
exist within these forms, since our active involvement and participation
and our presence of mind in the derivation process itself is virtually nil or
completely absent. Any faculty which is not used but misused only
becomes dull and rusty, and any organ or body part that is not used but
abused only becomes weak and infirm.
Not to make much of it, or to sound like one defending and favoring
numbers over people's sensitivities and sensibilities, but the low opinion
and contemptuous view that people have of numbers is reflected and
evident in the flippant and derisive remarks they often make about
numbers, such as, "Numbers do not really mean anything", or "Numbers
can be used to prove anything", which tend to devalue and discredit
numbers as unreal, illusory objects conjured up and concocted to
represent equally fictitious facts and figures. The plain fact of the matter
is that numbers never lie, but people do, for their own ungainly and
sordid purposes, and in order to do so convincingly they try to use
numbers to support and substantiate their own often exaggerated,
malefactory and false claims and arguments. No one ever doubts the
length and the time of the day, or their own bank balance, or their own
life expectancy, or their own age, or their own height, or their own weight,
even though these latter personal measurements may be subject to change
over time. The reality is, countries do prefer a democratically elected form
of government, and people do value the results of an opinion poll on their
favorite product or on their favorite personality.

Unfortunately, this lackadaisical, slipshod, often sly, slick, underhand,
glib, seditious, working-at-cross-purposes, mutually destructive manner is
how we conduct our day-to-day affairs and go about our daily business.
This attitude has permeated the core of our nation's being and corrupted
our national ethos. Our confidence and our trust and our good faith are
never in virtue but only in wealth. We perceive money as the solution and
the answer to all of our problems, when in fact the opposite is true. Money
is always the problem and morals are always the solution. Neither is there
ever a surplus in wealth, nor is there ever a deficit in virtue.
HASH TABLE MANAGEMENT

PRELIMINARIES

The most common use of the word “Hashing” is in the context of a Hash
table, which is a data structure providing speedy, to the point of near instant
access to data based on a key value, and we begin with that. Everyone who
is familiar with programming or has a background in computer science
knows that there are two principal ways of locating a piece of information
in memory, the computed address method which constitutes hashing, and
the linked method, which is by and large trees.

There are a couple of seemingly true affirmations made about Hashing
that can safely be dismissed as misconceptions, or at best myths. They are:

1. That its worst case behavior is in sharp contrast to its best case
behavior. As we shall see, this is not true. Its worst case behavior closely
shadows its best case behavior.

2. That Hashing cannot preserve the sorted order of key values. Rather
Hashing achieves its performance by scrambling key values alone. As we
shall see, there is no reason why Hashing cannot be used to maintain data in
a sorted order with negligible(not noticeable) loss in its performance when
compared with its scrambled counterpart.

Moreover, as we shall see, natural order is never an artificially imposed and
externally enforced rule, and is always concealed, coded and contained in
seeming disorder.
In most practical cases, the size of the key universe far exceeds the size
of the hash table by an inordinate factor, so only an insignificant subset of
the key universe is actually stored in the Hash table. Owing to a seemingly
infinite number of key distributions, this almost always results in conflicts
or collisions, which must be skillfully and deftly handled.

Techniques for countering collisions fall under one of these two
categories:

1. Techniques which minimize the collisions, if not totally eliminate
them. This leads to the notion of Perfect Hashing.

2. Techniques which resolve collisions by Open Addressing (chaining)
or by resorting to Re-Hashing, both of which are covered in the ensuing
exposition.

It must be emphasized that this investigation is only concerned with
External Hashing, that is, Hash tables stored in a file on the hard disk
drive.

Before proceeding further, the basic concept of a Hash function is
introduced.

A Hash function merely maps each key value in the finite key universe
to exactly one of all the possible positions in the Hash table, not just in a
deterministic way but also in a fairly distinctive way. This means that for
any two keys K1 and K2,

1. K1 = K2 ⇒ Hash(K1) = Hash(K2)

2. K1 ≠ K2 ⇒ Hash(K1) ≠ Hash(K2) with a high probability, for
any two key values belonging to a random selection of key values. This is
possible because not all binary combinations have the same probability of
occurring. Key values of different lengths have different statistical
probabilities of being selected, the probabilities increasing with increasing
length. This second requirement can be reasonably met, as the section on
Perfect Hashing will demonstrate.
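These two properties can be illustrated with the simplest of hash functions, a plain MOD hash. This is only an illustrative sketch of mine; the table size is an arbitrary choice, not one from the book:

```python
TABLE_SIZE = 11  # an arbitrary small table size for the illustration

def hash_fn(key: int) -> int:
    """Map a key deterministically to one of TABLE_SIZE positions."""
    return key % TABLE_SIZE

# Property 1: equal keys always hash to the same position.
assert hash_fn(42) == hash_fn(42)

# Property 2 holds only with high probability: distinct keys usually
# hash apart, but collisions are unavoidable once a class holds more
# than one key; any two keys differing by TABLE_SIZE must collide.
assert hash_fn(5) != hash_fn(6)
assert hash_fn(3) == hash_fn(3 + TABLE_SIZE)
```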

Most hash functions distribute the key values among the different
positions of the Hash table uniformly, although when the expected
distribution of keys is either completely known or can be reasonably
predicted beforehand, custom Hash functions that distribute the key values
non-uniformly among the different positions of the Hash table can also be
defined.
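As a sketch of such a custom non-uniform function, the following is my own invented example (the key ranges and slot counts are hypothetical): if keys are expected to cluster in a small region of the universe, that region can be allotted a larger share of the table positions.

```python
TABLE_SIZE = 8

def custom_hash(key: int) -> int:
    """Non-uniform hash for a key distribution assumed to cluster
    below 100: the crowded region gets 6 of the 8 slots, while the
    sparse remainder of the key universe shares the last 2 slots."""
    if key < 100:
        return key % 6        # slots 0..5 for the crowded region
    return 6 + key % 2        # slots 6..7 for the sparse region

# Every key still maps to exactly one valid table position.
assert all(0 <= custom_hash(k) < TABLE_SIZE for k in range(1000))
```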

A Hash function always partitions the key universe into subsets (classes)
equal in number to the size of the Hash table, with every key value from
the key universe belonging to exactly one class. All the key values belonging to
a class are equivalent as far as the Hash function is concerned. All classes
will have the same number of elements if the Hash function is uniform. As
one can imagine even the size of a class itself will be very large and the
probability of a collision occurring is almost a dead certainty. However this
fact is not to the detriment of Hashing, since the objective in Hashing is
never really to avoid the problem of collisions altogether but to minimize
the damage caused by them by fairly uniformly distributing the collisions
across the entire Hash table, for the vast majority of random distributions.

The general idea behind Hashing is always to divide the entire length of
the key into parts (usually equal), and either to use these parts selectively
and individually, or to use several of them collectively by combining them
in some fashion. Regular hash functions that do not chop or churn
normally tend to divide the key into two parts, a quotient and a remainder,
and can be classified as those that use the BY operator to determine the
quotient or those that rely on the MOD operator to obtain the remainder,
both relative to the Hash table size. Functions which maintain the relative
sorted order of key values may be called gather hash functions, while
functions which scramble the relative sorted order may be called scatter
hash functions.

As one might guess, the quotient has a narrower spread of values
compared to the remainder for most distributions of key values. Hence
Hashing using the BY operator, even though it preserves the sorted order
of the key values, usually results in more collisions and a smaller fill
factor or table usage than Hashing using the MOD operator, which
results in fewer collisions and hence better table usage.
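The contrast between the two operator families can be seen in a few lines. Here BY is taken to mean integer division by the table size, as described above, and the sample keys are my own:

```python
TABLE_SIZE = 10

def hash_by(key: int) -> int:
    """Gather hash: the quotient is monotone in the key, so the
    relative sorted order of key values is preserved."""
    return key // TABLE_SIZE

def hash_mod(key: int) -> int:
    """Scatter hash: the remainder scrambles the relative order."""
    return key % TABLE_SIZE

keys = [3, 17, 42, 58, 73, 96]          # already in sorted order

by_vals = [hash_by(k) for k in keys]    # [0, 1, 4, 5, 7, 9]
assert by_vals == sorted(by_vals)       # order preserved

mod_vals = [hash_mod(k) for k in keys]  # [3, 7, 2, 8, 3, 6]
assert mod_vals != sorted(mod_vals)     # order scrambled
```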

An interesting observation or discovery that one makes early on in
scrambled hashing is that, irrespective of either the size of the key universe
or the size of the Hash table, for any random selection of key values only
about half the locations in the Hash table are filled up, and the rest are all
collisions. There is a simple reason for that. Let M be the size of the
key universe and let S be the size of the Hash table. Assuming that S is not
too small, the hash function clearly associates each of the S positions with
one of S non-intersecting parts of M. These parts may or may not be equal;
under the assumption that they are equal, which is reasonable as
substantiated later, M is divided into S equal parts. Of all the terms in the
following sequence, each of which represents a possible distribution to
which a random selection of S keys out of M could be mapped,

C(S,1)·P(S,1)·1! ,  C(S,2)·P(S,2)·2! ,  … ,  C(S,S/2)·P(S,S/2)·(S/2)! ,  … ,  C(S,S)·P(S,S)·S!
the value of the middle term, accompanied by a few neighboring ones,
will always be dominant, even if this sequence is not exactly symmetric
about the middle term. The middle binomial coefficient C(S, S/2) is very
close to 2^S, which will be considerably large even for modest values of
S, and often larger than the size of the key
universe itself. Perhaps this also explains why the average number of
look-ups needed to find an element in a Hash table, be it by chaining or by
re-hashing, is always close to 2, the other factor in the middle term. It
would have to be so, if the assumption made here in the first place, that the
number of uniform distributions will clearly outnumber and outweigh the
number of non-uniform distributions, is in fact reasonable. That it is
reasonable can be confirmed by expansion. The general form of a term of
the above sequence, for any distribution, is

C(S,K) · P(S,K) · K!   for K = 1, 2, …, S

where C(S,K) is the binomial coefficient and P(S,K) is the number of
partitions of S into K parts. Clearly this general term has its maximum
value for K = S/2. Now, P(S,K) itself can in turn be expressed as the
summation

P(S,K) = ∑ P(S−K, l)   over l = 1, 2, …, K

The largest term in this summation will correspond to the case l = K/2.
In general, if we assume K = S/c, where c is an arbitrarily chosen
constant, then the mean of the distribution will always be
( (S/(2·c))·1 + (S/(2·c))·2·(c−1) ) / (S/c)  =  (2·c − 1)/2  ≈  c
which makes the initial assumption very reasonable.
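The observation about fill factor can also be probed empirically. The simulation below (my own sketch, with an arbitrary seed, table size and trial count) hashes S uniformly random keys into a table of size S and measures the fraction of occupied slots; it comes out a little over half, around 1 − 1/e ≈ 0.63, with the remaining insertions landing as collisions:

```python
import random

def fill_fraction(table_size: int, trials: int = 200) -> float:
    """Hash `table_size` random keys into `table_size` slots and
    return the average fraction of slots that end up occupied."""
    rng = random.Random(1)  # fixed seed so the experiment repeats
    total = 0
    for _ in range(trials):
        occupied = {rng.randrange(10**9) % table_size
                    for _ in range(table_size)}
        total += len(occupied)
    return total / (trials * table_size)

frac = fill_fraction(1000)
assert 0.55 < frac < 0.70   # a little over half the slots fill up
```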
RE-HASHING

The idea behind Re-Hashing is exactly what that word suggests: to try
and to try again, but never endlessly, until one eventually succeeds, or one
runs out of tries and is forced to give up and condemned to admit failure.
The downside to Re-Hashing is that the chances of repeated success
drastically decrease with increasing success, since the number of positions
in the Hash table that are left to occupy decreases with each success, and
these remaining vacant positions tend to be randomly scattered all across
the Hash table, while the maximum number of tries allowed for any given
key value is always fixed and always the same. It is somewhat like trying
to climb the professional tree in the real world, although in this case in an
inverted sense, because the chances of future professional success only
increase the higher one climbs, since the number of contenders for the top
positions drastically decreases, it becomes easier, and one's prospects
only become brighter. But this is just a quantitative assessment.
Qualitatively, there is both healthy and unhealthy competition. Whereas
competing to be virtuous brings out the best in people, competing to be
wealthy, since it is motivated by greed and by selfishness, always brings
out the worst in them; it therefore promotes rivalry and unjust comparison
between them and does not recognize or improve upon their own God-given
innate natural skill and ability. Pitting people against each other is wrong.
Besides, the higher one climbs, the tougher the competition gets, and
neither is the superior too eager to relinquish his seat to an aspiring yet
unworthy and usurping subordinate, unless he himself is moving on further
up the hierarchy and is not being ignominiously cast down, and most likely
also out. This is clearly the wrong way to go. Survival of the fittest might
be alright for the primitive and the uncivilized, but not for the evolved. The
natural course is always the best course. Everything must take its own time
and everyone must have their own way. The only person one has to be like,
or has to be better than, is oneself.

The definition of a Re-Hash function is similar to that of a Hash
function. Instead of associating a single randomly chosen position of the
Hash table with each key value, a Re-Hash function merely associates a
very small finite subset of positions with each key value. Only, the subsets
are such that for any given random selection of keys, the number of
intersecting positions is always fairly minimal and always fairly
uniformly(equally) distributed.

The general Re-Hash strategy adopted here for the i-th Re-Hash is
simply,

1. Transform the key i times, or else based on i.

2. Take MOD Hash table size to obtain the Hash value.

An interesting Re-Hash function which does very well, as borne out
upon testing it, is

H_i = K ⋅ (i + 2) / (i + 1)
This is because, as the Hash table becomes increasingly filled up, the
number of near misses will matter more than the number of nowhere-near
misses. So every time a location is guessed, it might be a good idea to
inspect close-by pseudo-random locations as well, to increase the chances of
a hit. This observation is the inspiration for the design of this Re-Hash
function.

I now direct the reader to the Code Listing and Test Results sections in
the Appendix to see the performance of the rehash function rehash0.
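A minimal sketch of this strategy in Python (a hypothetical rendering; the appendix's rehash0 may differ in detail). The i-th try transforms the key by the factor (i+2)/(i+1) and takes MOD the table size:

```python
def rehash(key, i, size):
    # i-th re-hash value: H_i = K * (i + 2) / (i + 1), then MOD table size
    return (key * (i + 2) // (i + 1)) % size

def insert(table, key, max_tries):
    # probe up to max_tries positions; return the index used, or -1 on failure
    for i in range(1, max_tries + 1):
        h = rehash(key, i, len(table))
        if table[h] is None or table[h] == key:
            table[h] = key
            return h
    return -1
```

Note how successive tries drift toward K itself, so later probes cluster near one region of the table, matching the near-miss intuition above.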

A better strategy, which should be more or less perfect, would be looping
the loop. This means the rehash function incorporates both behaviors, that of
clustering and dispersion, by gradually increasing the stretch (part of the
table being searched) by increasing the number of nearby searches within
that stretch for increasingly larger sized, non-overlapping stretches
corresponding to i , and the order in which these stretches or parts are
accessed could depend on the value of the key itself. You could try defining
such a hash function as an exercise.

This is for single Re-Hashing, but pairing provides significantly better
results. In pairing, if K is the key value, H_i is the i-th Hash value
corresponding to it, and S is the Hash table size, K is placed in one of
either of the two buckets, H_i or S – H_i – 1, whichever contains the fewer
number of keys. Although, if the limit of tries is t, this requires examining
2⋅t Buckets in the worst case while searching, this is an acceptable
overhead, considering that the majority of the cases will be best cases,
the average cases will be comparatively fewer and the worst cases will be
fewer still.

While inserting, if the pair of candidate buckets is full, the first bucket is
scanned for the key which Hashes in exactly one Re-Hash by pairing, to the
Bucket with the most room, and if one is found, that key is transferred to its
next bucket. The key being newly inserted takes the former position of the
relocated key in the scanned Bucket. If scan fails to find a Bucket with
room, the key being newly inserted is swapped with the key with the least
number of Re-Hashes in the scanned Bucket.

I now direct the reader to the Code Listing and Test Results sections in
the Appendix to see the performance of one rehash function rehash1,
which incorporates the above idea.
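The bucket-pairing idea can be sketched as follows, using the H_i formula given earlier in this section (the scan-and-relocate step performed when a pair is full is omitted here for brevity, so this is only a partial sketch of rehash1's behavior):

```python
def pair(key, i, size):
    # i-th candidate pair of buckets: H_i and its mirror S - H_i - 1
    h = (key * (i + 2) // (i + 1)) % size
    return h, size - h - 1

def insert_paired(table, key, max_tries, capacity=4):
    # table is a list of buckets (lists); place the key in whichever
    # bucket of each candidate pair currently holds fewer keys
    for i in range(1, max_tries + 1):
        a, b = pair(key, i, len(table))
        bucket = table[a] if len(table[a]) <= len(table[b]) else table[b]
        if len(bucket) < capacity:
            bucket.append(key)
            return True
    return False  # the text's scan-and-relocate step would go here
```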

The performance of Re-Hashing with regard to search and insertion
time, measured in tries to find a possible Bucket, is always minimal and fairly
constant for a hit, but always the artificially imposed limit for a miss.
OPEN ADDRESSING

Open Addressing is also called synonym chaining, since for all
theoretical and practical purposes, colliding keys can be treated as
equivalent, or synonyms. Coming directly to the point, this section focuses
on both ordered and unordered Perfect hashing, which drastically minimizes
the number of collisions for nearly all the most likely random
distributions.

If Re-Hashing constitutes the worst of Hashing, its most ugly side, though
only when the table approaches being filled to near capacity, Perfect
Hashing constitutes the best of Hashing, its most beautiful side. As far as
my efforts have led me to conclude, and convinced me to believe, even its
ugly side is not too bad looking or unshapely at all. I have no reservation in
admitting that it was my attempt to fill the Hash table by pairing positions
during Re-Hashing that first prompted me and later instinctively
compelled me to come up with the unordered Perfect “Flip Hash” function.
UNORDERED PERFECT HASHING

The method of defining Perfect Hash functions examined here is based
on the observation that for any two key values to be unequal, one of either
their respective quotients or their respective remainders (again relative to
the Hash table size) or both must be unequal.

The one desirable property a Hash function should possess is that it
distribute any random selection (distribution) of keys fairly evenly among
the different positions of the Hash table. Hashing using the MOD operator
is common and performs well in most cases, and provides a good platform
and a good starting point for one looking to build a Perfect Hash function.
So this discussion focuses its attention on, and restricts itself to, Hash
functions based on that operator alone.

Let M be the size of the key universe and S be the Hash table size.
Further, let

D = M/S, R = M MOD S

It is reasonable to expect the Hash function to map S to different
permutations of S depending on the value of the quotient D , so the
remainder R is mapped to different positions in the Hash table for different
values of the quotient D . Ideally for a given value of the remainder R , the
Hash function must associate D different values of M which are
unlikely to occur in a cluster and be chosen together in any random
distribution of M . A Hash function which behaves in this manner enables
one to fix a practical and a reasonable upper limit on the Bucket size of the
Hash table, specific to that Hash function. Although defining such a Perfect
Hash function may seem like a theoretical impossibility yet it is practically
achievable for a majority of distributions.

At this point I refer the reader to the code listing and the test results
sections of the appendix to see the performance of one recursive function
the “Flip Hash”, which performs admirably for most random distributions.

Emboldened and encouraged by the results of this function, one is
tempted to figure out other possible permutations for defining Perfect Hash
functions. Thinking along similar lines, since we need to solve the problem
for a permutation set of size S, suppose we assume we have already solved
the problem for a permutation set of size S/2; then by analogous reasoning
we have clearly also solved the problem for a permutation set of size S/4.
So we could say the solution to the problem of a permutation set of size S,
for one value of its bit in the quotient, is the same as its solution for the two
permutation sets of size S/2 corresponding to two of the four possible
values of the two respective bits for them in the quotient, but with the two
permutation sets of size S/2 interchanged (reversed) for the other bit value
in the quotient for the permutation set of size S.

Let A and B be permutation sets of size S/4, corresponding to the two
possible values of its bit in the quotient respectively; let C and D be
permutation sets of size S/2, again corresponding to the two possible
values of its bit in the quotient respectively; and finally let E and F be
permutation sets of size S, also corresponding to the two possible values of
its bit in the quotient respectively. Then we have

C = (A, B)
D = (B, A)

Likewise,

E = (C, D)
F = (D, C)
For the summary listing of results of the performance of this Hash
function, called “Mirror Hash”, the reader is again referred to the code
listing and test results sections of the Appendix. Its performance is similar
to that of “Flip Hash”.
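As a purely speculative reconstruction (the appendix's “Mirror Hash” may well differ), the scheme E = (C, D), F = (D, C) can be realized recursively: at each level one bit of the quotient decides whether the two half-size permutations are used in order or swapped, assuming the table size S is a power of two:

```python
def mirror_hash(q, r, s):
    # q: quotient, r: remainder (0 <= r < s), s: power-of-two table size.
    # One bit of q per level selects the order of the two half permutations.
    if s == 1:
        return 0
    half = s // 2
    bit = q & 1
    if r < half:
        sub = mirror_hash(q >> 1, r, half)
        return sub if bit == 0 else half + sub
    sub = mirror_hash(q >> 1, r - half, half)
    return half + sub if bit == 0 else sub
```

For every fixed quotient q this maps the S remainders to a permutation of 0 … S−1, which is the defining property asked for above.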

The first function was inspired, the second function contrived. I call hash
functions which generate permutations of the remainder based on the bit
pattern of the quotient “one way or the other” Hash functions.

One can try and figure out how the functions “flip hash” and “mirror
hash” themselves work. They always flip or mirror the value of part of the
remainder R based on both the number of ones in the value of the quotient
Q as well as the position of each one in the quotient Q, so the number of
distinct positions assigned to a single value of the quotient Q will always
depend on the exact value of the quotient Q itself, spread differently over
all the S possible values of the remainder R for different values of the
quotient Q. All these functions are doing is assigning a single value of the
remainder R to one of the S different possible positions of the table,
depending on the value of the quotient Q. Clearly, this means that a single
value of the remainder R will be spread differently over the S positions for
values of the quotient Q with fewer ones in them than for values of
the quotient Q with more ones in them.

It may be worthwhile to analyze and to find out the probability that two
keys which differ in exactly b bits result in the same Hash value when such
a Perfect Hash function is applied to them, but such an analysis is not
undertaken here.

This suggests that one can adopt a probabilistic approach to defining
Perfect Hash functions. There is a section containing a brief overview of
finite probability towards the end of this book. The variation in the bit
pattern of all the numbers of a fixed length is always the same, so a definite
probability of each number being chosen can always be calculated for a
finite universe. This fact can most certainly be brought to bear when
defining a Perfect Hash function, not to mention for probabilistic numerical
encoding, where numbers with a larger probability of occurring are
represented by sequences of shorter length and numbers with a smaller
probability of occurring are represented by sequences of longer length, in
other words the length of the encoded number varies inversely with the
probability of occurrence of its unencoded counterpart. The probability of a
key value being selected is always dependent on its exact value. Roughly,
though not precisely, if the length of the key is l, the probability of
selecting any key value which has b ones in it will be (l choose b)/2^l,
which is the same as (l choose l−b)/2^l, the probability of selecting any
of all their bitwise complements. So there is reasonable symmetry in the
distribution of probabilities among the values, but only when seen in this
broad light.
When we get into finer detail, there may not be an obvious symmetry in the
probabilities of different values being selected. For instance, the probability
of choosing exactly one value out of those (l choose b) values will only be

((l choose b)/2^l) × (1/(l choose b)) = 1/2^l

which is uniform across the board and does not present an accurate picture.
Now the numerator on the LHS of the above equality can be expanded
using the combinatorial identity

(l choose b) = (l−1 choose b) + (l−1 choose b−1)

So the probability on the LHS can be split into two terms as

(l−1 choose b)/(2^l ⋅ (l choose b)) + (l−1 choose b−1)/(2^l ⋅ (l choose b))
The first term represents the probability of selecting any value of length
l whose remaining l−1 bits have b ones in them, from within the set of
(l choose b) values of length l. Likewise, the second term represents the
probability of selecting any value of length l whose remaining l−1 bits
have b−1 ones in them, from within the set of (l choose b) values of
length l.

The probability of selecting exactly one value from exactly one of these
two sets is, respectively, one of

(l−1 choose b)/(2^l ⋅ (l choose b) ⋅ (l−1 choose b))

(l−1 choose b−1)/(2^l ⋅ (l choose b) ⋅ (l−1 choose b−1))

The same process of splitting the numerator using the combinatorial identity
must now in turn be recursively applied to each of these two probabilities.

If the length of a value v is l, and it has b ones in it, one can calculate its
probability p of being selected as under,

ONES(v, l)
1) c ← 0
2) FOR i = 0 … l−1 STEP 1
   1) IF (v MOD 2 = 1) THEN
      1) c ← c + 1
   2) v ← v/2
3) RETURN c

PROB(v, l, b)
1) C ← (l choose b)
2) D ← 1/C
3) m ← l
4) WHILE (b > 0)
   1) IF (⌊v/2^(l−1)⌋ MOD 2 = 0) THEN
      1) C ← C ⋅ (l − b)/l
   2) ELSE
      1) C ← C ⋅ b/l
      2) b ← b − 1
   3) l ← l − 1
   4) D ← D/C
5) p ← D/2^m
6) RETURN p
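A direct Python transcription of the two routines above (math.comb supplies the binomial coefficient; beyond that the code follows the pseudocode step for step, and assumes v really does have exactly b ones among its l bits):

```python
from math import comb

def ones(v, l):
    # count the one bits among the low l bits of v
    c = 0
    for _ in range(l):
        if v % 2 == 1:
            c += 1
        v //= 2
    return c

def prob(v, l, b):
    # estimated selection probability of the l-bit value v having b ones;
    # scans bits from most significant downward, as in the pseudocode
    C = comb(l, b)
    D = 1 / C
    m = l
    while b > 0:
        if (v >> (l - 1)) % 2 == 0:   # current most significant bit is 0
            C = C * (l - b) / l
        else:                         # bit is 1: one fewer one remains
            C = C * b / l
            b -= 1
        l -= 1
        D = D / C
    return D / 2 ** m
```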

However, an alternative simplistic multiplicative method for reasonably
estimating the probability of a specific value can be employed as well, as
follows,

FIRST(v, l)
1) FOR i = 0 … l−1 STEP 1
   1) IF (v MOD 2 = 1) THEN
      1) BREAK
   2) v ← v/2
2) RETURN i

PROB(v, l, b)
1) IF (b = 0) THEN
   1) p ← 1/2^l
2) ELSE
   1) IF (b = 1) THEN
      1) p ← l/2^(2⋅l − FIRST(v, l))
   2) ELSE
      1) b1 ← ONES(v, b)
      2) v1 ← v/2^b
      3) p ← PROB(v1, l − b, b − b1) ⋅ PROB(v, b, b1)
3) RETURN p

This way the problem only needs to be solved for the base case of b = 1,
which is solved here by assuming that the larger number has a higher
probability of occurring than the smaller one, hence the probabilities halve
with each decreasing bit position. A slightly better estimate of the
probability of the base case can be had by making a uniform correction of
1/(l ⋅ 2^(l+1)) for each of the l probabilities, or again apportioning the net
correction of 1/2^(l+1) in an inverse binary progression among the l
probabilities.

Based on the above two ways of generating probabilities, one can define
a perfect hash function in one of two possible ways,

1. The more complex, exact, theoretical way, which may be a bit more
effective, and

2. The simpler, approximate, practical way, since this involves the
slightest of binary splits, which may be a little less if not just as
effective.
Let l be the length in bits of the key universe M, and let j be the
length, also in bits, of the Hash table size S. The theoretical way to define a
perfect hash function gradually is as follows,

1. Assume that we have already generated all possible key values of
length l = j along with their respective probabilities and have the set
sorted in order of probabilities, and also assume that we have already
assigned corresponding Hash values to all these key values.

2. Generate all the possible key values of length l + 1, along with their
respective probabilities, and sort this set also in order of
probabilities.

3. A single hash value corresponding to a key value of length l has to
be assigned as the hash value corresponding to two values of the key
of length l + 1. This can be done as follows. Assign the hash value of
the least likely value of the key of length l as the hash of the most
likely value of the key of length l + 1, and, since there may not be
symmetry in the probabilities of the key values, also assign the
same hash value of the key value of length l as the hash value
of the next more than least likely key value of length l + 1. In like
manner, assign the hash of the next more than least likely key value
of length l as the hash of the next less than most likely key value of
length l + 1, and also as the hash of the next more than the next more
than the least likely key value of length l + 1, and so on cyclically.

The practical way to define a perfect Hash function is as follows.

1. Let l/j = c. Divide l into j parts of c bits each.

2. Now, clearly each of the 2^c values of a single part of l has
to be assigned to one of the two possible values 0, 1 of a single bit
corresponding to the unit length of j.

3. Generate all the 2^c values, along with their respective probabilities,
and sort the set in order of probabilities.

4. Since both values 0 and 1 have an equal probability of occurring,
assign each of them as the part hash value to one of the
pair of alternative part key values, flipping or reversing their order
each time, thusly 1, 0, 0, 1, 1, 0, 0, 1, …. But this must be done only
to the set of 2^c part key values sorted in order of their respective
probability.

5. Now the hash value for any value of the key is simply formed by
composing or concatenating or juxtaposing the unit hash values
corresponding to the part key values of each of the j parts in that
order.
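The five steps above can be sketched as follows. Since the exact probability estimate is developed earlier in the chapter, this sketch substitutes a simple popcount-based ordering as a stand-in (an assumption, not the book's estimate) just to show the mechanics of the 1, 0, 0, 1 assignment and the final concatenation:

```python
def part_bit_table(c):
    # order the 2**c part values by a crude probability proxy (popcount --
    # an assumption standing in for the book's probability estimates),
    # then assign bits in the repeating order 1, 0, 0, 1, ...
    ordered = sorted(range(2 ** c), key=lambda v: (bin(v).count("1"), v))
    pattern = (1, 0, 0, 1)
    return {v: pattern[i % 4] for i, v in enumerate(ordered)}

def practical_hash(key, l, j):
    # split the l-bit key into j parts of c bits each and concatenate
    # the per-part bits into a j-bit hash value
    c = l // j
    table = part_bit_table(c)
    mask = (1 << c) - 1
    h = 0
    for p in range(j):
        part = (key >> (p * c)) & mask
        h = (h << 1) | table[part]
    return h
```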
ORDERED PERFECT HASHING

This section is concerned largely with a formal approach to defining
ordered hash functions. The intention is to come up with a hash function, or
perhaps even a set, also called a family, of possible hash functions, which
preserve the sorted lexicographic and/or numeric order of the keys, while
yet fairly uniformly distributing the collisions across the hash table, as a
result of which its occupancy or fill factor, using computing terminology,
would be automatically maximized. Suppose we wish to hash a key k < M
to a value h < S where M ≫ S. For the sake of simplicity, without
loss of generality, I assume k to be an integer but take h to be a real
number, although in practice h will also be an integer approximation of
itself. Assuming M and S to be implicit to the function, preliminarily such
a function, along with its more or less exact inverse, might be defined
basically in one of the following two ways, although the possibility of any
number of valid combinations thereof cannot be ruled out,

I. Product

h(k) = (S − 1) ⋅ f(k)

and,

h⁻¹(v) = f⁻¹(v/(S − 1))

II. Power

h(k) = S^f(k) − 1

and,

h⁻¹(v) = f⁻¹(log(1 + v)/log(S))

where,

0 ⩽ f(k) ⩽ 1 ∧ k₁ < k₂ ⇒ f(k₁) ⩽ f(k₂) ∧ f⁻¹(f(k)) = k

always holds good. In practice ⌊h(k)⌋ is what will be used, when it is not too
large to serve as the index to the hash table. Recovering the exact or a
close value of the key back from its integer hash by inversion, although very
rarely a necessity, and even if technically not a total impossibility,
could still prove to be a bit tricky.

Two canonical definitions of f(k) which readily suggest themselves to us
and instantly spring to mind are f(k) = k/M and f(k) = log(k)/log(M).
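For instance, with f(k) = k/M the product form and its approximate inverse become one-liners (a sketch; it assumes S > 1 and 0 ⩽ k < M):

```python
def h_product(k, M, S):
    # ordered hash: h(k) = (S - 1) * f(k) with f(k) = k / M
    return int((S - 1) * (k / M))

def h_inverse(v, M, S):
    # approximate inverse: f^{-1}(v / (S - 1)) with f^{-1}(u) = u * M
    return int(M * (v / (S - 1)))
```

The hash is order preserving: k₁ < k₂ implies h_product(k₁, M, S) ⩽ h_product(k₂, M, S).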

It must be pointed out here that, among a host of other possible
implications in general, at least the following apparent ones, along with their
possible extensions, must be noted, since they prove to be fairly useful for
defining really good hash functions.

We can observe that,

0 ⩽ f(k) ⩽ 1 ∧ n ⩾ 1 ⇒ 0 ⩽ f(k)^n ⩽ 1

not to mention any number of morphs (mutations) of f(k) itself can be
defined, for instance as under,

0 ⩽ f(k) ⩽ 1 ⇒ 0 ⩽ 2 − 2/(1 + f(k)) ⩽ 1

Also since,

0 ⩽ f₁(k) ⩽ 1 ∧ 0 ⩽ f₂(k) ⩽ 1 ⇒ 0 ⩽ (w₁ ⋅ f₁(k) + w₂ ⋅ f₂(k))/(w₁ + w₂) ⩽ 1

which suggests that the desired statistical properties f(k) must possess
for a particular expedient not only matter and come into the picture, but
also play a crucial and a vital role in its formulation.

Neither can one ignore the possibility of defining f(k) by way of
functional composition, since,

0 ⩽ f₁(k) ⩽ 1 ∧ 0 ⩽ f₂(k) ⩽ 1 ⇒ 0 ⩽ f₁(f₂⁻¹(f₁(k))) ⩽ 1

Also, apart from the comparatively fewer number of definitions of f(k)
which lead to a fairly uniform distribution of the keys by h(k), most
definitions of f(k) which maintain the orderedness property will more
often than not possess either one or the other of the following two
characteristics, which, conveniently for us, turn out to be well suited and
immensely helpful for defining really good hash functions by means of
double inversion.

If f(k) satisfies,

f(k + 2⋅d) − f(k + d) > f(k + d) − f(k)

we define h(k) as,

h(k) = (S′ − 1) − h′(M′ − k) if k ⩽ M′

and,

h(k) = S′ + h′(k − M′) if k > M′

but if f(k) satisfies,

f(k + 2⋅d) − f(k + d) < f(k + d) − f(k)

we define h(k) in the opposite manner as,

h(k) = h′(k) if k ⩽ M′

and,

h(k) = S′ + h′(k − M′) if k > M′

where h′(k) is nothing but h(k) itself but defined over M′ = M/2
and S′ = S/2, which also is implicitly assumed. This basic scheme
defines a hash function in a mutually recursive manner.

For one possible implementation of a double inversion hash function, and its
performance, where the number of collisions is practically zero, see the code
listing followed by the test results sections of the Appendix.
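To make the recursion concrete, here is an illustrative integer sketch of the first (convex) case only, with M and S taken as powers of two and h′ obtained by halving both; this is not the appendix's implementation, merely the scheme transcribed:

```python
def h(k, M, S):
    # double-inversion ordered hash, convex case:
    #   h(k) = (S' - 1) - h'(M' - k)  if k <= M'
    #   h(k) = S' + h'(k - M')        if k >  M'
    # with M' = M/2, S' = S/2, recursing until the table degenerates
    if S <= 1 or M <= 1:
        return 0
    Mp, Sp = M // 2, S // 2
    if k <= Mp:
        return (Sp - 1) - h(Mp - k, Mp, Sp)
    return Sp + h(k - Mp, Mp, Sp)
```

By induction h stays within 0 … S−1 and is non-decreasing in k, so the sorted order of the keys is preserved.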
HASH TABLE ORGANIZATION

This section presents several practical methods for breaking up an Open
Addressed Hash table.

Before proceeding, the idea of coalescing must be introduced. If several
logical nodes, like a tree node or a hash table or a hash table bucket, etc. are
stored in one physical node, then the logical nodes are said to be coalesced.
This will result in floating logical nodes, which is the periodic relocation of
the logical nodes among the available physical nodes. It will not be too
expensive, if done right. I assume that such coalescing is done for
managing the hash tables in expandable hashing, and the hash table buckets
in the other organizations.
EXPANDABLE HASHING

Herein I elaborate upon a technique called Expandable Hashing, which is
a general method suitable for storing all sorts of keys including lengthy
string keys, and Piecemeal Hashing, which is an extension of Expandable
Hashing.

Although theoretically the difference between a number and a string is
only one of perspective, and each can be regarded as the other equally well,
for practical reasons it helps to distinguish between short numeric keys
occupying a machine word and longer string keys spanning several bytes.

The method is both simple and elegant at the same time. For example,
consider typical numeric keys of length 32 bits. Assume that a modest byte
sized Open Addressed Hash table exists in a block in the file and further
assume that MOD Hashing is being used, and the lower order 8 bits of the
key value serve as an index onto the Hash table and equivalent (colliding)
keys are chained within that table itself, the way in-memory Open
Addressed Hash tables are maintained. Initially, there is only one table and
a pointer exists in memory to reference that table. When this table becomes
full, it is split into two tables stored in two separate blocks as follows. The
value of the pointer referencing the initial table is set to an integer index
onto an array of two columns present in memory (a Binary Trie), whose two
values are set to reference the two new tables. The value of the lsb of the
key value is now treated as a subscript to one of the two columns of the
array and the next higher order 8 bits of the key value skipping the lsb
serve as the index to exactly one of the two tables depending on the value
of the lsb. All the elements from the initial table are Hashed as above and
transferred to the two new tables. When either of the two tables becomes
full again this process of splitting is repeated on it, except that the bit
adjacent to the lsb now serves as an index onto the Trie and the 8 bits
adjoining it(to the left) serve as an index onto either one of the two split
Hash tables depending on the value of the bit used for splitting and so on
the Trie grows further along the array, and the Hash tables multiply further
along in the file.

[Figure: Before splitting, the in-memory Pointer references the single
Table, indexed by the Key. After splitting, the Pointer references a
two-column Binary Trie whose entries 1 and 0 reference Table 1 and
Table 0 respectively, the Key's lsb selecting between them.]
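The splitting behaviour can be sketched in miniature (Python dicts stand in for the fixed-size on-disk tables, and a deliberately small capacity forces splits; the real scheme indexes each table by the next 8 bits of the key rather than storing keys directly):

```python
def trie_lookup(node, key, bit=0):
    # node is ("table", dict) or ("trie", [child_for_bit0, child_for_bit1])
    kind, payload = node
    if kind == "trie":
        return trie_lookup(payload[(key >> bit) & 1], key, bit + 1)
    return payload.get(key)

def trie_insert(node, key, value, bit=0, capacity=4):
    # returns the (possibly replaced) node, so a full table can split
    kind, payload = node
    if kind == "trie":
        idx = (key >> bit) & 1
        payload[idx] = trie_insert(payload[idx], key, value, bit + 1, capacity)
        return node
    payload[key] = value
    if len(payload) > capacity:       # split on the current bit position
        children = [("table", {}), ("table", {})]
        for k, v in payload.items():
            children[(k >> bit) & 1][1][k] = v
        return ("trie", children)
    return node

root = ("table", {})
for k in range(10):
    root = trie_insert(root, k, k * 10)
```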

This scheme offers lookup of the desired table in just 1 direct block
access, but the array housing the Trie, though significantly smaller than the
entire collection of Hash tables, can still take up precious memory as it
grows. Clearly, we are interested in merely scaling up the size of the on-
disk table while at the same time scaling down the size of the in-memory
one, yet maintaining constant access time. To a reasonable extent, this is
accomplished by transferring a portion of the in-memory Trie itself onto a
block in the file on disk, as another level of Hash table, or a Hash table of
Trie. Now two consecutive lower order 8-bit groups of the key value come
into play, each serving as the index onto one of the two hash tables at the
two successive levels: the lower of the two groups for the first level and the
higher for the second level, assuming that the numbering
of the various levels of a tree is in an inverted sense, with the root being
the first level or topmost level, and subsequent lower levels numbered
incrementally. The first level Hash table has a different organization. Its
structure or format is as follows

[Figure: The first level Hash table before and after splitting. In each
case the block holds a Hash table portion alongside a Trie portion, with
the Key bits 1 and 0 selecting between the split tables.]

When the second level Hash table becomes full it is split, and the Trie in
the first level Hash table grows. When the first level Hash table becomes
full, it is split similar to the way the second level Hash table is split, and
the in-memory Trie grows. Which means, now only the Trie for the first
level will be in memory, considerably reducing memory usage, although
increasing the number of block accesses to 2, i.e. indirect access,
instead of just the 1 for direct access, which is acceptable for most
situations. Similarly one can have a three-tier or a double-indirect access
and so on. So this scheme incorporates both possible combinations of Trie
and Hash in that we have both a Trie of Hash as well as a Hash of Trie.

This form of Binary splitting is adequate, though it can be increased to
more, for instance by setting aside two lower order bits of the key value and
splitting into four tables. However, this would be practical only if the size
of the on-disk table itself is large enough to justify splitting it into four
tables, since the usage of the four tables will only be a fourth of the original
table, assuming that the split distributes the keys in the original table evenly
among the four tables. But tables can always be coalesced with minimal
acceptable wastage, like for a Trie, by similarly distinguishing between a
physical Hash table and a Logical one, with a single Physical Hash table
capable of holding up to four different Logical Hash tables. If a higher
order Trie is used, it might be more efficient for the in-memory Trie node to
be represented as a linked list rather than as an array. Just as for the case of
the Binary Trie, one can have either a direct or any level of indirect access
to the Hash table for the higher order Trie as well. If indirect access is used,
a compromise could be for the first level Hash table or Hash of Trie alone
to be a higher order Trie. This means only the in-memory Trie will be a
higher order Trie and not the on-disk Trie. The on-disk Trie itself will
remain a Binary Trie and the second level Hash table will always be split
into two.

For MOD Hashing we move from rightmost bit (lsb) to the leftmost bit
(msb) of the key value. Similarly for BY Hashing we need to move from
msb to lsb of the key value. The principle is symmetric. Hashing using the
BY operator will preserve the sorted order of the key values, without any
appreciable or discernible loss in efficiency.

The above technique can be extended to Hashing lengthy string keys in a
recursive manner. The idea for hashing strings is well known. We first hash
the string to generate a numeric code of the string and use that Hash code
for referring to a subset of the keys which Hash to the same numeric Hash
code, counting on the Hash function to generate a fair numeric spread for
any given distribution of string keys. Here the entire length of the key is
divided into smaller units or fragments, usually the size of a
number (machine word), and the first fragment is Hashed, and if that
fragment is associated with a unique key its corresponding record reference
is kept, else a reference to a bucket of (key, record reference) pairs is
kept. When that bucket becomes full, the value of the bucket reference is
set to point to another Hash table, and later when that table becomes full, it
is split.

Whereas expandable hashing is equally applicable to Hash tables that do
not preserve the sorted order of keys as well as to those that do, a few
other, better ways of organizing an ordered hash table are also possible.
HASH COMPRESSION

This section can also be treated as an extension of the previous section
on ordered perfect hashing, since it arises naturally as a continuation of it. It
introduces hashing as a technique for lossless coding of information. Since
the argument for compression always presupposes itself, and in trying to
make any such argument, or come up with any such method, we invariably
discover that gain on one side is always offset by loss on the other side, we
mistakenly tend to think that the argument for compression is always
circular. But in my book, that is never any reason for one to be disheartened
and to be discouraged and to give up, since clever non-circular arguments
for compression can always be constructed and come up with. Only on the
surface does lossless compression seem to be an impossibility, but in fact
excellent lossless compression is a stark reality.

One universally applicable technique which can be used to significantly
enhance the efficacy of most of the hashing methods detailed here is that
of “pulsing”, by which I mean the spontaneous lowering of the power and
the raising of it back up again, a technique which unlocks the door to all
sorts of interesting possibilities, each of which in turn leads to the unraveling
and to the unfoldment of many similar as also many dissimilar methods.

Besides, other clever artifices and devices can always be invented and
employed to good effect either independently, or in conjunction with these
basic techniques to enhance and to improve them considerably.
HASHING BY PARTS

A natural way to define h(k), so as to be able to effectively build it
piecemeal, is by apportioning or splitting its domain and its range into
relatively distinct parts and mapping each part of the domain into the
corresponding and respective part of its range, which is basically just
apportioning. This can be done by expressing h(k) either as a product or
as a sum of hash functions defined over smaller domains and their
corresponding smaller ranges. What sort of an abstract algebraic entity,
such as a homomorphism or an isomorphism etc., this
mapping should or could constitute is another interesting and engaging and
engrossing line of thought and inquiry. The definition of h(k) expressed as
a product occurs to us readily and is simply,

h(k) = ∏ᵢ hᵢ(kᵢ)

Two pairs of simple techniques, along similar lines, with varying
emphasis and varying ways of performing the apportioning, are outlined
here.

Suppose we wish to map M → S, and we wish to perform the
coding of M in S in p steps. This means we will have to divide
M into p equal parts of length M^(1/p) each. However, in view of
the fact that M and S are of different lengths, with S being
considerably shorter in length than M, we figure the power to which
S must be raised may or may not be the same as that of M. Let the
tentative corresponding power to which S must be raised be
x where 0 < x < 1. If now M = S^n, we can conjecture that
x will depend on n, though in what way is as yet unknown.

At the end of the first step, the new effective values for M and for
S will be M^((p−1)/p) and S^(x⋅(p−1)) respectively. This means that, for
this mapping to be effective, at the end of p − 1 steps we require,

S^(n/p) / S^x = 1

or,

x = n/p

and x < 1 can always be ensured by taking p > n. Which tells us that
it should be possible to reasonably code M in S, and the argument is
clearly non-circular.

In an almost analogous manner, but with the emphasis shifted from S
to M, in other words the mapping M^x → S^(1/p), the coding can be done,
which yields a value for x as x = 1/(n·p).

Now we look at another way of apportioning which offers better
compression.

Suppose we wish to map M → S, and we intend to do so in p repetitive
steps, or a procedure performed p times over. We fix a reasonable
integer b and determine a real 0 < x < 1 such that,

M^(x·p) = S^(1/p) = b

which yields,

x = log(S^(1/p)) / [p·log(M)]

Now the mapping is carried out as follows,

M^(x·p) → S^(1/p)

and,

M^(1−x·p) → S^((p−1)/p)

Whereas the first mapping is clearly always possible, we need to see
whether the second mapping improves matters and makes the situation
better. In order for it to do so we require,

M^(1−x·p) / S^((p−1)/p) < M / S

which means,

S^(1/p) < M^(log(S)/log(M))

as straightforward substitution and simplification yields, which will
almost always be true; the case when it is not true only represents a
situation which is more advantageous, since it means the value of S is
excessive and we are being wasteful, so we can afford to be more
conservative and sparing in taking the value of S, or otherwise we can
afford to code a larger M for this same value of S.

In order to see how tenable it is, we need to ascertain the efficacy of
this map. Let us try and put forth a general argument which is convincing
and optimistic, even if it does not settle the issue once and for all.
We begin by considering the ratio,

M^(log(S)/log(M)) / S^(1/p)

Proceeding further with the argument and reasoning non-inductively, for
the next step this yields the ratio,

M^(log(S^((p−1)/p))/log(M)) / S^(1/p)

This clearly means, at the end of p − 1 steps we require,

M^(log(S^(1/p))/log(M)) / S^(1/p) = 1

or,

(1/p)·(log(S)/log(M))·log(M) = (1/p)·log(S)

in other words,

1 = 1

which seems to suggest that it should in fact be possible to code a fair
bit of M in S.

Likewise, the coding with the emphasis shifted from M to S can be done
in more or less an analogous manner. Again assuming M = S^n, that the
coding is to be done in p steps, and that 0 < x < 1, the base case
mapping will have to be

M^((p−1)/p) → S'^(x·p) where S' = S·(1−x)/x

and again the reasoning here must be non-inductive, along similar lines
as for the previous technique, which yields x = (p − n)/p, which means
p > n must be true, which can always be ensured.
Another natural way to build h(k) effectively would be to decompose, or
to disjointly split or partition, its domain as
M = M₀ ∪ M₁ ∪ … ∪ Mₙ₋₂ ∪ Mₙ₋₁ as also its range as
S = S₀ ∪ S₁ ∪ … ∪ Sₙ₋₂ ∪ Sₙ₋₁, and to define the hash function as
mapping each disjoint part of its domain to the respective and
corresponding disjoint part of its range, simply as,

h(k) = Σᵢ hᵢ(kᵢ)

As I initially discovered it, one way of recursively partitioning the
domain of the hash function along with its corresponding range gives rise
to a form of coding which might be called replacement coding, which
proves to be one of the principal ways of hashing. Any method of coding
which essentially replaces the more expensive arithmetic operations by
the less expensive ones is essentially nothing but a form of replacement
coding. The following facts regarding the growth of numbers as a result
of performing the common arithmetic operations of addition,
multiplication and exponentiation on them serve as the motivation for
discovering this form of coding. These observations are pretty plain, but
I list them below nonetheless.

• Addition increments lengths. If we add two numbers of length l bits
each, the resulting sum will be a number of length not more than l + 1
bits.
• Multiplication adds lengths. If we multiply two numbers of length l
bits each, the resulting product will be a number of length not more than
2·l bits.
• Exponentiation scales lengths by powers of 2. If we raise one number
of length l bits to the power of another, the resulting power will be a
number of length no more than 2^l·l bits.
As far as replacing multiplication with addition goes, four principal
combinations of such hash maps are possible, namely,

• Base of M → Exponent of S
• Base of M → Base of S
• Exponent of M → Base of S
• Exponent of M → Exponent of S

I only detail and outline basic versions of the first two; the remaining
two can be performed in an analogous manner. You could try doing that as
a mild intellectual drill.

Base of M → Exponent of S

Hash(k)
1) n ← ⌊lg(S)⌋ + 2
2) h ← 0
3) WHILE (n > 0)
   1) l ← lg(M)/lg(n)
   2) d ← ⌊k^(1/l)⌋
   3) h ← h + 2^d − 2
   4) k ← k − d^l
   5) M ← (d+1)^l − d^l
   6) n ← n − d + 2

Only on first look does the second mapping seem impossible; upon closer
inspection and careful examination we gain the necessary insight. Since
the base must be identical, the association or correspondence will only
have to be made between the exponent of M and the exponent of S. Plainly
speaking, in a way we are merely performing hashing on the exponents. So
the question now becomes what form such an association should take so as
to maximize information gain and minimize information loss. It must be
noted that due to the very nature of compression based coding the
relationship between the two just mentioned factors will not be mutually
complementary, since they represent different measures. Whereas
information gain is more of a quantitative measure, which is simply the
compression ratio, information loss, albeit possibly just another ratio,
would be more of a qualitative measure of the variation or the
discrepancy between the original information that was encoded and its
decoded counterpart, since any sort of compression coding, including
hashing, may not always be exactly invertible, in the sense that the
entire and exact original information may not always be absolutely and
completely recoverable and reconstructable from its coded counterpart.

Hash(k)
1) b ← M^(1/lg(S))
2) l ← log_b(S)
3) H ← 0
4) WHILE (l > 0)
   1) b' ← M^(1/l)
   2) l' ← l·log(b')/log(b)
   3) d' ← k^(1/l)
   4) d ← ⌊d'^(log(b)/log(b'))⌋
   5) H ← (H·b) + d
   6) k ← k − d^(l')
   7) M ← (d+1)^(l') − d^(l')
   8) l ← l − 1

Although these solutions are fairly good, we are still not sure if they
are ideal. When initially posed with this problem of ideally defining
h(k) by splitting, the question we eventually end up asking ourselves is,
"If we randomly generate a set of l bit numbers, what discernible
pattern, what typical and predictable behaviour, what characteristic
property might we expect such a randomly generated set of numbers to
exhibit?"

When viewed in the light of the following fairly well known
combinatorial facts, where C(l, b) denotes the binomial coefficient
"l choose b",

i.   b₁ > b₂ ∧ b₁, b₂ ⩽ l/2  ⇒  C(l, b₁) > C(l, b₂)

ii.  C(l, l/2) ≈ 2^l ∧ l₁ > l₂  ⇒  C(l₁, l₁/2) > C(l₂, l₂/2)

iii. C(l, b) = C(l, l − b)

we can safely conjecture that the likely distribution or pattern the
generated random selection of l bit numbers might be expected to follow
will be a combinatorial one, whose shape, by the way, will resemble that
of a recursive normal distribution rather than the conventional normal
distribution. I put forth the following argument to justify this
conclusion. The above facts clearly suggest that the probability of any
l bit randomly generated number lying closer toward the middle
(point/region) of the interval will be considerably greater than the
probability of the value lying closer toward the extreme end
(points/regions) of the interval, and the probability distribution will
be symmetric about this middle (point/region). Now, we divide the
interval of M = 2^l values into two equal halves. If a generated random
number falls in the leading (upper) half we take M/2 away from it,
otherwise we retain it as it is. Either way, we are now left with an
l − 1 bit randomly generated number which lies in an interval of length
M/2, and we are left asking ourselves the same question about this new
random number and coming to the same conclusion about it.

While apportioning M and S one must take into account the above
combinatorial nature of the most likely distribution together with the
considerable difference in the information length and in the code length,
both of which facts clearly suggest two possible ways one might go about
performing this apportioning: either directly on the actual magnitudes of
the domain and the co-domain, |M| and |S|, or indirectly on the
magnitudes of their lengths, |log_b(M)| and |log_b(S)|, relative to a
particular chosen base b, or even a combination of both the direct and
the indirect, which I later discovered as another possibility. A
pictorial illustration (not to scale, since for the sake of convenience
both M and S are shown to be of the same size) of each of these two
possible ways is shown below,
|M| → |S|

Illustration 1: Scheme Direct (the interval of magnitudes M, M/2, …, 0
apportioned onto the interval S, S/2, …, 0)

|log_b(M)| → |log_b(S)|

Illustration 2: Scheme Indirect, Length Based (the interval of lengths
log_b(M), log_b(M)/2, …, 0 apportioned onto the interval
log_b(S), log_b(S)/2, …, 0)

Any number of excellent variants and improvements to the above schemes
are conceivable and possible, by say incorporating the ideas of varying
the corresponding direct or indirect parts of M and S in an inverted and
mutually complementary manner as basic progressions, both of them either
of the same kind or of different kinds. Again, combining both the Direct
and Indirect Schemes is also possible. I have already broached the idea
for enhancing compression by elucidating the technique of pulsing, which
variant can be successfully adapted here to further increase gain.
Thereafter, strategically combining and composing two different hash
functions, one of either category, is also a clear possibility. Since we
are dealing with combinations, a natural decomposition of M w.r.t. S
would be combinatorial. Suppose M = D·S; we divide both M and S into
n = lg(S) parts and recursively map,

((n/2) + (2·i²/n) − i)·D → C(n, i), the binomial coefficient
"n choose i", where 0 ⩽ i ⩽ n
Finally we note that, in all of the above techniques only one of either
the base or the exponent is varied. Varying both of them results in
considerable further gain. The idea is very simple. Two approaches are
adoptable: vary the exponent through its entire set or range of values
for each possible value of the base, or vary the base through its entire
set or range of values for each possible value of the exponent. As an
example, suppose we have,

M = b^l

We start off with the initial value for b = 2. Now, for this initial
value of b we vary l from its least value 1 up to its largest value l.
For b = 3 however, we need to determine the new effective value of b such
that b² = 2^l, since b = 2 is its preceding value, and we exclude the
possibility of 1 for the exponent also. This yields the next effective
value for b as b = 2^(l/2). In like manner, for b = 3 we need to
determine the new effective value of b such that b² = 2^((3·l)/2), which
yields the effective value of b as b = 3^(3·l/4), and likewise for b = 4
we need to determine the new effective value of b such that
b² = 3^((7·l)/4), which yields the effective value of b as b = 3^(7·l/8),
so we can clearly discern the pattern emerging. Such a variation can be
done the other way around also. Varying M as exponent w.r.t. base as
described above, and varying S the other way around as base w.r.t.
exponent, naturally suggests itself. This idea can be incorporated into
other schemes of coding as well.

Whereas the above set of functions primarily replaces multiplication by
addition, the following technique replaces exponentiation with
multiplication and addition, and hence would come under the category of
replacement coding or hashing as well. The idea for this is very simple
but does take a flash of inspiration and a bit of imagination on one's
part. Loosely speaking, this form of coding can be called guess work
coding.

Suppose we wish to code a number n ⩽ M. We first choose a base b
(arbitrary for now, optimal later). Clearly, we will have a length l such
that M = b^l. This means there will exist l largest digits
d₀, d₁, d₂, …, d_(l−1) such that,

(d₀)^l ⩽ n, (d₁·b)^(l/2) ⩽ n, (d₂·b²)^(l/3) ⩽ n, …,
(d_(l−1)·b^(l−1))^(l/(l−1)) ⩽ n

We bear in mind that x^(p/q) is always ⌊(x^p)^(1/q)⌋ and not
⌊x^(1/q)⌋^p. Let dᵢ be one of the above digits for which (dᵢ·bⁱ)^(l/i)
is the closest value not more than n. In all, there are l digits, so dᵢ
can be coded as a value < b·l, which will be the base of the code. Now
there will exist some digit d_j along with some radix j such that
(d_j·b^j)^(l/j) is the smallest value larger than n. So the new value of
n will now simply be n − (dᵢ·bⁱ)^(l/i) and the new value of M will be
(d_j·b^j)^(l/j) − (dᵢ·bⁱ)^(l/i). Plainly speaking, this code is simply
based on the near identity,

b^l ≈ (b²)^(l/2) ≈ (b³)^(l/3) ≈ … ≈ (b^(l−1))^(l/(l−1))

An Improvement

By enlarging the base of the code we should get much better compression.
In general, each of the terms dᵢ·bⁱ can be replaced by one of the values
in the range over (dᵢ)^j·b^(i−j+1) where 0 ⩽ j ⩽ i. This increases the
base of the code to b·l·(l+1)/2, but should offer much better
compression.

As it turns out, this is not the only approach to defining a hash
function. A somewhat complementary approach might be adopted, and proves
to be more helpful.

By altering our perspective and adopting a different view, which is
somewhat akin to changing our frame of reference, an alternate form of
defining a hash function, expressed either as a product or as a power,
can be made.

Here, h(k) is merely defined as the smallest value v such that,

i. Product

(M − 1)·g(v) = k

ii. Power

M^(g(v)) − 1 = k

where g(v) has the same properties and satisfies the same conditions as
f(k). For the sake of completeness I restate them again,

0 ⩽ g(v) ⩽ 1 ∧ v₁ < v₂ ⇒ g(v₁) ⩽ g(v₂)

Each of these latter two forms of defining a hash function acts as its
own inverse, h⁻¹(v) = h(k).

This definition instantly reveals that if one were to substitute the
following binary series for g(v),

1/2 + (1/2)² + (1/2)³ + …

in the first definition as a product, this yields nothing but the regular
binary representation of k as a value for v. However, performing the same
substitution in the second definition as a power, and its recursive
coding of the necessary factors of M, either corresponding to the
factorization of k or independent of it, yields what is perhaps
replacement coding in its simplest and purest form, and should offer
fairly good compression.

When we look at the usual binary representation of a number, we make the
observation that the weight associated with each bit position is fixed,
and wonder what would happen if the weights of these positions were made
variable; for the binary case, for instance, what if it were possible to
associate two weights with the next bit position, one weight each, based
on whether the bit value of the previous position was zero or one. The
following general identity, which is true for any r < 1, enables us to
accomplish just that.

r + r·(1 − r) + r·(1 − r)² + r·(1 − r)³ + … ≈ 1

The binary series is a special case of this for which r = 1/2. The exact
equality holds true for the limiting case, and as with any other
monotonic convergent infinite series, this series can be used to
introduce the notion of a limit,

lim_(n→∞) Σ_(0 ⩽ i ⩽ n) r·(1 − r)ⁱ = 1

So substituting this series for g(v) in the above hash functions,
assuming a different value for r other than r = 1/2 and a coding of it,
yields comparatively better results, both for hashing as a product as
well as for hashing as a power. Trivially, we could go about determining
r as r = 1 − (1/n)^(1/n) where n = ⌊lg(S)⌋. The following algorithm can
be used to roughly code k.

1) W ← M × r
2) H ← 0
3) WHILE (S > 0)
   1) IF (k ≥ W) THEN
      1) H ← H + 1
      2) k ← k − W
      3) W ← W × (1 − r)
      4) S ← S − (S/2)
   2) ELSE
      1) W ← W × r
      2) S ← S/2
   3) H ← 2 × H
4) RETURN H

Pulsing is again another possibility, since r will increase and tend
toward 1. If we would like constant weights rather than variable weights,
since we know that the probability of a number lying close to the middle
of M is very large, we could converge to the middle from both sides by
substituting 2 − r in place of r in the above expansion.

Also, when we closely examine the usual interpretation of such a bit
representation of a number, for any value of r, in general we can make
the following interesting observation: whereas an operation of addition
is performed if the value of a bit is one, no corresponding operation is
performed if the value of the bit is zero. If data were always to mean
something, why not associate an operation with this value as well; so if
addition is associated with the bit value one, then clearly its inverse,
namely subtraction, could be associated with the bit value zero. This
idea can also be incorporated into the above coding schemes.

We can also assume a non-binary base b > 2. Suppose M = b^l. Then there
will exist some n ≤ lg(l) largest digits d₀, d₁, …, d_(n−1) such that,

∏_(i ≤ n) dᵢ^(l/2^(jᵢ)) ≈ k where 1 ≤ jᵢ ≤ n

These n digits, together with the n possible values for l mod 2^(jᵢ),
which encodes the closest value not more than k, constitute the code.

An interesting form of coding, which I came up with first, which got me
thinking along the above lines and eventually led me to discovering the
above alternate form of defining a hash function, is simply to code the
near identity,

k = k^(1/2 + 1/4 + 1/8 + …) = k^(1/2)·k^(1/4)·k^(1/8)·…

In practice only a finite number of factors n will be required. k^(1/2ⁿ)
represents the bulk of the code. The preceding factors are calculated by
repeated squaring. For determining k with a fair measure of exactitude,
the following additional information will have to be maintained,

1) A bit b for the exponent of each factor, since
1/2^(n−1) ≈ (1/2ⁿ)·2 + b.
2) A bit b for the base of each factor. This bit specifies whether, after
the squaring of the previous (smaller) factor to obtain the next (larger)
factor, the value of the previous factor (or possibly twice its value)
must be added.
3) A bit b which says whether the floor or the ceiling value of the
factors has to be considered.
These investigations into hashing clearly highlight the inestimable
value of seeing things in a new light and viewing things from a fresh and
different perspective, and the inverted view can always be adopted. In
the context of compression this means instead of trying to solve the
forward problem of compression, we could always try solving the inverse
problem of expansion. In other words, we could begin by assuming we have
the compressed code in our hand and then try expanding the compressed
code to its original information, but always do so using only strictly
invertible operations as much as possible. This is the inspiration for
the following couple of possible definitions.

If we wish to define a coding function C : M → S, we first express
M = m₀ + m₁ and express S = Qⁿ·D, and define C⁻¹ : M′ × S′ → M as

C⁻¹(m₀, ⌊Q/2⌋^(2·(n/2) + Q mod 2)) + C⁻¹(m₂, D)

where m₂ is any residual information left unencoded by the first term in
conjunction with m₁. To preserve the order, the concept of varying both
the base as also the exponent for Q, as explained earlier, must be
adopted, and the divisor of Q can be > 2, which will only make it
considerably better.

A better, more general approach would be as follows. Suppose we wish to
define a coding function C : M → S. We decompose or break up S by
expressing it as S = ∏_(i<n) sᵢ and gradually build up a value for M as
successively larger values m₀, m₁, m₂, …, m_(n−1) where
(m₀ = M) > m₁ > m₂ > … > m_(n−1), using the inverse decoding function
C⁻¹ : M′ × S′ → M as

C⁻¹(m₀, s₀ × C⁻¹(m₁, s₁ × C⁻¹(m₂, s₂ × C⁻¹())))

Now we know that when we encode there is some loss of information, so in
practice a more realistic definition could be

C⁻¹(m₀, C(s₀ × C⁻¹(m₁, C(s₁ × C⁻¹(m₂, C(s₂ × C⁻¹()))))))

although this latter form of definition introduces the possibility of
increasing the computational complexity, but hopefully not to such a
great extent as, say, that of the Ackermann function, for which the
computation becomes intractable very soon. There need not be just one
coding function C and one inverse function C⁻¹; there could be a choice
family of such functions and their respective inverses, which family
itself could be coded, depending on the type of values being encoded. An
interesting observation, which leads to considerable improvement in the
definition and opens up a lot of interesting and excellent possibilities,
is noting that the values of m₀, m₁, …, m_(n−1) need not be fixed but can
be made variable from right to left, and each successive larger value
could depend on the value of all of its predecessors. This means there
will be 2ⁱ possible new values for mᵢ for the already existing 2ⁱ
possible values for all of its predecessors, over and above their values,
so the growth of M will be very rapid.

It is because such hashing schemes can be extended, drawn out, evolved,
improved upon and developed all the time that hashing is an excellent,
highly productive and extremely versatile technique, and a must-implement
in the computing enthusiast's knapsack and the programmer's tool kit.
POSTFACE

We set foot into the captivating world of computing, and proceed past
its threshold, by defining the terms data and information, wherein data
is defined as the symbolic representation of information, and information
in turn is defined as the meaning associated with the data, either the
meaning intrinsic and latent to it or the meaning purposefully
attributed, ascribed, accorded, or imputed to it. This view is concordant
with the alternate practical definition of data as simply being a
reference to raw data supplied as input to a process, and information as
being refined or processed data, which is delivered and produced as a
result or as an outcome or output by the process. This naturally gives
rise to the question in our minds as to, "How much information can a
given piece of data contain?". As it turns out, to put it mildly, the
answer is, "Pretty much as much as one would like it to, provided one can
come up with a really clever way of coding it". In other words, just like
with everything else in this world, there is always the obvious
superficial connotation and significance of data, and there is also the
less apparent, more hidden, deeper, more profound, more subtle, albeit
also more suggestive, significance and interpretation of it. Can a drop
contain a veritable ocean? Can a seed contain a veritable orchard? As we
have seen, for all theoretical and practical intents and purposes, it
always can.
APPENDIX

PROBABILITY OVERVIEW

Here is a brief overview of the basic laws governing finite probability.

If U is a finite Universe and S ⊆ U, then

P(S) = |S| / |U|

and,

P(S′) = 1 − P(S), where S′ = U − S

In general, of course, P(A − B) = P(A) − P(A ∩ B)

If A ⊆ U, B ⊆ U, then since by the principle of inclusion and exclusion
for two sets, we have

|A ∪ B| = |A| + |B| − |A ∩ B|

therefore

P(A ∪ B) = P(A) + P(B) − P(A ∩ B)
From this principle we can gather that by and large the binary
arithmetic operation of '+' corresponds to the binary logical operation
of '∨', which in turn corresponds to the binary set theoretic operation
of '∪'. However, the '∨' is always in an inclusive sense, never in an
exclusive sense, hence the cardinality of the intersecting set, or the
set of common elements (possibilities), must always be subtracted. This
set will be the null set ∅ if the sets are mutually disjoint.

This principle can be extended to more than two sets by repeated
substitution and distribution. The set theoretic operations of '∪' and
'∩' distribute over each other. Likewise, De Morgan's laws can also be
applied to calculate the probabilities of specific sets.

The law of conditional probability is as follows: if B ⊂ A then
P(B) = P(A)·P(B | A), where P(B | A) is the probability of B given A, or
the probability of B relative to A, where A serves as U and B serves as S
respectively.

P(A ∩ B) = P(A)·P(A ∩ B | A)
         = P(B)·P(A ∩ B | B)
         = [P(A)·P(A ∩ B | A) + P(B)·P(A ∩ B | B)] / 2
The probability is always a number that lies between 0 and 1 inclusive,
0 meaning absolute impossibility and 1 meaning absolute certainty, with
intermediate ratios indicating relative degrees of certainty (> 1/2),
uncertainty (< 1/2), or evenly poised (= 1/2), as the case may be.

When in doubt as to whether to add or to multiply probabilities, a
simple rule of thumb is to note that addition always increases the
probability, whereas multiplication always decreases the probability. So,
if the desired resultant event or outcome is rarer and more uncommon than
its individual constituent events, you are probably looking to multiply,
but if the desired resultant event or outcome is more common and more
likely than its individual constituent events, you are probably looking
to add.
CODE LISTING

FILE : hash.h

#ifndef HASH_H
#define HASH_H

/*
Perfect hash library. As of now only some functions for
Hash table lookup which are basically one way or
the other hash functions are included. Functions for hash
based compression are not present.
*/

/*--------------------Ordered hash---------------------------*/

/* The hash function type. A hash function is an increasing hash if


more larger keys map to the same hash value and fewer smaller keys
map to the same hash value. If fewer larger keys map to the same hash
value and more smaller keys map to the same hash value, it is a
decreasing hash.
*/
enum hashfn_type { INCREASING_HASH , DECREASING_HASH };

#ifndef HASHFN_T
typedef struct hashfn {
int t; /* The hash function type. */
/* The near inverse hash function. */
unsigned long long (*hash_inv)(unsigned v, unsigned s, unsigned
long long m);
} hashfn_t;
#define HASHFN_T
#endif

/* The hash function. */


unsigned hash(unsigned long long k, unsigned long long m, unsigned s,
hashfn_t *h);

/* A few possible hashfn. Others can be included here. */


#define HASH1 INCREASING_HASH
unsigned long long hash1_inv(unsigned v, unsigned s, unsigned long long
m);

#define HASH2 DECREASING_HASH


unsigned long long hash2_inv(unsigned v, unsigned s, unsigned long long
m);

#define HASH3 DECREASING_HASH


unsigned long long hash3_inv(unsigned v, unsigned s, unsigned long long
m);

#define HASH4 INCREASING_HASH


unsigned long long hash4_inv(unsigned v, unsigned s, unsigned long long
m);

#define HASH5 INCREASING_HASH


unsigned long long hash5_inv(unsigned v, unsigned s, unsigned long long
m);

/* The double inversion hash function and its near inverse.
Ordered one way or the other function. */
unsigned di_hash(unsigned long long k, unsigned long long m, unsigned s,
hashfn_t *h);
unsigned long long di_hash_inv(unsigned v, unsigned s, unsigned long long
m, hashfn_t *h);

/* The string hash function */

unsigned long long strhash(unsigned char *k,int l, hashfn_t *f);


/*-----------------------Unordered hash------------------------*/

/* Flip hash. Unordered one way or the other hash function. */


int flip_hash(unsigned long long k, int s);

/* Mirror hash. Another unordered one way or the other hash function. */
int mirror_hash(unsigned long long k, int s);

/*--------------------Rehashing---------------------------------*/

#define REHASH_TABLE_SIZE (1 << 16)


#define REHASH_BUCKET_SIZE (1 << 8)

int rehash0(int i, unsigned k);

int rehash1(int i, unsigned k);

#endif

FILE : hash.c

#include<math.h>
#include<limits.h>
#include<assert.h>

#include"hash.h"

unsigned hash(unsigned long long k, unsigned long long m, unsigned s,


hashfn_t *h) {
unsigned u = 0, v = s - 1, t;
unsigned long long l;

while(v - u > 1) {
t = (u + v) / 2;
l = h->hash_inv(t,s,m);
if(l < k) u = t;
else {
if(l > k) v = t;
else u = v;
}
}
return h->hash_inv(v,s,m) <= k ? v : u;
}

unsigned long long hash1_inv(unsigned v, unsigned s, unsigned long long


m) {
unsigned long long k;

k = (m / ((2 * s) - v)) * v;
return k;
}

unsigned long long hash2_inv(unsigned v, unsigned s, unsigned long long


m) {
long double x;
unsigned long long k, n;

assert((s > 0) && (v < s) && (s < m));


v = (s - 1) - v;
x = logl(m) / (2 * logl(s));
n = (unsigned long long)ceill(powl(m , 1 / x));
k = (unsigned long long)floorl(powl(n - (n / (s - v)), x));
return k;
}

unsigned long long hash3_inv(unsigned v, unsigned s, unsigned long long


m) {
long double r;
unsigned long long k;

r = logl(m) / ((2 * logl(s)) + logl(2));


k = (unsigned long long)floorl(powl(powl(v + 1 , 3) - powl(v, 3), r));
return k < m ? k : m;
}

static unsigned long long rhash4_inv(unsigned v, unsigned s, unsigned long
long m, long double b) {
	if(s <= 3) return (m * (v + 1)) / 2;
	return rhash4_inv(v / 2, s / 2, (unsigned long long)floorl(m / (b)), b)
		+ (unsigned long long)floorl((m / (b)) * ((b) - 1 + v % 2));
}

unsigned long long hash4_inv(unsigned v, unsigned s, unsigned long long
m) {
	long double b;

	assert((s > 0) && (v < s) && (s < m));

	b = powl(m, logl(2) / logl(s));
	return rhash4_inv(v, s, m, b);
}

unsigned long long hash5_inv(unsigned v, unsigned s, unsigned long long
m) {
	unsigned long long k = 0, l;
	long double b, x;

	assert((s > 0) && (v < s) && (s < m));

	if(s <= 3) return (m * (v + 1)) / 2;
	x = logl(s) / logl(2);
	b = powl(m , 1 / x);
	l = (unsigned long long)floorl(powl(b , (x / 2)));
	if(v >= (s / 2)) { k += l; v -= (s / 2); }
	m -= l;
	s -= (s / 2);
	return k + hash5_inv(v, s, m);
}

unsigned di_hash(unsigned long long k, unsigned long long m, unsigned s,


hashfn_t *h) {
unsigned v = 0, c = 0, t;
unsigned long long l, n;

if(h->t & DECREASING_HASH) {


if(k >= (m / 2)) { k = m - k; c = 1; }
}
else {
if(k < (m / 2)) { k = (m / 2) - k - 1; c = 1; }
else k = k - (m / 2);
}
m = (m / 2) + (m % 2);
if(s <= 256) {
v = hash(k,m,128,h);
if(! (h->t & DECREASING_HASH)) v += 128;
}
else {
v = hash(k,m,128,h);
l = h->hash_inv(v,128,m);
n = h->hash_inv(v + 1,128,m);
k -= l;
m = n - l;
if(! (h->t & DECREASING_HASH)) v += 128;
t = (s / 256) + (s % 2);
v = (v * t) + di_hash(k, m, t, h);
}
if(c == 1) { v = (s - v); }
return v;
}

unsigned long long di_hash_inv(unsigned v, unsigned s, unsigned long long


m, hashfn_t *h) {
unsigned long long k = 0, l, n, o;
unsigned c = 0, t, w;

n = m;
if(v > (s / 2)) { c = 1; v = s - v; }
m = (m / 2) + (m % 2);
if(s <= 256) {
if(! (h->t & DECREASING_HASH)) v -= 128;
k = h->hash_inv(v,128,m);
}
else {
t = (s / 256);
w = v / (t + 1);
if(! (h->t & DECREASING_HASH)) w -= 128;
l = h->hash_inv(w,128,m);
v %= (t + 1);
m = h->hash_inv(w + 1,128,m) - l;
k += l + di_hash_inv(v, t, m, h);
}
if(c == 1) {
if(h->t & DECREASING_HASH) { k = n - k; }
else { k = (n / 2) - k - 1; }
}
else if(! (h->t & DECREASING_HASH) ) { k = k - (n / 2); }
return k;
}

unsigned long long strhash(unsigned char *k, int l, hashfn_t *f) {


unsigned long long v = 0, u, w, t, m, n, o, p = ULLONG_MAX;
int i;

n = *(unsigned long long *)k;


u = v = di_hash(n, ULLONG_MAX, 65536, f);
if(l > 8) {
w = strhash(k + 8, l - 8, f);
for(i = 0; i < 4; i++) {
t = di_hash_inv(u, 65536, ULLONG_MAX, f);
o = n - t;
m = di_hash_inv(u + 1, 65536, ULLONG_MAX, f) - t;
p = ULLONG_MAX / p;
p = (p < m) ? m / p : 1;
n = o * (ULLONG_MAX / m) + (w / p);
w %= p;
u = di_hash(n, ULLONG_MAX, 65536, f);
v = (v * 65536) + u;
}
}
else {
for(i = 0; i < 8; i++) {
v <<= 8;
if(i < l) v += k[i];
}
}
return v;
}

static int rflip_hash(unsigned long long q,unsigned long long r,int s) {


int t, h;

h = 0;
if(s > 1) {
t = s / 2;
h = ((r / t) * t) + rflip_hash(q / 2,r % t,t);
if(q % 2) h = s - h - 1;
}
return h;
}

int flip_hash(unsigned long long k, int s) {
	return rflip_hash(k / s, k % s, s);
}

static int rmirror_hash(unsigned long long q,unsigned long long r,int s) {
	int t, h;

	if(s > 4) {
		t = s / 2;
		h = rmirror_hash(q / 2, r % t, t);
		if(q % 2) { if(h >= s/4) h = h - (s/4); else h = h + (s/4); }
		if(r / t) { if(h >= s/4) h = h - (s/4) + (s/2); else h = h + (s/4) + (s/2); }
	}
	else h = r;
	return h;
}

int mirror_hash(unsigned long long k, int s) {
return rmirror_hash(k / s, k % s, s);
}

int rehash0(int i, unsigned k) {
int q, l, m, n;

q = k / TABLE_SIZE;
l = i / 8; i %= 8;
m = i / 4; i %= 4;
n = i;
if(l == 1) k ^= (q & HIGH_BITS);
if(m == 1) k ^= (q & MID_BITS);
if(n / 2) k ^= (q & LOW_HIGH_BITS);
if(n % 2) k ^= (q & LOW_LOW_BITS);
return k % TABLE_SIZE;
}

int rehash1(int i, unsigned k) {
int j = 0;

if(!k) k++;
while(j < i) { j++; k += (k / TABLE_SIZE) + j; }
return k % TABLE_SIZE;
}
TEST RESULTS

REHASHING

The following is a summary listing of the test results for the
performance of this Re-Hash function (rehash0) for 100 trials. Here
“Capacity” is the maximum number of keys the Hash table can contain,
which is “Table size” times “Bucket size”. “Filled” is the number of
locations that were actually filled, and “Unfilled” is the number of
empty locations, which is also the number of keys not inserted into the
Hash table. “Ratio” is “Capacity” divided by “Unfilled”. “Maximum” is
the maximum number of Re-Hashes done, and “Mean” is the average number
of Re-Hashes needed to find a key, minus one.

Table size : 65536, Bucket size : 256, Capacity : 16777216

Trial Filled Unfilled Ratio Usage% Max Mean
1 16768982 8234 2037 99.95 16 1
2 16769582 7634 2197 99.95 16 1
3 16769861 7355 2281 99.96 16 1
4 16769852 7364 2278 99.96 16 1
5 16769525 7691 2181 99.95 16 1
6 16770333 6883 2437 99.96 16 1
7 16769936 7280 2304 99.96 16 1
8 16768677 8539 1964 99.95 16 1
9 16769922 7294 2300 99.96 16 1
10 16770120 7096 2364 99.96 16 1
11 16769239 7977 2103 99.95 16 1
12 16769934 7282 2303 99.96 16 1
13 16770915 6301 2662 99.96 16 1
14 16768037 9179 1827 99.95 16 1
15 16770415 6801 2466 99.96 16 1
16 16770264 6952 2413 99.96 16 1
17 16770757 6459 2597 99.96 16 1
18 16770233 6983 2402 99.96 16 1
19 16769887 7329 2289 99.96 16 1
20 16770132 7084 2368 99.96 16 1
21 16770309 6907 2429 99.96 16 1
22 16769772 7444 2253 99.96 16 1
23 16770083 7133 2352 99.96 16 1
24 16769599 7617 2202 99.95 16 1
25 16770358 6858 2446 99.96 16 1
26 16768352 8864 1892 99.95 16 1
27 16769863 7353 2281 99.96 16 1
28 16770647 6569 2553 99.96 16 1
29 16770068 7148 2347 99.96 16 1
30 16770407 6809 2463 99.96 16 1
31 16768882 8334 2013 99.95 16 1
32 16771669 5547 3024 99.97 16 1
33 16769843 7373 2275 99.96 16 1
34 16769281 7935 2114 99.95 16 1
35 16769848 7368 2277 99.96 16 1
36 16769909 7307 2296 99.96 16 1
37 16769427 7789 2153 99.95 16 1
38 16770374 6842 2452 99.96 16 1
39 16768840 8376 2003 99.95 16 1
40 16770107 7109 2359 99.96 16 1
41 16769530 7686 2182 99.95 16 1
42 16768944 8272 2028 99.95 16 1
43 16770026 7190 2333 99.96 16 1
44 16770055 7161 2342 99.96 16 1
45 16768997 8219 2041 99.95 16 1
46 16768819 8397 1998 99.95 16 1
47 16770241 6975 2405 99.96 16 1
48 16768775 8441 1987 99.95 16 1
49 16770519 6697 2505 99.96 16 1
50 16769705 7511 2233 99.96 16 1
51 16768648 8568 1958 99.95 16 1
52 16770752 6464 2595 99.96 16 1
53 16770318 6898 2432 99.96 16 1
54 16770371 6845 2451 99.96 16 1
55 16769607 7609 2204 99.95 16 1
56 16769743 7473 2245 99.96 16 1
57 16771280 5936 2826 99.96 16 1
58 16770374 6842 2452 99.96 16 1
59 16770156 7060 2376 99.96 16 1
60 16770560 6656 2520 99.96 16 1
61 16769604 7612 2204 99.95 16 1
62 16769569 7647 2193 99.95 16 1
63 16769915 7301 2297 99.96 16 1
64 16769677 7539 2225 99.96 16 1
65 16769803 7413 2263 99.96 16 1
66 16770579 6637 2527 99.96 16 1
67 16770256 6960 2410 99.96 16 1
68 16771339 5877 2854 99.96 16 1
69 16769733 7483 2242 99.96 16 1
70 16769960 7256 2312 99.96 16 1
71 16771433 5783 2901 99.97 16 1
72 16769607 7609 2204 99.95 16 1
73 16769575 7641 2195 99.95 16 1
74 16770153 7063 2375 99.96 16 1
75 16769522 7694 2180 99.95 16 1
76 16770230 6986 2401 99.96 16 1
77 16769948 7268 2308 99.96 16 1
78 16770040 7176 2337 99.96 16 1
79 16770737 6479 2589 99.96 16 1
80 16770396 6820 2460 99.96 16 1
81 16769099 8117 2066 99.95 16 1
82 16770502 6714 2498 99.96 16 1
83 16770699 6517 2574 99.96 16 1
84 16770669 6547 2562 99.96 16 1
85 16768791 8425 1991 99.95 16 1
86 16768285 8931 1878 99.95 16 1
87 16769467 7749 2165 99.95 16 1
88 16770263 6953 2412 99.96 16 1
89 16770089 7127 2354 99.96 16 1
90 16769448 7768 2159 99.95 16 1
91 16769495 7721 2172 99.95 16 1
92 16769303 7913 2120 99.95 16 1
93 16769367 7849 2137 99.95 16 1
94 16769728 7488 2240 99.96 16 1
95 16769137 8079 2076 99.95 16 1
96 16771062 6154 2726 99.96 16 1
97 16770583 6633 2529 99.96 16 1
98 16770197 7019 2390 99.96 16 1
99 16769715 7501 2236 99.96 16 1
100 16768779 8437 1988 99.95 16 1

Following is a summary listing of the rehash1 function’s performance for
100 trials.

Table size : 65536, Bucket size : 256, Capacity : 16777216

Trial Filled Unfilled Ratio Usage% Max Mean
1 16777201 15 1118481 100.00 6 1
2 16777196 20 838860 100.00 6 1
3 16777193 23 729444 100.00 6 1
4 16777192 24 699050 100.00 6 1
5 16777191 25 671088 100.00 6 1
6 16777187 29 578524 100.00 6 1
7 16777195 21 798915 100.00 6 1
8 16777192 24 699050 100.00 6 1
9 16777190 26 645277 100.00 6 1
10 16777194 22 762600 100.00 6 1
11 16777196 20 838860 100.00 6 1
12 16777205 11 1525201 100.00 6 1
13 16777194 22 762600 100.00 6 1
14 16777194 22 762600 100.00 6 1
15 16777194 22 762600 100.00 6 1
16 16777194 22 762600 100.00 6 1
17 16777194 22 762600 100.00 6 1
18 16777195 21 798915 100.00 6 1
19 16777191 25 671088 100.00 6 1
20 16777193 23 729444 100.00 6 1
21 16777192 24 699050 100.00 6 1
22 16777190 26 645277 100.00 6 1
23 16777197 19 883011 100.00 6 1
24 16777192 24 699050 100.00 6 1
25 16777196 20 838860 100.00 6 1
26 16777194 22 762600 100.00 6 1
27 16777195 21 798915 100.00 6 1
28 16777187 29 578524 100.00 6 1
29 16777192 24 699050 100.00 6 1
30 16777196 20 838860 100.00 6 1
31 16777186 30 559240 100.00 6 1
32 16777188 28 599186 100.00 6 1
33 16777193 23 729444 100.00 6 1
34 16777191 25 671088 100.00 6 1
35 16777191 25 671088 100.00 6 1
36 16777192 24 699050 100.00 6 1
37 16777194 22 762600 100.00 6 1
38 16777199 17 986895 100.00 6 1
39 16777199 17 986895 100.00 6 1
40 16777192 24 699050 100.00 6 1
41 16777195 21 798915 100.00 6 1
42 16777189 27 621378 100.00 6 1
43 16777199 17 986895 100.00 6 1
44 16777195 21 798915 100.00 6 1
45 16777195 21 798915 100.00 6 1
46 16777194 22 762600 100.00 6 1
47 16777192 24 699050 100.00 6 1
48 16777187 29 578524 100.00 6 1
49 16777193 23 729444 100.00 6 1
50 16777195 21 798915 100.00 6 1
51 16777190 26 645277 100.00 6 1
52 16777198 18 932067 100.00 6 1
53 16777199 17 986895 100.00 6 1
54 16777193 23 729444 100.00 6 1
55 16777193 23 729444 100.00 6 1
56 16777194 22 762600 100.00 6 1
57 16777198 18 932067 100.00 6 1
58 16777192 24 699050 100.00 6 1
59 16777190 26 645277 100.00 6 1
60 16777199 17 986895 100.00 6 1
61 16777197 19 883011 100.00 6 1
62 16777191 25 671088 100.00 6 1
63 16777188 28 599186 100.00 6 1
64 16777197 19 883011 100.00 6 1
65 16777190 26 645277 100.00 6 1
66 16777198 18 932067 100.00 6 1
67 16777198 18 932067 100.00 6 1
68 16777194 22 762600 100.00 6 1
69 16777197 19 883011 100.00 6 1
70 16777195 21 798915 100.00 6 1
71 16777196 20 838860 100.00 6 1
72 16777194 22 762600 100.00 6 1
73 16777199 17 986895 100.00 6 1
74 16777193 23 729444 100.00 6 1
75 16777195 21 798915 100.00 6 1
76 16777190 26 645277 100.00 6 1
77 16777199 17 986895 100.00 6 1
78 16777196 20 838860 100.00 6 1
79 16777193 23 729444 100.00 6 1
80 16777192 24 699050 100.00 6 1
81 16777196 20 838860 100.00 6 1
82 16777194 22 762600 100.00 6 1
83 16777195 21 798915 100.00 6 1
84 16777195 21 798915 100.00 6 1
85 16777196 20 838860 100.00 6 1
86 16777196 20 838860 100.00 6 1
87 16777196 20 838860 100.00 6 1
88 16777196 20 838860 100.00 6 1
89 16777197 19 883011 100.00 6 1
90 16777193 23 729444 100.00 6 1
91 16777192 24 699050 100.00 6 1
92 16777194 22 762600 100.00 6 1
93 16777200 16 1048576 100.00 6 1
94 16777193 23 729444 100.00 6 1
95 16777193 23 729444 100.00 6 1
96 16777193 23 729444 100.00 6 1
97 16777196 20 838860 100.00 6 1
98 16777201 15 1118481 100.00 6 1
99 16777199 17 986895 100.00 6 1
100 16777196 20 838860 100.00 6 1
UNORDERED PERFECT HASHING

Following is a summary listing of the test results for the Flip Hash
function, for 500 trials. Here “Ratio” is the ratio of “Sample size” to
“Table size”; “Least” is the minimum number of keys in a Bucket,
“Largest” is the maximum number of keys in a Bucket, and “Mean” is the
(weighted) average number of keys in a Bucket. These results state that,
for any given random sample of keys, one can expect a bucket to hold on
average around 21 keys instead of the ideal value of 16, with the number
of keys in a Bucket ranging from a minimum of around 3 to a maximum of
around 33, for this Hash function.

Table size : 16384, Sample size : 262144, Ratio : 16

Trial Least Largest Mean
0001 03 34 21
0002 03 34 21
0003 03 35 21
0004 01 35 21
0005 03 37 22
0006 04 34 22
0007 03 36 21
0008 03 33 23
0009 02 33 21
0010 03 36 20
0011 03 34 22
0012 03 34 22
0013 01 41 21
0014 04 36 21
0015 02 34 20
0016 03 35 21
0017 03 33 23
0018 04 34 24
0019 03 34 25
0020 03 34 22
0021 03 33 21
0022 04 34 22
0023 02 35 22
0024 03 33 24
0025 03 37 20
0026 02 35 22
0027 04 32 20
0028 03 34 20
0029 03 34 21
0030 03 33 21
0031 04 34 21
0032 04 32 20
0033 03 35 21
0034 03 33 22
0035 03 33 22
0036 03 32 23
0037 03 35 21
0038 03 35 20
0039 03 33 23
0040 03 33 20
0041 02 35 22
0042 04 33 23
0043 03 35 23
0044 03 32 22
0045 04 34 21
0046 02 34 22
0047 02 33 20
0048 03 35 22
0049 03 33 22
0050 03 34 21
0051 04 36 22
0052 03 31 21
0053 03 35 21
0054 03 33 22
0055 04 35 21
0056 04 32 21
0057 03 35 22
0058 03 32 23
0059 04 34 21
0060 02 33 23
0061 03 36 22
0062 02 33 21
0063 04 35 22
0064 04 35 22
0065 02 37 23
0066 04 38 21
0067 04 33 23
0068 03 40 22
0069 02 37 20
0070 04 34 21
0071 03 33 23
0072 04 34 20
0073 03 33 24
0074 01 34 22
0075 03 33 24
0076 03 32 22
0077 03 33 22
0078 03 34 23
0079 03 33 24
0080 03 33 22
0081 03 35 23
0082 03 32 23
0083 03 32 21
0084 02 34 21
0085 03 34 19
0086 03 32 22
0087 03 32 21
0088 03 36 22
0089 04 35 22
0090 03 32 22
0091 02 36 19
0092 04 34 23
0093 03 34 22
0094 03 34 22
0095 03 37 21
0096 03 33 22
0097 04 32 22
0098 02 33 21
0099 02 33 22
0100 02 35 22
0101 02 34 21
0102 02 35 23
0103 03 34 22
0104 02 34 22
0105 04 35 23
0106 04 33 21
0107 02 34 23
0108 02 32 21
0109 03 34 22
0110 03 33 23
0111 04 34 21
0112 03 35 22
0113 02 34 21
0114 04 34 23
0115 03 36 22
0116 02 33 21
0117 03 34 21
0118 04 32 21
0119 03 33 22
0120 01 34 21
0121 02 33 21
0122 02 35 22
0123 04 36 21
0124 03 34 22
0125 02 34 21
0126 03 33 22
0127 02 32 21
0128 04 32 21
0129 03 35 22
0130 02 37 23
0131 03 33 21
0132 03 34 22
0133 02 33 20
0134 03 35 22
0135 03 33 22
0136 03 34 22
0137 04 33 21
0138 03 33 22
0139 02 33 20
0140 03 33 22
0141 03 34 22
0142 02 33 23
0143 03 35 22
0144 03 35 22
0145 04 35 20
0146 03 35 22
0147 03 36 23
0148 04 33 22
0149 04 34 23
0150 03 36 23
0151 03 33 24
0152 03 35 22
0153 03 33 23
0154 03 33 20
0155 03 33 21
0156 03 36 22
0157 03 33 21
0158 03 34 22
0159 03 35 21
0160 02 34 22
0161 02 35 21
0162 03 32 21
0163 02 33 22
0164 04 35 22
0165 04 33 23
0166 03 34 23
0167 03 32 23
0168 02 37 20
0169 03 34 23
0170 02 36 22
0171 03 33 19
0172 03 34 23
0173 03 35 22
0174 03 35 21
0175 02 35 20
0176 04 36 22
0177 03 37 24
0178 03 32 24
0179 02 33 21
0180 04 35 23
0181 03 32 22
0182 02 35 22
0183 04 37 22
0184 03 35 21
0185 02 32 21
0186 02 34 22
0187 02 39 19
0188 04 34 21
0189 03 36 23
0190 02 34 22
0191 03 32 21
0192 02 34 24
0193 03 33 22
0194 03 36 20
0195 02 34 22
0196 03 33 21
0197 02 34 21
0198 03 34 21
0199 03 34 20
0200 03 33 21
0201 02 33 21
0202 03 34 22
0203 03 34 23
0204 03 34 22
0205 03 34 24
0206 03 34 22
0207 02 34 22
0208 04 33 21
0209 02 34 22
0210 02 32 22
0211 03 33 23
0212 03 36 20
0213 03 33 21
0214 02 32 21
0215 03 34 22
0216 02 32 20
0217 03 35 22
0218 03 34 20
0219 02 35 21
0220 04 35 23
0221 02 34 22
0222 02 34 21
0223 03 34 22
0224 03 37 24
0225 03 37 21
0226 02 37 22
0227 04 33 22
0228 02 33 22
0229 03 36 21
0230 02 34 22
0231 04 33 22
0232 03 32 22
0233 04 36 22
0234 04 33 22
0235 04 32 23
0236 02 34 23
0237 02 35 24
0238 03 34 20
0239 04 33 21
0240 03 34 22
0241 03 34 22
0242 03 33 22
0243 03 32 23
0244 03 33 23
0245 02 33 23
0246 03 33 21
0247 03 33 23
0248 02 36 19
0249 03 35 20
0250 04 36 23
0251 04 36 23
0252 04 34 20
0253 02 39 21
0254 03 35 22
0255 03 33 20
0256 02 35 21
0257 02 35 22
0258 02 33 22
0259 04 33 23
0260 03 33 22
0261 03 33 23
0262 03 35 23
0263 02 35 21
0264 03 36 22
0265 03 34 22
0266 02 33 22
0267 03 32 22
0268 02 35 23
0269 03 34 23
0270 05 35 23
0271 02 36 20
0272 03 35 20
0273 03 32 22
0274 04 33 23
0275 02 34 22
0276 03 34 22
0277 03 35 21
0278 03 33 20
0279 02 34 20
0280 02 34 23
0281 03 37 22
0282 03 33 22
0283 02 33 20
0284 03 37 21
0285 04 32 23
0286 02 36 22
0287 03 33 21
0288 03 34 20
0289 04 32 23
0290 03 34 21
0291 03 32 22
0292 03 35 22
0293 03 34 22
0294 04 37 21
0295 02 34 21
0296 02 34 21
0297 03 35 20
0298 04 33 20
0299 03 34 23
0300 03 32 21
0301 03 33 21
0302 02 32 23
0303 03 34 21
0304 04 34 22
0305 03 35 20
0306 03 36 22
0307 04 31 21
0308 03 34 24
0309 01 35 22
0310 03 34 21
0311 04 34 24
0312 03 33 20
0313 03 34 22
0314 01 34 22
0315 03 35 22
0316 03 36 22
0317 03 33 25
0318 03 37 23
0319 03 33 22
0320 02 37 21
0321 03 34 22
0322 02 33 22
0323 03 36 22
0324 04 32 20
0325 04 33 22
0326 03 34 21
0327 02 33 22
0328 03 33 22
0329 03 34 24
0330 03 34 22
0331 03 35 22
0332 03 33 22
0333 02 33 21
0334 04 35 20
0335 04 33 23
0336 03 33 22
0337 03 36 22
0338 03 34 22
0339 04 34 22
0340 03 38 22
0341 03 33 22
0342 03 34 24
0343 02 34 23
0344 04 32 23
0345 03 34 22
0346 04 35 22
0347 04 33 24
0348 02 34 23
0349 02 37 21
0350 04 34 22
0351 04 33 21
0352 03 32 21
0353 02 36 21
0354 03 35 24
0355 03 35 22
0356 03 34 22
0357 02 35 22
0358 02 35 23
0359 04 35 21
0360 03 34 23
0361 03 33 23
0362 04 34 20
0363 03 35 22
0364 04 32 22
0365 02 33 21
0366 04 36 21
0367 04 39 22
0368 03 34 23
0369 04 33 23
0370 03 33 23
0371 03 33 20
0372 03 35 22
0373 02 32 23
0374 02 34 22
0375 03 32 22
0376 01 33 21
0377 03 33 21
0378 03 33 22
0379 03 36 22
0380 03 34 22
0381 04 34 23
0382 04 34 21
0383 04 33 21
0384 04 34 22
0385 03 35 20
0386 03 34 21
0387 02 34 21
0388 03 34 22
0389 03 35 23
0390 03 33 19
0391 02 33 24
0392 02 34 20
0393 04 34 22
0394 03 32 22
0395 01 36 21
0396 04 33 21
0397 02 33 21
0398 04 38 23
0399 04 33 21
0400 04 33 21
0401 02 32 24
0402 03 36 22
0403 03 36 22
0404 03 34 21
0405 03 35 22
0406 03 35 21
0407 03 33 20
0408 03 32 23
0409 03 33 22
0410 04 36 22
0411 04 32 22
0412 03 33 22
0413 02 34 22
0414 03 33 21
0415 03 38 22
0416 03 35 22
0417 03 35 21
0418 04 36 21
0419 03 37 23
0420 03 36 21
0421 04 34 25
0422 03 34 24
0423 03 34 22
0424 04 33 23
0425 03 35 23
0426 04 34 22
0427 03 32 21
0428 04 36 23
0429 04 35 23
0430 03 33 20
0431 02 35 20
0432 03 34 22
0433 03 33 23
0434 02 33 22
0435 02 33 21
0436 03 33 22
0437 03 33 22
0438 03 34 23
0439 03 31 23
0440 02 33 23
0441 03 34 22
0442 04 36 21
0443 04 34 22
0444 03 34 20
0445 04 35 22
0446 02 35 23
0447 04 32 23
0448 03 35 21
0449 03 34 21
0450 04 33 22
0451 02 35 23
0452 04 36 23
0453 02 34 23
0454 03 33 22
0455 04 35 22
0456 02 35 21
0457 03 33 22
0458 03 32 23
0459 03 35 21
0460 04 33 20
0461 03 33 22
0462 03 32 21
0463 04 33 20
0464 03 35 23
0465 02 35 22
0466 03 32 21
0467 03 32 21
0468 03 38 22
0469 03 34 22
0470 03 33 21
0471 03 33 21
0472 03 36 23
0473 02 36 23
0474 04 33 22
0475 03 33 21
0476 01 33 24
0477 02 34 22
0478 03 33 22
0479 04 38 22
0480 03 35 21
0481 03 33 23
0482 03 34 23
0483 03 33 22
0484 04 34 24
0485 03 33 23
0486 04 35 22
0487 04 33 22
0488 03 38 23
0489 04 34 22
0490 03 35 22
0491 04 33 21
0492 03 34 21
0493 03 33 22
0494 04 34 22
0495 03 36 22
0496 01 34 21
0497 03 39 21
0498 02 35 23
0499 03 33 20
0500 04 33 22

Following is a summary listing of the test results for the Mirror Hash
function, also for 500 trials.

Table size : 16384, Sample size : 262144, Ratio : 16

Trial Least Largest Mean
0001 03 34 20
0002 02 35 23
0003 03 32 21
0004 03 34 21
0005 03 35 22
0006 03 34 21
0007 02 34 23
0008 02 34 21
0009 02 33 22
0010 03 34 21
0011 04 34 22
0012 01 33 21
0013 03 37 22
0014 03 37 24
0015 03 35 23
0016 03 32 22
0017 02 32 21
0018 03 33 21
0019 03 34 21
0020 03 32 21
0021 03 36 21
0022 03 37 22
0023 03 33 22
0024 03 33 24
0025 02 35 22
0026 03 37 23
0027 01 34 22
0028 04 34 21
0029 03 33 21
0030 02 33 22
0031 03 34 22
0032 03 37 21
0033 03 35 20
0034 03 36 20
0035 03 34 23
0036 03 33 22
0037 03 32 22
0038 03 34 20
0039 04 36 23
0040 03 32 21
0041 03 32 22
0042 03 33 21
0043 03 33 22
0044 02 40 20
0045 04 31 22
0046 03 37 23
0047 01 33 22
0048 03 32 21
0049 03 35 24
0050 04 31 23
0051 02 33 21
0052 03 34 21
0053 03 33 22
0054 03 33 22
0055 03 35 23
0056 03 33 20
0057 03 35 21
0058 02 32 21
0059 03 34 22
0060 03 33 22
0061 03 37 20
0062 03 32 23
0063 04 37 21
0064 03 33 21
0065 03 36 22
0066 03 33 21
0067 03 33 19
0068 04 33 22
0069 04 38 22
0070 03 35 22
0071 03 36 22
0072 03 36 23
0073 02 33 22
0074 04 33 21
0075 03 33 22
0076 03 34 21
0077 02 34 23
0078 03 35 21
0079 03 35 20
0080 04 32 21
0081 03 32 23
0082 03 39 20
0083 03 33 21
0084 02 35 21
0085 03 36 22
0086 04 33 23
0087 01 33 21
0088 05 34 24
0089 02 34 22
0090 03 34 21
0091 03 35 22
0092 01 35 22
0093 02 35 22
0094 03 34 24
0095 03 36 23
0096 01 34 20
0097 03 37 22
0098 03 35 22
0099 03 33 22
0100 02 34 22
0101 02 37 22
0102 02 34 22
0103 03 34 22
0104 03 35 22
0105 03 35 23
0106 03 35 21
0107 03 33 22
0108 02 36 22
0109 04 33 21
0110 04 33 23
0111 04 36 24
0112 03 33 22
0113 03 33 22
0114 03 36 22
0115 04 33 22
0116 04 33 23
0117 02 36 23
0118 02 32 23
0119 02 35 22
0120 02 36 23
0121 04 33 22
0122 01 34 21
0123 03 35 22
0124 04 32 21
0125 03 34 20
0126 05 35 22
0127 03 33 22
0128 02 33 19
0129 03 35 23
0130 03 34 23
0131 02 34 20
0132 03 35 23
0133 03 37 22
0134 03 33 24
0135 02 37 23
0136 03 33 22
0137 03 34 21
0138 03 33 23
0139 03 33 23
0140 03 35 20
0141 04 34 20
0142 03 33 23
0143 02 35 24
0144 03 35 22
0145 03 33 22
0146 03 32 22
0147 03 32 22
0148 03 33 24
0149 03 35 23
0150 02 33 23
0151 03 36 20
0152 03 35 22
0153 03 32 24
0154 03 36 22
0155 02 38 22
0156 03 33 23
0157 02 35 21
0158 03 35 23
0159 04 36 21
0160 04 35 22
0161 01 34 23
0162 03 36 21
0163 03 32 22
0164 02 36 24
0165 04 34 21
0166 04 33 21
0167 03 35 22
0168 02 34 24
0169 04 33 21
0170 04 37 23
0171 02 34 21
0172 04 38 23
0173 03 34 22
0174 03 34 21
0175 02 34 21
0176 03 34 21
0177 02 33 21
0178 03 32 22
0179 03 32 21
0180 04 32 21
0181 04 35 22
0182 02 34 22
0183 03 34 22
0184 04 37 24
0185 02 34 20
0186 03 34 22
0187 03 33 20
0188 04 35 20
0189 02 38 22
0190 03 33 21
0191 03 35 22
0192 03 33 23
0193 02 33 22
0194 04 34 22
0195 03 32 22
0196 03 36 23
0197 03 35 23
0198 03 34 21
0199 02 35 22
0200 04 34 22
0201 03 34 23
0202 02 34 22
0203 03 34 22
0204 03 35 23
0205 02 37 22
0206 04 34 21
0207 03 37 22
0208 02 34 20
0209 04 37 23
0210 03 34 19
0211 03 33 23
0212 03 34 21
0213 03 37 21
0214 02 34 22
0215 03 34 21
0216 03 36 22
0217 03 33 22
0218 03 34 22
0219 04 34 23
0220 04 33 21
0221 03 33 23
0222 04 34 21
0223 02 33 24
0224 04 35 22
0225 02 34 21
0226 02 33 21
0227 04 37 22
0228 03 34 21
0229 04 34 23
0230 03 34 22
0231 03 36 21
0232 04 32 21
0233 03 33 22
0234 03 33 22
0235 02 33 24
0236 02 34 19
0237 04 36 21
0238 03 36 24
0239 04 33 23
0240 03 35 21
0241 03 35 23
0242 03 36 23
0243 01 36 21
0244 03 35 21
0245 04 36 21
0246 03 33 20
0247 04 34 22
0248 03 35 23
0249 03 33 22
0250 04 34 22
0251 02 34 22
0252 04 37 23
0253 04 35 22
0254 03 33 22
0255 03 35 23
0256 02 35 21
0257 03 34 21
0258 03 34 22
0259 04 35 22
0260 02 33 20
0261 04 34 20
0262 01 36 22
0263 03 33 20
0264 04 35 22
0265 03 34 20
0266 03 33 20
0267 02 34 22
0268 02 33 20
0269 02 33 20
0270 04 34 24
0271 02 33 22
0272 03 33 21
0273 03 33 23
0274 03 33 23
0275 01 33 21
0276 04 36 24
0277 03 34 22
0278 03 33 20
0279 03 33 23
0280 03 34 21
0281 02 36 22
0282 02 37 22
0283 02 36 23
0284 03 33 23
0285 02 34 23
0286 02 34 22
0287 02 34 21
0288 03 34 22
0289 04 35 23
0290 02 34 23
0291 03 33 23
0292 03 34 22
0293 03 35 22
0294 03 31 21
0295 04 34 23
0296 03 33 21
0297 04 34 23
0298 03 32 22
0299 02 35 22
0300 02 33 20
0301 03 39 23
0302 03 35 22
0303 03 33 22
0304 03 33 20
0305 02 38 23
0306 03 34 22
0307 04 33 24
0308 04 35 20
0309 04 34 20
0310 03 34 22
0311 03 35 22
0312 02 38 23
0313 03 33 23
0314 03 33 22
0315 04 32 23
0316 03 33 22
0317 02 33 20
0318 03 37 23
0319 02 33 21
0320 02 34 24
0321 03 34 21
0322 03 34 22
0323 02 36 19
0324 03 37 20
0325 02 33 22
0326 03 33 22
0327 04 35 22
0328 04 37 22
0329 02 33 23
0330 03 31 22
0331 03 33 22
0332 02 32 23
0333 03 33 22
0334 03 33 22
0335 02 34 22
0336 04 32 21
0337 03 34 21
0338 03 33 23
0339 03 32 23
0340 03 34 22
0341 04 33 22
0342 01 33 22
0343 02 36 23
0344 03 32 21
0345 02 36 23
0346 03 36 22
0347 04 34 22
0348 03 35 22
0349 03 34 22
0350 03 40 22
0351 03 36 22
0352 04 36 20
0353 03 34 22
0354 03 33 22
0355 03 40 23
0356 03 34 21
0357 03 33 22
0358 04 32 22
0359 04 34 21
0360 03 33 21
0361 03 32 22
0362 03 34 23
0363 03 34 22
0364 02 32 21
0365 01 34 21
0366 03 32 20
0367 02 34 21
0368 03 33 20
0369 03 34 23
0370 04 35 22
0371 03 32 21
0372 03 34 23
0373 04 34 22
0374 03 33 21
0375 03 34 23
0376 03 35 21
0377 02 33 22
0378 02 34 21
0379 04 37 21
0380 04 34 22
0381 02 33 22
0382 03 35 20
0383 02 37 22
0384 04 36 22
0385 03 34 24
0386 02 32 21
0387 04 32 21
0388 03 34 22
0389 04 34 22
0390 03 33 23
0391 03 33 20
0392 03 32 23
0393 04 33 24
0394 04 34 20
0395 03 31 21
0396 02 33 22
0397 03 34 23
0398 03 34 22
0399 03 32 23
0400 04 34 22
0401 02 34 21
0402 04 32 20
0403 04 38 21
0404 04 35 22
0405 03 34 21
0406 03 38 22
0407 03 33 22
0408 03 35 21
0409 03 36 23
0410 03 35 21
0411 03 35 22
0412 03 35 23
0413 03 34 21
0414 02 33 23
0415 03 32 20
0416 02 35 21
0417 02 36 22
0418 04 38 21
0419 03 34 21
0420 04 33 21
0421 03 33 22
0422 03 37 20
0423 03 32 24
0424 03 33 22
0425 03 33 22
0426 04 33 21
0427 03 39 23
0428 03 32 22
0429 01 34 21
0430 03 38 22
0431 03 33 21
0432 04 32 21
0433 04 34 22
0434 04 33 23
0435 02 34 21
0436 03 33 22
0437 03 31 22
0438 04 33 21
0439 03 33 24
0440 04 35 19
0441 03 34 23
0442 02 32 23
0443 03 36 22
0444 03 34 23
0445 03 34 20
0446 03 33 21
0447 03 33 22
0448 03 33 23
0449 01 33 20
0450 02 33 21
0451 04 35 19
0452 04 35 21
0453 03 33 21
0454 03 33 21
0455 03 31 24
0456 03 32 22
0457 03 34 21
0458 03 33 21
0459 04 35 23
0460 04 35 22
0461 02 35 23
0462 03 32 22
0463 03 37 22
0464 02 35 21
0465 02 33 23
0466 03 33 23
0467 03 33 21
0468 02 33 23
0469 02 35 21
0470 03 32 21
0471 04 36 23
0472 02 34 23
0473 03 33 23
0474 03 37 21
0475 04 33 22
0476 01 34 22
0477 03 35 23
0478 04 33 22
0479 02 34 23
0480 03 34 21
0481 02 32 23
0482 02 36 20
0483 03 34 23
0484 04 37 22
0485 03 34 22
0486 04 33 20
0487 03 34 21
0488 04 33 22
0489 03 33 20
0490 03 37 21
0491 03 37 22
0492 04 39 23
0493 04 35 22
0494 03 34 21
0495 04 33 23
0496 03 33 22
0497 03 33 22
0498 04 33 22
0499 02 36 22
0500 04 33 20
ORDERED PERFECT HASHING

The following are the summary test results of one possible
implementation of the double inversion hash “di_hash” for one specific
hash function, namely “hash1_inv”, for a trial run of 1000 randomly
generated 64-bit keys hashed to 32-bit values. The first column is the
64-bit key and the second column is the 32-bit hash value. While the
printed results are only for a sample of 1000 keys, this function was
tested repeatedly on samples of 100000 keys at a time, with virtually no
collisions at all except for very small keys.

000000001142901477445, 19
000000012549321775006, 368
000000046814201962286, 1464
000000085697934374760, 2766
000000090338021431046, 2887
000000122416587901932, 4079
000000129988190451905, 4352
000000147398128905153, 5044
000000216984999548607, 7959
000000235590863008682, 8853
000000297768140193931, 12033
000000334463579198116, 14137
000000354944507184524, 15412
000000435201267327388, 21200
000000628597032364689, 40496
000000728025040735039, 48398
000000744300213785748, 49481
000000765018664898324, 50827
000000774014865012139, 51395
000000989411814726806, 61606
000001049715175296307, 63716
000001054438455522829, 63842
000001214515982982642, 69080
000001383825020702355, 76572
000001547819949614097, 87589
000001570037104628850, 89563
000001851182600845092, 115694
000001880916822686715, 117522
000002362084407547606, 136913
000002567118275026908, 148057
000002728976998218061, 162824
000002732989700231372, 163317
000002898394969118770, 178947
000003093541827090012, 189957
000003585145175422264, 210190
000003820250675578399, 231343
000004495735859409152, 268211
000004761568989052217, 284969
000004764816122214367, 285268
000005028435209404099, 311299
000005072466027722922, 314221
000005160544496653030, 319171
000005519224517326920, 333016
000005766039391328126, 347741
000007394089020258796, 458048
000007650016836353655, 468884
000008165559914496372, 513542
000009032255811495972, 570703
000009771825424296258, 610802
000010565062669301843, 664221
000010605654468751745, 666600
000011404437124762080, 724260
000011962347526109286, 771503
000012235696743226050, 785668
000012353284940430203, 790246
000012438864131951196, 794407
000012691163605328158, 815309
000013854350305842047, 905551
000014207640235979529, 922043
000015882988979278854, 1048347
000016298836510090602, 1077401
000016406578758680194, 1091395
000016613238679376690, 1107220
000017285986900598279, 1157996
000018065289502797871, 1213744
000018631984655519344, 1252267
000018845661284265499, 1270031
000021364086510166114, 1472513
000021742547784423696, 1505696
000022263845482490616, 1551498
000022374344933316181, 1561587
000023007885672501182, 1611064
000024835869685769198, 1765987
000025025269975263070, 1775382
000026662275710950565, 1917190
000027064833838760762, 1961721
000028090079239692569, 2045139
000028457851960832208, 2089127
000028587304245322172, 2096601
000028646441753338047, 2099460
000029540415822712351, 2178692
000030644782048276847, 2290654
000030802645794779679, 2299182
000031126475344844558, 2338817
000033001758522248878, 2512647
000034658421986556202, 2685184
000035035935052819004, 2724610
000036975323438395823, 2930766
000037527997834241461, 2987416
000037816883140901816, 3015388
000039834456575571677, 3235577
000040517577638869848, 3321278
000041088156885197952, 3386813
000042704222505245336, 3572037
000043298313257216357, 3647243
000043336471943738731, 3653141
000044385616222658572, 3780532
000046018723032077761, 3986170
000047242580199971354, 4134638
000048981379254188690, 4376026
000049449606332194047, 4439720
000050041167275796239, 4518614
000052839114530226885, 4915736
000053497583910041005, 5014980
000053770330238249289, 5051409
000055550971660718669, 5323356
000056493069467663598, 5482520
000057049573965475977, 5569273
000057664955243610982, 5668703
000058537096806369906, 5819566
000058834314755434979, 5862858
000060355596586088613, 6125362
000062989416586426195, 6609497
000063015821359441963, 6613264
000063185694170242871, 6635724
000063376123268801042, 6680345
000064219486090807123, 6833030
000066889493234293544, 7372313
000067013140564092967, 7401419
000067092684468437203, 7410905
000067906564117725951, 7591441
000069773258853506957, 7994280
000069946390490678583, 8035173
000070308013253252200, 8118564
000070832384859564091, 8238811
000073465479583073703, 8841684
000073642637984385018, 8874248
000073927049738007685, 8935402
000075367200013183525, 9245864
000075452281541562152, 9263552
000077475194893573165, 9688175
000077691250138036411, 9721431
000082341060881200859, 10603265
000082716800012129678, 10669428
000083595751448163921, 10814966
000087844246129641108, 11515355
000087882416762822093, 11521808
000088491257849631446, 11605988
000090692854198569568, 11934669
000090761386928359575, 11943687
000091473939113749230, 12055044
000092910047639447210, 12255044
000093027118087690249, 12265705
000093102921596276712, 12277248
000093699520023812318, 12368926
000095836640254620972, 12648794
000096079615613685541, 12680245
000096204159576463895, 12702453
000097334568035430691, 12843307
000101790430617664038, 13376575
000103104416186747277, 13529753
000103141068025943746, 13537523
000105858485351481663, 13831415
000106093389070024556, 13857570
000107041775309273721, 13959284
000110040592197285508, 14270163
000111860845114574418, 14437095
000112967004475454385, 14548044
000114047116720863206, 14648037
000116232110293619161, 14846729
000117071902031660232, 14924764
000118895048357892002, 15075051
000122505200549546911, 15377867
000126414463777581785, 15671668
000131096781871553714, 16010729
000131361157024831406, 16039400
000133328488692764644, 16174348
000133958676117093779, 16205887
000135623958021556667, 16319815
000137917319430377898, 16462486
000140182276679253393, 16605020
000141057876045175343, 16656381
000143167161174775205, 16783220
000143402745906470094, 16796516
000143741825305997979, 16827418
000145061207147051480, 16904804
000148080496042014751, 17096439
000150589957970823123, 17249794
000151602756405964810, 17317891
000152082231807186452, 17360303
000153762752786644073, 17476096
000154682519674152523, 17540427
000155285991092597375, 17574031
000162188982935153885, 18096157
000163067224901086266, 18166184
000163301795237857665, 18194304
000163396911394738554, 18204157
000163670527422390620, 18219966
000170490419989250022, 18810865
000171071684998121441, 18869822
000171545151816311031, 18913523
000177325073886811365, 19476252
000177684888504357470, 19523606
000180443493423213027, 19811665
000180628243015198023, 19842593
000184364140150255415, 20261657
000184461690315872808, 20275413
000185892673316373747, 20450113
000188007793679165512, 20714999
000194806925331821811, 21662430
000194846586962809618, 21671670
000199353431246468654, 22372090
000201807677453154596, 22799066
000211157864262672035, 24645556
000216506412135414249, 25879904
000221134964463806737, 26821911
000221195310844558732, 26840558
000231506765104056783, 28593674
000236775073865551776, 29352874
000238648435730236873, 29598722
000239005535555013154, 29635197
000239297096660671305, 29682120
000239473066426119086, 29695650
000240021855801262568, 29764225
000240316832614293575, 29811307
000241701906020557973, 29972764
000242095199664961042, 30020629
000243738319144706475, 30215849
000245859763529217339, 30465588
000246930605006281282, 30583107
000247198153864667424, 30606794
000248924616430251203, 30796619
000249860567549811364, 30885140
000250809978858211015, 30992121
000254868194979083522, 31388141
000255369788242019917, 31437080
000258091315405402567, 31676747
000261285028046640619, 31963075
000261708616583380546, 31988937
000262189514185008010, 32038672
000265030036708049405, 32258782
000265289064027474279, 32289790
000266929228920520129, 32416358
000267414786355844551, 32445936
000269502229851194788, 32608501
000270861893043739750, 32704252
000274156893265827143, 32941951
000283543484720262828, 33544556
000284643745293082063, 33611209
000286817752827770476, 33745391
000288316764897396228, 33831691
000290758398091735555, 34003327
000292929519877260111, 34145164
000297082099683386269, 34446923
000300892428344837470, 34734174
000304293983758588716, 35004832
000307015655169372679, 35247401
000311463462779478058, 35648359
000312976398762210970, 35787181
000313024599279891679, 35790484
000313558371848708410, 35847733
000316686568493794187, 36168139
000317560898702519605, 36249998
000320132922606424935, 36531680
000321581636503928289, 36698517
000322033067298341415, 36754421
000324913232014640167, 37092093
000328161065852733872, 37496448
000329531588174461980, 37683437
000331248236453681114, 37923824
000333319368198982661, 38212150
000335273689408296341, 38512388
000335995554751730549, 38614053
000337425348365245923, 38851462
000338059456191834738, 38943180
000339589783385282907, 39198461
000340125962885289360, 39298851
000342583349156182635, 39724278
000344716365852704774, 40123632
000350255247369542274, 41281479
000353979205376069322, 42144664
000356331554390238536, 42680286
000361450361510448887, 43740445
000366865007641285520, 44713740
000367361429830509077, 44807078
000368226413286819317, 44949611
000368261285572196873, 44953610
000368777658220566388, 45029737
000369162248846296456, 45092668
000373656194494503092, 45777265
000373877385285003836, 45809314
000375168773787055537, 45996604
000379456883569852338, 46560632
000383963169447162103, 47113455
000384865823358643699, 47208425
000386435364824587473, 47385874
000386993211210297218, 47449029
000387329321167080425, 47490338
000388012320039434614, 47565643
000389003784642991915, 47659611
000393423266891790183, 48107660
000394915443156890146, 48247743
000397730175280871144, 48506618
000398201009230970175, 48557509
000400816409403042848, 48775726
000407076689875802581, 49284581
000407892761059907617, 49348312
000408318884757643159, 49378612
000412188657395232850, 49666753
000415723998975932978, 49902858
000416195192250003375, 49936977
000417410441819587627, 50010968
000417668808795089194, 50028862
000418565341795662880, 50084520
000428718982578981172, 50731381
000430068272118511494, 50831327
000433908554166642425, 51106104
000435555613697396183, 51227097
000442837254226402466, 51802942
000447539550142488412, 52225065
000454543495239916864, 52904265
000457918127402090775, 53278071
000462537350503813347, 53813533
000465985705153221320, 54261093
000473347493990557661, 55315145
000473556594009217386, 55353924
000475846448178101004, 55714350
000489237181493588076, 58367336
000491061838072312800, 58798580
000493418683422866768, 59364064
000495069558221300315, 59719934
000500270861968120516, 60775667
000500732126336020365, 60871674
000501049344143869050, 60930429
000502911814124186610, 61267512
000505026407410242287, 61621740
000506965248132111411, 61940168
000509087595863037450, 62271523
000509775296245605943, 62385065
000513546710099399111, 62920219
000514140215700618747, 63000931
000525679855987041567, 64407533
000528698646092293584, 64717459
000530368341707917828, 64882416
000539921854972022262, 65741848
000542237484458114068, 65932747
000542591640609702283, 65965196
000545082421665116904, 66156079
000547951613073953717, 66376397
000548963580368519545, 66447864
000549675719728260057, 66499398
000552496584411411602, 66694248
000556935089810402041, 66979237
000562787679918392046, 67358188
000571116620991839262, 67945693
000574288092093697252, 68181836
000574563261615905442, 68212688
000578115051536015592, 68494206
000585613300315808225, 69183208
000593982564917116764, 70058329
000595513553245320579, 70241553
000597847558785280780, 70516801
000597962328955809691, 70525960
000601593954162697238, 70992567
000602655722576293871, 71143332
000604229312778868792, 71363922
000604433014539705214, 71383776
000608764699277692741, 72032186
000616965092582049091, 73464943
000619131284620345289, 73895364
000621268771227265030, 74338000
000622661707335899478, 74647893
000626573704299784675, 75574829
000627540819140331509, 75819790
000629219537847760156, 76213182
000629584556318536695, 76288801
000631335591501226888, 76678246
000642011220374749540, 78689773
000642586289298044675, 78776366
000644545973803673218, 79092541
000646145854185279307, 79327718
000647147422340804044, 79482113
000650617885619692661, 79958047
000656705544077314894, 80733030
000656823500426665967, 80741383
000657319900487862279, 80803013
000658913975785601911, 80989185
000659482847778228353, 81053065
000663833776549014955, 81518373
000667609168368332606, 81887638
000668260371371269274, 81946576
000682402832892009086, 83132692
000686121817865509308, 83403875
000687587624055882364, 83497672
000688600095855260653, 83565274
000689584347635301010, 83630844
000695539612227932986, 84017652
000695609025949259166, 84020194
000703318794902564623, 84553066
000704418336327817300, 84641422
000704916345448111858, 84675585
000706888843787885778, 84826104
000709983900013766997, 85074241
000709987016618258560, 85074453
000710671395758462398, 85134045
000711998921251711078, 85253680
000722161813298559179, 86198087
000727408270887444436, 86771458
000738366760602464714, 88187161
000747943732180495965, 89735114
000748466035194802422, 89841465
000751889875013415796, 90501892
000760025073003784199, 92345323
000765026609894955056, 93518271
000766627700018379226, 93852944
000767277744136779540, 93986025
000768738259100411455, 94288165
000770177629865921806, 94563734
000775567201342376972, 95507174
000780498714604123113, 96276176
000781499643927426666, 96418810
000789188799167929054, 97443841
000792658255369400554, 97845250
000798010289740891694, 98424419
000798138508481892438, 98433521
000804409688915959149, 99029199
000806922755998446154, 99254494
000809589029056295845, 99481385
000811882282875145208, 99670156
000816315619981872285, 100005645
000819649808244801681, 100246557
000829546958727412730, 100903996
000832417360799285082, 101104344
000834661520583708446, 101257956
000836012573889669280, 101367809
000852939020993259529, 102842171
000863655248203302124, 104069466
000864199591995104886, 104137122
000866139037627829912, 104395104
000871515308859369459, 105163029
000876103859837482586, 105898773
000882859057121026542, 107146866
000887614786845008642, 108172489
000888793061794604077, 108452015
000915853376921262128, 113643761
000919701846533080964, 114152701
000920915919490137096, 114295772
000924030399522449878, 114667624
000924118774453254110, 114678252
000932760089153817295, 115578137
000933748318610715520, 115670415
000934661177114740823, 115749419
000941580619341298451, 116356874
000944067576387541468, 116558909
000944293254437977351, 116580117
000944855000482008443, 116616713
000958635321628477636, 117578014
000961430674459884395, 117772065
000967316246411947912, 118216461
000967480035445473829, 118225263
000968906518393434542, 118341916
000986686157995542393, 120008177
000988752399532897308, 120252799
000990918384193110894, 120511466
000995649967702548910, 121111315
000995867977517567825, 121140160
000997511335279390098, 121370375
001002909067881857442, 122171661
001004716787996873427, 122477579
001008547586865828548, 123143080
001015220362415710227, 124492699
001017773871167627913, 125074691
001019009616713912741, 125372648
001022096591426624061, 126158849
001031690190547718055, 128256632
001042043961963979213, 129998848
001042380694113488027, 130039284
001042836425489281187, 130105649
001045360318239215623, 130471429
001049384720691325533, 130997192
001053249456437055509, 131459821
001056780592119932590, 131852594
001061791480955262013, 132365855
001065809334544069393, 132732878
001070122596343560441, 133108242
001082727041423845996, 134063835
001091240629540500274, 134645871
001093438431450265069, 134807983
001093669422785746441, 134820382
001109699226837930866, 136242506
001112180778527609078, 136502426
001113398530984506944, 136634194
001113619880593574175, 136649199
001123023979833162852, 137805274
001127795499366893929, 138478572
001129788439909768403, 138792454
001131171735683382926, 139005911
001131460653282231107, 139060791
001132082722386615940, 139155471
001132748813357197987, 139268481
001135238993894735127, 139714030
001136000988242542870, 139850865
001138221079311205418, 140266264
001141098758727727321, 140853429
001147658707370586951, 142400459
001149266226469008596, 142810666
001156943900590819640, 144581074
001165887472679664304, 146219278
001167610166681585949, 146494566
001174790034080571062, 147526709
001175106809605596143, 147577439
001179053125306378745, 148062947
001179447688566341616, 148113670
001180842577001186559, 148283222
001181339538416561372, 148334231
001183666461459025675, 148594217
001186604253562080289, 148905392
001188411875427150840, 149093895
001190604195279955589, 149301746
001197097401923088636, 149887141
001199295283164162068, 150075676
001200659109980543199, 150188811
001200782058153827579, 150198978
001201324681488485144, 150233079
001205634413896946965, 150559785
001208075737380712622, 150736117
001211739745624903308, 150991624
001213687246516912778, 151123674
001222681994403044141, 151788640
001223389409640405092, 151848875
001228424971023179142, 152285228
001235136348719775720, 152914628
001237842126483019801, 153207344
001243380190794881140, 153824932
001245577547035002726, 154096632
001247059483220369307, 154285448
001249726509499101973, 154654673
001252728801842226956, 155074284
001261300031991755378, 156498534
001261591847614273086, 156556367
001261842645197398078, 156597693
001268329122107247231, 157930472
001279002190328234790, 160563700
001294176860647011669, 163441884
001297497982091761327, 163926896
001303541803872098419, 164742915
001306981516075283026, 165150511
001310918285293007202, 165593152
001313288478519970736, 165831448
001314355881934909375, 165939940
001317748658327327844, 166267392
001319926068868037454, 166465570
001326423056512097670, 167017972
001329110795295047935, 167233343
001338171339080691614, 167874015
001340214502371796651, 168022817
001343997610335539487, 168297935
001351054501435215845, 168881400
001354087769801435137, 169149282
001357736763919408918, 169494087
001372317248949330744, 171182669
001375671237029723298, 171649902
001387115813440747835, 173592839
001392218336417486746, 174656742
001400312369256215778, 176694941
001408780041681442434, 178602976
001410378009544207783, 178916562
001411140083638798110, 179057134
001416955495610314839, 180073291
001418277376833790162, 180281807
001419543421117459610, 180475898
001421226156029258554, 180721416
001425753157910319494, 181339148
001428756383486308261, 181723387
001434397383985686161, 182372068
001434548597123955312, 182384647
001435181095244813652, 182451441
001447653704630672741, 183630817
001458021765176464974, 184422252
001465525471733392820, 184958288
001466624623137159968, 185052990
001485006704097328150, 186749879
001486708865462037692, 186935811
001489459158354961541, 187256911
001496041646988645361, 188119331
001496456759150741294, 188174023
001503996267579006235, 189347584
001504858080601630865, 189505837
001521230302352403576, 193139005
001534414475952257497, 196076048
001537116150752683634, 196545572
001543250244913915185, 197520087
001547451270157439923, 198106108
001550852403585271831, 198530474
001571434178307015154, 200600293
001584073276285755409, 201531848
001585890719496915010, 201667314
001588267288211279921, 201854482
001591142895796649449, 202096910
001602444908997250293, 203149217
001607686724514133012, 203718114
001608516534497837528, 203816101
001616934887917942928, 204912875
001625136313335576051, 206202987
001625533992772008554, 206277043
001627929633490432007, 206709443
001639379192237957460, 209280456
001648115158717184196, 211503547
001654926877973800117, 212890501
001662594340810719306, 214172079
001672469807160677008, 215499269
001675837756667538543, 215889896
001676164490748233579, 215935781
001679511998842329679, 216285952
001681427950744096510, 216480674
001681601738581383067, 216507379
001682566997748182931, 216597611
001692294144841346930, 217459062
001697424213127949524, 217857253
001701211302940665317, 218135268
001701732456353445172, 218171440
001726214027520175610, 220407196
001730946624729057151, 220988400
001732067177073581694, 221129725
001732808839371667845, 221239237
001748052057926174106, 223724959
001752257126317262502, 224600644
001765895602821456570, 228075575
001783507356917551592, 231317205
001789250317082305974, 232085565
001792586930912127614, 232500501
001792887943387296414, 232527301
001797266738318300313, 233022192
001801006892351508383, 233396960
001803880397549140781, 233680548
001810763155380283908, 234281473
001824816753981443267, 235344105
001828544423473075873, 235661605
001834379984497333676, 236190434
001840209961172212109, 236783828
001846753055369352698, 237554888
001855259443167350137, 238734752
001862398869704674134, 239930344
001868812008360389148, 241236765
001871073828017273852, 241759728
001875039912411498803, 242762307
001878092359977448112, 243621638
001883483415559407679, 244986069
001884649036208913095, 245252279
001884784561603819030, 245293003
001886306351020392239, 245623905
001896450106340214579, 247506138
001904906652306805595, 248727520
001908364381691208111, 249168437
001909913962787274704, 249358074
001915618277695205722, 249979880
001927793298420996607, 251109668
001929968261862094749, 251275477
001954950171065085158, 253427900
001961381803155140860, 254162790
001974223225338459705, 256028993
001977488433282074601, 256597840
001977906743686462887, 256683824
001983818475407962292, 257883841
001987763987300129093, 258810363
002001006554307908093, 262263018
002008423560374595826, 263723822
002011457054416778077, 264244828
002021445345067165782, 265694847
002021539094078431653, 265709773
002023771430665993806, 265997385
002024644595438991200, 266091736
002036248948112244607, 267324103
002043025288297162685, 267917631
002044865695163335620, 268067884
002049748862267291176, 268440186
002053337079208212488, 268713733
002056770459646947301, 269005047
002059697318720788917, 269254602
002061875989266011632, 269455533
002077721121256858671, 271201363
002082158612373137175, 271827198
002102864621576947683, 275899415
002106045913319581253, 276779978
002110888790229183474, 278119372
002116531863097730795, 279425018
002122019293987946411, 280494803
002122389892227672322, 280560945
002128425715684718307, 281555451
002132331217634132498, 282131050
002145877919050673615, 283748642
002148928109552670221, 284042076
002160365219626540910, 285028437
002166941873264429409, 285541498
002179985236043196768, 286745918
002193952440408665019, 288480276
002211795929448775524, 291806988
002212465563877234441, 291964425
002221479157502894641, 294454214
002240954534226237636, 298419753
002259295899663230449, 300685891
002264892330526364358, 301215447
002265021111113361810, 301229544
002270810552602861242, 301724882
002274234890616086374, 301989082
002274458970334031797, 302000119
002275294349336888911, 302064686
002275940895655198588, 302120671
002281994831349338611, 302630289
002282427638531189243, 302657478
002287849034905258943, 303171065
002291786010085259329, 303574399
002291866738108389660, 303584216
002321572217765534069, 308324496
002324411071275699237, 309012852
002329888898548482695, 310570289
002353576485835050427, 315518586
002355336058624978566, 315767585
002357973453225950893, 316136869
002361792726211517311, 316609228
002367102993410479349, 317210192
002370214869292527258, 317532650
002374126293033473203, 317914960
002391933942156875342, 319409910
002393524509810021373, 319550994
002403637025527569573, 320593860
002409676493287043385, 321327360
002414011648032941904, 321927721
002432695657078383712, 325521660
002433929631127104110, 325843678
002443027722639200783, 328440753
002452280141892246627, 330551332
002452584842536941008, 330610641
002454032185216173119, 330885884
002458900131455027730, 331737377
002460088816846105876, 331929878
002462518189690217937, 332294940
002473613964180301951, 333742824
002480268577870787014, 334443262
002481784957712903781, 334598353
002491168978092413873, 335414733
002491316868158290806, 335422541
002510026139987545341, 337154505
002510671343160470372, 337229836
002513495725514866267, 337559316
002515308036909281135, 337774752
002516346080311757994, 337906362
002519359258990563537, 338305958
002519401447520162405, 338311372
002519424676669679234, 338314985
002522092654936924091, 338698207
002536985922956150336, 341439872
002547487954638586547, 344280100
002557875031252040407, 346902043
002572551446667602696, 349443325
002577226458934180122, 350076187
002581117373767689568, 350544374
002595724553456464900, 351987406
002604511691202338250, 352710054
002611073149335031712, 353311502
002616105425404351215, 353838811
002631416105831025657, 355890445
002634403893048212452, 356390221
002637226299097755247, 356909060
002653791229146746833, 361045176
002667354620114287292, 364358112
002667778499694681021, 364439853
002671991163519481417, 365197079
002683296561370121785, 366866705
002683947068033893470, 366941857
002684807466270856006, 367057868
002696153869420561392, 368284696
002697298230709038817, 368385724
002706272203638147019, 369161732
002708022906596513075, 369300989
002714137618315703170, 369856483
002715557619156551110, 369997512
002719873006695145290, 370434698
002737163687123192910, 372759695
002740397324469143496, 373310707
002745100445990661903, 374227361
002748853652829108842, 375064303
002755319033195793673, 376768579
002766298999336537734, 379873249
002768551479865737358, 380384575
002775490455063440465, 381760537
002778121271087849001, 382214162
002781240300698461617, 382720530
002781316629956269508, 382728489
002781476817015352037, 382745651
002788403295990096965, 383713367
002789010395704183615, 383785689
002821383505257998921, 386938064
002825208746320904830, 387361840
002825586461273815381, 387394655
002843193518140102562, 389917719
002864833681174246488, 395373510
002898592006995907513, 401331476
002901338595188629526, 401612576
002911547846005946652, 402567554
002964787751770022954, 411490715
002978333556122729848, 414849255
002979334043541655327, 415044390
002981181006441832134, 415387375
002982809534832890535, 415685735
002989312371070351542, 416709599
002993448822875735020, 417274383
002999460977780812191, 418003097
003010655595787531348, 419124892
003019182951067968365, 419877394
003026488882931020558, 420607742
003034672693211647862, 421590096
003041463928404330961, 422572590
003051507872856121422, 424411155
003056068368317040157, 425458159
003074253981992309011, 430604227
003076018056563074048, 431007456
003081615991227948688, 432136458
003085387699763939850, 432795208
003090692374570980781, 433605028
003104382680923177987, 435282198
003113141177961517531, 436097402
003115387895633078223, 436288014
003122559519594243270, 436965189
003131312367538678531, 437935950
003133315811614571556, 438187369
003145604196259388296, 440075087
003155371670451208075, 442145584
003156590611330349928, 442444840
003160385385779852955, 443494015
003168233416719349436, 445967351
003173067644026723935, 447227683
003174986188297359249, 447678724
003182764451745665868, 449242650
003188290285337750454, 450151128
003198996507607612327, 451561002
003216862471874353045, 453306368
003227135326269303356, 454365686
003236070471875914947, 455537708
003269288316423481622, 463524399
003270152091924453237, 463737605
003280975999423054862, 466040300
003283062435842989289, 466412553
003286243008096588300, 466925050
003296147435752218947, 468259287
003302172238378669425, 468927713
003303030323622959728, 469026778
003310182713051826814, 469695862
003317196361917444955, 470346672
003325793004902352463, 471282612
003339958650444706238, 473369436
003352227276079341203, 476046432
003353777038697185619, 476460537
003361587028943283983, 478950118
003365458792179861344, 480094545
003365949202578824820, 480227325
003370576162660278208, 481362927
003375418865079526954, 482386416
003378841899706220787, 483010034
003382983690893123539, 483705828
003387793281859215467, 484398731
003394843185213808167, 485290469
003397881147330889176, 485623192
003403108245154077607, 486149401
003403339071938053746, 486170409
003415918246113708714, 487373316
003416126897044184268, 487390720
003419054603920078971, 487716379
003425452409402436278, 488506646
003446368220580814273, 492367417
003450369254351122625, 493454578
003454210467698204061, 494668032
003472685815189673033, 499481275
003473680019303974439, 499659543
003476216772458310588, 500107090
003489984375044152148, 502043112
003502046326784921578, 503266543
003522246176183232869, 505542709
003523356539396417765, 505697776
003528975178691531823, 506595848
003529127684658412507, 506623161
003530180883695073964, 506804548
003531011971084071427, 506965488
003536067317015076483, 507968107
003565027810082134559, 515860981
003571286580170757295, 517013003
003573009962237739559, 517287433
003580349695146969654, 518343527
003581109189054902465, 518450222
003589175271178152917, 519377448
003594684127398874686, 519921469
003613265373229435914, 521934353
003626198850749988359, 524030633
003628167219103476498, 524426241
003649950326845539414, 530687518
003651633397531649149, 531123964
003654097089966601155, 531738142
003661532548831731019, 533275675
003671072114804868321, 534818565
003690721507841392721, 537000149
003697924081544924979, 537731689
003709875213468250167, 539293474
003711842361460717290, 539603975
003717930082411272275, 540671901
003729211134533106883, 543326929
003730134098960594944, 543600582
003768875606480383998, 552366290
003774263337831118237, 552986057
003776213011170249527, 553186972
003776696352117270404, 553239310
003789303842651547718, 554496865
003792776065173258594, 554897905
003794089474994175212, 555069701
003801145383076372690, 556052472
003804082283239969420, 556522538
003827976694193132339, 562639191
003866967857709638893, 569974375
003871752852095637265, 570439959
003876874286599049807, 570954230
003879755791002255955, 571275710
003882673712249880926, 571613571
003893413705648672331, 573139748
003896302422288084272, 573637447
003901393783751498218, 574625790
003912416559648833747, 577494723
003917363930067211020, 579217128
003932050993632642696, 583199261
003933283249239459728, 583447632
003939109307418619581, 584494784
003941226386195239300, 584832776
003941358619923874503, 584845721
003951743134613258321, 586233110
003956731676280347789, 586791663
003968273483424792681, 587963393
003974932650170036365, 588777480
003975447146038835012, 588845369
004005370575232781601, 595682032
004011396170379644570, 597682419
004013864102161737271, 598357926
004016866863765334276, 599120881
004040044835334718753, 602998263
004040476024810514929, 603055042
004049661218462510528, 604019141
004050823394203407889, 604124771
004056799143629255880, 604768326
004060988903891192386, 605283734
004063494146126618506, 605614462
004064711895437820904, 605780917
004071478385833234758, 606846591
004075377896731274390, 607562931
004075932976726664403, 607662370
004083729166393092339, 609449519
004085645197785340051, 609969423
004092090972216376481, 612052267
004100395299675440227, 614822429
004100487930741849577, 614853731
004113896397116062459, 617812995
004124221502856746449, 619379166
004153736764901601943, 622850141
004164005197745333886, 624682966
004168242703390798798, 625632898
004174537210925733820, 627371470
004174912036572230081, 627491000
004186241070512948002, 631372930
004204784652169030083, 635320308
004210045860673585343, 636087858
004220875933893786171, 637343704
004223220650610588904, 637586680
004226929753051930613, 637971969
004246265338471317723, 640738336
004252553123377843768, 642052601
004265543100673879489, 645998801
004269902781221941853, 647552776
004270925967981963999, 647875218
004287041227792242955, 651583264
004302590345325821283, 653748449
004312267996969663031, 654769149
004312499288218796926, 654788870
004316685798078229620, 655289913
004316963774340534138, 655317048
004321392256669747333, 655905434
004323282830778227823, 656188192
004332116083716286015, 657701171
004339757359395147701, 659418650
004342031768084225190, 660023043
004347631185200461651, 661799911
004353621074468695268, 664023024
004356031120386799582, 664800082
004378855010441858422, 669520691
004398418369578204479, 671783669
004409910680628322113, 673379887
004418276209068141616, 674949665
004419389804948564802, 675193716
004423738523292154800, 676249881
004461021033648876924, 686138899
004467527475000001080, 687010453
004472732261884912361, 687604062
004477680062472596125, 688124674
004480535623103048136, 688447819
004488590048556371143, 689496740
004512260236429408896, 694743035
004529865839368305280, 700405216
004542204909815057967, 702738633
004545230691210279285, 703186191
004555084375882993264, 704391105
004567898761435077121, 705886039
004592843918223688811, 711129325
004598290794147100629, 713136149
004604992818162348230, 715478969
004608096679328055006, 716365740
