JAVANI_WORLD

CASE_STUDY: HASH_SET
Set as a datastructure:
Before introducing the collection called a set, it is good to know the basics of hashing. Every object in Java inherits
the method hashCode() from java.lang.Object. hashCode() returns an int, a 32-bit signed integer, which the hash-related
collections (sets and maps) use to decide where to store an instance of the class. The hashcode inherited from the
Object class distinguishes one instance from the other objects created from the class, whereas an overridden hashCode()
is usually computed from the object's properties. Let's have an example:
public class Person {
    private String name;
    private String dateOfBirth;
    private String socialSecurityNumber;
    public Person(String name, String dob, String ssn) {
        this.name = name;
        dateOfBirth = dob;
        socialSecurityNumber = ssn;
    }
    @Override
    public int hashCode() {
        // combine the hashcodes of the fields; a null field contributes 0
        int hash = name == null ? 0 : name.hashCode();
        hash += dateOfBirth == null ? 0 : dateOfBirth.hashCode();
        hash += socialSecurityNumber == null ? 0 : socialSecurityNumber.hashCode();
        return hash;
    }
}

The String class has its own hashCode(), so it is used in the example. An overridden hashCode() should always mirror the
instance's behaviour, since there is another important method inherited from java.lang.Object: equals(Object o).
Instances that are equal must have the same hashcode. However, not all objects that share a hashcode are equal:
public class Person {
    private String name;
    private String dateOfBirth;
    private String socialSecurityNumber;
    public Person(String name, String dob, String ssn) {
        this.name = name;
        dateOfBirth = dob;
        socialSecurityNumber = ssn;
    }
    @Override
    public boolean equals(Object o) { // the parameter must be Object to override Object.equals
        if (this == o)
            return true;
        if (!(o instanceof Person))
            return false;
        Person p = (Person) o;
        // getters omitted: fields of the same class are accessible directly
        return java.util.Objects.equals(name, p.name)
                && java.util.Objects.equals(dateOfBirth, p.dateOfBirth)
                && java.util.Objects.equals(socialSecurityNumber, p.socialSecurityNumber);
    }
    @Override
    public int hashCode() {
        // uses the same fields as equals(), so equal persons get equal hashcodes
        int hash = name == null ? 0 : name.hashCode();
        hash += dateOfBirth == null ? 0 : dateOfBirth.hashCode();
        hash ^= socialSecurityNumber == null ? 0 : socialSecurityNumber.hashCode();
        return hash;
    }
}

When using sets and maps and overriding the equals method of java.lang.Object, one must always ensure that hashCode()
behaves consistently with it: objects that are equal must return the same hashcode. The hashcode is used for
locating the objects inside these data structures, which is what makes them so efficient.
That, generally speaking, is what hashCode() is all about.
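
To see why the two methods have to agree, here is a minimal sketch that adds two equal objects to java.util.HashSet. The trimmed-down Person (only name and ssn), the class name PersonSetDemo and the sample values are assumptions made for this illustration, and java.util.Objects.hash is used for brevity instead of a hand-written sum:

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

public class PersonSetDemo {
    // Hypothetical, trimmed-down Person used only for this illustration.
    static final class Person {
        private final String name;
        private final String ssn;

        Person(String name, String ssn) {
            this.name = name;
            this.ssn = ssn;
        }

        @Override
        public boolean equals(Object o) {
            if (this == o) return true;
            if (!(o instanceof Person)) return false;
            Person p = (Person) o;
            return Objects.equals(name, p.name) && Objects.equals(ssn, p.ssn);
        }

        @Override
        public int hashCode() {
            // Built from the same fields as equals(), so equal persons share a hashcode.
            return Objects.hash(name, ssn);
        }
    }

    // Adds two equal persons and returns how many elements the set holds afterwards.
    static int distinctCount() {
        Set<Person> people = new HashSet<>();
        people.add(new Person("Ann", "123"));
        people.add(new Person("Ann", "123")); // equal to the first one, so it is not stored again
        return people.size();
    }

    public static void main(String[] args) {
        System.out.println(distinctCount()); // prints 1
    }
}
```

If hashCode() were left as inherited from Object while equals() is overridden, the two persons would most likely land in different places in the set and both would be stored.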
A set is a data structure used to store non-duplicate elements. Like an array, a set stores its elements with the help
of an indexing system. However, whereas an array stores elements in indexed cells, a set uses a different kind of key to
access the elements: every object has a method called hashCode(), which is ultimately inherited from the Object class itself.
The hashcode is an integer value that helps to distinguish the object from other objects. Generally speaking, the
hashcode inherited from the Object class is derived from the identity of the object (typically, though not necessarily,
from its memory address). java.util.HashSet uses the hashcode to retrieve the object from the set and, together with
the method equals(Object o) inherited from java.lang.Object, to make sure there are no duplicate elements. Objects that
are equal have the same hashcode, but not all objects that have the same hashcode are equal.
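
A small sketch of the last point, using java.util.HashSet: the strings "Aa" and "BB" are a well-known pair with the same String hashcode, yet they are not equal, so the set keeps both and rejects only a true duplicate. The class name CollisionDemo and the helper distinct() are made up for this example:

```java
import java.util.HashSet;
import java.util.Set;

public class CollisionDemo {
    // Adds all values to a fresh HashSet and returns how many were actually stored.
    static int distinct(String... values) {
        Set<String> set = new HashSet<>();
        for (String value : values) {
            set.add(value); // add() returns false and changes nothing for a duplicate
        }
        return set.size();
    }

    public static void main(String[] args) {
        System.out.println("Aa".hashCode() == "BB".hashCode()); // prints true
        System.out.println("Aa".equals("BB"));                  // prints false
        System.out.println(distinct("Aa", "BB", "Aa"));         // prints 2
    }
}
```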
A demonstration of a hashset:

In a HashSet, the stored objects are not in any particular order: neither the insertion order nor any other.
Iterating through a HashSet results in a seemingly random order:
public class Example {
    public static void main(String[] args) {
        java.util.HashSet<Integer> set = new java.util.HashSet<>();
        set.add(1);
        set.add(100);
        set.add(1000000);
        set.add(-1);
        set.add(-5);
        // the iteration order is decided by the hashcodes, not by the insertion order
        for (Integer i : set)
            System.out.print(i + ", ");
    }
}

You can see this by running the program.


To understand the concept of a set, it is worth the effort to implement one:
/* HashSet implemented */
@SuppressWarnings("unchecked") // the bucket table is an array of generic lists
public class HashSet<E> implements Iterable<E> {
    private static final int INITIAL_CAPACITY;
    private static final float LOAD_FACTOR;
    private int size;
    private int capacity;
    private java.util.ArrayList<E>[] table;
    static {
        INITIAL_CAPACITY = 40;
        LOAD_FACTOR = 0.75f;
    }
    /**
     * Constructs a hashset with initial capacity of 40
     */
    public HashSet() {
        table = new java.util.ArrayList[INITIAL_CAPACITY];
        capacity = INITIAL_CAPACITY;
    }
    // doubles the capacity of the hashset when needed and re-adds the elements
    private void increaseCapacity() {
        java.util.ArrayList<E> list = asList();
        capacity <<= 1;
        table = new java.util.ArrayList[capacity];
        size = 0;
        for (int i = 0; i < list.size(); i++)
            add(list.get(i));
    }
    /**
     * Constructs a hashset with pre defined capacity
     */
    public HashSet(int capacity) {
        this.capacity = capacity;
        table = new java.util.ArrayList[capacity];
    }
    /**
     * Adds element to hashset
     * @param e
     * @return true if set previously contains no equal element
     */
    public boolean add(E e) {
        if (contains(e))
            return false;
        if (size >= LOAD_FACTOR * capacity)
            increaseCapacity();
        int index = getHash(e.hashCode());
        if (table[index] == null)
            table[index] = new java.util.ArrayList<>();
        table[index].add(e);
        size++;
        return true;
    }
    /**
     * Empties the set
     */
    public void clear() {
        size = 0;
        for (int i = 0; i < capacity; i++)
            if (table[i] != null)
                table[i].clear();
    }
    /**
     * Removes element from the hashset
     * @param e
     * @return true if element is found in the set
     */
    public boolean remove(E e) {
        if (!contains(e))
            return false;
        // contains(e) returned true, so the list at this index exists
        java.util.ArrayList<E> list = table[getHash(e.hashCode())];
        for (int i = 0; i < list.size(); i++)
            if (list.get(i).equals(e)) {
                list.remove(i);
                break; // at most one equal element is stored
            }
        size--;
        return true;
    }
    /**
     * Returns true, if the given element is present in the hashset
     * @param e
     * @return true if element is in the set
     */
    public boolean contains(E e) {
        int index = getHash(e.hashCode());
        if (table[index] != null) {
            java.util.ArrayList<E> list = table[index];
            for (int i = 0; i < list.size(); i++)
                if (list.get(i).equals(e))
                    return true;
        }
        return false;
    }
    /**
     * Returns true if the size of the set is 0
     * @return true if the set is empty
     */
    public boolean isEmpty() {
        return size == 0;
    }
    /**
     * Returns the iterator of the hashset
     * @return HashSetIterator()
     */
    public java.util.Iterator<E> iterator() {
        return new HashSetIterator(this);
    }
    // Evenly distributes the hash
    private int getHash(int code) {
        code ^= (code >>> 19) ^ (code >>> 13);
        code ^= (code >>> 7) ^ (code >>> 4);
        // floorMod keeps the index in 0..capacity-1 for any capacity;
        // the mask (code & (capacity - 1)) only uses every index when capacity is a power of two
        return Math.floorMod(code, capacity);
    }
    // Iterator class for the hashset
    private class HashSetIterator implements java.util.Iterator<E> {
        private int current = 0;
        java.util.ArrayList<E> setList;
        HashSet<E> set;
        public HashSetIterator(HashSet<E> set) {
            setList = asList(); // snapshot of the elements at creation time
            this.set = set;
        }
        @Override
        public boolean hasNext() {
            return current < setList.size();
        }
        @Override
        public E next() {
            return setList.get(current++);
        }
        @Override
        public void remove() {
            // remove the element most recently returned by next()
            current--;
            set.remove(setList.get(current));
            setList.remove(current);
        }
    }
    /**
     * Returns java.util.ArrayList presentation of the hashset
     * @return list
     */
    public java.util.ArrayList<E> asList() {
        java.util.ArrayList<E> list = new java.util.ArrayList<>();
        for (int i = 0; i < capacity; i++)
            if (table[i] != null)
                for (int j = 0; j < table[i].size(); j++)
                    list.add(table[i].get(j));
        return list;
    }
    /** Returns the string presentation of the hashset */
    @Override
    public String toString() {
        StringBuilder builder = new StringBuilder();
        builder.append("[");
        java.util.ArrayList<E> list = asList();
        for (int i = 0; i < list.size(); i++)
            if (i < list.size() - 1)
                builder.append(list.get(i) + ", ");
            else
                builder.append(list.get(i));
        builder.append("]");
        return builder.toString();
    }
    /** main method for testing purposes */
    public static void main(String[] args) {
        HashSet<Integer> set = new HashSet<>();
        set.add(1);
        set.add(2);
        set.add(100);
        for (Integer i : set) // test iterator
            System.out.println(i);
        set.remove(2);
        System.out.println(set);
    }
}

All in all, the hashset depends on the hash function. The getHash() method distributes the objects by evaluating each
object's hashcode. There is no get() method, and the only way to access the elements is through iteration. However, a
hashset is very efficient at looking up elements, for example with the contains(E e) method: if the hashcode is
implemented well enough, it takes only constant time on average. Removing an element with remove(E e) likewise takes
constant time on average.
As said, the elements are not in any particular order: one may notice that they are actually stored in arraylists,
which are themselves stored in the array.
The index of the arraylist in the array is computed from the hashcode of the particular object by the getHash(int code)
method.
If two non-equal objects end up at the same index, the arraylist at that index stores all of them.
In other words, the hashcode of the object determines the index of the arraylist containing it.
A hashset does not allow storing duplicate objects (those for which equals returns true), so a programmer who needs an
efficient data structure for storing non-duplicate objects might consider a hashset.
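
As a rough illustration of how a hashcode becomes an index, the sketch below repeats the bit-mixing steps of getHash() in a standalone method. The class name BucketIndexDemo, the method bucketIndex() and the sample values are assumptions, and Math.floorMod is used for the final step so that the result stays in range for any capacity:

```java
public class BucketIndexDemo {
    // Mixes the bits of the hashcode and reduces the result to an index in 0..capacity-1.
    static int bucketIndex(int hashCode, int capacity) {
        int h = hashCode;
        h ^= (h >>> 19) ^ (h >>> 13);
        h ^= (h >>> 7) ^ (h >>> 4);
        return Math.floorMod(h, capacity);
    }

    public static void main(String[] args) {
        int capacity = 40;
        Object[] samples = {1, 100, -5, "Aa", "BB"};
        for (Object o : samples) {
            System.out.println(o + " -> index " + bucketIndex(o.hashCode(), capacity));
        }
    }
}
```

Objects with the same hashcode, such as "Aa" and "BB", always get the same index, which is why each index holds a list rather than a single element.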
The Java Collections Framework also provides two other sets for storing non-duplicate elements: java.util.TreeSet and
java.util.LinkedHashSet. The difference between those and a regular HashSet is that LinkedHashSet keeps the insertion
order when iterating through the collection, while TreeSet keeps the non-duplicate elements in ascending order.
LinkedHashSet is implemented with a linked-list-like structure on top of hashing, and TreeSet is implemented with a
tree structure (a red-black tree).
Accessing elements is not quite as efficient as in a plain HashSet (TreeSet operations take logarithmic time), but if
the programmer needs to maintain the insertion order, LinkedHashSet is a good choice. On the other hand, if the
programmer needs the elements to be sorted, TreeSet is a good choice. However, if the ordering or sorting is not
required after each insertion or removal, the programmer should consider using a hashset and creating one of the latter
sets from it only when the order is needed.
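
The difference in iteration order can be sketched as follows. The class name SetOrderDemo and its two helper methods are made up for the example; the second helper follows the advice above by collecting into a HashSet first and building the TreeSet only when the sorted order is needed:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

public class SetOrderDemo {
    // LinkedHashSet iterates in the order in which the elements were added.
    static List<Integer> insertionOrder() {
        Set<Integer> linked = new LinkedHashSet<>();
        for (int value : new int[] {1, 100, 1000000, -1, -5}) {
            linked.add(value);
        }
        return new ArrayList<>(linked);
    }

    // Collect into a HashSet first, then create a TreeSet once sorting is needed.
    static List<Integer> sortedOrder() {
        Set<Integer> hashed = new HashSet<>(Arrays.asList(1, 100, 1000000, -1, -5));
        Set<Integer> sorted = new TreeSet<>(hashed);
        return new ArrayList<>(sorted);
    }

    public static void main(String[] args) {
        System.out.println(insertionOrder()); // prints [1, 100, 1000000, -1, -5]
        System.out.println(sortedOrder());    // prints [-5, -1, 1, 100, 1000000]
    }
}
```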
You might have noticed that an actual arraylist is used for storing the elements: the reason is that two non-equal
objects may share a hashcode, which is used as the index of the final object storage. A perfect hash function is a
hashcode method that always returns a different int value for objects that are not equal. Quite often, however, that
is not the case. There are ways to handle these kinds of collisions:
One way is to use other structures (like the arraylist here) to store the objects with the same hashcode. Another way
is to put the object in the next empty index in case of a collision:
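boolean add(E e) {
    int index = Math.floorMod(e.hashCode(), capacityOfTheTable); // keep the index inside the table
    for (int i = index; i < capacityOfTheTable; i++)
        if (table[i] == null) {
            table[i] = e;
            return true; // stop at the first free index
        }
    return false; // no free index between the starting index and the end of the table
}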
The third way is like the second, but instead of using the next free index, it spreads the indexes further apart:
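boolean add(E e) {
    int index = Math.floorMod(e.hashCode(), capacity); // keep the index inside the table
    if (table[index] == null) {
        table[index] = e;
        return true;
    }
    int next = index * 7; // a prime is a good choice
    while (next > 0 && next < capacity) { // index 0 has no further candidates in this scheme
        if (table[next] == null) {
            table[next] = e;
            return true;
        }
        next *= 7;
    }
    return false; // no free index among the candidates
}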
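
For comparison, here is a small runnable sketch of the second approach (trying the following indexes), extended with two assumptions that the fragments above do not make: the search wraps around to the beginning of the table, and an equal element found on the way is treated as a duplicate. The class name ProbingSet is hypothetical, the table has a fixed size, and removal is left out because it needs extra bookkeeping in this kind of table:

```java
public class ProbingSet<E> {
    private final Object[] table;
    private int size;

    public ProbingSet(int capacity) {
        table = new Object[capacity];
    }

    // Stores e in the first free slot at or after its starting index, wrapping around.
    public boolean add(E e) {
        int start = Math.floorMod(e.hashCode(), table.length);
        for (int i = 0; i < table.length; i++) {
            int probe = (start + i) % table.length;
            if (table[probe] == null) {
                table[probe] = e;
                size++;
                return true;
            }
            if (table[probe].equals(e)) {
                return false; // an equal element is already stored
            }
        }
        throw new IllegalStateException("table is full");
    }

    // Follows the same probe sequence as add(); an empty slot ends the search.
    public boolean contains(Object o) {
        int start = Math.floorMod(o.hashCode(), table.length);
        for (int i = 0; i < table.length; i++) {
            int probe = (start + i) % table.length;
            if (table[probe] == null) {
                return false;
            }
            if (table[probe].equals(o)) {
                return true;
            }
        }
        return false;
    }

    public int size() {
        return size;
    }

    public static void main(String[] args) {
        ProbingSet<String> set = new ProbingSet<>(8);
        System.out.println(set.add("Aa"));      // prints true
        System.out.println(set.add("BB"));      // prints true: same hashcode, stored in the next slot
        System.out.println(set.add("Aa"));      // prints false
        System.out.println(set.contains("BB")); // prints true
        System.out.println(set.size());         // prints 2
    }
}
```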

The basic operations with the sets are the following:

boolean add(E e)

Adds new element to the hashset, if no equal element is present.

boolean contains(Object o)

Returns true, if the set contains the specified object.

boolean isEmpty()

Returns true, if the set's size is 0

int size()

Returns the size of the set.


java.util.Iterator iterator()

Returns the iterator for the set.
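
The operations listed above can be tried out with java.util.HashSet as in the following sketch. The class name SetOperationsDemo and the helper countViaIterator() are made up for the example:

```java
import java.util.HashSet;
import java.util.Iterator;
import java.util.Set;

public class SetOperationsDemo {
    // Walks the set with its iterator and counts the elements it returns.
    static int countViaIterator(Set<String> set) {
        int count = 0;
        Iterator<String> it = set.iterator();
        while (it.hasNext()) {
            it.next();
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        Set<String> set = new HashSet<>();
        System.out.println(set.isEmpty());         // prints true
        System.out.println(set.add("a"));          // prints true
        System.out.println(set.add("a"));          // prints false: an equal element is present
        System.out.println(set.contains("a"));     // prints true
        System.out.println(set.size());            // prints 1
        System.out.println(countViaIterator(set)); // prints 1
    }
}
```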
