Niranjan Tallapalli

Hashmap Internal Implementation Analysis in Java


05.09.2013 | 21854 views

Note: Based on the response to this article on my technical blog, I am posting it here so that it is visible to a wider audience.

java.util.HashMap.java

    /**
     * The maximum capacity, used if a higher value is implicitly specified
     * by either of the constructors with arguments.
     * MUST be a power of two <= 1<<30.
     */
    static final int MAXIMUM_CAPACITY = 1 << 30;

This constant is the maximum capacity to which the HashMap's internal table can grow, i.e., 2^30 = 1,073,741,824.

java.util.HashMap.java

    /**
     * The default initial capacity - MUST be a power of two.
     */
    static final int DEFAULT_INITIAL_CAPACITY = 16;

    /**
     * The load factor used when none specified in constructor.
     */
    static final float DEFAULT_LOAD_FACTOR = 0.75f;

The default capacity of the internal array is 16. The capacity is always a power of two, and the reason will become clear further on. The load factor determines when the table grows: once the number of entries reaches 75% of the current capacity (12 entries for the default capacity of 16), the HashMap doubles its capacity and moves every existing entry into the new table, recomputing each entry's bucket index. This rehashing is expensive, so when the expected number of entries is known in advance, it is best practice to pass an initial capacity to the constructor.
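The arithmetic behind that advice can be sketched as follows. This is a minimal illustration, not JDK code: the class name PresizedMapDemo and the helper methods threshold and capacityFor are hypothetical, and the sizing formula is just one common conservative choice.

```java
import java.util.HashMap;
import java.util.Map;

class PresizedMapDemo {
    static final float LOAD_FACTOR = 0.75f; // same value as DEFAULT_LOAD_FACTOR

    // Number of entries at which a table of the given capacity is due to grow.
    static int threshold(int capacity) {
        return (int) (capacity * LOAD_FACTOR);
    }

    // An initial capacity large enough that expectedEntries stay below the threshold.
    // (Hypothetical helper; HashMap itself rounds the value up to a power of two.)
    static int capacityFor(int expectedEntries) {
        return (int) (expectedEntries / LOAD_FACTOR) + 1;
    }

    public static void main(String[] args) {
        System.out.println(threshold(16));    // 12
        System.out.println(capacityFor(100)); // 134

        // Pre-sized map: the 100 insertions below should not trigger a resize.
        Map<Integer, String> map = new HashMap<Integer, String>(capacityFor(100));
        for (int i = 0; i < 100; i++) {
            map.put(i, "value-" + i);
        }
        System.out.println(map.size());       // 100
    }
}
```

With the default constructor the same 100 insertions would start from 16 buckets and grow the table several times along the way.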


Do you foresee any problem with this resizing of a HashMap in Java? HashMap is not synchronized, and in a multithreaded program it is quite possible for more than one thread to use the same HashMap. If two threads detect the need to resize it at the same time, the result is a race condition.

converted by Web2PDFConvert.com

What is a race condition with respect to HashMaps? When two or more threads see the need to resize the same HashMap, they may both move the elements of an old bucket to the new bucket at the same time, which can lead to an infinite loop.

FYI, in case of a collision, i.e., when different keys have the same hashcode, the entries are stored internally in a singly linked list. Every new element is stored at the head of the list to avoid traversing to the tail, so during a resize the whole sequence of entries in the list is reversed, and this is where an infinite loop can arise.

For example, let us assume there are 3 keys with the same hashcode, stored in a linked list inside one bucket. The format below is object_value(current_address, next_address).

Initial structure:
    1(100, 200) > 2(200, 300) > 3(300, null)

After resizing by thread-1:
    3(300, 200) > 2(200, 100) > 1(100, null)

When thread-2 starts resizing, it again starts with the 1st element and places it at the head:
    1(100, 300) > 3(300, 200) > 2(200, 100)

Element 2 still points to element 1, so the list is now circular (1 > 3 > 2 > 1). The next insertion that walks this bucket loops forever and the thread hangs there.
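The reversal itself can be reproduced in a single thread. The following is a minimal sketch, not the JDK code: the Node class and the transfer method are hypothetical names, and it assumes that every entry of the old bucket lands in the same new bucket.

```java
import java.util.ArrayList;
import java.util.List;

class HeadInsertTransferDemo {
    // One element of a bucket's singly linked list.
    static final class Node {
        final int value;
        Node next;

        Node(int value, Node next) {
            this.value = value;
            this.next = next;
        }
    }

    // Moves every node of the old chain to a new chain, always inserting at the head.
    static Node transfer(Node oldHead) {
        Node newHead = null;
        Node e = oldHead;
        while (e != null) {
            Node next = e.next; // remember the rest of the old chain
            e.next = newHead;   // link the node in front of the new chain
            newHead = e;
            e = next;
        }
        return newHead;
    }

    // Collects the values of a chain in traversal order.
    static List<Integer> values(Node head) {
        List<Integer> out = new ArrayList<Integer>();
        for (Node e = head; e != null; e = e.next) {
            out.add(e.value);
        }
        return out;
    }

    public static void main(String[] args) {
        Node chain = new Node(1, new Node(2, new Node(3, null)));
        System.out.println(values(chain));           // [1, 2, 3]
        System.out.println(values(transfer(chain))); // [3, 2, 1]
    }
}
```

The loop reads e.next before overwriting it. If a second thread relinks the same nodes between those two steps, the saved pointer can lead back to a node that was already moved, which is how the circular list in the example above comes about.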


java.util.HashMap.java

    /**
     * Associates the specified value with the specified key in this map.
     * If the map previously contained a mapping for the key, the old
     * value is replaced.
     *
     * @param key key with which the specified value is to be associated
     * @param value value to be associated with the specified key
     * @return the previous value associated with <tt>key</tt>, or
     *         <tt>null</tt> if there was no mapping for <tt>key</tt>.
     *         (A <tt>null</tt> return can also indicate that the map
     *         previously associated <tt>null</tt> with <tt>key</tt>.)
     */
    public V put(K key, V value) {
        if (key == null)
            return putForNullKey(value);
        int hash = hash(key.hashCode());
        int i = indexFor(hash, table.length);
        for (Entry<K,V> e = table[i]; e != null; e = e.next) {
            Object k;
            if (e.hash == hash && ((k = e.key) == key || key.equals(k))) {
                V oldValue = e.value;
                e.value = value;
                e.recordAccess(this);
                return oldValue;
            }
        }

        modCount++;
        addEntry(hash, key, value, i);
        return null;
    }

Here put():
1. re-generates the hashcode using the hash(int h) method, passing the key's own hashCode() as the argument
2. generates the index from the re-generated hashcode and the length of the table
3. if the key already exists, overwrites its value; otherwise creates a new entry in the HashMap at the index generated in step 2

Step 3 is straightforward, but steps 1 and 2 need a deeper understanding. Let us dive into the internals of these methods.

Note: These two methods are essential to understanding how HashMap works internally in OpenJDK.

java.util.HashMap.java

    /**
     * Applies a supplemental hash function to a given hashCode, which
     * defends against poor quality hash functions. This is critical
     * because HashMap uses power-of-two length hash tables, that
     * otherwise encounter collisions for hashCodes that do not differ
     * in lower bits. Note: Null keys always map to hash 0, thus index 0.
     */
    static int hash(int h) {
        // This function ensures that hashCodes that differ only by
        // constant multiples at each bit position have a bounded
        // number of collisions (approximately 8 at default load factor).
        h ^= (h >>> 20) ^ (h >>> 12);
        return h ^ (h >>> 7) ^ (h >>> 4);
    }

    /**
     * Returns index for hash code h.
     */
    static int indexFor(int h, int length) {
        return h & (length-1);
    }

Here:
h is the hashcode (an int, so 32 bits)
length is the length of the table, which is DEFAULT_INITIAL_CAPACITY (16) unless a different capacity was requested (also an int)

The comment in the source code above says: "Applies a supplemental hash function to a given hashCode, which defends against poor quality hash functions. This is critical because HashMap uses power-of-two length hash tables, that otherwise encounter collisions for hashCodes that do not differ in lower bits."

What does this mean? It means that if the algorithm we wrote for hashcode generation does not distribute/mix the lower bits evenly, there will be more collisions. For example, suppose the hashcode logic is empId*deptId. If deptId is even, it always generates even hashcodes, because any number multiplied by an even number is even. If we used these hashcodes directly to compute the index and store our objects in the HashMap, then:
1. the odd positions in the table would always be empty
2. because of #1, only the even positions would be used, which doubles the number of collisions
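The effect of such even hashcodes is easy to check. Below is a minimal sketch that applies indexFor directly, without the supplemental hash; the class name EvenHashCodeDemo, the helper usedIndices and the sample ids are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;

class EvenHashCodeDemo {
    // Same masking as HashMap.indexFor, applied to the raw hashcode.
    static int indexFor(int h, int length) {
        return h & (length - 1);
    }

    // Indexes that receive at least one entry when hashCode() is empId * deptId.
    static List<Integer> usedIndices(int deptId, int employees, int length) {
        boolean[] used = new boolean[length];
        for (int empId = 1; empId <= employees; empId++) {
            used[indexFor(empId * deptId, length)] = true;
        }
        List<Integer> out = new ArrayList<Integer>();
        for (int i = 0; i < length; i++) {
            if (used[i]) {
                out.add(i);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // 100 employees in department 2, table of length 16
        System.out.println(usedIndices(2, 100, 16)); // [0, 2, 4, 6, 8, 10, 12, 14]
    }
}
```

All 100 entries crowd into the 8 even positions, and the 8 odd positions stay empty.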

Consider some hash codes that our code might generate. They are perfectly valid, as they are all different, but we will soon see that this does not help:

    1111110111011010101101010111110
    1100110111011010101011010111110
    1100000111011010101110010111110

Suppose we pass these values directly (without using the hash function) to the indexFor method, which does an AND operation between the hashcode and length-1. Since length is always a power of 2, length-1 is always a sequence of 1s. With the default length of 16, the binary representation of 16-1 is 1111, so this is what happens inside the indexFor method:

    1111110111011010101101010111110 & 0000000000000000000000000001111 = 1110
    1100110111011010101011010111110 & 0000000000000000000000000001111 = 1110
    1100000111011010101110010111110 & 0000000000000000000000000001111 = 1110

What is a bucket, and what can be the maximum number of buckets in a HashMap? A bucket is an instance of the linked list (the Entry inner class in my previous post), and there can be at most as many buckets as the length of the table. For example, in a HashMap of length 8 there can be a maximum of 8 buckets, each an instance of a linked list.

From this we understand that all the objects with these different hashcodes get the same index, which means they all go into the same bucket. This is a BIG FAIL, as a lookup then has to walk a linked list, which is O(n) instead of O(1).

Recall the comment from the source code above: "that otherwise encounter collisions for hashCodes that do not differ in lower bits."

Notice this sequence of 0-15 (2^4 values), the range of indexes for the default table size of 16:

    Binary  Decimal        Binary  Decimal
    0000    0              1000    8
    0001    1              1001    9
    0010    2              1010    10
    0011    3              1011    11
    0100    4              1100    12
    0101    5              1101    13
    0110    6              1110    14
    0111    7              1111    15

Notice that in a HashMap with a power-of-two length of 16 (2^4), only the last four bits matter in the allocation of buckets; these four lower bits alone identify the bucket. Keeping the above sequence in mind, we re-generate the hashcode with hash(int h), passing the existing hashcode, which makes sure there is enough variation in the lower bits. We then pass the result to the indexFor() method, which uses the lower bits of the hashcode to identify the bucket and ignores the remaining higher bits.
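The masking also shows why the length has to be a power of two. The sketch below counts how many indexes indexFor can ever produce; the class name PowerOfTwoMaskDemo and the length of 12 are assumptions made purely for illustration, since HashMap itself never uses such a length.

```java
class PowerOfTwoMaskDemo {
    // Same masking as HashMap.indexFor.
    static int indexFor(int h, int length) {
        return h & (length - 1);
    }

    // Counts the distinct indexes produced for the hashcodes 0..1023.
    static int reachableIndexes(int length) {
        boolean[] seen = new boolean[length];
        for (int h = 0; h < 1024; h++) {
            seen[indexFor(h, length)] = true;
        }
        int count = 0;
        for (boolean s : seen) {
            if (s) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        // length 16: the mask is 1111, so every one of the 16 indexes is reachable
        System.out.println(reachableIndexes(16)); // 16
        // length 12: the mask is 1011, so indexes 4, 5, 6 and 7 can never occur
        System.out.println(reachableIndexes(12)); // 8
    }
}
```

With a power-of-two length, length-1 is all 1s and the AND operation simply keeps the lower bits, so no bucket is left unreachable.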

Taking the same hash codes as in the example above, each one is first re-generated with the hash(int h) method:

    1111110111011010101101010111110 ==> 1111001111110011100110111011010
    1100110111011010101011010111110 ==> 1100000010010000101101110011001
    1100000111011010101110010111110 ==> 1100110001001000011011110001011

Passing this set of new hashcodes to the indexFor() method:

    1111001111110011100110111011010 & 0000000000000000000000000001111 = 1010
    1100000010010000101101110011001 & 0000000000000000000000000001111 = 1001
    1100110001001000011011110001011 & 0000000000000000000000000001111 = 1011

So here it is clear that because of the re-generated hashcode, the lower bits are well distributed/mixed, leading to unique indexes and therefore to different buckets, avoiding collisions.
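The numbers above can be reproduced with a few lines of Java. This is a small sketch that copies the hash and indexFor methods from the listing into a standalone class; the class name SupplementalHashDemo and the HASH_CODES array are only there for the demonstration.

```java
class SupplementalHashDemo {
    // Copied from the java.util.HashMap listing above.
    static int hash(int h) {
        h ^= (h >>> 20) ^ (h >>> 12);
        return h ^ (h >>> 7) ^ (h >>> 4);
    }

    // Copied from the java.util.HashMap listing above.
    static int indexFor(int h, int length) {
        return h & (length - 1);
    }

    // The three sample hash codes used in this article, as binary strings.
    static final String[] HASH_CODES = {
        "1111110111011010101101010111110",
        "1100110111011010101011010111110",
        "1100000111011010101110010111110"
    };

    public static void main(String[] args) {
        for (String bits : HASH_CODES) {
            int h = Integer.parseInt(bits, 2);
            int direct = indexFor(h, 16);         // index without the supplemental hash
            int rehashed = indexFor(hash(h), 16); // index after hash(int h)
            System.out.println(direct + " -> " + rehashed);
        }
        // prints:
        // 14 -> 10
        // 14 -> 9
        // 14 -> 11
    }
}
```

Without hash(int h) all three keys share index 14 (binary 1110); with it they land in buckets 10, 9 and 11.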

Why these magic numbers 20, 12, 7 and 4? The theory behind hash functions of this kind is covered in the book The Art of Computer Programming by Donald Knuth.
>> Here we are XORing the most significant bits of the number into the least significant bits (shifting by 20, 12, 7 and 4). The main purpose of this operation is to make differences in the hashcode visible in the least significant bits, so that the HashMap elements are distributed evenly across the buckets.

Going back to the steps of the put() method shown earlier:
1. re-generates the hashcode using the hash(int h) method, passing the key's own hashCode() as the argument
2. generates the index from the re-generated hashcode and the length of the table
3. if the key already exists, overwrites its value; otherwise creates a new entry in the HashMap at the index generated in step 2


Steps 1 and 2 must be clear by now. Step 3:

What happens when two keys have the same hashcode?

1. If the keys are equal, i.e., the to-be-inserted key and an already-inserted key have the same hashcode and are the same key (by reference or via the equals() method), the value of the existing entry is replaced with the new value.
2. If the keys are not equal, the new key-value pair is stored in the same bucket as the existing keys.

When does a collision happen in a HashMap? It happens in case 2 of the question above.

How do you retrieve the value object when two keys with the same hashcode are stored in a HashMap? Using the hashcode we go to the right bucket, and using equals() we find the right element in the bucket and return it.

How are different keys with the same hashcode stored in a HashMap? The usual answer is "in a bucket", but technically they are all stored in a singly linked list. The small difference is that a new element is inserted at the head of the linked list instead of the tail, to avoid traversing to the tail.
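Both cases can be observed from the outside with an ordinary java.util.HashMap. The following is a small sketch; the Key class with its deliberately constant hashCode() is a made-up example type, not something from the JDK.

```java
import java.util.HashMap;
import java.util.Map;

class CollidingKeysDemo {
    // Example key whose hashCode() is the same for every instance.
    static final class Key {
        final String name;

        Key(String name) {
            this.name = name;
        }

        @Override
        public int hashCode() {
            return 42; // every key collides on purpose
        }

        @Override
        public boolean equals(Object o) {
            return o instanceof Key && ((Key) o).name.equals(name);
        }
    }

    static Map<Key, String> build() {
        Map<Key, String> map = new HashMap<Key, String>();
        map.put(new Key("a"), "first");
        map.put(new Key("b"), "second");   // case 2: same hashcode, different key, same bucket
        map.put(new Key("a"), "replaced"); // case 1: equal key, value is overwritten
        return map;
    }

    public static void main(String[] args) {
        Map<Key, String> map = build();
        System.out.println(map.size());            // 2
        System.out.println(map.get(new Key("a"))); // replaced
        System.out.println(map.get(new Key("b"))); // second
    }
}
```

The hashcode only selects the bucket; equals() decides which entry inside the bucket is the right one.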

java.util.HashMap.java

    /**
     * Returns the value to which the specified key is mapped,
     * or {@code null} if this map contains no mapping for the key.
     *
     * <p>More formally, if this map contains a mapping from a key
     * {@code k} to a value {@code v} such that {@code (key==null ? k==null :
     * key.equals(k))}, then this method returns {@code v}; otherwise
     * it returns {@code null}. (There can be at most one such mapping.)
     *
     * <p>A return value of {@code null} does not <i>necessarily</i>
     * indicate that the map contains no mapping for the key; it's also
     * possible that the map explicitly maps the key to {@code null}.
     * The {@link #containsKey containsKey} operation may be used to
     * distinguish these two cases.
     *
     * @see #put(Object, Object)
     */
    public V get(Object key) {
        if (key == null)
            return getForNullKey();
        int hash = hash(key.hashCode());
        int i = indexFor(hash, table.length);
        for (Entry<K,V> e = table[i]; e != null; e = e.next) {
            Object k;
            if (e.hash == hash && ((k = e.key) == key || key.equals(k)))
                return e.value;
        }
        return null;
    }

Here get():
1. re-generates the hashcode using the hash(int h) method, passing the key's own hashCode() as the argument
2. generates the index from the re-generated hashcode and the length of the table
3. points to the right bucket, i.e., table[i], and traverses the linked list, which is built from the Entry inner class
4. when the keys are equal and their hashcodes are equal, returns the value mapped to that key

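To see put() and get() working together, here is a stripped-down sketch of a chained hash table that follows the steps discussed in this article. It is not the JDK implementation: the names TinyChainedMap and Node are hypothetical, the table has a fixed length of 16 and never resizes, and null keys are not supported.

```java
class TinyChainedMap<K, V> {
    // One element of a bucket's linked list (the role played by HashMap.Entry).
    private static final class Node<K, V> {
        final int hash;
        final K key;
        V value;
        Node<K, V> next;

        Node(int hash, K key, V value, Node<K, V> next) {
            this.hash = hash;
            this.key = key;
            this.value = value;
            this.next = next;
        }
    }

    @SuppressWarnings("unchecked")
    private final Node<K, V>[] table = (Node<K, V>[]) new Node[16]; // fixed length, power of two

    static int hash(int h) {
        h ^= (h >>> 20) ^ (h >>> 12);
        return h ^ (h >>> 7) ^ (h >>> 4);
    }

    static int indexFor(int h, int length) {
        return h & (length - 1);
    }

    public V put(K key, V value) {
        int hash = hash(key.hashCode());        // step 1: re-generate the hashcode
        int i = indexFor(hash, table.length);   // step 2: compute the bucket index
        for (Node<K, V> e = table[i]; e != null; e = e.next) {
            if (e.hash == hash && (e.key == key || key.equals(e.key))) {
                V old = e.value;                // step 3a: key exists, overwrite the value
                e.value = value;
                return old;
            }
        }
        table[i] = new Node<K, V>(hash, key, value, table[i]); // step 3b: new entry at the head
        return null;
    }

    public V get(Object key) {
        int hash = hash(key.hashCode());
        int i = indexFor(hash, table.length);
        for (Node<K, V> e = table[i]; e != null; e = e.next) { // walk the bucket's list
            if (e.hash == hash && (e.key == key || key.equals(e.key))) {
                return e.value;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        TinyChainedMap<String, Integer> map = new TinyChainedMap<String, Integer>();
        map.put("one", 1);
        map.put("two", 2);
        System.out.println(map.put("one", 11));  // 1 (the previous value)
        System.out.println(map.get("one"));      // 11
        System.out.println(map.get("missing"));  // null
    }
}
```

The real HashMap adds null-key handling, resizing and modification counting on top of this core.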
Published at DZone with permission of its author, Niranjan Tallapalli.
(Note: Opinions expressed in this article and its replies are the opinions of their respective authors and not those of DZone, Inc.)

Tags: Java, Server-side, Theory
