
2010 Ninth IEEE International Symposium on Network Computing and Applications

Advanced Hashing with Hybrid Key Duplication for IP Address Lookup


Rujiroj Tiengtavat Wei-Ming Lin Department of Electrical and Computer Engineering The University of Texas at San Antonio San Antonio, TX 78249-0669, USA

Abstract
Hashing techniques have been widely adopted for general IP address lookup and for specific network intrusion detection. In most commonly used XOR-hashing algorithms, each hash key bit is explicitly XORed at most once in the hash process, which may limit the amount of randomness the hashing process can introduce. This paper looks into various ways of duplicating and re-using key bits to maximize the randomness needed in the hashing process, so as to further enhance overall performance.

By relaxing the restriction of duplicating only one row of key bits, one is able to XOR more duplicated bits into each hash bit without running into the bit nullification or downgrading problem.

2 XOR-Hashing Methodology
Throughout this paper, the database under discussion is defined as consisting of M = 2^m entries, each n bits in length. It can also be viewed as n M-bit vectors, with each vector consisting of the respective bit from all entries. A commonly used hashing technique simply hashes the n-bit key into an m-bit hash result through a simple process of XORing every distinct group of n/m key bits into a final hash bit. Such a random XORing process (so-called Group-XOR in this paper) may not always lead to a desirable outcome. A much more effective hashing approach is proposed in [4] by preprocessing (and sorting) the database according to a parameter, the d value, which reveals a very useful insight into the degree of uniformity of the database. The d value of a bit vector is the absolute difference between the number of 0s and 1s in it. This leads us to employ a simple pre-processing step that re-arranges the n bit vectors according to their d values, sorted into non-decreasing order. This sorted sequence then gives an order of significance according to which each bit should be utilized. An XOR-hashing algorithm based on the principle of the d value is presented in [6]. This algorithm, d-IOX (d-value in-order XOR folding), applies the aforementioned preprocessing/sorting step before the simple in-order XOR folding. The d-IOX proves to be much better than the simple random Group-XOR approach, registering an improvement in maximum search length (MSL) of up to 30% on randomly generated databases and up to 80% on real IP databases.
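The preprocessing and folding described above can be sketched in a few lines. This is our minimal illustration, not the authors' implementation: the function names, the toy parameters (n, m) = (8, 4), and the bit-level layout of the database are assumptions made for the example.

```python
import random

def d_value(entries, bit):
    """d value of one bit vector: |#zeros - #ones| across all entries."""
    ones = sum((e >> bit) & 1 for e in entries)
    return abs(len(entries) - 2 * ones)

def d_iox_hash(key, order, m):
    """In-order XOR folding: visit the key bits in non-decreasing d-value
    order and fold bit i of the visit sequence into hash bit i mod m."""
    h = 0
    for i, bit in enumerate(order):
        h ^= ((key >> bit) & 1) << (i % m)
    return h

# Toy database: M = 2^m entries of n bits each (names follow the text).
n, m = 8, 4
entries = [random.getrandbits(n) for _ in range(2 ** m)]

# Pre-processing: sort the n bit positions by non-decreasing d value.
order = sorted(range(n), key=lambda b: d_value(entries, b))

h = d_iox_hash(entries[0], order, m)
```

The sort is the only step that touches the whole database; each subsequent lookup only folds the key bits in the precomputed order.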

1 Introduction
A complete survey and complexity analysis of IP address lookup algorithms is provided in [8]. A performance comparison of traditional XOR folding, bit extraction, and CRC-based hash functions is given in [3]. Other hashing algorithms have also been widely adopted for the address lookup process [1, 2, 7, 9]. Hashing techniques using simple XOR operations have been very popular in applications where timely response is critical, due to their relatively small hash-process delay. In all current commonly used XOR-hashing algorithms, when deriving the hash value, key bits are partitioned into rows to be XORed, and each hash key bit is explicitly XORed at most once in the hash process, which may limit the amount of potential randomness that the hash process can introduce. When a key bit is reused (duplicated) for XORing in generating different hash value bits, the overall randomness of the new hash result may potentially increase. In [5] a theory is developed for duplicating bits while avoiding induced bit correlation, which may easily offset any performance gained through bit duplication. A very significant performance improvement was obtained in that series of techniques by employing a novel single-row bit-duplication process that avoids the bit nullification and downgrading problems. This paper aims to further extend the theory to duplicating more than one row of key bits.
978-0-7695-4118-1/10 $26.00 © 2010 IEEE  DOI 10.1109/NCA.2010.54

3 Bit-Duplication XOR Hashing


For the sake of completeness, a summary of the bit-duplication theory presented in [5] is given here. Note that when there is no bit duplication under standard XOR hashing, no bits are shared in the XORing that leads to different hash value bits; that is, each hash value bit comes from XORing a distinct set of hash key bits. If one intends to reuse some key bits for XORing, the overall effectiveness may be compromised by the sharing. When the two sets of hash key bits XORed to obtain two hash value bits have bits in common, an Induced Duplication Correlation (IDC) arises between the two hash value bits; the more bits duplicated for XORing, the higher the IDC tends to be. With the introduction of IDC, the d value obtained for each hash value bit loses some of its meaning. That is, while randomness in the bit-wise distribution (d value) of each bit may increase because more bits are XORed, the overall randomness across the m hash value bits may actually decrease due to the IDC. In [5], a simple cycle-duplication approach is proposed to ensure minimal bit correlation through the duplication process, in which key bits are shared between two groups of source bits to be XORed. One typical problem is the nullification problem, where the same bit is duplicated and XORed with itself in producing a hash bit, which results in the loss of one additional potential bit of randomness. The other problem arises when performing the cycle-duplication process on the same row, as shown in Figure 1, where the first row is duplicated twice in order to further increase the randomness. With this, each pair of hash bits has two key bits shared in their XORing (e.g., hash bits 0 and 1 sharing A and D), leading to a downgrading problem, i.e., a higher degree of IDC.

[Figure 1. Downgrading from additional duplication: rows A B C D, D C A B and C D A B feeding hash bits 0-3, with the shared bits marked.]
In order to duplicate X times without the downgrading problem, [5] shows that the minimal m required is X(X + 1) + 1. The problem can be translated into a graph problem for easier visualization and processing. Borrowing the notation used in [5], let the set of the m bit-position indices be denoted S = {0, 1, 2, ..., m − 1}; these bits are to be duplicated X times such that X satisfies the aforementioned condition. For the sake of simplicity, and without losing generality, assume that each of the X duplicated sequences of the m bits is rotated to start from a particular bit position so as to avoid the two problems. We can then simply focus on the bit-0 position of each of the strings to analyze the whole pattern; the bit-0 position of the original string is at position 0. Let the position of bit 0 of each of the X + 1 strings be denoted s_j, 0 ≤ j ≤ X. Figure 2 shows an illustration for a case with X = 3 and m = 13, with the 13 bit positions placed on a circle; the four starting locations are s_0 = 0, s_1 = 1, s_2 = 3 and s_3 = 9.

[Figure 2. Three-time duplication with m = 13 using cycle duplication.]

With this notation, one can easily show that, in order to avoid any nullification problem, the following condition has to hold:

    s_i ≠ s_j,  for all i, j with 0 ≤ i, j ≤ X and i ≠ j,

which guarantees that no bit position has two identical bits to be XORed. In order to avoid any sharing of multiple bits (i.e., the downgrading problem), the following condition has to be satisfied:

    D_ij ≠ D_kl,  for all i, j, k, l with 0 ≤ i, j, k, l ≤ X and (i, j) ≠ (k, l),

where D_ij denotes the shorter distance from position s_i to position s_j. Essentially, this condition guarantees that no two positions share more than one bit in common.
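The two conditions are easy to check mechanically. The sketch below is our illustration (the function name is ours, not from [5]); it verifies a set of starting positions against both conditions. Note that for X = 3 the offsets {0, 1, 3, 9} of Figure 2 happen to form a perfect difference set modulo 13 = X(X + 1) + 1.

```python
from itertools import combinations

def valid_cycle_offsets(offsets, m):
    """Check the two conditions from the text: distinct starting positions
    (no nullification) and pairwise-distinct shorter circular distances
    (no downgrading, i.e. no two hash bits share more than one key bit)."""
    if len(set(offsets)) != len(offsets):      # s_i == s_j: nullification
        return False
    dists = [min(abs(a - b), m - abs(a - b))   # shorter distance on the circle
             for a, b in combinations(offsets, 2)]
    return len(set(dists)) == len(dists)       # D_ij == D_kl: downgrading

# The X = 3, m = 13 example of Figure 2: starting positions 0, 1, 3, 9.
print(valid_cycle_offsets((0, 1, 3, 9), 13))   # True
print(valid_cycle_offsets((0, 1, 2, 9), 13))   # False: distance 1 repeats
```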

4 Multi-Row Each-Row-Once Duplication


Based on the theory developed in the previous section, one can easily derive that, if each row is only duplicated once, the total number of rows (and thus the total number of duplications) that can be applied without causing the nullification or downgrading problem can be significantly greater. Assuming that the bit rows are indexed as r_i, 1 ≤ i ≤ n/m, and using the same circular fashion for duplication, one can show that if each row i, 1 ≤ i ≤ (m − 1)/2, is duplicated exactly once by rotating its bits by i bit positions, then the maximum number of duplications can be achieved without any of the aforementioned problems. That is, letting Y denote the number of rows thus duplicated, Y ≤ (m − 1)/2. This is demonstrated by the example shown in Figure 3, where each of the three rows, r1, r2 and r3, is cycle-duplicated once by rotating its bits once, twice and thrice, respectively.

[Figure 3. Each-row-once duplication with m = 7 using cycle duplication: rows A-G, H-N and O-U with their rotated duplicates.]
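The construction above can be sketched directly; this is our illustration under the stated assumptions (rows as strings of labeled bits, rotation direction chosen by convention, function name ours).

```python
def each_row_once(rows, m):
    """Duplicate up to Y <= (m-1)//2 rows once each: row i (1-based) is
    cycle-duplicated by rotating it i positions, so each duplicated row
    contributes a distinct circular distance and no pair of hash bits
    shares more than one key bit."""
    y_max = (m - 1) // 2
    dup = [row[-i:] + row[:-i]              # rotate row i by i positions
           for i, row in enumerate(rows[:y_max], start=1)]
    return list(rows) + dup

# Figure 3's setting: m = 7 and three rows r1, r2, r3.
rows = ["ABCDEFG", "HIJKLMN", "OPQRSTU"]
print(each_row_once(rows, 7)[3])   # "GABCDEF": r1 rotated by one position
```

Since the rotations 1, 2, ..., (m − 1)/2 are all distinct shorter distances, no two duplicated rows can induce a shared pair of key bits between the same pair of hash bits.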


5 Hybrid Duplication
Note that, as aforementioned, duplicating rows with higher d values tends to bring limited benefit and can sometimes even be detrimental. In order to maximize the benefit from duplication, one may have to use a hybrid approach that duplicates several rows, each a different number of times. Let X_i denote the number of times
row i is to be duplicated. In order for no nullification or downgrading to happen, the following condition has to be satisfied:

    Σ_{i=1..r} X_i(X_i + 1)/2 ≤ (m − 1)/2,

where r is the total number of rows under duplication. Note that this is a necessary but not a sufficient condition, since a given pattern satisfying it may not be feasible. A simple example: when m = 14, a satisfying pattern of (2, 2) cannot be constructed. This again follows the same reasoning as in the proofs for uni-row and multi-row duplication. Similarly, under m = 15 and n = 64, some of the maximally allowed duplication patterns are (3, 1, 0, 0, 0), (2, 1, 1, 1, 1), (2, 2, 1, 0, 0), and all permutations of each pattern, such as (0, 1, 0, 3, 0) and (1, 2, 0, 1, 2). For example, in the case of (2, 2, 1, 0, 0), as shown in Figure 4, the first row is duplicated two times with the three starting bit positions 0, 1 and 3 (indicated by circles), the second row is duplicated two times with starting positions 0, 4 and 9 (indicated by boxes), and the third row is duplicated once with starting positions 0 and 7 (indicated by triangles). Note that the condition for avoiding the nullification problem still needs to be satisfied within each of the three duplicated rows independently; that is, all three duplicated rows are allowed to share the starting bit position 0.

[Figure 4. Verifying hybrid duplication with (2, 2, 1, 0, 0) on m = 15: the 15 bit positions on a circle, with the starting positions of the first row (circles), second row (boxes) and third row (triangles).]
The condition for avoiding the downgrading problem, again, has to be satisfied within each set of duplicated rows. For the first row, the three distance values between every pair of starting positions are 1 (between 0 and 1), 2 (between 1 and 3) and 3 (between 0 and 3); for the second row the three distance values are 4 (between 0 and 4), 5 (between 4 and 9) and 6 (between 9 and 0); for the third row the only distance value is 7 (between 0 and 7). With this, all available distance values (from 1 to 7) are taken, which represents the maximally allowed duplication situation. The complete duplication pattern is shown in Figure 5.
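The feasibility question can be explored with a small backtracking search. This sketch is our reconstruction, not the authors' procedure: the function name is ours, and the outright rejection of a distance of exactly m/2 (for even m, such a distance pairs the same two hash bits twice) is our reading of why a pattern such as (2, 2) fails for m = 14 despite satisfying the counting bound.

```python
def hybrid_feasible(pattern, m):
    """Search for starting positions realizing a hybrid pattern
    (X_1, ..., X_r): row i appears X_i + 1 times, every row's starting
    positions are distinct, and all pairwise shorter circular distances,
    pooled over all rows, are distinct (and never exactly m/2 for even m).
    The necessary counting condition from the text is checked first."""
    if sum(x * (x + 1) // 2 for x in pattern) > (m - 1) // 2:
        return None
    used, rows = set(), []

    def extend(starts, k, i):
        if k == 0:                      # row i fully placed
            rows.append(starts)
            if i + 1 == len(pattern) or extend([0], pattern[i + 1], i + 1):
                return True
            rows.pop()
            return False
        for s in range(1, m):
            if s in starts:             # nullification: repeated start
                continue
            ds = [min(abs(s - t), m - abs(s - t)) for t in starts]
            if m % 2 == 0 and m // 2 in ds:   # antipodal distance doubles up
                continue
            if len(set(ds)) != len(ds) or used & set(ds):
                continue                # downgrading: repeated distance
            used.update(ds)
            if extend(starts + [s], k - 1, i):
                return True
            used.difference_update(ds)
        return False

    return rows if extend([0], pattern[0], 0) else None

print(hybrid_feasible((2, 2, 1), 15))   # [[0, 1, 3], [0, 4, 9], [0, 7]]
print(hybrid_feasible((2, 2), 14))      # None: bound holds, yet infeasible
```

The first result reproduces exactly the starting positions of the Figure 4 example; the second reproduces the m = 14 counterexample from the text.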


r1:            a0 a1 a2 a3 a4 a5 a6 a7 a8 a9 a10 a11 a12 a13 a14
r2:            b0 b1 b2 b3 b4 b5 b6 b7 b8 b9 b10 b11 b12 b13 b14
r3:            c0 c1 c2 c3 c4 c5 c6 c7 c8 c9 c10 c11 c12 c13 c14
r1 (start 1):  a14 a0 a1 a2 a3 a4 a5 a6 a7 a8 a9 a10 a11 a12 a13
r1 (start 3):  a12 a13 a14 a0 a1 a2 a3 a4 a5 a6 a7 a8 a9 a10 a11
r2 (start 4):  b11 b12 b13 b14 b0 b1 b2 b3 b4 b5 b6 b7 b8 b9 b10
r2 (start 9):  b6 b7 b8 b9 b10 b11 b12 b13 b14 b0 b1 b2 b3 b4 b5
r3 (start 7):  c8 c9 c10 c11 c12 c13 c14 c0 c1 c2 c3 c4 c5 c6 c7

Figure 5. Hybrid Duplication Pattern with (2, 2, 1, 0, 0) on m = 15

The Group-XOR algorithm, which XORs groups of random key bits, is the general base of our comparison, while the d-IOX [6] and the IDC technique from [5] serve as references. The data sets are randomly generated such that the d value for each bit position is uniformly distributed. Performance among the three techniques is compared in terms of MSL, averaging the results of 1,000 runs. In order to rule out the effects of any uncertain factors, no partial rows are considered; that is, n is set to an integral multiple of m. We first compare the effect of uni-row duplication when duplicating different rows. Figure 6 shows the results comparing all possible uni-row duplication patterns for n = 2m and n = 3m.


Figure 6. Simulation Results for Uni-Row Duplications for (n, m) = (a) (32, 16), (b) (30, 10)

In Figure 6(a), where (n, m) = (32, 16), when one row is selected for duplication, the more times it is duplicated, the higher the performance it delivers. Comparing the performance of duplicating different rows, our aforementioned conjecture is clearly verified: duplicating the row with the smallest d values (the first row) leads to the most benefit, while duplicating the row with the largest d values produces the least benefit. In the n = 3m case, uni-row duplication shows a somewhat different result than in the n = 2m case. Duplicating the best row (the first row) again shows the best potential; duplicating either of the non-best rows (the second or the third row) still shows continuously improved performance as more duplications are applied, but its best achievable performance (from (0, 2, 0) or (0, 0, 2)) cannot closely match that of (2, 0, 0).

6 Simulation Results
Simulation runs are performed on randomly generated data sets to demonstrate the performance improvement of the minimal-IDC duplication XOR hash technique over techniques with no duplication. The Group-XOR algorithm serves as the baseline for comparison.

From this result, had the second row been allowed to duplicate a few more times, it might have had a chance to match the first-row duplication; however, the maximal number of duplications that can be applied without causing any nullification or downgrading problem is restricted to 2 for each uni-row duplication under m = 10 in our simulations. Duplicating the third row does not help as much as duplicating the other rows, which is easily explained by the fact that the high d values in the third row inherently limit its duplication potential. Under uni-row duplication, the best results come from the maximally duplicated patterns using the first row. Using the each-row-once duplication approach, one may be able to duplicate the most times, but the benefit may be offset by duplicating the rows with larger d values; moreover, an m that is not large enough to support the maximal number of rows for duplication also limits the potential of this approach. Simulation runs on hybrid duplication deliver the most intriguing results. Figure 7 shows the comparison results for both n = 2m and n = 3m.

In general, duplicating the first row and the second row wherever feasible normally leads to a gain in performance. In (n, m) = (30, 10), (2, 1, 0) produces the best performance, closely followed by (1, 2, 0).
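The MSL metric these comparisons rely on can be sketched in a few lines. This is our illustrative stand-in, not the paper's simulator: the function names, the seed, and the simple in-order folding used in place of a random Group-XOR grouping are all assumptions made for the example.

```python
import random
from collections import Counter

def msl(entries, hash_fn):
    """Maximum search length: the size of the most-loaded hash bucket,
    i.e. the worst-case number of probes for a lookup."""
    return max(Counter(hash_fn(e) for e in entries).values())

def fold_xor(key, n, m):
    """Simple folding stand-in for Group-XOR: XOR the n/m consecutive
    m-bit chunks of the key into one m-bit hash value."""
    h = 0
    for i in range(n // m):
        h ^= (key >> (i * m)) & ((1 << m) - 1)
    return h

random.seed(0)
n, m = 32, 16
entries = [random.getrandbits(n) for _ in range(2 ** m)]
print(msl(entries, lambda e: fold_xor(e, n, m)))
```

In the paper's experiments this quantity is averaged over 1,000 independently generated databases for each technique.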

7 Conclusion
This paper further extends a previously proposed hash design methodology to allow for more performance improvement. The new methodology provides an extra degree of design flexibility and points out a direction for future research, especially for cases with a large number of hash key bits. By providing initial groundwork for duplication in hashing, this paper has pointed out potential areas for improving hashing algorithms and new ways to exploit specific characteristics of the target database.

Acknowledgement: This research was partly supported by NSF grant # HRD-0932339.

References
[1] A. Broder and M. Mitzenmacher, "Using Multiple Hash Functions to Improve IP Lookups," IEEE INFOCOM, 2001.
[2] S. Chung, J. Sungkee, H. Yoon and J. Cho, "A Fast and Updatable IP Address Lookup Scheme," International Conference on Computer Networks and Mobile Computing, 2001.


Figure 7. Simulation Results for Hybrid Duplications for (n, m) = (a) (32, 16), (b) (30, 10)

In the case where n = 2m, some hybrid duplication patterns easily outperform the best uni-row duplication approach. For example, in the case of (n, m) = (32, 16), the patterns (1, 3), (2, 1), (2, 2) and (3, 1) all surpass the performance of (3, 0) by a significant margin, up to an additional 25% of improvement. This important observation reveals that maximally duplicating the best row ((3, 0) in this case) may not be the best approach; instead, duplicating the best row fewer than the maximally allowed times, coupled with duplicating the second row (e.g., (2, 1), (2, 2) or even (1, 2)), actually delivers better results. This can be explained by the performance trend shown in Figure 6, where duplicating a row with smaller d values tends to reach its best potential earlier in terms of the number of duplications applied. For example, (3, 0) does not pose a significant gain over (2, 0), while each additional duplication of the second row obviously provides more benefit. In the case of n = 3m, hybrid patterns produce even more interesting results. First, the pattern (1, 1, 1), obtained by duplicating the third row on top of (1, 1, 0), actually leads to degraded performance, which again can be explained by the large d values in the third row.

[3] R. Jain, "A Comparison of Hashing Schemes for Address Lookup in Computer Networks," IEEE Transactions on Communications, Vol. 40, No. 10, Oct. 1992.
[4] C. Martinez and W.-M. Lin, "Adaptive Hashing Technique for IP Address Lookup in Computer Networks," 14th IEEE International Conference on Networks (ICON 2006), Singapore, September 2006.
[5] C. Martinez and W.-M. Lin, "Advanced Hash Algorithms with Key Bits Duplication for IP Address Lookup," Fifth International Conference on Networking and Services (ICNS 2009), Valencia, Spain, April 2009.
[6] D. Pandya, C. Martinez, W.-M. Lin and P. Patel, "Advanced Hashing Techniques for Non-Uniformly Distributed IP Address Lookup," Third IASTED International Conference on Communications and Computer Networks (CCN 2006), Lima, Peru, October 2006.
[7] D. Pao, C. Liu, L. Yeung and K.S. Chan, "Efficient Hardware Architecture for Fast IP Address Lookup," IEEE INFOCOM, 2002.
[8] M.A. Ruiz-Sanchez, E.W. Biersack and W. Dabbous, "Survey and Taxonomy of IP Address Lookup Algorithms," IEEE Network, Vol. 15, pp. 8-23, Mar./Apr. 2001.
[9] P.A. Yilmaz, A. Belenkiy, N. Uzun, N. Gogate and M. Toy, "A Trie-based Algorithm for IP Lookup Problem," IEEE Global Telecommunications Conference (GLOBECOM), 2000.
