Comparison of Hash Strategy For Flow Based Load Balancing
Witsarut Pittayapitak
Kasetsart University
Faculty of Engineering, 50 Ngamwongwan Rd., Chatuchak, Bangkok
10900, Thailand
[email protected]
Kasom Koht-Arsa
Kasetsart University
Faculty of Engineering, 50 Ngamwongwan Rd., Chatuchak, Bangkok
10900, Thailand
[email protected]
ABSTRACT
Hashing is a powerful tool that is widely used for flow-based load
balancing schemes in parallel processing. In this paper, we analyze and
compare computing overhead and load dispersion characteristics of hash
strategies using XOR and CRC operations under four hashing key schemes
(from 2-tuple to 5-tuple). We conduct experiments with real-life 24-hour
campus network traffic. The results show that XOR32 has the lowest
computing overhead among all hash function groups. Moreover, XOR32
with 4-tuple and XOR32 with 5-tuple are the two outstanding strategies that
provide very good uniform distribution of traffic across multiple links, thus
achieving better load balancing for flow-based applications.
Keywords: Hashing, Hash Functions, Flow-based, Load Balancing
1. INTRODUCTION
In today’s context, network speed has been continuously increasing to
a high gigabit per second rate, while processor and memory speeds have not
260 International Journal of Electronic Commerce Studies
2. RELATED WORK
Hashing offers a simple stateless solution for load balancing by
maintaining a flow with correct packet ordering over a specific link [5]. XOR
and CRC are among the best-known hash functions. Although neither of them
is novel, they can provide high uniformity at low cost [6]. However,
load balancing in practice may not always be perfect because of rapidly
varying and unpredictable traffic patterns [7]. Cao et al. [8] simulated
the performance of several hash functions and showed that CRC16 provides the
best performance tradeoff. However, the trace period was too short to
adequately represent actual load characteristics under modern traffic. In
contrast to Cao et al., Detal et al. [9] reported that CRC16 gave a rather
poor packet distribution. Similarly, our experiments showed that CRC16
performs worse than XOR16 and XOR32; however, their
distribution characteristics do not differ significantly.
Guang et al. [10] compared the uniformity of distribution and computation
speed of the IP Shift-XOR (IPSX) hash algorithm, the Bob algorithm based
on XOR, and the CRC32 algorithm. IPSX offered the fastest execution
time with good uniformity. Jiang et al. [11] proposed Fowler-Noll-Vo (FNV)
Surasak Sanguanpong, Witsarut Pittayapitak, and Kasom Koht-Arsa 261
hash for balancing SIP server clusters. However, our evaluation indicated
that FNV performs worse than XOR and CRC. In this
paper, we complement prior work by analyzing six variants of the XOR
and CRC hashing schemes (XOR8, XOR16, XOR32, CRC8, CRC16, and
CRC32) when applied to flow-based load balancing. Each hashing scheme
is computed with four hashing keys, from 2-tuple to 5-tuple, based on a
combination of packet header fields.
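As an illustration of how such keys might be assembled, the following minimal sketch concatenates header fields into a byte string. The text only states that the 2-tuple consists of the IP addresses; the composition assumed here for the other keys (3-tuple adds the source port, 4-tuple adds the destination port, 5-tuple adds the protocol number) is a hypothetical choice, as are the names FlowHeader and build_key.

```python
import ipaddress
import struct
from typing import NamedTuple


class FlowHeader(NamedTuple):
    """Header fields of one packet (hypothetical container for this sketch)."""
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int
    protocol: int


def build_key(header: FlowHeader, k: int) -> bytes:
    """Concatenate the first k fields into a hashing key.

    Assumed field order (not confirmed by the paper beyond the 2-tuple):
    source IP, destination IP, source port, destination port, protocol.
    """
    if k not in (2, 3, 4, 5):
        raise ValueError("k must be between 2 and 5")
    # The 2-tuple uses only the two IP addresses (IPv4 or IPv6).
    key = (ipaddress.ip_address(header.src_ip).packed
           + ipaddress.ip_address(header.dst_ip).packed)
    if k >= 3:
        key += struct.pack("!H", header.src_port)   # 16-bit, network order
    if k >= 4:
        key += struct.pack("!H", header.dst_port)   # 16-bit, network order
    if k == 5:
        key += struct.pack("!B", header.protocol)   # 8-bit protocol number
    return key
```

Under these assumptions an IPv4 flow yields keys of 8, 10, 12, and 13 bytes for k = 2 to 5, and an IPv6 flow yields keys of 32, 34, 36, and 37 bytes.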
[Figure: LOAD BALANCER connected to processing elements PE0 through PEN-1]
can generate a pseudorandom number, hence it has been widely used for
hashing. If K is a key with hash table size n, and ⊕ is the exclusive-OR
operator, then the XOR operation can be expressed as:
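One common formulation, given here as a sketch rather than as the authors' exact expression, splits K into fixed-width words, combines the words with ⊕, and reduces the result modulo n. The word width, the zero padding of a short final word, the big-endian byte order, and the function names xor_fold and xor_hash are all assumptions made for illustration.

```python
def xor_fold(key: bytes, width_bits: int = 32) -> int:
    """XOR together the width_bits-wide words of key (XOR8/XOR16/XOR32 style)."""
    if width_bits not in (8, 16, 32):
        raise ValueError("width_bits must be 8, 16, or 32")
    word = width_bits // 8
    # Assumption: pad the key with zero bytes up to a whole number of words.
    padded = key + b"\x00" * (-len(key) % word)
    folded = 0
    for i in range(0, len(padded), word):
        # Assumption: words are interpreted in big-endian byte order.
        folded ^= int.from_bytes(padded[i:i + word], "big")
    return folded


def xor_hash(key: bytes, n: int, width_bits: int = 32) -> int:
    """Map key K to an index in a hash table of size n."""
    if n <= 0:
        raise ValueError("n must be positive")
    return xor_fold(key, width_bits) % n
```

Because the fold needs only one XOR per word, a wider word means fewer operations for the same key length.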
4. MEASUREMENT RESULTS
The performance of the hash functions is measured in two respects: (1)
computing overhead time and (2) load distribution, using the Coefficient of
Variation [14] (CV). The CV is the ratio of the standard deviation to the
mean (CV = σ/μ). The lower the CV, the smaller the variance
(and the better the balancing). We run the tests using a snapshot of the Kasetsart
[Figure 2 panels: time (ns) versus size of hash key (bytes) for XOR8, XOR16, XOR32, CRC8, CRC16, and CRC32; (a) IPv4, (b) IPv6]
Figure 2. Computing overhead of hashing for (a) IPv4 and (b) IPv6
4.2 Distribution
We select XOR32 and CRC32 as the representative hash functions because
of their minimal computational overhead. We compare the load dispersion of
the four hashing key schemes in three distribution scenarios, using CV against
the number of nodes, which ranges from 2 to 256. A size of 256 buckets is
selected in order to exploit the fast memory access of the CPU’s L1 cache. The
following are observations from the experimental results:
Packet Distribution: Figures 3(a) and 3(b) show that the 3-, 4-, and 5-tuple
keys deliver very good packet distribution in the same class. The 2-tuple
key is distinctly the worst performer.
[Figure 3. Coefficient of Variation versus number of nodes (2 to 256) for 2-, 3-, 4-, and 5-tuple keys: (a) XOR32, (b) CRC32]
Size Distribution: XOR32 and CRC32 show similar results for the
distribution by byte count, as illustrated in Figures 4(a) and 4(b). The
hashing key schemes with 4- and 5-tuples show smooth distribution in the
same class, while the 2-tuple again shows poorer distribution for large
numbers of nodes.
Flow Distribution: XOR32 and CRC32 show similar results for the
flow distribution, as illustrated in Figures 5(a) and 5(b). The hashing key
schemes with 3-, 4-, and 5-tuples have nearly identical distribution
characteristics and show excellent flow distribution. The 2-tuple performs
extremely poorly for almost every number of nodes, since load distribution
is based only on the IP addresses. We conclude that good distribution can be
achieved for a k-tuple key when k > 3.
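The three distribution measures above can be sketched as follows. This is a minimal illustration, not the measurement code used in the experiments: the record format (key, packets, bytes), the use of zlib.crc32 as a stand-in for CRC32, the population form of the standard deviation, and the modulo mapping from the 256 buckets to nodes are all assumptions.

```python
import statistics
import zlib
from typing import Iterable, List, Tuple

Record = Tuple[bytes, int, int]  # (hash key, packet count, byte count)


def coefficient_of_variation(loads: List[float]) -> float:
    """CV = standard deviation / mean (population form assumed)."""
    mean = statistics.fmean(loads)
    if mean == 0:
        return 0.0
    return statistics.pstdev(loads) / mean


def node_loads(records: Iterable[Record], n_nodes: int, buckets: int = 256):
    """Accumulate per-node packet, byte, and flow counts."""
    packets = [0] * n_nodes
    sizes = [0] * n_nodes
    flows = [0] * n_nodes
    seen = set()
    for key, n_packets, n_bytes in records:
        bucket = zlib.crc32(key) % buckets   # hash index into the bucket table
        node = bucket % n_nodes              # assumed bucket-to-node mapping
        packets[node] += n_packets
        sizes[node] += n_bytes
        if key not in seen:                  # count each distinct key once
            seen.add(key)
            flows[node] += 1
    return packets, sizes, flows


def distribution_cv(records: List[Record], n_nodes: int):
    """Return (packet CV, size CV, flow CV) for one node count."""
    packets, sizes, flows = node_loads(records, n_nodes)
    return (coefficient_of_variation(packets),
            coefficient_of_variation(sizes),
            coefficient_of_variation(flows))
```

Calling distribution_cv once per node count in 2, 4, 8, ..., 256 gives one curve per hashing key scheme, which is the shape of the comparison reported in Figures 3 to 5.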
[Figure 4. Coefficient of Variation versus number of nodes (2 to 256) for 2-, 3-, 4-, and 5-tuple keys: (a) XOR32, (b) CRC32]
[Figure 5. Coefficient of Variation versus number of nodes (2 to 256) for 2-, 3-, 4-, and 5-tuple keys: (a) XOR32, (b) CRC32]
5. CONCLUSION
We present experimental results on the computational overhead of
XOR/CRC hash functions and their load distribution characteristics under
four hashing key schemes. Our experiments show that XOR32 has the
lowest computational overhead, followed by XOR16. We found that for
every hash function, the 4-tuple and 5-tuple keys show the best load
distribution. The combination of XOR32 with a 4-tuple key is
the best candidate strategy to provide excellent random hash indices with
minimal computational overhead.
6. REFERENCES
[1] K. Nam-Uk, S. Jung, and T. Chung, An efficient hash-based load
balancing scheme to support parallel NIDS. Lecture Notes in Computer
Science, 6782, p537-549, 2011.
http://dx.doi.org/10.1007/978-3-642-21928-3_39.
[2] K. Koht-arsa, and S. Sanguanpong, A centralized state repository
approach to highly scalable and high-availability parallel firewall.
Journal of Computers 8(7), p1664-1676, 2013.
http://dx.doi.org/10.4304/jcp.8.7.1664-1676.
[3] K. Koht-arsa, and S. Sanguanpong, High availability and scalable
parallel stateful firewall design. Presented at the International
Conference on Internet Studies, Bangkok, August 17-19, 2012.
[4] P. N. Ayuso, R. M. Gasca, and L. Lefevre, Demystifying cluster-based
fault-tolerant firewalls. Internet Computing, 13(6), p31-38, 2009.
http://dx.doi.org/10.1109/MIC.2009.128.
[5] S. Prabhavat, H. Nishiyama, N. Ansari, and N. Kato, On load
distribution over multipath networks. IEEE Communications Surveys
and Tutorials, 14(3), p662-680, 2011.
http://dx.doi.org/10.1109/SURV.2011.082511.00013.
[6] B. Xiong, K. Yang, F. Li, X. Chen, J. Zhang, Q. Tang, and Y. Luo, The
impact of bitwise operators on hash uniformity in network packet
processing. International Journal of Communication Systems, 27(11),
p3158-3184, 2014. http://dx.doi.org/10.1002/dac.2532.
[7] M. Molina, S. Niccolini, and N.G. Duffield, A comparative
experimental study of hash functions applied to packet sampling.
Presented at the 19th International Teletraffic Congress, Beijing,
August 29-September 2, 2005.
[8] Z. Cao, Z. Wang, and E. Zegura, Performance of hashing-based
schemes for Internet load balancing. In F. Bauer (Ed.), Proceedings of
the Annual Joint Conference of the IEEE Computer and
Communications Societies (vol.1) (p332-341). Tel Aviv: IEEE Press,
2000. http://dx.doi.org/10.1109/INFCOM.2000.832203.
[9] G. Detal, C. Paasch, S. Linden, P. Mérindol, G. Avoine, and O.
Bonaventure, Revisiting flow-based load balancing: Stateless path
selection in data center networks. Computer Networks, 57(5),
p1204-1216, 2013. http://dx.doi.org/10.1016/j.comnet.2012.12.011.
[10] C. Guang, Z. Wei, and G. Jian, XOR hashing algorithms to measured
flows at the high-speed link. In B. Werne (Ed.), Proceedings of the
International Conference on Future Generation Communication and
Networking (vol. 1) (p152-155). Hainan Island: IEEE Press, 2008.
http://dx.doi.org/10.1109/FGCN.2008.110.
[11] H. Jiang, H. Jiang, A. Iyengar, E. Nahum, W. Segmuller, A. Tantawi,
and C. Wright, Design, implementation, and performance of a load
balancer for SIP server clusters. IEEE Transactions on Networking,
20(4), p1190-1202, 2012.
http://dx.doi.org/10.1109/TNET.2012.2183612.
[12] A. Doering, and M. Waldvogel, Fast and flexible CRC calculation.