HBase write throughput as a function of number of column qualifiers
In HBase, every cell value is stored along with all of its coordinates, as follows:
rowkey:columnfamily:columnqualifier:timestamp:value
Hypothetically, let us assume the following
Data payload size = 10 kb
rowkey size = 64 kb
columnfamily:columnname size = 60 kb
To write a row with, say, 2 columns (the 10 kb payload split evenly between them), the total amount of data transferred and written is
2 * (5 kb + 64 kb + 60 kb) = 258 kb
To write the same payload as a row with 1 column, the total is
1 * (10 kb + 64 kb + 60 kb) = 134 kb
The larger the total size, the more data is transferred across the network and the sooner the memstore fills up, which triggers more frequent flushes. This negatively impacts write throughput.
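The arithmetic above can be sketched as a small Python function. This is a simplified model that counts only the row key, the columnfamily:columnname and the value, exactly as in the hypothetical figures above; the function and parameter names are illustrative, and other per-cell fields (such as the timestamp) are ignored.

```python
def bytes_written(payload, rowkey, column, num_columns):
    """Total size written for one row when `payload` is split evenly
    across `num_columns` cells (all sizes in the same unit)."""
    per_cell_value = payload / num_columns
    # Every cell repeats the row key and the columnfamily:columnname.
    return num_columns * (per_cell_value + rowkey + column)


if __name__ == "__main__":
    # Same hypothetical sizes as above: 10 kb payload, 64 kb row key, 60 kb column name.
    for n in (1, 2, 5, 10):
        print(n, bytes_written(payload=10, rowkey=64, column=60, num_columns=n))
```

Because the payload term stays constant, the total grows linearly with the number of columns in this model: each extra column adds one more copy of the row key and column name.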
Verifying this behaviour using the HBase LoadTestTool:
Summary
- Rows                   : 10k          10k          10k
- Columns (avg)          : 2            5            10
- Payload per cell (avg) : 512 bytes    200 bytes    100 bytes
- Total payload per row  : ~1000 bytes  1000 bytes   1000 bytes
- Throughput             : 405 keys/s   252 keys/s   175 keys/s
Details
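We start with 10000 rows, an average of 2 columns per row and an average payload of 512 bytes per cell, specified as -write 2:512:20 (the trailing 20 is the number of writer threads, shown as [W:20] in the log).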
$ hbase org.apache.hadoop.hbase.util.LoadTestTool -write 2:512:20 -num_keys 10000
14/08/19 11:47:26 INFO Configuration.deprecation: hadoop.native.lib is deprecated. Instead, use io.native.lib.available
Key range: [0..9999]
Multi-puts: false
Columns per key: 1..4
Data size per column: 256..768
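The ranges the tool prints at startup appear to follow from the -write argument: in the runs in this post, the column count ranges from 1 to twice the average, and the data size from half to one and a half times the average. A minimal Python sketch of that mapping, inferred only from the output shown in this post (the function name is hypothetical, the default of 20 threads is an assumption, and this is not the tool's actual code):

```python
def expand_write_spec(spec):
    """Expand '<avg_cols>:<avg_size>[:<threads>]' into the ranges
    LoadTestTool reports at startup, as observed in this post."""
    parts = spec.split(":")
    avg_cols, avg_size = int(parts[0]), int(parts[1])
    # Assumption: 20 writer threads when the third field is omitted.
    threads = int(parts[2]) if len(parts) > 2 else 20
    return {
        "columns_per_key": (1, 2 * avg_cols),
        "data_size_per_column": (avg_size // 2, avg_size * 3 // 2),
        "threads": threads,
    }


if __name__ == "__main__":
    print(expand_write_spec("2:512:20"))
```

For 2:512:20 this gives 1..4 columns per key and 256..768 per column, matching the output above.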
The log below is captured at 5-second intervals. At the 20-second mark, the overall write throughput is 405 keys/s.
Starting to write data...
14/08/19 11:47:39 INFO util.MultiThreadedAction: [W:20] Keys=1663, cols=5.6 K, time=00:00:05 Overall: [keys/s= 332, latency=58 ms] Current: [keys/s=332, latency=58 ms], wroteUpTo=-1
14/08/19 11:47:44 INFO util.MultiThreadedAction: [W:20] Keys=3641, cols=12.3 K, time=00:00:10 Overall: [keys/s= 361, latency=54 ms] Current: [keys/s=395, latency=51 ms], wroteUpTo=-1
14/08/19 11:47:49 INFO util.MultiThreadedAction: [W:20] Keys=5769, cols=19.5 K, time=00:00:15 Overall: [keys/s= 382, latency=51 ms] Current: [keys/s=425, latency=46 ms], wroteUpTo=-1
14/08/19 11:47:54 INFO util.MultiThreadedAction: [W:20] Keys=8128, cols=27.6 K, time=00:00:20 Overall: [keys/s= 405, latency=49 ms] Current: [keys/s=471, latency=42 ms], wroteUpTo=-1
Failed to write keys: 0
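To compare runs without reading the figures off by eye, the progress lines can be parsed with a regular expression. The following Python sketch assumes the line layout shown in the log above; the pattern and the function name are illustrative and may need adjusting for other HBase versions.

```python
import re

# Matches the "Keys=..., cols=..., time=... Overall: [keys/s= N, latency=M ms]" part.
PROGRESS = re.compile(
    r"Keys=(?P<keys>\d+), cols=[^,]+, time=(?P<time>\d+:\d+:\d+) "
    r"Overall: \[keys/s=\s*(?P<overall>\d+), latency=(?P<latency>\d+) ms\]"
)


def parse_progress(line):
    """Return the overall counters from one progress line, or None if it does not match."""
    m = PROGRESS.search(line)
    if m is None:
        return None
    hours, minutes, seconds = (int(x) for x in m.group("time").split(":"))
    return {
        "keys": int(m.group("keys")),
        "seconds": hours * 3600 + minutes * 60 + seconds,
        "overall_keys_per_sec": int(m.group("overall")),
        "overall_latency_ms": int(m.group("latency")),
    }
```

Applying parse_progress to the last progress line of a run yields the overall keys/s figure quoted in the summary.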
We repeat the test with an average of 5 columns, a 200-byte average payload per cell and 10k rows.
$ hbase org.apache.hadoop.hbase.util.LoadTestTool -write 5:200:20 -num_keys 10000
14/08/19 14:38:20 INFO Configuration.deprecation: hadoop.native.lib is deprecated. Instead, use io.native.lib.available
Key range: [0..9999]
Multi-puts: false
Columns per key: 1..10
Data size per column: 100..300
.
.
Starting to write data...
14/08/19 14:38:31 INFO util.MultiThreadedAction: [W:20] Keys=901, cols=5.6 K, time=00:00:05 Overall: [keys/s= 180, latency=106 ms] Current: [keys/s=180, latency=106 ms], wroteUpTo=-1
14/08/19 14:38:36 INFO util.MultiThreadedAction: [W:20] Keys=1979, cols=12.4 K, time=00:00:10 Overall: [keys/s= 197, latency=99 ms] Current: [keys/s=215, latency=92 ms], wroteUpTo=-1
14/08/19 14:38:41 INFO util.MultiThreadedAction: [W:20] Keys=3070, cols=19.3 K, time=00:00:15 Overall: [keys/s= 204, latency=96 ms] Current: [keys/s=218, latency=91 ms], wroteUpTo=-1
14/08/19 14:38:46 INFO util.MultiThreadedAction: [W:20] Keys=4367, cols=27.7 K, time=00:00:20 Overall: [keys/s= 218, latency=90 ms] Current: [keys/s=259, latency=77 ms], wroteUpTo=-1
14/08/19 14:38:51 INFO util.MultiThreadedAction: [W:20] Keys=5857, cols=36.9 K, time=00:00:25 Overall: [keys/s= 234, latency=84 ms] Current: [keys/s=298, latency=66 ms], wroteUpTo=-1
14/08/19 14:38:56 INFO util.MultiThreadedAction: [W:20] Keys=7373, cols=46.4 K, time=00:00:30 Overall: [keys/s= 245, latency=80 ms] Current: [keys/s=303, latency=65 ms], wroteUpTo=-1
14/08/19 14:39:01 INFO util.MultiThreadedAction: [W:20] Keys=8843, cols=55.7 K, time=00:00:35 Overall: [keys/s= 252, latency=78 ms] Current: [keys/s=294, latency=67 ms], wroteUpTo=-1
Failed to write keys: 0
As seen above, the write throughput has dropped to 252 keys/s.
Increasing the number of columns further to 10, with a 100-byte average payload per cell, reduces the write throughput to 175 keys/s.
$ hbase org.apache.hadoop.hbase.util.LoadTestTool -write 10:100:20 -num_keys 10000
14/08/19 14:34:54 INFO Configuration.deprecation: hadoop.native.lib is deprecated. Instead, use io.native.lib.available
Key range: [0..9999]
Multi-puts: false
Columns per key: 1..20
Data size per column: 50..150
Starting to write data...
14/08/19 14:35:07 INFO util.MultiThreadedAction: [W:20] Keys=582, cols=6.4 K, time=00:00:05 Overall: [keys/s= 116, latency=168 ms] Current: [keys/s=116, latency=168 ms], wroteUpTo=-1
14/08/19 14:35:12 INFO util.MultiThreadedAction: [W:20] Keys=1157, cols=13.0 K, time=00:00:10 Overall: [keys/s= 115, latency=171 ms] Current: [keys/s=115, latency=173 ms], wroteUpTo=-1
14/08/19 14:35:17 INFO util.MultiThreadedAction: [W:20] Keys=1884, cols=21.0 K, time=00:00:15 Overall: [keys/s= 125, latency=158 ms] Current: [keys/s=145, latency=137 ms], wroteUpTo=-1
14/08/19 14:35:22 INFO util.MultiThreadedAction: [W:20] Keys=2687, cols=30.0 K, time=00:00:20 Overall: [keys/s= 134, latency=147 ms] Current: [keys/s=160, latency=123 ms], wroteUpTo=-1
14/08/19 14:35:27 INFO util.MultiThreadedAction: [W:20] Keys=3558, cols=39.8 K, time=00:00:25 Overall: [keys/s= 142, latency=139 ms] Current: [keys/s=174, latency=115 ms], wroteUpTo=-1
14/08/19 14:35:32 INFO util.MultiThreadedAction: [W:20] Keys=4513, cols=50.5 K, time=00:00:30 Overall: [keys/s= 150, latency=132 ms] Current: [keys/s=191, latency=104 ms], wroteUpTo=-1
14/08/19 14:35:37 INFO util.MultiThreadedAction: [W:20] Keys=5410, cols=60.5 K, time=00:00:35 Overall: [keys/s= 154, latency=128 ms] Current: [keys/s=179, latency=111 ms], wroteUpTo=-1
14/08/19 14:35:42 INFO util.MultiThreadedAction: [W:20] Keys=6322, cols=70.8 K, time=00:00:40 Overall: [keys/s= 157, latency=126 ms] Current: [keys/s=182, latency=109 ms], wroteUpTo=-1
14/08/19 14:35:47 INFO util.MultiThreadedAction: [W:20] Keys=7280, cols=81.8 K, time=00:00:45 Overall: [keys/s= 161, latency=123 ms] Current: [keys/s=191, latency=104 ms], wroteUpTo=-1
14/08/19 14:35:52 INFO util.MultiThreadedAction: [W:20] Keys=8496, cols=95.6 K, time=00:00:50 Overall: [keys/s= 169, latency=117 ms] Current: [keys/s=243, latency=82 ms], wroteUpTo=-1
14/08/19 14:35:57 INFO util.MultiThreadedAction: [W:20] Keys=9632, cols=108.1 K, time=00:00:55 Overall: [keys/s= 175, latency=113 ms] Current: [keys/s=227, latency=87 ms], wroteUpTo=-1
Failed to write keys: 0
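The three overall figures can be put side by side relative to the 2-column run. This small Python sketch uses only the keys/s values reported above; the variable and function names are illustrative.

```python
# Average columns per row -> overall keys/s, taken from the three runs above.
RUNS = {2: 405, 5: 252, 10: 175}


def relative_throughput(runs, baseline_cols=2):
    """Express each run's keys/s as a fraction of the baseline run."""
    base = runs[baseline_cols]
    return {cols: rate / base for cols, rate in runs.items()}


if __name__ == "__main__":
    for cols, fraction in relative_throughput(RUNS).items():
        print(f"{cols} columns: {fraction:.0%} of the 2-column throughput")
```

With these numbers, the 5-column run reaches roughly 62% and the 10-column run roughly 43% of the 2-column throughput, for about the same total payload per row.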