
Reduce getNodeByQuery overhead #13221

Open
wants to merge 13 commits into base: unstable
Conversation

@fcostaoliveira fcostaoliveira commented Apr 17, 2024

This PR makes the following changes based on CPU profile info. The getNodeByQuery function accounts for 8.2% of the 12.3% overhead observed when comparing a single-shard cluster with standalone.
Proposed changes:

  • Inline keyHashSlot to reduce the overhead of that function call
  • Reduce duplicate calls to getCommandFlags within getNodeByQuery
  • Move crc16 to a header file and inline it to reduce the overhead of that function call

The above changes yield an improvement of approximately 5% in achievable ops/sec, as shown below.

results

steps to reproduce

2 Nodes.
DB node (swap IP):

taskset -c 0 ./src/redis-server --save '' --cluster-announce-ip 192.168.1.200 --bind 192.168.1.200 --protected-mode no --requirepass perf --daemonize yes --cluster-enabled yes --logfile redis.log
redis-cli -h 192.168.1.200 -a perf flushall
redis-cli -h 192.168.1.200 -a perf cluster flushslots
redis-cli -h 192.168.1.200 -a perf cluster addslotsrange 0 16383 
redis-cli -h 192.168.1.200 -a perf cluster info

Client node:

taskset -c 0,1 memtier_benchmark --server 192.168.1.200 --port 6379 --authenticate perf --cluster-mode --pipeline 10 --data-size 100 --ratio 1:0 --key-pattern P:P --key-minimum=1 --key-maximum 1000000 --test-time 180 -c 25 -t 2 --hide-histogram

results unstable (804110a)

root@micro1:~# taskset -c 0,1 memtier_benchmark --server 192.168.1.200 --port 6379 --authenticate perf --cluster-mode --pipeline 10 --data-size 100 --ratio 1:0 --key-pattern P:P --key-minimum=1 --key-maximum 1000000 --test-time 180 -c 25 -t 2 --hide-histogram
Writing results to stdout
[RUN #1] Preparing benchmark client...
[RUN #1] Launching threads now...
[RUN #1 100%, 180 secs]  0 threads:   107876070 ops,  599582 (avg:  599300) ops/sec, 83.99MB/sec (avg: 83.95MB/sec),  0.83 (avg:  0.83) msec latency

2         Threads
25        Connections per thread
180       Seconds


ALL STATS
======================================================================================================================================================
Type         Ops/sec     Hits/sec   Misses/sec    MOVED/sec      ASK/sec    Avg. Latency     p50 Latency     p99 Latency   p99.9 Latency       KB/sec 
------------------------------------------------------------------------------------------------------------------------------------------------------
Sets       599300.19          ---          ---         0.00         0.00         0.83133         0.82300         1.25500         3.07100     85967.12 
Gets            0.00         0.00         0.00         0.00         0.00             ---             ---             ---             ---         0.00 
Waits           0.00          ---          ---          ---          ---             ---             ---             ---             ---          --- 
Totals     599300.19         0.00         0.00         0.00         0.00         0.83133         0.82300         1.25500         3.07100     85967.12 

results this PR (7e1d2ea)

root@micro1:~# taskset -c 0,1 memtier_benchmark --server 192.168.1.200 --port 6379 --authenticate perf --cluster-mode --pipeline 10 --data-size 100 --ratio 1:0 --key-pattern P:P --key-minimum=1 --key-maximum 1000000 --test-time 180 -c 25 -t 2 --hide-histogram
Writing results to stdout
[RUN #1] Preparing benchmark client...
[RUN #1] Launching threads now...
[RUN #1 100%, 180 secs]  0 threads:   112251930 ops,  618650 (avg:  623611) ops/sec, 86.66MB/sec (avg: 87.36MB/sec),  0.80 (avg:  0.80) msec latency

2         Threads
25        Connections per thread
180       Seconds


ALL STATS
======================================================================================================================================================
Type         Ops/sec     Hits/sec   Misses/sec    MOVED/sec      ASK/sec    Avg. Latency     p50 Latency     p99 Latency   p99.9 Latency       KB/sec 
------------------------------------------------------------------------------------------------------------------------------------------------------
Sets       623611.19          ---          ---         0.00         0.00         0.79897         0.79100         1.20700         2.76700     89454.70 
Gets            0.00         0.00         0.00         0.00         0.00             ---             ---             ---             ---         0.00 
Waits           0.00          ---          ---          ---          ---             ---             ---             ---             ---          --- 
Totals     623611.19         0.00         0.00         0.00         0.00         0.79897         0.79100         1.20700         2.76700     89454.70 

@fcostaoliveira changed the title from "Improve.get node by query" to "Reduce getNodeByQuery overhead" on Apr 17, 2024
```c
for (e = s+1; e < keylen; e++)
    if (key[e] == '}') break;

/* No '}' or nothing between {} ? Hash the whole key. */
```
Contributor
While we're at it, do we want to put an "unlikely" here? I am far from an expert, just asking.

Author

Will test the impact on the results and reply back.

sundb and others added 8 commits May 28, 2024 11:10
…after defragment (redis#13231)

Introduced by redis#13013

After defragmenting the dictionary in the kvstore, if the dict is
reallocated, the value of its node in the kvstore rehashing list must be
updated.
By my mistake, in the last revert commit in redis#13231 I intended to revert
the last fix but reverted the penultimate one instead.
Now that we have fixed another potential memory read issue in [`743f1dd`
(redis#13231)](redis@743f1dd),
reverting this just avoids confusion; I will verify in the future
whether it has any impact, and if so we will backport this PR.

Failed CI: https://github.com/sundb/redis/actions/runs/8826731960
Because it does not cause any propagation (arguably it should; see the
comment in the tcl file).

The motivation for this fix is that in 6.2, if dirty changed without
propagation inside MULTI/EXEC, it would cause propagation of EXEC only,
which would result in the replica sending errors to its master.
Added reverse history search to redis-cli. Use it with the following:

* CTRL+R : enable search backward mode, and search the next match when
pressing CTRL+R again, until reaching index 0.
```
127.0.0.1:6379> keys one
127.0.0.1:6379> keys two
(reverse-i-search):                   # press CTRL+R
(reverse-i-search): keys two          # input `keys`
(reverse-i-search): keys one          # press CTRL+R again
(reverse-i-search): keys one          # press CTRL+R again, still `keys one` due to reaching index 0
(i-search): keys two                  # press CTRL+S, enable search forward
(i-search): keys two                  # press CTRL+S, still `keys two` due to reaching index 1
```

* CTRL+S : enable search forward mode, and search the next match when pressing
CTRL+S again, until reaching the last index.
```
127.0.0.1:6379> keys one
127.0.0.1:6379> keys two
(i-search):                       # press CTRL+S
(i-search): keys one              # input `keys`
(i-search): keys two              # press CTRL+S again
(i-search): keys two              # press CTRL+S again, still `keys two` due to reaching index 1
(reverse-i-search): keys one      # press CTRL+R, enable search backward
(reverse-i-search): keys one      # press CTRL+S, still `keys one` due to reaching index 1
```

* CTRL+G : disable
```
127.0.0.1:6379> keys one
127.0.0.1:6379> keys two
(reverse-i-search):                   # press CTRL+R
(reverse-i-search): keys two          # input `keys`
127.0.0.1:6379>                       # press CTRL+G
```

* CTRL+C : disable
```
127.0.0.1:6379> keys one
127.0.0.1:6379> keys two
(reverse-i-search):                   # press CTRL+R
(reverse-i-search): keys two          # input `keys`
127.0.0.1:6379>                       # press CTRL+C
```

* TAB : use the current search result and exit search mode
```
127.0.0.1:6379> keys one
127.0.0.1:6379> keys two
(reverse-i-search):                # press CTRL+R
(reverse-i-search): keys two       # input `keys`
127.0.0.1:6379> keys two           # press TAB
```

* ENTER : use the current search result and execute the command
```
127.0.0.1:6379> keys one
127.0.0.1:6379> keys two
(reverse-i-search):                 # press CTRL+R
(reverse-i-search): keys two        # input `keys`
127.0.0.1:6379> keys two            # press ENTER
(empty array)
127.0.0.1:6379>
```

* any arrow key will disable reverse search

Your result will have the search match bolded; you can press ENTER to
execute the full result.

Note: I have _only added this for multi-line mode_, as it seems to be
forced that way when `repl` is called.

Closes: redis#8277

---------

Co-authored-by: Clayton Northey <clayton@knowbl.com>
Co-authored-by: Viktor Söderqvist <viktor.soderqvist@est.tech>
Co-authored-by: debing.sun <debing.sun@redis.com>
Co-authored-by: Bjorn Svensson <bjorn.a.svensson@est.tech>
Co-authored-by: Viktor Söderqvist <viktor@zuiderkwast.se>
…SP3 in MULTI (redis#13255)

This test was introduced by redis#13251.
Normally we auto-transform the reply format of XREADGROUP to an array under
RESP3 (see trasformer_funcs).
But when we execute the XREADGROUP command inside MULTI this doesn't work,
which causes the new test to fail.
The solution is to verify the reply of XREADGROUP in advance rather than
inside MULTI.

Failed validate schema CI:
https://github.com/redis/redis/actions/runs/9025128323/job/24800285684

---------

Co-authored-by: guybe7 <guy.benoish@redislabs.com>
…an bug (redis#13150)" (redis#13266)

The kernel config `vm.mmap_rnd_bits` has been reverted in
actions/runner-images#9491, so we can revert
the changes from redis#13150.

CI only with ASAN passed:
https://github.com/sundb/redis/actions/runs/9058263634
…ommand (redis#13276)

This PR is based on the commits from PR redis#12944.

Allow SPUBLISH command within multi/exec on replica

Behavior on unstable:

```
127.0.0.1:6380> CLUSTER NODES
39ce8aa20f1f0d91f1a88d976ee1926dfefcdf1a 127.0.0.1:6380@16380 myself,slave 8b0feb120b68aac489d6a5af9c77dc40d71bc792 0 0 0 connected
8b0feb120b68aac489d6a5af9c77dc40d71bc792 127.0.0.1:6379@16379 master - 0 1705091681202 0 connected 0-16383
127.0.0.1:6380> SPUBLISH hello world
(integer) 0
127.0.0.1:6380> MULTI
OK
127.0.0.1:6380(TX)> SPUBLISH hello world
QUEUED
127.0.0.1:6380(TX)> EXEC
(error) MOVED 866 127.0.0.1:6379
```

With this change:

```
127.0.0.1:6380> SPUBLISH hello world
(integer) 0
127.0.0.1:6380> MULTI
OK
127.0.0.1:6380(TX)> SPUBLISH hello world
QUEUED
127.0.0.1:6380(TX)> EXEC
1) (integer) 0
```

---------

Co-authored-by: Harkrishn Patro <harkrisp@amazon.com>
Co-authored-by: oranagra <oran@redislabs.com>
`reclaimFilePageCache` did not set `errno`, but `rdbSaveInternal`, which
logs the error, assumed it did. This change makes sure `errno` is set.

Fixes redis#13245

Signed-off-by: Ted Lyngmo <ted@lyncon.se>