Fixing some remaining riak-admin relics #1886

Open
wants to merge 17 commits into base: develop

Conversation

JMercerGit

Making changes to some outdated riak-admin outputs so that they are consistent with the new riak admin format.

Andrei Zavada and others added 17 commits April 28, 2022 16:39
The PR has now been accepted into redbug, so return to the master branch. Changes to pass eqc tests on OTP 24; all eunit tests also passing on OTP 25.1.1.
Add cmake to allow build of snappy
* Bound reap/erase queues

Also don't log queues on a crash, to avoid over-sized crash dumps.

* Create generic behaviour for eraser/reaper

Have the queue backed by disk, so that beyond a certain size it overflows from memory to disk (and once the on-disk part of the queue has been consumed, the files are removed). A sketch of this overflow pattern follows at the end of this list.

* Stop cleaning folder

Risk of misconfiguration leading to wiping of the wrong data. Also, starting a job may lead to the main process having its disk_log wiped.

* Setup folders correctly

Avoid enoent errors

* Correct log

* Further log correction

* Correct cleaning of directories for test

* Switch to action/2

* Update eqc tests for refactoring of reaper/eraser

* Improve comments/API

* Pause reaper on overload

Check for a soft overload on any vnode before reaping. This will add some delay - but reap has been shown to potentially overload a cluster ... availability is more important than reap speed.

There is no catch for {error, mailbox_overload} should it occur - it should not occur, as the mailbox check should prevent it. If it does, the reaper will crash (and restart without any reaps), returning to a known safe position. A sketch of this check-before-reap flow also follows at the end of this list.

* Adjustments following review

The queue files generated are still in UUID format, but the id now incorporates the creation date. This should make it easier to detect and clean any garbage that might be accrued.

One use of '++' has also been removed (although a lists:flatten/1 was still required at the end in this case).
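A minimal sketch of the memory-to-disk overflow pattern described under "Create generic behaviour for eraser/reaper" above, assuming a plain disk_log for the on-disk portion. The module, record and function names are illustrative only, not the actual riak_kv_overflow_queue API.

```erlang
%% Sketch only - names are illustrative, not the riak_kv_overflow_queue API.
-module(overflowq_sketch).
-export([new/3, add/2, close/1]).

-record(oflq, {mem, mem_size = 0, mem_limit, log}).

new(Name, FileName, MemLimit) ->
    %% The on-disk overflow is a disk_log; it exists only to cap memory
    %% use, not to persist the queue across restarts.
    {ok, Log} = disk_log:open([{name, Name}, {file, FileName}]),
    #oflq{mem = queue:new(), mem_limit = MemLimit, log = Log}.

%% Under the in-memory limit, keep the item in the in-memory queue ...
add(Item, Q = #oflq{mem_size = N, mem_limit = Max}) when N < Max ->
    Q#oflq{mem = queue:in(Item, Q#oflq.mem), mem_size = N + 1};
%% ... beyond the limit, append it to the disk_log instead.
add(Item, Q = #oflq{log = Log}) ->
    ok = disk_log:log(Log, Item),
    Q.

%% Once the on-disk part has been consumed (or on shutdown) the file is
%% removed, so nothing carries over a restart.
close(#oflq{log = Log}) ->
    FileName = proplists:get_value(file, disk_log:info(Log)),
    ok = disk_log:close(Log),
    file:delete(FileName).
```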
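Similarly, a sketch of the check-before-reap flow from "Pause reaper on overload": the overload check and the reap call are passed in as funs, standing in for the real vnode mailbox check and reap request, so this is not the riak_kv_reaper API.

```erlang
%% Sketch only - OverloadCheckFun and ReapFun stand in for the real vnode
%% soft-overload check and reap request.
-module(reaper_pause_sketch).
-export([maybe_reap/3]).

-define(PAUSE_MS, 1000).

maybe_reap(ReapRef, OverloadCheckFun, ReapFun) ->
    case OverloadCheckFun() of
        true ->
            %% A vnode reports a soft overload: delay this reap rather than
            %% add to its mailbox - availability over reap speed.
            erlang:send_after(?PAUSE_MS, self(), {requeue_reap, ReapRef}),
            paused;
        false ->
            %% No overload detected, so reap now.  There is deliberately no
            %% catch here: an unexpected {error, mailbox_overload} crashes
            %% the reaper, which restarts empty - a known safe position.
            ok = ReapFun(ReapRef),
            reaped
    end.
```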
# Conflicts:
#	priv/riak_kv.schema
#	rebar.config
#	src/riak_kv_overflow_queue.erl
#	src/riak_kv_reaper.erl
#	src/riak_kv_replrtq_src.erl
#	src/riak_kv_test_util.erl
See basho#1804

The heart of the problem is how to avoid needing configuration changes on sink clusters when source clusters are being changed. This allows new nodes to be discovered automatically, starting from the configured nodes.

Default behaviour is to always fall back to the configured behaviour.

Worker Counts and Per Peer Limits need to be set based on an understanding of whether this will be enabled. If the per-peer limit is left at its default, the consequence is that the worker count will be evenly distributed (independently by each node). Note that if Worker Count mod (Src Node Count) =/= 0, there will be no balancing of the excess workers across the sink nodes - see the sketch below.
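Illustrative arithmetic only (not code from riak_kv_replrtq_snk): how a worker count divides across discovered source nodes when the per-peer limit is left at default.

```erlang
%% split(8, 3) -> {2, 2}: two workers per source node, with two excess
%% workers that are not balanced further across the sink nodes.
-module(snk_worker_split_sketch).
-export([split/2]).

split(WorkerCount, SrcNodeCount) when SrcNodeCount > 0 ->
    {WorkerCount div SrcNodeCount, WorkerCount rem SrcNodeCount}.
```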
# Conflicts:
#	rebar.config
#	src/riak_kv_replrtq_peer.erl
#	src/riak_kv_replrtq_snk.erl
#	src/riak_kv_replrtq_src.erl
To simplify the configuration, rather than have the operator select all_check, day_check, range_check etc., there is now a default strategy of auto_check which tries to do a sensible thing (see the sketch below):

- do range_check when a range is set, otherwise
- do all_check if it is out of hours (in the all_check window), otherwise
- do day_check

Some stats have been added to help with monitoring; the detail is also still in the console logs.
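A simplified sketch of the auto_check ordering above; the real decision sits in riak_kv_ttaaefs_manager, and the two boolean inputs ("a range is set", "within the all_check window") are assumptions made for illustration.

```erlang
%% Sketch only - not the real riak_kv_ttaaefs_manager logic.
-module(auto_check_sketch).
-export([choose/2]).

choose(RangeIsSet, InAllCheckWindow) ->
    case {RangeIsSet, InAllCheckWindow} of
        {true, _}      -> range_check; %% a range has been set
        {false, true}  -> all_check;   %% out of hours, within the window
        {false, false} -> day_check    %% otherwise
    end.
```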

See basho#1815
# Conflicts:
#	src/riak_kv_stat.erl
#	src/riak_kv_ttaaefs_manager.erl
Expand on use of riak_kv_overflow_queue so that it is used by the riak_kv_replrtq_src, as well as riak_kv_reaper and riak_kv_eraser.

This means that larger queue sizes can be supported for riak_kv_replrtq_src without having to worry about compromising the memory of the node. This should allow repl_keys_range AAE folds to generate very large replication sets without the fold having to pause (so that real-time replication can keep up) and thereby clog the node worker pool.

The overflow queues are deleted on shutdown (if there is a queue on disk). The feature exists to allow for larger queues without memory exhaustion; persistence is not used to carry queues across restarts.

Overflow Queues extended to include a 'reader' queue which may be used for read_repairs. Currently this queue is only used for the repair_keys_range query and the read-repair trigger.
# Conflicts:
#	priv/riak_kv.schema
#	src/riak_kv_overflow_queue.erl
#	src/riak_kv_replrtq_src.erl
Introduces a reader overflowq for doing read repair operations.  Initially this is used for:

- repair_keys_range aae_fold - avoids the pausing of the fold that would block the worker pool;
- repair on key_amnesia - triggers the required repair rather than causing an AAE delta;
- repair on finding a corrupted object when folding to rebuild aae_store - previously the fold would crash, and the AAE store would therefore never be rebuilt.  [This PR](martinsumner/kv_index_tictactree#106) is required to make this consistent in both AAE solutions.
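A sketch of the pattern shared by the three triggers above: each one queues a read of the affected key (which drives the read repair) rather than repairing inline. The trigger atoms and request_read/1 are placeholders, not the actual reader-queue API.

```erlang
%% Sketch only - trigger names and request_read/1 are illustrative.
-module(reader_trigger_sketch).
-export([on_trigger/2]).

on_trigger(Trigger, {Bucket, Key})
        when Trigger == repair_keys_range;
             Trigger == key_amnesia;
             Trigger == aae_rebuild_corruption ->
    %% Queue the key for an asynchronous read (and so a read repair)
    %% rather than doing the work inline in the fold or vnode.
    request_read({Bucket, Key}).

request_read(BKey) ->
    %% Placeholder for pushing onto the reader overflow queue.
    {queued, BKey}.
```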
# Conflicts:
#	priv/riak_kv.schema
#	rebar.config
#	src/riak_kv_clusteraae_fsm.erl
#	src/riak_kv_index_hashtree.erl
#	src/riak_kv_vnode.erl
The merges had removed the stat updates for ttaae full-sync (detected by riak_test).

A log had been introduced in riak_kv_replrtq_peer that could crash (detected by riak_test).

The safety change to avoid coordination in full-sync (setting the time for the first work item from the beginning of the next hour) makes sense with 24 slices (one per hour) ... but less sense with different values. riak_test, which uses a very high slice_count to avoid delays, then failed.

# Conflicts:
#	src/riak_kv_replrtq_peer.erl
#	src/riak_kv_ttaaefs_manager.erl
# Conflicts:
#	rebar.config
Fixing some remaining riak-admin relics that should be `riak admin`
Fixing some remaining riak-admin relics