Joining a node after committing a plan - transfers freeze & cluster state is stuck #996

Open
martinsumner opened this issue Jan 5, 2023 · 0 comments

To replicate:

  • Start six nodes
  • Join nodes 2, 3 and 4 to the first node, but not nodes 5 and 6
  • Plan and commit the cluster changes
  • Attempt to join nodes 5 and 6 (i.e. without planning, just running riak admin cluster join)
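
For anyone scripting the steps above, the command sequence can be sketched roughly as follows. This is only an illustration: the `dev/devN/riak/bin/riak` layout and the `devN@127.0.0.1` node names are taken from the transcript in this issue, the join commands for nodes 2 to 4 are assumed to have the same form as the ones shown for nodes 5 and 6, and `repro_commands` is a hypothetical helper name. Starting the six nodes is not covered, and the function only builds the command strings; it does not run them.

```python
def repro_commands(root="dev", first=1, initial=(2, 3, 4), late=(5, 6), admin_node=4):
    """Return, in order, the shell commands for the replication steps.

    Assumes the devrel-style layout seen in the transcript
    (<root>/dev<N>/riak/bin/riak) and node names dev<N>@127.0.0.1.
    """
    def riak(n):
        return f"{root}/dev{n}/riak/bin/riak"

    target = f"dev{first}@127.0.0.1"
    cmds = []
    # Stage joins for the first batch of nodes.
    for n in initial:
        cmds.append(f"{riak(n)} admin cluster join {target}")
    # Plan and commit from one node (dev4 in the transcript).
    cmds.append(f"{riak(admin_node)} admin cluster plan")
    cmds.append(f"{riak(admin_node)} admin cluster commit")
    # Stage joins for the late nodes while transfers are still running,
    # without a further plan/commit.
    for n in late:
        cmds.append(f"{riak(n)} admin cluster join {target}")
    return cmds


if __name__ == "__main__":
    for cmd in repro_commands():
        print(cmd)
```

The late joins need to be issued while the transfers from the first commit are still in progress, which is what the transcript below shows.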

Transfers stop at the point the additional nodes join, and the cluster remains stuck in that state:

$ dev/dev4/riak/bin/riak admin cluster plan
=============================== Staged Changes ================================
Action         Details(s)
-------------------------------------------------------------------------------
join           'dev2@127.0.0.1'
join           'dev3@127.0.0.1'
join           'dev4@127.0.0.1'
-------------------------------------------------------------------------------


NOTE: Applying these changes will result in 1 cluster transition

###############################################################################
                         After cluster transition 1/1
###############################################################################

================================= Membership ==================================
Status     Ring    Pending    Node
-------------------------------------------------------------------------------
valid     100.0%     25.0%    dev1@127.0.0.1
valid       0.0%     25.0%    dev2@127.0.0.1
valid       0.0%     25.0%    dev3@127.0.0.1
valid       0.0%     25.0%    dev4@127.0.0.1
-------------------------------------------------------------------------------
Valid:4 / Leaving:0 / Exiting:0 / Joining:0 / Down:0

Transfers resulting from cluster changes: 48
  16 transfers from 'dev1@127.0.0.1' to 'dev4@127.0.0.1'
  16 transfers from 'dev1@127.0.0.1' to 'dev3@127.0.0.1'
  16 transfers from 'dev1@127.0.0.1' to 'dev2@127.0.0.1'

$ dev/dev4/riak/bin/riak admin cluster commit
Cluster changes committed
$ dev/dev4/riak/bin/riak admin cluster status
---- Cluster Status ----
Ring ready: true

+--------------------+------+-------+-----+-------+
|        node        |status| avail |ring |pending|
+--------------------+------+-------+-----+-------+
| (C) dev1@127.0.0.1 |valid |  up   |100.0|  25.0 |
|     dev2@127.0.0.1 |valid |  up   |  0.0|  25.0 |
|     dev3@127.0.0.1 |valid |  up   |  0.0|  25.0 |
|     dev4@127.0.0.1 |valid |  up   |  0.0|  25.0 |
+--------------------+------+-------+-----+-------+

Key: (C) = Claimant; availability marked with '!' is unexpected
$ dev/dev5/riak/bin/riak admin cluster join dev1@127.0.0.1
Success: staged join request for 'dev5@127.0.0.1' to 'dev1@127.0.0.1'
$ dev/dev6/riak/bin/riak admin cluster join dev1@127.0.0.1
Success: staged join request for 'dev6@127.0.0.1' to 'dev1@127.0.0.1'
$ dev/dev4/riak/bin/riak admin cluster status
---- Cluster Status ----
Ring ready: true

+--------------------+-------+-------+-----+-------+
|        node        |status | avail |ring |pending|
+--------------------+-------+-------+-----+-------+
|     dev5@127.0.0.1 |joining|  up   |  0.0|   0.0 |
|     dev6@127.0.0.1 |joining|  up   |  0.0|   0.0 |
| (C) dev1@127.0.0.1 | valid |  up   | 71.9|  25.0 |
|     dev2@127.0.0.1 | valid |  up   |  9.4|  25.0 |
|     dev3@127.0.0.1 | valid |  up   | 10.9|  25.0 |
|     dev4@127.0.0.1 | valid |  up   |  7.8|  25.0 |
+--------------------+-------+-------+-----+-------+

Key: (C) = Claimant; availability marked with '!' is unexpected
$ dev/dev4/riak/bin/riak admin cluster status
---- Cluster Status ----
Ring ready: false

+--------------------+-------+-------+-----+-------+
|        node        |status | avail |ring |pending|
+--------------------+-------+-------+-----+-------+
|     dev5@127.0.0.1 |joining|  up   |  0.0|   0.0 |
|     dev6@127.0.0.1 |joining|  up   |  0.0|   0.0 |
| (C) dev1@127.0.0.1 | valid |  up   | 71.9|  25.0 |
|     dev2@127.0.0.1 | valid |  up   |  9.4|  25.0 |
|     dev3@127.0.0.1 | valid |  up   | 10.9|  25.0 |
|     dev4@127.0.0.1 | valid |  up   |  7.8|  25.0 |
+--------------------+-------+-------+-----+-------+

Key: (C) = Claimant; availability marked with '!' is unexpected


$ dev/dev4/riak/bin/riak admin cluster status
---- Cluster Status ----
Ring ready: true

+--------------------+-------+-------+-----+-------+
|        node        |status | avail |ring |pending|
+--------------------+-------+-------+-----+-------+
|     dev5@127.0.0.1 |joining|  up   |  0.0|   0.0 |
|     dev6@127.0.0.1 |joining|  up   |  0.0|   0.0 |
| (C) dev1@127.0.0.1 | valid |  up   | 71.9|  25.0 |
|     dev2@127.0.0.1 | valid |  up   |  9.4|  25.0 |
|     dev3@127.0.0.1 | valid |  up   | 10.9|  25.0 |
|     dev4@127.0.0.1 | valid |  up   |  7.8|  25.0 |
+--------------------+-------+-------+-----+-------+

Key: (C) = Claimant; availability marked with '!' is unexpected
$ dev/dev4/riak/bin/riak admin cluster status
---- Cluster Status ----
Ring ready: true

+--------------------+-------+-------+-----+-------+
|        node        |status | avail |ring |pending|
+--------------------+-------+-------+-----+-------+
|     dev5@127.0.0.1 |joining|  up   |  0.0|   0.0 |
|     dev6@127.0.0.1 |joining|  up   |  0.0|   0.0 |
| (C) dev1@127.0.0.1 | valid |  up   | 71.9|  25.0 |
|     dev2@127.0.0.1 | valid |  up   |  9.4|  25.0 |
|     dev3@127.0.0.1 | valid |  up   | 10.9|  25.0 |
|     dev4@127.0.0.1 | valid |  up   |  7.8|  25.0 |
+--------------------+-------+-------+-----+-------+

Key: (C) = Claimant; availability marked with '!' is unexpected
$ dev/dev4/riak/bin/riak admin transfers
'dev6@127.0.0.1' waiting to handoff 30 partitions
'dev5@127.0.0.1' waiting to handoff 30 partitions
'dev4@127.0.0.1' waiting to handoff 27 partitions
'dev3@127.0.0.1' waiting to handoff 30 partitions
'dev2@127.0.0.1' waiting to handoff 11 partitions

Active Transfers:


$ dev/dev4/riak/bin/riak admin transfers
'dev6@127.0.0.1' waiting to handoff 30 partitions
'dev5@127.0.0.1' waiting to handoff 30 partitions
'dev4@127.0.0.1' waiting to handoff 27 partitions
'dev3@127.0.0.1' waiting to handoff 30 partitions
'dev2@127.0.0.1' waiting to handoff 11 partitions

Active Transfers:
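
A minimal sketch of how the stuck state could be detected from the `riak admin transfers` text shown above: partitions are waiting to hand off, yet nothing is listed under "Active Transfers:". The parsing assumes the exact line format in this transcript; `summarize_transfers` and `looks_stalled` are hypothetical helper names, not part of Riak.

```python
import re

# Matches lines such as: 'dev6@127.0.0.1' waiting to handoff 30 partitions
WAITING = re.compile(r"^'(?P<node>[^']+)' waiting to handoff (?P<count>\d+) partitions?$")


def summarize_transfers(output):
    """Split transfers output into ({node: waiting_count}, [active lines])."""
    waiting = {}
    active = []
    in_active = False
    for raw in output.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("Active Transfers:"):
            in_active = True
            continue
        if in_active:
            # Anything listed after the header counts as an active transfer.
            active.append(line)
            continue
        match = WAITING.match(line)
        if match:
            waiting[match.group("node")] = int(match.group("count"))
    return waiting, active


def looks_stalled(output):
    """True when partitions are waiting but no transfer is active."""
    waiting, active = summarize_transfers(output)
    return sum(waiting.values()) > 0 and not active
```

A single snapshot with no active transfers can be transient, so a check like this would only be meaningful if it holds across repeated runs, as it does in the two identical outputs above.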


@martinsumner changed the title from "Joining a node after committing a plan - plan will freeze" to "Joining a node after committing a plan - transfers freeze & cluster state is stuck" on Jan 5, 2023