Use RDMA instead of TCP for state transfer #157

Open
etremel opened this issue May 7, 2020 · 3 comments

etremel (Contributor) commented May 7, 2020

State-transfer operations in Derecho (i.e. the transfer of a serialized Replicated Object from one node to another) currently use a set of TCP sockets that are stored in ViewManager. Originally we used this design because state transfer was part of the initial setup of adding a node to a group, and we wouldn't add the new node into SST or RDMC until after it had finished joining the group. However, now that we can create peer-to-peer RDMA connections independently of the main RDMC/SST multicast groups, the state transfer operation could be done with a peer-to-peer RDMA connection.
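
For reference, the current path boils down to a pattern like the sketch below; the types and helper names here are placeholders standing in for Derecho's tcp::socket and serialization code, not the actual ViewManager implementation.

```cpp
// Sketch of the existing TCP-based state-transfer pattern.
// All types and helper names here are placeholders, not Derecho's real API.
#include <cstddef>
#include <cstdint>
#include <vector>

// Stand-in for Derecho's tcp::socket (blocking send/receive on one TCP connection).
struct TcpSocket {
    void write(const uint8_t* bytes, std::size_t size) { /* send `size` bytes */ }
    void read(uint8_t* bytes, std::size_t size) { /* receive `size` bytes */ }
};

// Sender side: after serializing the Replicated Object (elided), ship its size
// followed by the bytes over the per-peer socket that ViewManager keeps around
// for joins and view changes.
void send_object_state(TcpSocket& peer_socket, const std::vector<uint8_t>& serialized_object) {
    std::size_t size = serialized_object.size();
    peer_socket.write(reinterpret_cast<const uint8_t*>(&size), sizeof(size));
    peer_socket.write(serialized_object.data(), size);
}

// Receiver side: read the size, then the bytes; deserialization into the local
// Replicated Object is elided.
std::vector<uint8_t> receive_object_state(TcpSocket& peer_socket) {
    std::size_t size = 0;
    peer_socket.read(reinterpret_cast<uint8_t*>(&size), sizeof(size));
    std::vector<uint8_t> buffer(size);
    peer_socket.read(buffer.data(), size);
    return buffer;
}
```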

There are two options for how we could implement this:

  1. Use the existing P2PConnections in RPCManager to send each Replicated Object. This would be easier for transferring state between existing members that have been re-assigned from one subgroup/shard to another, but would require some refactoring in order to work for new members. Right now, a P2PConnection to a new member is set up in RPCManager's new_view_callback, which does not get called until after state transfer is complete (at the end of ViewManager::finish_view_change). Also, for both new and existing members, ViewManager would need to have access to RPCManager in order to get the P2PConnections, or the P2PConnections object would have to be shared between them (in the way the tcp_connections used to be shared).
  2. Create a new P2PConnection, or something similar, that only exists during the state transfer process between nodes that need to do state transfer. This might introduce some redundancy and add the overhead of another RDMA queue pair, but it would also be less destructive to our existing setup of RPCManager and P2PConnections. It would also allow us to customize the P2P RDMA connection that we use for state transfer to have a different buffer size than the one we use for peer-to-peer RPC messages, which might be necessary if the Replicated Objects we're transferring are much larger than the maximum message size for an RPC.

I think option 2 is slightly better, but since I didn't implement P2PConnections, I'm not confident in that opinion. Whatever we decide, though, this seems like a good opportunity for speeding up the process of restarting or joining a group, since transferring a lot of state over a TCP socket is an obvious bottleneck.
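
As a rough sketch of what option 2 could look like (all names and signatures below are hypothetical, not existing Derecho classes), the state-transfer connection might be a short-lived object that registers its own, appropriately sized buffer:

```cpp
// Hypothetical interface for a short-lived, state-transfer-only RDMA connection
// (option 2). Names and signatures are illustrative, not Derecho's actual API.
#include <cstddef>
#include <cstdint>
#include <vector>

class StateTransferConnection {
public:
    // Open a dedicated RDMA queue pair to `remote_id`, registering a buffer
    // sized for whole serialized Replicated Objects rather than RPC messages.
    StateTransferConnection(uint32_t remote_id, std::size_t buffer_size);

    // Send the serialized bytes of one Replicated Object to the remote node.
    void send_object_state(const std::vector<uint8_t>& serialized_object);

    // Receive the serialized bytes of one Replicated Object from the remote node.
    std::vector<uint8_t> receive_object_state();

    // Destroying the connection at the end of state transfer releases the extra
    // queue pair, so it never lingers alongside RPCManager's P2PConnections.
    ~StateTransferConnection();
};

// Possible use from ViewManager during a view change (also illustrative):
//   StateTransferConnection conn(joiner_id, serialized_object.size());
//   conn.send_object_state(serialized_object);
//   // conn goes out of scope here, tearing down the temporary queue pair
```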

(Note that this issue is a specialization of #118)

KenBirman (Contributor) commented May 7, 2020 via email

etremel added a commit that referenced this issue May 14, 2020
Since it's not likely that we will completely eliminate TCP sockets from
ViewManager any time soon (see issues #118 and #157), we should at least
make our usage of TCP less confusing. The port named "rpc_port" in all
of our configuration files is actually not used for RPC operations at
all, but for transferring Views and object state between nodes during a
view change. Renaming this port will make it clear that there is no RPC
activity going over TCP.
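
For illustration, the relevant part of a node's config file would then read something like the snippet below; the renamed key and the port values shown here are assumptions, so check the updated sample derecho.cfg for the actual name and defaults.

```
[DERECHO]
# Formerly "rpc_port": this TCP port carries Views and serialized object state
# during view changes, not RPC traffic. Key name and values are illustrative.
state_transfer_port = 28366
gms_port = 23580
sst_port = 37683
rdmc_port = 31675
```
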
etremel added a commit that referenced this issue May 15, 2020
KenBirman (Contributor)

Issue seems to be resolved at this point, with the work Edward did on v2.0

etremel (Contributor, Author) commented Jul 1, 2020

Actually, this is not resolved yet in version 2.0. As you can see in the latest version of Group::receive_objects, we still receive state data over a TCP socket:

LockedReference<std::unique_lock<std::mutex>, tcp::socket> leader_socket
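
Purely as an illustration of the change this issue calls for (the declarations below are hypothetical, not Derecho's actual code), the receive path would swap the locked TCP socket for a peer-to-peer RDMA connection handle:

```cpp
// Hypothetical contrast of the two receive paths; neither declaration is the
// actual Derecho code. The Derecho types are only forward-declared to keep the
// sketch self-contained, and "P2PConnection" stands for whichever connection
// type option 1 or 2 above would provide.
#include <mutex>

namespace tcp { class socket; }
template <typename LockType, typename T> class LockedReference;
class P2PConnection;

// Today: serialized object state is read from a TCP socket to the old shard
// leader, obtained (under a lock) from ViewManager.
void receive_objects_over_tcp(
        LockedReference<std::unique_lock<std::mutex>, tcp::socket> leader_socket);

// With this issue resolved: the same bytes would arrive over a peer-to-peer
// RDMA connection, so ViewManager would no longer need to hold TCP sockets.
void receive_objects_over_rdma(P2PConnection& leader_connection);
```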

etremel reopened this Jul 1, 2020