Support TCP for protocol messages #3242

Open
softins opened this issue Feb 27, 2024 · 20 comments

@softins
Member

softins commented Feb 27, 2024

What is the current behaviour and why should it be changed?

All Jamulus protocol (non-audio) messages are currently delivered over the same UDP channel as the audio. For most protocol messages, this is fine, but those that send a list of servers from a directory, or a list of clients from a server, can generate a UDP datagram that is too large to fit into a single physical packet. Physical packets are constrained by the MTU of the Ethernet interface (normally 1500 bytes or less), and further by any limitations in links between hops on the internet. Neither the client nor the server has any control over these limitations. It's also possible a large welcome message could require fragmentation.

The UDP protocol itself allows datagrams to be up to nearly 65535 bytes in size, minus any protocol overhead. IPv4 will allow nearly all of this size to be used, in theory. If the IPv4 datagram being sent by a node (host or router) is too large to fit into a single packet on the outgoing interface, the IP protocol will fragment it into pieces that do fit, with IP headers that contain the information needed to order and reassemble the fragments into a single datagram at the receiving end. Normally intermediate hops do not perform any reassembly, but will further fragment an IP packet if it will not fit the MTU of the outgoing interface.

The receiving end needs to store all the received fragments as they arrive and can only reassemble them into the original datagram once all fragments have been received. The loss of even one fragment renders the whole datagram lost, and the remaining received fragments consume resources until they time out and are discarded. There are also possibilities for a denial of service attack if an attacker deliberately sends lots of fragments with one or more missing.

If a directory has more than around 35 servers registered (depending on the length of the name, city, etc.), the list of servers sent to a client when requested is certain to be fragmented. Similarly, if a powerful server has a lot of clients connected, e.g. a big band or large choir, the list of clients sent to each connected client can get fragmented. In either of these cases, a client that is unable to receive fragmented IP packets will show an empty list or an empty mixer panel.

There are several reasons that fragmented IP datagrams can fail to make it from server to client:

  • The configuration of a user's router, either accidentally or deliberately. Sometimes a user can be helped by a knowledgeable friend to check and fix this, but often not.
  • The configuration of an intermediate router along the path from server to client. This is fairly rare, but could be a carrier's deliberate choice to avoid the kind of DoS attack mentioned above. For whatever reason, it is outside the control of the user or server operator.
  • The IPv6 protocol deliberately has no provision for fragmentation of datagrams at the IP layer. So this is a complete show-stopper for the use of IPv6 in directories, as there is therefore no support at all for large UDP messages.

The IPv6 limitation means that resolving this issue is a prerequisite to implementing IPv6 support in directories as per the ongoing discussion in https://github.com/orgs/jamulussoftware/discussions/1950.

Describe possible approaches

There is a longstanding discussion at https://github.com/orgs/jamulussoftware/discussions/1058 about the problems this issue is intended to solve, and mentioning various approaches that have been tried or proposed.

  • Limiting the size of directories. This doesn't go far enough, and as mentioned above, a directory needs to be really small (less than 30 or so servers) to be sure of avoiding fragmentation.
  • Implementing "split" messages at the Jamulus protocol level using REQ_SPLIT_MESS_SUPPORT, SPLIT_MESS_SUPPORTED and SPECIAL_SPLIT_MESSAGE. I'm not sure whether such split messages are ever used in practice, and it appears that they only apply to connected messages, not the connectionless messages which are most at risk from fragmentation. In addition, the size of split parts is fixed, and not intelligently determined from any kind of path MTU discovery.
  • Having the directory also send a "reduced" server list with only the bare information of name, IP and port (CLM_RED_SERVER_LIST). This also fails to avoid the problem, as a directory list that may take around 7 fragments in its full form still takes around 3 fragments in its reduced form.
  • I experimented with zlib compression of server lists (https://github.com/orgs/jamulussoftware/discussions/1058#discussioncomment-8354688), but it only provides around 40% compression, not enough to avoid fragmentation.

The only possible solution is to send some protocol messages using TCP instead of UDP, when talking to a compatible client. UDP would still be available for backward compatibility when talking to older clients or older servers.

There are two kinds of protocol message that each need to be handled differently:

  • Connectionless messages CLM_*. These are unrelated to a channel (with one exception). They are mainly used by a client to fetch information for the Connect dialog:
    • List of servers from a directory (CLM_REQ_SERVER_LIST).
    • List of connected clients from a server (CLM_REQ_CONN_CLIENTS_LIST).
    • Small messages such as requests, ping, version and OS, register and unregister server. These are small enough never to need fragmentation.
  • Channel-specific messages. These need to be related to a connected channel on the server. Currently, they are identified by the IP:port of the client end.

Connectionless Messages

For connectionless messages, the client can send a TCP connection request to the server, with a timeout. If the server supports TCP, this connection will be accepted and the client can then send the CLM_REQ_* message over the TCP connection. The server needs to interpret the message and send the response back over the same TCP connection. The client can then close the connection or leave it open for sending another message (tbd). If the TCP connection from the client is refused or times out (probably due to a firewall dropping the connect request), the client can fall back to the existing UDP usage to send the request. For this reason, the TCP connection timeout will need to be short, something like 2 seconds. This will be plenty of time for a compatible server to answer.
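Roughly, I have in mind something like the following client-side sketch for the connect-with-timeout-and-fallback step (blocking calls used here only for brevity; the function itself and its helpers are illustrative, not existing Jamulus code):

```cpp
#include <QTcpSocket>

// Try to fetch a connectionless reply over TCP; return false so the caller
// can fall back to the existing UDP request if the server is not TCP-capable.
bool FetchViaTcp ( const QString& strHost, quint16 iPort, const QByteArray& vecRequest, QByteArray& vecReply )
{
    QTcpSocket Socket;
    Socket.connectToHost ( strHost, iPort );

    // short timeout: an older server or a firewall drop must not stall the client
    if ( !Socket.waitForConnected ( 2000 ) )
    {
        return false; // fall back to UDP
    }

    Socket.write ( vecRequest );          // e.g. CLM_REQ_SERVER_LIST
    Socket.waitForBytesWritten ( 2000 );

    if ( !Socket.waitForReadyRead ( 2000 ) )
    {
        return false;
    }

    vecReply = Socket.readAll();          // message framing/reassembly handled by the caller
    return true;
}
```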

I have a branch that implements the server side of connectionless messages over TCP, currently just for CLM_REQ_SERVER_LIST and CLM_REQ_CONN_CLIENTS_LIST, but others could be added as needed. It can be seen at https://github.com/softins/jamulus/tree/tcp-protocol. It is necessary to pass the TCP socket pointer via the function calls, signals and slots, to the point at which the response message can be sent. If this socket pointer is nullptr, the response will be sent over UDP as at present; otherwise it will be sent to the referenced TCP socket.
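The send-side decision is then essentially the following (a sketch rather than the code from the branch; SendUdpMessage() is a placeholder for the existing UDP path):

```cpp
// If a TCP socket pointer accompanied the request, write the reply to that
// socket; otherwise send it over UDP as before.
void SendConnLessReply ( const CVector<uint8_t>& vecMessage,
                         const CHostAddress&     InetAddr,
                         QTcpSocket*             pTcpSocket )
{
    if ( pTcpSocket != nullptr )
    {
        // reply on the same TCP connection the request arrived on
        pTcpSocket->write ( reinterpret_cast<const char*> ( &vecMessage[0] ),
                            vecMessage.Size() );
    }
    else
    {
        // existing behaviour: send the datagram over UDP
        SendUdpMessage ( InetAddr, vecMessage );
    }
}
```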

Note that due to the variable size of Jamulus protocol messages, and the stream-oriented nature of TCP sockets, it is necessary for the receiver at each end first to read the fixed-size header (9 bytes), determine from that header the payload size, and then read the payload, plus two more bytes for the CRC.
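As a rough illustration of that reassembly on the receiving side, a blocking sketch follows; the header size matches the description above, while the offset and byte order of the 2-byte length field are assumptions that must match the real frame layout:

```cpp
#include <QTcpSocket>
#include <QByteArray>

static const int iHeaderSize  = 9; // fixed-size header as described above
static const int iLenFieldPos = 5; // assumed offset of the 2-byte length field
static const int iCrcSize     = 2;

// Read exactly one protocol message (header + payload + CRC) from the stream.
bool ReadOneMessage ( QTcpSocket* pSocket, QByteArray& vecMessage )
{
    // wait until the full header has arrived
    while ( pSocket->bytesAvailable() < iHeaderSize )
    {
        if ( !pSocket->waitForReadyRead ( 2000 ) )
        {
            return false;
        }
    }

    const QByteArray vecHeader = pSocket->peek ( iHeaderSize );

    // extract the payload length from the header (assumed little-endian)
    const quint16 iPayloadLen =
        static_cast<quint8> ( vecHeader[iLenFieldPos] ) |
        ( static_cast<quint8> ( vecHeader[iLenFieldPos + 1] ) << 8 );

    const int iTotalLen = iHeaderSize + iPayloadLen + iCrcSize;

    // wait until the whole message is available, then consume it in one go
    while ( pSocket->bytesAvailable() < iTotalLen )
    {
        if ( !pSocket->waitForReadyRead ( 2000 ) )
        {
            return false;
        }
    }

    vecMessage = pSocket->read ( iTotalLen );
    return true;
}
```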

I have tested it using a Python client, based on @passing's jamulus-python project, but enhanced to support TCP. See https://github.com/softins/jamulus-python/tree/tcp-protocol.

The next step is to add to Jamulus the client side of using TCP for connectionless messages to fetch server and client lists.

Connected Channel Messages

For connected channel messages, the situation is a little more complicated. The following factors must be considered:

  • The list of connected clients is sent to a participating client using a connected channel message (CONN_CLIENTS_LIST).
  • Each time someone else connects to or disconnects from the server, the server sends an unsolicited CONN_CLIENTS_LIST to each client that is still connected. For a large busy server with many clients, this could be a long message and subject, presently, to UDP fragmentation. It should therefore be sent if possible over TCP.
  • A server cannot initiate a TCP connection to a client. Therefore the client needs to open a TCP connection to the server at the beginning of the session, and keep the connection open continuously until leaving the session. This connection should be used by the server to send the updated client lists to the client.
  • If a client has both a TCP and a UDP connection to the server, there is no way for the server to relate the two connections just by IP and port number, as the source ports will not be related to each other. Even if the client were to bind both TCP and UDP sockets to the same local port number, they could get mapped independently to different ports by a NAT router in the path.

My proposal to solve the last point above is as follows:

  • The client starts the session in the same way as at present, by sending an audio stream.
  • On receiving the new audio stream, the server searches by IP:port (CHostAddress) for a matching channel (in CChannel vecChannels[]), and on not finding a match, allocates a free channel in that array for the new client. It stores the CHostAddress value in the allocated channel, and returns the ID (index) of the new channel.
  • The server immediately sends this ID to the client as a CLIENT_ID connected channel message. This is all existing behaviour so far.
  • A TCP-enabled client, when it receives this CLIENT_ID message, will initiate a TCP connection to the server. If the connection attempt fails, the client will assume the server is not TCP-enabled and will not retry. Operation will continue over UDP only as at present.
  • If the TCP connection succeeds, the client will immediately send a CLIENT_ID message to the server, specifying the client ID that it had just received from the server. This will enable the server to associate that particular TCP connection with the correct CChannel, and the server will store the pTcpSocket pointer in the CChannel (see the sketch after this list).
  • The server can then easily send the CONN_CLIENTS_LIST updates to the client over TCP if the socket pointer in the channel is not null, or otherwise over UDP. It could also send the welcome message over the same TCP socket, improving support for longer welcome messages.
  • Other connected channel messages that are not size-critical could be sent over either UDP as at present, or the TCP connection. This is open for discussion.
  • When the client wants to disconnect from the channel, it will send a CLM_DISCONNECTION the same as at present (over either UDP or TCP), but will also close any open TCP connection to the server.
  • Over UDP, connected channel messages need to be acked, and will be retried if the ack is not received. This is necessary due to the lack of guaranteed delivery in UDP. Over the TCP socket, which provides guaranteed delivery, it would be possible to send messages without queuing or needing acks, and this might simplify implementation. Comments?
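A rough server-side sketch of the association step above; the handler name and the channel accessors are illustrative (pTcpSocket is the pointer already added to CChannel), and the IP comparison anticipates the hijack concern discussed in the comments below:

```cpp
// Called when a CLIENT_ID message arrives on a newly accepted TCP connection.
void CServer::OnTcpClientId ( const int iChanID, QTcpSocket* pTcpSocket )
{
    if ( ( iChanID < 0 ) || ( iChanID >= MAX_NUM_CHANNELS ) )
    {
        pTcpSocket->close();
        return;
    }

    CChannel& Channel = vecChannels[iChanID];

    // only accept the association if the channel is in use and the TCP peer
    // IP matches the IP already recorded for the channel's UDP audio stream
    // (ports cannot be compared, as NAT may map them independently)
    if ( !Channel.IsConnected() ||
         ( pTcpSocket->peerAddress() != Channel.GetAddress().InetAddr ) ) // GetAddress() is illustrative
    {
        pTcpSocket->close();
        return;
    }

    // remember the socket; size-critical messages such as CONN_CLIENTS_LIST
    // (and the welcome message) can now be sent over TCP instead of UDP
    Channel.pTcpSocket = pTcpSocket;
}
```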

I have not yet implemented any of this connected channel functionality, beyond adding the pTcpSocket pointer to the CChannel class.

This is currently a work in progress, as described above. The purpose of this issue is to allow input from other contributors on the technical details mentioned above, and to keep the topic visible until the code is ready for a PR.

The expectation is that all the public directories will support TCP connections. This will also need suitable firewall rules at the servers. However, clients implementing all the above will still be backward-compatible with older directories and servers run by third parties. Similarly, older clients connecting to newer directories and servers will continue to operate as at present over UDP, with no use of TCP required.

Has this feature been discussed and generally agreed?

See the referenced discussion at https://github.com/orgs/jamulussoftware/discussions/1058 for history. I would value comments within this Issue regarding the solution I am proposing above. @pljones @ann0see @hoffie @dtinth and any others interested.

@softins softins added the feature request Feature request label Feb 27, 2024
@softins softins self-assigned this Feb 27, 2024
@dtinth
Contributor

dtinth commented Feb 27, 2024

I like the proposal and support this approach.

I kinda wonder if it’s possible for another person to hijack the TCP connection. e.g. a client connects, receives a channel ID 5. Maybe the presence of a new channel is broadcasted, and a rogue client who sees this message quickly makes a TCP connection to hijack that ID. Maybe we don’t need to worry about this case, but it just kinda pops up.

@ann0see
Member

ann0see commented Feb 27, 2024

I believe this is very much possible. There aren't any guarantees currently either way as we don't have encryption.

What's the disadvantage of trying to connect via TCP first?

@softins
Member Author

softins commented Feb 27, 2024

I like the proposal and support this approach.

Thanks! I used your RPC server as a template for the connection handler.

I kinda wonder if it’s possible for another person to hijack the TCP connection. e.g. a client connects, receives a channel ID 5. Maybe the presence of a new channel is broadcasted, and a rogue client who sees this message quickly makes a TCP connection to hijack that ID. Maybe we don’t need to worry about this case, but it just kinda pops up.

That's certainly a good observation, and worth considering, even if unlikely. I think it could largely be mitigated by only acting upon a CLIENT_ID message if it has come from the same IP address as the one already recorded in the channel it refers to. We can't compare the port of course, as it may be different as already mentioned, but I think it's extremely unlikely that the related UDP and TCP connections would come from different IPs. That would limit the scope for hijack to another client behind the same IP (e.g. same host or same NAT).

@softins
Member Author

softins commented Feb 27, 2024

What's the disadvantage of trying to connect via TCP first?

I couldn't think of a compatible way to associate a subsequent UDP audio stream with a TCP connection that was made first. Especially if they had travelled through NAT. When I thought of starting with UDP as normal, and then sending back the CLIENT_ID message over a successful TCP connection, it was like a Eureka moment.

@pljones
Collaborator

pljones commented Feb 28, 2024

It sounds pretty good. I'd be inclined only to move explicitly those messages we know cause problems. However, the "infrastructure" should be there to support other messages.

I guess only the audio packets themselves need remain on UDP.

TCP/IP keep alive will be running on the TCP/IP connection, right? At the moment, a UDP drop out isn't seen as a "connection failure" by the server for quite a large window (relatively). Would the TCP/IP connection remain "connected" or drop out if keep alive failed? How would a "continuous" UDP audio connection work in this situation?

@ann0see
Member

ann0see commented Feb 28, 2024

I couldn't think of a compatible way to associate a subsequent UDP audio stream with a TCP connection that was made first.

Seems like that's a fundamental problem.

So to not break backwards compatibility the server must still respond on UDP audio messages for session creation.

Could we send an empty audio message to the server to query the version/capabilities, then do something comparable to what SYN cookies do (https://en.m.wikipedia.org/wiki/SYN_cookies) for some kind of authentication, and then set up a TCP connection?
The main idea could also be to have some kind of "secret" stored on the server to verify the client.

@softins
Member Author

softins commented Feb 28, 2024

I couldn't think of a compatible way to associate a subsequent UDP audio stream with a TCP connection that was made first.

Seems like that's a fundamental problem.

A limitation, certainly, but I wouldn't call it a problem.

So to not break backwards compatibility the server must still respond on UDP audio messages for session creation.

Yes indeed.

Could we send an empty audio message to the server to query the version/capabilities, then do something comparable to what SYN cookies do (https://en.m.wikipedia.org/wiki/SYN_cookies) for some kind of authentication, and then set up a TCP connection? The main idea could also be to have some kind of "secret" stored on the server to verify the client.

I don't really think that gains us anything except quite a lot of unneeded complexity. I think the source IP is an adequate enough "secret" to validate the TCP connection that follows the start of the UDP session.

@softins
Member Author

softins commented Feb 28, 2024

It sounds pretty good. I'd be inclined only to move explicitly those messages we know cause problems. However, the "infrastructure" should be there to support other messages.

Yes, I agree.

I guess only the audio packets themselves need remain on UDP.

That's definitely true, also.

TCP/IP keep alive will be running on the TCP/IP connection, right? At the moment, a UDP drop out isn't seen as a "connection failure" by the server for quite a large window (relatively). Would the TCP/IP connection remain "connected" or drop out if keep alive failed? How would a "continuous" UDP audio connection work in this situation?

If I remember correctly (from a long time ago), TCP keepalive has a very long timeout, and it is just intended to keep a session alive when there is no data to exchange. In the case of Jamulus, the server regularly sends the channel levels using CLM_CHANNEL_LEVEL_LIST, so if we make sure they go over TCP, that will be enough to keep the connection alive. If the actual connection fails, that will eventually be picked up by the TCP layer's retries giving up.
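For reference, if we did want OS-level keepalive as well, Qt can request it per socket, though the probe timing is controlled by the operating system (typically hours by default), so the regular Jamulus traffic is the more practical liveness signal. A sketch:

```cpp
#include <QTcpSocket>

// enable OS-level TCP keepalive on a connected socket
void EnableKeepAlive ( QTcpSocket* pTcpSocket )
{
    pTcpSocket->setSocketOption ( QAbstractSocket::KeepAliveOption, 1 );
}
```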

@pljones
Collaborator

pljones commented Feb 29, 2024

OK, assuming that the TCP keepalive is longer than the Jamulus UDP audio "keep alive" time for a channel, we'd just need to ensure that part of channel clean up is to clear down the TCP socket, too?

@softins
Member Author

softins commented Feb 29, 2024

OK, assuming that the TCP keepalive is longer than the Jamulus UDP audio "keep alive" time for a channel, we'd just need to ensure that part of channel clean up is to clear down the TCP socket, too?

Yes, absolutely.

@pljones
Collaborator

pljones commented Feb 29, 2024

Let's say a client connects and establishes a TCP connection.

At some point the server tries to send a message over the TCP socket and gets an error back indicating the socket is no longer valid.

If the UDP audio stream continues, how does recovery work in this situation? Would the client get a CONN_FAIL, too, and know to re-initiate the TCP connection? Will the server know what its state was at the time the connection failed for recovery? I guess association with the same CChannel would continue.

@softins
Member Author

softins commented Feb 29, 2024

Well at the very least, the server will close its end of the TCP socket and set CChannel::pTcpSocket to nullptr. This would fall back to the UDP-only situation as per older versions.

At some point the client would also notice the socket had failed, and would either also revert to UDP-only, or could try re-connecting TCP.

I think it would be unlikely in practice that the TCP socket would fail while the UDP is still working. A typical network outage would affect both at once.

@pljones
Collaborator

pljones commented Mar 1, 2024

OK, so as I see it, even if the client has established TCP for certain messages, it should handle them arriving over UDP as currently -- but use an "unexpected UDP fallback" to indicate the server can't send the message over TCP and re-initiation of the socket is probably needed.

@softins
Member Author

softins commented Mar 11, 2024

As I have been thinking about the design of this, and conducting a few protocol experiments, I have found some significant issues that we need to solve, concerning the interaction between a client's Connect Dialog and the servers listed by the directory. I have some ideas, but would be grateful for any other suggestions.

Background - the current Jamulus behaviour on UDP

When the client's user opens the Connect Dialog, or changes the Directory selection (Genre) while the Connect Dialog is open, the following things happen:

  • The client sends CLM_REQ_SERVER_LIST to the directory.
  • The directory notes the source IP address and port number from which the request originated. For a client that is behind a NAT router (which will be the case for most home users), these will be the public IP of the router and the possibly-mapped port number assigned by the NAT engine in the router.
  • The directory sends CLM_RED_SERVER_LIST and CLM_SERVER_LIST to the client at the above address and port.
  • At the same time, the directory sends to each listed server a CLM_SEND_EMPTY_MESSAGE containing the above public address and port of the client. This is asking each server to send a CLM_EMPTY_MESSAGE to the client that fetched the server list from the directory.
    • The intention of this is to support a server that is itself behind a NAT and/or stateful firewall, without any port-forwarding rules having been set up.
    • The action of the server sending this empty message to the original client is to create a local session in the firewall that is ready to receive ping requests from the client, as the stateful firewall will consider such a ping as a response packet to the session just created.
    • It does not matter whether the original client receives this empty message or not: its only purpose is to open the session at the server end.
    • If this step were not performed, only servers where the operator had configured an incoming port-forwarding rule to the server would be able to be pinged.
  • The client sends a CLM_PING_MS_WITHNUMCLIENTS message to each server in the list received from the directory. This message contains a millisecond timestamp of the time of sending, and a number of connected clients, which for the client itself is always 0.
  • Each server, on receiving that message, responds to the client with a CLM_PING_MS_WITHNUMCLIENTS message of its own, containing the same timestamp that was received from the client, and the current number of clients connected to the server.
  • When the client receives the message from the server, it can calculate the ping round-trip time from the current time and the received timestamp.
  • If the number of clients in the received message is different from the last value received from that server (initialised as 0), the client sends a CLM_REQ_CONN_CLIENTS_LIST to the server, asking for a list of connected clients (see the sketch after this list).
  • In the above case, the server responds with a CLM_CONN_CLIENTS_LIST containing a list of all the connected clients. The Connect Dialog can display this list of connected clients to the user.
  • All the above steps are repeated every 2.5 seconds until the user clicks Connect or closes the Connect Dialog.
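To make the ping-handling steps concrete, here is a rough client-side sketch; the helper names are illustrative, not the actual Jamulus functions:

```cpp
// Client-side handling of a CLM_PING_MS_WITHNUMCLIENTS reply from a server.
// GetElapsedMs() is assumed to be the same millisecond clock used to stamp
// the outgoing ping.
void OnCLPingWithNumClientsReceived ( const CHostAddress& InetAddr,
                                      const int           iMs,
                                      const int           iNumClients )
{
    // round-trip time = now minus the timestamp echoed back by the server
    const int iPingTimeMs = GetElapsedMs() - iMs;
    UpdateServerPing ( InetAddr, iPingTimeMs );

    // only re-request the connected-client list if the client count changed
    if ( iNumClients != GetStoredNumClients ( InetAddr ) )
    {
        SetStoredNumClients ( InetAddr, iNumClients );
        SendCLReqConnClientsList ( InetAddr ); // CLM_REQ_CONN_CLIENTS_LIST
    }
}
```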

As stated at the beginning of this Issue, the fact that CLM_SERVER_LIST and CLM_CONN_CLIENTS_LIST can be large messages requiring IP fragmentation is the reason for investigating the use of TCP.

The Problem

A client using the proposed TCP functionality would send the CLM_REQ_SERVER_LIST request to the directory via a newly-opened TCP connection. But for efficiency and timing accuracy, it would still send the ping messages to the listed servers using UDP. This gives rise to the problem that the public source ports of the TCP connection and UDP session could well be different. In fact this is likely.

  • When the directory receives the CLM_REQ_SERVER_LIST via TCP, the public address and TCP port number seen by it for the client will currently be used in the CLM_SEND_EMPTY_MESSAGE the directory sends to each listed server.
  • The servers will therefore send their UDP CLM_EMPTY_MESSAGE to the TCP port number of the client.
  • The client will ping each listed server from its UDP socket, and this ping could arrive at the server from a UDP source port that is different from the TCP port number seen by the directory and opened as UDP by each server.
  • In that case, only servers that have been set up with open port-forwarding to the listening UDP port will be pingable by the client.

Possible Solutions

  1. Initially I thought it might be possible to overcome this by binding the TCP socket in the client to the same local port number as already used by the UDP socket.

    However, although this might work sometimes, when the client is behind a NAT router, the router is free to translate the source port number to anything it wants, and this translation might well be different for the UDP and TCP sessions, even if they use the same port number in the client host.

    So as stated above, only servers that already have an open port-forwarding set up for their listening UDP port can be reliably pinged after the client has fetched the server list via TCP.

    In addition to the above, a client using the proposed TCP functionality would want to send the client list request CLM_REQ_CONN_CLIENTS_LIST to each in-use server via TCP, to allow for the situation of a large server having a lot of clients connected. In practice, this would only be the case for a small minority of servers, but a thorough solution should allow for the situation.

    However, the magic port-opening described above only applies to UDP sessions. A server wanting to respond to a TCP client list request will need to have its ingress router or firewall configured to allow and port-forward TCP connections to its listening port number (a server will listen on the same port number for both UDP and TCP). Note that if a server operator is savvy enough to configure this, they could also reasonably be expected to allow and port-forward UDP messages on the same port number, which would solve the initial problem for that particular server.

  2. I have wondered about replacing the CLM_REQ_SERVER_LIST message with a new enhanced message over TCP that allows the client to tell the directory what IP address and port number to give the listed servers in the CLM_SEND_EMPTY_MESSAGE. However, a client program has no way of autonomously detecting or influencing the public source port used for its outgoing packets. It would need the client first to send the directory a new protocol message over UDP to ask the directory to send back the public IP and source port it saw with the message. For example CLM_REQ_IP_AND_PORT, with a response of CLM_IP_AND_PORT. If the directory did not respond to this request, it could also be inferred that the server does not support TCP connections for Jamulus.

IPv6 considerations

Several of the existing protocol messages encode an IP address as 4 bytes. This is fine for an IPv4 address, but an IPv6 address takes 16 bytes if represented in binary format. So we would also need to define new messages to replace the following, which all include an IPv4 address as a 4-byte binary value:

  • CLM_SERVER_LIST
  • CLM_RED_SERVER_LIST
  • CLM_SEND_EMPTY_MESSAGE

The suggested CLM_IP_AND_PORT message mentioned above would need to be designed from the outset to support both IPv4 and IPv6, possibly as a variable-length string.
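For illustration, the directory side of such a message could look something like this; the message itself and the send helper are hypothetical, and the variable-length string keeps the same encoding usable for IPv4 and IPv6:

```cpp
// Echo back the public address and UDP source port the directory observed on
// the incoming request, encoded as e.g. "203.0.113.5:22124" or
// "[2001:db8::1]:22124". CreateAndSendCLIpAndPortMess() is hypothetical.
void OnCLReqIpAndPort ( const CHostAddress& InetAddr )
{
    QString strSeenAddr = InetAddr.InetAddr.toString();

    if ( InetAddr.InetAddr.protocol() == QAbstractSocket::IPv6Protocol )
    {
        strSeenAddr = "[" + strSeenAddr + "]";
    }

    strSeenAddr += ":" + QString::number ( InetAddr.iPort );

    CreateAndSendCLIpAndPortMess ( InetAddr, strSeenAddr );
}
```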

Conclusion

This is turning out to be more complex than anticipated, so I would be interested in comments on the above, and suggestions for the way forward.

@ann0see
Member

ann0see commented Mar 11, 2024

I don't think solution 1 is feasible since NATs behave as they want. Solution 2 seems more promising.

I'm a bit worried about TCP hole punching, since it most likely works differently from UDP anyway (I remember having heard that it's non-trivial) - this needs some more research.

Food for thought:

  • Can the directory also relay CLM_CONN_CLIENTS_LIST of each server to the clients via TCP while all the hole punching works over UDP only and the server doesn't send the list?
  • can the client open a TCP socket to the directory and then replace the destination IP to the server? What would that imply?
  • https://en.m.wikipedia.org/wiki/TCP_hole_punching

@pljones
Collaborator

pljones commented Mar 11, 2024

If the Client initiates the Directory server list request over UDP and gets back an initial "CLM_TCP" response (with some token), it could then follow up with the TCP request for the server list (including the returned token) and the Directory would know the UDP and TCP port details. It would mean the Client knew immediately that the server supported TCP, without sending a TCP request and having to wait for that to time out, too, before initiating the UDP server list request, if it didn't.

That would then allow the Directory to send TCP and UDP port details to the Servers over either TCP or UDP.

@pljones
Collaborator

pljones commented Mar 11, 2024

CLM_SERVER_LIST has to support both the "outside" and "inside" addresses for any Server that registers with a Directory behind a NAT gateway. (Currently it only handles the IP address; it should really handle port remapping, too.) It's, essentially, the same data content as the Directory holds for each server (and that each server holds for itself).

I think for registration and Directory interaction, we'll need CLM_IPV6_.... protocol messages. If a Client UDP message arrives over IPv6, they get the CLM_IPV6.... responses; if a Server registers over IPv6, it gets CLM_IPV6.... messages if a client requests the IPv6 server list.

A Client wanting both IPv4 and IPv6 Servers would need to make two requests, as I can't think of a way to embed the two lots of semantics into one request...

@softins
Member Author

softins commented Mar 11, 2024

If the Client initiates the Directory server list request over UDP and gets back an initial "CLM_TCP" response (with some token), it could then follow up with the TCP request for the server list (including the returned token) and the Directory would know the UDP and TCP port details. It would mean the Client knew immediately that the server supported TCP, without sending a TCP request and having to wait for that to time out, too, before initiating the UDP server list request, if it didn't.

That would then allow the Directory to send TCP and UDP port details to the Servers over either TCP or UDP.

I like this idea, I'll give it some more thought. Thanks!

@softins
Member Author

softins commented Mar 11, 2024

I don't think solution 1 is feasible since NATs behave as they want. Solution 2 seems more promising.

Yes, I agree. Solution 1 was my first thought, and I was explaining why I didn't think it would work.

I'm a bit worried about TCP hole punching, since it most likely works different to UDP anyway (I remember having heared that it's non trivial) - this needs some more research.

Thanks for the link below. I've read through it, and don't think it's suitable for what we want to achieve.

Food for thought:

  • Can the directory also relay CLM_CONN_CLIENTS_LIST of each server to the clients via TCP while all the hole punching works over UDP only and the server doesn't send the list?

Maybe, but I'd be a bit worried about backward compatibility.

  • can the client open a TCP socket to the directory and then replace the destination IP to the server? What would that imply?

I don't think that is possible. Once a TCP connection is established, we can't move one end of the connection from one host to another. Unless I've misunderstood what you mean.

Thanks!

@softins
Member Author

softins commented Mar 11, 2024

I like this idea, I'll give it some more thought. Thanks!

Just to add a bit more. For compatibility, the directory will still have to send the CLM_RED_SERVER_LIST and CLM_SERVER_LIST via UDP after it has sent the new CLM_TCP message. And in doing so, it will then have already been able to send the empty message requests to all the servers, with the correct UDP port details. So the servers will then be pingable.

There is no per-client state stored in the directory to relate CLM client requests to each other, so I think the 'token' will need to be the public IP address and port number seen by the directory on the initial UDP list request. A client that then switches to TCP can send these details back to the directory, which can use them for the empty message requests to the servers.

For a client-directory connection that drops fragmented UDP, the CLM_TCP message will still get back to the client over UDP even if the server lists don't, and a client that is able to switch to TCP at that point will go on to fetch the server list successfully.

@softins softins added this to the Release 4.0.0 milestone Mar 28, 2024