Is your feature request related to a problem? Please describe.
As servers are upgraded, a growing share of them ship with network cards that support RoCE (RDMA over Converged Ethernet). RDMA offers better congestion control and lower latency than a conventional TCP/IP path. Adapting a distributed file system to use RDMA can improve overall throughput.
Contact Details
No response
Is there an existing issue for this?
Describe the solution you'd like.
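Introduction to the RDMA module
RDMA stands for Remote Direct Memory Access: the local node can read and write remote memory as if it were local memory, bypassing the complex TCP/IP network protocol stack. RDMA can be used to build high-performance storage networks that provide low-latency, high-bandwidth storage access. To improve CubeFS write speed in large-model scenarios, an RDMA module was developed to reduce CPU consumption, accelerate data transfer, and lower communication latency. The write path is addressed first.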
Advantages of RDMA:
CubeFS RDMA design:
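To fit the GDS scenario, control messages are implemented with the send and recv pair of operations. The data payload makes up the bulk of the transfer, so it is carried by read and write, the one-sided operations that the remote side is not aware of. In the client write flow, the side that receives the message header actively pulls the data from the other side. In the client read flow, the side that receives the message header actively writes the data to the remote side.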
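The split between two-sided control messages and one-sided payload transfer can be illustrated with a small in-process simulation. This is only a sketch under stated assumptions: it uses no real RDMA library, and `Header`, `Region`, `RDMARead`, `RDMAWrite`, and `Server` are hypothetical names invented for this example rather than CubeFS types. A direct call to `Handle` stands in for the send/recv exchange of a message header.

```go
package main

import (
	"errors"
	"fmt"
)

// Header is a hypothetical control message; in the real design it would
// travel over send/recv.
type Header struct {
	Op     string // "write" or "read", from the client's point of view
	Offset int    // payload offset inside the client's registered memory
	Length int    // payload length in bytes
}

// Region simulates a registered memory region that a peer may access with
// one-sided operations.
type Region struct{ mem []byte }

// RDMARead simulates a one-sided read: the caller pulls bytes out of the
// remote region without any action by the remote side.
func RDMARead(remote *Region, off int, dst []byte) {
	copy(dst, remote.mem[off:off+len(dst)])
}

// RDMAWrite simulates a one-sided write: the caller pushes bytes into the
// remote region without any action by the remote side.
func RDMAWrite(remote *Region, off int, src []byte) {
	copy(remote.mem[off:off+len(src)], src)
}

// Server is the side that receives message headers.
type Server struct {
	store []byte // stands in for persisted extent data
}

// Handle reacts to a received header. On a client write it pulls the payload
// from the client; on a client read it pushes the payload to the client.
func (s *Server) Handle(h Header, client *Region) error {
	if h.Offset < 0 || h.Length < 0 || h.Offset+h.Length > len(client.mem) {
		return errors.New("header points outside the registered region")
	}
	switch h.Op {
	case "write":
		buf := make([]byte, h.Length)
		RDMARead(client, h.Offset, buf) // receiver of the header pulls
		s.store = buf
	case "read":
		if h.Length > len(s.store) {
			return errors.New("requested more data than is stored")
		}
		RDMAWrite(client, h.Offset, s.store[:h.Length]) // receiver pushes
	default:
		return fmt.Errorf("unknown op %q", h.Op)
	}
	return nil
}

func main() {
	client := &Region{mem: make([]byte, 64)}
	copy(client.mem, "hello rdma")
	srv := &Server{}

	// Client write: only the header is "sent"; the server pulls the payload.
	if err := srv.Handle(Header{Op: "write", Offset: 0, Length: 10}, client); err != nil {
		panic(err)
	}
	// Client read: the server writes the payload into the client's memory.
	if err := srv.Handle(Header{Op: "read", Offset: 32, Length: 10}, client); err != nil {
		panic(err)
	}
	fmt.Printf("%s\n", client.mem[32:42])
}
```

In both directions the client only posts a small header, and the bulk bytes move through the one-sided operation issued by the side that received it.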
1. Client writes data to the server:
2. Client reads data from the server:
3. Data memory:
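Data memory is implemented as a memory pool that uses the buddy algorithm for allocation. For every communication, a block large enough to hold the data being read or written is first allocated from the pool, and once the upper-layer application has finished processing it, the block is released back to the pool. Because the dataNode leader acts both as the receiver for the client and as the sender for the followers, the data memory must be shared across connections on the dataNode leader to avoid copying the payload multiple times. In other words, the leader's client-facing receive connection and its follower-facing send connection register the same memory pool, which eliminates the cost of a memory copy.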
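A buddy allocator of the kind described above can be sketched as follows. This is a minimal illustration, not the CubeFS implementation: `BuddyPool`, `NewBuddyPool`, `Alloc`, and `Free` are hypothetical names, the minimum block size is assumed to be a power of two, and the sketch is neither thread-safe nor registered with any RDMA device.

```go
package main

import (
	"errors"
	"fmt"
)

// ErrNoMemory is returned when no free block is large enough.
var ErrNoMemory = errors.New("buddy pool: out of memory")

// BuddyPool is a hypothetical buddy allocator over one contiguous buffer.
// A block of order k has size minBlock<<k; minBlock must be a power of two.
type BuddyPool struct {
	buf      []byte
	minBlock int
	maxOrder int
	free     []map[int]struct{} // free[k]: offsets of free order-k blocks
	used     map[int]int        // offset -> order of each allocated block
}

// NewBuddyPool creates a pool of minBlock<<maxOrder bytes, initially one
// free block of the largest order.
func NewBuddyPool(minBlock, maxOrder int) *BuddyPool {
	p := &BuddyPool{
		buf:      make([]byte, minBlock<<maxOrder),
		minBlock: minBlock,
		maxOrder: maxOrder,
		free:     make([]map[int]struct{}, maxOrder+1),
		used:     make(map[int]int),
	}
	for k := range p.free {
		p.free[k] = make(map[int]struct{})
	}
	p.free[maxOrder][0] = struct{}{}
	return p
}

// Alloc returns the offset and a slice of at least size bytes capacity.
func (p *BuddyPool) Alloc(size int) (int, []byte, error) {
	if size <= 0 {
		return 0, nil, errors.New("buddy pool: size must be positive")
	}
	order := 0
	for order <= p.maxOrder && p.minBlock<<order < size {
		order++
	}
	k := order
	for k <= p.maxOrder && len(p.free[k]) == 0 {
		k++ // look for the smallest free block that fits
	}
	if k > p.maxOrder {
		return 0, nil, ErrNoMemory
	}
	off := -1
	for o := range p.free[k] { // lowest offset, for deterministic behavior
		if off < 0 || o < off {
			off = o
		}
	}
	delete(p.free[k], off)
	for k > order { // split, keeping the lower half and freeing the upper
		k--
		p.free[k][off+p.minBlock<<k] = struct{}{}
	}
	p.used[off] = order
	return off, p.buf[off : off+size], nil
}

// Free releases the block at off and merges it with its buddy while possible.
func (p *BuddyPool) Free(off int) error {
	order, ok := p.used[off]
	if !ok {
		return errors.New("buddy pool: offset is not allocated")
	}
	delete(p.used, off)
	for order < p.maxOrder {
		buddy := off ^ (p.minBlock << order)
		if _, isFree := p.free[order][buddy]; !isFree {
			break
		}
		delete(p.free[order], buddy)
		if buddy < off {
			off = buddy
		}
		order++
	}
	p.free[order][off] = struct{}{}
	return nil
}

func main() {
	pool := NewBuddyPool(4096, 4) // 64 KiB pool
	a, _, _ := pool.Alloc(5000)   // rounded up to an 8 KiB block
	b, _, _ := pool.Alloc(4096)   // takes a 4 KiB block
	fmt.Println(a, b)
	_ = pool.Free(a)
	_ = pool.Free(b)
	fmt.Println(len(pool.free[4])) // 1 once everything has merged back
}
```

To mirror the sharing described above, a leader would hand the same pool instance to both its client-facing and follower-facing connections, so a payload received into a block can be forwarded from that same block.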
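4. Control memory:
The client and the server each register a block of control memory used to carry control information. Its main purpose is that fixed-length message headers and responses are allocated from it. When a connection is established, multiple memory blocks for message headers and responses are registered at once, so that several messages can be received and sent concurrently. These blocks fall into two parts: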
Describe an alternate solution.
No response
Anything else? (Additional Context)
No response