-
Notifications
You must be signed in to change notification settings - Fork 5.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Crimson: Support basic deployments #57593
base: main
Are you sure you want to change the base?
Conversation
e3d7b2f
to
6faf209
Compare
default: 0 | ||
desc: number of CPU cores on which seastar reactor threads will run | ||
flags: | ||
- startup |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure why crimson_seastar_reactor_num
is needed.
Looks it is identical to configure crimson_seastar_cpu_cores
as "0-crimson_seastar_reactor_num-1
" ?
Which seems not useful when there are multiple OSDs because --cpuset
always start with 0, causing overlap.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree that it's practically the same. It's there only to make Crimson deployment friendlier by setting a global (non-optimal) OSD value of crimson_seastar_reactor_num = n
.
Once crimson_seastar_cpu_cores
and crimson_alien_thread_cpu_cores
are set, this value will no longer have any impact.
For crimson_seastar_cpu_cores
to be set properly, each OSD on each host must be configured separately which may be too specific for just getting the cluster up and running.
Another reason is that if crimson_seastar_cpu_cores
is not set, we fallback to Seastar's default of using all available cores. That means that on nodes with 80 cores we will create 80 seastar threads.
By looking again, crimson_seastar_thread_num
is probably a better terminology and we can also cap it to 32.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I generally understand the purpose.
Which seems not useful when there are multiple OSDs because --cpuset always start with 0, causing overlap.
But I'm still wondering how to prevent "--cpuset" overlapping if there are multiple OSDs in the same host, only with crimson_seastar_thread_num
or crimson_seastar_reactor_num
configured.
I suppose the OSD still needs to know that it is the nth daemon in the same host to produce the accurate --cpuset
for seastar. But if it requires yet another Ceph configuration (which still has to be configured separately), probably the current crimson_seastar_cpu_cores
is simpler?
Or, does each OSD live in a separate container, so that overlapping is impossible?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But I'm still wondering how to prevent "--cpuset" overlapping if there are multiple OSDs in the same host, only with
crimson_seastar_thread_num
orcrimson_seastar_reactor_num
configured.
It's expected to overlap. As mentioned, it's not optimal (at all), but it's just there to let the cluster come up in a simple way. Actual usage will have to set crimson_seastar_cpu_cores
and crimson_alien_thread_cpu_cores
.
I'll emphasis that in the docs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@cyx1231st, Please see updated commit
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
-c [ --smp ] arg number of threads (default: one per CPU)
Looks it configures the seastar thread number, and thread-affinity=0
to unpin the CPUs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks it configures the seastar thread number, and thread-affinity=0 to unpin the CPUs.
(From our offline conversation)
See reactor.cc:
if (smp_opts.smp) {
smp::count = smp_opts.smp.get_value();
} else {
smp::count = cpu_set.size();
}
std::vector<reactor*> reactors(smp::count);
All that --smp
does is replace cpu_set.size()
usage explicitly. Setting --smp
= n or --cpuset
= 0-n-1 will result in the same number of smp::count. The only difference would be which CPUs are used.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The only difference would be which CPUs are used.
And also quote from @athanatos's comment above:
With crimson_seastar_reactor_num, we don't set a cpu_mask at all so the scheduler is free to place them whereever. In that case, they may overlap, but at least all of the cpus on the host will be used.
So my understanding from the above is that with crimson_seastar_reactor_num=n
, we expect the n
seastar reactor threads to be scheduled to all the available CPUs, rather than being limitted to --cpuset=0-(n-1)
, right?
Also, if we want to isolate the CPUs between seastar and alienstore threads in this case, for example, cpuset 0-(m-1) for seastar threads and cpuset m-(nproc-1) for alienstore thread, m
should be another independent configuration, is that true?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree that it makes sense to allow the user to specify a number of alienstore cpu threads different from the number of seastar threads. To make things simpler, I'd be fine with requiring the user to specify both.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Option 1 (Optimal)
- CPUs are pinned to the provided sets (seastar and alien cpu sets)
- thread-affinity = 1 (seastar's default).
- smp (smp::count / reactor count):
seastar_cpu_set.size()
- Seastar reactor threads:
seastar_cpu_set.size()
- Alienstore threads:
alien_cpu_set.size()
*crimson_alien_op_num_threads_per_core
(default = 4).
Option 2 (Basic):
- No CPU pinning.
- thread-affinity = 0
- smp (smp::count / reactor count) =
crimson_seastar_num_threads
- Seastar reactor threads num:
crimson_seastar_num_threads
- Alienstore threads num:
crimson_alien_op_num_threads
6faf209
to
9f6182b
Compare
9f6182b
to
a1519a2
Compare
1a00aee
to
335b0e4
Compare
} | ||
std::string smp = fmt::format("{}", reactor_num); | ||
ret.early_args.emplace_back("--smp"); | ||
ret.early_args.emplace_back(smp); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ack.
335b0e4
to
539241f
Compare
539241f
to
dd1b201
Compare
@cyx1231st, @athanatos - should be ready with all comments addressed. I'll follow up with doc update once approved/merged. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM except for minor comments.
crimson_alien_op_num_threads_per_core can be removed from the commit message.
This new option is there to allow *basic* cluster deployments. crimson_seastar_num_threads can be used globally for all the OSDs in the cluster: ``` osd: crimson_seastar_num_threads: <n> crimson_alien_op_num_threads: <m> ``` As a result, all the available CPUs will be allocated to *both* seastar reactor threads and to Alienstore threads - without any exclusion. Notes: * The core allocation will most likely overlap between OSDS and/or Seastar and Alienstore. An optiomal deployment requires `crimson_alien_thread_cpu_cores` and `crimson_seastar_cpu_cores` to be set for each OSD based on the host its located at. * Seastar reactor number (smp::count) will be deduced directly from crimson_seastar_num_threads --- Documentation followup: ceph#57352 --- ### Option 1 (Optimal) * CPUs are pinned to the provided sets (seastar and alien cpu sets) * thread-affinity = 1 (seastar's default). * smp (smp::count / reactor count): `seastar_cpu_set.size()` * Seastar reactor threads: `seastar_cpu_set.size()` * Alienstore threads num: `crimson_alien_op_num_threads` --- ### Option 2 (Basic): * No CPU pinning. * thread-affinity = 0 * smp (smp::count / reactor count) = `crimson_seastar_num_threads` * Seastar reactor threads num: `crimson_seastar_num_threads` * Alienstore threads num: `crimson_alien_op_num_threads` --- Fixes: https://tracker.ceph.com/issues/65752 Signed-off-by: Matan Breizman <mbreizma@redhat.com>
Signed-off-by: Matan Breizman <mbreizma@redhat.com>
dd1b201
to
64a9915
Compare
This new options is there to allow basic cluster deployments.
crimson_seastar_num_threads can be used globally for all the OSDs in the cluster:
As a result, CPUs
0-<n-1>
will be allocated to seastar threads and CPUs<n>-<nproc-1>
will be allocated to Alienstore threads
Note: The core allocation will overlap between OSDS. An optiomal deployment
requires
crimson_alien_thread_cpu_cores
andcrimson_seastar_cpu_cores
to be setfor each OSD based on the host its located at.
Documentation followup: #57352
Fixes: https://tracker.ceph.com/issues/65752
Contribution Guidelines
To sign and title your commits, please refer to Submitting Patches to Ceph.
If you are submitting a fix for a stable branch (e.g. "quincy"), please refer to Submitting Patches to Ceph - Backports for the proper workflow.
When filling out the below checklist, you may click boxes directly in the GitHub web UI. When entering or editing the entire PR message in the GitHub web UI editor, you may also select a checklist item by adding an
x
between the brackets:[x]
. Spaces and capitalization matter when checking off items this way.Checklist
Show available Jenkins commands
jenkins retest this please
jenkins test classic perf
jenkins test crimson perf
jenkins test signed
jenkins test make check
jenkins test make check arm64
jenkins test submodules
jenkins test dashboard
jenkins test dashboard cephadm
jenkins test api
jenkins test docs
jenkins render docs
jenkins test ceph-volume all
jenkins test ceph-volume tox
jenkins test windows
jenkins test rook e2e