Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

KAFKA-16757: Fix broker re-registration issues around MV 3.7-IV2 #15945

Open
wants to merge 1 commit into
base: trunk
Choose a base branch
from

Conversation

cmccabe
Copy link
Contributor

@cmccabe cmccabe commented May 13, 2024

When upgrading from a MetadataVersion older than 3.7-IV2, we need to resend the broker registration, so that the controller can record the storage directories. The current code for doing this has several problems, however. One is that it tends to trigger even in cases where we don't actually need it. Another is that when re-registering the broker, the broker is marked as fenced.

This PR moves the handling of the re-registration case out of BrokerMetadataPublisher and into BrokerRegistrationTracker. The re-registration code there will only trigger in the case where the broker sees an existing registration for itself with no directories set. This is much more targetted than the original code.

Additionally, in ClusterControlManager, when re-registering the same broker, we now preserve its fencing and shutdown state, rather than clearing those. (There isn't any good reason re-registering the same broker should clear these things... this was purely an oversight.) Note that we can tell the broker is "the same" because it has the same IncarnationId.

Copy link
Contributor

@gaurav-narula gaurav-narula left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for finding this and raising this! I've a question around onMetadataUpdate implementation in BrokerRegistrationTracker

Comment on lines +126 to +128
if (metadataVersion.isAtLeast(MetadataVersion.IBP_3_7_IV2) &&
registration.directories().isEmpty()) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If metadataVersion.isAtLeast(MetadataVersion.IBP_3_7_IV2) evaluates to false, then isn't registration.directories().isEmpty() always true?

The comment above mentions the directory list changing as a way to get here, but the only ways the directory list can change are:

  • On a broker registration that includes dirs
  • When a broker that has previously registered with more than one dir turns one of them offline

Both of these require that the broker has registered at least once with directories, which can only happen after MetadataVersion.IBP_3_7_IV2

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we also check the broker configuration? If the broker isn't configured with multiple log directories there's no need to re-register. Perhaps we shouldn't even setup this metadata listener if there's only one configured log dir.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If metadataVersion.isAtLeast(MetadataVersion.IBP_3_7_IV2) evaluates to false, then isn't registration.directories().isEmpty() always true?

No. We can get to this point in the code just because of a metadata version change. For example, from 3.7-IV1 to 3.7-IV2. In that case registration.directories() would not be empty and we would not want to do anything.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we also check the broker configuration? If the broker isn't configured with multiple log directories there's no need to re-register. Perhaps we shouldn't even setup this metadata listener if there's only one configured log dir.

broker IDs do have benefits for non-JBOD brokers. One benefit is knowing when a directory has been re-initialized. In any case, the impact of resending the registration in this specific case is pretty small.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we add directories to registration records pre 3.7-IV2, so if we're bumping the MV from 3.7-IV1, how does registration.directories() get populated before a broker re-registration?

if (featureControl.metadataVersion().isDirectoryAssignmentSupported()) {
record.setLogDirs(request.logDirs());
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I didn't express myself well. I agree that registration.directories() will indeed be empty if we're coming from 3.7-IV1 or earlier. But, we still need to check the current MV in this "if" statement since we do not want to resend the registration unless the MV is 3.7-IV2 or higher AND the registration does not contain directories.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see, the point is to trigger the re-registration only once, and empty directories signals that hasn't happened yet.

@soarez
Copy link
Member

soarez commented May 30, 2024

There are some conflicts that need addressing, and the JDK 21 pipeline didn't run.

When upgrading from a MetadataVersion older than 3.7-IV2, we need to resend the broker registration, so that the controller can record the storage directories. The current code for doing this has several problems, however. One is that it tends to trigger even in cases where we don't actually need it. Another is that when re-registering the broker, the broker is marked as fenced.

This PR moves the handling of the re-registration case out of BrokerMetadataPublisher and into BrokerRegistrationTracker. The re-registration code there will only trigger in the case where the broker sees an existing registration for itself with no directories set.  This is much more targetted than the original code.

Additionally, in ClusterControlManager, when re-registering the same broker, we now preserve its fencing and shutdown state, rather than clearing those. (There isn't any good reason re-registering the same broker should clear these things... this was purely an oversight.) Note that we can tell the broker is "the same" because it has the same IncarnationId.
@cmccabe
Copy link
Contributor Author

cmccabe commented May 31, 2024

rebased

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
3 participants