release-24.1: streamingccl: mark cutback retention jobs as successful #124055

blathers-crl · 2024-05-13T17:17:17Z

Backport 1/1 commits from #123934 on behalf of @dt.

/cc @cockroachdb/release

Previously, we started creating a stream producer job in the destination cluster upon completing replication cutover. This job preserves history as of the cutover time in case another cluster subsequently wants to start replicating as of that time (e.g. to reverse the direction of replication), or in case the promoted cluster wants to revert to the cutover time as part of a demotion back to a standby.

However, this placeholder job is, by design, never actually used by replication: it exists only to keep open the option of starting some other replication job later. As a result, it is never heartbeated, nor marked as no longer needed after replication completes successfully, so it is marked as FAILED when it expires.

This change sets the job's initial status so that it is created already indicating that replication succeeded. When it expires, it is therefore marked as successful instead of failed, avoiding the spurious 'failures' that one otherwise observes in the job system surfaces.

Release note (enterprise change): History Retention jobs created at the completion of cluster replication no longer erroneously indicate that they failed when they expire.

Epic: none.

Release justification: same as the description above.

blathers-crl · 2024-05-13T17:17:21Z

Thanks for opening a backport.

Please check the backport criteria before merging:

Backports should only be created for serious issues or test-only changes.
Backports should not break backwards-compatibility.
Backports should change as little code as possible.
Backports should not change on-disk formats or node communication protocols.
Backports should not add new functionality (except as defined here).
Backports must not add, edit, or otherwise modify cluster versions; or add version gates.
All backports must be reviewed by the owning area's TL and one additional TL. For more information as to how that review should be conducted, please consult the backport policy.

If your backport adds new functionality, please ensure that the following additional criteria are satisfied:

There is a high priority need for the functionality that cannot wait until the next release and is difficult to address in another way.
The new functionality is additive-only and only runs for clusters which have specifically “opted in” to it (e.g. by a cluster setting).
New code is protected by a conditional check that is trivial to verify and ensures that it only runs for opt-in clusters. State changes must be further protected such that nodes running old binaries will not be negatively impacted by the new state (with a mixed version test added).
The PM and TL on the team that owns the changed code have signed off that the change obeys the above rules.
Your backport must be accompanied by a post to the appropriate Slack channel (#db-backports-point-releases or #db-backports-XX-X-release) for awareness and discussion.

Also, please add a brief release justification to the body of your PR to justify this backport.

cockroach-teamcity · 2024-05-13T17:17:33Z


msbutler · 2024-05-16T17:26:02Z

@dt could you add the following patch to this PR? #124162

As of #123934, the producer job succeeds instead of failing. This patch teaches some test infra about this.
Fixes #124139
Fixes #124138
Fixes #124151
Fixes #124137
Release note: none

blathers-crl bot requested a review from a team as a code owner May 13, 2024 17:17

blathers-crl bot force-pushed the blathers/backport-release-24.1-123934 branch from 02f6a24 to c6db70f Compare May 13, 2024 17:17

blathers-crl bot added the blathers-backport (a backport that Blathers created automatically) and O-robot (originated from a bot) labels May 13, 2024

blathers-crl bot requested review from msbutler and removed request for a team May 13, 2024 17:17

blathers-crl bot assigned dt May 13, 2024

blathers-crl bot requested a review from stevendanna May 13, 2024 17:17

blathers-crl bot added the backport label (PRs that are backports to older release branches) May 13, 2024

dt requested a review from jbowens May 13, 2024 17:36

jbowens approved these changes May 13, 2024

streamingccl: deflake a few tests

65a235d

As of #123934, the producer job succeeds instead of failing. This patch teaches some test infra about this.
Fixes #124139
Fixes #124138
Fixes #124151
Fixes #124137
Release note: none

dt merged commit f199555 into release-24.1 May 26, 2024
18 of 20 checks passed

dt deleted the blathers/backport-release-24.1-123934 branch May 26, 2024 11:20