
WIP: TaskRun recoverable error, CreateContainerConfigError neither resumes PipelineRun on fix, nor times out #7807

Open
codegold79 opened this issue Mar 26, 2024 · 0 comments
Labels
kind/bug Categorizes issue or PR as related to a bug.


Expected Behavior

Consider a TaskRun started by a PipelineRun. When the TaskRun encounters a recoverable error such as a CreateContainerConfigError, the PipelineRun should either resume running if the TaskRun is fixed, or fail due to a timeout if the problem is not fixed in time.

Actual Behavior

When a PipelineRun starts a TaskRun and a CreateContainerConfigError occurs, the TaskRun's status.conditions[].status is "False" (failed). However, the TaskRun's status.steps[] indicate that the step is still waiting. According to this Tekton design table, the TaskRun is in a recoverable state.

You can tell it's in a recoverable state because the status has no status.completionTime, and the step's state is waiting, not terminated:

TaskRun

status:
  conditions:
  - lastTransitionTime: "2024-03-22T18:09:56Z"
    message: Failed to create pod due to config error
    reason: CreateContainerConfigError
    status: "False"
    type: Succeeded
  startTime: "2024-03-22T18:09:40Z"
  steps:
  - container: step-check-step
    name: check-step
    waiting:
      message: secret "oci-store" not found
      reason: CreateContainerConfigError

Although the TaskRun is in a recoverable state, the PipelineRun has already terminated. There doesn't seem to be a way to recover from the failed state:

PipelineRun

status:
  completionTime: "2024-03-22T21:46:29Z"
  conditions:
  - lastTransitionTime: "2024-03-22T21:46:29Z"
    message: 'Tasks Completed: 1 (Failed: 1, Cancelled 0), Skipped: 0'
    reason: Failed
    status: "False"
    type: Succeeded
  startTime: "2024-03-22T21:46:16Z"

When the oci-store is provided, ... <explain how recovery doesn't happen>

When pod timeout is adjusted to timeout, ... <explain how pipelinerun timeout won't happen>

Steps to Reproduce the Problem

TO DO
1.
2.
3.

Additional Info

  • Kubernetes version:

    Output of kubectl version:

(paste your output here)
  • Tekton Pipeline version:

    Output of tkn version or kubectl get pods -n tekton-pipelines -l app=tekton-pipelines-controller -o=jsonpath='{.items[0].metadata.labels.version}'

(paste your output here)
codegold79 added the kind/bug (Categorizes issue or PR as related to a bug.) label Mar 26, 2024
codegold79 changed the title TaskRun recoverable error, CreateContainerConfigError neither resumes PipelineRun on fix, nor times out WIP: TaskRun recoverable error, CreateContainerConfigError neither resumes PipelineRun on fix, nor times out Apr 5, 2024