
Evaluate Jenkins Workflow Plugin #67

Open
tisto opened this issue Feb 13, 2015 · 42 comments


tisto commented Feb 13, 2015

No description provided.


tisto commented Feb 14, 2015

I played around a bit with the workflow plugin and created a simple build pipeline:

http://78.47.49.108/job/workflow

stage 'Build'
node {
  git branch: '5.0', changelog: true, poll: true, url: 'https://github.com/plone/buildout.coredev.git'
  sh "python2.7 bootstrap.py"
  sh "bin/buildout -c jenkins.cfg"
  step([$class: 'ArtifactArchiver', artifacts: '**/*', fingerprint: true])
}

stage 'Test'
node {
  unarchive mapping: ['**/*': '.']
  sh "bin/jenkins-test -s plone.app.discussion"
  step([$class: 'ArtifactArchiver', artifacts: 'parts/jenkins-test/testreports/*.xml', fingerprint: true])
  step([$class: 'JUnitResultArchiver', testResults: 'parts/jenkins-test/testreports/*.xml'])
}

The 'Build' stage checks out buildout.coredev, runs buildout and archives the entire directory.
The 'Test' stage checks out the archive from the 'Build' stage, runs the tests and archives the test results.

I think this is a good starting point.

@gforcada It would be cool if we could use jenkins-job-builder to create such a workflow job, though I suspect we will have to build that ourselves.


tisto commented Feb 14, 2015

Example of how to copy the workspace over to the next build stage and run the second stage in parallel:

stage name: 'Build'
node {
  git branch: '5.0', changelog: true, poll: true, url: 'https://github.com/plone/buildout.coredev.git'
  sh "python2.7 bootstrap.py"
  step([$class: 'ArtifactArchiver', artifacts: '**/*', fingerprint: true])
}

stage name: 'Test'
parallel(
  test1: {
    node {
      unarchive mapping: ['**/*': '.']
      sh "ls -al"
      sh "pwd"
    }
  }, 
  test2: {
    node {
      unarchive mapping: ['**/*': '.']
      sh "ls -al"
      sh "pwd"
    }
  }
)


tisto commented Feb 14, 2015

Example of a build pipeline with a build stage and a test stage that runs two different tests in parallel and then collects the test results of both:

stage 'Build'
node {
  git branch: '5.0', changelog: true, poll: true, url: 'https://github.com/plone/buildout.coredev.git'
  sh "python2.7 bootstrap.py"
  sh "bin/buildout -c jenkins.cfg"
  step([$class: 'ArtifactArchiver', artifacts: '**/*', fingerprint: true])
}

stage name: 'Test'
parallel(
  test1: {
    node {
      unarchive mapping: ['**/*': '.']
      sh "bin/buildout -c jenkins.cfg"
      sh "bin/jenkins-test -s plone.app.discussion"
      step([$class: 'ArtifactArchiver', artifacts: 'parts/jenkins-test/testreports/*.xml', fingerprint: true])
      step([$class: 'JUnitResultArchiver', testResults: 'parts/jenkins-test/testreports/*.xml'])
    }
  }, 
  test2: {
    node {
      unarchive mapping: ['**/*': '.']
      sh "ls -al"
      sh "bin/buildout -c jenkins.cfg"
      sh "bin/jenkins-test -s plone.app.dexterity"
      step([$class: 'ArtifactArchiver', artifacts: 'parts/jenkins-test/testreports/*.xml', fingerprint: true])
      step([$class: 'JUnitResultArchiver', testResults: 'parts/jenkins-test/testreports/*.xml'])
    }
  }
)

http://78.47.49.108/job/workflow/


tisto commented Feb 14, 2015

Build that runs alltests and alltests-at in parallel:

stage 'Build'
node {
  git branch: '5.0', changelog: true, poll: true, url: 'https://github.com/plone/buildout.coredev.git'
  sh "python2.7 bootstrap.py"
  sh "bin/buildout -c jenkins.cfg"
  step([$class: 'ArtifactArchiver', artifacts: '**/*', fingerprint: true])
}

stage name: 'Test'
parallel(
  test1: {
    node {
      unarchive mapping: ['**/*': '.']
      sh "bin/buildout -c jenkins.cfg"
      sh "bin/jenkins-alltests"
      step([$class: 'ArtifactArchiver', artifacts: 'parts/jenkins-test/testreports/*.xml', fingerprint: true])
      step([$class: 'JUnitResultArchiver', testResults: 'parts/jenkins-test/testreports/*.xml'])
    }
  }, 
  test2: {
    node {
      unarchive mapping: ['**/*': '.']
      sh "bin/buildout -c jenkins.cfg"
      sh "bin/jenkins-alltests-at"
      step([$class: 'ArtifactArchiver', artifacts: 'parts/jenkins-test/testreports/*.xml', fingerprint: true])
      step([$class: 'JUnitResultArchiver', testResults: 'parts/jenkins-test/testreports/*.xml'])
    }
  }
)

@gforcada

💯 nice work!!

I just checked, and jjb does not have any code for creating a Jenkins workflow job... Oh well, it's not much of a problem though:
http://78.47.49.108/job/workflow/config.xml

Will look into it.


tisto commented Feb 15, 2015

I guess it is perfectly OK to just push the config.xml, since a workflow Jenkins job is nothing more than a title and a Groovy script with the workflow definition.

@gforcada

Yeah, I was thinking that too, but do we want two places to define jobs? Regular ones in jjb and workflow ones in Groovy scripts?

Although it may take a bit of time to work them out (not much, given the amount of XML), I would strongly prefer having one single way to configure things, i.e. keeping everything in jjb.


tisto commented Feb 15, 2015

The job itself should be defined in JJB. The workflow jobs allow you to define the groovy script in a remote repository:

https://github.com/plone/jenkins.plone.org/blob/master/flow.groovy

Therefore the JJB workflow jobs will be minimal.

There also might be some overlap between jjb and workflow plugin. We should use the workflow plugin only for jobs that require a pipeline.

@gforcada

Completely agree, I was mostly stating that we need to make sure that jjb spits out the proper XML boilerplate needed for pipeline jobs.

@gforcada

That was quick: #74 :D


tisto commented Feb 15, 2015

Great. As soon as the jjb run is fixed we can set up an experimental workflow job (we are still missing the xvfb options right now so I don't think we can move to the workflow plugin right away).

Here is the final job configuration:

http://78.47.49.108/job/workflow/config.xml

@gforcada

👍

@gforcada

Closing this; the evaluation turned out well, though with the xvfb caveat...

@tisto tisto reopened this Jun 12, 2015

tisto commented Jun 12, 2015

The Jenkins workflow plugin now seems to support build wrappers:

https://github.com/jenkinsci/workflow-plugin/blob/master/basic-steps/CORE-STEPS.md#build-wrappers

There is an XVNC Jenkins plugin for headless X servers that is already supported:

https://wiki.jenkins-ci.org/display/JENKINS/Xvnc+Plugin
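
For reference, the CORE-STEPS document linked above drives such wrappers through the generic wrap step. A minimal sketch, assuming the Xvnc plugin's wrapper class and an illustrative test command:

node {
  wrap([$class: 'Xvnc']) {
    // everything in this block runs against a private VNC-backed X display
    sh 'bin/test --all'
  }
}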

@tisto tisto self-assigned this Jun 12, 2015
@tisto tisto added the ready label Jun 12, 2015
@gforcada

Finally, workflow has xvfb support: https://issues.jenkins-ci.org/browse/JENKINS-26555. We need to update to workflow plugin version 1.1.0.


tisto commented Dec 3, 2015

Status: I upgraded the xvfb plugin and the workflow plugin now works fine. We can run alltests, alltests-at and robot tests in parallel. Unfortunately the Robot Framework plugin does not support the workflow plugin yet. This is the last missing piece:

jenkinsci/robot-plugin#12

I also noticed that the full workflow job currently takes around 40 minutes, which is 10 minutes more than the longest Plone 5 job (alltests). I guess this is because we have a separate build stage, though we might be able to improve that.


tisto commented Dec 4, 2015

I just found another blocker. In order to run our builds in parallel the port allocator plugin needs to be fixed:

https://issues.jenkins-ci.org/browse/JENKINS-31449


Rotonen commented Oct 16, 2018

The port allocation issues are fixed now, but no, we need to launch a separate build per layer based on the output of:
bin/test --all --list-tests 2>/dev/null | grep -E '^Listing' | awk '{print $2}' | sort -u

Our end to end browser tests are picky about their runtime environment to the point where they want a separate Xvfb instance per layer. So we'll either have to roll a hacky test wrapper like I have done or launch a separate build per layer.

And using the xvfb-run -a wrapper per build has proven more reliable than the Jenkins plugins.

I'll try to give this pipeline approach a spin later this week.

@gforcada

Are dynamic pipelines possible? We could make a stage that decides how many parallel jobs the tests can be split into and then have that number of jobs created...

Or just have one stage/job per unit tests/functional/robot and split there.


Rotonen commented Oct 16, 2018

They are possible.

https://stackoverflow.com/questions/42837066/can-i-create-dynamically-stages-in-a-jenkins-pipeline

One per layer is the sensible option. Most layers run in 10 to 30 seconds when run standalone, so the burst will clear quickly. We have two executors per node per real core exactly to allow for something like this, where any slack caused by waiting on IO can be picked up by something else.

This way we also get just one Xvfb per Robot run.

One of the reasons I went for having layers pick their own ports dynamically, and for using xvfb-run -a instead of the Jenkins plugins, was to future-proof for the pipeline mechanisms, as the ecosystem has not fully extended and/or converged from the old-style jobs onto the new stuff.
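
A minimal sketch of that dynamic-stages pattern, assuming a layers list computed in an earlier stage and an illustrative test command:

def branches = [:]
for (String layer : layers) {
  def l = layer  // capture the loop variable for the closure
  branches["test-${l}"] = {
    node {
      stage("Test ${l}") {
        sh "xvfb-run -a bin/test --all --layer '${l}'"
      }
    }
  }
}
parallel branches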


Rotonen commented Oct 16, 2018

Also this way I do not need to solve the Robot output HTML writing for running the layers in parallel in the same workspace.

We can also do two layer-discovery runs, one for Robot and one for the rest, and have all the Robot ones split further, per layer per test class (AKA per Robot suite).


Rotonen commented Oct 16, 2018

Also this way we could give the Robot suites a retry threshold.

@gforcada

Giving it another thought, I remember that Timo had some problems at the beginning with copying the environment from one step to another (according to the Jenkins docs on pipelines). If we split too much, we might notice that the setup/teardown time per step is quite high relative to the time spent actually running tests.

If that is still the case we could think of a simple pipeline that does:
build/bootstrap -> unit tests/functional/robot (so 3 steps, and within each, bin/mtest to parallelize them)

Then we can still do the retry on the robot tests for example.
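
A rough sketch of that coarse split, with hypothetical bin/mtest arguments and the built-in retry step for the Robot branch:

stage 'Build'
node {
  sh "python2.7 bootstrap.py"
  sh "bin/buildout -c jenkins.cfg"
}

stage 'Test'
parallel(
  unit:       { node { sh "bin/mtest unit" } },        // hypothetical argument
  functional: { node { sh "bin/mtest functional" } },  // hypothetical argument
  robot:      { node { retry(2) { sh "bin/mtest robot" } } }
)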


Rotonen commented Oct 17, 2018

Meh. I'll give the massive burst a spin. I find considering optimisation premature at this point.

We cannot assign a weight to pipeline jobs, which sucks, as it means we should not parallelise in-build if we use a pipeline.


Rotonen commented Oct 18, 2018

The distribution of the stashes takes 6 .. 8 minutes. A shame; otherwise that works OK. In any case we need to rerun buildout to set the paths right (and we're not 100% sure about binary compatibility of compiled artefacts across nodes either).

@gforcada can you add a user global buildout cache configuration to the nodes?

$ cat ~/.buildout/default.cfg
[buildout]
eggs-directory = /foo/bar
download-cache = /foo/bar
abi-tag-eggs = true

Or is that too race-condition prone? The buildouts take 2 minutes each.

Here's the current PoC Jenkinsfile I've been twiddling with:

#!groovy
def layers
def layers_robot

pipeline {
  agent any
  stages {
    stage('Buildout') {
      steps {
        deleteDir()
        withPythonEnv('Python2.7') {
          sh """
          git clone --branch ${branch} --depth 1 https://github.com/plone/buildout.coredev.git
          cd buildout.coredev
          pip install -r requirements.txt
          buildout buildout:git-clone-depth=1 -c core.cfg
          export ROBOTSUITE_PREFIX='ROBOT'
          bin/test --all --list-tests -t 'ROBOT' 2>/dev/null | grep -E '^Listing' | awk '{print \$2}' | sort -u | tee layers-robot.txt
          bin/test --all --list-tests -t '!ROBOT' 2>/dev/null | grep -E '^Listing' | awk '{print \$2}' | sort -u | tee layers.txt
          """
        }
        script {
          layers_robot = readFile("${env.WORKSPACE}/buildout.coredev/layers-robot.txt").trim().split('\n')
          layers = readFile("${env.WORKSPACE}/buildout.coredev/layers.txt").trim().split('\n')
        }
      }
    }
    stage('Dispatch') {
      steps {
        script {
          def all_layers = layers_robot + layers
          def tests = [:]
          for (int i = 0; i < all_layers.length; i++) {  // one parallel branch per layer
            def layername = "${all_layers[i]}"
            tests["${i}"] = {
              node {
                stage("Test ${layername}") {
                  deleteDir()
                  withPythonEnv('Python2.7') {
                    sh """
                    git clone --branch ${branch} --depth 1 https://github.com/plone/buildout.coredev.git
                    cd buildout.coredev
                    pip install -r requirements.txt
                    buildout buildout:git-clone-depth=1 -c core.cfg
                    export ROBOT_BROWSER='chrome'
                    xvfb-run -a --server-args='-screen 0 1920x1200x24' bin/test --all --xml --layer '${layername}'
                    """
                  }
                }
              }
            }
          }
          parallel tests
        }
      }
    }
  }
}

And there is a new experimental pipeline I made manually: https://jenkins.plone.org/view/Experimental/job/5.2-pipeline/

The pipeline name cannot contain spaces or other shell nasties as otherwise the Python management / ShiningPanda integration breaks and cannot activate the virtualenv.

Doing git in the shell scripts is a LOT faster than using the SCM plugin, which just randomly stalls for 2 minutes for no good reason, even for the thin fetch of the Jenkinsfile that I cannot avoid doing via the declarative web editor. The SCM plugin also does not play nice with pipeline scripting in general.
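
For contrast, a hedged sketch of the SCM-plugin route being avoided here, approximating the shallow shell clone above with a GitSCM checkout step:

node {
  checkout([$class: 'GitSCM',
            branches: [[name: "${branch}"]],
            extensions: [[$class: 'CloneOption', shallow: true, depth: 1]],
            userRemoteConfigs: [[url: 'https://github.com/plone/buildout.coredev.git']]])
}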

@gforcada

That's what we have right now; should I add this abi-tag-eggs = true?

[buildout]
eggs-directory = /home/jenkins/.buildout/eggs
download-directory = /home/jenkins/.buildout/downloads
extends-cache = /home/jenkins/.buildout/extends
download-cache = /home/jenkins/.buildout/cache

@gforcada

I already added it to all nodes.


tisto commented Oct 19, 2018

@Rotonen @gforcada FYI: I have a coredev pipeline up and running:

https://github.com/plone/buildout.coredev/blob/5.2/Jenkinsfile

Copying Python envs is a major problem, and it seems neither virtualenv nor pipenv have solved that in any way. I tried different approaches and ended up creating a tgz and passing that around as a build artifact. Don't try to stash the entire buildout dir; that will take ages. You might have to re-run buildout to be 100% sure it still works (the more builds you run in parallel, the more likely you are to break things). Though this is faster than checking out again. A second checkout also has the problem that you might end up checking out a different version than a previous build step did.

I'd consider the workflow job more or less ready. We just need to replace the old jobs with the new one...
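
A minimal sketch of the tgz hand-off described above; tisto passed the tarball around as a build artifact, while this sketch stashes the single pre-compressed file for brevity:

node {
  sh 'tar czf build.tgz buildout.coredev'
  stash includes: 'build.tgz', name: 'build'
}
node {
  unstash 'build'
  sh 'tar xzf build.tgz'
  // re-run buildout so paths and compiled artefacts are valid on this node
  sh 'cd buildout.coredev && bin/buildout -c jenkins.cfg'
}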


Rotonen commented Oct 19, 2018

@tisto the stash functionality is slow mostly because of the massive CPU congestion from uncompressing the stashes in bursts. We'd need some other mechanism to pass artefacts around, as the Jenkins docs suggest - then we could also pass around an uncompressed tarball of the whole buildout to avoid hitting git via mr.developer on the subsequent builds. Just rebuilding parts/, bin/, etc. is cheap enough with an egg cache.

@gforcada yes, the ABI tags are important for having multiple Python versions build stuff with C extensions into the same cache.

As we already apparently cache the eggs, in this case it's mostly the source checkouts taking time.

Hit build on my job and see how it ticks currently.

I'll try what happens when I just stash src/ sometime on Saturday.


tisto commented Oct 20, 2018

@Rotonen I don't get your point. What other mechanism to pass around artifacts do you have in mind? What Jenkins docs are you referring to? The problem of re-running buildout is not speed but that you might end up with a different build. As said, I'd consider that to be a CI anti-pattern.


Rotonen commented Oct 20, 2018

https://jenkins.io/doc/pipeline/steps/workflow-basic-steps/#stash-stash-some-files-to-be-used-later-in-the-build

Note that the stash and unstash steps are designed for use with small files. For large data transfers, use the External Workspace Manager plugin, or use an external repository manager such as Nexus or Artifactory. This is because stashed files are archived in a compressed TAR, and with large files this demands considerable on-master resources, particularly CPU time. There's not a hard stash size limit, but between 5-100 MB you should probably consider alternatives.

In benchmarking, stashing has been 3 .. 5 times slower for me than a cold buildout.

And preventing ending up with a different build is exactly why I'd like to stash the source checkouts: if the checkouts are split up, and the references used by mr.developer are floating references like master, we have no hard guarantee that all subtasks of the build run with the same code.

@gforcada

To have a reproducible build one could do:

  • clone buildout.coredev
  • check out the code
  • grab the git commit hash for each repo on src
  • rewrite sources.cfg with rev=hash for each package on src
  • pass around the sources.cfg and the git commit hash of buildout.coredev

Am I missing anything here? If versions.cfg has each and every pin, pinning sources.cfg with the current checkouts.cfg is enough to get a reproducible build.
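
A hedged sketch of recording those pins (paths and the pins.txt file name are illustrative; rewriting sources.cfg from them is left out):

node {
  sh '''
  cd buildout.coredev
  git rev-parse HEAD > pins.txt
  for repo in src/*/; do
    echo "$repo $(git -C "$repo" rev-parse HEAD)" >> pins.txt
  done
  '''
  stash includes: 'buildout.coredev/pins.txt', name: 'pins'
}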


Rotonen commented Oct 20, 2018

Unless the branches on the remotes have gone away in the meanwhile, yes.

src/ would be safe to archive and pass around, though.


Rotonen commented Oct 22, 2018

After having slept on it, it'd be better if the source checkouts were pinned to commit hashes by whoever pins them, but I guess that's a hard change to push through. Can mr.roboto help us do that? I'm not very fond of patching files on the fly.

@gforcada

@Rotonen what do you want mr.roboto to do there? 🤔 to update sources.cfg and change the branch of the package to the current revision in use? 🤔

That would, at least, break mr.roboto's own functionality for knowing whether a merge request targets 4.3/5.0/5.1/5.2, as it relies on matching the target branch of the pull request against the sources.cfg of buildout.coredev in any of those branches.


tisto commented Oct 26, 2018

@Rotonen thanks for the pointer. This paragraph did not exist when I looked around for solutions some time ago. Stashing still kind of works for us, though. I'd highly recommend not trying to be too smart when passing around the build artifacts. When it comes to CI, stability, traceability and simplicity trump speed, especially when the build step is significantly smaller/faster than the rest of the steps.

I'd love to see us find a good solution for a reproducible buildout run, though it needs to be simple and stable. Stashing src/ seems like a sensible option if we can prevent buildout/mr.developer from overriding things during the buildout run. Messing around with the checkout versions seems error-prone and too complex to me. I guess we need to further investigate the different options we have...


Rotonen commented Oct 26, 2018

@tisto we can just pass in any variables to buildout on the command line.

buildout buildout:always-checkout=false -c core.cfg install test

This will make mr.developer skip git interactions for anything already present on the filesystem at that point.

This is what I've run my most recent pipeline tests with:

#!groovy
def layers

pipeline {
  agent any
  stages {
    stage('Buildout') {
      steps {
        deleteDir()
        withPythonEnv('Python2.7') {
          sh """
          git clone --branch ${branch} --depth 1 https://github.com/plone/buildout.coredev.git
          (
          cd buildout.coredev
          pip install -r requirements.txt
          buildout buildout:git-clone-depth=1 -c core.cfg install test
          bin/test --all --list-tests 2>/dev/null | grep -E '^Listing' | awk '{print \$2}' | sort -u | tee layers.txt
          )
          tar cf src.tar buildout.coredev/src
          """
          stash includes: 'src.tar', name: 'src.tar'
        }
        script {
          layers = readFile("${env.WORKSPACE}/buildout.coredev/layers.txt").trim().split('\n')
        }
      }
    }
    stage('Dispatch') {
      steps {
        script {
          def tests = [:]
          for (int i = 0; i < layers.length; i++) {  // one parallel branch per layer
            def layername = "${layers[i]}"
            tests["${i}"] = {
              node {
                stage("Test ${layername}") {
                  deleteDir()
                  unstash 'src.tar'
                  withPythonEnv('Python2.7') {
                    sh """
                    git clone --branch ${branch} --depth 1 https://github.com/plone/buildout.coredev.git
                    tar xf src.tar
                    cd buildout.coredev
                    pip install -r requirements.txt
                    buildout buildout:always-checkout=false -c core.cfg install test
                    export ROBOT_BROWSER='chrome'
                    xvfb-run -a --server-args='-screen 0 1920x1200x24' bin/test --all --xml --layer '${layername}'
                    """
                  }
                }
              }
            }
          }
          parallel tests
        }
      }
    }
  }
}

Wherein ${branch} can be any git refspec.

@gforcada

@Rotonen nice pipeline! We should stash buildout.coredev as well, or save its commit, as the parallel tests clone it again and we could easily end up with different commits there. Simple scenario: you merge two or three (un)related pull requests within a few seconds.


Rotonen commented Oct 26, 2018

@gforcada that's what the now-ill-named ${branch} is for - it is a build parameter.
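
A hedged sketch of declaring such a parameter from the pipeline script itself (the job-configuration UI route works just as well; the default value shown is illustrative):

properties([
  parameters([
    string(name: 'branch', defaultValue: '5.2', description: 'git refspec of buildout.coredev to build')
  ])
])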

@gforcada

@Rotonen oh cool


Rotonen commented Oct 26, 2018

And I'm apparently wrong about how git clone works, but amending that to use git init + git fetch is a triviality in the end. Will sculpt one later.


Rotonen commented Oct 26, 2018

For the convenience of trying it out locally to evaluate:

mkdir buildout.coredev
cd buildout.coredev/
git init .
git remote add origin https://github.com/plone/buildout.coredev.git
git fetch --depth 1 origin 8f7d17fa2730feeb669a5b1fd786df369d44c448
git checkout FETCH_HEAD
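
The same pinned fetch dropped into a pipeline step; a sketch assuming rev holds a previously recorded commit hash:

node {
  sh """
  mkdir buildout.coredev
  cd buildout.coredev
  git init .
  git remote add origin https://github.com/plone/buildout.coredev.git
  git fetch --depth 1 origin ${rev}
  git checkout FETCH_HEAD
  """
}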
