Rethinking the start-up sequence #1055

Open
rkdarst opened this issue Mar 29, 2020 · 2 comments
Labels
type:Enhancement A proposed enhancement to the docker images

Comments

@rkdarst
Contributor

rkdarst commented Mar 29, 2020

For background: I run a Kubernetes-based JupyterHub (JH) cluster for my university (and a batchspawner-based one, too), and have often had to struggle against the way these images are set up. That doesn't mean something is wrong; I have just tended to do lots of setup to integrate things. Combined with the difficulty of incremental upgrades, I was wondering if I should try another strategy for some docker images. Maybe it's good to discuss here.

Summary: what about moving to a multi-phase setup (as in #787, but with the different parts split out into multiple scripts instead of one script re-executing itself), where the start scripts become very minimal and most things are done in hooks instead? There are root hooks and user hooks, so the root part doesn't have to duplicate the user part. We try to make use of OS-level configuration instead of doing it ourselves (e.g. NB_UMASK is currently set in jupyter_config.py - this should be a user hook).

Note that I'm not a docker expert - I have more background in sysadmining, but I've maintained our docker images for years now.

Initial questions

  • Right now, we have to re-implement some config when starting as root. e.g. NB_UID - in a "normal" docker image, this would just be specified as
  • Do we support starting with other custom commands, besides jupyter lab, without knowing how to do our other special setup? Seems not too hard to do, with the type of setup we have now - passing arguments to run through, default to starting jupyter.
  • Do we integrate with the OS enough that one can start the container as root, and su/sudo to the user, and have the environment working as expected? Wouldn't be impossible with .bash_profile, but we have to make sure that when we run a command, it creates a login shell.
  • The issue with "[WIP] Have start.sh run through both user and root parts" (#787) is the lack of tests. If we do anything big, we are going to have some regressions or behavior changes. I think we will have to accept that these will come, and fix or document them.
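
The pass-through idea from the second question could look roughly like this. It is only a sketch: run_command is a made-up name, and the jupyter lab default is an assumption about what the image would start when no arguments are given.

```shell
#!/bin/sh
# Hypothetical final step of a start-up script.
# If the caller passed a command, run it unchanged; otherwise fall back
# to an assumed default of `jupyter lab`.
run_command() {
    # No arguments given: substitute the assumed default command.
    [ "$#" -gt 0 ] || set -- jupyter lab
    # "$@" keeps each argument intact, including ones containing spaces.
    exec "$@"
}

# A real script would end with:  run_command "$@"
```

Because the command is replaced with exec, the process started last receives signals directly instead of sitting behind a leftover shell.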

Example

I haven't really thought about this that much

  • init.sh - are we root or not? root=root-setup.sh $@, otherwise=user-setup.sh $@
  • root-setup.sh - read root hooks, then user-setup.sh $@
  • user-setup.sh - read user hooks, then jupyter.sh $@
  • If it's mostly hooks, all three of the above files can be combined into one. (It's probably worth combining them to make things easier to examine, if the result is short.)
  • jupyter.sh - do the option parsing, notebook/lab selection, etc. start jupyter. Right now this happens before the other starting wrapper, spread out across several files.
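
As a rough sketch of the init.sh step, assuming the script names from the list above (none of these files exist yet, and next_script is a made-up helper):

```shell
#!/bin/sh
# Hypothetical init.sh logic: pick the first phase based on the UID.
# root-setup.sh and user-setup.sh are the proposed scripts, not existing ones.
next_script() {
    uid="$1"
    if [ "$uid" -eq 0 ]; then
        # Started as root: run the root phase, which later hands over
        # to user-setup.sh.
        echo "root-setup.sh"
    else
        # Started as an unprivileged user: skip the root phase entirely.
        echo "user-setup.sh"
    fi
}

# init.sh itself would then end with something like:
#   exec "$(next_script "$(id -u)")" "$@"
```

Each later phase would do the same thing: read its hooks, then exec the next script with the unchanged arguments.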

We can configure user environment in different places:

  • /etc/profile.d if we start via a login shell
  • user hooks
  • Activating conda: source /opt/conda/bin/activate in profile.d or a user hook.
  • umask: user hook or profile.d
    I'm not sure if this "integrating with the OS" is a good idea - it's certainly not dockerish and may add in other subtle bugs, but then again we are beyond normal docker use...
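
For the umask case, a user hook might be as small as the sketch below. It assumes the hook is a *.sh file that the start-up script sources, and that NB_UMASK holds an octal string such as 002; apply_nb_umask is a made-up name.

```shell
# Hypothetical user hook (or /etc/profile.d snippet) that applies NB_UMASK.
# It must be sourced, not executed, so that the umask sticks in the
# shell that goes on to start jupyter.
apply_nb_umask() {
    case "${NB_UMASK:-}" in
        "")
            # Unset or empty: keep whatever the OS configured.
            ;;
        *[!0-7]*)
            # Anything that is not purely octal digits is rejected.
            echo "ignoring invalid NB_UMASK: $NB_UMASK" >&2
            ;;
        *)
            umask "$NB_UMASK"
            ;;
    esac
}

apply_nb_umask
```

The same snippet works from either place because both a profile.d file and a sourced hook run inside the shell whose settings should change.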

If any of the above sounds interesting, it's the kind of thing I can quickly hack on, but it would be good to have advice on the docker side.

@dirkcgrunwald
Contributor

I agree with needing a simpler model -- as part of switching to the jupyterlab 2.x version of the docker stacks, I changed my Z2JH config to switch from starting up using

    extraConfig:
      modifystart: |
        c.KubeSpawner.cmd = ['/usr/local/bin/start.sh', 'jupyter-labhub']

to

    extraConfig:
      modifystart: |
        c.Spawner.default_url = '/lab'

With the former startup, my scripts in /usr/local/bin/start-notebook.d would execute and do the needful (e.g. ulimit -c 0), but jupyterlab 2.x wouldn't start because jupyter-labhub is gone. With the latter, jupyterlab starts but the scripts never run. Walking through why this occurs is currently complex (given my limited time to manage our cluster).
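
To illustrate the dependency: a hook directory only has an effect if the command that gets launched passes through a wrapper that reads it. The following is a sketch of such a hook runner, not the image's actual start.sh; run_hooks is an illustrative name.

```shell
#!/bin/sh
# Sketch of a hook runner.
# *.sh files are sourced, so they can change the environment and limits
# of this shell; other executable files run as child processes; anything
# else is skipped.
run_hooks() {
    hook_dir="$1"
    # A missing directory simply means there are no hooks.
    [ -d "$hook_dir" ] || return 0
    for hook in "$hook_dir"/*; do
        case "$hook" in
            *.sh)
                [ -f "$hook" ] && . "$hook"
                ;;
            *)
                [ -f "$hook" ] && [ -x "$hook" ] && "$hook"
                ;;
        esac
    done
    return 0
}

# A wrapper would call, before starting jupyter:
#   run_hooks /usr/local/bin/start-notebook.d
```

A setting such as ulimit -c 0 has to live in a sourced *.sh hook, since a limit set in a child process does not carry over to the process that later starts jupyter.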

Making the startup sequence more understandable would be nice.

@parente added the type:Enhancement label Nov 29, 2020
@mathbunnyru
Member

We made a few changes to our startup sequence:

  • run-hooks.sh is a separate script now, more robust and tested
  • start-notebook.py and start-singleuser.py are rewritten in Python, with comments, and they are simple
  • start.sh is an entrypoint now (so it should at least resolve @dirkcgrunwald's issue)
  • conda environment activation is also more uniform and robust

So, please, give it a try 🙂
