Multiple Hubs on a Page Clobber Cookies #4512

Open
Mindtoeye opened this issue Jul 14, 2023 · 7 comments

@Mindtoeye

Bug description

We are building an LTI service where instructors can add Jupyter Notebooks as assignments. They can either add one as a standalone assignment or embed one or more in a page. We're primarily testing with Canvas. All our infrastructure is complete and the single-assignment setup works. However, we're seeing an issue when more than one JupyterHub is embedded in a single HTML page.

Expected behaviour

We are expecting all Jupyter hubs embedded in a page to appear.

Actual behaviour

Depending on how quickly our server lets the hub create the single-user JupyterLab containers, the embedded JupyterLab instances either appear or they don't. In the Chrome debug tools we noticed that there are 2 cookies when we add just 1 hub to a page: one named jupyterhub-session-id for the FQDN, and an additional one for the FQDN with the port number. The hub URL we generate, which is the only one we use as the iframe src, always has a unique port number for that hub. When we embed more than one hub we always see cookies for the FQDN and a separate set for the hub(s) with the port. That is what made us suspect cookie pollution, where multiple hubs are using the same cookie content.

How to reproduce

Create an HTML page that embeds two or more JupyterHubs in iframes, then load that page.

Your personal set up

We're using whatever the latest version of the JupyterHub Docker image currently is.
The Notebook configuration can be any of:

jupyter/base-notebook
jupyter/minimal-notebook
jupyter/r-notebook
jupyter/scipy-notebook
jupyter/tensorflow-notebook
jupyter/datascience-notebook
jupyter/pyspark-notebook
jupyter/all-spark-notebook

  • OS:
    Debian 11, 64 RAM

  • Version(s):
    Jupyterhub: 4.0.1 (Docker image)

Full environment

Not sure how to do this since we work in an exclusively containerized environment

Configuration

Spawner: DockerSpawner
Config: see below

# jupyterhub_config.py

# Copyright (c) Jupyter Development Team.
# Distributed under the terms of the Modified BSD License.
# 
# In general:
# 
#   Seen: "Common causes of this timeout, and debugging tips:"
#   From source code: 

#   Common causes of this timeout, and debugging tips:
#   1. Everything is working, but it took too long.
#      To fix: increase `Spawner.start_timeout` configuration
#      to a number of seconds that is enough for spawners to finish starting.
#   2. The server didn't finish starting,
#      or it crashed due to a configuration issue.
#      Check the single-user server's logs for hints at what needs fixing.

# Configuration file for JupyterHub
import os

c = get_config()  # noqa: F821

pref = "eberly-jupyter"
tag = os.environ["DOCKER_HUB_TAG"]
hub = pref + "-" + tag + '-{username}'

# We rely on environment variables to configure JupyterHub so that we
# avoid having to rebuild the JupyterHub container every time we change a
# configuration parameter.

# Spawn single-user servers as Docker containers
# https://jupyterhub.readthedocs.io/en/stable/tutorial/getting-started/spawners-basics.html
c.JupyterHub.spawner_class = "dockerspawner.DockerSpawner"

# Spawn containers from this image, provided by the instructor or the default
c.DockerSpawner.image = os.environ["DOCKER_NOTEBOOK_IMAGE"]

# https://discourse.jupyter.org/t/multiple-jupyterhub-servers-per-user-named-servers-does-not-work/2357/4
c.DockerSpawner.name_template = hub

# JupyterHub requires a single-user instance of the Notebook server, so we
# default to using the `start-singleuser.sh` script included in the
# jupyter/docker-stacks *-notebook images as the Docker run command when
# spawning containers.  Optionally, you can override the Docker run command
# using the DOCKER_SPAWN_CMD environment variable.
spawn_cmd = "start-singleuser.sh"
c.DockerSpawner.cmd = spawn_cmd

# Enable the usage of the internal docker ip. This is useful if you are running jupyterhub (as a container) and the user 
# containers within the same docker network. E.g. by mounting the docker socket of the host into the jupyterhub container. 
# Default is True if using a docker network, False if bridge or host networking is used.
c.DockerSpawner.use_internal_ip = True

# Connect containers to this Docker network and run them on it. If it is an 
# internal docker network, the Hub should be on the same network, as internal docker IP addresses 
# will be used. For bridge networking, external ports will be bound.
c.DockerSpawner.network_name = os.environ["DOCKER_NETWORK_NAME"]

# Specify a timeout for starting the image. Let's give it some more time just in case
# Timeout (in seconds) before giving up on starting of single-user server. Default is 60
# From the docs:
# This is the timeout for start to return, not the timeout for the server to respond. Callers of spawner.start
# will assume that startup has failed if it takes longer than this. start should return when the server process is
# started and its location is known. 
c.DockerSpawner.start_timeout = 120

# Let's give it some more time just in case

c.DockerSpawner.http_timeout = 120

# Explicitly set notebook directory because we'll be mounting a volume to it.
# Most `jupyter/docker-stacks` *-notebook images run the Notebook server as
# user `jovyan`, and set the notebook directory to `/home/jovyan/work`.
# We follow the same convention.
notebook_dir = "/home/jovyan/work"

# the user jovyan is part of the docker image you are using. You could develop your own 
# docker image according to your liking
c.DockerSpawner.notebook_dir = notebook_dir

# Mount the real user's Docker volume on the host to the notebook user's
# notebook directory in the container. Volume mapping for DockerSpawner in jupyterhub_config.py 
# is required configuration for persistence. To map volumes from the host file/directory to the 
# container (referred to as guest) file/directory mount point, set the c.DockerSpawner.volumes 
# to specify the guest mount point (bind) for the volume.
# https://jupyterhub-dockerspawner.readthedocs.io/en/latest/data-persistence.html
# https://discourse.jupyter.org/t/dockerspawner-and-volumes-from-host/7008/5

HOST_NOTEBOOK_PATH = os.environ["HOST_HOME_PATH"] + "/{username}"

# https://jupyterhub-dockerspawner.readthedocs.io/en/latest/api/index.html

c.DockerSpawner.volumes = {
  HOST_NOTEBOOK_PATH : "/home/jovyan"
}

# https://jupyterhub.readthedocs.io/en/stable/tutorial/getting-started/spawners-basics.html
# For debugging arguments passed to spawned containers, equivalent to above?
c.DockerSpawner.debug = True

# Remove containers once they are stopped. We might temporarily disable this to
# be able to inspect logs
c.DockerSpawner.remove = True

# User containers will access hub by container name on the Docker network
#c.JupyterHub.hub_ip = "jupyterhub"
c.JupyterHub.hub_ip = "0.0.0.0"

# Persist hub data on volume mounted inside container
c.JupyterHub.cookie_secret_file = "/data/jupyterhub_cookie_secret"
c.JupyterHub.db_url = "sqlite:////data/jupyterhub.sqlite"

# Probably mostly for debugging. If you want to use the API to start servers you will need this
# https://jupyterhub.readthedocs.io/en/stable/tutorial/server-api.html
c.JupyterHub.allow_named_servers = True

c.JupyterHub.services = [
    {
        "name": "admin",
        "api_token": os.environ["JUPYTER_TOKEN"],
        "admin": True
    }
]

c.JupyterHub.authenticator_class = 'jwtauthenticator.jwtauthenticator.JSONWebTokenAuthenticator'

# (Don't) Allow anyone to sign-up without approval
c.NativeAuthenticator.open_signup = False

## Set of users that will have admin rights on this JupyterHub.
#
#  Admin users have extra privileges:
#   - Use the admin panel to see list of users logged in
#   - Add / remove users in some authenticators
#   - Restart / halt the hub
#   - Start / stop users' single-user servers
#   - Can access each individual users' single-user server (if configured)
#
#  Admin access should be treated the same way root access is.
#
#  Defaults to an empty set, in which case no user has admin access.

c.Authenticator.admin_users = ['admin']
c.JupyterHub.admin_access = True

# https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Content-Security-Policy/frame-ancestors
c.JupyterHub.tornado_settings = {
    'headers': {
        'Content-Security-Policy': "frame-src * script-src *;",
        'Access-Control-Allow-Origin': '*'
    }
}

c.NotebookApp.tornado_settings  = {
    'headers': {
        'Content-Security-Policy': "frame-src * script-src *;",
        'Access-Control-Allow-Origin': '*'
    }
}

# https://jupyter-notebook.readthedocs.io/en/stable/config.html

#c.NotebookApp.open_browser = False
c.NotebookApp.allow_password_change = False
# If True, display a button in the dashboard to quit (shutdown the notebook server).
c.NotebookApp.quit_button = False
# Whether the banner is displayed on the page. By default, the banner is displayed.
c.NotebookApp.show_banner = False
# You can set c.NotebookApp.terminals_enabled = False to disable terminals in the UI, but this doesn’t 
# mitigate any risks. There’s nothing that can be done in a terminal that cannot be done from a regular 
# notebook, including start more terminals. Terminals may also be automatically disabled if the terminado 
# package is not available.
c.NotebookApp.terminals_enabled = False

Logs

In this particular case it's difficult to provide logs since the failure is mostly silent. What we do see are things like HTTP 431 or HTTP 500 errors, with no noticeable errors in the Docker logs.

Mindtoeye added the bug label Jul 14, 2023
@welcome

welcome bot commented Jul 14, 2023

Thank you for opening your first issue in this project! Engagement like this is essential for open source projects! 🤗

If you haven't done so already, check out Jupyter's Code of Conduct. Also, please try to follow the issue template as it helps other community members to contribute more effectively.
You can meet the other Jovyans by joining our Discourse forum. There is also an intro thread there where you can stop by and say Hi! 👋

Welcome to the Jupyter community! 🎉

@minrk
Member

minrk commented Sep 12, 2023

Sorry for the delay, I was on vacation in July and didn't get through the backlog when I came back.

Since cookies in general are per-hostname and not per-port, collisions are likely if you have many Hubs on one hostname only differentiated by port. I'm not sure if this is avoidable. The jupyterhub-session-id cookie being shared shouldn't be an issue, but the jupyterhub-hub-login cookie won't work if it's shared, and I imagine it would be if the port is the only difference.
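The port-blindness described above can be illustrated with a small model. This is a simplified sketch, not browser or JupyterHub code: `cookie_scope` is a hypothetical helper, the hostname, ports, and path prefixes are made up, and real cookie matching also involves the Domain attribute and prefix matching on Path.

```python
from urllib.parse import urlsplit


def cookie_scope(url, cookie_path="/"):
    """Return the (host, path) pair a cookie set by `url` is keyed on.

    The port is deliberately dropped: cookies are not isolated by port.
    """
    parts = urlsplit(url)
    return (parts.hostname or "", cookie_path)


# Two hubs on one host that differ only by port (hypothetical URLs).
hub_a = cookie_scope("https://hub.example.org:8001/hub/")
hub_b = cookie_scope("https://hub.example.org:8002/hub/")
same_scope = hub_a == hub_b  # both hubs read and write the same cookies

# Same host, but each hub sets its cookies on its own path prefix.
hub_c = cookie_scope("https://hub.example.org/prefix1/hub/", "/prefix1/hub/")
hub_d = cookie_scope("https://hub.example.org/prefix2/hub/", "/prefix2/hub/")
distinct_scope = hub_c != hub_d  # the cookies no longer collide
```

In this model the two port-only hubs end up with an identical scope, which matches the shared cookies seen in the browser tools, while distinct path prefixes keep them apart.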

JupyterHub should be setting its cookies on the correct paths, so if your Hubs were differentiated by JupyterHub.base_url, you might have better success. It would be even better to use different hostnames if that's at all available to you, since cross-site protections and credential management by browsers are much better in that situation.

@Mindtoeye
Author

I'll look into this. Unfortunately we don't have a choice. We run one JupyterHub per student assignment, and for most courses we will have to run about 50 assignments. We simply can't afford 50 VMs to run this on.

Our current workaround is to create a unique username per assignment (hub) for a user. That appears to be working but we don't have enough data yet to guarantee this to be a good final solution.

@minrk
Member

minrk commented Sep 21, 2023

"We simply can't afford 50 VMs to run this on."

Neither suggestion requires more than one VM. You can use multiple base_url prefixes without any other changes (hostname is unchanged, urls only change from https://host/hub to https://host/prefix1/hub), or even multiple public hostnames for one host for better isolation. Wildcard DNS can be useful for this, e.g. CNAME *.hub.domain.tld -> hub.domain.tld so anything.hub.domain.tld would resolve to the same host as hub.domain.tld.
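As a rough illustration of the base_url approach, here is a minimal sketch of deriving a per-hub prefix and the URL to embed. `hub_base_url`, `hub_public_url`, and the `HUB_BASE_PREFIX` variable mentioned in the comment are hypothetical names chosen for this example; only the `c.JupyterHub.base_url` option itself comes from JupyterHub.

```python
def hub_base_url(prefix):
    """Normalize a prefix such as 'prefix1' into '/prefix1/'."""
    cleaned = prefix.strip("/")
    return f"/{cleaned}/" if cleaned else "/"


def hub_public_url(host, prefix):
    """Build the hub URL to use as the iframe src for one hub."""
    return f"https://{host}{hub_base_url(prefix)}hub/"


# In jupyterhub_config.py this could be wired up roughly as follows,
# assuming a hypothetical HUB_BASE_PREFIX environment variable per hub:
#   c.JupyterHub.base_url = hub_base_url(os.environ["HUB_BASE_PREFIX"])

example_url = hub_public_url("host", "prefix1")
```

Reading the prefix from the environment would fit the existing config, which already takes its per-hub settings from environment variables.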

@Mindtoeye
Author

I'm going to try the base_url prefix approach. All hubs are dynamically generated by our system when an instructor creates a new assignment, so modifying DNS would be tricky unless we have a lot of them pre-assigned. We simply get too many requests to make that practical.

@minrk
Member

minrk commented Sep 21, 2023

That makes sense. Lots of requests is where the wildcard DNS becomes useful, since you'd only need to do that once, and add a single route on the server when a new Hub is added. But if you don't have wildcard DNS and easy SSL certificates, it would definitely be a big pain.
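To make the "single route per new Hub" idea concrete, here is a small sketch of the bookkeeping involved once a wildcard record exists. Everything in it is an assumption for illustration: `hub_hostname`, `add_hub_route`, the in-memory `routes` table, and the upstream address are hypothetical, and a real deployment would register the route with whatever reverse proxy sits in front of the Hubs.

```python
import re

WILDCARD_PARENT = "hub.domain.tld"  # parent domain from the wildcard example

routes = {}  # hostname -> upstream address of the hub


def hub_hostname(hub_id, parent=WILDCARD_PARENT):
    """Build a DNS-safe hostname for a hub under the wildcard parent."""
    label = re.sub(r"[^a-z0-9-]+", "-", hub_id.lower()).strip("-")
    if not label or len(label) > 63:
        raise ValueError(f"cannot build a DNS label from {hub_id!r}")
    return f"{label}.{parent}"


def add_hub_route(hub_id, upstream):
    """Record the one route a newly created hub needs; no DNS change."""
    host = hub_hostname(hub_id)
    routes[host] = upstream
    return host


new_host = add_hub_route("Assignment 7", "http://127.0.0.1:8001")
```

Because every name under the wildcard already resolves to the same machine, creating a hub only touches the routing table.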

@Mindtoeye
Author

Thanks for your help by the way! :)
