Verify port number in JUPYTERHUB_SERVICE_URL #4310

Open
wants to merge 4 commits into
base: main

twalcari
Contributor

When implementing a custom spawner, a sneaky bug can appear if Spawner.port is not defined: get_env() then generates JUPYTERHUB_SERVICE_URL with port number 0. While this was ignored by nbviewer in the past, it now results in difficult-to-diagnose behavior where the Jupyter server starts on port 80 by default (see jupyter/docker-stacks#1862).

I think it is preferable that the Spawner crashes instead.

Instructions to solve the issue: either define a static port in your Spawner:

    @default("port")
    def _default_port(self):
        return 8888

or initialize the port to a random port number, as happens in LocalProcessSpawner.start: https://github.com/twalcari/jupyterhub/blob/cbce162e7cfa7da7997df6a6c2c37248d06ac527/jupyterhub/spawner.py#L1662-L1663

@@ -56,6 +56,7 @@ def new_spawner(db, **kwargs):
     kwargs.setdefault('term_timeout', 1)
     kwargs.setdefault('kill_timeout', 1)
     kwargs.setdefault('poll_interval', 1)
+    kwargs.setdefault('port', 5555)
Member

Since a new spawner is generally not passed a port, we shouldn't need one here. If this is required for tests to pass, we may need to think about what's changing in a breaking way.

@@ -123,6 +123,8 @@ def user_env(self, env):
 
     def start(self):
         """Start the process"""
+        if self.port == 0:
+            self.port = random_port()
Member

Services already have this logic, because it's inherited from LocalProcessSpawner here.

Contributor Author

You override the start() function in this class and don't call super().start() anywhere.

That is why the tests started failing, so I copied the logic into this class: https://github.com/jupyterhub/jupyterhub/actions/runs/4005957446/jobs/6876931281#step:10:421

Member

Ah, forgot about the lack of super start. Then this change is definitely correct. Thanks!

@@ -994,6 +994,9 @@ def get_env(self):
             # this should only occur in mock/testing scenarios
             base_url = '/'
 
+        if self.port < 1 or self.port > 65535:
Member

This is a breaking change that takes away the option of letting the launched process pick the port, as is done in e.g. batchspawner, which phones home the resulting port after launch. I think we want more spawners to be able to use that pattern, since it selects ports more reliably than picking a random number on the Hub's system, which is typically not in the same port namespace.

Ideally, this could be communicated better and fail when not supported, which is most of the time.

How about this:

  • validate 0 <= port <= 65535
  • if port is 0, log that the launched process will be picking the port, and that the Spawner better be able to handle that.

Or, we could have a small breaking change that defines an overridable method validate_port, which can be overridden to allow port 0 (the default would be your check). Maybe even with a boolean attribute on the spawner class to indicate whether port 0 should be allowed.

WDYT?

Contributor Author

I raise an error instead of logging one because, when port == 0, an invalid JUPYTERHUB_SERVICE_URL is generated on the next lines. Invalid in the sense that jupyter/docker-stacks trips over it and starts the server on port 80 instead of the default port 8888 or even an explicitly set JUPYTER_PORT.

Maybe part of the solution is to add a check in nbviewer to make sure that a valid port is used? I can create a pull request for this commit: twalcari/nbviewer@ef2ac83

Member

It doesn't produce an invalid string, though. It produces a string instructing the server to bind on port 0, which is asking the server to pick its own port. To work properly with JupyterHub, this must be combined with the logic required to pass the port info back to the Hub after starting, as is done in BatchSpawner.
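To illustrate what binding to port 0 means at the socket level, here is a small standard-library sketch. The helper name bind_ephemeral is made up for this example and is not part of JupyterHub. The port is only known after the bind, which is why the launched server would have to report it back to the Hub.

```python
import socket


def bind_ephemeral(host: str = "127.0.0.1"):
    """Bind a listening socket to port 0 and return (socket, port chosen by the OS)."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.bind((host, 0))  # port 0 asks the OS for any free port
    sock.listen()
    port = sock.getsockname()[1]  # the real port is only visible after bind()
    return sock, port


server, actual_port = bind_ephemeral()
# This is the value a spawner would need to receive back from the server.
print(f"listening on port {actual_port}")
server.close()
```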

For services, which don't allow custom spawners, the 0->random port change above will fix this. The problem with the validation here is that it affects all Spawners and prohibits a valid use case, when it should only affect the service case and would thus be more appropriately confined to the Service class.

> Maybe part of the solution is to add a check in nbviewer to make sure that a valid port is used?

Yeah, I think that's a good idea.

@minrk
Member

minrk commented Jan 26, 2023

Do you have your configuration for the nbviewer service? It should have a url field in it, which should ultimately populate this field.

I think the bug might really be that $JUPYTERHUB_SERVICE_URL is defined at all when nbviewer is registered as a service without a URL. It shouldn't be possible to specify that a service has a web endpoint without specifying the port where it will run. Unlike a regular Spawner, there are services that don't have URLs (this is the default if a URL is not specified). An else: branch here that explicitly sets env['JUPYTERHUB_SERVICE_URL'] = '' if self.url is not set would be an indicator that it's launching a service it doesn't expect to have a URL. That should also make it impossible for this event to occur, I think.
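A minimal sketch of that else: branch, assuming a hypothetical helper service_env() that stands in for the Hub's service environment setup (the name and signature are illustrative, not JupyterHub's API):

```python
def service_env(url=None):
    """Build the URL-related part of a service's environment (illustrative only)."""
    env = {}
    if url:
        # The service declares a web endpoint, so pass it through unchanged.
        env["JUPYTERHUB_SERVICE_URL"] = url
    else:
        # No URL configured: say so explicitly instead of generating one with port 0.
        env["JUPYTERHUB_SERVICE_URL"] = ""
    return env


print(service_env("http://127.0.0.1:5555/services/nbviewer/"))
print(service_env())
```

A service that sees an empty value can then tell that the Hub does not expect it to have a URL.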

@twalcari
Contributor Author

Some context: I maintain a custom spawner for our JupyterHub service which works on top of 'GPULab' at Ghent University. We run the jupyter/docker-stacks Docker images as they perfectly fit our needs.

Our users all use JupyterLab. nbviewer is not used. I do observe that changing the port of the JUPYTERHUB_SERVICE_URL to 0 causes JupyterLab to launch on port 80. When another port is set (for example with a JUPYTERHUB_SERVICE_URL like http://127.0.0.1:5555/user/twalcari@ilabt.imec.be/), JupyterLab launches on port 5555 instead.

I honestly don't fully grasp what this environment variable does in the context of JupyterLab and the jupyter/docker-stacks Docker images. Why is it influencing the port on which JupyterLab launches in the first place?

Steps to reproduce:

  1. Launch a webserver which answers to all requests (I use the example code from this gist)

  2. Run docker run --rm -p 8888:8888 --user 1000 --env-file env-test jupyter/minimal-notebook with the content of the env-test file being:

JPY_API_TOKEN=0c1f95e5d1384faeaa9b945bf21b8e3b
JUPYTERHUB_ACTIVITY_URL=http://172.24.124.101:19998/hub/api/users/twalcari@ilabt.imec.be/activity
JUPYTERHUB_API_TOKEN=0c1f95e5d1384faeaa9b945bf21b8e3b
JUPYTERHUB_API_URL=http://172.24.124.101:19998/hub/api
JUPYTERHUB_BASE_URL=/
JUPYTERHUB_CLIENT_ID=jupyterhub-user-twalcari%40ilabt.imec.be
JUPYTERHUB_OAUTH_ACCESS_SCOPES=["access:servers!server=twalcari@ilabt.imec.be/", "access:servers!user=twalcari@ilabt.imec.be"]
JUPYTERHUB_OAUTH_CALLBACK_URL=/user/twalcari@ilabt.imec.be/oauth_callback
JUPYTERHUB_OAUTH_CLIENT_ALLOWED_SCOPES=[]
JUPYTERHUB_OAUTH_SCOPES=["access:servers!server=twalcari@ilabt.imec.be/", "access:servers!user=twalcari@ilabt.imec.be"]
JUPYTERHUB_SERVICE_PREFIX=/user/twalcari@ilabt.imec.be/
JUPYTERHUB_SERVICE_URL=http://127.0.0.1:0/user/twalcari@ilabt.imec.be/
#JUPYTERHUB_SERVICE_URL=http://127.0.0.1:5555/user/twalcari@ilabt.imec.be/
JUPYTERHUB_USER=twalcari@ilabt.imec.be

@minrk
Member

minrk commented Jan 26, 2023

If you're launching a singleuser server, the $JUPYTERHUB_SERVICE_URL->port is handled here, and there is a bug: the condition "if url.port" behaves the same for http://host/path and http://host:0/path, which is not correct. Port 0 should mean bind to port 0 and let the OS assign a port.
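The ambiguity can be reproduced with the standard library alone. The two helper names below are made up for illustration; only urllib.parse.urlparse is a real API, and this is a sketch of the pattern rather than the singleuser code itself.

```python
from urllib.parse import urlparse


def port_given_truthy(url: str) -> bool:
    """Truthiness test: treats a missing port (None) and port 0 the same way."""
    return bool(urlparse(url).port)


def port_given_explicit(url: str) -> bool:
    """Explicit test: only a URL without any port counts as 'not given'."""
    return urlparse(url).port is not None


for url in ("http://host/path", "http://host:0/path", "http://host:5555/path"):
    parsed_port = urlparse(url).port
    print(url, parsed_port, port_given_truthy(url), port_given_explicit(url))
```

Comparing against None keeps port 0 distinguishable, so a server could still bind to port 0 when that is what the URL asks for.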

I do think it may be time for a small backward-incompatible change:

  1. add an attribute on Spawner, e.g. Spawner.supports_server_assigned_port
  2. make it opt-in (False on the base Spawner class)
  3. reject port 0 if not self.supports_server_assigned_port

This will result in port 0 failing informatively and by default, as you're doing here, but still allowing for the 'user server decides' pattern to work going forward.
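The three steps above could look roughly like this. SketchSpawner and PhoneHomeSpawner are plain-Python stand-ins invented for this example, not the real jupyterhub.spawner.Spawner (which uses traitlets), and validate_port is the hypothetical method name floated earlier in this thread.

```python
class SketchSpawner:
    """Stand-in for a Spawner base class (illustrative, not JupyterHub's class)."""

    # Opt-in flag: False on the base class, so port 0 is rejected by default.
    supports_server_assigned_port = False

    def __init__(self, port: int = 0):
        self.port = port

    def validate_port(self) -> int:
        """Return the port if acceptable, otherwise raise ValueError."""
        if not 0 <= self.port <= 65535:
            raise ValueError(f"port must be in 0..65535, got {self.port}")
        if self.port == 0 and not self.supports_server_assigned_port:
            raise ValueError(
                "port 0 means the server picks its own port, "
                "which this spawner does not support"
            )
        return self.port


class PhoneHomeSpawner(SketchSpawner):
    """A spawner whose server reports its port back after launch."""

    supports_server_assigned_port = True


print(PhoneHomeSpawner(port=0).validate_port())
print(SketchSpawner(port=8888).validate_port())
```

With this shape, existing spawners that never set a port fail with an informative error, while a spawner that handles the phone-home step only needs to flip one class attribute.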

@twalcari
Contributor Author

twalcari commented Jan 26, 2023

Deleted, as this comment crossed with minrk's reply above.
