Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Spawn callback #2727

Draft
wants to merge 2 commits into
base: main
Choose a base branch
from
Draft

Spawn callback #2727

wants to merge 2 commits into from

Conversation

rkdarst
Copy link
Contributor

@rkdarst rkdarst commented Sep 9, 2019

At the JupyterHub/BinderHub workshop, @minrk and I discussed adding the spawn callback from batchspawner to JupyterHub itself. Quote from batchspawner issue:

I talked to @minrk and he thought that it would be reasonable to put the the callback handler into the main hub and expect every spawner to hit it once (even if they don't send any useful information back). He thought that there would be other benefits, too: right now, after the spawner returns the URL, the hub polls it until the single-user HTTP server is reachable. This would avoid that polling and allow faster response when starting. Then, spawners can return the URL (or port... but whole URL will be the most general) if it wasn't known at the time of Spawner.start() returning. The payload from the callback can be sent to a spawner hook to process it and do what is needed.

This is a non-working pull request that just has the template of the code that Min showed me, in case anyone else wants to take a look before I get to it (feel free!). It hasn't even been tested and I haven't tried to make the pieces actually work yet.

The issues we know of so far (may be edited):

  • Some spawners don't know their own address (but this is still useful - reduces polling for startup).
  • The spawner has to be able to tell the single-user server what information should be sent back. Spawners are not necessarily installed in single-user environments and we probably don't want to require them to be all the time.
  • Should the spawner be able to indicate that the hub should/shouldn't wait for the
  • What should the default handler do?

Copied here from jupyterhub/batchspawner#146

Copy link
Member

@minrk minrk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Excellent, thanks for opening this!

I think the only trick right now is to figure out how and if Spawners customize what goes in the payload. Notebook server extensions are one option, but that becomes tricky very fast. If we can support a baseline of information (bind url, hostname, pid), hopefully most cases can be covered. I see in BatchSpawner that only port is passed along. I would think that hostname would also be useful, perhaps replacing state_gethost?

I'll poke around here to see if we can come up with something.

self.hub.api_url,
'users',
# tolerate mocks defining only user.name
getattr(self.user, 'escaped_name', self.user.name),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no need for getattr check here, since this is in the same codebase that defines escaped_name

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The code right above (JUPYTERHUB_ACTIVITY_URL) has this too with the same message (mocks defining only user.name)... if that needs it then this would, too. What should be done?

async def notify():
self.log.debug("Notifying Hub of readyness via spawn_callback")
req = HTTPRequest(
url=self.hub_activity_url,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

spawn_callback_url?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And only call it if spawn_callback_url is set, which effectively implements single-user -> hub backward compatiblity (not the other way around)

jupyterhub/singleuser.py Show resolved Hide resolved
body=json.dumps({}),
)
try:
await client.post(req)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we use client.fetch here and method='POST' above in the Request

@rkdarst
Copy link
Contributor Author

rkdarst commented Sep 24, 2019

I think the only trick right now is to figure out how and if Spawners customize what goes in the payload. Notebook server extensions are one option, but that becomes tricky very fast. If we can support a baseline of information (bind url, hostname, pid), hopefully most cases can be covered. I see in BatchSpawner that only port is passed along. I would think that hostname would also be useful, perhaps replacing state_gethost?

That's roughly what I was thinking, then it's up to the spawner handler to decide which is actually useful, how it should actually be interpreted, and if any is . Then it's a balance between adding extra things to the jupyterhub singleuser-server or having spawners add extensions. But we can deal with this later. Things I can think of to collect: port, hostname, raw IP(s), pid. Maybe the contents of a JUPYTERHUB_SINGLEUSER_BIND_URL environment variable... that would allow spawner start scripts to send arbitrary data without making full-scale extensions.

Spawner authors, what would be useful?

@rkdarst
Copy link
Contributor Author

rkdarst commented Sep 24, 2019

I'm getting this problem in tests:

[W 53:32.933 MockHub web:1782] 400 POST /@/space%20word/hub/api/users/admin/servers/spawn_callback (127.0.0.1): Named servers are not enabled.
[W 53:32.934 MockHub log:174] 400 POST /@/space%20word/hub/api/users/admin/servers/spawn_callback (admin@127.0.0.1) 9.43ms

so it thinks that /spawn_callback is the server name. Handlers are defined this way:

    (r"/api/users/([^/]+)/servers/([^/]*)", UserServerAPIHandler),
    (r"/api/users/([^/]+)/servers/([^/]*)/progress", SpawnProgressAPIHandler),
    (r"/api/users/([^/]+)/servers/([^/]*)/spawn_callback", SpawnCallbackAPIHandler),

If progress works, then spawn_callback should as far as I could tell. Even reordering the handlers so that the spawn_callback one is first doesn't seem to help. I've banged on this long enough I thought would ask for other eyes. My first thought is something trivial like a small misspelling, but I can't find it. Also ,the progress URL seems to be called with only one slash successfully...

On a side note... if there's a named server called "progress" or "spawn_callback", does stuff start to break?

@Hoeze
Copy link

Hoeze commented Mar 29, 2020

@rkdarst Is this functional?

@rkdarst
Copy link
Contributor Author

rkdarst commented Mar 29, 2020 via email

@Hoeze
Copy link

Hoeze commented Apr 5, 2020

Hey, I tried to get this running but the only thing I accomplished until now is to have tests passing with the current Jupyterhub master:
https://github.com/Hoeze/jupyterhub

  • How do I get a good testing setup? At the moment I just run:
export JUPYTERHUB_API_URL=http://127.0.0.1:8081/hub/api
export JUPYTERHUB_CLIENT_ID=jupyterhub-user-<username>-asdf
export JUPYTERHUB_OAUTH_CALLBACK_URL=/user/<username>/asdf/oauth_callback
export JUPYTERHUB_SERVER_NAME=asdf
export JUPYTERHUB_SERVICE_PREFIX=/user/<username>/asdf/
export JUPYTERHUB_USER=<username>
jupyterhub-singleuser --port 8234
  • What is missing to get this working?

@manics
Copy link
Member

manics commented May 21, 2022

I've marked this as draft

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants