Jupyter(Hub) conceptual intro #2726

Draft. Wants to merge 9 commits into base: main


@rkdarst (Contributor) commented Sep 9, 2019

At the JupyterHub/BinderHub workshop, one of our ideas was to make a conceptual intro to JupyterHub so that people could know what it does, and in particular what it doesn't do (what is handled by other components). We get many issues that end up misdirected or that have a root cause of not understanding what the components are.

This PR is my initial draft; comments welcome. I've written it informally and with a certain opinion: it's supposed to be like a teacher giving a first lesson, not a technical reference. It goes far beyond just JupyterHub, but the reason for having it here is that you really need to start knowing this stuff once you start administering a JupyterHub; before that, you can sort of get by without a perfect mental model.

Issues I know of:

  • What is a proper title?
  • Where should it go under the table of contents?
  • I used the very verbose "single-user notebook server"; I'd like a shorter term that still avoids ambiguity.
  • All parts need fact-checking, because I haven't done that yet
  • After the content is somewhat stable, we need to add links and references.
  • Intro needs improvement and we should make sure it explains the purpose well
  • Conclusion needs improvement and guides to what is next

@consideRatio (Member)

@rkdarst thank you for putting in this excellent work to give a cohesive overview of how things relate! I appreciate how it is written in a readable, non-reference-like manner. I'm thinking, for example, of the early part of the text where you clarify that its purpose is to give an overview.

@rkdarst (Contributor Author) commented Apr 13, 2020

I did a big pass on this, and I think we should be working on polishing now. It's late and I'm sure everyone will be able to find little improvements; instead of making things perfect, I'll just let everyone read and make their suggestions.

The CI failures seem unrelated, but at least the docs one should be fixed.

@betatim (Member) commented Apr 15, 2020

Thanks for coming back to this!

The docs build error is:

Warning, treated as error:
/home/circleci/project/jupyterhub/auth.py:docstring of jupyterhub.auth.Authenticator.admin_users:1:duplicate object description of jupyterhub.auth.Authenticator.admin_users, other instance in api/auth, use :noindex: for one of them
Makefile:64: recipe for target 'html' failed
make: *** [html] Error 2

which I don't fully understand. There is one definition https://github.com/jupyterhub/jupyterhub/blob/master/docs/source/api/auth.rst (the "other instance"). But where is the first definition?

@betatim (Member) commented Apr 15, 2020

The weird thing is that on master the build seems to work. Maybe rebase your branch on the latest master?

@rkdarst (Contributor Author) commented Apr 15, 2020 via email

@rkdarst (Contributor Author) commented Apr 15, 2020

https://github.com/jupyterhub/autodoc-traits/blob/master/autodoc_traits/autodoc_traits.py#L28

I added some "print" statements right before this line, and sure enough, `admin_users` is in both `trait_members` and `members`. Same for `Authenticator` and `LocalAuthenticator`, but not for `PAMAuthenticator` or `DummyAuthenticator`. The first two have `:members:` in the rst file. Does anyone know which way is correct?

@willingc ?

rkdarst added a commit to rkdarst/autodoc-traits that referenced this pull request Apr 15, 2020
- When run with the `:members:`, then traitlets traits are duplicated,
  because they are added to the autodoc list both from
  `:members:` and this autodetection. (I think)
- Discussed in at least
  jupyterhub/jupyterhub#2726.  Currently
  causing JupyterHub docs to fail, because sphinx gives an error if
  there are duplicate autodoced traits and it is run with `-W`.
- This seems to have started with a new version of sphinx, but we aren't
  completely sure.  JH has been using the `-W` option since 2017.
- I'm unsure if this is the right solution, but it works and gets me
  past these errors.
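A minimal, hypothetical sketch of the deduplication idea in that commit (this is not the actual autodoc-traits code, and `merge_members` is an illustrative name): merge the names requested via `:members:` with the auto-detected traits, and emit each name only once so Sphinx never sees a duplicate description.

```python
def merge_members(members, trait_members):
    """Hypothetical helper: combine the members listed via `:members:`
    with auto-detected traitlets, keeping each name only once."""
    seen = set()
    merged = []
    for name in list(members) + list(trait_members):
        if name not in seen:  # e.g. "admin_users" appears in both lists
            seen.add(name)
            merged.append(name)
    return merged


# Without deduplication, "admin_users" would be documented twice, and
# sphinx run with -W turns that duplicate warning into a build error.
print(merge_members(["admin_users", "authenticate"], ["admin_users", "allowed_users"]))
```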
@rkdarst (Contributor Author) commented Apr 15, 2020

jupyterhub/autodoc-traits#1

After this, I am able to get the build to run further, but then there are other new errors. I guess sphinx really has gotten a lot stricter in more ways than one. The easy fix is removing `-W` from `docs/Makefile`...

@betatim (Member) commented Apr 16, 2020

Thanks for diving into autodocs! We introduced the `-W` flag a few years back to help keep the docs in "reasonable shape". At the time, the number of warnings was so large that it was hard to tell whether a change added more warnings or not. So the weary maintainer in me wants to keep `-W` to help us keep improving the quality of the docs, or at least avoid adding new debt to them.

However we don't want to block progress with this. Maybe as a compromise we can limit ourselves to sphinx<3 for now? A next step would be to start a (hopefully) small coordinated effort to get the docs building with v3.

Curiously the last build on master still uses Sphinx-2.4.4 and it is green.

@rkdarst (Contributor Author) commented Apr 16, 2020

Discussion of docs failure moved to #3021.

@betatim (Member) commented Apr 18, 2020

Hurrah! All CI robots are happy again. Great work digging into sphinx and autotraits!

@rkdarst (Contributor Author) commented Jun 1, 2020

Now that CI works, would someone like to take a look at this? It is ready now, I've done multiple passes.

Some hints on what to check are at the top, but roughly: a) is the placement within the docs good? Should it be somewhere else entirely?

... and b) fact-checking. I've done a lot with Jupyter, so I think there's not much wrong, but there's just so much that any other eyes will help.

## JupyterHub

**JupyterHub** is the central piece that provides multi-user
login. Despite this, the end user only briefly interacts with

Review comment (Member):

"... provides multi-user login $thing" The sentence somehow ends abruptly. Could it be "...multi-user login capabilities." or "...functionality."?

[reference](../reference/authenticators)) if the
username/password is valid(&). The authenticator can also return user
groups and admin status of users, so that JupyterHub can do some
higher-level management. The authenticator returns a username(&),

Review comment (Member):

Can we move the "The authenticator can also return..." sentence to the end of the paragraph? Let's tell people about the main thing the authenticator does ("return a username") and then afterwards tell them "but wait, there is more".
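To make that contract concrete, here is a toy model in plain Python. It does not import JupyterHub, and `USERS`, `toy_authenticate` and `normalize` are hypothetical. In real JupyterHub, an authenticator subclasses `jupyterhub.auth.Authenticator` and implements an async `authenticate(handler, data)` that returns a username, a dict with at least a `name` key (optionally `admin`), or `None` if login fails.

```python
# Hypothetical user table; a real authenticator would ask PAM, OAuth, LDAP, ...
USERS = {"alice": ("s3cret", True), "bob": ("hunter2", False)}


def toy_authenticate(data):
    """Return None on failure; otherwise a dict with the username (the main
    result) plus extra information such as admin status."""
    record = USERS.get(data.get("username"))
    if record is None or record[0] != data.get("password"):
        return None  # login refused
    return {"name": data["username"], "admin": record[1]}


def normalize(result):
    """Hub-side view: accept either a bare username or a dict."""
    if result is None:
        return None
    if isinstance(result, str):
        return {"name": result, "admin": None}  # no admin information given
    return {"name": result["name"], "admin": result.get("admin")}


print(normalize(toy_authenticate({"username": "alice", "password": "s3cret"})))
```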

the user's notebook servers. It actually isn't directly between,
because the JupyterHub **proxy** relays connections between the users
and their single-user notebook servers. What this basically means is
that the hub itself can shut down, and if the proxy can continue to

Review comment (Member):

Missing words in the second half of the sentence?

Reply (Contributor Author):

I think the problem was an extra "if". Removed.
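A toy model of why the proxy can keep serving users while the hub is down: once routes exist, the proxy only consults its table of URL prefixes. The table contents and the `route` function are hypothetical, and the default proxy (configurable-http-proxy) is a separate Node.js program, not this code.

```python
# Hypothetical routing table: URL prefix -> backend address.
ROUTES = {
    "/": "http://127.0.0.1:8081",            # default route: the hub
    "/user/alice/": "http://10.0.0.5:8888",  # alice's single-user server
    "/user/bob/": "http://10.0.0.6:8888",    # bob's single-user server
}


def route(path, routes=ROUTES):
    """Pick the backend whose prefix is the longest match for `path`."""
    matches = [prefix for prefix in routes if path.startswith(prefix)]
    return routes[max(matches, key=len)]


# alice's traffic goes straight to her server; only paths that fall through
# to the default route (login, hub pages) need the hub to be running.
print(route("/user/alice/api/kernels"))
print(route("/hub/login"))
```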

@rcthomas (Contributor) left a comment

@rkdarst as promised I took a read through of the docs here. I think this puts a lot of useful information in one place to try to dispel some folks' confusion about what's what. I made a few comments where some things may need to be clarified.

@@ -0,0 +1,465 @@
# What is Jupyter and JupyterHub?

JupyterHub is not what you think it is. Most things you think are

Review comment (Contributor):

"is not" -> "may not be" ?

part of JupyterHub are actually handled by some other component, for
example the spawner or notebook server itself, and it's not always
obvious how the parts relate. The knowledge contained here hasn't
been assembled in one place before, and is essential to understand

Review comment (Contributor):

and -> but

In this document, we occasionally leave things out or bend the truth
where it helps in explanation, and give our explanations in terms of
Python even though Jupyter itself is language-neutral. The "(&)"
symbol highlights important points where there is more.

Review comment (Contributor):

"where there is more."

Not sure what there is more of? Did the ending get chopped off or should it be "there is more to it" or something?

Before we get too far, let's remember what our end goal is. A
**Jupyter Notebook** is really nothing more than a Python(&) process
which is getting commands from a web browser and displaying the output
via that browser. What the process actually sees can roughly like

Review comment (Contributor):

"can roughly like" -> "can roughly be thought of like" ?

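A rough, hypothetical sketch of "what the process sees": a loop that receives code strings (here from a list instead of a browser), runs them in one persistent namespace, and returns whatever was printed. Real kernels talk to the browser through the notebook server using the Jupyter messaging protocol over ZeroMQ, none of which is modelled here; `run_cells` is an illustrative name.

```python
import contextlib
import io


def run_cells(cells):
    """Execute each code string in a shared namespace and collect stdout."""
    namespace = {}  # persists between cells, like variables in a notebook
    outputs = []
    for code in cells:
        buffer = io.StringIO()
        with contextlib.redirect_stdout(buffer):
            exec(code, namespace)
        outputs.append(buffer.getvalue())  # what the browser would display
    return outputs


print(run_cells(["x = 21", "print(x * 2)"]))
```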
docs/source/getting-started/what-is-jupyterhub.md (outdated, resolved)
JupyterHub: when someone wants a notebook server, the spawner allocates
resources and starts the server. The notebook server could run on the
same machine as JupyterHub, on another machine, on some cloud service,
or even more. They can limit resources (CPU, memory) or isolate users

Review comment (Contributor):

Not sure who "They" refers to, is this the administrator who configured the hub?
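For example, the administrator requests limits in `jupyterhub_config.py`, and the spawner is what applies them. A minimal config fragment, assuming a spawner that actually enforces these settings (e.g. DockerSpawner or KubeSpawner; per the JupyterHub docs, the default LocalProcessSpawner does not); the values are placeholders:

```python
# jupyterhub_config.py (fragment; `c` is provided when JupyterHub loads it)
c.Spawner.mem_limit = "1G"  # maximum memory per single-user server
c.Spawner.cpu_limit = 1.0   # maximum CPU cores per single-user server
```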

docs/source/getting-started/what-is-jupyterhub.md (outdated, resolved)
opens in a separate tab. It is traditionally started by `jupyter
notebook`.

Does anything need to be said here?

Review comment (Contributor):

I don't think you have enough for two ###-level sections; maybe just smush them into the single-user notebook server section.

Reply (Member):

agree - I don't think we need much depth on the interfaces, maybe beyond mentioning that they'll live at different URL prefixes.

Reply (Contributor Author):

I removed the sections but not the text; someone else can do that later. Anyway, there is jupyter_server now, which will need some sort of updates here, right?


## I want to...

TODO: answers to common cross-layer questions.

Review comment (Contributor):

I think the foregoing text actually does a good job of laying things out. You might want to omit this section for now to move forward with getting these docs integrated and then add this section later if things come up that aren't handled better any other way.

there are still plenty of details, implementations, and exceptions.
When setting up JupyterHub, the first step is to consider the above
layers, decide the right option for each of them, then begin putting
everything together.

Review comment (Contributor):

Maybe cite the JupyterCon talk?

Reply (Contributor Author):

Does that mean this one? https://www.youtube.com/watch?v=JxyKBNJnfVM
Since it's mine and perhaps old, I'll let someone else decide to add it.

@willingc (Contributor)

Great to see this PR moving forward again. I will take a more detailed look. One thing I'm thinking is that we should come up with an improved title, such as "JupyterHub Concepts for New Users" or "JupyterHub: A Conceptual Look".

@choldgraf added you to the reviewers as well.

@choldgraf (Member) left a comment

Thanks @willingc for pinging me. I took a pass through the document and made several suggestions and comments.

I think that this content will be really helpful for people trying to wrap their heads around JupyterHub (and the broader Jupyter ecosystem). In my opinion we should do a round or two to make sure the content is of "MVP quality" and then get it in the docs, and iterate on it over time. The PR is big enough that I worry it'll get bogged down for a long time if we try to make it perfect. Does that make sense to others?

docs/source/index.rst (outdated, resolved)
@@ -0,0 +1,465 @@
# What is Jupyter and JupyterHub?

JupyterHub is not what you think it is. Most things you think are

Review comment (Member):

I think that we should avoid narrative flourishes like this. I think it's good writing and I enjoy it, but I don't think it's the most helpful for newcomers who are learning technical concepts. It may also make things harder to understand for non-native English speakers.

e.g., rather than saying "Most things you think are part of JupyterHub are actually handled by some other component", we could say "JupyterHub is designed in a modular fashion, and much of its functionality is handled by pluggable components."


JupyterHub is not what you think it is. Most things you think are
part of JupyterHub are actually handled by some other component, for
example the spawner or notebook server itself, and it's not always

Review comment (Member):

Can we link to sections where we discuss these in the docs?

part of JupyterHub are actually handled by some other component, for
example the spawner or notebook server itself, and it's not always
obvious how the parts relate. The knowledge contained here hasn't
been assembled in one place before, and is essential to understand

Review comment (Member):

I think we can remove the "story of this document" stuff like "hasn't been assembled in one place before"


In this document, we occasionally leave things out or bend the truth
where it helps in explanation, and give our explanations in terms of
Python even though Jupyter itself is language-neutral. The "(&)"

Review comment (Member):

Does the & symbol have a functional use on the page (e.g., does it create a hyperlink or something like that)? If not, I think we should either:

  1. Turn these into footnotes or in-line links to other sections
  2. Remove the & symbol because I think many readers won't really understand what to do with it

Reply (Contributor Author):

I got this idea from regex(7), which uses (!) to indicate a certain thing that is commonly said but shouldn't be spelled out each time (possibly non-portable decisions). I thought I had two choices: a) say things that are not entirely true or not the whole story, or b) very often say "but there is more/this is a simplification". Perhaps the definition of (&) could be c) emphasized in an admonition, or d) changed to "everything on this page may be simplified; expect some inaccuracies here". But I'd expect (d) could lead to lots of updates adding more details, making the page too long for its main purpose...

docs/source/getting-started/what-is-jupyterhub.md (outdated, resolved)
docs/source/getting-started/what-is-jupyterhub.md (outdated, resolved)
docs/source/index.rst (outdated, resolved)
docs/source/installation-guide-hard.md (outdated, resolved)
docs/source/getting-started/what-is-jupyterhub.md (outdated, resolved)
@choldgraf (Member) commented Apr 20, 2021

also - somebody (@willingc ? @minrk ?) should enable ReadTheDocs builds for PRs so we can preview these changes! https://readthedocs.org/projects/jupyterhub/ (I don't have permissions)

@manics marked this pull request as draft May 21, 2022 14:37
@manics closed this May 21, 2022
@manics reopened this May 21, 2022
@manics (Member) commented May 21, 2022

The RTD build is active. This needs a rebase since `docs/source/installation-guide-hard.md` was moved into a separate repo, and some of the RTD deps may have changed.

@rkdarst (Contributor Author) commented Jan 8, 2023

All of a sudden I'm reminded that this exists. I'll try to make more improvements to it based on the suggestions, but I'm not very good at following up with things these days, so if anyone wants to push things forward, by all means go ahead!

Does anyone know if rebasing (but not renaming the file) to resolve the conflicts above will mess up the per-line issues? My thought is "probably not" but just want to make sure...

@choldgraf (Member)

No ideas re: rebasing but I usually find it to work sensibly for this kind of thing.

I am a big fan of merging this one quickly. It has a lot of useful information and I'd prefer merging something imperfect and then iterating, rather than having all of this knowledge locked up in a PR draft.

rkdarst and others added 7 commits January 9, 2023 21:52
- Single-user servers are the same as what you get with `jupyter notebook`.
- Kernels by default in single-user server environment but don't have
  to be.
- Apparently recommonmark does intelligently use links like
  sphinx+rst, and you shouldn't use `.html` on the links.
Thanks to @betatim

Co-authored-by: Tim Head <betatim@gmail.com>
Co-authored-by: Chris Holdgraf <choldgraf@gmail.com>
docs/source/index.rst (outdated, resolved)
Comment on lines 6 to 14
obvious how the parts relate. The knowledge contained here hasn't
been assembled in one place before, and is essential to understand
when setting up a sufficiently complex Jupyter(Hub) setup.

This document was originally written to assist in debugging: very
often, the actual problem is not where one thinks it is and thus
people can't easily debug. In order to tell this story, we start at
JupyterHub and go all the way down to the fundamental components of
Jupyter.

Review comment (Contributor Author):

Suggested change
obvious how the parts relate. The knowledge contained here hasn't
been assembled in one place before, and is essential to understand
when setting up a sufficiently complex Jupyter(Hub) setup.
This document was originally written to assist in debugging: very
often, the actual problem is not where one thinks it is and thus
people can't easily debug. In order to tell this story, we start at
JupyterHub and go all the way down to the fundamental components of
Jupyter.
obvious how the parts relate.

Removing as per suggestion

company), or whitelist only the allowed users (e.g. your group's
Github usernames). Some other popular authenticators include:

- **OAuthenticator** uses the standard OAuth protocol to verify users.

Review comment (Contributor Author):

yes, we should (not just here but many places scattered around here), but to get it out I'll save that for later...
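As a sketch of the allowlist idea in the excerpt above, assuming JupyterHub 1.2 or later (where the `whitelist` option was renamed to `allowed_users`); the usernames are placeholders:

```python
# jupyterhub_config.py (fragment; `c` is provided when JupyterHub loads it)
# Only these usernames may log in, whatever the authenticator accepts.
c.Authenticator.allowed_users = {"alice", "bob"}
# Users who get admin rights in the hub.
c.Authenticator.admin_users = {"alice"}
```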

what it does out of the box) and makes the hub not too dissimilar to
an advanced ssh server.

There are many more advanced spawners:

Review comment (Contributor Author):

I added a link to /reference/spawners, but didn't remove any from here since they somehow were chosen to represent the diversity of what's available... I'll let someone else pick what to remove.
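Switching from the default local-process spawner to one of the more advanced ones is a single configuration line. A fragment, assuming the `dockerspawner` package is installed (it then needs its own configuration, not shown):

```python
# jupyterhub_config.py (fragment; `c` is provided when JupyterHub loads it)
# Start each user's single-user server in a Docker container instead of
# as a local process.
c.JupyterHub.spawner_class = "dockerspawner.DockerSpawner"
```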


The proxy always runs as a separate process to JupyterHub (even though
JupyterHub can start it for you). JupyterHub has one set of
configuration options for the proxy addresses (`bind_url`) and one for

Review comment (Contributor Author):

I think I got this right now (in the push that is upcoming), might be worth someone checking that it matches modern standards...
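A config fragment showing the two addresses. The values are examples only: the `bind_url` shown is JupyterHub's default, and `hub_bind_url` assumes a reasonably recent JupyterHub.

```python
# jupyterhub_config.py (fragment; `c` is provided when JupyterHub loads it)
# Public address of the proxy: this is where users connect.
c.JupyterHub.bind_url = "http://:8000"
# Address the hub process itself listens on; the proxy forwards hub traffic here.
c.JupyterHub.hub_bind_url = "http://127.0.0.1:8081"
# If the proxy is started separately (e.g. as its own service), tell
# JupyterHub not to launch it:
# c.ConfigurableHTTPProxy.should_start = False
```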


@rkdarst (Contributor Author) commented Jan 9, 2023

In the spirit of "getting it out and updating", I rebased onto current upstream and made the quick revisions from the reviews. There are still more extensive things to do, but I'll leave those for someone else for later.

@rkdarst (Contributor Author) commented Jan 9, 2023

linkcheck failures seem unrelated


7 participants