Gunjan Baid edited this page Feb 12, 2017 · 11 revisions

Handling course materials with GitHub

To manage your course materials on GitHub (and to create interact links), you need a [GitHub][github] account and push access to the GitHub repository associated with your connector course.

[GitHub][github] is a website that hosts code and files (it is the site you are currently visiting). A repository on GitHub holds the files for a specific project. Each connector class has its own repository. If you don't know which repository belongs to your class, contact the datahub team.

Creating interact links

You can create a link that automatically pulls new files from your class GitHub repository into a student's JupyterHub instance. See the url_to_interact function in the connectortools module. The connectortools binder also includes a demo showing how this is done.

[github]: https://github.com/
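
To make the idea of an interact link concrete, here is a minimal sketch of how such a link could be assembled by hand. It is not the url_to_interact function from connectortools, whose signature is not described on this page. The hub address, the /hub/interact endpoint, the `repo` and `path` parameter names, and the repository and file names are all assumptions made for illustration.

```python
from urllib.parse import urlencode

# Hypothetical hub address; substitute the address of your own JupyterHub.
HUB_URL = "https://hub.example.edu"


def build_interact_link(repo, path, hub_url=HUB_URL):
    """Return a link that asks the hub to pull `path` from `repo`.

    The /hub/interact endpoint and the `repo`/`path` query parameters
    are assumptions for this sketch, not a documented API.
    """
    # urlencode escapes characters such as "/" so the path survives
    # as a single query-string value.
    query = urlencode({"repo": repo, "path": path})
    return f"{hub_url}/hub/interact?{query}"


# Hypothetical repository and notebook names.
link = build_interact_link("my-connector", "labs/lab01.ipynb")
print(link)
```

For real course links, prefer url_to_interact, since it is maintained alongside the cluster configuration.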

Storing datasets on GitHub

GitHub is one of several options for storing datasets. For other options see the hardware page.

Installing libraries and packages

Each user has several common scientific computing packages installed by default. However, many connectors want to install packages specific to their class.

There are two main options to install packages on the cluster. Choose the one that is best suited for your needs:

  1. To install a new package that all students will automatically have access to, create a new issue in the connector-instructors repo and flag the JupyterHub administrator. Attach (or point to) a small notebook that uses the package. We'll try to integrate any additional libraries that your notebook specifies. The more lead time we have, the better. This route can be clunky if you want to update a package on a regular basis.
  2. Install packages directly on the cluster in each user session. In this case packages must be installed to your user directory. If you get permission errors when using pip, it's usually because you're trying to write to the base cluster directory rather than your user directory. The connectortools module has a function called install_package that makes this straightforward.
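
As a rough illustration of the second option, the sketch below installs into the user directory with pip's `--user` flag. It is not the install_package function from connectortools. The helper names and the package name are hypothetical, and the sketch assumes the cluster's Python allows `--user` installs (pip refuses them inside some virtual environments).

```python
import subprocess
import sys


def user_install_command(package):
    """Build a pip command that installs `package` into the user directory."""
    # sys.executable ties pip to the same Python that runs the notebook kernel.
    return [sys.executable, "-m", "pip", "install", "--user", package]


def install_for_user(package):
    """Run the install; raises CalledProcessError if pip exits with an error."""
    subprocess.run(user_install_command(package), check=True)


# Example with a hypothetical package name (uncomment to run):
# install_for_user("some-package")
```

Because `--user` writes under your home directory, it avoids the permission errors that come from writing to the base cluster directory.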

Whenever a package is updated to a new version, students should restart their kernel. If the package was updated by the datahub tech crew (the first option above), students must instead stop and restart their server.
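
One way to confirm which version is installed after an update is to read the package metadata, as in this minimal sketch. It assumes Python 3.8 or later (for importlib.metadata); the helper name is hypothetical.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Return the installed version string of `package`, or None if absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# Prints the version found on disk, or None if the package is not installed.
print(installed_version("datascience"))
```

This reports the version installed on disk. A module that was already imported keeps running its old code until the kernel is restarted, which is why the restart is still needed.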

datascience package documentation

The datascience package was written for use in Berkeley’s Data Science courses and provides tools for investigating and graphically displaying data. Detailed documentation is available on Tables, Maps, and other components of the package.

General tips

  • Run your code from start to finish in one go before pushing it to students. This ensures that it runs in a timely fashion, and that you aren't baking memory issues into the code itself.
  • Always run the code on the cluster before distributing it, just in case you haven't accounted for some hardware or library restriction on the cluster.
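
Both tips can be partly automated by executing the whole notebook from top to bottom on the cluster with `jupyter nbconvert --execute`. The sketch below builds and runs that command. The helper names, the output file name, and the timeout value are assumptions chosen for illustration.

```python
import subprocess


def execute_notebook_command(notebook_path, timeout_seconds=600):
    """Build a command that runs every cell of a notebook in order."""
    return [
        "jupyter", "nbconvert",
        "--to", "notebook",
        "--execute",
        # Fail if any single cell runs longer than this many seconds.
        f"--ExecutePreprocessor.timeout={timeout_seconds}",
        # Write the executed copy to a separate file (hypothetical name).
        "--output", "executed_check.ipynb",
        notebook_path,
    ]


def check_notebook(notebook_path, timeout_seconds=600):
    """Run the notebook; raises CalledProcessError if execution fails."""
    subprocess.run(execute_notebook_command(notebook_path, timeout_seconds), check=True)


# Example with a hypothetical notebook name (uncomment to run):
# check_notebook("lab01.ipynb")
```

A failed or timed-out run is a sign that the notebook needs attention before students receive it.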