Storing Notebooks on GitHub

Anthony Suen edited this page Jun 27, 2017 · 9 revisions

Using the Web Interface

You can perform many actions, such as uploads and downloads, directly through GitHub's web interface without using the command line. The steps below describe how to upload assignments to GitHub. If you did your development on JupyterHub, first download the notebook to your computer. Then go to your connector's GitHub repository and click Upload Files on the right side.

[Image: upload]

You can drag and drop your desired files onto the page. Then, write a short sentence describing the files you're adding. This short sentence is called a commit message.

[Image: commit-message]

You will then see an option to select the branch for your changes. The default for most repositories will be the master branch. If you are a Git beginner, you can stick to the default and add your changes to the master branch. If you are a more advanced Git user and want to use different branches, you may want to select the option to create a new branch. Please see the additional GitHub resources on this page to learn more about branching.

[Image: branch]

Once you've gone through the above steps, you can save your changes. A set of changes in Git is called a commit.

[Image: commit]

GitHub

Datasets and the corresponding Jupyter Notebook can be stored in a folder on GitHub. You can then create an interact link for the entire folder. When students click this link, the entire folder will appear on their JupyterHub account.

Outside Hosts

You can store the data on an online host such as Box, Google Drive, or even GitHub. You can then include a cell with this download_dataset function or have students read the data directly via URL. The read_table function for the Table data structure supports URLs.
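For files that need to be fetched from a terminal rather than from inside a notebook, the sketch below downloads a hosted file once and reuses it afterwards. It is not the download_dataset function mentioned above. The name fetch_once and the example URL are hypothetical, and the sketch assumes curl is installed and that the host provides a direct download link.

```shell
# Hypothetical helper: download a file only if it is not already present.
# Assumes curl is available; the name fetch_once is made up for this sketch.
fetch_once() {
    url="$1"    # direct download link on the outside host
    dest="$2"   # path where the notebook expects to find the file
    if [ -f "$dest" ]; then
        echo "Already have $dest, skipping download"
        return 0
    fi
    mkdir -p "$(dirname "$dest")"
    # -f: fail on server errors, -L: follow redirects, -sS: quiet but show errors.
    # Download to a temporary name first so a failed transfer leaves no partial file.
    curl -fL -sS -o "$dest.part" "$url" && mv "$dest.part" "$dest"
}

# Example call with a placeholder URL (replace with your own link):
# fetch_once "https://example.com/path/to/data.csv" "data/data.csv"
```

Saving to a fixed path such as data/data.csv keeps the location predictable, which avoids the filepath mismatches that can occur with manual uploads.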

Shared Copy on JupyterHub

Contact us on Piazza if you want your data to be saved directly in a shared folder on JupyterHub. Notebooks stored on JupyterHub will be able to access this data. This is the preferred method for large datasets.

Direct Upload

Students can directly upload data files that you provide them onto their JupyterHub accounts. This method can get messy if notebooks expect the data to be stored at a certain filepath and students upload the files to a different location. Therefore, we recommend using the other methods listed on this page.

Large Datasets

For datasets on the order of gigabytes, we recommend that you contact us about hosting a shared copy on JupyterHub. You can also use outside hosts and provide students with a URL to the data, which they can then read into a Table or other data structure.

Using the Command Line

GitHub can also be used via the command line. You can store your connector's Git repository locally and use a local terminal application to access the command line. You can also store the repository on datahub.berkeley.edu and use the terminal that is present on the JupyterHub site. The instructions below are tailored towards command line use over JupyterHub, but the commands listed can be run on a local terminal as well.

You can access the terminal on JupyterHub by clicking on the New dropdown, and then clicking on Terminal.

[Image: terminal]

You will then see a terminal page in the browser.

[Image: terminal-page]

To push to your connector's repository, you must first have the repository downloaded (cloned). If you have not yet cloned the repository, type the command below into the terminal, where <repo_name> is the name of the repository for your connector. The repository names are listed at https://github.com/data-8. Once you run the command, you will see a folder for your repository in your home directory on JupyterHub. You only need to do this step once.

    git clone https://github.com/data-8/<repo_name>

For example, if your repository is called health-connector, you'd type:

    git clone https://github.com/data-8/health-connector
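Because the clone only needs to happen once, a guarded version of the command can be rerun safely. The sketch below is one way to do that. The function name clone_once is hypothetical, and it assumes the target folder either does not exist yet or is already a Git repository.

```shell
# Hypothetical helper: clone a repository only if it has not been cloned yet.
clone_once() {
    repo_url="$1"     # e.g. https://github.com/data-8/<repo_name>
    target_dir="$2"   # e.g. ~/<repo_name>
    if [ -d "$target_dir/.git" ]; then
        echo "Already cloned at $target_dir"
    else
        git clone "$repo_url" "$target_dir"
    fi
}

# Example call, using the repository name from the example above:
# clone_once "https://github.com/data-8/health-connector" "$HOME/health-connector"
```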

After this step, you should be able to see your connector's folder at https://datahub.berkeley.edu. Create, upload, or move content (Notebooks, datasets, etc.) into the folder. For more information on creating Notebooks, see this page. For more information on storing datasets, see this page. Once you have your content in the newly created connector repository folder, you can follow the steps below on the terminal to push to GitHub.

    cd ~/<repo_name>
    git status

You should see something that lists the files you've changed or added. If your files don't show up, ensure that they are in your repo's folder.

    git add -A
    git commit -m "Update"
    git push origin master

If the push is successful, you should be able to go to GitHub and see the newly uploaded file in the connector repo. If you run into something that looks like the error below, contact us on Piazza and we will make sure you have the permissions needed.

    ERROR: Permission to data-8/some-connector.git denied
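A different kind of push failure can occur when the repository's default branch is not named master (newer GitHub repositories typically default to main). The sketch below pushes whichever branch is currently checked out instead of hard-coding the name. It assumes you are inside the cloned repository folder, and the function name push_current_branch is hypothetical.

```shell
# Hypothetical helper: push the currently checked-out branch to origin.
push_current_branch() {
    # Look up the name of the current branch, e.g. master or main.
    # This fails (and the function returns 1) outside a repository
    # or when no branch is checked out.
    branch="$(git symbolic-ref --short HEAD)" || return 1
    echo "Pushing branch: $branch"
    git push origin "$branch"
}
```

Run it from the repository folder after committing, in place of git push origin master.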

Here are the above commands, consolidated. This workflow is intended for Git beginners. Git offers many additional features that are not demonstrated in these steps.

    git clone https://github.com/data-8/<repo_name>
    cd ~/<repo_name>
    git status
    git add -A
    git commit -m "Update"
    git push origin master
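To rehearse this workflow without touching a real connector repository, the same commands can be run against a throwaway local repository. The sketch below is only a practice setup: a local bare repository stands in for GitHub, and every path, file name, user name, and email address in it is a placeholder.

```shell
practice="$(mktemp -d)"                      # throwaway practice area
git init -q --bare "$practice/remote.git"    # stand-in for the GitHub repository

# Cloning an empty repository prints a warning; that is expected here.
git clone -q "$practice/remote.git" "$practice/my-connector"
cd "$practice/my-connector"

# Placeholder identity, stored only in this practice repository
git config user.name "Practice User"
git config user.email "practice@example.com"

echo "placeholder notebook" > lab01.ipynb    # stand-in for real content
git status --short                           # lists lab01.ipynb as untracked
git add -A
git commit -q -m "Add lab01 notebook"

# Push whichever branch the practice repository created (master or main)
git push -q origin "$(git symbolic-ref --short HEAD)"

git log --oneline                            # shows the single commit
```

Deleting the practice folder afterwards removes everything the rehearsal created.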

Additional Resources

Web Interface

  • Managing Files - contains information under the "Managing Files on GitHub" section on how to perform many basic file operations using the GitHub web interface.
  • Hello World Exercise - a short exercise that walks you through additional GitHub features such as branches and pull requests.

Command Line

Desktop GUI