
Introduction to Tools

Gunjan Baid edited this page Feb 20, 2017 · 6 revisions

This page introduces the tools available for use in connector courses.

What are Jupyter notebooks?

Jupyter notebooks are a tool for in-browser computing that lets you combine code, text, and visualizations on the same page. Data 8 and other connector courses use them for assignments: once a notebook is created, it can be distributed to students, who complete it and then download it in various formats for submission. You can see an example of a Data 8 notebook here. The link might take some time to load all of the files. Once you see a page listing many files, click the file called lab01.ipynb. This is the Jupyter notebook assignment.
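Behind the page you see in the browser, a file such as lab01.ipynb is plain JSON that stores the text and the code as a list of cells. The sketch below is a minimal illustration that assumes the nbformat version 4 layout: it builds a tiny two-cell notebook, saves it, and counts its cell types using only the Python standard library. The file name example.ipynb and the helper summarize_notebook are hypothetical.

```python
import json
import tempfile
from pathlib import Path

# A minimal notebook (nbformat 4 layout): one text (Markdown) cell and one code cell.
notebook = {
    "nbformat": 4,
    "nbformat_minor": 4,
    "metadata": {},
    "cells": [
        {"cell_type": "markdown", "metadata": {}, "source": ["# Lab 1\n", "Run the cell below."]},
        {"cell_type": "code", "metadata": {}, "execution_count": None, "outputs": [], "source": ["1 + 1"]},
    ],
}


def summarize_notebook(path):
    """Hypothetical helper: count how many cells of each type an .ipynb file contains."""
    with open(path, encoding="utf-8") as f:
        nb = json.load(f)
    counts = {}
    for cell in nb["cells"]:
        counts[cell["cell_type"]] = counts.get(cell["cell_type"], 0) + 1
    return counts


# Save the notebook to a temporary folder, then read it back as JSON.
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "example.ipynb"
    path.write_text(json.dumps(notebook, indent=1), encoding="utf-8")
    summary = summarize_notebook(path)

print(summary)
```

For real coursework, Jupyter reads and writes these files for you; the point here is only that code and text live side by side in one file.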

What is JupyterHub?

The Jupyter notebooks mentioned above are used in conjunction with JupyterHub, a server that hosts notebooks for many users at once. Here, JupyterHub refers to the deployment at datahub.berkeley.edu, which provides cloud-based storage for assignments. Instructors and students can work on and store assignments entirely through JupyterHub. It is essentially the equivalent of Google Drive for Jupyter notebooks.

Do you need JupyterHub to use notebooks?

You don't need JupyterHub in order to use Jupyter notebooks. Notebooks can be run and stored locally on your computer's filesystem, although this requires some extra installation and setup that this site does not cover. You don't need an internet connection to use and access notebooks on your own computer. You do need an internet connection to use the datahub.berkeley.edu JupyterHub.

Why does JupyterHub exist?

Working on JupyterHub ensures that all students and instructors use the same computing environment, reducing the installation and compatibility issues that arise across different computers and operating systems. In addition, work stored in the cloud can be accessed from any computer with an internet connection. Students who don't have a personal computer can still reach their work through library computers.

What are Git and GitHub?

Git is version control software that tracks changes to files and lets multiple users work on the same files in parallel. Git is often used in conjunction with GitHub, a website that hosts Git repositories online. GitHub's web interface displays the information that Git tracks, such as who changed a file, when, and what the changes were.
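To give a feel for the "what changes have been made" view, the sketch below compares two versions of a small file and prints the differences in unified diff format, the same style of output that git diff and GitHub's change view resemble. This is an analogy built on Python's standard difflib module, not Git's own implementation; the file name analysis.py and its contents are made up for illustration.

```python
import difflib

# Two hypothetical versions of the same file, as lists of lines.
old_version = ["total = 0\n", "for x in data:\n", "    total += x\n"]
new_version = ["total = 0\n", "for x in data:\n", "    total = total + x\n", "print(total)\n"]

# Unified diff: lines starting with "-" were removed, "+" were added, " " are unchanged context.
diff = list(
    difflib.unified_diff(old_version, new_version, fromfile="a/analysis.py", tofile="b/analysis.py")
)

print("".join(diff), end="")
```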

What is the datascience package?

The datascience Python package was written for Berkeley’s Data Science courses and provides tools for investigating and visualizing data. Detailed documentation is available on Tables, Maps, and other components of the datascience package. Students use this package extensively in Data 8 coursework.