🚆This web scraper builds a dataset for São Paulo subway operation status

douglasnavarro/sp-subway-scraper

What it is

This project is essentially a single Python script that writes the status of the São Paulo subway lines to a Google Sheets worksheet.

The sheets can be viewed (and freely used for any data science project) here.

How it works

Every 5 minutes the script fetches the official subway company page using the 'requests' module and extracts the operation status shown in the right-hand column of the page using the 'Beautiful Soup' module. The last-update time shown on the page is also stored and later associated with each subway line.
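
The fetch-and-parse step could look roughly like the sketch below. It is not the script's actual code: the URL, the sample HTML, the CSS class names, and the function names `fetch_page` and `parse_status` are all assumptions made for illustration, since the real page markup is not described here.

```python
import requests
from bs4 import BeautifulSoup

# Hypothetical URL and markup: the real page address and CSS classes are not
# given in this README, so adjust them to match the actual page.
STATUS_URL = "https://example.com/subway-status"

SAMPLE_HTML = """
<div class="status-panel">
  <p class="last-update">Updated: 14:05</p>
  <ul>
    <li><span class="line">Line 1</span><span class="status">Normal operation</span></li>
    <li><span class="line">Line 2</span><span class="status">Reduced speed</span></li>
  </ul>
</div>
"""


def fetch_page(url=STATUS_URL, timeout=10):
    """Download the status page and return its HTML as text."""
    response = requests.get(url, timeout=timeout)
    response.raise_for_status()
    return response.text


def parse_status(html):
    """Return one dict per subway line, each tagged with the last-update time."""
    soup = BeautifulSoup(html, "html.parser")
    update_tag = soup.find("p", class_="last-update")
    last_update = update_tag.get_text(strip=True) if update_tag else ""

    rows = []
    for item in soup.find_all("li"):
        line = item.find("span", class_="line")
        status = item.find("span", class_="status")
        if line and status:
            rows.append({
                "line": line.get_text(strip=True),
                "status": status.get_text(strip=True),
                "last_update": last_update,
            })
    return rows


if __name__ == "__main__":
    # Parse the inline sample instead of hitting the network.
    for row in parse_status(SAMPLE_HTML):
        print(row)
```

Keeping the download and the parsing in separate functions makes it possible to test the parser against a saved copy of the page without any network access.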

Once everything is properly parsed, the information is stored in the worksheet using the 'gspread' module.
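
As a rough sketch of the storage step, the parsed entries can be turned into rows and appended to a worksheet. The column order, the dictionary keys, and the function names below are assumptions, not the script's real layout; the only gspread call used is `Worksheet.append_rows`.

```python
from datetime import datetime


def to_sheet_rows(statuses, fetched_at):
    """Convert parsed status dicts into plain lists, one per worksheet row."""
    stamp = fetched_at.strftime("%Y-%m-%d %H:%M:%S")
    return [
        [stamp, entry["line"], entry["status"], entry["last_update"]]
        for entry in statuses
    ]


def store_statuses(worksheet, statuses, fetched_at=None):
    """Append the rows to a gspread worksheet and return how many were written."""
    rows = to_sheet_rows(statuses, fetched_at or datetime.now())
    if rows:
        # One API call for all lines instead of one call per line.
        worksheet.append_rows(rows)
    return len(rows)


# Obtaining the worksheet (file name and sheet title are hypothetical):
#   import gspread
#   client = gspread.service_account(filename="credentials.json")
#   worksheet = client.open("sp-subway-status").sheet1
#   store_statuses(worksheet, parsed_statuses)
```

Because `store_statuses` only needs an object with an `append_rows` method, it can be exercised with a small stand-in object before pointing it at a real sheet.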

The script runs indefinitely on Heroku.
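
One simple way to get the "every 5 minutes, forever" behaviour is a loop with a sleep, sketched below. This is an assumption about the structure, not the script's actual scheduling code; `run_periodically` and its parameters are hypothetical names.

```python
import time


def run_periodically(job, interval_seconds=300, max_runs=None, sleep=time.sleep):
    """Call job() repeatedly, waiting interval_seconds between calls.

    max_runs=None loops forever; a number is handy for testing.
    """
    runs = 0
    while max_runs is None or runs < max_runs:
        try:
            job()
        except Exception as error:
            # Log and keep going so one bad fetch does not stop the scraper.
            print(f"run failed: {error}")
        runs += 1
        if max_runs is None or runs < max_runs:
            sleep(interval_seconds)
    return runs


if __name__ == "__main__":
    # Short demo: two runs, one second apart.
    run_periodically(lambda: print("scraping..."), interval_seconds=1, max_runs=2)
```

Catching exceptions inside the loop is a design choice for a long-running process: a temporary network failure is logged and the next run still happens on schedule.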

Unavailability or other issues

If the registered data points are empty for any reason, an e-mail is sent with the page attached, so I can inspect the page and, if necessary, the logs to find out what happened.
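
A minimal sketch of that alert, using only the standard library, is shown below. The emptiness check, the subject line, the attachment name, and the SMTP settings are all assumptions; the actual script may detect and report the problem differently.

```python
import smtplib
from email.message import EmailMessage


def has_empty_data(rows):
    """True when nothing was parsed or any row is missing its status."""
    return not rows or any(not row.get("status") for row in rows)


def build_alert(page_html, sender, recipient):
    """Build an e-mail with the fetched page attached as page.html."""
    message = EmailMessage()
    message["Subject"] = "Subway scraper: empty data points"
    message["From"] = sender
    message["To"] = recipient
    message.set_content("The scraper registered empty data. The page is attached.")
    message.add_attachment(
        page_html.encode("utf-8"),
        maintype="text",
        subtype="html",
        filename="page.html",
    )
    return message


def send_alert(message, host, port, user, password):
    """Send the message through an SMTP server (connection details are hypothetical)."""
    with smtplib.SMTP(host, port) as server:
        server.starttls()
        server.login(user, password)
        server.send_message(message)
```

A typical use would be to call `has_empty_data` on the parsed rows after each run and, when it returns true, pass the raw HTML to `build_alert` and the result to `send_alert`.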

If this data is ever useful to you, let me know. Enjoy! 🍻

Data Analysis

Paulo wrote an analysis of the data! You can read it here.
