Update README URLs based on HTTP redirects #41

Open. Wants to merge 1 commit into base: master
14 changes: 7 additions & 7 deletions README.md
@@ -8,7 +8,7 @@ Documentation
With Upton, you can scrape complex sites to a CSV in just a few lines of code:

```ruby
-scraper = Upton::Scraper.new("http://www.propublica.org", "section#river h1 a")
+scraper = Upton::Scraper.new("https://www.propublica.org/", "section#river h1 a")
scraper.scrape_to_csv "output.csv" do |html|
Nokogiri::HTML(html).search("#comments h2.title-link").map &:text
end
@@ -31,7 +31,7 @@ Upton can handle pagination too. Scraping paginated index pages that use a query

To handle non-standard pagination, you can override the `next_index_page_url` and `next_instance_page_url` methods; Upton will get each page's URL returned by these functions and return their contents.

-<b>For more complete documentation</b>, see [the RDoc](http://rubydoc.info/gems/upton/frames/index).
+<b>For more complete documentation</b>, see [the RDoc](http://www.rubydoc.info/gems/upton/frames/index).

<b>Important Note:</b> Upton is alpha software. The API may change at any time.

@@ -44,7 +44,7 @@ Here are some similar libraries to check out for inspiration. No promises, since
- [Pismo](https://github.com/peterc/pismo)
- [Spidey](https://github.com/joeyAghion/spidey)
- [Anemone](http://anemone.rubyforge.org/)
-- [Pupa.rb](https://github.com/opennorth/pupa-ruby) / [Pupa](https://github.com/opencivicdata/pupa)
+- [Pupa.rb](https://github.com/jpmckinney/pupa-ruby) / [Pupa](https://github.com/opencivicdata/pupa)

And these are some libraries that do related things:

@@ -57,7 +57,7 @@ Examples
If you want to scrape ProPublica's website with Upton, this is how you'd do it. (Scraping our [RSS feed](http://feeds.propublica.org/propublica/main) would be smarter, but not every site has a full-text RSS feed...)

```ruby
-scraper = Upton::Scraper.new("http://www.propublica.org", "section#river section h1 a")
+scraper = Upton::Scraper.new("https://www.propublica.org/", "section#river section h1 a")
scraper.scrape do |article_html_string|
puts "here is the full html content of the ProPublica article listed on the homepage: "
puts "#{article_html_string}"
@@ -68,14 +68,14 @@ end
Simple sites can be scraped with the pre-written `list` block in `Upton::Utils`, as below:

```ruby
-scraper = Upton::Scraper.new("http://nytimes.com", "ul.headlinesOnly a")
+scraper = Upton::Scraper.new("http://www.nytimes.com/", "ul.headlinesOnly a")
scraper.scrape_to_csv("output.csv", &Upton::Utils.list("h6.byline"))
```

A `table` block also exists in `Upton::Utils` to scrape tables to an array of arrays, as below:

```ruby
-> scraper = Upton::Scraper.new(["http://website.com/story.html"])
+> scraper = Upton::Scraper.new(["http://www.website.com/story.html"])
> scraper.scrape(&Upton::Utils.table("//table[2]"))
[["Jeremy", "$8.00"], ["John Doe", "$15.00"]]
```
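
To make the "array of arrays" shape concrete, here is a minimal sketch of the same idea outside Upton. It is not Upton's implementation: `table_rows` is a hypothetical helper, the sample markup and the `prices` id are invented for illustration, and it uses REXML (shipped with Ruby) instead of Nokogiri, so it only handles well-formed markup.

```ruby
require "rexml/document"

# Hypothetical helper, not part of Upton: returns the rows of the first
# table matched by `xpath` as an array of arrays of cell strings.
def table_rows(markup, xpath)
  doc = REXML::Document.new(markup)
  table = REXML::XPath.first(doc, xpath)
  return [] if table.nil?

  REXML::XPath.match(table, ".//tr").map do |row|
    # Keep only the cell elements that are direct children of the row.
    cells = row.elements.to_a.select { |el| %w[td th].include?(el.name) }
    cells.map { |cell| cell.text.to_s.strip }
  end
end

# Invented sample markup; a real page would be fetched by the scraper.
SAMPLE_PAGE = <<~HTML
  <html><body>
    <table id="prices">
      <tr><td>Jeremy</td><td>$8.00</td></tr>
      <tr><td>John Doe</td><td>$15.00</td></tr>
    </table>
  </body></html>
HTML

p table_rows(SAMPLE_PAGE, "//table[@id='prices']")
```

Each inner array is one table row, which is the shape that maps directly onto one line of a CSV file.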
@@ -94,7 +94,7 @@ This example shows how to scrape the first three pages of ProPublica's search re

Contributing
----------------------
-I'd love to hear from you if you're using Upton. I also appreciate your suggestions/complaints/bug reports/pull requests. If you're interested, check out the issues tab or [drop me a note](http://github.com/jeremybmerrill).
+I'd love to hear from you if you're using Upton. I also appreciate your suggestions/complaints/bug reports/pull requests. If you're interested, check out the issues tab or [drop me a note](https://github.com/jeremybmerrill).

In particular, if you have a common, *abstract* use case, please add it to [lib/utils.rb](https://github.com/propublica/upton/blob/master/lib/utils.rb). Check out the `table_to_csv` and `list_to_csv` methods for examples.
