Helper methods for scraping one page and for scraping multiple #31

jeremybmerrill opened this issue Jan 15, 2014 · 5 comments

@jeremybmerrill
Contributor

It is confusing that Scraper.new takes EITHER a URL and a selector OR an array of URLs. We should keep both forms on new for backwards compatibility, but add a helper method for each pattern and use those helper methods in the README.

This will hopefully allay some of the confusion in #30 and address the API problems that were mentioned in #5 without such a dramatic refactor.

@jeremybmerrill
Contributor Author

Scraper#index will return a Scraper instance (with the actual fetching perhaps deferred until later) on which a #scrape call will fetch the links that the selector expression finds on the index page. Scraper#instances will return a Scraper instance on which a #scrape call will fetch the URLs passed as the argument to #instances.
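
To make the two patterns concrete, here is a minimal sketch of how such helpers could look. It is not the library's actual implementation: the class name `SketchScraper`, the `fetcher:` keyword, and the toy `a.classname` selector handling are all assumptions made for illustration. Both helpers defer every request until #scrape is called.

```ruby
require "net/http"
require "uri"

# Hypothetical sketch only; names and behavior are assumptions, not the real API.
class SketchScraper
  # Pattern 1: an index page plus a selector that picks the links on it.
  def self.index(index_url, selector, fetcher: nil)
    new(index_url: index_url, selector: selector, fetcher: fetcher)
  end

  # Pattern 2: an explicit list of instance URLs; no index page is involved.
  def self.instances(urls, fetcher: nil)
    new(instance_urls: urls, fetcher: fetcher)
  end

  def initialize(index_url: nil, selector: nil, instance_urls: nil, fetcher: nil)
    @index_url = index_url
    @selector = selector
    @instance_urls = instance_urls
    # The fetcher is injectable so the sketch can be exercised without a network.
    @fetcher = fetcher || ->(url) { Net::HTTP.get(URI(url)) }
  end

  # Nothing is requested until #scrape runs. The block receives (html, url)
  # for each instance page, and #scrape returns the block's results.
  def scrape(&block)
    urls = @instance_urls || links_from_index
    urls.map { |url| block.call(@fetcher.call(url), url) }
  end

  private

  # Toy stand-in for a real selector engine: it only understands selectors
  # of the form "a.classname" and matches <a class="classname" href="...">.
  def links_from_index
    css_class = @selector.sub(/\Aa\./, "")
    html = @fetcher.call(@index_url)
    html.scan(/<a class="#{Regexp.escape(css_class)}" href="([^"]+)"/).flatten
  end
end
```

With helpers like these, the README could show `SketchScraper.index(url, selector)` for the index pattern and `SketchScraper.instances(urls)` for the list pattern, while `new` keeps accepting both forms for backwards compatibility.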

@jeremybmerrill added this to the 0.4.0 milestone Feb 15, 2014
@jeremybmerrill
Contributor Author

I think for 1.0.0 the Scraper returned by "index" will immediately fetch the index page, so that the Scraper can be added to other scrapers, see #35. For now, it'll still only be fetched on #scrape.

@jeremybmerrill
Contributor Author

I changed my mind in the last 31 minutes.

For 0.4.0 the semantics of #initialize will change. The index page will be scraped immediately. However, the syntax will not change.

@jeremybmerrill
Contributor Author

Hmm, if it makes requests on the first call (e.g. Scraper.new, Scraper.index), when are options set? I guess as a hash on that first call? That'll be a breaking change, so I'll queue that up for 1.0.0.
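
A rough sketch of what "options as a hash on that first call" might look like when the index page is fetched eagerly. Everything here is hypothetical: the class name `EagerScraperSketch` and the `:verbose` and `:fetcher` option names are invented for illustration and are not taken from the library.

```ruby
# Hypothetical sketch only; option names are assumptions, not the real API.
class EagerScraperSketch
  DEFAULTS = {
    verbose: false,
    fetcher: ->(url) { raise NotImplementedError, "no fetcher given for #{url}" }
  }.freeze

  attr_reader :options, :index_html

  def initialize(index_url, selector, options = {})
    unknown = options.keys - DEFAULTS.keys
    raise ArgumentError, "unknown options: #{unknown.inspect}" unless unknown.empty?

    @index_url = index_url
    @selector = selector
    # Options must be final before the request below; a setter called
    # after construction would arrive too late to affect it.
    @options = DEFAULTS.merge(options)

    warn "fetching #{index_url}" if @options[:verbose]
    @index_html = @options[:fetcher].call(index_url)
  end
end

# Usage with an in-memory fetcher, so no network access is needed.
requests = []
fake_fetcher = ->(url) { requests << url; "<html>index</html>" }
scraper = EagerScraperSketch.new("http://example.com/index", "a.story", fetcher: fake_fetcher)
```

Because the request happens inside `initialize`, any caller that currently configures the scraper after `new` would see its settings ignored for that first request, which is why this reads as a breaking change.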

@jeremybmerrill
Contributor Author

Mostly implemented in future (1.0.0) at a25e84e

Partially implemented for 0.4.0 at 24cb65e
