Releases · lorey/mlscraper

24 Jun 14:04

lorey

v1.0.0rc3

e63203f

1.0.0rc3 Pre-release

improved training performance by 10x (again) by trying to generate scrapers for highly similar matches first
added the first CSS pseudo-class selectors by implementing :nth-child, e.g. div a:nth-child(1)
added child selector generation, e.g. .user-box > a
added attribute-based CSS selectors, e.g. a[itemprop="user"]
added automated tests for GitHub profile pages
added lazy hashing for node elements
extended text matching to also include parent elements that contain the same text
fixed a bug where searching for values resulted in image dimensions being matched
fixed a bug where text did not exactly match the sample provided but was selected anyway
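
The three selector kinds listed above (:nth-child, child, and attribute-based selectors) can be illustrated with a small sketch. This is not mlscraper's actual implementation: the Node class and the candidate_selectors function are hypothetical names made up for this example, and the real library may generate and rank selectors quite differently.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(eq=False)  # identity-based equality so list.index() finds the exact node
class Node:
    """Hypothetical minimal DOM node, not mlscraper's own node class."""
    tag: str
    attrs: dict = field(default_factory=dict)
    parent: Optional["Node"] = None
    children: list = field(default_factory=list)

    def add(self, child: "Node") -> "Node":
        """Attach a child element and return it."""
        child.parent = self
        self.children.append(child)
        return child


def candidate_selectors(node: Node) -> list:
    """Return a few simple CSS selector candidates for a node (illustrative only)."""
    candidates = [node.tag]

    # Attribute-based selectors, e.g. a[itemprop="user"]
    for name, value in sorted(node.attrs.items()):
        if name != "class":
            candidates.append(f'{node.tag}[{name}="{value}"]')

    if node.parent is not None:
        # Child selectors built from the parent's classes, e.g. .user-box > a
        for cls in node.parent.attrs.get("class", "").split():
            candidates.append(f".{cls} > {node.tag}")

        # :nth-child counts all element siblings, starting at 1
        position = node.parent.children.index(node) + 1
        candidates.append(f"{node.tag}:nth-child({position})")

    return candidates


box = Node("div", {"class": "user-box"})
link = box.add(Node("a", {"itemprop": "user"}))
print(candidate_selectors(link))
```

A real generator would presumably also check each candidate against the page to see which ones select exactly the sample values; the sketch only enumerates the candidates.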

21 Jun 19:39

lorey

v1.0.0rc2

8590a22

1.0.0rc2 Pre-release

fixed a bug where text inside a tag was only selected if not enclosed by whitespace
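
One way to picture the behavior this fix aims for is a comparison that ignores surrounding whitespace but is otherwise exact. The function name matches_sample is hypothetical and the sketch is an assumption about the intended behavior, not the library's code.

```python
def matches_sample(node_text: str, sample: str) -> bool:
    """Compare exactly after trimming leading and trailing whitespace."""
    return node_text.strip() == sample.strip()


# Text wrapped in newlines and indentation still matches the sample.
print(matches_sample("\n    Jane Doe\n  ", "Jane Doe"))

# Text that merely contains the sample does not count as a match.
print(matches_sample("Jane Doe (admin)", "Jane Doe"))
```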

21 Jun 16:02

lorey

v1.0.0rc1

f58ddd5

1.0.0rc1 Pre-release

mlscraper has been rewritten from the ground up and is now easier to use, more flexible, and faster than ever. This is the first release candidate for the upcoming 1.0 version. Feel free to try it out with pip install --pre mlscraper.

scrapers can extract arbitrary data structures (lists, dicts, lists of dicts and even lists of lists)
depending on the page, one example might be enough to train a scraper
the generation of CSS selectors has been overhauled and is now more efficient
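
To make "arbitrary data structures" concrete, the sketch below walks a nested sample (dicts, lists, lists of dicts) and yields every leaf value together with its path. It only illustrates the shapes a sample can take; iter_leaves is a hypothetical helper, the sample values are made up, and none of this is taken from mlscraper's code.

```python
from typing import Any, Iterator, Tuple


def iter_leaves(sample: Any, path: Tuple = ()) -> Iterator[Tuple[Tuple, Any]]:
    """Yield (path, value) pairs for every leaf of a nested sample."""
    if isinstance(sample, dict):
        for key, value in sample.items():
            yield from iter_leaves(value, path + (key,))
    elif isinstance(sample, (list, tuple)):
        for index, value in enumerate(sample):
            yield from iter_leaves(value, path + (index,))
    else:
        # Anything that is not a container is treated as a value to find on the page.
        yield path, sample


# Made-up sample: a dict containing a list of dicts.
sample = {
    "name": "Jane Doe",
    "repos": [{"title": "alpha"}, {"title": "beta"}],
}

for path, value in iter_leaves(sample):
    print(path, value)
```

Each leaf is a string that a trainer would need to locate on the page, while the path records where the extracted value belongs in the output structure.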
