Offer alternative to pycurl #147

Open
alexzabbey opened this issue Dec 29, 2019 · 11 comments

@alexzabbey

Installing pycurl on Windows is a real hassle.

@siznax
Owner

siznax commented Dec 31, 2019

Thanks for trying wptools @alexzabbey. That's a good idea! Will try to implement, or you're welcome to submit a PR... 😉

@siznax changed the title from "Is it possible to change the pycurl dependency?" to "Offer alternative to pycurl" on Dec 31, 2019
@alexzabbey
Author

I found this in request.py:

        # consistently faster than requests by 3x
        #
        # r = requests.get(url,
        #                  headers={'User-Agent': self.user_agent})
        # return r.text

I guess that means you tried requests instead of pycurl, but it was slower. Is the difference that significant?
I think helping Windows users is more important, and we can probably speed things up in requests with sessions and so on, or alternatively try making the requests asynchronously.
What do you think?
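
To illustrate the session idea mentioned above, here is a minimal sketch, not code from wptools. It assumes the `requests` package is installed; the `USER_AGENT` value and the helper names `make_session` and `fetch_text` are hypothetical. A shared `requests.Session` keeps connections open between calls to the same host, which may recover some of the overhead, though whether it closes the gap with pycurl would need to be measured.

```python
import requests

# Hypothetical User-Agent string, for illustration only.
USER_AGENT = "example-client/0.1"


def make_session(user_agent=USER_AGENT):
    """Return a Session that sends the same User-Agent on every request."""
    session = requests.Session()
    session.headers.update({"User-Agent": user_agent})
    return session


def fetch_text(session, url, timeout=10):
    """GET a URL through the shared session and return the body as text."""
    response = session.get(url, timeout=timeout)
    response.raise_for_status()  # surface HTTP errors instead of returning an error page
    return response.text
```

Creating the session once and passing it to every `fetch_text` call is what allows connection reuse; creating a new session per request would defeat the purpose.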

@siznax
Owner

siznax commented Jan 7, 2020

Yes, I started with requests, but couldn't figure out why it was so much slower than just using curl. It turns out that all of the scaffolding around urllib3 is costly. I found it to be up to 3x more costly.

I agree that offering requests as an alternative for folks having trouble with pycurl would be good. Worse performance is better than no performance, heh.
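
One way to offer an alternative without dropping pycurl is to pick whichever HTTP library happens to be installed. The sketch below is only an assumption about how that could look, not how wptools does it; `PREFERRED_BACKENDS` and `pick_backend` are hypothetical names, and the order simply reflects the speed ranking reported in this thread.

```python
import importlib.util

# Backends in order of preference; pycurl first because this thread reports it as fastest.
PREFERRED_BACKENDS = ("pycurl", "requests", "urllib3")


def pick_backend(candidates=PREFERRED_BACKENDS):
    """Return the first importable module name, else the standard-library fallback."""
    for name in candidates:
        # find_spec returns None when a top-level module is not installed.
        if importlib.util.find_spec(name) is not None:
            return name
    return "urllib.request"  # ships with Python, so it is always available
```

With a check like this, a Windows user who cannot build pycurl would get a slower backend instead of an import error.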

@lisongx
Contributor

lisongx commented Jan 8, 2020

> I found, up to 3x more costly.

That's a really noticeable difference!
I found a benchmark project that compares several Python HTTP clients: https://github.com/svanoort/python-client-benchmarks
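
To check a figure like "3x" on your own machine, a small timing harness is enough. This is a rough sketch using only the standard library; `average_seconds` and `slowdown` are hypothetical helper names, and each `fetch` argument is assumed to be any callable that takes a URL (for example, a pycurl-based or requests-based function).

```python
import time


def average_seconds(fetch, url, repeat=5):
    """Call fetch(url) `repeat` times and return the mean wall-clock time per call."""
    start = time.perf_counter()
    for _ in range(repeat):
        fetch(url)
    return (time.perf_counter() - start) / repeat


def slowdown(candidate, baseline, url, repeat=5):
    """Return how many times slower `candidate` is than `baseline` for the same URL."""
    return average_seconds(candidate, url, repeat) / average_seconds(baseline, url, repeat)
```

Network timings vary a lot between runs, so a result from a handful of requests should be treated as indicative at best.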

@ukanuk

ukanuk commented Apr 12, 2020

See #44, which isn't linked to a pull request but was likely the initial impetus for changing the library.

@siznax self-assigned this on Apr 15, 2020
@siznax added this to the Release v0.4.18 milestone on Apr 15, 2020
@uriva
Contributor

uriva commented May 8, 2020

+1, this is causing a real headache with Dockerfiles (Jupyter, Kubernetes, etc.).

@siznax
Owner

siznax commented Nov 16, 2020

Planning to add urllib3 as an alternative to, or replacement for, pycurl.

@Nathan1123

> Yes, I started with requests, but couldn't figure out why it was so much more slow than just using curl. Turns out, all of that scaffolding around urllib3 is costly. I found, up to 3x more costly.

Maybe it is just my machine, but I am finding that wptools as it currently is (using pycurl) is far slower than manually using requests. Naturally I will use wptools, because parsing the wikitext has been a pain in the neck, but my script went from ~20 min to an overnight run.

@Simonsoto

I had the same issue. I'm on a Windows machine but using conda. You can install libcurl and pycurl with conda and then use pip to install wptools. Maybe adding this would help guide others.

@applieddesign

Any update on this milestone?

@uriva
Contributor

uriva commented Mar 16, 2021

Hey, I put together an initial version that seems to work: https://github.com/uriva/wptools

8 participants