Update congresspeople advisors script #252

Open
wants to merge 9 commits into
base: master

bruno-schmidt
Contributor

@bruno-schmidt bruno-schmidt commented Jun 14, 2017

Fix #237!

The script was broken by HTML and URL changes at CAMARA_URL.

I also changed the words "deputy"/"deputies" to "congressperson"/"congresspeople" in src/fetch_congresspeople_advisors.py, as pointed out by @jtemporal (#149)!

@jtemporal
Collaborator

@bruno-schmidt this is great! thank you!

a question though: I got some errors while running the script as follows, is this expected?

HTTPConnectionPool(host='www2.camara.leg.br', port=80): Max retries exceeded with url: /transparencia/recursos-humanos/servidores/lotacao/consulta-secretarios-parlamentares/layouts_transpar_quadroremuner_consultaSecretariosParlamentares (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x1192540f0>: Failed to establish a new connection: [Errno 8] nodename nor servname provided, or not known',))

@bruno-schmidt
Contributor Author

hey @jtemporal !

I did some runs and it looks like their servers are a bit unstable today. I got some (different) errors too, but the script ran fine afterwards.

That's not normal, but it happens. :(

The errors I got were [Errno 110] Connection timed out and an exception, probably because a different HTML showed up (an error page?).

If it keeps happening, or your computer is sending/processing the requests too fast, you can decrease the connection pool size (the size param on line 65) to be a bit friendlier to their servers XD, but I guess it's a temporary problem.
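
For anyone following along, here is a minimal sketch of what a smaller pool means in practice. It is not the code from the script: fetch_all, fake_fetch and the size argument are hypothetical names, and a standard-library thread pool stands in for whatever pool line 65 actually creates. The only point is that size caps how many requests are in flight at once.

```python
from concurrent.futures import ThreadPoolExecutor


def fetch_all(urls, fetch, size=8):
    """Apply `fetch` to every URL with at most `size` concurrent workers.

    `size` plays the role of the pool size discussed above (an assumption,
    not the script's real parameter).
    """
    with ThreadPoolExecutor(max_workers=size) as executor:
        # executor.map returns results in the same order as the inputs
        return list(executor.map(fetch, urls))


def fake_fetch(url):
    # Stand-in for the real HTTP request so the sketch runs offline
    return "page for " + url


pages = fetch_all(["a", "b", "c"], fake_fetch, size=2)
```

Lowering size trades total run time for fewer simultaneous connections to the server.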

@jtemporal
Collaborator

jtemporal commented Jun 19, 2017

I guess it's a temporary problem

I got those errors all 3 times I ran it in the last few days. I'll check if reducing the pool size improves the results. Apart from that, everything seems to be working fine! 🎉

@cuducos
Collaborator

cuducos commented Jun 19, 2017

In a Jarbas script that also reaches one of the Chamber's servers, we stop for a while after 256 requests (I could push that as far as ~500 requests before pausing for 2s, but I opted for a huge margin).
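
A rough sketch of that kind of throttle, in case it helps here. This is not the Jarbas code: fetch_with_pauses and its parameters are hypothetical, and it assumes the pause repeats after every full batch. The 256 and 2s defaults come from the numbers above.

```python
import time


def fetch_with_pauses(urls, fetch, batch_size=256, pause=2.0, sleep=time.sleep):
    """Fetch every URL in order, sleeping `pause` seconds after each full batch.

    `urls` must be a sequence (its length is needed to skip the final pause).
    `sleep` is injectable so the behaviour can be checked without waiting.
    """
    results = []
    for count, url in enumerate(urls, start=1):
        results.append(fetch(url))
        # Pause after each full batch, but not after the very last request
        if count % batch_size == 0 and count < len(urls):
            sleep(pause)
    return results
```

With about 800 requests and the defaults, this would pause three times (after requests 256, 512 and 768).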

@bruno-schmidt
Contributor Author

Ouch! :(

I will write down some info, maybe it can help to figure out something!

This script will send approx. 800 requests. On my computer it completes in ~90 seconds with connection pool size 8 and ~65 seconds with 16. I have an Intel i3-2100 (2 cores + 2 threads) and a 5200 rpm HDD.
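
To put those numbers next to the 256-request margin, here is a quick back-of-the-envelope calculation. It only uses the approximate figures from this thread. The helper names are made up for illustration, and the 2s pause after every 256 requests is an assumption about how a throttle would be applied to this script.

```python
REQUESTS = 800  # approximate total from the measurements above


def throughput(requests, seconds):
    """Average requests per second over a whole run."""
    return requests / seconds


def pause_count(requests, batch_size):
    """Pauses needed when sleeping after each full batch except the last request."""
    return (requests - 1) // batch_size


rate_pool_8 = throughput(REQUESTS, 90)   # pool size 8, ~90 seconds
rate_pool_16 = throughput(REQUESTS, 65)  # pool size 16, ~65 seconds

# Assumed throttle: a 2 s pause after every 256 requests
pauses = pause_count(REQUESTS, 256)
extra_seconds = pauses * 2
```

That works out to roughly 8.9 requests/s with pool size 8 and 12.3 requests/s with 16, so 256 requests go out in about 29 and 21 seconds respectively. Under the assumed throttle, three 2s pauses would add about 6 seconds to a full run.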

What do you guys think?

Irio pushed a commit that referenced this pull request Feb 27, 2018
Use full text search to query reimbursements

3 participants