Investigate pulling Lobbyist Quarterly Report PDF data #31

Open
tdooner opened this issue May 1, 2020 · 0 comments


tdooner commented May 1, 2020

The Oakland PEC is moving its Lobbyist Disclosure process to a digital one. Filings submitted through the new process appear in Netfile like this:

https://netfile.com/Connect2/api/public/image/189594727

That document is a PDF with fillable form fields that hold the filer's values. The values can be extracted with the pdftk command like this:

» wget -O189594727.pdf https://netfile.com/Connect2/api/public/image/189594727
» pdftk 189594727.pdf dump_data_fields | grep FieldValue
FieldValue: Reynaldo A. Fuentes
FieldValue: The Partnership for Working Families
FieldValue: 1305 Franklin St Suite 501
FieldValue: Oakland, CA 94612
FieldValue: (510) 925-4013
FieldValue: rey@forworkingfamilies.org
FieldValue: East Bay Alliance for a Sustainable Economy
FieldValue: Emergency Paid Sick Leave
FieldValue: East Bay Alliance for a Sustainable Economy
FieldValue: Paid Sick Leave Enforcement
FieldValue: East Bay Alliance for a Sustainable Economy
FieldValue: Department of Workplace and Enforcement Standards
FieldValue: East Bay Alliance for a Sustainable Economy
FieldValue: Department of Workplace and Enforcement Standards
FieldValue: April 26, 2020
FieldValue: Choice2
FieldValue: Choice3
FieldValue: Support
FieldValueDefault:
FieldValue: Support
FieldValueDefault:
FieldValue: Policy Development
FieldValueDefault:
FieldValue: Informational Briefing
FieldValueDefault:
FieldValue:
FieldValueDefault:
FieldValue:
FieldValueDefault:

Is there an easy way to get this data out of the PDF with Ruby? Maybe a gem that wraps PDFtk?
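One possible starting point, as a minimal sketch rather than a settled approach: shell out to pdftk from Ruby with the standard library's Open3 and parse the `dump_data_fields` text into hashes. The method names `parse_pdftk_fields` and `pdf_form_fields` are hypothetical, and the sketch assumes pdftk is installed and on the PATH. It relies on the dump format where records are separated by `---` lines and each line is `Key: value`. A gem that wraps pdftk (pdf-forms is one candidate) may make this unnecessary; that has not been checked here.

```ruby
require "open3"

# Parse the text printed by `pdftk <file> dump_data_fields` into an array of
# hashes, one per form field. Records are separated by "---" lines and each
# line inside a record looks like "Key: value".
def parse_pdftk_fields(dump)
  records = dump.split(/^---$/).map do |record|
    record.each_line.each_with_object({}) do |line, field|
      key, value = line.chomp.split(":", 2)
      next if value.nil? # skip blank or malformed lines
      # Keep the first occurrence of a key; FieldStateOption can repeat.
      field[key.strip] ||= value.strip
    end
  end
  records.reject(&:empty?)
end

# Hypothetical wrapper: run pdftk on a local PDF and return the parsed fields.
# Assumes the `pdftk` binary is available on the PATH.
def pdf_form_fields(path)
  out, status = Open3.capture2("pdftk", path, "dump_data_fields")
  raise "pdftk failed for #{path}" unless status.success?
  parse_pdftk_fields(out)
end

# Usage, after downloading the filing as in the wget command above:
#   pdf_form_fields("189594727.pdf").each do |f|
#     puts "#{f['FieldName']}: #{f['FieldValue']}"
#   end
```

Keying each value by its `FieldName` instead of grepping for `FieldValue` alone would make it possible to tell which value is the lobbyist's name, which is the client, and so on, assuming the form's field names are stable across filings.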
