add huge_tree=True to the XMLParser used for responses. #55

Open: wants to merge 1 commit into master

jiemakel

Without huge_tree=True, lxml parsing apparently fails on certain responses that are only moderately large (apparently those larger than about 9.5MB).
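Here is a minimal sketch of the parser configuration this PR describes. It assumes lxml is installed. The source only confirms `recover=True` and the added `huge_tree=True`. The `HUGE_PARSER` name and the `parse_response` helper are hypothetical and are not Sickle's actual code. The example parses a single text node of about 12 MB, which is larger than the threshold reported above:

```python
from lxml import etree

# Hypothetical module-level parser. recover=True matches the existing
# behaviour described in this PR; huge_tree=True is the proposed addition.
HUGE_PARSER = etree.XMLParser(
    recover=True,
    huge_tree=True,  # lift libxml2's default limits on tree depth and text size
)


def parse_response(content: bytes) -> etree._Element:
    """Parse raw OAI-PMH response bytes with the huge_tree-enabled parser."""
    return etree.fromstring(content, parser=HUGE_PARSER)
```

For example, `parse_response(b"<record>" + b"x" * 12_000_000 + b"</record>")` should return an element whose text keeps all 12,000,000 characters. lxml documents `huge_tree` as disabling security restrictions, so this setting is best kept for trusted OAI-PMH endpoints.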

Because the parser is also created with recover=True, the failure is silent from Sickle's point of view. I only noticed it because the resumption token is lost along with the rest of the response, which ends the crawl. That made me wonder why I had far fewer records than I should have had.
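Even with `recover=True`, lxml records the problems from the last parse in the parser's `error_log` and returns the salvaged tree instead of raising. A caller can therefore still detect the silent failure. The sketch below assumes lxml. `parse_checked`, `SilentParseError` and `resumption_token` are hypothetical helpers and are not part of Sickle. The sketch also shows where the crawl ends: if the `resumptionToken` element did not survive parsing, the lookup returns `None`.

```python
from typing import Optional

from lxml import etree

OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


class SilentParseError(Exception):
    """Hypothetical: raised when a recovering parse logged problems."""


def parse_checked(content: bytes, parser: etree.XMLParser) -> etree._Element:
    """Parse content, then fail loudly if the (recovering) parser logged anything.

    Note: error_log holds warnings as well as errors from the last run.
    """
    root = etree.fromstring(content, parser=parser)
    if root is None or len(parser.error_log) > 0:
        raise SilentParseError(str(parser.error_log))
    return root


def resumption_token(root: etree._Element) -> Optional[str]:
    """Return the OAI-PMH resumptionToken text, or None if it is missing.

    A harvest loop that stops on None would end early on a truncated tree.
    """
    token = root.find(".//oai:resumptionToken", namespaces=OAI_NS)
    return token.text if token is not None else None
```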

Alternatively, a more flexible fix would be to accept the XMLParser as an optional parameter to Sickle and pass it down to the OAIResponse. People could then choose for themselves what XML parsing behaviour they want. For this PR, however, I opted for the simplest fix.
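The more flexible alternative could look roughly like the sketch below. It assumes lxml. `Harvester` and `Response` are hypothetical stand-ins for Sickle and OAIResponse and do not reflect their real signatures. The point is that a parser chosen once at the client level flows down to every response, and the default matches this PR.

```python
from typing import Optional

from lxml import etree


def default_parser() -> etree.XMLParser:
    """Fallback matching this PR's choice: recover=True plus huge_tree=True."""
    return etree.XMLParser(recover=True, huge_tree=True)


class Response:
    """Hypothetical stand-in for OAIResponse; not Sickle's real signature."""

    def __init__(self, content: bytes, parser: etree.XMLParser):
        self.content = content
        self._parser = parser

    @property
    def xml(self) -> etree._Element:
        # Parse with whatever parser the client handed down.
        return etree.fromstring(self.content, parser=self._parser)


class Harvester:
    """Hypothetical stand-in for the Sickle client with an optional parser."""

    def __init__(self, parser: Optional[etree.XMLParser] = None):
        self.parser = parser if parser is not None else default_parser()

    def wrap(self, content: bytes) -> Response:
        # Every response inherits the client's parser, so recovery and
        # size limits are configured in one place.
        return Response(content, self.parser)
```

For example, `Harvester(parser=etree.XMLParser(recover=False, huge_tree=True))` would make malformed or truncated responses raise `etree.XMLSyntaxError` instead of failing silently.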

…sing fails silently on certain large responses (apparently of more than 9.5MB)