WAF Harvester parsing issues #309

Open
benjwadams opened this issue May 22, 2023 · 1 comment
Comments

@benjwadams

benjwadams commented May 22, 2023

WAF harvesting can fail to parse many pages that are a de facto WAF, such as this listing: https://gcoos4.tamu.edu/erddap/metadata/iso19115/xml/

Because the harvester looks explicitly for "a href", any link that doesn't follow exactly that string ordering (for example, an anchor tag with another attribute before "href") will fail to harvest. Is there any reason why a proper XML parsing library isn't used to find links, instead of a general-purpose text parsing library, which has known pitfalls when parsing XML?
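To illustrate the kind of link extraction meant here, below is a minimal sketch that uses Python's standard-library `html.parser` rather than string matching. It is not the harvester's code: `LinkCollector` and `extract_links` are hypothetical names, and the sketch assumes the listing is HTML whose entries are ordinary anchor tags.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect href values from <a> tags (hypothetical helper, not harvester code)."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names, so <A HREF=...>
        # and <a class="x" href=...> are handled the same way.
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Resolve relative links against the listing URL.
                self.links.append(urljoin(self.base_url, value))


def extract_links(html, base_url):
    """Return absolute URLs for every anchor with an href in the given HTML."""
    collector = LinkCollector(base_url)
    collector.feed(html)
    collector.close()
    return collector.links
```

Because the parser works on tags and attributes rather than on a literal substring, attribute order, letter case, and quoting style no longer matter. Filtering the result down to metadata records (for example by file extension) would still be a separate step.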

Also, for the link above, the "apache" parser is selected because of the "Server" response header, even though this is clearly not an Apache directory listing but a reverse-proxied application. This was difficult to track down while I was writing custom logic for the "other" parser to work around the shortcomings of the WAF parser mentioned above.
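One possible way to avoid this misdetection is to confirm the "Server" header against the page body before picking a parser. The sketch below is only a heuristic under stated assumptions: `choose_parser` is a hypothetical function, and the markers it checks (an "Index of" title plus either column-sort links such as `?C=N;O=D` or an `<address>Apache...` footer) are typical of Apache's generated indexes but are not guaranteed to be present on every Apache listing.

```python
import re

# Markers that Apache-generated directory indexes commonly contain.
# These are heuristics (assumptions), not a specification.
_INDEX_TITLE = re.compile(r"<title>\s*Index of\b", re.IGNORECASE)
_APACHE_HINTS = re.compile(
    r'href="\?C=[NMSD];O=[AD]"|<address>\s*Apache', re.IGNORECASE
)


def choose_parser(server_header, body):
    """Return "apache" only when both the header and the body agree.

    Hypothetical helper: a reverse-proxied application behind Apache sends
    an Apache "Server" header but does not look like an Apache index page,
    so it falls through to "other".
    """
    header_says_apache = "apache" in (server_header or "").lower()
    body_looks_apache = bool(
        _INDEX_TITLE.search(body) and _APACHE_HINTS.search(body)
    )
    if header_says_apache and body_looks_apache:
        return "apache"
    return "other"
```

The trade-off is that an Apache listing with a heavily customised index page would fall through to "other", which seems safer than applying the "apache" parser to a page it was never written for.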

@amercader
Member

@benjwadams you are right that the parser used in the WAF harvester is very brittle. Any improvements on that front would be a great contribution.
