-
-
Notifications
You must be signed in to change notification settings - Fork 1.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Encode strings before using them as URLs for Add Column by Fetching URLs - fixes #6140 #6142
base: master
Are you sure you want to change the base?
Conversation
a861035
to
726f07b
Compare
726f07b
to
4da8049
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for tackling this.
main/src/com/google/refine/operations/column/ColumnAdditionByFetchingURLsOperation.java
Outdated
Show resolved
Hide resolved
4da8049
to
3fbc475
Compare
3fbc475
to
a969ac5
Compare
@wetneb ping. I addressed your comment a couple of weeks ago and this is ready for re-review. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry for the delay!
final String BASE_URL = "https://example.com/"; | ||
final String TEST_URL = BASE_URL + ARABIC_PATH; | ||
assertEquals(HttpClient.getEscapedUrl(TEST_URL), BASE_URL + UrlEscapers.urlPathSegmentEscaper().escape(ARABIC_PATH)); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it not be worth adding some test cases to demonstrate the handling of already-escaped URLs, to make sure they are not doubly escaped?
Fixes #6140
Changes proposed in this pull request:
One question I have is whether we should be percent-decoding the string before feeding it to the URL encoder. This could potentially make it more compatible with existing recipes which uses
.encode("url")
as well as allowing the use of other sources of pre-encoded URLs, but is opening up a little bit of a can of worms.Note also that this is based on the branch with the fix for #6137 and includes it, on the assumption that it will be reviewed and merged first.