Catch Guzzle Exception to avoid breaking harvest #4075

Closed
stefan-korn opened this issue Dec 4, 2023 · 4 comments · Fixed by #4076 or #4084

Comments

@stefan-korn
Contributor

Describe the bug

During harvesting, the method getRemoteMimeType is called to determine the MIME type of a distribution resource by requesting its downloadURL. If the downloadURL is unavailable or broken for any reason, the whole harvest fails and a Guzzle Exception is logged.

Steps To Reproduce

Run a harvest in which the downloadURL of one resource is unavailable.

Expected behavior

The method getRemoteMimeType should return NULL when the request to the downloadURL fails (as the method's declaration already states) instead of quitting with an Exception.
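A minimal sketch of the expected behavior, assuming the method issues a HEAD request through a Guzzle client. This is a standalone, hypothetical function and not DKAN's actual implementation; the injected `$client` parameter and the header parsing are assumptions made for illustration.

```php
<?php

use GuzzleHttp\ClientInterface;
use GuzzleHttp\Exception\GuzzleException;

/**
 * Hypothetical stand-in for getRemoteMimeType(); not DKAN's real code.
 *
 * Returns the MIME type reported for $downloadUrl, or NULL when the
 * request fails or no Content-Type header is present.
 */
function getRemoteMimeType(ClientInterface $client, string $downloadUrl): ?string {
  try {
    $response = $client->request('HEAD', $downloadUrl);
  }
  catch (GuzzleException $e) {
    // Covers connection failures and, with Guzzle's default http_errors
    // option, 4xx/5xx responses. Report "unknown" instead of aborting.
    return NULL;
  }

  $contentType = $response->getHeaderLine('Content-Type');
  if ($contentType === '') {
    return NULL;
  }

  // Drop parameters such as "; charset=utf-8".
  return trim(explode(';', $contentType)[0]);
}
```

With this shape a single broken downloadURL yields NULL for that resource, and the caller can decide whether to log it and continue with the rest of the harvest.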

@stefan-korn
Contributor Author

Besides this problem, there is another possible issue with that method. If the download URL times out, the request can take very long or hang forever when the harvest is started via drush and the Guzzle client's default timeout of 0 (wait indefinitely) is used.

Maybe the method should use Drupal's http_client service instead, to pick up Drupal's default settings and allow overriding them via settings?
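For illustration, a sketch of how such an override could look in settings.php. It assumes that the http_client service merges the `http_client_config` setting into its Guzzle options, which is how Drupal core's client factory behaves as far as I know. The values are arbitrary examples, not recommendations.

```php
<?php

// settings.php fragment (illustrative values only).
// Options under http_client_config are Guzzle request options that are
// merged into the defaults of the client built by the http_client service.

// Maximum total time for a request, in seconds.
$settings['http_client_config']['timeout'] = 10;

// Maximum time to wait while connecting to the server, in seconds.
$settings['http_client_config']['connect_timeout'] = 5;
```

A client created directly with `new \GuzzleHttp\Client()` does not see these settings, which is why obtaining the client from the service matters here.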

@dafeder
Member

dafeder commented Dec 11, 2023

Switching to the http_client service seems like a good move. As @janette mentions in the PR, we are still thinking through the right way to deal with bad URLs.

paul-m added a commit that referenced this issue Dec 19, 2023
#4075: Catch Guzzle Exception to avoid breaking harvest
@paul-m
Contributor

paul-m commented Dec 20, 2023

We found that this issue also affects creating a dataset in the Drupal UI with a bad source URL. This led to a white screen of death (WSOD) with an error message saying that Guzzle had received a 404.

@paul-m
Contributor

paul-m commented Dec 21, 2023

Fixed. Special thanks to @stefan-korn

@paul-m paul-m closed this as completed Dec 21, 2023
DKAN 2 Issue Triage automation moved this from Incoming/Triage to Closed Dec 21, 2023