
CKAN process killed when uploading large files (> 200 Mb) #8061

Open
vabatista opened this issue Feb 6, 2024 · 9 comments

@vabatista

vabatista commented Feb 6, 2024

CKAN version

2.10

Describe the bug

I deployed CKAN using Docker into a container with 4 GB RAM and 2 vCPUs. We are using the s3filestore plugin (https://github.com/keitaroinc/ckanext-s3filestore/).
When uploading large files to datasets, the process handling the upload to S3 gets killed. Here is an excerpt of the log from the AWS ECS container:

[screenshot: AWS ECS container log excerpt]

My ckan.max_resource_size is set to 102400 MB (100 GB).

We also disabled the plugin and used the container filesystem to store files; the same problem occurs. When submitting large files, the web application returns HTTP error 502 (Bad Gateway).
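For reference, the relevant setting looks roughly like this (values here are illustrative; ckan.max_resource_size is interpreted in megabytes):

```ini
# ckan.ini -- illustrative values, not a verified fix
ckan.max_resource_size = 102400   # 100 GB, expressed in MB
ckan.storage_path = /var/lib/ckan/storage
```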

Expected behavior

Uploads should only be limited by the configured 100 GB maximum.

@wardi wardi self-assigned this Feb 8, 2024
@wardi
Contributor

wardi commented Feb 8, 2024

At our meeting @tino097 mentioned that @ThrawnCA is maintaining a fork of s3filestore that you might try instead: https://github.com/qld-gov-au/ckanext-s3filestore

I'm more familiar with https://github.com/TkTech/ckanext-cloudstorage which has support for large file uploads, direct uploading to s3 and resuming uploads.

It would also be great to fix the upload issue with plain file storage so that data is streamed to the file system instead of being loaded into memory, but most sites hosting large files don't want them stored on the web server directly.
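The streaming approach described above can be sketched in plain Python: copy the incoming request body to disk in fixed-size chunks, so memory use stays constant regardless of file size. This is a minimal illustration of the idea, not CKAN's actual upload code:

```python
import shutil

CHUNK_SIZE = 64 * 1024  # 64 KiB per read; peak memory is bounded by this


def save_upload(stream, dest_path):
    """Copy a file-like request body to disk without loading it into RAM.

    `stream` can be any readable binary file-like object (e.g. a WSGI
    request body stream); `dest_path` is where the file should land.
    """
    with open(dest_path, "wb") as out:
        # copyfileobj reads and writes CHUNK_SIZE bytes at a time,
        # so a 100 GB upload never occupies more than one chunk in memory.
        shutil.copyfileobj(stream, out, CHUNK_SIZE)
```

Buffering the whole body in memory instead (e.g. calling `stream.read()` with no limit) would explain the symptoms reported here: a 4 GB container gets OOM-killed as soon as an upload approaches available RAM.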

@ThrawnCA
Contributor

ThrawnCA commented Feb 8, 2024

We actually started maintaining our s3filestore fork because we couldn't get the cloudstorage plugin to work. IIRC the issue was that it couldn't use an IAM role for its permissions; it wanted an access key in config, and that wasn't how we wanted to run things. But we've added more features since then.

Edit: Also, ckanext-cloudstorage relies on cookie-based API authentication for multipart uploads, which is a bad idea.

@vabatista
Author

Hi @ThrawnCA, I finally got your branch of the plugin working. I had some issues with ACL and SSL because of my company's policies. It is working, but I'm still getting HTTP 502 when submitting large files.
When I increase the size of the ECS container instance it supports a bigger file, but still much less than the limit I established in the ckan.max_resource_size property.

@vabatista
Author

vabatista commented Feb 19, 2024

@wardi, can you tell me whether the https://github.com/TkTech/ckanext-cloudstorage plugin can work with IAM role authorization, without having to pass an access_key?
My CKAN application runs in an ECS container within the same AWS account as S3. My company doesn't allow access through access keys in production environments.
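For what it's worth, boto3 (which these storage plugins use under the hood) resolves credentials through its default chain, so when no keys are configured it can fall back to the ECS task role automatically. A hypothetical s3filestore-style config relying on the task role might look like the following; the option names vary between forks, so check the README of the fork you're running:

```ini
# ckan.ini -- hypothetical; option names differ between s3filestore forks
ckanext.s3filestore.aws_use_ami_role = true
ckanext.s3filestore.aws_bucket_name = my-ckan-bucket
ckanext.s3filestore.region_name = us-east-1
# no aws_access_key_id / aws_secret_access_key:
# boto3's default credential chain picks up the ECS task role
```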

@vabatista
Author

@wardi, I inserted several log.debug statements in the s3filestore plugin. I found that the problem with large files happens before any constructor or upload method is called. It seems CKAN "blows up" before ever reaching the plugin. I also tried a few timeout configuration variables, but without effect.

@ThrawnCA
Contributor

@vabatista Can you share any lessons learned from making the s3filestore plugin work? I'd be interested in fixing errors or better documenting requirements.

@vabatista
Author

@ThrawnCA Sure! I'll comment on your repo.

@Zharktas
Member

@vabatista we have a fork of ckanext-cloudstorage which uses temporary access keys from an IAM role: vrk-kpa/ckanext-cloudstorage@6ffe92b. None of these have any tests, and I should someday write a simplified extension for this.

@chaman53

`[uwsgi-body-read] Error reading 1 bytes. Content-Length: 75348413 consumed: 0 left: 75348413 message: Client closed connection`

I'm also getting the same error.
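A 502 together with `uwsgi-body-read` errors usually means the proxy or application server gave up on the request body before the application code ever ran, which matches the earlier observation that CKAN dies before the plugin is called. These are the knobs typically involved; the option names are real nginx/uwsgi directives, but the values below are illustrative and would need tuning for your setup:

```nginx
# nginx -- illustrative values
client_max_body_size 100g;     # default is only 1m; larger bodies are rejected
proxy_read_timeout   3600s;    # allow slow, long-running uploads
```

```ini
# uwsgi.ini -- illustrative values
harakiri = 3600       # don't kill workers mid-upload after the default timeout
http-timeout = 3600
```

Note that none of this helps if the container is being OOM-killed: in that case the fix is streaming the body to disk (or raising the container's memory) rather than raising timeouts.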
