💾🥩🍷 First-class Files #8197

Open
wardi opened this issue Apr 24, 2024 Discussed in #8081 · 1 comment
@wardi
Contributor

wardi commented Apr 24, 2024

Discussed in #8081

Originally posted by wardi February 16, 2024
Uploaded files in CKAN are limited to zero or one file per object, and can only be attached to groups or resources.

The group or resource model stores a reference to the file in a plain text column that can be updated like any other metadata value. Resources can store the length, hash and format of an uploaded file, but these are metadata fields that users are free to update (or not), and they aren't durably linked to the file itself.
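For contrast, here is a minimal sketch of deriving those values from the uploaded bytes themselves instead of trusting user-editable fields. `describe_upload` and its returned keys are hypothetical, not part of CKAN's API, and "format" here is only a MIME type guessed from the file name with the standard library; a real implementation might also sniff the content.

```python
import hashlib
import mimetypes


def describe_upload(filename: str, data: bytes) -> dict:
    """Compute size, hash and format directly from an uploaded file.

    Hypothetical helper: because the values come from the file itself,
    they cannot drift from the stored content the way free-text metadata can.
    """
    # Guess a MIME type from the extension; None when it is unknown.
    mimetype, _encoding = mimetypes.guess_type(filename)
    return {
        "name": filename,
        "size": len(data),
        "sha256": hashlib.sha256(data).hexdigest(),
        "format": mimetype,
    }


if __name__ == "__main__":
    info = describe_upload("data.csv", b"a,b\n1,2\n")
    print(info["size"], info["sha256"], info["format"])
```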

Uploaded files can leak: they stay on the underlying storage, costing money, even though there is no longer any way to reach them from the CKAN site.
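With a file model in place, leaks could be found by a periodic sweep that compares what the storage back end holds with what the model references. A sketch, assuming both sides can be listed as storage keys and that the back end reports a last-modified time; `find_orphans` and the one-hour grace period are hypothetical choices, the latter so that uploads still in progress aren't flagged.

```python
from datetime import datetime, timedelta, timezone
from typing import Iterable, Mapping, Optional, Set


def find_orphans(
    stored: Mapping[str, datetime],
    referenced: Iterable[str],
    grace: timedelta = timedelta(hours=1),
    now: Optional[datetime] = None,
) -> Set[str]:
    """Return storage keys that no file record points to.

    `stored` maps each key listed from the storage back end to its
    last-modified time; `referenced` holds keys recorded in the file table.
    Keys younger than `grace` are skipped so in-flight uploads survive.
    """
    now = now or datetime.now(timezone.utc)
    refs = set(referenced)
    return {
        key
        for key, modified in stored.items()
        if key not in refs and now - modified > grace
    }


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    stored = {
        "a.csv": now - timedelta(days=3),
        "old/c.zip": now - timedelta(days=30),
        "incoming.csv": now - timedelta(minutes=5),
    }
    print(find_orphans(stored, referenced=["a.csv"], now=now))  # {'old/c.zip'}
```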

There is no shared way to represent files that aren't yet attached to a group or resource, e.g.:

It's not possible to attach multiple files to a resource even when they represent the same data. This would be very useful for:

Model solution

Let's create a model for uploaded files in CKAN that can be linked to resources, groups, or anything else a site might need.

Files would have:

  • owner type + id for permissions (e.g. resource, user, group, etc.)
  • original file name
  • file reference (specific to storage back end)
  • total size in bytes
  • format, detected from the content or determined from the file name
  • completion state (ranges received for background/parallel uploads)
  • hash(es) (when supported by the back end)
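A rough sketch of how these fields might look as a record, including one way to track the completion state: received byte ranges are merged as chunks arrive in any order, and the file is complete once a single range covers it. The `File` class, its field names and the half-open `(start, end)` convention are all hypothetical illustrations, not CKAN or ckanext-files code.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class File:
    """Hypothetical record mirroring the fields proposed above."""

    owner_type: str  # e.g. "resource", "user", "group"
    owner_id: str
    name: str  # original file name
    location: str  # back-end-specific file reference
    size: int  # total size in bytes
    format: Optional[str] = None
    # Byte ranges received so far, as sorted, merged half-open (start, end) pairs.
    received: List[Tuple[int, int]] = field(default_factory=list)
    hashes: Dict[str, str] = field(default_factory=dict)  # e.g. {"sha256": "..."}

    def add_range(self, start: int, end: int) -> None:
        """Record a received chunk [start, end) and merge touching ranges."""
        if not 0 <= start < end <= self.size:
            raise ValueError("range outside file bounds")
        ranges = sorted(self.received + [(start, end)])
        merged: List[Tuple[int, int]] = [ranges[0]]
        for s, e in ranges[1:]:
            last_s, last_e = merged[-1]
            if s <= last_e:  # overlapping or adjacent: extend the last range
                merged[-1] = (last_s, max(last_e, e))
            else:
                merged.append((s, e))
        self.received = merged

    @property
    def complete(self) -> bool:
        """True once one range covers the whole file (or the file is empty)."""
        return self.size == 0 or self.received == [(0, self.size)]


if __name__ == "__main__":
    f = File("resource", "r1", "data.csv", "local:r1/data.csv", size=10)
    f.add_range(5, 10)  # second half arrives first, e.g. from a parallel upload
    f.add_range(0, 5)
    print(f.received, f.complete)  # [(0, 10)] True
```

Keeping ranges merged means the completeness check stays a single comparison, however many chunks were uploaded.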

Other possibilities:

  • name of the back end (to support multiple back ends or to migrate files live)
  • support for "files" that are actually links to externally managed resources so we can monitor changes to content based on hash/size when retrieved
  • alternate links for redundancy when some services aren't available
  • custom fields for permissions, tracking, validation reports or other plugin data
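For "files" that are links to externally managed resources, change monitoring could be as simple as comparing the size and hash seen on each retrieval with the last recorded values. A sketch using only `hashlib`; fetching the content is left out, and `Snapshot` and `has_changed` are hypothetical names.

```python
import hashlib
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Snapshot:
    """Size and SHA-256 observed for an external link at one retrieval."""

    size: int
    sha256: str

    @classmethod
    def from_bytes(cls, data: bytes) -> "Snapshot":
        return cls(size=len(data), sha256=hashlib.sha256(data).hexdigest())


def has_changed(previous: Optional[Snapshot], current: Snapshot) -> bool:
    """True on first retrieval or when size or hash differ from the last one."""
    if previous is None:
        return True
    # A size mismatch is enough on its own; the hash catches same-size edits.
    return previous.size != current.size or previous.sha256 != current.sha256


if __name__ == "__main__":
    first = Snapshot.from_bytes(b"v1,data\n")
    again = Snapshot.from_bytes(b"v1,data\n")
    edited = Snapshot.from_bytes(b"v2,data\n")
    print(has_changed(None, first), has_changed(first, again), has_changed(first, edited))
    # True False True
```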

This model would make file metadata reliable, allow us to build new features and potentially save people money by better tracking hosted data in CKAN.

wardi self-assigned this Apr 24, 2024
@wardi
Contributor Author

wardi commented Apr 24, 2024

@smotornyuk's work on https://github.com/DataShades/ckanext-files is the most advanced in this area. We should model a core CKAN feature on it.

Labels: none yet
Projects: Status: Todo
Development: no branches or pull requests
1 participant