Motivation
Currently, datasets stored on the PlanktoScope's SD card are organized into img, objects, clean, and export folders at the top level, and each of those folders contains its own duplicate copy of the dataset folder structure as subfolders. This makes it tedious to delete all the data for just one dataset. It also interacts badly with our filebrowser web interface when we try to download a zip archive of the raw images for a single dataset (e.g. to segment later, and/or for archival): if we download the img folder, the archive includes raw images from any previous datasets we forgot to delete; and downloading a specific subfolder of the img folder instead is inadvisable, because the zip archive loses the name of that folder. This workflow for downloading and deleting a dataset caused me a lot of frustration during high-intensity PlanktoScope operations on the ToTS Sikuliaq research cruise in the summer of 2023, where the high workload often led us to forget to delete the dataset from the previous acquisition.
Proposal
We should reorganize the folder structure of datasets so that img, objects, clean, and export are the subfolders of each dataset folder. This way, we can download all files of all types associated with a dataset just by downloading a single folder as a zip archive; and we can just delete a dataset folder to delete all data associated with it.
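As a rough illustration of the reorganization, the sketch below maps files from the current layout to the proposed one. It is not existing PlanktoScope code: the function names `plan_moves` and `migrate` are hypothetical, and it assumes that files sit directly inside their dataset folder, so that `<old_root>/<type>/<dataset path>/<file>` becomes `<new_root>/<dataset path>/<type>/<file>`.

```python
import shutil
from pathlib import Path

# Top-level folder names taken from the current layout described above.
KINDS = ("img", "objects", "clean", "export")


def plan_moves(old_root: Path, new_root: Path) -> list[tuple[Path, Path]]:
    """List (source, destination) pairs without touching the filesystem.

    Assumption: every file lives directly inside its dataset folder, so the
    file's parent path relative to the type folder is the dataset path.
    """
    moves = []
    for kind in KINDS:
        kind_root = old_root / kind
        if not kind_root.is_dir():
            continue
        for src in sorted(kind_root.rglob("*")):
            if not src.is_file():
                continue
            rel = src.relative_to(kind_root)
            # <dataset path>/<kind>/<file name> under the new root
            dst = new_root / rel.parent / kind / rel.name
            moves.append((src, dst))
    return moves


def migrate(old_root: Path, new_root: Path, dry_run: bool = True) -> list[tuple[Path, Path]]:
    """Move files into the per-dataset layout; only report when dry_run is True."""
    moves = plan_moves(old_root, new_root)
    if not dry_run:
        for src, dst in moves:
            dst.parent.mkdir(parents=True, exist_ok=True)
            shutil.move(str(src), str(dst))
    return moves
```

Running `migrate(old_root, new_root)` with the default `dry_run=True` only returns the planned moves, which seems like a safer default for a one-time migration of existing SD cards.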
In order to keep it easy to access, download, and delete all the EcoTaxa export ZIP files from the PlanktoScope, we would need to provide an alternate interface which aggregates all the EcoTaxa export ZIP files from all datasets. This could be a simplified "file manager" interface which provides individual & bulk dataset management actions, while our file browser interface would be for more advanced usage. Perhaps this interface could be made accessible at http://pkscope.local/ps/data/index . In the future, that interface could also be a frontend to rclone for uploading/transferring datasets to cloud storage.
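The aggregation step behind such an interface could look roughly like the sketch below. It assumes the proposed layout, where each dataset folder has an export subfolder holding the EcoTaxa ZIP files. The function name `find_export_zips` and the returned mapping are hypothetical choices for illustration, not part of any existing PlanktoScope API.

```python
from pathlib import Path


def find_export_zips(data_root: Path) -> dict[str, list[Path]]:
    """Map each dataset (path relative to data_root) to its export ZIP files.

    Assumption: in the proposed layout, ZIP archives live directly inside an
    "export" folder that is a direct child of the dataset folder.
    """
    found: dict[str, list[Path]] = {}
    for path in sorted(data_root.rglob("*")):
        is_zip = path.is_file() and path.suffix.lower() == ".zip"
        if is_zip and path.parent.name == "export":
            dataset = path.parent.parent.relative_to(data_root).as_posix()
            found.setdefault(dataset, []).append(path)
    return found
```

A simplified file manager could render this mapping as one row per dataset, with bulk download or delete actions operating on the collected paths.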
Unresolved questions to address as part of this proposal:
Would it be necessary/helpful/simpler to just have a single folder (e.g. project-id_sample-id_acq-id) for each dataset, instead of an entire tree of nested folders (which is what we have right now)?