De-duplicate data on Archive #10

Open
4 tasks
sg-s opened this issue Jun 20, 2019 · 5 comments
Assignees
Labels
not-working Something isn't working
Projects

Comments

@sg-s
Member

sg-s commented Jun 20, 2019

  • Run rmlint
  • Verify that only duplicated files will be deleted
  • Run the script on a small subset to confirm that only duplicates, not originals, are deleted
  • Run it on the whole dataset
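
The verification steps above can also be cross-checked independently of rmlint. The following is a minimal sketch, not the script used on the Archive: it groups files by size and SHA-256, then does a byte-for-byte comparison before listing anything as deletable. The function names (`find_duplicate_groups`, `plan_deletions`) and the rule "keep the first path in sorted order" are assumptions made for illustration. Nothing is deleted; the output is only a plan to review.

```python
import filecmp
import hashlib
import os
from collections import defaultdict


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large files do not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicate_groups(root):
    """Return lists of paths under root that share the same size and SHA-256."""
    by_size = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Skip symlinks so a link is never mistaken for a second copy.
            if os.path.isfile(path) and not os.path.islink(path):
                by_size[os.path.getsize(path)].append(path)

    by_hash = defaultdict(list)
    for size, paths in by_size.items():
        if len(paths) < 2:
            continue  # a unique size cannot have a duplicate
        for path in paths:
            by_hash[(size, sha256_of(path))].append(path)
    return sorted(sorted(paths) for paths in by_hash.values() if len(paths) > 1)


def plan_deletions(groups):
    """Dry run: list (kept_original, deletable_duplicate) pairs, deleting nothing.

    Assumption: the first path in sorted order is treated as the original.
    """
    plan = []
    for group in groups:
        keep, *extras = group
        for extra in extras:
            # Confirm the two paths are distinct files with identical bytes.
            if not os.path.samefile(keep, extra) and filecmp.cmp(keep, extra, shallow=False):
                plan.append((keep, extra))
    return plan
```

Running `plan_deletions(find_duplicate_groups(path))` on a small subdirectory first matches the third task: every pair in the plan keeps one copy, so no original is left without a surviving file.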
@sg-s sg-s transferred this issue from cosmojg/srinivas-projects Aug 7, 2019
@sg-s sg-s added the not-working Something isn't working label Aug 7, 2019
@cosmojg cosmojg added this to To do in computers Aug 7, 2019
@cosmojg cosmojg moved this from To do to In progress in computers Aug 7, 2019
@sg-s
Member Author

sg-s commented Nov 4, 2019

bump?

@cosmojg
Collaborator

cosmojg commented Nov 12, 2019

@sg-s I did this already, but I'm going to do it again once I've made sure everything that needs to be backed up is at least somewhere on crab (for example, there's still some stuff sitting on the fileshare from when crab was having an existential crisis). Let's keep this issue open for now.

@cosmojg cosmojg moved this from In progress to To do in computers Nov 13, 2019
@cosmojg cosmojg moved this from To do to In progress in computers Sep 21, 2021
@klwadland

In addition to de-duplicating the data on the Archive, we also need to dig through it and delete unnecessary items (pictures of people's cats, dogs, significant others, vacations, etc.; financial documents; resumes; PDFs of interesting papers; c:\ drive contents/programs; and so on). Is there a way to automate or semi-automate this so that someone doesn't have to review the whole Archive file by file?
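
One semi-automated approach, sketched below under stated assumptions, is to flag candidate files by extension and filename keyword and hand a person a short review list instead of the whole Archive. The extension sets, the keywords, and the function names (`classify`, `build_review_list`) are all hypothetical and would need tuning; in particular, images and PDFs can be legitimate data, so a flag here means "look at this", not "delete this".

```python
import os

# Hypothetical rules: extensions that are often, but not always, non-data files.
RULES = {
    "image": {".jpg", ".jpeg", ".png", ".gif", ".heic"},
    "document": {".pdf", ".doc", ".docx"},
    "program": {".exe", ".msi", ".dll"},
}
# Hypothetical filename keywords hinting at personal or financial content.
KEYWORDS = ("resume", "tax", "invoice", "vacation")


def classify(path):
    """Return the reasons a file looks like a review candidate (empty if none)."""
    stem, ext = os.path.splitext(os.path.basename(path).lower())
    reasons = [f"type:{label}" for label, exts in RULES.items() if ext in exts]
    reasons += [f"keyword:{word}" for word in KEYWORDS if word in stem]
    return reasons


def build_review_list(root):
    """Walk root and return sorted (path, reasons) pairs for flagged files only."""
    flagged = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            reasons = classify(path)
            if reasons:
                flagged.append((path, reasons))
    return sorted(flagged)
```

The result is only a list for a human to go through; nothing is moved or deleted. Sorting by path keeps files from the same folder together, which makes it easier to accept or reject a whole directory at once.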

@klwadland klwadland changed the title De-duplicate data on crab De-duplicate data on Archive Sep 22, 2021
@cosmojg
Collaborator

cosmojg commented Oct 22, 2021

@Liaro0903 Katelyn said she'll ask Steve about letting one of us go down there and help out with expanding the Archive. In the meantime, we should work on finding or writing a program to speed up the process of sorting through everything that's on there and deleting all the junk.

@klwadland

Sent an email to Steve and am waiting to hear what he says.


3 participants