biotoolsID handling in GitHub ecosystem and bio.tools #165

Open
hansioan opened this issue Mar 17, 2020 · 4 comments

Comments

@hansioan
Collaborator

hansioan commented Mar 17, 2020

@bgruening @joncison @hmenager @matuskalas @piotrgithub1

I am creating this issue to discuss / consult on how we manage bio.tools IDs in both bio.tools and GitHub.

This topic is important because bio.tools is a tool ID provider which means the bio.tools IDs need to be persistent and not change without a serious reason.

Currently how I see bio.tools IDs working:
At tool creation:

  • In bio.tools registration interface: suggest the biotoolsID as a URL-safe version of the tool name, but also allow the user to edit the biotoolsID field (with validations on top, of course). Once the tool has been registered and approved (see below), the ID cannot change anymore.
  • In GitHub: When a pull request for a new tool comes in it has to follow the basic rules for a new tool which involve having a new folder with the name:
    {{biotoolsID of the new tool}} and in the new folder to have a JSON file with the name
    {{biotoolsID of the new tool}}.biotools.json and that JSON file with the new tool annotations needs to contain a biotoolsID field with the same value for the biotoolsID; it also needs to contain a biotoolsCURIE field with the value:
    biotools:{{biotoolsID of the new tool}} and of course a name, a homepage, a description (and other things we require)
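
The folder and file rules above could be checked automatically on each new-tool pull request. The following is only a minimal sketch: the function names `suggest_biotools_id` and `validate_new_tool`, and the `ID_PATTERN` regular expression, are assumptions made for illustration (the real ID syntax is whatever the bio.tools ID guidelines define), not existing bio.tools code.

```python
import json
import re
from pathlib import Path

# Assumed ID syntax, for illustration only. The actual constraint is
# defined by the bio.tools ID guidelines, not by this pattern.
ID_PATTERN = re.compile(r"^[A-Za-z0-9_.~-]+$")


def suggest_biotools_id(tool_name):
    """Suggest a URL-safe ID from a tool name (hypothetical rule)."""
    # Replace every run of characters outside the assumed safe set with "_".
    collapsed = re.sub(r"[^A-Za-z0-9_.~-]+", "_", tool_name.strip())
    return collapsed.strip("_")


def validate_new_tool(folder):
    """Return a list of problems found in a new-tool folder (empty if none)."""
    folder = Path(folder)
    tool_id = folder.name
    errors = []

    if not ID_PATTERN.match(tool_id):
        errors.append(f"folder name is not a valid ID: {tool_id!r}")

    # The JSON file must be named {biotoolsID}.biotools.json.
    json_path = folder / f"{tool_id}.biotools.json"
    if not json_path.is_file():
        errors.append(f"missing file: {json_path.name}")
        return errors

    entry = json.loads(json_path.read_text(encoding="utf-8"))

    # biotoolsID and biotoolsCURIE must agree with the folder name.
    if entry.get("biotoolsID") != tool_id:
        errors.append("biotoolsID does not match the folder name")
    if entry.get("biotoolsCURIE") != f"biotools:{tool_id}":
        errors.append("biotoolsCURIE does not match biotools:{biotoolsID}")

    # Only the fields named in this issue; the real list is longer.
    for field in ("name", "homepage", "description"):
        if not entry.get(field):
            errors.append(f"missing required field: {field}")

    return errors
```

A check like this only covers the mechanical rules; whether the entry is an actual tool and whether the ID is acceptable would still need a human reviewer.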

We need to decide how we handle pull request merging: whether one of the core team users needs to approve/review the new entry, whether the user who created the entry needs to review the PR, or whether the tool creation PR gets automatically merged after all validations (I would not go for that).

My opinion is that since we already allow the tool into the bio.tools database immediately, we can reserve the right to approve new tools before they get added to the GitHub side.

The initial addition to the bio.tools database should be done with a "pending approval" flag, which should be resolved on the GitHub side into either an approval of the tool or a rejection.

I think the approval or rejection should mainly focus on whether the tool is indeed an actual tool and whether the ID of the tool is acceptable (i.e. not completely different from the name or having some other weird value; I think we will rarely encounter a situation where a tool has an unacceptable ID). Everything else about the tool annotation can be fixed later (e.g. wrong toolType, missing license, etc.).

At tool update

  • bio.tools will ignore any changes to the biotoolsID and biotoolsCURIE and will not pass these changes to GitHub. Perhaps bio.tools should even report a validation error if an ID change is attempted.
  • Similar to bio.tools, GitHub should not allow changes to biotoolsID-related values (in a file, in the folder name, or anywhere else), at least not in the files concerning bio.tools. If for some reason an ID change is needed, then we create a whole new entry (and the corresponding file structure) and delete the old one. Given that there might be other files in the parent folder from other providers (e.g. from Bioconda, OpenEBench etc.), we need to keep them and allow the other providers to update their entries. We can also discuss keeping the old entry/ID, redirecting to the new ID in bio.tools, and flagging this situation in GitHub somehow.
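
As a rough illustration of the "report a validation error if an ID change is attempted" option, an update handler could compare the ID fields before merging anything. This is a sketch under assumptions: `IDChangeError`, `check_update`, and `apply_update` are hypothetical names, and entries are modelled as plain dictionaries rather than whatever bio.tools uses internally.

```python
# Fields that must never change once an entry exists.
IMMUTABLE_FIELDS = ("biotoolsID", "biotoolsCURIE")


class IDChangeError(ValueError):
    """Raised when an update tries to change an ID-related field."""


def check_update(current, proposed):
    """Raise IDChangeError if the proposed update alters an immutable field."""
    for field in IMMUTABLE_FIELDS:
        if field in proposed and proposed[field] != current.get(field):
            raise IDChangeError(
                f"{field} cannot change: {current.get(field)!r} -> {proposed[field]!r}"
            )


def apply_update(current, proposed):
    """Return a new entry with the update applied, after the ID check."""
    check_update(current, proposed)
    merged = dict(current)
    merged.update(proposed)
    return merged
```

The alternative described in the first bullet (silently ignoring the changed ID fields) would drop those keys from `proposed` instead of raising; raising makes the rejected change visible to the submitter.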

Please give your opinions on the above and also tag others.

@joncison
Contributor

joncison commented Mar 17, 2020

The above looks good from a quick read, I'll just highlight / clarify some key points:

  1. Agree biotoolsID should be based on tool name but editable by registrant at registration time (subject to syntax constraint), but only editable post-registration by superuser.
  2. Superuser needs to verify (manually inspect and adjust if necessary) the ID as per the guidelines.
  3. Folk should review the ID guidelines and suggest changes (if really needed) ASAP.
  4. Once manually verified, an idverified or some such flag should be set, at which point the ID can be taken as immutable, and bio.tools URLs based upon it to be persistent.
  5. If for some edge-case reason, a change or new ID really is needed on an existing entry, then this can be requested (and minted by superuser), preserving both IDs in the record (likely the old ID in the otherID field) and ensuring both IDs work, i.e. URLs based on them persistently resolve (to same page)
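
To make points 4 and 5 concrete, here is a small in-memory sketch of how a verified flag and a preserved old ID could work together so that both IDs keep resolving to the same record. Everything in it is an assumption for illustration: the `Registry` class, its method names, the `idVerified` key, and the simplification of `otherID` to a plain list of strings.

```python
class Registry:
    """Toy registry: primary IDs plus aliases for retired IDs."""

    def __init__(self):
        self._entries = {}  # current biotoolsID -> entry dict
        self._aliases = {}  # retired ID -> current biotoolsID

    def _is_taken(self, biotools_id):
        return biotools_id in self._entries or biotools_id in self._aliases

    def register(self, biotools_id, name):
        if self._is_taken(biotools_id):
            raise ValueError(f"ID already in use: {biotools_id}")
        self._entries[biotools_id] = {
            "biotoolsID": biotools_id,
            "name": name,
            "idVerified": False,  # set once a superuser has checked the ID
            "otherID": [],        # simplified: retired IDs as plain strings
        }

    def verify(self, biotools_id):
        """Mark the ID as manually verified, i.e. immutable from now on."""
        self._entries[biotools_id]["idVerified"] = True

    def mint_new_id(self, old_id, new_id, superuser=False):
        """Give an entry a new ID while keeping the old one resolvable."""
        if not superuser:
            raise PermissionError("only a superuser can mint a new ID")
        if self._is_taken(new_id):
            raise ValueError(f"ID already in use: {new_id}")
        entry = self._entries.pop(old_id)
        entry["otherID"].append(old_id)
        entry["biotoolsID"] = new_id
        self._entries[new_id] = entry
        # Point the old ID, and any earlier aliases of it, at the new ID.
        self._aliases[old_id] = new_id
        for alias in list(self._aliases):
            if self._aliases[alias] == old_id:
                self._aliases[alias] = new_id

    def resolve(self, biotools_id):
        """Return the entry for a current or retired ID."""
        primary = self._aliases.get(biotools_id, biotools_id)
        return self._entries[primary]
```

With this shape, a URL built on the old ID and one built on the new ID both end up at the same entry, which is the persistence property point 5 asks for.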

This is quite a bit of work, but long experience shows that it's necessary, esp. bearing in mind a lot of the value of bio.tools comes from its IDs, and it's nice to have those easily human-readable and concise (hence usable).

Hope this helps!

@joncison
Contributor

Oh, and the ID status (and the implications) should be clearly explained by a label and corresponding pop-up information window in the UI (this used to be there).

@hansioan
Collaborator Author

Regarding the biotoolsID change, it should happen not as an update, but as a delete + new registration. An update would be too much of a hassle because:

  • The GitHub folder is named after the biotoolsID, so all the other systems (bio.tools, OpenEBench, Bioconda etc.) would have to deal with the sudden change in the folder name and also change their annotations to point to the new ID. Creating a new folder is much easier in this case.
  • The bio.tools API for tool updates accepts requests at /api/{old_biotoolsID}. If we change the "biotoolsID" property in the JSON annotation, are we still sending PUT requests to the old /api/{old_biotoolsID}? Seems like a nightmare to handle.

Given the above, I think it's good enough to do a delete + new registration. The only thing we will lose is the additionDate, but I don't see that as an issue; if it is, we can handle it.
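
On the GitHub side, the delete + new registration approach might look roughly like the sketch below. It assumes the repository layout described earlier in this issue (one folder per biotoolsID containing {biotoolsID}.biotools.json, possibly alongside files from other providers); the function name `reregister` and the `data_root` parameter are hypothetical.

```python
import json
from pathlib import Path


def reregister(data_root, old_id, new_id):
    """Create a new entry under new_id and delete the old bio.tools file.

    Files from other providers in the old folder are left untouched so
    that those providers can migrate their own entries.
    """
    data_root = Path(data_root)
    old_file = data_root / old_id / f"{old_id}.biotools.json"
    new_folder = data_root / new_id
    if new_folder.exists():
        raise FileExistsError(f"an entry already exists for {new_id}")

    entry = json.loads(old_file.read_text(encoding="utf-8"))
    entry["biotoolsID"] = new_id
    entry["biotoolsCURIE"] = f"biotools:{new_id}"
    # A new registration gets a fresh additionDate, so the old one is dropped.
    entry.pop("additionDate", None)

    new_folder.mkdir()
    new_file = new_folder / f"{new_id}.biotools.json"
    new_file.write_text(json.dumps(entry, indent=2), encoding="utf-8")

    # Delete only the bio.tools file, not the whole folder.
    old_file.unlink()
    return new_file
```

The old folder stays in place as long as other providers still have files in it, which matches the concern raised in the original post.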

@hmenager
Collaborator

Thanks @hansioan for stating things clearly. I agree with all of what is there. A few points here:

  • Indeed, IDs should be modified as little as possible after "official" acceptance.
  • ID modifications, on the GitHub side, should be managed through git mv. So it doesn't really matter on the GitHub side if we lose the additionDate; we still have all of the history stored in the repository.
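
For comparison with the delete + new registration approach, a git mv based rename could be sketched as below. The helper only builds the commands and does not run them; the function name `git_mv_commands` is hypothetical, and it assumes the tool folders sit directly in the current working directory of the repository.

```python
import shlex


def git_mv_commands(old_id, new_id):
    """Build the git mv commands for renaming a tool folder and its JSON file."""
    return [
        # Rename the folder itself.
        ["git", "mv", old_id, new_id],
        # Then rename the bio.tools JSON file inside the renamed folder.
        [
            "git",
            "mv",
            f"{new_id}/{old_id}.biotools.json",
            f"{new_id}/{new_id}.biotools.json",
        ],
    ]


if __name__ == "__main__":
    for command in git_mv_commands("oldtool", "newtool"):
        print(shlex.join(command))
```

The biotoolsID and biotoolsCURIE values inside the JSON file would still need to be edited separately, and files from other providers would move along with the folder. After such a rename, the earlier history of a file can be followed with `git log --follow`.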


3 participants