Checking and forbidding "citation explosion" #88

Open
matuskalas opened this issue Feb 28, 2020 · 1 comment

@matuskalas
Contributor

(This issue comes from a consensus discussion incl. @bgruening @OlegZharkov )

By "citation explosion", we mean that an article (or another DO) is used as a (primary) citation for far too many tools.

This occurred e.g. in some Galaxy tools annotated with a generic Galaxy article DOI, or some Bioconductor packages annotated with a generic Bioconductor article DOI. The records for the overall workbenches | suites | collections should be annotated with these article DOIs, but not the records of their member tools.

Note: In bio.tools, we can annotate a publication/DOI at the level of a workbench or toolkit, but we should consider whether we also want to allow it at the level of (arbitrary) collections. This would apply once the representation of collections is strengthened. It would surely make many maintainers of collections very happy (think also of ELIXIR Service Bundles, or community efforts like Debian Med etc.). @hansioan @joncison @jvanheld

Another kind of occurrence, applying to bio.tools, is in multiple deployments (services) of popular tools (e.g. again Galaxy servers, or sequence alignment & search tools). There it would be OK to allow citation of the generic tool article DOI, but it shouldn't be a 'Primary' publication, and there definitely should be a relation to the actual tool (which then has the citation stated).
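A minimal sketch of how the deployment rule above could be checked for a single record. The field names (`publication`, `doi`, `type`, `relation`) and the `generic_dois` set are assumptions for illustration, only loosely modelled on bio.tools JSON; they are not taken from an existing checker.

```python
def check_deployment_citation(record, generic_dois):
    """Return a list of problems for one record (hypothetical record layout).

    generic_dois: DOIs of generic tool/workbench articles (assumed to be
    curated elsewhere).
    """
    generic = {d.strip().lower() for d in generic_dois}
    problems = []
    cites_generic = False
    for pub in record.get("publication", []):
        doi = (pub.get("doi") or "").strip().lower()
        if doi not in generic:
            continue
        cites_generic = True
        types = pub.get("type") or []
        if isinstance(types, str):  # tolerate a single string instead of a list
            types = [types]
        if "Primary" in types:
            problems.append(f"{doi}: generic article must not be 'Primary'")
    # A deployment citing the generic article should point to the actual tool.
    if cites_generic and not record.get("relation"):
        problems.append("generic article cited without a relation to the actual tool")
    return problems
```

A record that cites the generic DOI with a non-primary type and declares a relation would pass with an empty list.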

There are many reasons why this should not be allowed: noise (incl. worsened search & info integration), multiplication of information, unfairness, high metrics inherited from publications at a different level (altmetrics, citation counts, ...) that aren't about this particular tool or service, etc.

It's not yet clear what a good cut-off for "too many" should be, but most likely something greater than 4 and MUCH smaller than 10, in terms of bio.tools records. For Bioconda or Debian packages, the number could be a bit bigger, at most double (more granularity of source packages, but no deployed services).

It will be the task of the CI in bio-tools/content to uncover such inappropriate "citation explosion" and warn everyone that curation is needed. Citations over the yet-to-be-found cut-off could also be removed automatically, or in any case at least ignored when integrating. (If auto-removed, curation will have to add them back to the one or few records where they legitimately belong. If not auto-removed, curation will have to go through the reported lists and remove them everywhere except where they actually belong.)

The cut-off needs to be explicitly documented for users, and be part of the checks of each record ("Error: This citation has already been used X times as 'Primary'.").
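As a rough illustration of the CI check described above, the following sketch counts how many records use each DOI as 'Primary' and reports those over a cut-off. The cut-off value of 5 is only a placeholder within the "greater than 4, much smaller than 10" range, and the record layout (`biotoolsID`, `publication`, `doi`, `type`) is an assumption, not a confirmed schema.

```python
from collections import defaultdict

# Placeholder value; the actual cut-off is still to be decided.
PRIMARY_CUTOFF = 5


def find_citation_explosions(records, cutoff=PRIMARY_CUTOFF):
    """Map each DOI used as 'Primary' in more than `cutoff` records to those records' IDs."""
    users = defaultdict(list)
    for record in records:
        seen = set()  # count each DOI at most once per record
        for pub in record.get("publication", []):
            doi = (pub.get("doi") or "").strip().lower()
            types = pub.get("type") or []
            if isinstance(types, str):  # tolerate a single string instead of a list
                types = [types]
            if doi and "Primary" in types and doi not in seen:
                seen.add(doi)
                users[doi].append(record["biotoolsID"])
    return {doi: ids for doi, ids in users.items() if len(ids) > cutoff}


def format_errors(explosions):
    """Render one message per offending DOI, following the wording proposed in this issue."""
    return [
        f"Error: This citation ({doi}) has already been used {len(ids)} times as 'Primary'."
        for doi, ids in explosions.items()
    ]
```

Whether offending citations are then auto-removed or only reported for curation is left open here, as in the discussion above; the returned mapping would support either route. A larger `cutoff` could be passed for Bioconda or Debian packages.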

@joncison
Contributor

Please head over to https://github.com/bio-tools/biotoolsLint to add this to the list of checks.
