Skip to content

📓 Companion code for Grenié et al. 2022 MEE "Harmonizing taxon names in biodiversity data: a review of tools, databases, and best practices" preprint: 10.32942/osf.io/e3qnz

Notifications You must be signed in to change notification settings

Rekyt/taxo_harmonization

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Taxonomic Harmonization

Code DOI MEE Paper DOI EcoevoRxiv Preprint

Description

This repository contains the companion code and the associated shiny app taxharmonizexplorer of Grenié et al. (2022, MEE). It contains the associated figures of the paper, the code to generate the ones made from code, as well as the code to run the taxonomic harmonization workflows used in the paper.

Citation

This repository accompanies the following article:

Grenié, M., Berti, E., Carvajal-Quintero, J., Dädlow, G.M.L., Sagouis, A. and Winter, M. (2022), Harmonizing taxon names in biodiversity data: a review of tools, databases, and best practices. Methods in Ecology and Evolution. Accepted Author Manuscript. https://doi.org/10.1111/2041-210X.13802

Version of the code available in this repository have been archived on Zenodo:

DOI

Refer to the above mentioned DOI to get the last up-to-date version or use the following citation:

Matthias Grenié, Alban Sagouis, & Emilio Berti. (2021, July 22). Rekyt/taxo_harmonization: Submitted version (Version submitted). Zenodo. https://doi.org/10.5281/zenodo.5380285.

Running taxharmonizexplorer (companion shiny app)

The shiny app is available at https://mgrenie.shinyapps.io/taxharmonizexplorer/

The development version of the shiny app can be run with the following command in RStudio:

shiny::runGitHub("Rekyt/taxo_harmonization", subdir = "taxharmonizexplorer")

If you have a local copy of the repository you can run the app with:

shiny::runApp("taxharmonizexplorer")

Scripts used to build the network of tools

The scrips used to build the networks of tools are available in the scripts folder. They are numbered from 01 to 03 to be run sequentially. They all used the raw data available in the Excel sheet available in data/data_raw/Table comparing taxonomic tools.xlsx.

Example harmonization workflows (BioTIME species data)

This repository also contains scripts to harmonize the taxonomy of the BioTIME database. We here present four different workflows on the raw taxonomy from BioTIME (available at data/data_raw/biotime.txt):

  1. Workflow 1 (Torino): working with higher taxonomic groups (standardize species names through higher taxonomic group assignation).
  2. Workflow 2 (Bogota): matching the entire list of species on different taxon-specific databases.
  3. Workflow 3 (GBIF only with pre-processed): pre-process the species names and then match the obtained list against GBIF.
  4. Wrofklow 4 (GBIF only no pre-processing): match raw species list on GBIF.

Results from cleaning are saved in the folder data/data_cleaned/biotime_results/ as csv files. Two columns are present in the csv, parsed is the input name as obtained using rgnparser on BioTIME species names; the second column is the named matched in a specific database. biotime_common.csv is the original BioTIME names with the parsed version from rgnparser and the class and phylum to which they belong obtained using rgbif. The last column common refer to how the high taxonomies are referred (e.g. "vascular plants" for "Trachephyta").

Running the workflow

You can run the entire workflow thanks to the harmonize.R script available in the scripts folder. To run the workflow run the following code:

source("scripts/harmonize.R")

About

📓 Companion code for Grenié et al. 2022 MEE "Harmonizing taxon names in biodiversity data: a review of tools, databases, and best practices" preprint: 10.32942/osf.io/e3qnz

Topics

Resources

Stars

Watchers

Forks

Packages

No packages published