bmaitner/R_citations

Overview

This repository contains code for investigating how often manuscripts in ecology and evolutionary biology that cite the R programming language make their R code available. The R scripts this work relies on are in the 'R_scripts' folder. The data generated by this work (which include both scripted and manual components) are stored in the 'data' folder. The 'figures' folder contains figures produced for a related manuscript (in review). For more information, see the preprint at: https://www.authorea.com/doi/full/10.22541/au.170003886.68548206/v1

Important Data

Citation Data

The main data file in this repository is cite_data.RDS. This is an RDS file containing citation counts and associated predictor variables for the publications in the study. Many of the variables are returned by the rscopus package (https://cran.r-project.org/web/packages/rscopus/index.html), which uses the Scopus API; metadata on the fields the API returns is available at https://dev.elsevier.com/sc_apis.html. Below, we provide information on fields which are NOT returned by the Scopus API (i.e., data which we collected).

  • uid = A unique ID assigned to each record.
  • r_scripts_available = A binary variable (yes/no) describing whether any R code was shared as part of the publication.
  • r_used = A binary variable (yes/no) describing whether R was used in the publication (as opposed to simply referenced without being used).
  • data_available = A binary variable (yes/no) describing whether the full data underlying the publication were included.
  • comments = Unstructured comments about the record. This may contain information about why a judgement was made or where code was found.
  • code location = Text string describing where the code was located, options include: NA, "SI", "figshare", "website", "appendix", "dryad", "github", "Github", "zenodo", "environmental data initiative", "sciencebase.gov", "mendeley data", "osf", "bitbucket"
  • code format = Text string describing the format in which the code was shared, options include: NA, "word", "pdf", "R", "typeset text", "rtf", "txt", "rmd"
  • code license = Text string describing the license for the shared code, if any. Note that the string "NA" means that a license was not specified, whereas a missing value (NA) means we did not check. Options include: NA, "NA", "GPL", "CC0", "CC-BY", "MIT", "Open", "copyright"
  • n = A numeric index variable used to stratify randomization.
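
The difference between the string "NA" and a missing value in the license field is easy to lose when working in R. The following is a minimal sketch of one way to keep the two apart. It uses a toy vector named `license` (a hypothetical name; the actual column name in cite_data.RDS may differ) with made-up values drawn from the options listed above.

```r
# Toy stand-in for the license field (hypothetical name and invented values).
license <- c("MIT", "NA", NA, "GPL", "NA", NA, "CC0")

# Test is.na() first: a true missing value means the license was not checked,
# while the string "NA" means it was checked and no license was specified.
license_status <- ifelse(
  is.na(license), "not checked",
  ifelse(license == "NA", "not specified", "specified")
)

# Count the records in each category.
status_counts <- table(license_status)
print(status_counts)
```

Testing `is.na()` before the string comparison matters because `license == "NA"` itself evaluates to a missing value for unchecked records.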

See https://dev.elsevier.com/sc_apis.html for information on the following fields:

  • title
  • author
  • year
  • doi
  • journal
  • issn
  • volume
  • pages
  • date
  • display_date
  • citations
  • article_type
  • open_access
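
As an illustration of how the Scopus-derived fields can be combined with the manually collected ones, the sketch below computes the proportion of R-using publications per year that shared any R code. It is not the analysis in 2_analyses_and_figures.R. The toy data frame, its values, and the commented-out file path are assumptions made so the example runs on its own.

```r
# Toy stand-in for cite_data.RDS with invented values. The real file would be
# loaded with something like:
# cite_data <- readRDS("data/cite_data.RDS")  # path is an assumption
cite_data <- data.frame(
  uid = 1:6,
  year = c(2010, 2010, 2010, 2020, 2020, 2020),
  r_used = c("yes", "yes", "no", "yes", "yes", "yes"),
  r_scripts_available = c("no", "yes", "no", "yes", "yes", "no"),
  stringsAsFactors = FALSE
)

# Keep only publications that actually used R; which() drops missing values.
used_r <- cite_data[which(cite_data$r_used == "yes"), ]

# Flag records that shared code, then average the flag within each year.
used_r$shared <- used_r$r_scripts_available == "yes"
share_by_year <- aggregate(shared ~ year, data = used_r, FUN = mean)
print(share_by_year)
```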

Impact Factor Data

The other important data file in this repository is impact_factor.csv. This is a CSV file containing information on the impact factors of journals used in this work, as recorded on June 16, 2023. This information on impact factor was provided by the R package "scholar" (https://cran.r-project.org/web/packages/scholar/index.html). Below we provide information on the fields included.

  • needed_journals = The list of journals submitted to the scholar R package. These were extracted from the "journal" field of the file cite_data.RDS (see above).
  • Journal = The journal title matched by scholar.
  • Cites = The number of citations of that journal.
  • ImpactFactor = The journal's impact factor.
  • Eigenfactor = The journal's Eigenfactor.
  • dist = The distance between the submitted journal name and the returned journal name, as calculated by scholar.
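
The fields above can be used to attach impact factors to the citation records. The sketch below is one possible approach, not the authors' method: it assumes that smaller `dist` values indicate closer name matches, uses a hypothetical cutoff `max_dist`, and relies on toy data frames with invented journal names and values in place of the real files.

```r
# Toy stand-ins with invented values. The real files would be read with e.g.:
# impact <- read.csv("data/impact_factor.csv")   # path is an assumption
# cite_data <- readRDS("data/cite_data.RDS")     # path is an assumption
impact <- data.frame(
  needed_journals = c("Journal A", "Journal B", "Jrnl C"),
  Journal = c("JOURNAL A", "JOURNAL B", "JOURNAL OF C"),
  ImpactFactor = c(4.0, 3.0, 5.0),
  dist = c(0, 0, 6),
  stringsAsFactors = FALSE
)
cite_data <- data.frame(
  uid = 1:4,
  journal = c("Journal A", "Journal B", "Jrnl C", "Journal A"),
  stringsAsFactors = FALSE
)

# Hypothetical cutoff: keep only matches whose name distance is small.
max_dist <- 2
trusted <- impact[impact$dist <= max_dist, c("needed_journals", "ImpactFactor")]

# Left join so every citation record is kept; journals without a trusted
# match receive a missing ImpactFactor rather than a doubtful value.
merged <- merge(cite_data, trusted,
                by.x = "journal", by.y = "needed_journals", all.x = TRUE)
print(merged)
```

Reviewing the rows with a large `dist` by hand before discarding them is a reasonable alternative to a fixed cutoff.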

Important Code

There are two important R scripts in this repository: 1_data_collection.R and 2_analyses_and_figures.R. The former file was used to select publications for the study (along with relevant metadata). The latter file contains code underlying analyses and visualizations.
