Enable bulk loading or graceful failure if certain indicators are not available. #35

SebKrantz · 2021-07-28T13:06:58Z

Hello, thanks a lot for this package and the comprehensive access to World Bank statistics it provides. I am using this package in part to download large volumes of indicators to populate a local macroeconomic database for a particular country. I have two issues: (1) loading many indicators sequentially is quite slow, and (2) if a particular indicator is not available for my country, the whole API call fails and everything that was loaded before is lost. For example, I wanted to load all 4269 indicators in the Education Statistics database for Kenya:

library(wbstats)
indlist <- wb_indicators()  # metadata for all available indicators
ind <- subset(indlist, source == "Education Statistics")[[1]]  # first column: indicator IDs
WB_EDU <- wb_data(indicator = ind, country = "KEN", start_date = 1960, end_date = 2021)

This fails after a while with "Error: World Bank API request failed for indicator LO.EGRA.NCWPM.DAG.2GRD", and the rest of my query is lost.

I know this probably means some extra work, but in the medium term, an option to bulk-load all indicators from a specific source and/or to skip indicators that cannot be loaded (i.e. wrapping the API calls in tryCatch and issuing a warning instead of terminating the request) would be great. Both could be enabled through additional arguments to wb_data.
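Until something like this exists in wbstats, the skip-and-warn behaviour can be approximated on the user side. Below is a minimal sketch, not part of wbstats: the helper `fetch_indicators_safely()` is hypothetical, and `fetch` is assumed to be any function that downloads a single indicator. Each call is wrapped in `tryCatch()`, a failing indicator produces a warning instead of an error, and everything downloaded before it is kept.

```r
# Hypothetical helper (not a wbstats function): download indicators one at a
# time, skipping any that fail instead of aborting the whole request.
fetch_indicators_safely <- function(ids, fetch) {
  results <- vector("list", length(ids))
  names(results) <- ids
  failed <- character(0)
  for (id in ids) {
    res <- tryCatch(fetch(id), error = function(e) {
      # Turn the error into a warning and signal failure with NULL
      warning("Skipping indicator ", id, ": ", conditionMessage(e), call. = FALSE)
      NULL
    })
    if (is.null(res)) failed <- c(failed, id) else results[[id]] <- res
  }
  # Return the successful downloads plus the IDs that could not be loaded
  list(data = results[!vapply(results, is.null, logical(1))], failed = failed)
}
```

For example, assuming one request per indicator is acceptable, `fetch` could be `function(id) wb_data(indicator = id, country = "KEN", start_date = 1960, end_date = 2021, return_wide = FALSE)`. With the long format every result should share the same columns, so the successes can then be stacked, e.g. with `do.call(rbind, out$data)`, while `out$failed` records what was skipped.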


jpiburn · 2021-07-28T13:45:03Z

Hi,

Thank you for opening this issue. Good idea, and something I have been meaning to integrate for a while. I will keep this issue updated with progress.

SebKrantz · 2021-07-28T16:48:42Z

Hi, thanks for the response. I just wrote a function to do bulk loading by source; maybe it is of help. Is there any way to slim down the JSON query to reduce the amount of information downloaded for each data value?

wbstats::wbsources()

library(jsonlite)
library(data.table)

WBAPI <- function(country = "KEN", sourceID = 12, series = "all", wide = FALSE, per_page = 100000L, custom_query = NULL) {
  if(length(custom_query)) {
    x <- fromJSON(custom_query) 
  } else {
    per_page <- as.integer(per_page)
    # Build the request URL for a given page of the sources endpoint
    make_url <- function(page) paste0("http://api.worldbank.org/v2/sources/", as.integer(sourceID), "/country/", 
                                      as.character(country), "/series/", as.character(series), "?format=json&per_page=", 
                                      per_page, "&page=", page)
    x <- fromJSON(make_url(1L))
    # Pages still to fetch after page 1 (ceiling avoids requesting an empty extra page)
    n_more <- max(0, ceiling(x$total / per_page) - 1)
    for(i in seq_len(n_more)) {
      x_i <- tryCatch(fromJSON(make_url(i + 1L)), error = function(e) {
        warning("Could not complete download, stopping at page ", i + 1L, ": ", conditionMessage(e))
        NULL
      })
      if(is.null(x_i)) break # keep the pages downloaded so far instead of duplicating them
      x$source$data <- rbind(x$source$data, x_i$source$data)
    }
  }
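  d <- x$source$data
  cc <- which(!is.na(d$value)) # keep only non-missing data values
  # Flatten one record: ids and values of its 'variable' entries (dropping 'concept'), plus the data value
  fld <- function(y, z) c(as.vector(unlist(.subset(y, -1L), use.names = FALSE), "list"), list(z))
  res <- rbindlist(Map(fld, d$variable[cc], as.vector(d$value[cc], "list")))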
  names(res) <- c("iso3c", "indicator", "yr", "country", "label", "year", "value")
  res$year <- as.integer(res$year)
  res$value <- as.numeric(res$value)
  setcolorder(res, c("iso3c", "country", "year", "yr"))
  setorder(res, iso3c, indicator, year)
  if(!wide) return(res)
  un <- which(!duplicated(res$indicator))
  lab <- res$label[un]
  ind <- res$indicator[un]
  res$label <- NULL
  # Reshape to wide format: one column per indicator
  res <- dcast(res, ... ~ indicator, value.var = "value")
  if(!identical(names(res)[-(1:4)], ind)) warning("indicator mismatch")
  # Attach each indicator's label to its column by reference (data.table::setattr avoids copies)
  for(i in seq_along(ind)) setattr(res[[i + 4L]], "label", lab[i])
  return(res)
}
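Two steps of `WBAPI()` can be checked offline. The sketch below assumes the response layout the function relies on, namely that each record's `variable` field parses into a data frame with `concept`, `id` and `value` columns (mocked here, not real API output). It shows (a) how many follow-up pages are needed for a given `total` and `per_page`, and (b) what the `fld()` helper turns one record into. The helper name `extra_pages()` and the mock record are illustrative only.

```r
# (a) Number of pages to request after page 1.
# ceiling() matters: floor(total / per_page) would request one empty extra
# page whenever total is an exact multiple of per_page.
extra_pages <- function(total, per_page) max(0, ceiling(total / per_page) - 1)
extra_pages(250, 100)  # 2
extra_pages(200, 100)  # 1 (floor(200 / 100) would give 2)

# (b) fld() as used in WBAPI(): drop the 'concept' column of one record's
# 'variable' data frame, flatten ids then values into a list, append the value.
fld <- function(y, z) c(as.vector(unlist(.subset(y, -1L), use.names = FALSE), "list"), list(z))

# Mock record with the assumed structure
rec <- data.frame(concept = c("Country", "Series", "Time"),
                  id = c("KEN", "IND.X", "YR2000"),
                  value = c("Kenya", "Mock indicator", "2000"),
                  stringsAsFactors = FALSE)
row <- setNames(fld(rec, 42.5),
                c("iso3c", "indicator", "yr", "country", "label", "year", "value"))
str(row)
```

In `WBAPI()` these per-record lists are stacked with `rbindlist()`, after which `year` and `value` are converted with `as.integer()` and `as.numeric()`.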

jpiburn added the enhancement label Jul 28, 2021
