BIEN_trait_mean performance #16

achmurzy · 2020-05-21T20:34:24Z

I'm trying to pull as many trait means as possible for the following list of species:
names.txt

Calling BIEN_trait_mean(vector_of_species_names, vector_of_traits) with vectors usually crashes my R console. I'm not sure if it's on the backend, but returning the list of trait IDs by default could be part of the issue. Maybe we could add a flag that makes the list of trait IDs optional? It greatly increases the size of the data frame that gets returned.

So what I'm doing now is querying means one-by-one:
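traits <- data.frame()
for (species in species_list) {
  for (trait in trait_list) {
    new_trait <- BIEN_trait_mean(species, trait)  # one species, one trait per call
    traits <- rbind(traits, new_trait)            # append the new rows
  }
}
This isn't the 'R' way of doing it, but it works quickly, whereas vectorizing a list of 20 species crashes my console.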
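For anyone who wants the same one-pair-at-a-time approach without growing a data frame inside the loop, here is a minimal sketch. The names `collect_trait_means` and `fetch` are my own, not part of the BIEN package, and it assumes every call returns a data frame with the same columns. `fetch` defaults to `BIEN::BIEN_trait_mean`, but any function taking (species, trait) can be passed in, which makes it possible to try offline.

```r
# Sketch: query one species/trait pair at a time and bind the results once.
# `collect_trait_means` and `fetch` are hypothetical names for illustration.
collect_trait_means <- function(species_list, trait_list,
                                fetch = BIEN::BIEN_trait_mean) {
  # Every species/trait combination, one row per query
  pairs <- expand.grid(species = species_list, trait = trait_list,
                       stringsAsFactors = FALSE)
  results <- lapply(seq_len(nrow(pairs)), function(i) {
    tryCatch(
      fetch(pairs$species[i], pairs$trait[i]),
      error = function(e) NULL  # skip pairs whose query fails
    )
  })
  results <- Filter(Negate(is.null), results)
  if (length(results) == 0) return(data.frame())
  # Assumes all returned data frames share the same columns
  do.call(rbind, results)
}
```

Usage would then be `traits <- collect_trait_means(species_list, trait_list)`, where `trait_list` is a character vector of trait names.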

achmurzy · 2020-05-23T19:31:38Z

Okay, playing with this further, I was able to determine that:
-The trait IDs aren't the problem, at least I don't think so.
-Rather, I didn't realize that BIEN_trait_mean is only intended to return one trait at a time. I had been passing in a vector of traits like so:
trait_list <- BIEN_trait_list()
BIEN_trait_mean(species, trait_list)
to pull everything. This returns the warning:
In if (!trait %in% traits_available$trait_name) { :
the condition has length > 1 and only the first element will be used
The returned traits then all have the same value:
1 Pentaclethra macrophylla 15.7878787878788 flower color
2 Pentaclethra macrophylla 15.7878787878788 flower pollination syndrome cm
3 Pentaclethra macrophylla 15.7878787878788 fruit type
4 Pentaclethra macrophylla 15.7878787878788 inflorescence length cm
level_used sample_size
1 Family 533
2 Family 533
3 Family 533
4 Family 533

I think it will be common for people to want to pull every trait and to call the function as I did above. Right now you have to write a for-loop to do it one at a time (which works great and is pretty fast). However, it might be better to prevent putting multiple traits into BIEN_trait_mean, or make sure it supports vectorized trait lists.

-Finally, querying DBH also tends to be extremely slow, as you suggested, and I think you're right that this is what crashes the console. In particular, calculating mean DBH at the Family level could be drawing many thousands of records without being very informative. The trait 'whole plant height' seems to behave the same way. The R process gets 'Killed', probably because the SQL query returns far too many records. Maybe DBH data should only be available through the stem.R module? These traits take more than 15 minutes to query and then eventually crash the console, so maybe densely measured traits need some special treatment. The other traits return values in less than 30 seconds.
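On the multiple-trait point: the warning arises because `if` receives a condition of length greater than one, and since R 4.2.0 that is an error instead of a warning. One way to fail early with a clearer message would be a guard like the sketch below. `check_single_trait` is a hypothetical helper of mine, not something in the BIEN package, and where it would be called inside BIEN_trait_mean is an assumption.

```r
# Sketch: reject anything that is not a single trait name.
# `check_single_trait` is a hypothetical helper for illustration.
check_single_trait <- function(trait) {
  if (!is.character(trait) || length(trait) != 1L) {
    stop("`trait` must be a single character string. ",
         "Call the function once per trait instead.", call. = FALSE)
  }
  invisible(trait)
}
```

With this guard, `check_single_trait(c("flower color", "fruit type"))` stops with an error before any query runs, instead of silently returning the same mean for every trait.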

achmurzy changed the title ~~BIEN_trait_mean performance and accuracy~~ BIEN_trait_mean performance May 21, 2020
