
Add lambda parameter for q-value estimation in enrichment script #1953

Open · wants to merge 8 commits into master
Conversation

ivagljiva (Contributor)
This PR addresses an issue discovered here, whereby very small p-values would lead to an ugly seq bug like this:

Error in seq.default(0.05, max_lambda, 0.05) :
  wrong sign in 'by' argument

(which can happen when the input file is very short, since we remove all core functions/annotations when generating it).
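
For illustration, here is a minimal sketch of how seq() fails (the derivation of max_lambda here is hypothetical; the point is just that it ends up below the 0.05 starting value when all p-values are tiny):

p_values <- c(0.001, 0.002, 0.003)      # all p-values are very small
max_lambda <- max(p_values)             # hypothetical derivation; ends up below 0.05
lambdas <- seq(0.05, max_lambda, 0.05)  # positive 'by' step, but from > to:
# Error in seq.default(0.05, max_lambda, 0.05) : wrong sign in 'by' argument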

These changes introduce two new aspects to the code:

  1. When the max_lambda is too small, we stop the script and print a nicer error for the user:
Error: Unfortunately, the maximum lambda for q-value estimation is < 0.05. This
         value is coming from the p-values in the enrichment test, so something in
         your data is making those p-values low. This sometimes happens when you
         have a very short input file (your input file has 4 rows). Regardless,
         a possible solution is for you to pick your own max lambda that is >= 0.05,
         and provide it to the program calling this script by using the
         --qlambda parameter.
  2. The enrichment script now accepts a --qlambda parameter that allows the user to set max_lambda themselves. This optional parameter has been added to the program anvi-compute-functional-enrichment-across-genomes. (A sketch of how both changes fit together follows below.)
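
Here is a rough sketch of both changes together (names like opt$qlambda are illustrative, not the script's actual variables, and the stop() message is abbreviated):

if (!is.null(opt$qlambda)) {
  max_lambda <- opt$qlambda   # user-supplied --qlambda overrides the computed value
}
if (max_lambda < 0.05) {
  stop("Unfortunately, the maximum lambda for q-value estimation is < 0.05. ",
       "Pick your own max lambda that is >= 0.05 and pass it via --qlambda.")
}
lambdas <- seq(0.05, max_lambda, 0.05)  # now guaranteed to have the right sign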

On my test data (Enterococcus genomes from the Infant Gut Tutorial, using the annotation source COG20_PATHWAY, which has only 3 annotations differentially present across the genomes), the --qlambda parameter allows the script to proceed through the q-value estimation, but it still ends in this later error:

Error: Doh! We still can't estimate the proportion of features that are truly null. It's Amy's fault, so please let her know ASAP so she can fix it!

I suspect this is happening because the input file is just too darn small, so even the addition of this parameter doesn't really fix the issue. But at least now users have a bit more control.

@adw96, could you please take a look at these changes and let me know what you think? If you like them, I'll update the documentation and merge the code :)

adw96 (Contributor) commented Jul 9, 2022

@ivagljiva Thanks for your work on this!

An alternative to erroring when we can't estimate lambda is to return the rest of the results (with p-values) but not the q-values. I think a better approach is to give users a warning that we couldn't do FDR correction (maybe with q-value = NA for all entries so they don't miss the warning), rather than an error.

Essentially I think we skip over

qvalues_df <- pvalues_df %>%
  mutate("adjusted_q_value" = qvalue::qvalue(p=unadjusted_p_value, lambda=lambdas, pi0.method="smoother")$qvalues)

(or replace it with qvalues_df <- pvalues_df %>% mutate("adjusted_q_value" = NA) so we always have the same number of columns)

if the "try-error" %in% class(pi0_est) is TRUE.

What do you think?
