learn_network defaults #28

Open
nick-youngblut opened this issue May 15, 2021 · 1 comment

Comments

@nick-youngblut

The learn_network doc shows:

help?> learn_network
search: learn_network

  learn_network(data_path::AbstractString, meta_data_path::AbstractString) -> FWResult{<:Integer}

  Works like learn_network(data::AbstractArray{<:Real, 2}), but instead of a data
  matrix takes file paths to an OTU table and optionally a meta data table as an
  input.

    •  data_path - path to a file storing an OTU count matrix (and JLD2 meta
       data)

    •  meta_data_path - optional path to a file with meta data

    •  *_key - HDF5 keys to access data sets with OTU counts, Meta variables and
       variable names in a JLD2 file. If a data item is absent the corresponding
       key should be 'nothing'. See '?load_data' for additional information.

    •  verbose - print progress information

    •  transposed - if true, rows of data are variables and columns are samples

    •  kwargs... - additional keyword arguments passed to
       learn_network(data::AbstractArray{<:Real, 2})

  ────────────────────────────────────────────────────────────────────────────────────

  learn_network(data::AbstractArray{<:Real, 2}) -> FWResult{<:Integer}

  Learn an interaction network from a data matrix (including OTUs and optionally meta
  variables).

    •  data - data matrix with information on OTU counts and (optionally) meta
       variables

    •  header - names of variable columns in data

    •  meta_mask - true/false mask indicating which variables are meta variables

  Algorithmic parameters

    •  heterogeneous - enable heterogeneous mode for multi-habitat or -protocol
       data with at least thousands of samples (FlashWeaveHE)

    •  sensitive - enable fine-grained association prediction (FlashWeave-S,
       FlashWeaveHE-S), sensitive=false results in the fast modes (FlashWeave-F,
       FlashWeaveHE-F)

    •  max_k - maximum size of conditioning sets, high values can lead to the
       removal of more spurious edges, but may also strongly increase runtime
       and reduce statistical power. max_k=0 results in no conditioning
       (univariate mode)

    •  alpha - statistical significance threshold at which individual edges are
       accepted

    •  conv - convergence threshold, e.g. if conv=0.01 assume convergence if the
       number of edges increased by only 1% after 100% more runtime (checked in
       intervals)

    •  feed_forward - enable feed-forward heuristic

    •  fast_elim - enable fast-elimination heuristic

    •  max_tests - maximum number of conditional tests that is performed on a
       variable pair before association is assumed

    •  hps - reliability criterion for statistical tests when sensitive=false

    •  FDR - perform False Discovery Rate correction (Benjamini-Hochberg method)
       on pairwise associations

    •  n_obs_min - don't compute associations between variables having fewer
       reliable samples (non-zero samples if heterogeneous=true) than this
       number. -1: automatically choose a threshold.

    •  time_limit - if feed-forward heuristic is active, determines the interval
       (seconds) at which neighborhood information is updated

  General parameters

    •  normalize - automatically choose and perform data normalization method
       (based on sensitive and heterogeneous)

    •  track_rejections - store for each discarded edge which variable set led
       to its exclusion (can be memory-intensive for large networks)

    •  verbose - print progress information

    •  transposed - if true, rows of data are variables and columns are samples

    •  prec - precision in bits to use for calculations (16, 32, 64 or 128)

    •  make_sparse - use a sparse data representation (should be left at true in
       almost all cases)

    •  make_onehot - create one-hot encodings for meta data variables with more
       than two categories (should be left at true in almost all cases)

    •  update_interval - if verbose=true, determines the interval (seconds) at
       which network stat updates are printed

What are the defaults for these parameters (e.g., prec)?

@jtackm
Member

jtackm commented May 19, 2021

Hi Nick! Good point, I will look into adding these to the docs. Currently one would have to look directly at the method definitions in learning.jl (e.g. prec defaults to 32).
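
Until the defaults are documented, a rough way to find where to look is to ask Julia which keyword arguments a method declares and where it is defined. The sketch below is only an illustration: `demo_learn` is a hypothetical stand-in (not FlashWeave's real signature; only `prec = 32` comes from this thread), `keyword_summary` is a helper name made up here, and `Base.kwarg_decl` is an unexported Base function, so it may change between Julia versions. It reports keyword names, not their default values, which still have to be read from the source.

```julia
# Hypothetical stand-in for a function with keyword arguments.
# NOT FlashWeave's real signature; only `prec = 32` is taken from this thread.
demo_learn(data::AbstractMatrix{<:Real}; prec::Integer=32, verbose::Bool=true) = (prec, verbose)

# For every method of `f`, collect the declared keyword names and the
# file/line of the definition. Defaults are not included: Julia does not
# expose them through this route, so the source file still has to be read.
function keyword_summary(f)
    return [(keywords = Base.kwarg_decl(m),   # unexported Base helper (assumption: available in your Julia version)
             file = String(m.file),
             line = Int(m.line))
            for m in methods(f)]
end

for entry in keyword_summary(demo_learn)
    println(entry.keywords, " defined at ", entry.file, ":", entry.line)
end
```

With FlashWeave loaded, calling `keyword_summary(learn_network)` should presumably point at the definitions in learning.jl mentioned above. The `@less` and `@edit` macros from InteractiveUtils (available by default in the REPL) are another way to jump to a method's source.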


2 participants