Settings to produce a convertalis table in BLAST outfmt 6 format. #87

Open
JoshuaTCooper opened this issue Feb 1, 2024 · 1 comment

@JoshuaTCooper

Expected Behavior

  • I would like to use the metaeuk easy-predict to taxtocontig workflow against UniRef90 to annotate contigs from a metagenome and output a BLAST outfmt 6 table as input for Blobtoolkit (to decontaminate bacteria from a microeukaryote assembly).

  • It's not clear from the MetaEuk documentation how to produce a BLAST-like table without using mmseqs convertalis. The convertalis module will not work with the files produced by taxtocontig either: it says they are the wrong database type (it needs an alignment DB). I attempted to add -a so the results could be converted to alignments with the mmseqs convertalis module.

Current Behavior

With the -a setting, I expected the metaeuk easy-predict and taxtocontig workflow to produce an alignment file, but it doesn't finish and complains that the number of columns in the input is incorrect.
**-a BOOL Add backtrace string (convert to alignments with mmseqs convertalis module) [0]**

At the end of easy-predict the script errors with:
**_there should be 20 columns in the input file. This doesn't seem to be the case._**

Steps to Reproduce (for bugs)

Please make sure to execute the reproduction steps with newly recreated and empty tmp folders.

For this test the sensitivity was set low and max-seqs was reduced so the run would finish quickly. The sensitivity is usually kept at 4.0.

metaeuk createdb prokarya_scaffolds.fasta prokarya_scaffoldsDB

metaeuk easy-predict prokarya_scaffoldsDB /home/hh.nku.edu/cooperjo/databases/MetaEuk_db/UniRef90 RESULTSprok tmpFOLDER -s 1 --metaeuk-eval 0.01 --max-seqs 25 -a

MetaEuk Output (for bugs)

Please make sure to also post the complete output of MetaEuk. You can use gist.github.com for large output.

https://gist.github.com/JoshuaTCooper/5f4f1280e767472ac524e836776a9495

Context

Providing context helps us come up with a solution and improve our documentation for the future.

  • I wish the MetaEuk documentation were more specific and did not just refer to MMseqs2. I've tried for 3 days to interpret the MMseqs2 guide and couldn't figure out how to use the -a boolean to output an alignment DB (I think?). It seems the mmseqs taxonomy workflow is specific and produces the correct (?) files for convertalis.

  • My goal was to create a blast outfmt 6 table to determine taxonomy of my metagenome contigs to be used within Blobtoolkit

  • https://blobtoolkit.genomehubs.org/blobtools2/blobtools2-tutorials/adding-data-to-a-dataset/adding-hits/
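For reference, a default BLAST outfmt 6 table is tab-separated with 12 columns (qseqid, sseqid, pident, length, mismatch, gapopen, qstart, qend, sstart, send, evalue, bitscore). Below is a minimal sketch for sanity-checking a candidate table before handing it to Blobtoolkit. The function name `check_tsv_columns` is hypothetical, and the expected column count is a parameter because Blobtoolkit may want a custom column layout rather than the default 12 (check the tutorial linked above).

```shell
# check_tsv_columns FILE NCOLS   (hypothetical helper, not part of any tool)
# Prints "ok" and returns 0 when every non-empty line of FILE has exactly
# NCOLS tab-separated fields; otherwise reports the first offending line
# and returns 1.
check_tsv_columns() {
    awk -F '\t' -v n="$2" '
        NF == 0 { next }                      # skip empty lines
        NF != n {
            printf "line %d has %d columns, expected %d\n", NR, NF, n
            bad = 1
            exit                              # jump to END
        }
        END { if (bad) exit 1; print "ok" }
    ' "$1"
}
```

Usage would look like `check_tsv_columns hits.m8 12`, where `hits.m8` is a placeholder file name.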

Alternatively,

  1. What steps would I run using mmseqs taxonomy to reproduce the settings of the metaeuk easy-predict to taxtocontig workflow and create the files that mmseqs convertalis needs? FYI, I also tried the step-by-step workflow, starting with predicting exons, and got the same error message.

  2. If I have run the metaeuk easy-predict and taxtocontig workflow and still have my temp folders, is there a way to extract that information in another way without re-running the full program to get a blast table?
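On question 1, one route that should yield an alignment DB of the kind convertalis accepts is a plain MMseqs2 search, sketched below. This is an assumption about a workable alternative, not a confirmed equivalent of metaeuk easy-predict: it runs a translated search of the contigs against the protein database and does no exon chaining or gene calling. The output names (`contigsDB`, `alnDB`, `hits.m8`) are hypothetical, `-e 0.01` only stands in for `--metaeuk-eval 0.01` and is not the same threshold, and the script only prints the commands unless `DRY_RUN=0` is set.

```shell
#!/bin/sh
# Sketch: plain MMseqs2 search followed by convertalis (BLAST-tab output).
# Output names are hypothetical placeholders.
CONTIGS=prokarya_scaffolds.fasta
TARGET_DB=/home/hh.nku.edu/cooperjo/databases/MetaEuk_db/UniRef90
DRY_RUN=${DRY_RUN:-1}            # 1 = only print the commands

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

mmseqs_blast_table() {
    run mmseqs createdb "$CONTIGS" contigsDB
    # -a adds the backtrace string to the alignment DB
    run mmseqs search contigsDB "$TARGET_DB" alnDB tmpFOLDER -s 1 --max-seqs 25 -e 0.01 -a
    # --format-mode 0 is the BLAST-tab layout
    run mmseqs convertalis contigsDB "$TARGET_DB" alnDB hits.m8 --format-mode 0
}

mmseqs_blast_table
```

The per-hit table this produces is not aggregated per contig, so it is a different kind of result from what taxtocontig reports.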

Your Environment

Include as many relevant details about the environment you experienced the bug in.

  • Git commit used (The string after "MetaEuk Version:" when you execute MetaEuk without any parameters):
  • Which MetaEuk version was used (Statically-compiled, self-compiled, Homebrew, etc.):
  • For self-compiled and Homebrew: Compiler and Cmake versions used and their invocation:
  • Server specifications (especially CPU support for AVX2/SSE and amount of system memory):
  • Operating system and version:

metaeuk Version: 6.a5d39d9
bioconda installed
CPU support for AVX2
256 GB RAM, 16 core server

Thanks in advance!
Josh

@elileka
Member

elileka commented Apr 16, 2024

Hi,

I am sorry this took so much of your time, and I agree the MetaEuk documentation can be improved.

I looked at the link you provided (https://blobtoolkit.genomehubs.org/blobtools2/blobtools2-tutorials/adding-data-to-a-dataset/adding-hits/) and it seems their goal is to assign a taxonomic label to a contig based on aggregating the information of many hits. That is also the goal of MetaEuk's "taxtocontig", so why not try running that after "easy-predict"?

At any rate, adding the '-a' flag is not well-defined for MetaEuk, and it is therefore blocked with the error message you received. It is not well-defined because, in MetaEuk, the backtrace information refers to a single exon, not to the entire match, which is what MetaEuk reports. We therefore intentionally chose not to carry this information through MetaEuk's modules.

Best,
Eli
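The suggestion above can be sketched as the original commands with -a dropped, followed by taxtocontig. The taxtocontig argument order and the options `--majority 0.5 --tax-lineage 1 --lca-mode 2` follow the usage example in the MetaEuk README as best recalled here, so treat them as assumptions and verify with `metaeuk taxtocontig -h`. The sketch also assumes the UniRef90 database carries taxonomy annotation. The output prefix `taxResult` is hypothetical, and the script only prints the commands unless `DRY_RUN=0` is set.

```shell
#!/bin/sh
# Sketch: easy-predict without -a, then taxtocontig on its outputs.
TARGET_DB=/home/hh.nku.edu/cooperjo/databases/MetaEuk_db/UniRef90
DRY_RUN=${DRY_RUN:-1}            # 1 = only print the commands

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

predict_then_tax() {
    run metaeuk createdb prokarya_scaffolds.fasta prokarya_scaffoldsDB
    # Same settings as in the report, minus the blocked -a flag
    run metaeuk easy-predict prokarya_scaffoldsDB "$TARGET_DB" RESULTSprok tmpFOLDER -s 1 --metaeuk-eval 0.01 --max-seqs 25
    # Assumed argument order: contigsDB, predictions FASTA, headers map,
    # taxonomy-annotated target DB, output prefix, tmp dir
    run metaeuk taxtocontig prokarya_scaffoldsDB RESULTSprok.fas RESULTSprok.headersMap.tsv "$TARGET_DB" taxResult tmpFOLDER --majority 0.5 --tax-lineage 1 --lca-mode 2
}

predict_then_tax
```

If this runs as expected, taxtocontig should write per-prediction and per-contig taxonomy tables under the `taxResult` prefix. These are taxonomic assignments, not a BLAST outfmt 6 hit table.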
