Profile file for cgMLST #13

Open
lskatz opened this issue Jun 27, 2022 · 3 comments

Comments

lskatz commented Jun 27, 2022

Hi, related to #11, could you show me what a profile file should look like for cg/wgMLST? I downloaded the wgMLST scheme from the ChewBBACA website and I want to make a STing database out of it. I basically ran this inside the folder of locus fasta files:

# Build config.txt from the first 10 locus files (head limits the list for a quick test)
ls *.fasta | head | \
  perl -MFile::Basename=basename -lane '
    BEGIN{print "[loci]";}
    $n=basename($F[0], ".fasta");
    $n=~s/INNUENDO_cgMLST-//;
    print join("\t", $n, $F[0]);
    END{print "[profile]"; print "profile\tprofile.txt";}
  ' > config.txt

touch profile.txt

And then I get this error:

[gzu2@monolith3 Salmonella_enterica.stringMLST]$ indexer -c config.txt
Loading sequences from sequences files:

N       Loci    #Seqs.  File
1       00031717        15      ./INNUENDO_cgMLST-00031717.fasta
2       00031718        35      ./INNUENDO_cgMLST-00031718.fasta
3       00031719        14      ./INNUENDO_cgMLST-00031719.fasta
4       00031720        42      ./INNUENDO_cgMLST-00031720.fasta
5       00031721        5       ./INNUENDO_cgMLST-00031721.fasta
6       00031722        30      ./INNUENDO_cgMLST-00031722.fasta
7       00031723        17      ./INNUENDO_cgMLST-00031723.fasta
8       00031724        17      ./INNUENDO_cgMLST-00031724.fasta
9       00031725        23      ./INNUENDO_cgMLST-00031725.fasta
10      00031726        11      ./INNUENDO_cgMLST-00031726.fasta

Total sequences loaded: 209

Loading the profiles file...
ERROR: At least 11 columns (a ST column + # loci in config file) are required in the profiles file but only 0 were found.

ar0ch (Member) commented Jun 27, 2022

Hey Lee, I can pull out an example this evening, but to unblock you ASAP: the profile file needs, at minimum, two lines that look like:

ST loci1 loci2 ... lociN
1 1 1 ... 1

This is just a single dummy ST with allele 1 for every locus.
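
To make that concrete, here is a minimal sketch that derives such a dummy profile from the locus names in config.txt. It assumes the config layout used above (a [loci] section of name<TAB>path lines) and tab-separated profile columns. make_dummy_profile is a hypothetical helper name, not part of STing.

```shell
# Hypothetical helper (not part of STing): print a one-ST dummy profile
# for the loci listed in the [loci] section of a config file.
# Assumes tab-separated "name<TAB>path" lines and a tab-separated profile.
make_dummy_profile() {
  awk -F'\t' '
    /^\[/ { in_loci = ($0 == "[loci]"); next }   # track which section we are in
    in_loci && NF >= 2 { loci[++n] = $1 }         # collect locus names in order
    END {
      # Header: ST column followed by one column per locus
      printf "ST"; for (i = 1; i <= n; i++) printf "\t%s", loci[i]; printf "\n"
      # Single dummy ST 1 with allele 1 at every locus
      printf "1";  for (i = 1; i <= n; i++) printf "\t1"; printf "\n"
    }
  ' "$1"
}
```

Usage would be along the lines of `make_dummy_profile config.txt > profile.txt`. Both lines come out with the same number of columns (one ST column plus one per locus) and no trailing tab.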

lskatz (Author) commented Jun 30, 2022

OK, I bashed it out a bit with those instructions. Thanks for the clarity. Here are the steps I took after the perl -lane step:

numLoci=$(\ls -f1 *.fasta | wc -l)
(
  echo -ne "ST\t"; 
  grep -B 999999 'profile' config.txt | grep -v profile | tail -n +2 | cut -f 1 | tr '\n' '\t'; 
  echo;
) > profile.txt
(for i in `seq 1 $numLoci`; do echo -ne "1\t"; done; echo) >> profile.txt
indexer -c config.txt

It seemed to be working but I ran into a segmentation fault. What do you think?

...
8548    00040264        49      ./INNUENDO_cgMLST-00040264.fasta
8549    00040265        30      ./INNUENDO_cgMLST-00040265.fasta
8550    00040266        144     ./INNUENDO_cgMLST-00040266.fasta
8551    00040267        21      ./INNUENDO_cgMLST-00040267.fasta
8552    00040268        35      ./INNUENDO_cgMLST-00040268.fasta
8553    00040269        12      ./INNUENDO_cgMLST-00040269.fasta
8554    00040270        33      ./INNUENDO_cgMLST-00040270.fasta
8555    00040271        105     ./INNUENDO_cgMLST-00040271.fasta
8556    00040272        71      ./INNUENDO_cgMLST-00040272.fasta
8557    00040273        38      ./INNUENDO_cgMLST-00040273.fasta
8558    00040274        50      ./INNUENDO_cgMLST-00040274.fasta

Total sequences loaded: 2898302

Segmentation fault
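
One thing that may be worth ruling out, though it may be unrelated to the crash: the loop above writes one value per locus, while the header line carries an ST column plus one name per locus, so the data row ends up one column short of the header. Both lines also end in a trailing tab. A small sketch for counting the non-empty, tab-separated columns on each line (check_profile_columns is a hypothetical helper, not a STing command):

```shell
# Hypothetical helper (not part of STing): report how many non-empty,
# tab-separated columns each line of a profile file contains.
check_profile_columns() {
  awk -F'\t' '
    {
      n = 0
      for (i = 1; i <= NF; i++) if ($i != "") n++   # skip empty fields left by trailing tabs
      printf "line %d: %d columns\n", NR, n
    }
  ' "$1"
}
```

Running `check_profile_columns profile.txt` should report the same count on every line, equal to the number of loci in config.txt plus one for the ST column.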

lskatz (Author) commented Jun 30, 2022

I ran time to check whether memory might be the issue, and it doesn't look like it used too much: peak resident memory was about 7 GB.

\time indexer -c config.txt

24.62user 6.63system 0:38.31elapsed 81%CPU (0avgtext+0avgdata 7153812maxresident)k
0inputs+0outputs (0major+1767315minor)pagefaults 0swaps
