Need documentation on cgMLST #13

Open
lskatz opened this issue May 5, 2022 · 1 comment

lskatz commented May 5, 2022

I am taking some notes on how I ran cgMLST, and I hope you can add documentation for it.

Create database: this took a very long time

# Downloaded the cgMLST scheme from enterobase FTP into Salmonella.cgMLSTv2.enterobase (undocumented)
\ls -f1 Salmonella.cgMLSTv2.enterobase/*.fasta | \
  grep -v cgMLST_v2_ref.fasta `# ignore already-established reference file` | \
  xargs -n 1 seqtk seq -l 0 `# one file per seqtk call, printed in two-line fasta format` | \
  perl -lane '
    # Get the id (with its leading ">") and read the seq from the next line,
    # since the input is in two-line fasta format
    $id=$F[0];
    $seq=<>;
    chomp($seq);
    # This probably will not matter, but avoid any infinite loops by quitting if we see the same id twice.
    # %seen is deliberately not declared with "my" so that it persists across input lines.
    if($seen{$id}++){print STDERR "Already seen $id. Done."; last;}

    # Avoid deflines that might be problematic
    if($id =~ /[^_>0-9a-zA-Z]/){
      print STDERR "Skipping ".$id;
      next;
    }
    print "$id\n$seq";
  ' > enterobase.filtered.fasta
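
Before spending hours on database creation, it may be worth sanity-checking the filtered file. The sketch below is only an illustration: `check_two_line_fasta` is a hypothetical helper name, not part of any tool mentioned here. It assumes the file should be strict two-line fasta with unique IDs, and it reuses the same defline character filter as the Perl step above.

```shell
# Hypothetical helper: sanity-check a two-line fasta file.
# Prints a one-line summary and returns non-zero if any problem is found.
check_two_line_fasta() {
  awk '
    NR % 2 == 1 && !/^>/            { bad++ }  # odd lines must be deflines
    NR % 2 == 0 && (/^>/ || /^$/)   { bad++ }  # even lines must be non-empty sequence
    NR % 2 == 1 && /[^_>0-9a-zA-Z]/ { bad++ }  # same character filter as the Perl step
    NR % 2 == 1 && seen[$0]++       { bad++ }  # duplicate IDs
    END {
      if (NR % 2 == 1) bad++                   # file ends on a defline with no sequence
      printf "records=%d problems=%d\n", NR / 2, bad + 0
      exit (bad > 0)
    }
  ' "$1"
}
```

Usage would be `check_two_line_fasta enterobase.filtered.fasta`. A non-zero problem count suggests the filter step did not behave as intended.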
@verylili

I also need this documentation.
I downloaded the cgMLST scheme for E. coli. I let the database creation command run for 4 days, but the machine time it used was only about 1.2 hours, and it almost stopped increasing once it got close to that value. So I had to stop the command.
