1:1 alignment for thousands of sequence pairs, not pairwise. #813
This is possible but a bit tricky: Please make one FASTA file containing all sequence entries. Then call
Then take a look at the
The first two columns are important; you can ignore the last. You will need to make a new TSV file containing the keys of the two matching accessions. In your example, you should see something like the following in the
The new tsv file you need to create should look like this:
Next sort this file according to the first column:
Now you can pass this file to
|
Oh, also this will only work for protein sequences. Nucleotide sequences need a diagonal seed point to compute the alignment, which would be much more difficult to hack. |
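As an illustration of the key-pair TSV described above, here is a minimal Python sketch. It assumes the lookup table is tab-separated with the key in the first column and the accession in the second, and it sorts the output numerically by the first column; both points are assumptions, not something the thread confirms. The function names and the accessions `seqA` to `seqD` are hypothetical.

```python
import io


def read_lookup(handle):
    """Return {accession: key}, assuming column 1 is the key and column 2 the accession."""
    acc_to_key = {}
    for line in handle:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        acc_to_key[fields[1]] = int(fields[0])
    return acc_to_key


def build_pair_rows(accession_pairs, acc_to_key):
    """Translate (query accession, target accession) pairs into key pairs.

    Rows are sorted by the first column (numeric sort is an assumption).
    """
    rows = [(acc_to_key[q], acc_to_key[t]) for q, t in accession_pairs]
    rows.sort(key=lambda row: row[0])
    return rows


def write_tsv(rows, handle):
    """Write one tab-separated row per line."""
    for row in rows:
        handle.write("\t".join(str(value) for value in row) + "\n")


# Hypothetical example data standing in for a real lookup table and pair list.
example_lookup = io.StringIO("0\tseqA\t0\n1\tseqB\t0\n2\tseqC\t0\n3\tseqD\t0\n")
example_pairs = [("seqC", "seqD"), ("seqA", "seqB")]

keys = read_lookup(example_lookup)
out = io.StringIO()
write_tsv(build_pair_rows(example_pairs, keys), out)
print(out.getvalue(), end="")
```

In practice the two `io.StringIO` objects would be replaced by real file handles for the lookup table and the output TSV.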
Thank you so much for your fast reply. Unfortunately, my sequences are nucleotide. Would it be of any help if I provided the seed points? |
You can try. The TSV you need to create has the same format, but with two more columns: score (can be 0) and match diagonal (position i-j):
Then call:
In either case, you can't mix nucleotide and protein pairs in one run; it has to be one or the other. |
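The four-column layout for nucleotide pairs can be sketched in Python as follows. The column order (query key, target key, score, diagonal) is inferred from the description above and is an assumption; the score is fixed at 0, and the diagonal is computed as i - j, where i and j are the seed start positions in query and target. The thread does not say how a negative diagonal is handled, so treat that case with care. The function names and example values are hypothetical.

```python
def seed_rows(seeds):
    """Build (query_key, target_key, score, diagonal) rows.

    `seeds` is an iterable of (query_key, target_key, i, j), where i and j
    are the seed start positions in query and target. Score is set to 0.
    Rows are sorted by the first column, carried over from the two-column
    case (an assumption).
    """
    rows = [(q, t, 0, i - j) for q, t, i, j in seeds]
    rows.sort(key=lambda row: row[0])
    return rows


def format_rows(rows):
    """Render rows as tab-separated text, one row per line."""
    return "".join("\t".join(str(v) for v in row) + "\n" for row in rows)


# Hypothetical seeds: two pairs, identified by their database keys.
example_seeds = [(2, 3, 10, 4), (0, 1, 25, 5)]
print(format_rows(seed_rows(example_seeds)), end="")
```

Several seeds for the same pair would simply appear as several input tuples with the same two keys, giving one output line per diagonal.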
What criteria should I follow to define the score? |
You can ignore the score, it's not used further in the You can pass only one diagonal per query-target pair. You should be able to create multiple lines per query-target pair with different diagonals, though. The seed point refers to the seed start positions i and j in query and target, respectively. Just out of curiosity: Can I ask what your use case for this is? |
Of course. We are developing a method that, given a set of DNA sequences from different environmental samples, generates pairs of candidates across samples that could have come from the same source. We would like to curate these candidates through alignment. |
Dear mmseqs2 developers,
I have a list of thousands of subject-query FASTA pairs, and I would like to run mmseqs to align each pair so that each sequence is aligned only to its partner. This is how my list looks:
Is there an efficient way to perform these 1:1 alignments? Could I create a database that contains all sequences and then align a database subentry?
I am trying to avoid aligning all against all.