Does SAMBA list changes that were made to the assembly somewhere? #330

Open
MarkusRainerSchmidt opened this issue Oct 5, 2023 · 2 comments

Comments

@MarkusRainerSchmidt

Hi,

I am wondering whether SAMBA outputs the changes it made to the assembly (i.e. which sequences have been inserted, changed, or removed)?
I would also be fine getting that information from the intermediate files. Is the format of these files documented somewhere?

I am guessing that the .patches.uniq.links.txt file holds the changes to the assembly, is that correct?
I also do not fully understand the format.
Every line contains the following columns: <ctg1>.<pos1> <num1> <str1> <ctg2>.<pos2> <num2> <str2> <len> <seq>
Does each line then mean that <seq> has been placed between <ctg1>.<pos1> and <ctg2>.<pos2>?
If the two strands are different, would it reverse-complement one of the contigs?
What do <num1> and <num2> stand for?

Thanks,

Markus

@bioinfoMMS

Hi Markus,

Did you ever figure this out? I have the same questions about the samba output file formats and can't seem to find documentation for it anywhere.

Thanks!

@MarkusRainerSchmidt
Author

No, from the output files I could not figure it out.

However, I am running SAMBA in the mode where it is only allowed to fill in gaps.
So I used this information to write a script that matches the contigs (here: continuous sequences between gaps) of the input assembly against the output assembly.
Since the contigs are not changed at all, you do not even need an aligner here; an exact string match (e.g. str.index in Python) is enough.
Then, knowing where the input contigs are located in the output, you can reconstruct the size and position of the filled-in gaps.
Oh, and you have to make sure to cut away the first and last 1000 bp (the -o parameter of SAMBA) of the input contigs before matching, since SAMBA will mess with these.
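
A minimal sketch of that idea, not the actual script: it assumes each scaffold is already loaded as a plain string, that gaps are runs of N, and that the contigs appear in the output in the same order and orientation as in the input (no reverse-complement handling). The function names `split_contigs`, `locate_contigs` and `regions_between` are made up for illustration, and `str.find` is used in place of `str.index` so that a missing contig returns -1 instead of raising.

```python
import re


def split_contigs(scaffold):
    """Yield (start, sequence) for every run of non-N bases in a scaffold."""
    for match in re.finditer(r"[^N]+", scaffold):
        yield match.start(), match.group()


def locate_contigs(input_scaffold, output_scaffold, trim=1000):
    """Find each trimmed input contig in the output by exact string match.

    `trim` bases are cut from both ends of every contig before matching,
    because the contig ends may have been altered.
    """
    input_scaffold = input_scaffold.upper()
    output_scaffold = output_scaffold.upper()
    hits = []
    search_from = 0  # assumption: contigs keep their input order
    for start, contig in split_contigs(input_scaffold):
        if len(contig) <= 2 * trim:
            continue  # nothing left after trimming
        core = contig[trim:len(contig) - trim]
        pos = output_scaffold.find(core, search_from)
        if pos == -1:
            continue  # not found unchanged; skipped in this sketch
        hits.append({
            "in_start": start + trim,
            "in_end": start + len(contig) - trim,
            "out_start": pos,
            "out_end": pos + len(core),
        })
        search_from = pos + len(core)
    return hits


def regions_between(hits, output_scaffold):
    """Describe the output region between each pair of neighbouring matches.

    Each region covers the trimmed contig ends plus whatever now sits
    where the gap used to be.
    """
    regions = []
    for left, right in zip(hits, hits[1:]):
        out_seq = output_scaffold[left["out_end"]:right["out_start"]]
        regions.append({
            "out_start": left["out_end"],
            "out_end": right["out_start"],
            "in_span": right["in_start"] - left["in_end"],
            "out_span": len(out_seq),
            "remaining_n": out_seq.upper().count("N"),
        })
    return regions
```

Comparing `in_span` with `out_span` shows how much the region between two matched contigs grew or shrank, and `remaining_n` shows whether any gap bases are left. If a contig is skipped because it is too short or not found, the reported region simply spans more than one original gap.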

Hope that helps,

Markus
