sqm_longreads yields wrong mcount table #686

Open
jllavin77 opened this issue May 19, 2023 · 10 comments
Labels: bug, sqm_longreads, SQM_reads

@jllavin77

Dear developers,

I have run sqm_longreads on 24 ONT samples, and the .out.allreads.mcount file is not correct. Every sample's "reads"-related columns are either empty or filled with zeros, even though the "ORF" columns are correct and account for the ORFs detected in each read.
Here is an excerpt of the table to illustrate the problem:

[screenshot: mcount table excerpt showing empty/zero "reads" columns]

I tried the same analysis with 1 sample and with 9 samples, and in both cases the .out.allreads.mcount file was completely correct.

The command I ran was (I'm using the latest available version of SqueezeMeta, 1.6.2):

sqm_longreads.pl -p PROJECT -s list.txt -f /storage/SQM/PROJECT/ --euk -t 16

Is there any problem if I analyze more than 10 samples in the same batch (when it comes to parsing each sample's results and generating the full table)?

By the way, I can't import the results into SQMtools: I receive an error during results parsing, which I presume is caused by the malformed .mcount table.
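In case it helps, this is roughly how I checked which samples are affected (just a rough sketch; I'm assuming the mcount file is tab-separated with a header row and that the per-sample read-count columns have "reads" in their names, which may not match the real layout exactly):

```python
# Rough sketch: flag samples whose "reads" columns in the mcount table are
# empty or all zeros. Assumes a tab-separated file with a header row and
# per-sample columns whose names contain "reads" (an assumption about the
# actual column naming). The file name is a placeholder.
import csv

with open("PROJECT.out.allreads.mcount") as handle:
    reader = csv.DictReader(handle, delimiter="\t")
    rows = list(reader)
    read_cols = [c for c in reader.fieldnames if "reads" in c.lower()]

for col in read_cols:
    values = [(row[col] or "").strip() for row in rows]
    if all(v in ("", "0") for v in values):
        print(f"Column looks empty or all zeros: {col}")
```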

I would appreciate it if you could help me fix this problem due to its urgency.

Thanks in advance & Best wishes

JL

@jllavin77
Author

Any ideas on this issue?

@fpusan
Collaborator

fpusan commented Jun 5, 2023

Hi,
Sorry for the delay!
The error in sqmreads2tables.py will indeed come from the malformed mcount table, but I am not sure what is causing that in the first place.
Can you try running it without the --euk flag and see if the error persists?

@jllavin77
Author

Thank you for your answer!

I will try to run it that way as soon as I have enough disk space available for the run.

The thing is, this didn't happen with the "shorter" (fewer than 10 samples) runs...

@fpusan
Collaborator

fpusan commented Jun 6, 2023

This is indeed weird. Were the shorter runs also using the same parameters?

@jllavin77
Author

Yep, only changed paths/filenames in the run.

@jtamames
Owner

Hello!
Sorry for the delay. Could you please run the "tree" command on your project directory? Sometimes the results are produced but the tables are not generated correctly; I want to know if this is the case.
Best,

@jllavin77
Author

jllavin77 commented Jun 24, 2023

Hello everybody,

Sorry for the delay, but I was following your suggestions before answering.

  1. I ran the project without the --euk flag, as suggested, and the results table is still malformed.

  2. Find the output of the tree command attached (I misread your comment the other day):
    Tree_SQM.txt

I hope you can spot the problem, because I have already run the project twice (8 full days each time) and I'm getting a little concerned about this issue.

Thanks in advance

JL

@jllavin77
Author

jllavin77 commented Jul 5, 2023

Any insights after looking at the contents of the results directory?

I've run the test_install.pl script and all checks pass. I've tried running it on only 1 and 5 samples, and this time the .mcount file is incorrect in both cases... The only thing I can think of is that I updated SqueezeMeta to version 1.6 before running this batch of samples...
Has anyone experienced this issue too?

@jllavin77
Author

jllavin77 commented Jul 27, 2023

I finally found out what the problem is.
To sum up quickly: every time sqm_longreads stops in the middle of a run for any reason (e.g. a power outage), restarting the run resets the *.out.allreads file to 0 KB, losing any information about each sample's reads stored there up to that point. That is why the reads are lost and the corresponding columns appear empty for those samples.
Another thing I tried was rerunning the script on the previous results to recover the hits information, but that doesn't work either. It recognizes the previous results and gives you the option to overwrite or keep them, but if you keep them, the reads-related information is still missing.
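As a stopgap on my side, I'm now backing up the partial Diamond output before restarting, with something like this (a rough sketch; the glob pattern is only my guess at where the per-sample *.out.allreads files live inside the project directory):

```python
# Rough sketch: copy any non-empty *.out.allreads files to *.bak before
# restarting, so the partial hits are not reset to 0 KB by the restart.
# The project path and glob pattern are assumptions about the layout.
import glob
import os
import shutil

project_dir = "/storage/SQM/PROJECT"  # adjust to your project path
pattern = os.path.join(project_dir, "**", "*.out.allreads")
for path in glob.glob(pattern, recursive=True):
    if os.path.getsize(path) > 0:
        shutil.copy2(path, path + ".bak")
        print(f"Backed up {path}")
```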

Is this something that you can fix, or should we (the users) accept that sqm_longreads cannot benefit from your awesome "restart" feature?

Thanks in advance & Best wishes

JL

@jtamames
Owner

Thanks for the insight!
I guess we could read the allreads file if it exists and has content, record the last processed query sequence, restart with the remaining queries, and merge the results. Not immediate, but it could be done.
However, Diamond, which generates that file, often takes a long time to write its first results to it. I'm not sure whether that is customizable, or what the impact on performance would be.
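Roughly, the resume-and-merge step would look something like this (just a sketch of the idea, not the actual implementation; it assumes the allreads file is Diamond tabular output with the query ID in the first column, and the file names are placeholders):

```python
# Sketch of the resume idea: collect the query IDs already present in the
# existing (partial) allreads file, then append only the hits for queries
# that were not processed before, instead of overwriting the file.
import os

allreads = "PROJECT.out.allreads"           # existing partial Diamond output
new_hits = "PROJECT.out.allreads.restart"   # hits produced by the restarted run

done_queries = set()
if os.path.exists(allreads) and os.path.getsize(allreads) > 0:
    with open(allreads) as handle:
        for line in handle:
            done_queries.add(line.split("\t", 1)[0])

with open(new_hits) as handle, open(allreads, "a") as out:
    for line in handle:
        if line.split("\t", 1)[0] not in done_queries:
            out.write(line)
```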
I am putting this in my list of things to look at.
Best,
J

@fpusan added the "bug" (Something isn't working), "SQM_reads" and "sqm_longreads" labels on Jan 16, 2024