sqm_longreads yields wrong mcount table #686

Open
jllavin77 opened this issue May 19, 2023 · 10 comments
Labels: bug, sqm_longreads, SQM_reads

@jllavin77

Dear developers,

I have run sqm_longreads on 24 ONT samples, and the .out.allreads.mcount file is not correct. Every sample's "reads"-related columns are either empty or filled with zeros, even though the "ORF" columns are correct and account for the ORFs detected in each read.
Here is an excerpt of the table to illustrate the problem:

[screenshot: mcount table excerpt showing empty/zero "reads" columns]

I tried the same analysis with 1 sample and with 9 samples, and in both cases the .out.allreads.mcount file was completely correct.

The command I ran was (I'm using the latest available version of SqueezeMeta, 1.6.2):

sqm_longreads.pl -p PROJECT -s list.txt -f /storage/SQM/PROJECT/ --euk -t 16

Is there any problem if I analyze more than 10 samples in the same batch (when it comes to parsing each sample's results and generating the full table)?

By the way, I can't import the results into SQMtools: I receive an error during results parsing, which I presume is caused by the malformed .mcount table.
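In case it helps, this is roughly how I checked which samples are affected (just a rough sketch; I'm assuming the mcount file is tab-separated with a header row and that the per-sample read-count columns have "reads" in their names, which may not match the real layout exactly):

```python
# Rough sketch: flag samples whose "reads" columns in the mcount table are
# empty or all zeros. Assumes a tab-separated file with a header row and
# per-sample columns whose names contain "reads" (an assumption about the
# actual column naming). The file name is a placeholder.
import csv

with open("PROJECT.out.allreads.mcount") as handle:
    reader = csv.DictReader(handle, delimiter="\t")
    rows = list(reader)
    read_cols = [c for c in reader.fieldnames if "reads" in c.lower()]

for col in read_cols:
    values = [(row[col] or "").strip() for row in rows]
    if all(v in ("", "0") for v in values):
        print(f"Column looks empty or all zeros: {col}")
```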

I would appreciate it if you could help me fix this problem due to its urgency.

Thanks in advance & Best wishes

JL

@jllavin77
Author

Any ideas on this issue?

@fpusan
Collaborator

fpusan commented Jun 5, 2023

Hi,
Sorry for the delay!
The error in sqmreads2tables.py will indeed come from the malformed mcount table, but I am not sure what is causing that in the first place.
Can you try running it without the --euk flag and see if the error persists?

@jllavin77
Author

Thank you for your answer!

I will try to run it that way as soon as I have enough disk space available for the run.

The thing is, this didn't happen with the "shorter" (fewer than 10 samples) runs...

@fpusan
Collaborator

fpusan commented Jun 6, 2023

This is indeed weird. Were the shorter runs also using the same parameters?

@jllavin77
Author

Yep, only changed paths/filenames in the run.

@jtamames
Owner

Hello!
Sorry for the delay. Could you please run the "tree" command on your project directory? Sometimes the results are produced but the tables are not generated correctly; I want to know if this is the case.
Best,

@jllavin77
Author

jllavin77 commented Jun 24, 2023

Hello everybody,

Sorry for the delay, but I was following your suggestions before answering.

  1. I ran the project without the --euk flag, as suggested, and the results table is still malformed.

  2. Find the output of the tree command attached (I misread your comment the other day):
    Tree_SQM.txt

I hope you can spot the problem, because I have already run the project twice (8 full days each time) and I'm getting a little concerned about this issue.

Thanks in advance

JL

@jllavin77
Author

jllavin77 commented Jul 5, 2023

Any insights after looking at the contents of the results directory?

I've run the test_install.pl script and all checks pass. I've tried running it on only 1 and 5 samples, and this time the .mcount file is incorrect in both cases... The only thing I can think of is that I updated SqueezeMeta to version 1.6 before running this batch of samples...
Has anyone experienced this issue too?

@jllavin77
Author

jllavin77 commented Jul 27, 2023

I finally found out what the problem is.
To sum up quickly: every time sqm_longreads stops in the middle of a run for any reason (e.g. a power outage), restarting the run resets the *.out.allreads file to 0 KB, losing any information about each sample's reads stored there up to that point. That is why the reads are lost and the corresponding columns appear empty for those samples.
Another thing I tried was rerunning the script on the previous results to recover the hits information, but that doesn't work either. It recognizes the previous results and gives you the option to overwrite or keep them, but if you keep them, the reads-related information is still missing.
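As a stopgap on my side, I'm now backing up the partial Diamond output before restarting, with something like this (a rough sketch; the glob pattern is only my guess at where the per-sample *.out.allreads files live inside the project directory):

```python
# Rough sketch: copy any non-empty *.out.allreads files to *.bak before
# restarting, so the partial hits are not reset to 0 KB by the restart.
# The project path and glob pattern are assumptions about the layout.
import glob
import os
import shutil

project_dir = "/storage/SQM/PROJECT"  # adjust to your project path
pattern = os.path.join(project_dir, "**", "*.out.allreads")
for path in glob.glob(pattern, recursive=True):
    if os.path.getsize(path) > 0:
        shutil.copy2(path, path + ".bak")
        print(f"Backed up {path}")
```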

Is this something that you can fix, or should we (the users) accept that sqm_longreads cannot benefit from your awesome "restart" feature?

Thanks in advance & Best wishes

JL

@jtamames
Owner

Thanks for the insight!
I guess we could read the allreads file if it exists and has content, record the last processed query sequence, restart with the remaining queries, and merge the results. Not immediate, but it could be done.
However, Diamond, which generates that file, often takes a long time to write its first results to it. I'm not sure whether that is customizable, or what the impact on performance would be.
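Roughly, the resume-and-merge step would look something like this (just a sketch of the idea, not the actual implementation; it assumes the allreads file is Diamond tabular output with the query ID in the first column, and the file names are placeholders):

```python
# Sketch of the resume idea: collect the query IDs already present in the
# existing (partial) allreads file, then append only the hits for queries
# that were not processed before, instead of overwriting the file.
import os

allreads = "PROJECT.out.allreads"           # existing partial Diamond output
new_hits = "PROJECT.out.allreads.restart"   # hits produced by the restarted run

done_queries = set()
if os.path.exists(allreads) and os.path.getsize(allreads) > 0:
    with open(allreads) as handle:
        for line in handle:
            done_queries.add(line.split("\t", 1)[0])

with open(new_hits) as handle, open(allreads, "a") as out:
    for line in handle:
        if line.split("\t", 1)[0] not in done_queries:
            out.write(line)
```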
I am putting this in my list of things to look at.
Best,
J

@fpusan added the "bug" (Something isn't working), "SQM_reads" and "sqm_longreads" labels on Jan 16, 2024