"OperationalError: attempt to write a readonly database" and others #301

Open
MafaldaSFerreira opened this issue Jul 4, 2020 · 4 comments

@MafaldaSFerreira

Hello Diogo,

I am using TriSeq to filter positions with missing data from different types of alignments (whole chromosome to smaller alignments). I am getting an error particularly when I try to run TriSeq on whole chromosomes. TriSeq starts, but it ends up failing.

At first I was using GNU parallel to run TriSeq on 10 chromosomes at the same time, but I got errors. Since I thought parallel could be the issue, I copied and pasted a TriSeq command for each chromosome into four different bash scripts, so that each script would run TriSeq for 5 different chromosomes at the same time. The same error happens. This is the command I am using:

TriSeq -in chr_2_full_algn.fa -of fasta -o chr_2_noMiss_TriSeq --upper-case --missing-filter 0 0

And I get the following error:

( 0 of 1 ) |                                      |N/A% [Elapsed Time: 0:00:00] Traceback (most recent call last):
  File "/home/mf239/.local/lib/python2.7/site-packages/trifusion/process/base.py", line 142, in __call__
    self.func(*args)
  File "/home/mf239/.local/lib/python2.7/site-packages/trifusion/TriSeq.py", line 239, in main_parser
    pbar=pbar, use_main_table=True)
  File "/home/mf239/.local/lib/python2.7/site-packages/trifusion/process/sequence.py", line 4767, in filter_missing_data
    self._filter_terminals(table_in, table_out, ns)
  File "/home/mf239/.local/lib/python2.7/site-packages/trifusion/process/sequence.py", line 4558, in _filter_terminals
    self._create_table(temp_table)
  File "/home/mf239/.local/lib/python2.7/site-packages/trifusion/process/sequence.py", line 3292, in _create_table
    "{})".format(table_name, cols))
OperationalError: attempt to write a readonly database
Executing TriSeq module at 03/07/2020 11:06:44
Parsing 1 alignments
Filtering by missing data
Program exited with errors!

I tried googling this error but found little help. I then thought this could be caused by running TriSeq with parallel, or in parallel for several chromosomes at the same time (maybe memory issues?). I tried running one chromosome at a time. This is what I'm getting:

( 0 of 1 ) |                                      |N/A% [Elapsed Time: 0:00:00] Traceback (most recent call last):
  File "/home/mf239/.local/lib/python2.7/site-packages/trifusion/process/base.py", line 142, in __call__
    self.func(*args)
  File "/home/mf239/.local/lib/python2.7/site-packages/trifusion/TriSeq.py", line 239, in main_parser
    pbar=pbar, use_main_table=True)
  File "/home/mf239/.local/lib/python2.7/site-packages/trifusion/process/sequence.py", line 4770, in filter_missing_data
    table_in, table_out, ns)
  File "/home/mf239/.local/lib/python2.7/site-packages/trifusion/process/sequence.py", line 4632, in _filter_columns
    for p, (column, aln_idx) in enumerate(self.iter_columns(table_in)):
  File "/home/mf239/.local/lib/python2.7/site-packages/trifusion/process/sequence.py", line 3235, in iter_columns
    idx=group_idx))):
OperationalError: database or disk is full

Which is... odd. In the server partition I'm currently working on, we have 604T of space available. My home folder (the one in the error messages - /home/mf239/) has 180T of space available. Could this be related to some other place in the server where TriSeq is trying to write to? For example, a tmp folder? We do have limited space on the default tmp folder, and usually I need to specify a tmp folder on my working space. However, reading the TriSeq options it doesn't seem I can specify this in the command.

Anyway, do you have any idea what could be happening or what I could test to try and solve this?

Thank you very much,
Mafalda

@MafaldaSFerreira
Author

MafaldaSFerreira commented Jul 4, 2020

Hi again,

In the meantime other TriSeq commands that I was running on smaller alignments (50 kb alignments from across the genome) also started to fail with a different (but possibly related?) error. I'm going to post this error here as well in case it is somehow related:

Executing TriSeq module at 03/07/2020 11:03:34
Parsing 1 alignments
Traceback (most recent call last):
  File "/home/mf239/.local/lib/python2.7/site-packages/trifusion/process/base.py", line 142, in __call__
    self.func(*args)
  File "/home/mf239/.local/lib/python2.7/site-packages/trifusion/TriSeq.py", line 146, in main_parser
    pbar=pbar)
  File "/home/mf2396/.local/lib/python2.7/site-packages/trifusion/process/sequence.py", line 2995, in __init__
    index=("main_idx", "aln_idx"))
  File "/home/mf239/.local/lib/python2.7/site-packages/trifusion/process/sequence.py", line 3292, in _create_table
    "{})".format(table_name, cols))
OperationalError: database is locked
Traceback (most recent call last):
  File "/home/mf239/.local/bin/TriSeq", line 8, in <module>
    sys.exit(main())
  File "/home/mf239/.local/lib/python2.7/site-packages/trifusion/TriSeq.py", line 510, in main
    main_parser(arguments, arguments.infile)
  File "/home/mf239.local/lib/python2.7/site-packages/trifusion/process/base.py", line 166, in __call__
    shutil.rmtree(self.temp_dir)
  File "/home/mf239/anaconda3/envs/GG/lib/python2.7/shutil.py", line 279, in rmtree
    onerror(os.rmdir, path, sys.exc_info())
  File "/home/mf239/anaconda3/envs/GG/lib/python2.7/shutil.py", line 277, in rmtree
    os.rmdir(path)
OSError: [Errno 2] No such file or directory: '.trifusion-temp' 

Again, thank you!
Mafalda
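
For context, `OperationalError: database is locked` is the message Python's `sqlite3` module raises when one connection tries to write while another connection holds the write lock on the same database file. The sketch below reproduces that situation with two connections in one process. It is only an illustration of how the message can arise, not a confirmed diagnosis of the TriSeq failure; the file name `shared.db` and the table names are hypothetical.

```python
import os
import shutil
import sqlite3
import tempfile


def demo_locked_database():
    """Return the error text SQLite gives a second writer while a first
    connection holds the write lock on the same database file."""
    workdir = tempfile.mkdtemp()
    db_path = os.path.join(workdir, "shared.db")  # hypothetical file name

    # isolation_level=None disables implicit transactions, so the
    # BEGIN below is the only transaction in play.
    first = sqlite3.connect(db_path, isolation_level=None)
    # timeout=0 makes the second connection fail at once instead of waiting.
    second = sqlite3.connect(db_path, timeout=0, isolation_level=None)
    try:
        # The first connection takes the write lock and keeps it.
        first.execute("BEGIN IMMEDIATE")
        first.execute("CREATE TABLE main_table (id INTEGER)")
        try:
            # The second connection now tries to write to the same file.
            second.execute("CREATE TABLE temp_table (id INTEGER)")
        except sqlite3.OperationalError as err:
            return str(err)
        return None
    finally:
        second.close()
        first.close()  # closing discards the open transaction
        shutil.rmtree(workdir, ignore_errors=True)


if __name__ == "__main__":
    print(demo_locked_database())
```

If several processes are pointed at one database file, any of them can end up in the position of the second connection here.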

@fernandoblalves
Collaborator

Dear Mafalda,

Let's try to cover some basic possible errors first.
TriFusion uses ~/.trifusion to store some files. Do you have enough space in your home folder to accommodate temporary files and databases?
Also, it is possible that SQLite uses /tmp in its internal operations. Is there enough space in there as well?

Best

@MafaldaSFerreira
Author

Thank you for responding.

I cannot find a ~/.trifusion/ directory in ~/. I installed TriFusion with pip2 since the default python on the server is python3. So, the directory I can find with the same name is under ~/.local/lib/python2.7/site-packages/trifusion. These are its contents:

ls .local/lib/python2.7/site-packages/trifusion
app.py   benchmarks_mem.py   benchmarks.pyc  __init__.pyc          orthomcl_pipeline.pyc  trifusion.kv   TriOrtho.py   TriSeq.pyc
app.pyc  benchmarks_mem.pyc  data            ortho                 process                TriFusion.py   TriOrtho.pyc  TriStats.py
base     benchmarks.py       __init__.py     orthomcl_pipeline.py  progressbar            TriFusion.pyc  TriSeq.py     TriStats.pyc

Would this be the correct directory? If so, there should be enough space (180T) to write here, since this is within my home folder.

As for the /tmp folder, I don't have permission to check how much available space there is in it, but this could be the problem. We recurrently run out of space in the /tmp folder, since it's shared by everyone. To solve this issue before, I've specified a tmp folder in my home or scratch every time the program allows me to do so (with a flag, for example). I couldn't find that option in the list of flags when I call TriSeq, but is there a possibility to specify the path to a temporary folder? I run my jobs on a server managed by Slurm, and it seems that I could specify a $TMPDIR variable to force programs to write somewhere other than /tmp (example). Do you think this could work for TriSeq?

Thanks,
Mafalda
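
A minimal sketch of the `$TMPDIR` idea raised above, assuming a Unix system: on Unix, SQLite consults the `SQLITE_TMPDIR` and then the `TMPDIR` environment variables when choosing where to put its temporary files, so both are set for the child process. Whether this resolves the TriSeq error is not confirmed in this thread, and it does not relocate `.trifusion-temp` or `~/.trifusion`. The helper names `env_with_tmpdir` and `run_with_tmpdir` and the scratch path in the usage note are hypothetical.

```python
import os
import subprocess


def env_with_tmpdir(tmp_dir, base_env=None):
    """Return a copy of the environment with the temp-dir variables set."""
    env = dict(os.environ if base_env is None else base_env)
    env["SQLITE_TMPDIR"] = tmp_dir  # checked first by SQLite on Unix
    env["TMPDIR"] = tmp_dir         # checked next; also used by many tools
    return env


def run_with_tmpdir(command, tmp_dir):
    """Create tmp_dir if needed, run command with it as the temp dir,
    and return the command's exit code."""
    os.makedirs(tmp_dir, exist_ok=True)
    return subprocess.call(command, env=env_with_tmpdir(tmp_dir))
```

For example, `run_with_tmpdir(["TriSeq", "-in", "chr_2_full_algn.fa", "-of", "fasta", "-o", "chr_2_noMiss_TriSeq", "--upper-case", "--missing-filter", "0", "0"], "/path/to/scratch/tmp")` would run the command from the first post with a temp directory of your choosing.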

@ODiogoSilva
Owner

Hi Mafalda,

I'll try to address all the issues you raised. The first one, the error OperationalError: attempt to write a readonly database, occurs because TriFusion programs were not designed to be run in parallel. TriFusion relies on an sqlite database internally and expects to be the only process accessing that database (hence the readonly database error). Unfortunately, there is no workaround at the moment, which means that multiple runs of TriSeq/TriStats must be run sequentially.
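
As a rough sketch of what running sequentially could look like in practice, the snippet below builds one TriSeq command per chromosome and only starts the next one after the previous one has exited. The flags are copied from the command in the first post; the helper names `triseq_command` and `run_sequentially`, and the assumption that every chromosome follows the `chr_N_full_algn.fa` naming of the `chr_2` example, are illustrative only.

```python
import subprocess


def triseq_command(chrom):
    """Build the TriSeq command from the first post for one chromosome.

    Assumes input files are named chr_<chrom>_full_algn.fa.
    """
    prefix = "chr_{}".format(chrom)
    return [
        "TriSeq",
        "-in", prefix + "_full_algn.fa",
        "-of", "fasta",
        "-o", prefix + "_noMiss_TriSeq",
        "--upper-case",
        "--missing-filter", "0", "0",
    ]


def run_sequentially(commands, runner=subprocess.call):
    """Run each command to completion before starting the next one.

    Returns a list of (command, exit_code) pairs in execution order.
    """
    results = []
    for command in commands:
        # subprocess.call blocks until the process has exited, so no two
        # TriSeq processes are alive at the same time.
        results.append((command, runner(command)))
    return results
```

Calling `run_sequentially([triseq_command(c) for c in range(1, 11)])` would then process chromosomes 1 to 10 one after another; the returned exit codes make it easy to see which runs failed.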

Regarding the OperationalError: database or disk is full error, that's very odd indeed. As @fernandoblalves mentioned, it may have to do with the location of the sqlite database (and possibly some internal usage of TMP by sqlite). TriFusion will store the database locally in ${HOME}/.trifusion. I'm not sure what is set as $HOME on your cluster, but if that variable is set, then it should be there (unless something went very wrong in the previous execution). There is also the possibility that sqlite is using temporary data internally in a location where it runs out of space. According to the sqlite documentation, these are the possible locations: https://sqlite.org/tempfiles.html#tempdir (under Temporary File Storage Locations). Are you able to check if you have access to those locations and if there is enough space in them?
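
One way to run that check is sketched below, assuming a Unix system and Python 3 (for `shutil.disk_usage`). The directory list follows the Unix search order given on the linked SQLite page (`SQLITE_TMPDIR`, `TMPDIR`, `/var/tmp`, `/usr/tmp`, `/tmp`, then the current working directory), with `~/.trifusion` added at the end because it is mentioned in this thread. The function names `candidate_dirs` and `report_space` are hypothetical.

```python
import os
import shutil


def candidate_dirs(environ=None):
    """List the directories worth checking, in SQLite's Unix search order."""
    environ = os.environ if environ is None else environ
    dirs = [environ[var] for var in ("SQLITE_TMPDIR", "TMPDIR")
            if environ.get(var)]
    dirs += ["/var/tmp", "/usr/tmp", "/tmp", os.getcwd()]
    # Not part of SQLite's list: the TriFusion directory named in this thread.
    dirs.append(os.path.join(os.path.expanduser("~"), ".trifusion"))
    return dirs


def report_space(paths):
    """Return (path, free_bytes, writable) for each path.

    free_bytes is None when the path is missing or cannot be inspected.
    """
    rows = []
    for path in paths:
        try:
            free = shutil.disk_usage(path).free
        except OSError:
            rows.append((path, None, False))
            continue
        rows.append((path, free, os.access(path, os.W_OK)))
    return rows


if __name__ == "__main__":
    for path, free, writable in report_space(candidate_dirs()):
        size = "n/a" if free is None else "{:.1f} GiB".format(free / 2.0 ** 30)
        print("{:<40} free: {:>12}  writable: {}".format(path, size, writable))
```

A directory that shows little free space, or that is reported as not writable, would be a candidate explanation for the error.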
