
Rewrite headers when write mode is append? #21

Closed
xizhihui opened this issue Jun 28, 2021 · 9 comments


@xizhihui

Hi, it seems that the headers are written again each time I append to an existing SAM file, when the contents are not appended in a single pass (the file is opened and closed multiple times). Or in other words, does simplesam only support write, not append?

@mdshw5
Owner

mdshw5 commented Jun 28, 2021

Or in other words, does simplesam only support write, not append?

Thanks for asking this question. I only considered file objects opened in w mode, and the source does not check the file mode at all. If you could describe your use case a bit, maybe we can support this. One issue I can see is that, when appending to an existing SAM file, the existing SAM headers would be immutable, since there is no way to update just the header portion of an existing file.

@xizhihui
Author

Thanks for your reply!
I just want to split reads into single cells identified by the cell barcodes in tag "CR" or "CB". (I found that hard to do; maybe there are other, more convenient ways.) So it is not a common use case.

@mdshw5
Owner

mdshw5 commented Jun 28, 2021

Ah, so you would need to either keep ~10,000 file handles open or open and append to an existing file. I think that is a reasonable use case, and I'll take a look to see how that could be implemented here.
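
To make the open-and-append option concrete, here is a minimal sketch in plain Python, without simplesam. It assumes uncompressed SAM text whose lines keep their trailing newline, barcodes stored as `CB:Z:` tags (falling back to `CR:Z:`), and one output file per barcode. The names `barcode_of`, `split_by_barcode` and `flush_every` are hypothetical. Records are buffered per barcode and flushed in batches, so only one output file is open at any time, and the header is written only when a file is first created.

```python
import os
from collections import defaultdict


def barcode_of(sam_line, tags=("CB", "CR")):
    """Return the first cell barcode found in the optional fields, or None."""
    fields = sam_line.rstrip("\n").split("\t")
    found = {}
    # Optional TAG:TYPE:VALUE fields start at column 12 of a SAM record.
    for field in fields[11:]:
        parts = field.split(":", 2)
        if len(parts) == 3:
            found[parts[0]] = parts[2]
    for tag in tags:
        if tag in found:
            return found[tag]
    return None


def split_by_barcode(sam_lines, out_dir, flush_every=100000):
    """Append each record to <out_dir>/<barcode>.sam, one open file at a time."""
    header = []
    buffers = defaultdict(list)
    buffered = 0

    def flush():
        for barcode, records in buffers.items():
            path = os.path.join(out_dir, barcode + ".sam")
            is_new = not os.path.exists(path) or os.path.getsize(path) == 0
            with open(path, "a") as out:
                if is_new:
                    out.writelines(header)  # header only for a new file
                out.writelines(records)
        buffers.clear()

    for line in sam_lines:
        if line.startswith("@"):
            header.append(line)
            continue
        barcode = barcode_of(line)
        if barcode is None:
            continue  # reads without a barcode tag are skipped in this sketch
        buffers[barcode].append(line)
        buffered += 1
        if buffered >= flush_every:
            flush()
            buffered = 0
    flush()
```

A larger `flush_every` trades memory for fewer open/close cycles per output file.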

@xizhihui
Author

so you would need to either keep ~10,000 file handles open or open and append to an existing file

Yeah, it troubled me for hours 😅.
I just found some code in a bamtools issue: herrinca sorts the BAM first to avoid this. Hope it helps.
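
One possible reading of the sort-first idea, sketched in plain Python: if the records are ordered by barcode, each output file is opened exactly once, so neither thousands of open handles nor append mode is needed. This assumes the sort key is the `CB:Z:` tag and that all records fit in memory; the code in the bamtools issue may well sort differently. `cb_tag` and `write_sorted_groups` are hypothetical names.

```python
import os
from itertools import groupby


def cb_tag(sam_line):
    """Return the CB:Z: value of a SAM record, or "" if the tag is absent."""
    for field in sam_line.rstrip("\n").split("\t")[11:]:
        if field.startswith("CB:Z:"):
            return field[5:]
    return ""


def write_sorted_groups(header_lines, records, out_dir):
    """Write one SAM file per barcode after sorting records by barcode."""
    # Sorting puts all reads of one cell next to each other.
    ordered = sorted(records, key=cb_tag)
    written = []
    for barcode, group in groupby(ordered, key=cb_tag):
        if not barcode:
            continue  # reads without a barcode are skipped in this sketch
        path = os.path.join(out_dir, barcode + ".sam")
        with open(path, "w") as out:  # plain write mode is enough here
            out.writelines(header_lines)
            out.writelines(group)
        written.append(path)
    return written
```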

@mdshw5
Owner

mdshw5 commented Jun 28, 2021

@xizhihui I've added some initial support for appending to existing SAM files using simplesam.Writer. If you pass an open file with mode="a", then the Writer will only add a header if we determine that the SAM file is empty (new). In all other cases we just ignore the header and continue appending records to the existing file.

simplesam/simplesam.py

Lines 223 to 242 in 8e914ce

class Writer(object):
    """ Write SAM/BAM format file from :class:`.Sam` objects. """
    def __init__(self, f, header=None):
        try:
            _, ext = os.path.splitext(f.name)
            if ext == '.bam':
                # Why not just pipe to samtools?
                raise NotImplementedError("Bam writing support is not implemented.\n")
        except AttributeError: # pipe?
            pass
        self.file = f
        if self.file.mode == 'a' and self.file.tell() == 0:
            # We're appending to an empty file. Assume we need a header.
            self._merge_header(header)
            self._header_dict_write()
        elif self.file.mode == 'a' and self.file.tell() > 0 and header:
            raise NotImplementedError("Updating headers on existing SAM files is not supported.\n")
        else:
            self._merge_header(header)
            self._header_dict_write()

@mdshw5 mdshw5 reopened this Jun 28, 2021
@mdshw5
Owner

mdshw5 commented Jun 28, 2021

If you want to test this out, you can install from the develop branch using pip install -e "git+https://github.com/mdshw5/simplesam.git@develop#egg=simplesam".

@xizhihui
Author

Thanks for the update.

I've tested it. In this check, if the mode is append, self.file.tell() > 0, and no new header is provided, execution falls through to the else branch, which still writes a header (with header set to None).
I modified it as follows and it works well:

        if self.file.mode == 'a' and self.file.tell() == 0:
            # We're appending to an empty file. Assume we need a header.
            self._merge_header(header)
            self._header_dict_write()
        elif self.file.mode == 'a' and self.file.tell() > 0:
            if header:
                raise NotImplementedError("Updating headers on existing SAM files is not supported.\n")
        else:
            self._merge_header(header)
            self._header_dict_write()
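
The corrected branching can be exercised in isolation with a small stand-in class. This is not simplesam's `Writer`: `AppendAwareWriter` is a hypothetical name, the header is assumed to be a plain string with a made-up default, and the mode test (`"a" in mode`) and the seek to the end before `tell()` are precautions added for this sketch. It only illustrates the three cases discussed above.

```python
import os


class AppendAwareWriter:
    """Hypothetical stand-in that mirrors the header logic discussed here."""

    def __init__(self, f, header=None):
        self.file = f
        appending = "a" in str(getattr(f, "mode", "w"))
        if appending:
            f.seek(0, os.SEEK_END)  # make sure tell() reports the end offset
        if appending and f.tell() == 0:
            # Appending to an empty file: it still needs a header.
            self._write_header(header)
        elif appending:
            # Existing content: never write a header, reject a new one.
            if header:
                raise NotImplementedError(
                    "Updating headers on existing SAM files is not supported.")
        else:
            self._write_header(header)

    def _write_header(self, header):
        # Assumption: a fixed default line stands in for the library's merged header.
        self.file.write(header or "@HD\tVN:1.0\tSO:unknown\n")

    def write(self, record_line):
        self.file.write(record_line)
```

Reopening the same path in append mode several times should then leave one header at the top, followed by every appended record.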

mdshw5 added a commit that referenced this issue Jun 29, 2021
@mdshw5
Owner

mdshw5 commented Jun 29, 2021

Thanks @xizhihui. I've incorporated that change and will push this as a new point release.

mdshw5 added a commit that referenced this issue Jun 29, 2021
Add support for appending to existing SAM files (#21).
@mdshw5
Owner

mdshw5 commented Jun 29, 2021

Closing this as the change is now in v0.1.4.0. Please let me know if there's anything else I can do to make your use case work better!
