FileDataSource is in SAMRecord #8671

Open
1 task done
LeeTL1220 opened this issue Jan 31, 2024 · 1 comment
@LeeTL1220
Contributor

Bug Report

Affected tool(s) or class(es)

SAMRecord from GATKRead

Affected version(s)

  • Latest master branch as of January 30, 2024

Description

When I run a tool with a BAM file as input, the following code prints null for every read:

    @Override
    public void apply(GATKRead read, ReferenceContext referenceContext, FeatureContext featureContext) {

        // Goal: build sets of read IDs for each file, so look up each read's source file.
        final SAMRecord samRecord = read.convertToSAMRecord(getHeaderForReads());
        final SAMFileSource fileSource = samRecord.getFileSource();
        System.out.println(fileSource);  // prints "null" for every read
    }

Output:
(a long list of null)

Steps to reproduce

Create a ReadWalker that takes in a bam file. Here is an integration test that will replicate the issue:

public class ReadConcordanceIntegrationTest extends CommandLineProgramTest {

    @Test
    public void testTwoCrams() throws IOException {
        final File output = createTempFile("testReadConcordanceOutputFile", ".txt");
        final File input = new File(GATKBaseTest.largeFileTestDir, "expected.K-562.splitNCigarReads.chr20.bam");

        final ArgumentsBuilder args = new ArgumentsBuilder();

        args.addInput(input);
        this.runCommandLine(args.getArgsArray());
    }
}

Expected behavior

Output should be the file used in the read data source (bam file) for each read.

Actual behavior

`samRecord.getFileSource()` returns null for every read.

@droazen
Collaborator

droazen commented Jan 31, 2024

We should expose a getSource() method at the GATKRead level, and have GATK delegate to samRecord.getFileSource() in the SAMRecord case. We might need to do something extra to get HTSJDK to populate this field for us.
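
A minimal sketch of what that delegation could look like. Everything here is a stand-in written for illustration: `RecordStandIn` plays the role of `SAMRecord`, `ReadStandIn` plays the role of `GATKRead`, and the `getSource()` signature (returning an `Optional<String>`) is an assumption, not an existing GATK or HTSJDK API.

```java
import java.util.Optional;

public class ReadSourceSketch {

    /** Stand-in for a SAMRecord; only the name and file source are modeled. */
    public static final class RecordStandIn {
        private final String name;
        private final String fileSource; // null when the reader never populated it

        public RecordStandIn(String name, String fileSource) {
            this.name = name;
            this.fileSource = fileSource;
        }

        public String getName() { return name; }
        public String getFileSource() { return fileSource; }
    }

    /** Stand-in for GATKRead with the proposed (hypothetical) getSource() method. */
    public interface ReadStandIn {
        String getName();
        Optional<String> getSource();
    }

    /** Adapter that delegates getSource() to the wrapped record. */
    public static final class RecordBackedRead implements ReadStandIn {
        private final RecordStandIn record;

        public RecordBackedRead(RecordStandIn record) {
            this.record = record;
        }

        @Override
        public String getName() { return record.getName(); }

        @Override
        public Optional<String> getSource() {
            // An empty Optional makes "source was never populated" explicit
            // instead of handing callers a bare null.
            return Optional.ofNullable(record.getFileSource());
        }
    }

    public static void main(String[] args) {
        ReadStandIn tagged = new RecordBackedRead(new RecordStandIn("read1", "a.bam"));
        ReadStandIn untagged = new RecordBackedRead(new RecordStandIn("read2", null));
        System.out.println(tagged.getName() + " -> " + tagged.getSource().orElse("<unknown>"));
        System.out.println(untagged.getName() + " -> " + untagged.getSource().orElse("<unknown>"));
    }
}
```

Returning an `Optional` is one possible design choice; it would let read implementations that have no file source answer without special-casing null.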

The use case is a ReadsDataSource backed by multiple BAM/CRAM inputs, with the reads merged into a single sorted stream, where the tool needs to tell which input each read came from. Read groups / sample names can't always be used for this.
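
To make the use case concrete, here is a rough sketch of a k-way merge in which each read keeps a tag naming its input. `TaggedRead`, `Cursor`, and `merge` are hypothetical names invented for this illustration, reads are ordered by start position only, and none of this reflects how ReadsDataSource is actually implemented.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.Iterator;
import java.util.List;
import java.util.PriorityQueue;

public class MergedSourceSketch {

    /** Hypothetical read stand-in: a name, a start position, and the input it came from. */
    public static final class TaggedRead {
        public final String name;
        public final int start;
        public final String source;

        public TaggedRead(String name, int start, String source) {
            this.name = name;
            this.start = start;
            this.source = source;
        }
    }

    /** Cursor over one input: its current head read plus the remaining reads. */
    private static final class Cursor {
        TaggedRead head;
        final Iterator<TaggedRead> rest;

        Cursor(Iterator<TaggedRead> rest) {
            this.rest = rest;
            this.head = rest.next();
        }
    }

    /** Merges inputs that are each sorted by start into one sorted list. */
    public static List<TaggedRead> merge(List<List<TaggedRead>> inputs) {
        PriorityQueue<Cursor> queue =
                new PriorityQueue<>(Comparator.comparingInt((Cursor c) -> c.head.start));
        for (List<TaggedRead> input : inputs) {
            if (!input.isEmpty()) {
                queue.add(new Cursor(input.iterator()));
            }
        }
        List<TaggedRead> merged = new ArrayList<>();
        while (!queue.isEmpty()) {
            Cursor cursor = queue.poll();      // input with the smallest head
            merged.add(cursor.head);           // the source tag travels with the read
            if (cursor.rest.hasNext()) {
                cursor.head = cursor.rest.next();
                queue.add(cursor);             // re-insert with its new head
            }
        }
        return merged;
    }

    public static void main(String[] args) {
        List<TaggedRead> a = Arrays.asList(
                new TaggedRead("r1", 100, "a.bam"),
                new TaggedRead("r3", 300, "a.bam"));
        List<TaggedRead> b = Arrays.asList(
                new TaggedRead("r2", 200, "b.cram"));
        for (TaggedRead read : merge(Arrays.asList(a, b))) {
            System.out.println(read.start + "\t" + read.name + "\t" + read.source);
        }
    }
}
```

Once the streams are merged, the per-read tag is the only thing that identifies the origin, which is why it has to be attached before (or during) the merge.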
