Efficiently extracting counts from a subset of kmers? #36

Closed
nikostr opened this issue May 7, 2024 · 1 comment

nikostr commented May 7, 2024

I have run kmdiff and identified overrepresented kmers between two groups. I then created a membership matrix to find the kmers present in all my case samples, and intersected these with the overrepresented kmers identified by kmdiff. Now I would like to get the counts of these kmers in each of my case samples. I already have the count matrices produced by kmdiff. Dumping them to text and grepping them is one way to do it, but clearly not a very efficient one. What would you recommend? Unfortunately my C++ is terrible.

nikostr commented May 30, 2024

I posted this question before I understood the merge and aggregate commands. In case someone else has the same issue, I solved it as follows:

```shell
kmtricks merge \
    --recurrence-min $N_CASES \
    --cpr \
    --run-dir kmdiff-count \
    --threads 16

kmtricks aggregate \
    --run-dir kmdiff-count \
    --matrix kmer \
    --format text \
    --cpr-in \
    --output count-matrix.out \
    --threads 16
```

The first command builds a matrix of the kmers that occur in at least as many samples as I have cases (N_CASES), and the second command dumps it as a text file. I then grepped count-matrix.out for the list of kmers I had identified previously.
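For a long list of kmers, an alternative to grep is an awk hash lookup that matches exactly on the kmer column and reads the kmer list into memory only once. A minimal sketch, assuming each line of the text matrix is a kmer followed by whitespace-separated counts (check this against your own count-matrix.out); `filter_kmers` is a hypothetical helper name:

```shell
#!/usr/bin/env bash
# Keep only the rows of a text count matrix whose first column (the kmer)
# appears in a list of kmers given one per line.
# Assumption: each matrix row looks like "KMER count_1 count_2 ... count_n".
filter_kmers() {
  local kmer_list="$1" matrix="$2"
  # First file (NR == FNR): store every kmer as a key of the array "keep".
  # Second file: print a row only if its first field is one of those keys.
  awk 'NR == FNR { keep[$1]; next } $1 in keep' "$kmer_list" "$matrix"
}
```

Usage would be something like `filter_kmers selected-kmers.txt count-matrix.out > selected-counts.txt`, where `selected-kmers.txt` is a hypothetical file holding the previously identified kmers.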

Note: using this count matrix it should be possible to find these kmers without creating the membership matrix.
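Following that note, the "present in all case samples" filter can be applied to the count matrix directly, instead of building a separate membership matrix. A minimal sketch, assuming (hypothetically) that the case samples occupy columns 2 to N_CASES+1 of the text matrix; the actual column order depends on the sample order in your kmtricks input, so adjust the range accordingly. `kmers_in_all_cases` is a hypothetical helper name:

```shell
#!/usr/bin/env bash
# Print the kmers whose count is non-zero in every case column.
# Assumption: rows are "KMER count_1 ... count_n" and the case samples
# are columns 2..(n_cases + 1).
kmers_in_all_cases() {
  local n_cases="$1" matrix="$2"
  awk -v n="$n_cases" '{
    # Skip the row as soon as one case sample has a zero count.
    for (i = 2; i <= n + 1; i++) if ($i == 0) next
    print $1
  }' "$matrix"
}
```

The resulting list could then be intersected with the kmdiff kmers, for example with `comm -12` on the two sorted lists.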

nikostr closed this as completed May 30, 2024