Question: Can i build a HMM, where the distribution depends on a value of a specific position #1080

LauraSkak · 2024-02-27T14:18:34Z

Hi

This is not a bug, but a question.
I'm in the search for a software that can help me build a HMM that can identify imprinted regions using data produced with dorado and modkit from Oxford Nanopore Technologies.

For each position and each haplotype i have a count modified, count unmodified and a total valid coverage count. In my opinion, there is three/four states; both haplotypes are modified, both haplotypes are unmodified and one haplotype is modified and the other is not. My theory is that i can build a HMM where the emission probability of a given position depends on the coverage of that position. No sure if i should use a binomial or beta distribution here. Is this possible with pomegranate?

Thanks in advance for any help you can provide... I'm a bit lost as to how i should attack this.

Laura

jmschrei · 2024-02-28T01:11:33Z

Hi Laura

It is possible to use a variety of distributions with pomegranate. Binomial distributions are already implemented (https://github.com/jmschrei/pomegranate/blob/master/pomegranate/distributions/bernoulli.py) but these assume a fixed number of total trials. However, you can easily make your own distributions using the existing ones as a template. You just need to fill in the few functions that are there and they can be dropped into any existing model.

My understanding of your specific problem is that you modified and unmodified counts and want to get a probability for each hidden state given both the ratio of these counts and their total value, because higher total counts means higher accuracy, right? Modeling just the ratios is easy -- you can use a beta distribution since the range is restricted to (0, 1) (you'd have to implement that yourself unfortunately) or a Gaussian distribution if you want to approximate it for now. If you want to include both the ratio and the counts you might be better off using binomial distributions but, as I mentioned above, you'd probably need to modify the code to handle variable numbers of total counts.

Sorry that I couldn't be of more help.

LauraSkak · 2024-02-28T13:14:17Z

Thank you for your quick response!

Yeah, thats the problem, that the probability of modification in each state should be constant but the emission probability should then depend on how many reads are mapped to that position and how many of those reads are classed as modified.

I'm still quite new in the field, so figuring out how i implement my own distributions is a big ask for me atm... I'm currently trying to implement something myself from scratch, so we'll see how that goes.

Again, thanks for the help!

jmschrei · 2024-02-28T17:34:20Z

I understand, good luck!

If you're new to the field I'd suggest starting with a more basic model, like using Gaussian emissions, and seeing how good you can get that model before running into issues. Then, you'd already have a good baseline set up to demonstrate why the more complicated emissions are necessary. Hopefully by that point you'd be more comfortable with the setting and modeling and more able to write your own emissions and test them in a setting you know already works fairly well (by replacing your Gaussian emissions with them).

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Question: Can i build a HMM, where the distribution depends on a value of a specific position #1080

Question: Can i build a HMM, where the distribution depends on a value of a specific position #1080

LauraSkak commented Feb 27, 2024

jmschrei commented Feb 28, 2024

LauraSkak commented Feb 28, 2024

jmschrei commented Feb 28, 2024

Question: Can i build a HMM, where the distribution depends on a value of a specific position #1080

Question: Can i build a HMM, where the distribution depends on a value of a specific position #1080

Comments

LauraSkak commented Feb 27, 2024

jmschrei commented Feb 28, 2024

LauraSkak commented Feb 28, 2024

jmschrei commented Feb 28, 2024