
[Question] HMM Function to calculate joint probability P(O | λ) based on learned HMM λ #1085

mikimiky opened this issue Mar 12, 2024 · 1 comment

mikimiky commented Mar 12, 2024

Hello, I'm using Pomegranate for the first time (and making a GitHub post for the first time!), and I am trying to use Pomegranate to construct a continuous Gaussian mixture HMM to predict stock price returns.

I have finished the EM training of the model on the training data, but I am currently struggling to find the right function of the DenseHMM class to calculate the probability of the next most plausible observation after the training set.

I am using a Maximum a Posteriori (MAP) approximation to find the next best observation, which works as follows:

  1. Let's assume I have stock returns from 10 days, (O(1), O(2), O(3), ..., O(10)). I would like to find the most plausible return on the 11th day, O(11), using the learned HMM model λ.
  2. I come up with discretized values that O(11) can possibly take. For instance, I assume that O(11) will be in the range [-0.1, 0.1]. I discretize it with a step of 0.005, so the possible values of O(11) are -0.1, -0.095, -0.090, -0.085, ..., -0.005, 0, 0.005, ..., 0.095, 0.1.
  3. I iterate through all the above possible values of O(11)
  4. I note which of the possible O(11) values maximises the joint probability P( O(1), O(2), O(3), ..., O(10), O(11) | λ )

(Theoretically, $P(O \mid \lambda) = \sum_{i=1}^{n} \alpha_T(i)$, where $n$ is the number of hidden states and $\alpha_T(i)$ is the forward variable of the $i$-th state at time $T$; in my example, $T = 11$.)
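
For concreteness, here is a minimal log-space sketch of the forward recursion I have in mind, in plain NumPy (not Pomegranate); `pi`, `A`, and `log_emissions` are placeholders for the start probabilities, transition matrix, and per-state emission log-densities:

```python
import numpy as np
from scipy.special import logsumexp

def forward_log_likelihood(pi, A, log_emissions):
    """log P(O | lambda) via the forward algorithm.

    pi: (n,) start probabilities, A: (n, n) transition matrix,
    log_emissions: (T, n) array with log b_i(O(t)) in row t.
    """
    log_alpha = np.log(pi) + log_emissions[0]  # alpha_1(i)
    for t in range(1, len(log_emissions)):
        # alpha_t(j) = [ sum_i alpha_{t-1}(i) * A[i, j] ] * b_j(O(t))
        log_alpha = logsumexp(log_alpha[:, None] + np.log(A), axis=0) + log_emissions[t]
    # P(O | lambda) = sum_i alpha_T(i), i.e. a log-sum-exp in log space
    return logsumexp(log_alpha)
```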

However, I am confused as to which function of DenseHMM in Pomegranate reflects this joint probability. Since I just need the observation probability and not the per-state probabilities, predict_proba, predict, and predict_log_proba do not seem relevant.

I then tried forward_backward. I would think that the relevant return value for this probability is the logp tensor. However, it gives me a puzzling result of tensor([19.5827]), which would correspond to an actual probability of exp(19.5827), far greater than 1.

So, three lingering questions for now:

  1. What is the best function of DenseHMM to calculate this probability?
  2. How are the values of probability and log_probability in Pomegranate calculated?
  3. More on HMM theory: how can a log probability be greater than 0, and how should it be interpreted? (Theoretically, log P is upper-bounded by 0.)

Let me know if any further details are needed; thanks in advance!

Edit: I just found out that, when I plug in my learned GMM distributions, GMM.probability(value) gives an emission probability greater than 1, which leads to my forward probabilities being greater than 1. How can this be the case? Is it because probability/log_probability evaluates the PDF at that point?
If this helps: as a test, I also re-initialized the Gaussian mixture obtained from the HMM into a new Gaussian mixture object; model.probability([[-0.01779722]]) still gives a value larger than 1.

e3 is the Gaussian mixture model that I chose to represent state 3's emission distribution.

[screenshot of e3 and its probability output]

jmschrei (Owner) commented

Hi @mikimiky

I don't think that HMMs are usually used for forecasting like this. Rather, their aim is to assign discrete state labels to an observed time series. Your strategy of calculating the joint probability for each potential next observation seems reasonable, though a bit compute-intensive.

(1) I'd use the log_probability method for that. predict_proba and its friends give you the probability of the observation being generated by each hidden state given all the other observations -- not the probability of observing the data. You could follow the same strategy by calculating the probability (or log probability for numerical stability) for each binned hypothetical observation and then softmaxing.
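
As a rough sketch (assuming `model` is your fitted DenseHMM and `observed` is a (10, 1) float tensor of your returns; the variable names here are just placeholders):

```python
import torch

# Candidate values for O(11): -0.1 to 0.1 in steps of 0.005.
candidates = torch.arange(-0.10, 0.10 + 1e-9, 0.005)

logps = []
for c in candidates:
    # Append the hypothetical 11th observation and score the whole sequence.
    # log_probability takes a batch of sequences shaped (n, length, d).
    seq = torch.cat([observed.float(), c.reshape(1, 1)], dim=0)
    logps.append(model.log_probability(seq.unsqueeze(0))[0])

logps = torch.stack(logps)
best_obs = candidates[logps.argmax()]      # MAP-style pick for O(11)
weights = torch.softmax(logps, dim=0)      # relative weight of each candidate
```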

(2/3) Each distribution has its own way of calculating probabilities, but they're fairly standard. I think what's happening in your case is that you are getting a distribution with a tiny covariance matrix. For example, if you have a 1D normal distribution with a mean of 0 and a variance of 0.01, the density at 0 will be much higher than 1, but the integral of the density function will still equal 1. Assuming e3 is a GeneralMixtureModel, you can check this by looking at e3.distributions[0].covs.
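
For intuition, here's the same point with a plain scipy normal (nothing Pomegranate-specific):

```python
from scipy.stats import norm

# A 1D normal with mean 0 and variance 0.01 (std 0.1): the density at the mean
# is about 3.99 > 1, yet the total probability still integrates to 1.
d = norm(loc=0.0, scale=0.1)
print(d.pdf(0.0))              # ~3.9894
print(d.cdf(10) - d.cdf(-10))  # ~1.0
```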
