
[Question] What is the difference between predict_proba and log_probability methods for HMMs #1089

Open
ko62147 opened this issue Apr 1, 2024 · 6 comments


ko62147 commented Apr 1, 2024

Hello,

I fitted an HMM to a set of observation sequences; however, I get positive log-probability values (or probability values greater than 1) when I call the log_probability method on some test observation sequences. What do positive log-probability values mean in the context of HMM inference, and how is the log_probability method different from the predict_proba method?

jmschrei (Owner) commented Apr 2, 2024

predict_proba gives you the posterior probability that each observation aligns to each hidden state in the model, given all of the other observations in the sequence; it is computed with the forward-backward algorithm.
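
To make the difference concrete, here is a minimal sketch (assuming the current v1.x DenseHMM/Normal API; exact argument names and output shapes may differ slightly in your version):

```python
import numpy as np
from pomegranate.distributions import Normal
from pomegranate.hmm import DenseHMM

# Toy data: 10 sequences, each of length 50, with one continuous feature.
X = np.random.randn(10, 50, 1).astype(np.float32)

# Two hidden states with Normal emissions, fit to the sequences.
model = DenseHMM([Normal(), Normal()])
model.fit(X)

# predict_proba: posterior P(state | whole sequence) for every observation,
# computed with forward-backward; roughly shape (10, 50, 2), rows summing to 1.
posteriors = model.predict_proba(X)

# log_probability: one value per sequence, the log of the sequence's density
# under the model; for continuous data it can legitimately be positive.
logp = model.log_probability(X)
```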

log_probability can be positive when you have continuous observations and a distribution with a very small variance. For instance, if you have a normal distribution with a mean of 0 and a std of 0.0001, a value of 0 will have a probability density well above 1, and hence a positive log probability.
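
You can check that with any stats library; for example, with scipy (not pomegranate-specific):

```python
from scipy.stats import norm

# Normal distribution with mean 0 and a very small standard deviation.
tight = norm(loc=0.0, scale=1e-4)

print(tight.pdf(0.0))     # ~3989.4: the density at 0 is far above 1
print(tight.logpdf(0.0))  # ~8.29: so the "log probability" is positive
```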

ko62147 (Author) commented Apr 2, 2024

Thanks for the reply. I am trying to understand the physical meaning of the results from the log_probability method. Does a positive log probability (or a probability greater than 1) for continuous observations mean there is complete certainty (i.e. 100% probability) that the observations/data were generated by the distribution/model?

jmschrei (Owner) commented Apr 2, 2024

I think you're entering one of the confusing areas of probability theory. Basically, just because a point estimate is above 1 doesn't mean that the event is guaranteed to happen. For instance, in my example above, p(0.0001) would be above 1, but so would p(0.00011), and both can't be guaranteed to happen. Instead, people usually look at the probability of an event falling within a range of the distribution and then make that range very small, e.g. F(x+ε) − F(x−ε) for a small ε; that quantity is always at most 1, and dividing it by 2ε recovers the density, which is what can exceed 1. In my experience, the most practical interpretation of densities greater than 1 is that your model has overfit to something.
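
Continuing the scipy illustration from above: the density at a point can be huge, but the probability of landing in any interval around that point is still at most 1:

```python
from scipy.stats import norm

tight = norm(loc=0.0, scale=1e-4)
eps = 1e-5

density = tight.pdf(0.0)                 # ~3989.4, far greater than 1
mass = tight.cdf(eps) - tight.cdf(-eps)  # P(-eps <= X <= eps) ~ 0.08, a genuine probability
print(density, mass, mass / (2 * eps))   # mass / (2 * eps) approximates the density again
```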

ko62147 (Author) commented Apr 3, 2024

Understood. Thanks for the clarification. What do you recommend to reduce overfitting for HMMs?

ko62147 (Author) commented Apr 3, 2024

I am fitting/training HMMs on time series (datetime) data transformed into radial basis functions or sine/cosine vectors and scaled with a min-max scaler (a rough sketch of the sine/cosine encoding is below, after the questions). However, I keep obtaining positive log_probability values for some of the test observation sequences with these transformations. Based on your experience:

  1. What would you recommend to address the positive log_probability values returned for the test observation sequences?
  2. What time series (datetime) transformation would you recommend for datetime observations when fitting an HMM?
  3. What do you recommend to eliminate overfitting in HMMs trained on these (continuous) observations?
  4. Is it viable/reasonable to combine (transformed/preprocessed) datetime and binary features into observation sequences to fit/train an HMM?
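
For reference, this is roughly what the sine/cosine encoding looks like (a simplified sketch with made-up timestamps; the real pipeline also builds the radial basis features):

```python
import numpy as np
import pandas as pd

# Hypothetical hourly timestamps standing in for the real observation times.
timestamps = pd.date_range("2024-01-01", periods=100, freq="H")

# Cyclic encoding of the hour of day: each timestamp becomes a (sin, cos) pair.
hours = timestamps.hour.to_numpy()
sin_hour = np.sin(2 * np.pi * hours / 24)
cos_hour = np.cos(2 * np.pi * hours / 24)

# Min-max scale each feature into [0, 1] before fitting the HMM.
features = np.stack([sin_hour, cos_hour], axis=1)
features = (features - features.min(axis=0)) / (features.max(axis=0) - features.min(axis=0))
```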

jmschrei (Owner) commented Apr 4, 2024

  1. Having positive log probability values isn't a problem that needs fixing. The math is still valid; one just needs to know what the values mean and why they occur.
  2. If you're going to use values explicitly scaled to the 0-1 range, you might want to use a distribution like a Beta (you'd have to implement your own) that is explicitly defined on that range. If you want negative log probabilities and are using a Normal distribution, you might try mean/std scaling instead.
  3. It depends on the model parameters. What does the transition matrix look like? What are the distributions and what do their parameters look like?
  4. Sure, just use https://github.com/jmschrei/pomegranate/blob/master/pomegranate/distributions/independent_components.py (a rough sketch is below). This class lets you pass in one univariate distribution for each feature, and each can be a totally different distribution type. The one catch is that it doesn't learn covariance across features.
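
Roughly like this (a sketch against the current v1.x API; check the linked file for the exact constructor arguments, since this is untested):

```python
import numpy as np
from pomegranate.distributions import Normal, Bernoulli, IndependentComponents
from pomegranate.hmm import DenseHMM

# Toy data: 10 sequences of length 50 with three features per observation --
# two continuous columns (e.g. the sin/cos datetime encoding) and one binary column.
continuous = np.random.randn(10, 50, 2)
binary = np.random.randint(0, 2, size=(10, 50, 1))
X = np.concatenate([continuous, binary], axis=-1).astype(np.float32)

# One univariate distribution per feature; the types can differ, but no
# covariance is learned across features.
def emission():
    return IndependentComponents([Normal(), Normal(), Bernoulli()])

model = DenseHMM([emission(), emission()])  # two hidden states
model.fit(X)
```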
