[Question] Fitting multivariate Markov Chain throws Index out of bounds error #1077

Open
salpers opened this issue Jan 22, 2024 · 3 comments

salpers commented Jan 22, 2024

Hey there,

I am trying to fit a Markov chain model to multivariate, categorical sequence data.
After label-encoding my sequences as integers, I pad them with 0 so that they all have the same length.
The resulting tensor has shape (932, 132, 3): 932 observations of length 132 (zero-padded), with 3 features per element.
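
The preprocessing code is not shown in the thread, so the following is only a minimal sketch of the label-encode-then-pad step described above. The helper name `encode_and_pad` and the choice to reserve 0 for padding (real labels start at 1) are assumptions made for illustration, not the poster's actual code.

```python
PAD = 0  # assumption: 0 is reserved for padding, so real labels start at 1


def encode_and_pad(sequences):
    """Label-encode each feature separately and right-pad to a common length.

    `sequences` is a list of sequences; each sequence is a list of steps,
    and each step is a tuple with one categorical value per feature.
    """
    n_features = len(sequences[0][0])
    vocabs = [{} for _ in range(n_features)]  # one label -> code map per feature
    max_len = max(len(seq) for seq in sequences)

    encoded = []
    for seq in sequences:
        rows = []
        for step in seq:
            # setdefault assigns the next free code (starting at 1) to unseen labels
            rows.append([vocabs[f].setdefault(value, len(vocabs[f]) + 1)
                         for f, value in enumerate(step)])
        # pad short sequences with all-PAD steps
        rows.extend([[PAD] * n_features for _ in range(max_len - len(seq))])
        encoded.append(rows)
    return encoded, vocabs


# Hypothetical toy data: 2 sequences, 2 features per step
sequences = [[("a", "x"), ("b", "y"), ("a", "y")], [("b", "x")]]
encoded, vocabs = encode_and_pad(sequences)
```

The nested list `encoded` here has shape (2, 3, 2), the same (observations, length, features) layout as the (932, 132, 3) tensor.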

However, I get an index-out-of-bounds error when I try to fit the model.

from pomegranate.markov_chain import MarkovChain

model = MarkovChain(k = 3)
model.fit(data)

File ... /pomegranate/markov_chain.py:216, in MarkovChain.fit(self, X, sample_weight)
    193 def fit(self, X, sample_weight=None):
    194 	"""Fit the model to optionally weighted examples.
    195 
    196 	This method will fit the provided distributions given the data and
   (...)
    213 	self
    214 	"""
--> 216 	self.summarize(X, sample_weight=sample_weight)
    217 	self.from_summaries()
    218 	return self

File .../pomegranate/markov_chain.py:276, in MarkovChain.summarize(self, X, sample_weight)
    274 for i in range(X.shape[1] - self.k):
    275 	j = i + self.k + 1
--> 276 	distribution.summarize(X[:, i:j], sample_weight=sample_weight)

File .../pomegranate/distributions/conditional_categorical.py:168, in ConditionalCategorical.summarize(self, X, sample_weight)
    165 strides = torch.tensor(self._xw_sum[j].stride(), device=X.device)
    166 X_ = torch.sum(X[:, :, j] * strides, dim=-1)
--> 168 self._xw_sum[j].view(-1).scatter_add_(0, X_, sample_weight[:,j])
    169 self._w_sum[j][:] = self._xw_sum[j].sum(dim=-1)

RuntimeError: index 21869 is out of bounds for dimension 0 with size 14520

I would appreciate it if you could help me with the issue or point out any mistakes in my approach.

salpers commented Jan 23, 2024

I experimented with changing the data; however, the issue is also reproducible with small random data.

import numpy as np
from pomegranate.markov_chain import MarkovChain

np.random.seed(137)
seq_data = np.random.randint(0, 10, (1,10,1))

model = MarkovChain(k = 1)
model.fit(seq_data) 

This throws:

---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
Cell In[99], line 5
      2 seq_data = np.random.randint(0, 10, (1,6,1))
      4 model = MarkovChain(k = 1)
----> 5 model.fit(seq_data)

File /opt/conda/lib/python3.10/site-packages/pomegranate/markov_chain.py:216, in MarkovChain.fit(self, X, sample_weight)
    193 def fit(self, X, sample_weight=None):
    194 	"""Fit the model to optionally weighted examples.
    195 
    196 	This method will fit the provided distributions given the data and
   (...)
    213 	self
    214 	"""
--> 216 	self.summarize(X, sample_weight=sample_weight)
    217 	self.from_summaries()
    218 	return self

File /opt/conda/lib/python3.10/site-packages/pomegranate/markov_chain.py:276, in MarkovChain.summarize(self, X, sample_weight)
    274 for i in range(X.shape[1] - self.k):
    275 	j = i + self.k + 1
--> 276 	distribution.summarize(X[:, i:j], sample_weight=sample_weight)

File /opt/conda/lib/python3.10/site-packages/pomegranate/distributions/conditional_categorical.py:168, in ConditionalCategorical.summarize(self, X, sample_weight)
    165 strides = torch.tensor(self._xw_sum[j].stride(), device=X.device)
    166 X_ = torch.sum(X[:, :, j] * strides, dim=-1)
--> 168 self._xw_sum[j].view(-1).scatter_add_(0, X_, sample_weight[:,j])
    169 self._w_sum[j][:] = self._xw_sum[j].sum(dim=-1)

RuntimeError: index 42 is out of bounds for dimension 0 with size 28
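
For readers following the traceback: `ConditionalCategorical.summarize` multiplies each window of symbols by the strides of its count table and sums the result into a flat index for `scatter_add_`. The sketch below redoes that arithmetic in plain Python to show how a symbol larger than the table expects produces an out-of-range index. The table shape `(4, 7)` and the window `(6, 0)` are hypothetical values chosen only because they reproduce the numbers in the error message; the actual table shape in this run is not shown in the thread.

```python
def strides_for(shape):
    """Row-major strides in elements, as a contiguous tensor would report."""
    strides = []
    step = 1
    for size in reversed(shape):
        strides.append(step)
        step *= size
    return tuple(reversed(strides))


def flat_index(values, shape):
    """Flat position of `values` in a table of `shape`, with no bounds checking."""
    return sum(v * s for v, s in zip(values, strides_for(shape)))


table_shape = (4, 7)                         # hypothetical count table, 4 * 7 = 28 cells
table_size = table_shape[0] * table_shape[1]
window = (6, 0)                              # hypothetical (previous symbol, current symbol)

index = flat_index(window, table_shape)      # 6 * 7 + 0 * 1 = 42
out_of_bounds = index >= table_size          # 42 >= 28
```

Under these assumed values, the first symbol (6) exceeds the 4 categories allotted to its axis, so the flat index lands past the end of the 28-cell table.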

@salpers salpers closed this as completed Jan 23, 2024
@salpers salpers reopened this Jan 23, 2024
@Koenig128

Hi,

I got the same error. Have you been able to fix it in the meantime? Does anyone else have a suggestion?

I would really appreciate any help on this.

Thank you!

@jmschrei
Owner

This should be fixed in v1.0.4. Please let me know if you encounter any other issues. In the future, if you run into challenges you can pass in n_categories to the MarkovChain or make the list of distributions (one Categorical and then a series of k ConditionalCategorical objects) yourself.
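
As a rough sketch of the first workaround, the number of categories per feature can be computed from the encoded data before fitting. The helper name `count_categories` is hypothetical, and the exact format that `n_categories` expects is not stated in this thread, so treating it as one count per feature is an assumption to check against the pomegranate documentation.

```python
def count_categories(data):
    """Return max value + 1 for each feature.

    `data` is a nested list of shape (n_sequences, length, n_features)
    holding non-negative integer codes (padding included).
    """
    n_features = len(data[0][0])
    maxima = [0] * n_features
    for seq in data:
        for step in seq:
            for f, value in enumerate(step):
                maxima[f] = max(maxima[f], value)
    return [m + 1 for m in maxima]


# Hypothetical toy data: 2 sequences of length 2 with 2 features
data = [[[0, 3], [2, 1]], [[1, 0], [0, 0]]]
n_categories = count_categories(data)
```

The result could then be passed along the lines of `MarkovChain(k=1, n_categories=n_categories)`, under the assumption noted above.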
