Need help upgrading this code to use the library of pomegranate >= 1.0.0 #1062

Fortune-codebox · 2023-09-23T12:45:18Z

I'm new to pomegranate. I came across the snippet below, but I can't run it with pomegranate>=1.0.0 because some of the classes it uses no longer exist. They include:

Node
DiscreteDistribution
ConditionalProbabilityTable

I need help upgrading the code to work with pomegranate>=1.0.0.
Thanks.

from pomegranate import *

# Rain node has no parents
rain = Node(DiscreteDistribution({
    "none": 0.7,
    "light": 0.2,
    "heavy": 0.1
}), name="rain")

# Track maintenance node is conditional on rain
maintenance = Node(ConditionalProbabilityTable([
    ["none", "yes", 0.4],
    ["none", "no", 0.6],
    ["light", "yes", 0.2],
    ["light", "no", 0.8],
    ["heavy", "yes", 0.1],
    ["heavy", "no", 0.9]
], [rain.distribution]), name="maintenance")

# Train node is conditional on rain and maintenance
train = Node(ConditionalProbabilityTable([
    ["none", "yes", "on time", 0.8],
    ["none", "yes", "delayed", 0.2],
    ["none", "no", "on time", 0.9],
    ["none", "no", "delayed", 0.1],
    ["light", "yes", "on time", 0.6],
    ["light", "yes", "delayed", 0.4],
    ["light", "no", "on time", 0.7],
    ["light", "no", "delayed", 0.3],
    ["heavy", "yes", "on time", 0.4],
    ["heavy", "yes", "delayed", 0.6],
    ["heavy", "no", "on time", 0.5],
    ["heavy", "no", "delayed", 0.5],
], [rain.distribution, maintenance.distribution]), name="train")

# Appointment node is conditional on train
appointment = Node(ConditionalProbabilityTable([
    ["on time", "attend", 0.9],
    ["on time", "miss", 0.1],
    ["delayed", "attend", 0.6],
    ["delayed", "miss", 0.4]
], [train.distribution]), name="appointment")

# Create a Bayesian Network and add states
model = BayesianNetwork()
model.add_states(rain, maintenance, train, appointment)

# Add edges connecting nodes
model.add_edge(rain, maintenance)
model.add_edge(rain, train)
model.add_edge(maintenance, train)
model.add_edge(train, appointment)

# Finalize model
model.bake()


jmschrei · 2023-09-23T19:17:04Z

Have you read the tutorial on Bayesian networks in pomegranate >= 1.0.0? https://github.com/jmschrei/pomegranate/blob/master/docs/tutorials/B_Model_Tutorial_6_Bayesian_Networks.ipynb

Let me know if that's still not helpful

Fortune-codebox · 2023-09-30T15:24:08Z

Yes, I was able to build the model using the link you shared, but I'm unable to fit it to data successfully.
Could you please help me figure out the right X data to fit this model, using random.randint?

import numpy as np
from pomegranate.distributions import *
from pomegranate.bayesian_network import BayesianNetwork

rain = Categorical([[0.7, 0.2, 0.1]])

maintenance = ConditionalCategorical([[[0.4, 0.6], [0.2, 0.8], [0.1, 0.9]]])

train = ConditionalCategorical([[
    [0.8, 0.2],
    [0.9, 0.1],
    [0.6, 0.4],
    [0.7, 0.3],
    [0.4, 0.6],
    [0.5, 0.5]]])

# Create a Bayesian Network and add states
model = BayesianNetwork()
model.add_distributions([rain, maintenance, train, appointment])

# Add edges connecting nodes
model.add_edge(rain, maintenance)
model.add_edge(rain, train)
model.add_edge(maintenance, train)
model.add_edge(train, appointment)

I would also like to know if there is anything wrong with this solution. Thanks.

itolosa · 2023-10-02T18:12:05Z

I've successfully managed to run the code:

from pomegranate import *

import numpy as np
from pomegranate.distributions import *
from pomegranate.bayesian_network import BayesianNetwork

rain = Categorical(
    [
        [0.7, 0.2, 0.1],
    ]
)

maintenance = ConditionalCategorical(
    [
        [
            [0.4, 0.6],
            [0.2, 0.8],
            [0.1, 0.9],
        ],
    ]
)

train = ConditionalCategorical(
    [
        [
            [
                [0.8, 0.2],
                [0.9, 0.1],
            ],
            [
                [0.6, 0.4],
                [0.7, 0.3],
            ],
            [
                [0.4, 0.6],
                [0.5, 0.5],
            ],
        ]
    ]
)


appointment = ConditionalCategorical(
    [
        [
            [0.9, 0.1],
            [0.6, 0.4],
        ],
    ]
)


# Create a Bayesian Network and add states
model = BayesianNetwork()
model.add_distributions([rain, maintenance, train, appointment])

# Add edges connecting nodes
model.add_edge(rain, maintenance)
model.add_edge(rain, train)
model.add_edge(maintenance, train)
model.add_edge(train, appointment)

For likelihood.py, use this:

import numpy
import torch
from model import model


# Each list gives the label order used by the model's probability tables.
rain_values = ["none", "light", "heavy"]
maintenance_values = ["yes", "no"]
train_values = ["on time", "delayed"]
appointment_values = ["attend", "miss"]


# Joint probability of one fully observed example, encoded as label indices.
probability = model.probability(
    torch.as_tensor(
        [
            [
                rain_values.index("none"),
                maintenance_values.index("no"),
                train_values.index("on time"),
                appointment_values.index("attend"),
            ]
        ]
    )
)

print(probability)

This code is from cs50ai. I'm currently taking the course :)
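As a sanity check on the value that likelihood.py should print, the same joint probability can be worked out by hand as a product of one entry per table. The sketch below is plain Python and does not call pomegranate; the table names and the `joint` helper are made up for illustration, and the numbers are copied from the model above.

```python
# Probability tables copied from the model above, indexed by label position.
RAIN = [0.7, 0.2, 0.1]                               # none, light, heavy
MAINTENANCE = [[0.4, 0.6], [0.2, 0.8], [0.1, 0.9]]   # [rain][maintenance]
TRAIN = [
    [[0.8, 0.2], [0.9, 0.1]],
    [[0.6, 0.4], [0.7, 0.3]],
    [[0.4, 0.6], [0.5, 0.5]],
]                                                    # [rain][maintenance][train]
APPOINTMENT = [[0.9, 0.1], [0.6, 0.4]]               # [train][appointment]


def joint(r, m, t, a):
    """Joint probability of one complete assignment (hypothetical helper)."""
    return RAIN[r] * MAINTENANCE[r][m] * TRAIN[r][m][t] * APPOINTMENT[t][a]


# rain="none", maintenance="no", train="on time", appointment="attend"
p = joint(0, 1, 0, 0)
print(round(p, 4))  # 0.7 * 0.6 * 0.9 * 0.9 = 0.3402
```

If the pomegranate model is wired up as intended, model.probability for the same example should agree with this number up to floating-point error.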

itolosa · 2023-10-02T21:17:43Z

sample.py:

from collections import Counter

from model import model

# Rejection sampling:
# estimate the distribution of appointment given that the train is delayed.
N = 10000
data = []
for _ in range(N):
    # Each sample is ordered as [rain, maintenance, train, appointment].
    sample = model.sample(1)[0]
    # Keep only samples where train == 1 ("delayed"); reject the rest.
    if sample[2] == 1:
        data.append("attend" if sample[3] == 0 else "miss")
print(Counter(data))

inference.py:

import torch
from model import model

X = torch.tensor(
    [
        [
            -1,
            -1,
            1, # delayed
            -1,
        ]
    ]
)

X_masked = torch.masked.MaskedTensor(X, mask=(X != -1))

states = (
    ("rain", ["none", "light", "heavy"]),
    ("maintenance", ["yes", "no"]),
    ("train", ["on time", "delayed"]),
    ("appointment", ["attend", "miss"]),
)

# Calculate predictions
predictions = model.predict_proba(X_masked)

# Print predictions for each node
for (node_name, values), prediction in zip(states, predictions):
    if isinstance(prediction, str):
        print(f"{node_name}: {prediction}")
    else:
        print(f"{node_name}")
        for value, probability in zip(values, prediction[0]):
            print(f"    {value}: {probability:.4f}")
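For a network this small, the posteriors that inference.py asks for can also be computed exactly by enumerating every complete assignment consistent with the evidence. This is a minimal plain-Python sketch, not how pomegranate computes predict_proba; `posterior`, `joint`, and the table names are hypothetical, and the numbers are copied from the model above.

```python
from itertools import product

# Probability tables copied from the model above, indexed by label position.
RAIN = [0.7, 0.2, 0.1]
MAINTENANCE = [[0.4, 0.6], [0.2, 0.8], [0.1, 0.9]]
TRAIN = [
    [[0.8, 0.2], [0.9, 0.1]],
    [[0.6, 0.4], [0.7, 0.3]],
    [[0.4, 0.6], [0.5, 0.5]],
]
APPOINTMENT = [[0.9, 0.1], [0.6, 0.4]]
SIZES = (3, 2, 2, 2)  # number of labels for rain, maintenance, train, appointment


def joint(r, m, t, a):
    return RAIN[r] * MAINTENANCE[r][m] * TRAIN[r][m][t] * APPOINTMENT[t][a]


def posterior(evidence):
    """Per-variable posteriors given evidence {variable position: label index}."""
    totals = [[0.0] * n for n in SIZES]
    for assignment in product(*(range(n) for n in SIZES)):
        # Skip assignments that contradict the observed values.
        if any(assignment[var] != val for var, val in evidence.items()):
            continue
        p = joint(*assignment)
        for var, val in enumerate(assignment):
            totals[var][val] += p
    evidence_probability = sum(totals[0])  # P(evidence)
    return [[x / evidence_probability for x in row] for row in totals]


# Evidence: train (position 2) is "delayed" (index 1).
marginals = posterior({2: 1})
for name, row in zip(("rain", "maintenance", "train", "appointment"), marginals):
    print(name, [round(x, 4) for x in row])
```

The appointment row works out to 0.6 / 0.4, which is simply the "delayed" row of the appointment table, since appointment depends only on train.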

itolosa · 2023-10-03T00:02:07Z

@jmschrei is there any way to get the joint probability of a Bayesian network using model.probability(X) when X has some missing facts (for example, by setting some entries to -1)?

I know I can do this by marginalization, but it would be less expensive to just calculate the product of the probabilities up to the current node.

Example:
If my model has A,B,C,D nodes
and I want to compute P(A,B,C), I could do: P(A,B,C) = P(A|B,C)P(B|C)P(C) and ignore D

PS: I'm currently learning this, I could be completely wrong on what I'm doing

jmschrei · 2023-10-03T17:15:37Z

Thanks @itolosa for your help! Where is cs50ai being taught?

Yes, you should be able to use torch.masked.MaskedTensor to indicate missingness. Let me know if you run into any issues.

https://github.com/jmschrei/pomegranate/blob/master/docs/tutorials/B_Model_Tutorial_6_Bayesian_Networks.ipynb

https://github.com/jmschrei/pomegranate#missing-values

itolosa · 2023-10-03T18:06:37Z

@jmschrei CS50AI Harvard, but I'm taking the online version through edx: link

I've tried using a masked tensor but it fails:

# assume the same model as the previous examples
X = torch.as_tensor(
    [
        [
            rain_values.index("none"),
            maintenance_values.index("no"),
            train_values.index("on time"),
            -1,
        ]
    ]
)

X_masked = torch.masked.MaskedTensor(X, mask=(X != -1))

probability = model.probability(X_masked) # <--- throws an error

Error:

~/.pyenv/versions/cs50-ai/lib/python3.8/site-packages/torch/masked/maskedtensor/core.py:156: UserWarning: The PyTorch API of MaskedTensors is in prototype stage and will change in the near future. Please open a Github issue for features requests and see our documentation on the torch.masked module for further information about the project.
  warnings.warn(("The PyTorch API of MaskedTensors is in prototype stage "
~/.pyenv/versions/cs50-ai/lib/python3.8/site-packages/torch/masked/maskedtensor/core.py:299: UserWarning: unbind is not implemented in __torch_dispatch__ for MaskedTensor.
If you would like this operator to be supported, please file an issue for a feature request at https://github.com/pytorch/maskedtensor/issues with a minimal reproducible code snippet.
In the case that the semantics for the operator are not trivial, it would be appreciated to also include a proposal for the semantics.
  warnings.warn(msg)
Traceback (most recent call last):
  File "likelihood.py", line 25, in <module>
    probability = model.probability(X_masked)
  File "~/.pyenv/versions/cs50-ai/lib/python3.8/site-packages/pomegranate/distributions/_distribution.py", line 61, in probability
    return torch.exp(self.log_probability(X))
  File "~/.pyenv/versions/cs50-ai/lib/python3.8/site-packages/pomegranate/bayesian_network.py", line 352, in log_probability
    logps += distribution.log_probability(X_)
  File "~/.pyenv/versions/cs50-ai/lib/python3.8/site-packages/pomegranate/distributions/conditional_categorical.py", line 134, in log_probability
    logps[i] += self._log_probs[j][tuple(X[i, :, j])]
  File "~/.pyenv/versions/cs50-ai/lib/python3.8/site-packages/torch/_tensor.py", line 940, in __iter__
    return iter(self.unbind(0))
  File "~/.pyenv/versions/cs50-ai/lib/python3.8/site-packages/torch/masked/maskedtensor/core.py", line 274, in __torch_function__
    ret = func(*args, **kwargs)
TypeError: no implementation found for 'torch._ops.aten.unbind.int' on types that implement __torch_dispatch__: [<class 'torch.masked.maskedtensor.core.MaskedTensor'>]

I guess line 134 of pomegranate/distributions/conditional_categorical.py is failing because it performs an operation that requires unbind, which is not implemented for masked tensors.

itolosa · 2023-10-03T18:15:24Z

In fact, I've isolated the error:

tuple(X_masked)

It throws the same exception as before.

Fortune-codebox · 2023-10-06T15:56:01Z

Thanks a lot @itolosa and @jmschrei, you guys are the best.

jmschrei · 2023-10-06T16:20:32Z

I think a challenge with model.probability is that there are two ways that one could interpret that given incomplete data. The first is that one should marginalize out the unseen variables. The second is that one should infer the missing variables and then calculate the probabilities given the complete, but partially inferred, example. The second can be done by first doing predict and then passing in the completed example.
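The two interpretations give different numbers, which a small worked example makes concrete. The sketch below uses the example from earlier in the thread (rain="none", maintenance="no", train="on time", appointment missing) and plain Python rather than pomegranate; the variable names are made up for illustration, and "infer the missing variable" is assumed here to mean picking its most probable label.

```python
# Factors for the observed part of the example, copied from the model above.
p_rain = 0.7                 # P(rain="none")
p_maintenance = 0.6          # P(maintenance="no" | rain="none")
p_train = 0.9                # P(train="on time" | rain="none", maintenance="no")
# Row of the appointment table for train="on time".
p_appointment = {"attend": 0.9, "miss": 0.1}

observed = p_rain * p_maintenance * p_train

# Interpretation 1: marginalize out the unseen variable.
marginalized = sum(observed * p for p in p_appointment.values())

# Interpretation 2: fill in the most probable label, then score the completed example.
best = max(p_appointment, key=p_appointment.get)
imputed = observed * p_appointment[best]

print(round(marginalized, 4))  # 0.378
print(best, round(imputed, 4))  # attend 0.3402
```

Because appointment has no children, marginalizing it out leaves just the product of the other three factors, which is the shortcut asked about earlier in the thread. That shortcut does not apply in general when the missing variable has observed descendants.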

itolosa · 2023-10-06T16:43:19Z

Although I understand the procedure of the second option, I can't imagine the consequences in terms of the probability -- I'm not an expert on this, so I don't know if they're equivalent or not with the first option.

In any case, for consistency, as a developer I would expect the method to perform semantically the same kind of calculation for any given input, and I would expect any special, non-equivalent procedure to live in a separate method.

Older versions of pomegranate were able to receive an incomplete example and return the probability, so that was my initial expectation when I tried to use model.probability.

I've finally decided to create my own version of a bayesian network, just as a learning exercise, so this issue is no longer a concern for me.

In any case if you still want to implement this, and need help to upgrade some code, I'd be glad to be part of that. @jmschrei

jmschrei · 2023-10-06T16:51:57Z

I agree with you that having the model not accept masked tensors is a problem that I need to fix.

I would love to see your solution.

As you've probably inferred, I'm super time-constrained right now. I'm going on the faculty job market and it's taking more time than I was hoping for. I should have more time starting next year and begin to work through the backlog.

itolosa · 2023-10-06T17:04:04Z

I completely understand.

My solution is neither efficient in terms of time complexity nor tensor-based, so I could open a PR that adds a new method to the model, undocumented for now, as a proposal for computing the probability with missing facts using tensors (I hope). I can't promise when, but I hope soon.

Thank you for taking the time to give us a response. 🤝

jmschrei · 2023-10-06T17:07:09Z

Of course -- thanks for engaging with the package and raising issues/working to find solutions!

If you have time to write a draft solution to the issue, even if it's not the most efficient, that'd be hugely helpful as I can then build off it.

aaa2002 · 2024-03-14T11:10:05Z

I have trouble with DiscreteDistribution too.

For this code, no matter what I try, I get some kind of error:

from pomegranate.distributions import *

# Unconditional distribution for the metal node
metal = DiscreteDistribution({'T': 0.2, 'F': 0.8})

Error:

NameError                                 Traceback (most recent call last)
Cell In[5], line 7
      5 from pomegranate.bayesian_network import BayesianNetwork
      6 # Unconditional distribution for the metal node
----> 7 metal = DiscreteDistribution({'T': 0.2, 'F': 0.8})

NameError: name 'DiscreteDistribution' is not defined

jmschrei · 2024-03-15T08:11:59Z

It's hard for me to provide feedback from only that tiny snippet, but it's worth noting that DiscreteDistribution is no longer in pomegranate as of v1.0.0. None of the distribution objects have the word Distribution in them anymore.
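For the metal snippet above, one possible migration path is to turn the old label-to-probability dict into the nested list of probabilities that Categorical is given elsewhere in this thread (for example, rain = Categorical([[0.7, 0.2, 0.1]])). The helper below is hypothetical and plain Python; the pomegranate call is left as a comment and assumes pomegranate >= 1.0.0 is installed.

```python
def dict_to_categorical_args(table):
    """Split an old-style {label: probability} dict into labels and a probs row.

    Hypothetical helper: the labels list records which index stands for which
    label, since the new-style distribution is indexed by position.
    """
    labels = list(table)
    probs = [[table[label] for label in labels]]
    return labels, probs


labels, probs = dict_to_categorical_args({'T': 0.2, 'F': 0.8})
print(labels)  # ['T', 'F']
print(probs)   # [[0.2, 0.8]]

# Following the pattern used earlier in this thread:
#   from pomegranate.distributions import Categorical
#   metal = Categorical(probs)
```

Data passed to the model would then use labels.index('T') and labels.index('F') instead of the strings, as likelihood.py does for the rain example.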
