Merge branch 'master' of https://github.com/jmschrei/pomegranate
jmschrei committed Mar 11, 2024
2 parents 2437323 + 00c95c9 commit fa29944
Showing 21 changed files with 44 additions and 44 deletions.
2 changes: 1 addition & 1 deletion README.md
@@ -165,7 +165,7 @@ Loading:
> **Note**
> `torch.compile` is under active development by the PyTorch team and may rapidly improve. For now, you may need to pass in `check_data=False` when initializing models to avoid one compatibility issue.
- In PyTorch v2.0.0, `torch.compile` was introduced as a flexible wrapper around tools that would fuse operations together, use CUDA graphs, and generally try to remove I/O bottlenecks in GPU execution. Because these bottlenecks can be extremely significant in the small-to-medium sized data settings many pomegranate users are faced with, `torch.compile` seems like it will be extremely valuable. Rather than targetting entire models, which mostly just compiles the `forward` method, you should compile individual methods from your objects.
+ In PyTorch v2.0.0, `torch.compile` was introduced as a flexible wrapper around tools that would fuse operations together, use CUDA graphs, and generally try to remove I/O bottlenecks in GPU execution. Because these bottlenecks can be extremely significant in the small-to-medium sized data settings many pomegranate users are faced with, `torch.compile` seems like it will be extremely valuable. Rather than targeting entire models, which mostly just compiles the `forward` method, you should compile individual methods from your objects.

```python
# Create your object as normal
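# (The rest of this snippet is truncated in the diff view. What follows is a
# hedged sketch of the pattern described above, assuming a GeneralMixtureModel
# and the `check_data` flag mentioned in the note, not the README's verbatim code.)
import torch
from pomegranate.distributions import Normal
from pomegranate.gmm import GeneralMixtureModel

model = GeneralMixtureModel([Normal(), Normal()], check_data=False)

# Compile individual methods from the object, not the entire model
model.summarize = torch.compile(model.summarize)
model.from_summaries = torch.compile(model.from_summaries)
```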
2 changes: 1 addition & 1 deletion docs/requirements.txt
Expand Up @@ -8,4 +8,4 @@ pomegranate >= 1.0.0
sphinx-rtd-theme
pandoc
nbsphinx
- jinja2==3.0.3
+ jinja2==3.1.3
12 changes: 6 additions & 6 deletions docs/tutorials/B_Model_Tutorial_2_General_Mixture_Models.ipynb

Large diffs are not rendered by default.

8 changes: 4 additions & 4 deletions docs/tutorials/B_Model_Tutorial_4_Hidden_Markov_Models.ipynb

Large diffs are not rendered by default.

8 changes: 4 additions & 4 deletions docs/tutorials/B_Model_Tutorial_6_Bayesian_Networks.ipynb
@@ -60,7 +60,7 @@
"source": [
"### Initialization and Fitting\n",
"\n",
"Similar to the hidden Markov model, the Bayesian network is comprised of a set of distributions and a graph structure connecting them. In this case, the graph is just a series of directed unweighted edges. Most Bayesian networks require that this graph is acyclic. However, becase pomegranate uses a factor graph to do inference, there is no strict requirement that this is the case. See the inference sections below.\n",
"Similar to the hidden Markov model, the Bayesian network is comprised of a set of distributions and a graph structure connecting them. In this case, the graph is just a series of directed unweighted edges. Most Bayesian networks require that this graph is acyclic. However, because pomegranate uses a factor graph to do inference, there is no strict requirement that this is the case. See the inference sections below.\n",
"\n",
"Likewise, similar to the other models in pomegranate, a Bayesian network can be learned in its entirety from data. However, exact structure learning is intractable and so the field has developed a variety of approximations. See the Bayesian network structure learning tutorial for more.\n",
"\n",
@@ -109,7 +109,7 @@
"id": "a0ad8a0c",
"metadata": {},
"source": [
"Once these models are initialized with a structue, they can be fit to data."
"Once these models are initialized with a structure, they can be fit to data."
]
},
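As a concrete illustration of that workflow, here is a minimal sketch (not taken from the notebook; toy data and parameter values are assumed, and the `edges` convention follows the `pomegranate/bayesian_network.py` docstring later in this diff):

```python
import torch
from pomegranate.distributions import Categorical, ConditionalCategorical
from pomegranate.bayesian_network import BayesianNetwork

X = torch.randint(2, (100, 2))                  # two binary variables

d1 = Categorical([[0.5, 0.5]])                  # root node
d2 = ConditionalCategorical([[[0.8, 0.2],       # P(x2 | x1 = 0)
                              [0.3, 0.7]]])     # P(x2 | x1 = 1)

model = BayesianNetwork([d1, d2], [(d1, d2)])   # edge: d1 -> d2
model.fit(X)                                    # refine parameters from data
```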
{
@@ -386,7 +386,7 @@
"\n",
"Perhaps the most useful application of a learned Bayesian network is the ability to do inference for missing values. Rather than a traditional prediction problem, which has a fixed set of inputs and one or more fixed outputs, Bayesian network inference will use any variables whose values are known to infer any variables whose values are not known. The set of known variables can change across examples, and so do not need to be known in advance.\n",
"\n",
"In pomegranate, this is done using the loopy belief propogation algorithm, sometimes also called the \"sum-product\" algorithm. This algorithm is run on a factor graph, which is constructed in the backend. The trade-offs for this, versus normal junction-tree inference, are that the algorithm is faster, easier to implement, exact for tree-like Bayesian networks, and can provide estimates even for cyclic networks, but that the inference is not guaranteed to be exact in other cases or even to converge when the network is cyclic.\n",
"In pomegranate, this is done using the loopy belief propagation algorithm, sometimes also called the \"sum-product\" algorithm. This algorithm is run on a factor graph, which is constructed in the backend. The trade-offs for this, versus normal junction-tree inference, are that the algorithm is faster, easier to implement, exact for tree-like Bayesian networks, and can provide estimates even for cyclic networks, but that the inference is not guaranteed to be exact in other cases or even to converge when the network is cyclic.\n",
"\n",
"The implementation of the prediction methods differs slightly from other models in pomegranate. First, the unobserved variables are indicated using a `torch.masked.MaskedTensor` object, which holds the underlying data and a mask where `True` means the value is observed and `False` means that it is not observed. When the mask is `False`, it does not matter what the underlying value is. "
]
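A hedged sketch of that masking convention (hypothetical data; `model` is a fitted network):

```python
import torch
from torch.masked import MaskedTensor

X = torch.tensor([[0, 1, 0],
                  [1, 0, 1]])
mask = torch.tensor([[True, True, False],       # False marks a missing value
                     [True, False, True]])

X_masked = MaskedTensor(X, mask=mask)
y_hat = model.predict(X_masked)                 # infers the masked entries
```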
@@ -459,7 +459,7 @@
"id": "69c03439",
"metadata": {},
"source": [
"You might notice that the output from these functions is a different shape than other methods. Because there is no guarantee that the variables all have the same number of categories, pomegranate cannot return a single tensor where one of the dimensions is the number of categories. Instead, pomegranate chooses to return a list of tensors, where each element in the list is one variable and the tensor has the dimensions `(n_examples, n_categories)` for the number of categories for that dimension. In principle, one could return a single tensor of size `(n_examples, n_dimensions, max_n_categories)` where `max_n_categories` is the maximum number of categories across all dimensions, but one would likely choose to slice the unneccesary categories out anyway, and there is no guarantee that a single variable with a large number of categories wouldn't come along and massively increase the amount of needed memory. "
"You might notice that the output from these functions is a different shape than other methods. Because there is no guarantee that the variables all have the same number of categories, pomegranate cannot return a single tensor where one of the dimensions is the number of categories. Instead, pomegranate chooses to return a list of tensors, where each element in the list is one variable and the tensor has the dimensions `(n_examples, n_categories)` for the number of categories for that dimension. In principle, one could return a single tensor of size `(n_examples, n_dimensions, max_n_categories)` where `max_n_categories` is the maximum number of categories across all dimensions, but one would likely choose to slice the unnecessary categories out anyway, and there is no guarantee that a single variable with a large number of categories wouldn't come along and massively increase the amount of needed memory. "
]
},
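For example, the list-of-tensors return shape described above can be inspected like this (a sketch, continuing the hypothetical masked input from earlier):

```python
probs = model.predict_proba(X_masked)

# one tensor per variable, each of shape (n_examples, n_categories_i)
for i, p in enumerate(probs):
    print(i, tuple(p.shape))
```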
{
2 changes: 1 addition & 1 deletion docs/tutorials/B_Model_Tutorial_7_Factor_Graphs.ipynb
@@ -295,7 +295,7 @@
"id": "215dabac",
"metadata": {},
"source": [
"Similarly to Bayesian networks, factor graphs can make predictions for missing values in data sets. In fact, Bayesian networks and Markov networks both frequently construct factor graphs in the backend to do the actual inference. These approaches use the sum-product algorithm, also called loopy belief propogation. The algorithm works essentially as follows:\n",
"Similarly to Bayesian networks, factor graphs can make predictions for missing values in data sets. In fact, Bayesian networks and Markov networks both frequently construct factor graphs in the backend to do the actual inference. These approaches use the sum-product algorithm, also called loopy belief propagation. The algorithm works essentially as follows:\n",
"\n",
"\n",
"- Initialize messages TO each factor FROM each marginal that is a copy of the marginal distribution\n",
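(The rest of the list is truncated in the diff view.) A minimal sketch of one such round of message passing, for a single factor over two binary variables, with assumed names and values rather than pomegranate's internals:

```python
import torch

f = torch.tensor([[0.9, 0.1],       # factor table f[x1, x2]
                  [0.2, 0.8]])
p1 = torch.tensor([0.5, 0.5])       # marginal distributions
p2 = torch.tensor([0.7, 0.3])

# messages TO the factor start as copies of the marginals
m1_to_f, m2_to_f = p1.clone(), p2.clone()

# messages FROM the factor sum out the other variable
m_f_to_1 = f @ m2_to_f              # sum over x2
m_f_to_2 = f.T @ m1_to_f            # sum over x1

# updated beliefs: product of marginal and incoming message, renormalized
b1 = p1 * m_f_to_1; b1 = b1 / b1.sum()
b2 = p2 * m_f_to_2; b2 = b2 / b2.sum()
```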
2 changes: 1 addition & 1 deletion docs/tutorials/C_Feature_Tutorial_1_GPU_Usage.ipynb
@@ -409,7 +409,7 @@
"id": "7b1b51e4",
"metadata": {},
"source": [
"Seems significanly faster.\n",
"Seems significantly faster.\n",
"\n",
"Now, let's try with an even more complex model: the dense hidden Markov model."
]
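The pattern this tutorial times is roughly the following (a sketch under the assumption that pomegranate v1 models are torch modules with a `log_probability` method, and that `model` and a data tensor `X` already exist):

```python
X_cuda = X.cuda()                    # move the data to the GPU
model = model.cuda()                 # move the model parameters to the GPU
logp = model.log_probability(X_cuda)
```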
2 changes: 1 addition & 1 deletion docs/whats_new.rst
@@ -239,7 +239,7 @@ HiddenMarkovModel
Misc
----

- - Unneccessary calls to memset have been removed, courtesy of @alexhenrie
+ - Unnecessary calls to memset have been removed, courtesy of @alexhenrie
- Checking for missing values has been slightly refactored to be cleaner, courtesy of @mareksmid-lucid
- Include the LICENSE file in MANIFEST.in and simplify a bit, courtesy of @toddrme2178
- Added in a robust from_json method that can be used to deserialize a JSON for any pomegranate model.
2 changes: 1 addition & 1 deletion examples/Bayesian_Network_Monty_Hall.ipynb
@@ -53,7 +53,7 @@
"\n",
"To create the Bayesian network in pomegranate, we first create the distributions which live in each node in the graph. For a categorical bayesian network we use Categorical distributions for the root nodes and ConditionalCategorical distributions for the inner and leaf nodes. \n",
"\n",
"First, we can create our \"prize\" and \"guest\" distribtions. These are each Categorical distributions because they do not depend on anything, and they are uniform distributions because they are equally likely to be any of the three doors."
"First, we can create our \"prize\" and \"guest\" distributions. These are each Categorical distributions because they do not depend on anything, and they are uniform distributions because they are equally likely to be any of the three doors."
]
},
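A sketch of those two uniform root distributions, matching the description above:

```python
from pomegranate.distributions import Categorical

guest = Categorical([[1/3, 1/3, 1/3]])   # uniform over the three doors
prize = Categorical([[1/3, 1/3, 1/3]])
```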
{
4 changes: 2 additions & 2 deletions pomegranate/_utils.py
@@ -78,7 +78,7 @@ def _cast_as_parameter(value, dtype=None, requires_grad=False):


def _update_parameter(value, new_value, inertia=0.0, frozen=None):
"""Update a parameters unles.
"""Update a parameter unles.
"""

if hasattr(value, "frozen") and getattr(value, "frozen") == True:
@@ -373,7 +373,7 @@ def partition_sequences(X, sample_weight=None, priors=None, n_dists=None):
a different length, and group together sequences of the same length so that
batched operations can be more efficiently done on them.
- Alternatively, it can take in sequnces in the correct format and simply
+ Alternatively, it can take in sequences in the correct format and simply
return them. The correct form is to be either a single tensor that has
three dimensions or a list of three dimensional tensors, where each
tensor contains all the sequences of the same length.
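The accepted input formats this docstring describes look roughly like the following (shapes assumed for illustration):

```python
import torch

# a single three-dimensional tensor: 10 sequences, length 5, 3 features
X = torch.randn(10, 5, 3)

# or a list of three-dimensional tensors, grouped by sequence length
X_grouped = [torch.randn(10, 5, 3), torch.randn(4, 8, 3)]
```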
10 changes: 5 additions & 5 deletions pomegranate/bayesian_network.py
@@ -31,7 +31,7 @@ class BayesianNetwork(Distribution):
to be cyclic as long as there is no assumption of convergence during
inference.
- Inference is doing using loopy belief propogation along a factor graph
+ Inference is doing using loopy belief propagation along a factor graph
representation. This is sometimes called the `sum-product` algorithm.
It will yield exact results if the graph has a tree-like structure.
Otherwise, if the graph is acyclic, it is guaranteed to converge but not
@@ -56,7 +56,7 @@ class BayesianNetwork(Distribution):
the parent distribution object and the second element is the child
distribution object. If None, then no edges. Default is None.
- struture: tuple or list or None, optional
+ structure: tuple or list or None, optional
A list or tuple of the parents for each distribution with a tuple
containing no elements indicating a root node. For instance,
((), (0,), (), (0, 2)) would represent a graph with four nodes,
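That example structure could be used directly, e.g. (a sketch with hypothetical binary data; structure-only initialization is assumed to be supported, with parameters then learned by `fit`):

```python
import torch
from pomegranate.bayesian_network import BayesianNetwork

# four nodes; node 1 has parent 0, node 3 has parents 0 and 2
model = BayesianNetwork(structure=((), (0,), (), (0, 2)))
model.fit(torch.randint(2, (500, 4)))
```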
@@ -358,7 +358,7 @@ def predict(self, X):
This method infers a probability distribution for each of the missing
values in the data. It uses the factor graph representation of the
- Bayesian network to run the sum-product/loopy belief propogation
+ Bayesian network to run the sum-product/loopy belief propagation
algorithm. After the probability distribution is inferred, the maximum
likeihood value for each variable is returned.
@@ -398,7 +398,7 @@ def predict_proba(self, X):
This method infers a probability distribution for each of the missing
values in the data. It uses the factor graph representation of the
- Bayesian network to run the sum-product/loopy belief propogation
+ Bayesian network to run the sum-product/loopy belief propagation
algorithm.
The input to this method must be a torch.masked.MaskedTensor where the
@@ -446,7 +446,7 @@ def predict_log_proba(self, X):
This method infers a log probability distribution for each of the
missing values in the data. It uses the factor graph representation of
- the Bayesian network to run the sum-product/loopy belief propogation
+ the Bayesian network to run the sum-product/loopy belief propagation
algorithm.
The input to this method must be a torch.masked.MaskedTensor where the
2 changes: 1 addition & 1 deletion pomegranate/distributions/bernoulli.py
@@ -21,7 +21,7 @@ class Bernoulli(Distribution):
independent of the others.
There are two ways to initialize this object. The first is to pass in
- the tensor of probablity parameters, at which point they can immediately be
+ the tensor of probability parameters, at which point they can immediately be
used. The second is to not pass in the rate parameters and then call
either `fit` or `summary` + `from_summaries`, at which point the probability
parameter will be learned from data.
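The two initialization routes described in this docstring look roughly like the following (a sketch with hypothetical data, assuming `probs` takes one probability per feature):

```python
import torch
from pomegranate.distributions import Bernoulli

d1 = Bernoulli([0.3, 0.7])                   # probabilities given up front
d2 = Bernoulli()                             # parameters learned from data
d2.fit(torch.randint(2, (100, 2)).float())
```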
2 changes: 1 addition & 1 deletion pomegranate/distributions/categorical.py
@@ -18,7 +18,7 @@ class Categorical(Distribution):
A categorical distribution models the probability of a set of distinct
values happening. It is an extension of the Bernoulli distribution to
- multiple values. Sometimes it is refered to as a discrete distribution,
+ multiple values. Sometimes it is referred to as a discrete distribution,
but this distribution does not enforce that the numeric values used for the
keys have any relationship based on their identity. Permuting the keys will
have no effect on the calculation. This distribution assumes that the
2 changes: 1 addition & 1 deletion pomegranate/distributions/exponential.py
@@ -16,7 +16,7 @@ class Exponential(Distribution):
"""An exponential distribution object.
An exponential distribution models scales of discrete events, and has a
- rate parameter describing the average time between event occurances.
+ rate parameter describing the average time between event occurrences.
This distribution assumes that each feature is independent of the others.
Although the object is meant to operate on discrete counts, it can be used
on any non-negative continuous data.
4 changes: 2 additions & 2 deletions pomegranate/distributions/joint_categorical.py
@@ -18,12 +18,12 @@ class JointCategorical(Distribution):
"""A joint categorical distribution.
A joint categorical distribution models the probability of a vector of
- categorical values occuring without assuming that the dimensions are
+ categorical values occurring without assuming that the dimensions are
independent from each other. Essentially, it is a Categorical distribution
without the assumption that the dimensions are independent of each other.
There are two ways to initialize this object. The first is to pass in
- the tensor of probablity parameters, at which point they can immediately be
+ the tensor of probability parameters, at which point they can immediately be
used. The second is to not pass in the rate parameters and then call
either `fit` or `summary` + `from_summaries`, at which point the
probability parameters will be learned from data.
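For instance, a joint table over two binary dimensions might look like this (a sketch; values are hypothetical):

```python
from pomegranate.distributions import JointCategorical

# probs[i, j] = P(x1 = i, x2 = j); the entries sum to one
d = JointCategorical([[0.1, 0.2],
                      [0.3, 0.4]])
```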
4 changes: 2 additions & 2 deletions pomegranate/distributions/normal.py
@@ -22,15 +22,15 @@
class Normal(Distribution):
"""A normal distribution object.
- A normal distribution models the probability of a variable occuring under
+ A normal distribution models the probability of a variable occurring under
a bell-shaped curve. It is described by a vector of mean values and a
covariance value that can be zero, one, or two dimensional. This
distribution can assume that features are independent of the others if
the covariance type is 'diag' or 'sphere', but if the type is 'full' then
the features are not independent.
There are two ways to initialize this object. The first is to pass in
- the tensor of probablity parameters, at which point they can immediately be
+ the tensor of probability parameters, at which point they can immediately be
used. The second is to not pass in the rate parameters and then call
either `fit` or `summary` + `from_summaries`, at which point the probability
parameter will be learned from data.
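The covariance types this docstring mentions correspond roughly to the following (a sketch; parameter values are hypothetical):

```python
from pomegranate.distributions import Normal

# 'full': a (d, d) covariance matrix; features may be correlated
d_full = Normal([0.0, 0.0], [[1.0, 0.5],
                             [0.5, 2.0]], covariance_type='full')

# 'diag': a (d,) vector of per-feature variances; features independent
d_diag = Normal([0.0, 0.0], [1.0, 2.0], covariance_type='diag')
```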
6 changes: 3 additions & 3 deletions pomegranate/distributions/poisson.py
@@ -14,9 +14,9 @@
class Poisson(Distribution):
"""An poisson distribution object.
- A poisson distribution models the number of occurances of events that
- happen in a fixed time span, assuming that the occurance of each event
- is independent. This distibution also asumes that each feature is
+ A poisson distribution models the number of occurrences of events that
+ happen in a fixed time span, assuming that the occurrence of each event
+ is independent. This distribution also assumes that each feature is
independent of the others.
There are two ways to initialize this objecct. The first is to pass in
4 changes: 2 additions & 2 deletions pomegranate/distributions/student_t.py
@@ -15,7 +15,7 @@
class StudentT(Normal):
"""A Student T distribution.
- A Student T distribution models the probability of a variable occuring under
+ A Student T distribution models the probability of a variable occurring under
a bell-shaped curve with heavy tails. Basically, this is a version of the
normal distribution that is less resistant to outliers. It is described by
a vector of mean values and a vector of variance values. This
@@ -24,7 +24,7 @@ class StudentT(Normal):
the features are not independent.
There are two ways to initialize this object. The first is to pass in
- the tensor of probablity parameters, at which point they can immediately be
+ the tensor of probability parameters, at which point they can immediately be
used. The second is to not pass in the rate parameters and then call
either `fit` or `summary` + `from_summaries`, at which point the probability
parameter will be learned from data.
4 changes: 2 additions & 2 deletions pomegranate/distributions/uniform.py
@@ -17,14 +17,14 @@
class Uniform(Distribution):
"""A uniform distribution.
- A uniform distribution models the probability of a variable occuring given
+ A uniform distribution models the probability of a variable occurring given
a range that has the same probability within it and no probability outside
it. It is described by a vector of minimum and maximum values for this
range. This distribution assumes that the features are independent of
each other.
There are two ways to initialize this object. The first is to pass in
- the tensor of probablity parameters, at which point they can immediately be
+ the tensor of probability parameters, at which point they can immediately be
used. The second is to not pass in the rate parameters and then call
either `fit` or `summary` + `from_summaries`, at which point the probability
parameter will be learned from data.
4 changes: 2 additions & 2 deletions pomegranate/factor_graph.py
@@ -27,7 +27,7 @@ class FactorGraph(Distribution):
distributions on the marginal side encode probability estimates from the
data.
- Inference is done on the factor graph using the loopy belief propogation
+ Inference is done on the factor graph using the loopy belief propagation
algorithm. This is an iterative algorithm where "messages" are passed
along each edge between the marginals and the factors until the estimates
for the marginals converges. In brief: each message represents what the
@@ -461,7 +461,7 @@ def predict_log_proba(self, X):
This method infers a log probability distribution for each of the
missing values in the data. It uses the factor graph representation of
- the Bayesian network to run the sum-product/loopy belief propogation
+ the Bayesian network to run the sum-product/loopy belief propagation
algorithm.
The input to this method must be a torch.masked.MaskedTensor where the
2 changes: 1 addition & 1 deletion pomegranate/hmm/_base.py
@@ -331,7 +331,7 @@ def add_distributions(self, distributions):
Parameters
----------
- distrbutions: list, tuple, iterable
+ distributions: list, tuple, iterable
A set of distributions to add to the model.
"""

