Merge branch 'master' of https://github.com/jmschrei/pomegranate
jmschrei committed Mar 11, 2024
2 parents 2437323 + 00c95c9 commit fa29944
Showing 21 changed files with 44 additions and 44 deletions.
2 changes: 1 addition & 1 deletion README.md
@@ -165,7 +165,7 @@ Loading:
> **Note**
> `torch.compile` is under active development by the PyTorch team and may rapidly improve. For now, you may need to pass in `check_data=False` when initializing models to avoid one compatibility issue.
- In PyTorch v2.0.0, `torch.compile` was introduced as a flexible wrapper around tools that would fuse operations together, use CUDA graphs, and generally try to remove I/O bottlenecks in GPU execution. Because these bottlenecks can be extremely significant in the small-to-medium sized data settings many pomegranate users are faced with, `torch.compile` seems like it will be extremely valuable. Rather than targetting entire models, which mostly just compiles the `forward` method, you should compile individual methods from your objects.
+ In PyTorch v2.0.0, `torch.compile` was introduced as a flexible wrapper around tools that would fuse operations together, use CUDA graphs, and generally try to remove I/O bottlenecks in GPU execution. Because these bottlenecks can be extremely significant in the small-to-medium sized data settings many pomegranate users are faced with, `torch.compile` seems like it will be extremely valuable. Rather than targeting entire models, which mostly just compiles the `forward` method, you should compile individual methods from your objects.

```python
# Create your object as normal
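# (The rest of this snippet is truncated in the diff view. What follows is a
# hedged sketch of the pattern described above, assuming a GeneralMixtureModel
# and the `check_data` flag mentioned in the note, not the README's verbatim code.)
import torch
from pomegranate.distributions import Normal
from pomegranate.gmm import GeneralMixtureModel

model = GeneralMixtureModel([Normal(), Normal()], check_data=False)

# Compile individual methods from the object, not the entire model
model.summarize = torch.compile(model.summarize)
model.from_summaries = torch.compile(model.from_summaries)
```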
2 changes: 1 addition & 1 deletion docs/requirements.txt
Expand Up @@ -8,4 +8,4 @@ pomegranate >= 1.0.0
sphinx-rtd-theme
pandoc
nbsphinx
- jinja2==3.0.3
+ jinja2==3.1.3
12 changes: 6 additions & 6 deletions docs/tutorials/B_Model_Tutorial_2_General_Mixture_Models.ipynb

Large diffs are not rendered by default.

8 changes: 4 additions & 4 deletions docs/tutorials/B_Model_Tutorial_4_Hidden_Markov_Models.ipynb

Large diffs are not rendered by default.

8 changes: 4 additions & 4 deletions docs/tutorials/B_Model_Tutorial_6_Bayesian_Networks.ipynb
@@ -60,7 +60,7 @@
"source": [
"### Initialization and Fitting\n",
"\n",
"Similar to the hidden Markov model, the Bayesian network is comprised of a set of distributions and a graph structure connecting them. In this case, the graph is just a series of directed unweighted edges. Most Bayesian networks require that this graph is acyclic. However, becase pomegranate uses a factor graph to do inference, there is no strict requirement that this is the case. See the inference sections below.\n",
"Similar to the hidden Markov model, the Bayesian network is comprised of a set of distributions and a graph structure connecting them. In this case, the graph is just a series of directed unweighted edges. Most Bayesian networks require that this graph is acyclic. However, because pomegranate uses a factor graph to do inference, there is no strict requirement that this is the case. See the inference sections below.\n",
"\n",
"Likewise, similar to the other models in pomegranate, a Bayesian network can be learned in its entirety from data. However, exact structure learning is intractable and so the field has developed a variety of approximations. See the Bayesian network structure learning tutorial for more.\n",
"\n",
@@ -109,7 +109,7 @@
"id": "a0ad8a0c",
"metadata": {},
"source": [
"Once these models are initialized with a structue, they can be fit to data."
"Once these models are initialized with a structure, they can be fit to data."
]
},
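As a concrete illustration of that workflow, here is a minimal sketch (not taken from the notebook; toy data and parameter values are assumed, and the `edges` convention follows the `pomegranate/bayesian_network.py` docstring later in this diff):

```python
import torch
from pomegranate.distributions import Categorical, ConditionalCategorical
from pomegranate.bayesian_network import BayesianNetwork

X = torch.randint(2, (100, 2))                  # two binary variables

d1 = Categorical([[0.5, 0.5]])                  # root node
d2 = ConditionalCategorical([[[0.8, 0.2],       # P(x2 | x1 = 0)
                              [0.3, 0.7]]])     # P(x2 | x1 = 1)

model = BayesianNetwork([d1, d2], [(d1, d2)])   # edge: d1 -> d2
model.fit(X)                                    # refine parameters from data
```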
{
@@ -386,7 +386,7 @@
"\n",
"Perhaps the most useful application of a learned Bayesian network is the ability to do inference for missing values. Rather than a traditional prediction problem, which has a fixed set of inputs and one or more fixed outputs, Bayesian network inference will use any variables whose values are known to infer any variables whose values are not known. The set of known variables can change across examples, and so do not need to be known in advance.\n",
"\n",
"In pomegranate, this is done using the loopy belief propogation algorithm, sometimes also called the \"sum-product\" algorithm. This algorithm is run on a factor graph, which is constructed in the backend. The trade-offs for this, versus normal junction-tree inference, are that the algorithm is faster, easier to implement, exact for tree-like Bayesian networks, and can provide estimates even for cyclic networks, but that the inference is not guaranteed to be exact in other cases or even to converge when the network is cyclic.\n",
"In pomegranate, this is done using the loopy belief propagation algorithm, sometimes also called the \"sum-product\" algorithm. This algorithm is run on a factor graph, which is constructed in the backend. The trade-offs for this, versus normal junction-tree inference, are that the algorithm is faster, easier to implement, exact for tree-like Bayesian networks, and can provide estimates even for cyclic networks, but that the inference is not guaranteed to be exact in other cases or even to converge when the network is cyclic.\n",
"\n",
"The implementation of the prediction methods differs slightly from other models in pomegranate. First, the unobserved variables are indicated using a `torch.masked.MaskedTensor` object, which holds the underlying data and a mask where `True` means the value is observed and `False` means that it is not observed. When the mask is `False`, it does not matter what the underlying value is. "
]
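A hedged sketch of that masking convention (hypothetical data; `model` is a fitted network):

```python
import torch
from torch.masked import MaskedTensor

X = torch.tensor([[0, 1, 0],
                  [1, 0, 1]])
mask = torch.tensor([[True, True, False],       # False marks a missing value
                     [True, False, True]])

X_masked = MaskedTensor(X, mask=mask)
y_hat = model.predict(X_masked)                 # infers the masked entries
```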
@@ -459,7 +459,7 @@
"id": "69c03439",
"metadata": {},
"source": [
"You might notice that the output from these functions is a different shape than other methods. Because there is no guarantee that the variables all have the same number of categories, pomegranate cannot return a single tensor where one of the dimensions is the number of categories. Instead, pomegranate chooses to return a list of tensors, where each element in the list is one variable and the tensor has the dimensions `(n_examples, n_categories)` for the number of categories for that dimension. In principle, one could return a single tensor of size `(n_examples, n_dimensions, max_n_categories)` where `max_n_categories` is the maximum number of categories across all dimensions, but one would likely choose to slice the unneccesary categories out anyway, and there is no guarantee that a single variable with a large number of categories wouldn't come along and massively increase the amount of needed memory. "
"You might notice that the output from these functions is a different shape than other methods. Because there is no guarantee that the variables all have the same number of categories, pomegranate cannot return a single tensor where one of the dimensions is the number of categories. Instead, pomegranate chooses to return a list of tensors, where each element in the list is one variable and the tensor has the dimensions `(n_examples, n_categories)` for the number of categories for that dimension. In principle, one could return a single tensor of size `(n_examples, n_dimensions, max_n_categories)` where `max_n_categories` is the maximum number of categories across all dimensions, but one would likely choose to slice the unnecessary categories out anyway, and there is no guarantee that a single variable with a large number of categories wouldn't come along and massively increase the amount of needed memory. "
]
},
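For example, the list-of-tensors return shape described above can be inspected like this (a sketch, continuing the hypothetical masked input from earlier):

```python
probs = model.predict_proba(X_masked)

# one tensor per variable, each of shape (n_examples, n_categories_i)
for i, p in enumerate(probs):
    print(i, tuple(p.shape))
```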
{
2 changes: 1 addition & 1 deletion docs/tutorials/B_Model_Tutorial_7_Factor_Graphs.ipynb
@@ -295,7 +295,7 @@
"id": "215dabac",
"metadata": {},
"source": [
"Similarly to Bayesian networks, factor graphs can make predictions for missing values in data sets. In fact, Bayesian networks and Markov networks both frequently construct factor graphs in the backend to do the actual inference. These approaches use the sum-product algorithm, also called loopy belief propogation. The algorithm works essentially as follows:\n",
"Similarly to Bayesian networks, factor graphs can make predictions for missing values in data sets. In fact, Bayesian networks and Markov networks both frequently construct factor graphs in the backend to do the actual inference. These approaches use the sum-product algorithm, also called loopy belief propagation. The algorithm works essentially as follows:\n",
"\n",
"\n",
"- Initialize messages TO each factor FROM each marginal that is a copy of the marginal distribution\n",
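(The rest of the list is truncated in the diff view.) A minimal sketch of one such round of message passing, for a single factor over two binary variables, with assumed names and values rather than pomegranate's internals:

```python
import torch

f = torch.tensor([[0.9, 0.1],       # factor table f[x1, x2]
                  [0.2, 0.8]])
p1 = torch.tensor([0.5, 0.5])       # marginal distributions
p2 = torch.tensor([0.7, 0.3])

# messages TO the factor start as copies of the marginals
m1_to_f, m2_to_f = p1.clone(), p2.clone()

# messages FROM the factor sum out the other variable
m_f_to_1 = f @ m2_to_f              # sum over x2
m_f_to_2 = f.T @ m1_to_f            # sum over x1

# updated beliefs: product of marginal and incoming message, renormalized
b1 = p1 * m_f_to_1; b1 = b1 / b1.sum()
b2 = p2 * m_f_to_2; b2 = b2 / b2.sum()
```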
2 changes: 1 addition & 1 deletion docs/tutorials/C_Feature_Tutorial_1_GPU_Usage.ipynb
@@ -409,7 +409,7 @@
"id": "7b1b51e4",
"metadata": {},
"source": [
"Seems significanly faster.\n",
"Seems significantly faster.\n",
"\n",
"Now, let's try with an even more complex model: the dense hidden Markov model."
]
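The pattern this tutorial times is roughly the following (a sketch under the assumption that pomegranate v1 models are torch modules with a `log_probability` method, and that `model` and a data tensor `X` already exist):

```python
X_cuda = X.cuda()                    # move the data to the GPU
model = model.cuda()                 # move the model parameters to the GPU
logp = model.log_probability(X_cuda)
```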
2 changes: 1 addition & 1 deletion docs/whats_new.rst
@@ -239,7 +239,7 @@ HiddenMarkovModel
Misc
----

- - Unneccessary calls to memset have been removed, courtesy of @alexhenrie
+ - Unnecessary calls to memset have been removed, courtesy of @alexhenrie
- Checking for missing values has been slightly refactored to be cleaner, courtesy of @mareksmid-lucid
- Include the LICENSE file in MANIFEST.in and simplify a bit, courtesy of @toddrme2178
- Added in a robust from_json method that can be used to deserialize a JSON for any pomegranate model.
2 changes: 1 addition & 1 deletion examples/Bayesian_Network_Monty_Hall.ipynb
@@ -53,7 +53,7 @@
"\n",
"To create the Bayesian network in pomegranate, we first create the distributions which live in each node in the graph. For a categorical bayesian network we use Categorical distributions for the root nodes and ConditionalCategorical distributions for the inner and leaf nodes. \n",
"\n",
"First, we can create our \"prize\" and \"guest\" distribtions. These are each Categorical distributions because they do not depend on anything, and they are uniform distributions because they are equally likely to be any of the three doors."
"First, we can create our \"prize\" and \"guest\" distributions. These are each Categorical distributions because they do not depend on anything, and they are uniform distributions because they are equally likely to be any of the three doors."
]
},
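A sketch of those two uniform root distributions, matching the description above:

```python
from pomegranate.distributions import Categorical

guest = Categorical([[1/3, 1/3, 1/3]])   # uniform over the three doors
prize = Categorical([[1/3, 1/3, 1/3]])
```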
{
4 changes: 2 additions & 2 deletions pomegranate/_utils.py
@@ -78,7 +78,7 @@ def _cast_as_parameter(value, dtype=None, requires_grad=False):


def _update_parameter(value, new_value, inertia=0.0, frozen=None):
"""Update a parameters unles.
"""Update a parameter unles.
"""

if hasattr(value, "frozen") and getattr(value, "frozen") == True:
@@ -373,7 +373,7 @@ def partition_sequences(X, sample_weight=None, priors=None, n_dists=None):
a different length, and group together sequences of the same length so that
batched operations can be more efficiently done on them.
- Alternatively, it can take in sequnces in the correct format and simply
+ Alternatively, it can take in sequences in the correct format and simply
return them. The correct form is to be either a single tensor that has
three dimensions or a list of three dimensional tensors, where each
tensor contains all the sequences of the same length.
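The accepted input formats this docstring describes look roughly like the following (shapes assumed for illustration):

```python
import torch

# a single three-dimensional tensor: 10 sequences, length 5, 3 features
X = torch.randn(10, 5, 3)

# or a list of three-dimensional tensors, grouped by sequence length
X_grouped = [torch.randn(10, 5, 3), torch.randn(4, 8, 3)]
```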
10 changes: 5 additions & 5 deletions pomegranate/bayesian_network.py
@@ -31,7 +31,7 @@ class BayesianNetwork(Distribution):
to be cyclic as long as there is no assumption of convergence during
inference.
- Inference is doing using loopy belief propogation along a factor graph
+ Inference is doing using loopy belief propagation along a factor graph
representation. This is sometimes called the `sum-product` algorithm.
It will yield exact results if the graph has a tree-like structure.
Otherwise, if the graph is acyclic, it is guaranteed to converge but not
@@ -56,7 +56,7 @@ class BayesianNetwork(Distribution):
the parent distribution object and the second element is the child
distribution object. If None, then no edges. Default is None.
- struture: tuple or list or None, optional
+ structure: tuple or list or None, optional
A list or tuple of the parents for each distribution with a tuple
containing no elements indicating a root node. For instance,
((), (0,), (), (0, 2)) would represent a graph with four nodes,
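That example structure could be used directly, e.g. (a sketch with hypothetical binary data; structure-only initialization is assumed to be supported, with parameters then learned by `fit`):

```python
import torch
from pomegranate.bayesian_network import BayesianNetwork

# four nodes; node 1 has parent 0, node 3 has parents 0 and 2
model = BayesianNetwork(structure=((), (0,), (), (0, 2)))
model.fit(torch.randint(2, (500, 4)))
```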
@@ -358,7 +358,7 @@ def predict(self, X):
This method infers a probability distribution for each of the missing
values in the data. It uses the factor graph representation of the
- Bayesian network to run the sum-product/loopy belief propogation
+ Bayesian network to run the sum-product/loopy belief propagation
algorithm. After the probability distribution is inferred, the maximum
likeihood value for each variable is returned.
@@ -398,7 +398,7 @@ def predict_proba(self, X):
This method infers a probability distribution for each of the missing
values in the data. It uses the factor graph representation of the
- Bayesian network to run the sum-product/loopy belief propogation
+ Bayesian network to run the sum-product/loopy belief propagation
algorithm.
The input to this method must be a torch.masked.MaskedTensor where the
@@ -446,7 +446,7 @@ def predict_log_proba(self, X):
This method infers a log probability distribution for each of the
missing values in the data. It uses the factor graph representation of
- the Bayesian network to run the sum-product/loopy belief propogation
+ the Bayesian network to run the sum-product/loopy belief propagation
algorithm.
The input to this method must be a torch.masked.MaskedTensor where the
2 changes: 1 addition & 1 deletion pomegranate/distributions/bernoulli.py
@@ -21,7 +21,7 @@ class Bernoulli(Distribution):
independent of the others.
There are two ways to initialize this object. The first is to pass in
- the tensor of probablity parameters, at which point they can immediately be
+ the tensor of probability parameters, at which point they can immediately be
used. The second is to not pass in the rate parameters and then call
either `fit` or `summary` + `from_summaries`, at which point the probability
parameter will be learned from data.
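The two initialization routes described in this docstring look roughly like the following (a sketch with hypothetical data, assuming `probs` takes one probability per feature):

```python
import torch
from pomegranate.distributions import Bernoulli

d1 = Bernoulli([0.3, 0.7])                   # probabilities given up front
d2 = Bernoulli()                             # parameters learned from data
d2.fit(torch.randint(2, (100, 2)).float())
```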
2 changes: 1 addition & 1 deletion pomegranate/distributions/categorical.py
@@ -18,7 +18,7 @@ class Categorical(Distribution):
A categorical distribution models the probability of a set of distinct
values happening. It is an extension of the Bernoulli distribution to
- multiple values. Sometimes it is refered to as a discrete distribution,
+ multiple values. Sometimes it is referred to as a discrete distribution,
but this distribution does not enforce that the numeric values used for the
keys have any relationship based on their identity. Permuting the keys will
have no effect on the calculation. This distribution assumes that the
2 changes: 1 addition & 1 deletion pomegranate/distributions/exponential.py
@@ -16,7 +16,7 @@ class Exponential(Distribution):
"""An exponential distribution object.
An exponential distribution models scales of discrete events, and has a
- rate parameter describing the average time between event occurances.
+ rate parameter describing the average time between event occurrences.
This distribution assumes that each feature is independent of the others.
Although the object is meant to operate on discrete counts, it can be used
on any non-negative continuous data.
4 changes: 2 additions & 2 deletions pomegranate/distributions/joint_categorical.py
@@ -18,12 +18,12 @@ class JointCategorical(Distribution):
"""A joint categorical distribution.
A joint categorical distribution models the probability of a vector of
- categorical values occuring without assuming that the dimensions are
+ categorical values occurring without assuming that the dimensions are
independent from each other. Essentially, it is a Categorical distribution
without the assumption that the dimensions are independent of each other.
There are two ways to initialize this object. The first is to pass in
- the tensor of probablity parameters, at which point they can immediately be
+ the tensor of probability parameters, at which point they can immediately be
used. The second is to not pass in the rate parameters and then call
either `fit` or `summary` + `from_summaries`, at which point the
probability parameters will be learned from data.
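For instance, a joint table over two binary dimensions might look like this (a sketch; values are hypothetical):

```python
from pomegranate.distributions import JointCategorical

# probs[i, j] = P(x1 = i, x2 = j); the entries sum to one
d = JointCategorical([[0.1, 0.2],
                      [0.3, 0.4]])
```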
4 changes: 2 additions & 2 deletions pomegranate/distributions/normal.py
@@ -22,15 +22,15 @@
class Normal(Distribution):
"""A normal distribution object.
- A normal distribution models the probability of a variable occuring under
+ A normal distribution models the probability of a variable occurring under
a bell-shaped curve. It is described by a vector of mean values and a
covariance value that can be zero, one, or two dimensional. This
distribution can assume that features are independent of the others if
the covariance type is 'diag' or 'sphere', but if the type is 'full' then
the features are not independent.
There are two ways to initialize this object. The first is to pass in
- the tensor of probablity parameters, at which point they can immediately be
+ the tensor of probability parameters, at which point they can immediately be
used. The second is to not pass in the rate parameters and then call
either `fit` or `summary` + `from_summaries`, at which point the probability
parameter will be learned from data.
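The covariance types this docstring mentions correspond roughly to the following (a sketch; parameter values are hypothetical):

```python
from pomegranate.distributions import Normal

# 'full': a (d, d) covariance matrix; features may be correlated
d_full = Normal([0.0, 0.0], [[1.0, 0.5],
                             [0.5, 2.0]], covariance_type='full')

# 'diag': a (d,) vector of per-feature variances; features independent
d_diag = Normal([0.0, 0.0], [1.0, 2.0], covariance_type='diag')
```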
6 changes: 3 additions & 3 deletions pomegranate/distributions/poisson.py
@@ -14,9 +14,9 @@
class Poisson(Distribution):
"""An poisson distribution object.
- A poisson distribution models the number of occurances of events that
- happen in a fixed time span, assuming that the occurance of each event
- is independent. This distibution also asumes that each feature is
+ A poisson distribution models the number of occurrences of events that
+ happen in a fixed time span, assuming that the occurrence of each event
+ is independent. This distribution also assumes that each feature is
independent of the others.
There are two ways to initialize this objecct. The first is to pass in
4 changes: 2 additions & 2 deletions pomegranate/distributions/student_t.py
@@ -15,7 +15,7 @@
class StudentT(Normal):
"""A Student T distribution.
- A Student T distribution models the probability of a variable occuring under
+ A Student T distribution models the probability of a variable occurring under
a bell-shaped curve with heavy tails. Basically, this is a version of the
normal distribution that is less resistant to outliers. It is described by
a vector of mean values and a vector of variance values. This
@@ -24,7 +24,7 @@ class StudentT(Normal):
the features are not independent.
There are two ways to initialize this object. The first is to pass in
- the tensor of probablity parameters, at which point they can immediately be
+ the tensor of probability parameters, at which point they can immediately be
used. The second is to not pass in the rate parameters and then call
either `fit` or `summary` + `from_summaries`, at which point the probability
parameter will be learned from data.
4 changes: 2 additions & 2 deletions pomegranate/distributions/uniform.py
@@ -17,14 +17,14 @@
class Uniform(Distribution):
"""A uniform distribution.
- A uniform distribution models the probability of a variable occuring given
+ A uniform distribution models the probability of a variable occurring given
a range that has the same probability within it and no probability outside
it. It is described by a vector of minimum and maximum values for this
range. This distribution assumes that the features are independent of
each other.
There are two ways to initialize this object. The first is to pass in
- the tensor of probablity parameters, at which point they can immediately be
+ the tensor of probability parameters, at which point they can immediately be
used. The second is to not pass in the rate parameters and then call
either `fit` or `summary` + `from_summaries`, at which point the probability
parameter will be learned from data.
4 changes: 2 additions & 2 deletions pomegranate/factor_graph.py
@@ -27,7 +27,7 @@ class FactorGraph(Distribution):
distributions on the marginal side encode probability estimates from the
data.
- Inference is done on the factor graph using the loopy belief propogation
+ Inference is done on the factor graph using the loopy belief propagation
algorithm. This is an iterative algorithm where "messages" are passed
along each edge between the marginals and the factors until the estimates
for the marginals converges. In brief: each message represents what the
@@ -461,7 +461,7 @@ def predict_log_proba(self, X):
This method infers a log probability distribution for each of the
missing values in the data. It uses the factor graph representation of
- the Bayesian network to run the sum-product/loopy belief propogation
+ the Bayesian network to run the sum-product/loopy belief propagation
algorithm.
The input to this method must be a torch.masked.MaskedTensor where the
2 changes: 1 addition & 1 deletion pomegranate/hmm/_base.py
@@ -331,7 +331,7 @@ def add_distributions(self, distributions):
Parameters
----------
- distrbutions: list, tuple, iterable
+ distributions: list, tuple, iterable
A set of distributions to add to the model.
"""

