-
Notifications
You must be signed in to change notification settings - Fork 584
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
- Loading branch information
Showing
11 changed files
with
5,980 additions
and
0 deletions.
There are no files selected for viewing
Large diffs are not rendered by default.
Oops, something went wrong.
776 changes: 776 additions & 0 deletions
776
tutorials/B_Model_Tutorial_2_General_Mixture_Models.ipynb
Large diffs are not rendered by default.
Oops, something went wrong.
Large diffs are not rendered by default.
Oops, something went wrong.
863 changes: 863 additions & 0 deletions
863
tutorials/B_Model_Tutorial_4_Hidden_Markov_Models.ipynb
Large diffs are not rendered by default.
Oops, something went wrong.
Large diffs are not rendered by default.
Oops, something went wrong.
Large diffs are not rendered by default.
Oops, something went wrong.
Large diffs are not rendered by default.
Oops, something went wrong.
Large diffs are not rendered by default.
Oops, something went wrong.
393 changes: 393 additions & 0 deletions
393
tutorials/C_Feature_Tutorial_2_Mixed_Precision_and_DataTypes.ipynb
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,393 @@ | ||
{ | ||
"cells": [ | ||
{ | ||
"cell_type": "markdown", | ||
"id": "fd9dc061", | ||
"metadata": {}, | ||
"source": [ | ||
"## Mixed Precision and Other Data Types\n", | ||
"\n", | ||
"author: Jacob Schreiber <br>\n", | ||
"contact: jmschreiber91@gmail.com\n", | ||
"\n", | ||
"Because torchegranate models are all instances of `torch.nn.Module`, you can do anything with them that you could do with other PyTorch models. In the first tutorial, we saw how this means that one can use GPUs in exactly the same way that one would with their other PyTorch models. However, this also means that all the great things built-in for during half precision, quantization, automatic mixed precision (AMP), etc., can also be used in pomegranate." | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": 1, | ||
"id": "c9cd05a4", | ||
"metadata": { | ||
"ExecuteTime": { | ||
"end_time": "2023-04-16T02:53:44.483551Z", | ||
"start_time": "2023-04-16T02:53:37.715664Z" | ||
} | ||
}, | ||
"outputs": [ | ||
{ | ||
"name": "stdout", | ||
"output_type": "stream", | ||
"text": [ | ||
"Populating the interactive namespace from numpy and matplotlib\n", | ||
"numpy : 1.21.5\n", | ||
"scipy : 1.9.1\n", | ||
"torch : 2.0.0\n", | ||
"torchegranate: 0.5.0\n", | ||
"\n", | ||
"Compiler : GCC 11.2.0\n", | ||
"OS : Linux\n", | ||
"Release : 5.4.0-146-generic\n", | ||
"Machine : x86_64\n", | ||
"Processor : x86_64\n", | ||
"CPU cores : 48\n", | ||
"Architecture: 64bit\n", | ||
"\n" | ||
] | ||
} | ||
], | ||
"source": [ | ||
"import os\n", | ||
"os.environ['CUDA_VISIBLE_DEVICES'] = '0'\n", | ||
"\n", | ||
"%pylab inline\n", | ||
"import seaborn; seaborn.set_style('whitegrid')\n", | ||
"\n", | ||
"import torch\n", | ||
"\n", | ||
"numpy.random.seed(0)\n", | ||
"numpy.set_printoptions(suppress=True)\n", | ||
"\n", | ||
"%load_ext watermark\n", | ||
"%watermark -m -n -p numpy,scipy,torch,torchegranate" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"id": "d3ffb2aa", | ||
"metadata": {}, | ||
"source": [ | ||
"### float16 (half) Precision\n", | ||
"\n", | ||
"Doing operations at half precision is just the same as it would by in PyTorch: you can use the `.half()` method or the `.to(torch.float16)` method. However, very few operations seem to be supported for half precision, including the log and sqrt methods not being supported for some reason? So, until more operations are supported, you will probably be using other methods.\n", | ||
"\n", | ||
"### bfloat16\n", | ||
"\n", | ||
"More operations seem to be supported for `bfloat16` though!" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": 2, | ||
"id": "d2b18f5b", | ||
"metadata": { | ||
"ExecuteTime": { | ||
"end_time": "2023-04-16T02:53:44.531124Z", | ||
"start_time": "2023-04-16T02:53:44.487008Z" | ||
} | ||
}, | ||
"outputs": [ | ||
{ | ||
"data": { | ||
"text/plain": [ | ||
"(Parameter containing:\n", | ||
" tensor([-0.0076, 0.0164, 0.0164, -0.0033, 0.0105]),\n", | ||
" Parameter containing:\n", | ||
" tensor([1.0291, 1.0491, 0.9661, 0.8947, 1.0778]))" | ||
] | ||
}, | ||
"execution_count": 2, | ||
"metadata": {}, | ||
"output_type": "execute_result" | ||
} | ||
], | ||
"source": [ | ||
"from torchegranate.distributions import Normal\n", | ||
"\n", | ||
"X = torch.randn(1000, 5)\n", | ||
"\n", | ||
"d = Normal(covariance_type='diag').fit(X)\n", | ||
"d.means, d.covs" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": 3, | ||
"id": "3a191622", | ||
"metadata": { | ||
"ExecuteTime": { | ||
"end_time": "2023-04-16T02:53:44.543246Z", | ||
"start_time": "2023-04-16T02:53:44.533626Z" | ||
} | ||
}, | ||
"outputs": [ | ||
{ | ||
"data": { | ||
"text/plain": [ | ||
"(Parameter containing:\n", | ||
" tensor([-0.0075, 0.0165, 0.0165, -0.0034, 0.0103], dtype=torch.bfloat16),\n", | ||
" Parameter containing:\n", | ||
" tensor([1.0312, 1.0469, 0.9688, 0.8945, 1.0781], dtype=torch.bfloat16))" | ||
] | ||
}, | ||
"execution_count": 3, | ||
"metadata": {}, | ||
"output_type": "execute_result" | ||
} | ||
], | ||
"source": [ | ||
"X = X.to(torch.bfloat16)\n", | ||
"\n", | ||
"d = Normal(covariance_type='diag').to(torch.bfloat16).fit(X)\n", | ||
"d.means, d.covs" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"id": "e501c8a5", | ||
"metadata": {}, | ||
"source": [ | ||
"However, not all operations are supported for `torch.bfloat16` either, including the cholesky decomposition used in full covariance normal distributions.\n", | ||
"\n", | ||
"Although not all operations support all data types, all models and methods (inference and training) support them to the extent that the underlying operations allow. For instance, we can just as easily use a mixture model with `bfloat16` data types as with full floats." | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": 4, | ||
"id": "a30dc725", | ||
"metadata": { | ||
"ExecuteTime": { | ||
"end_time": "2023-04-16T02:53:44.575031Z", | ||
"start_time": "2023-04-16T02:53:44.546736Z" | ||
} | ||
}, | ||
"outputs": [ | ||
{ | ||
"name": "stdout", | ||
"output_type": "stream", | ||
"text": [ | ||
"[1] Improvement: 32.0, Time: 0.001619s\n", | ||
"[2] Improvement: 32.0, Time: 0.001082s\n", | ||
"[3] Improvement: 0.0, Time: 0.00106s\n" | ||
] | ||
}, | ||
{ | ||
"data": { | ||
"text/plain": [ | ||
"GeneralMixtureModel(\n", | ||
" (distributions): ModuleList(\n", | ||
" (0-1): 2 x Normal()\n", | ||
" )\n", | ||
")" | ||
] | ||
}, | ||
"execution_count": 4, | ||
"metadata": {}, | ||
"output_type": "execute_result" | ||
} | ||
], | ||
"source": [ | ||
"from torchegranate.gmm import GeneralMixtureModel\n", | ||
"\n", | ||
"model = GeneralMixtureModel([Normal(covariance_type='diag'), Normal(covariance_type='diag')], verbose=True)\n", | ||
"model = model.to(torch.bfloat16)\n", | ||
"model.fit(X)" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"id": "c40b3b61", | ||
"metadata": {}, | ||
"source": [ | ||
"And we can use the resulting trained model to make predictions at whatever resolution we'd like." | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": 5, | ||
"id": "77fa2ecd", | ||
"metadata": { | ||
"ExecuteTime": { | ||
"end_time": "2023-04-16T02:53:44.580895Z", | ||
"start_time": "2023-04-16T02:53:44.576451Z" | ||
} | ||
}, | ||
"outputs": [ | ||
{ | ||
"data": { | ||
"text/plain": [ | ||
"(tensor([[0.8281, 0.1680],\n", | ||
" [0.2451, 0.7539],\n", | ||
" [0.0874, 0.9375],\n", | ||
" ...,\n", | ||
" [0.6445, 0.3574],\n", | ||
" [0.3145, 0.6875],\n", | ||
" [0.2695, 0.7305]], dtype=torch.bfloat16),\n", | ||
" torch.bfloat16)" | ||
] | ||
}, | ||
"execution_count": 5, | ||
"metadata": {}, | ||
"output_type": "execute_result" | ||
} | ||
], | ||
"source": [ | ||
"y_hat = model.predict_proba(X)\n", | ||
"y_hat, y_hat.dtype" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": 6, | ||
"id": "1b654637", | ||
"metadata": { | ||
"ExecuteTime": { | ||
"end_time": "2023-04-16T02:53:44.590926Z", | ||
"start_time": "2023-04-16T02:53:44.582423Z" | ||
} | ||
}, | ||
"outputs": [ | ||
{ | ||
"data": { | ||
"text/plain": [ | ||
"(tensor([[0.8299, 0.1701],\n", | ||
" [0.2463, 0.7537],\n", | ||
" [0.0793, 0.9207],\n", | ||
" ...,\n", | ||
" [0.6496, 0.3504],\n", | ||
" [0.3148, 0.6852],\n", | ||
" [0.2788, 0.7212]]),\n", | ||
" torch.float32)" | ||
] | ||
}, | ||
"execution_count": 6, | ||
"metadata": {}, | ||
"output_type": "execute_result" | ||
} | ||
], | ||
"source": [ | ||
"model = model.to(torch.float32)\n", | ||
"\n", | ||
"y_hat = model.predict_proba(X)\n", | ||
"y_hat, y_hat.dtype" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"id": "4d62dd1b", | ||
"metadata": {}, | ||
"source": [ | ||
"### Automatic Mixed Precision\n", | ||
"\n", | ||
"An automatic way to get around some of these issues is to use AMP so that operations which can work at lower precision are cast and others are not. Keeping up with the theme, doing this is exactly the same as using AMP with your other PyTorch models." | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": 7, | ||
"id": "1dc7e946", | ||
"metadata": { | ||
"ExecuteTime": { | ||
"end_time": "2023-04-16T02:53:47.577003Z", | ||
"start_time": "2023-04-16T02:53:44.592405Z" | ||
} | ||
}, | ||
"outputs": [], | ||
"source": [ | ||
"X = torch.randn(1000, 50).cuda()\n", | ||
"\n", | ||
"model = GeneralMixtureModel([Normal(), Normal()]).cuda()\n", | ||
"\n", | ||
"with torch.autocast('cuda', dtype=torch.float16):\n", | ||
" model.fit(X)" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"id": "6b29eb8b", | ||
"metadata": {}, | ||
"source": [ | ||
"This would have crashed if you tried to run `model.fit` alone because of the unsupported Cholesky decomposition.\n", | ||
"\n", | ||
"### Speedups\n", | ||
"\n", | ||
"Unfortunately, because pomegranate uses a wide range of operations to implement the underlying models, the speedups from using mixed precision are inconsistent. It may be worth trying out in your application but the speedups observed in training neural networks are not guaranteed here because not all operations are supported -- and if they are supported, they may not be optimized to be faster. Basically, because AMP will fall back on normal precision for many operations, the entire method may end up not being significantly faster in practice." | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": 8, | ||
"id": "fe0d5f03", | ||
"metadata": { | ||
"ExecuteTime": { | ||
"end_time": "2023-04-16T02:53:51.522075Z", | ||
"start_time": "2023-04-16T02:53:47.579203Z" | ||
} | ||
}, | ||
"outputs": [ | ||
{ | ||
"name": "stdout", | ||
"output_type": "stream", | ||
"text": [ | ||
"[1] Improvement: 1015.0, Time: 0.1042s\n", | ||
"[2] Improvement: 417.5, Time: 0.1044s\n", | ||
"[3] Improvement: 310.0, Time: 0.1041s\n", | ||
"[4] Improvement: 243.5, Time: 0.1041s\n", | ||
"\n", | ||
"[1] Improvement: 1232.5, Time: 0.1039s\n", | ||
"[2] Improvement: 613.5, Time: 0.1042s\n", | ||
"[3] Improvement: 446.5, Time: 0.1043s\n", | ||
"[4] Improvement: 344.0, Time: 0.1043s\n" | ||
] | ||
}, | ||
{ | ||
"data": { | ||
"text/plain": [ | ||
"GeneralMixtureModel(\n", | ||
" (distributions): ModuleList(\n", | ||
" (0-9): 10 x Normal()\n", | ||
" )\n", | ||
")" | ||
] | ||
}, | ||
"execution_count": 8, | ||
"metadata": {}, | ||
"output_type": "execute_result" | ||
} | ||
], | ||
"source": [ | ||
"X = torch.randn(10000, 500).cuda()\n", | ||
"\n", | ||
"model = GeneralMixtureModel([Normal(covariance_type='diag') for i in range(10)], max_iter=5, verbose=True).cuda()\n", | ||
"with torch.autocast('cuda', dtype=torch.bfloat16):\n", | ||
" model.fit(X)\n", | ||
"\n", | ||
"print()\n", | ||
" \n", | ||
"model = GeneralMixtureModel([Normal(covariance_type='diag') for i in range(10)], max_iter=5, verbose=True).cuda()\n", | ||
"model.fit(X)" | ||
] | ||
} | ||
], | ||
"metadata": { | ||
"kernelspec": { | ||
"display_name": "Python 3 (ipykernel)", | ||
"language": "python", | ||
"name": "python3" | ||
}, | ||
"language_info": { | ||
"codemirror_mode": { | ||
"name": "ipython", | ||
"version": 3 | ||
}, | ||
"file_extension": ".py", | ||
"mimetype": "text/x-python", | ||
"name": "python", | ||
"nbconvert_exporter": "python", | ||
"pygments_lexer": "ipython3", | ||
"version": "3.9.13" | ||
} | ||
}, | ||
"nbformat": 4, | ||
"nbformat_minor": 5 | ||
} |
Oops, something went wrong.