Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

InferenceData.to_dataframe() can crash the kernel #2258

Open
drbenvincent opened this issue Jul 2, 2023 · 1 comment
Open

InferenceData.to_dataframe() can crash the kernel #2258

drbenvincent opened this issue Jul 2, 2023 · 1 comment

Comments

@drbenvincent
Copy link
Contributor

Describe the bug
When sampling from a model, and trying to extract the samples into a dataframe, I'm finding that it will crash the kernel.

To Reproduce

import arviz as az
import numpy as np
import pymc as pm

N = 10_000

with pm.Model() as model1:
    x = pm.Normal("x", shape=N)
    temp = pm.Normal("temp", shape=N)
    y = pm.Deterministic("y", x + 1 + np.sqrt(3) * temp)
    idata1 = pm.sample_prior_predictive(samples=1)

df = az.extract(idata1, group="prior", var_names=["x", "y"]).squeeze().to_dataframe()
df

When N=10_000, this works on my machine and gives the following result.
Screenshot 2023-07-02 at 09 49 26

NOTE: that the dataframe has 100 million rows

But much above this leads to a kernel crash. In my use case I am using N=100_000.

This is related to the strategy of setting the shape of the variables to N then drawing 1 sample. For example, if we change the shape to the default 1, and instead ask for N samples, then it works fine. This does work as expected:

with pm.Model() as model2:
    x = pm.Normal("x")
    temp = pm.Normal("temp")
    y = pm.Deterministic("y", x + 1 + np.sqrt(3) * temp)
    idata2 = pm.sample_prior_predictive(samples=N)

df = az.extract(idata2, group="prior", var_names=["x", "y"]).squeeze().to_dataframe()
df

Expected behavior
This may well be an edge case, but the expected behaviour would be to get a dataframe with N rows, even with the original model1 strategy.

Additional context
Arviz version: 0.15.1

@ahartikainen
Copy link
Contributor

ahartikainen commented Jul 2, 2023

This is expected behavior with xarray.Dataset (data is in tidy / long -format).

If you want data in wide format, use the to_dataframe from InferenceData object.

idata2.to_dataframe(groups="prior")[["x", "y"]]

We should add var_names to the functionality and also add a function to support xarray Datasets.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants