Hi, I think I have a shape bug in my code, but I can’t figure out what’s wrong. When I run sample_prior_predictive(), I get n times the expected number of samples, where n is the number of groups.
For example, in this MWE I expected 4 (number of groups) * 500 (number of samples requested) = 2000 samples. Instead, I get 4 copies of each sample, for 8000 samples in total.
import pymc3 as pm
import arviz as az
import numpy as np
import pandas as pd
coords = {
    'region': ['north', 'east', 'south', 'west'],
    'obs_id': np.arange(3 * 4)
}
np.random.seed(0)
df = pd.DataFrame({
    'region': np.repeat(coords['region'], 3),
    'value': np.array([np.random.normal(loc=x, scale=[1], size=(3)) for x in [10, 11, 12, 15]]).reshape(-1)
})
region_labels, region_levels = pd.factorize(df['region'])
with pm.Model(coords=coords) as unpooled_model:
    # Integer index mapping each observation to its region
    region_idx = pm.Data('region_idx', region_labels, dims='obs_id')
    # One mean per region, one shared sigma
    region_mu = pm.Exponential('region_mu', 0.001, dims='region')
    pooled_sigma = pm.Exponential('sigma', 1 / 100)
    item_mu = region_mu[region_idx]
    y = pm.Normal('y', item_mu, sigma=pooled_sigma, observed=df['value'], dims='obs_id')
    # Sampling and conversion run inside the model context
    prior_checks = pm.sample_prior_predictive(samples=500, random_seed=0)
    idata_prior = az.from_pymc3(prior=prior_checks)

df_prior_samples = idata_prior.prior.to_dataframe()
print(df_prior_samples.shape)
df_prior_samples
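To make the arithmetic concrete, here is a minimal sketch of how I am counting rows. It assumes that to_dataframe() on the prior group returns one row per combination of the dataset's dimension labels, which is my understanding of how xarray's Dataset.to_dataframe works. The dimension names chain, draw, and region are what I expect from the model above. The name extra_dim is purely hypothetical: it stands in for whatever additional length-4 dimension would account for the larger count, and I have not confirmed that such a dimension exists.

```python
from math import prod


def long_format_rows(dim_sizes):
    """Number of rows when there is one row per combination of dimension labels."""
    return prod(dim_sizes.values())


# Dimensions I expect in the prior group (assumed names): 1 chain, 500 draws, 4 regions.
expected_dims = {"chain": 1, "draw": 500, "region": 4}

# Hypothetical: the same dimensions plus one extra dimension of length 4.
# "extra_dim" is a made-up name, not something taken from the actual output.
observed_like_dims = {**expected_dims, "extra_dim": 4}

expected_rows = long_format_rows(expected_dims)
observed_like_rows = long_format_rows(observed_like_dims)

print(expected_rows)       # 2000
print(observed_like_rows)  # 8000
```

Printing idata_prior.prior.dims in the MWE should show which dimensions are actually present, which would confirm or rule out this guess.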
However, if I change region_mu = pm.Exponential('region_mu', 0.001, dims='region') to pm.Normal, I get 2000 samples as expected. I played around with different distributions and found that only Normal works the way I expected. For example, Gamma, ChiSquared, Uniform, etc. all give me 8000 samples, so this is not an issue of the number of parameters of the distribution. Can someone help me understand why this happens? Thank you!