sample_prior_predictive bug? Gives wrong shape for the dependent variable

Minimal working example:

import pandas as pd
import pymc3 as pm

foxes = pd.read_csv('https://github.com/rmcelreath/rethinking/raw/master/data/foxes.csv', sep=';')

with pm.Model() as mdl:
    a = pm.Normal('a', mu=0, sd=0.25)
    b = pm.Normal('b', mu=0, sd=0.4)
    mu = pm.Deterministic('mu', a + b * foxes.area.values.reshape(-1, 1))
    sigma = pm.Exponential('sigma', lam=1)
    weight = pm.Normal('weight', mu=mu, sd=sigma, observed=foxes.weight)
    
    prior = pm.sample_prior_predictive()

{k: prior[k].shape for k in prior.keys()}

As output I get:

{'a': (500,),
 'weight': (500, 116, 116),
 'mu': (500, 116, 1),
 'sigma': (500,),
 'b': (500,),
 'sigma_log__': (500,)}

For “weight”, I would have expected a shape of (500, 116), but instead I get (500, 116, 116). Am I misunderstanding what sample_prior_predictive is supposed to do?

Not a bug 🙂
The problem is that you reshaped foxes.area. Your observed data (foxes.weight) has shape (N,), while mu has shape (N, 1). Under NumPy broadcasting rules those two shapes combine to (N, N), so you are effectively saying that you have N copies of each observation. To fix it, just remove the reshape:

mu = pm.Deterministic('mu', a + b * foxes.area.values) 
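To see the shapes without PyMC3 involved, here is a small NumPy sketch. The data are random stand-ins (hypothetical values, not the real foxes.csv columns) with N = 116 rows, and the 500-draw part is a hand-rolled imitation of a prior predictive run using the same priors as the model above. It is not how pm.sample_prior_predictive is implemented internally; it only shows which shapes to expect.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical stand-ins for foxes.area.values and foxes.weight (shape (N,) each)
N = 116
area = rng.uniform(1.0, 5.0, size=N)
weight_obs = rng.normal(4.5, 1.0, size=N)

# One fixed (hypothetical) draw of the coefficients, just to inspect shapes
a, b = 0.1, 0.2
mu_reshaped = a + b * area.reshape(-1, 1)  # shape (N, 1), as in the original model
mu_flat = a + b * area                     # shape (N,), as in the fixed model

# Broadcasting mu against the (N,) observed data
print(np.broadcast(mu_reshaped, weight_obs).shape)  # (116, 116): N copies of each observation
print(np.broadcast(mu_flat, weight_obs).shape)      # (116,): one mean per observation

# Hand-rolled prior predictive over 500 draws, same priors as the model
draws = 500
a_s = rng.normal(0.0, 0.25, size=draws)
b_s = rng.normal(0.0, 0.4, size=draws)
sigma_s = rng.exponential(1.0, size=draws)  # Exponential(lam=1) has scale 1/lam = 1

mu_s = a_s[:, None] + b_s[:, None] * area      # shape (draws, N)
weight_s = rng.normal(mu_s, sigma_s[:, None])  # shape (draws, N)
print(mu_s.shape, weight_s.shape)              # (500, 116) (500, 116)
```

With the flat mu, every draw yields one simulated weight per fox, which is the (500, 116) shape you were expecting.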

Yeah, that did it – I didn’t realize that would make such a difference! Thanks for your help!!
