Sample_ppc shape

Actually, the old behavior was to output shape (1,), but I recently changed it.

The main reason for the change is to make posterior sampling and prior sampling behave consistently. Before my change, if you wanted to draw prior samples (i.e., sample from the prior distribution and the prior predictive distribution), you sometimes needed to write out the shape explicitly:

r = pm.StudentT('output', mu=mu, nu=nu, sd=sigma, observed=train, shape=train.shape)

because the observed variable did not inherit its shape from the observed data. This was quite inconvenient, as users needed to change the way the model was written depending on whether they were sampling from the prior or the posterior.

Now, this comes to the question of why we would want the observed random variable to have the same shape as the observed input. It is a bit of a subtle point; the key is the likelihood function of the parameters. By that I mean the likelihood function changes as you have more or less data, so the number of observations (i.e., the shape) changes the geometry of the likelihood function (you can have a look at my recent talk at PyData Berlin: https://github.com/junpenglao/All-that-likelihood-with-PyMC3/tree/pydata_Berlin). On the computational side, you want the actual observed data and the generated (from prior or posterior) fake observed data to have the same shape, so that when you pass the (actual or fake) observed array around and evaluate it with some function (e.g., the likelihood function), the result is consistent.
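
As a rough illustration of both points, here is a NumPy-only sketch (not PyMC3 code). The helper names `studentt_logp` and `loglike_mu`, the toy data, and the fixed values nu=4 and sigma=1 are all made up for this example. It shows that a fake observation drawn with the same shape as the data gives an elementwise log-likelihood of the same shape, and that the summed log-likelihood over mu depends on how many observations go in.

```python
import math
import numpy as np

def studentt_logp(y, mu, nu, sigma):
    """Elementwise Student-t log density; the result has the same shape as y."""
    z = (y - mu) / sigma
    return (math.lgamma((nu + 1) / 2) - math.lgamma(nu / 2)
            - 0.5 * math.log(nu * math.pi) - math.log(sigma)
            - (nu + 1) / 2 * np.log1p(z ** 2 / nu))

rng = np.random.default_rng(0)

# Toy stand-in for the actual observed data (hypothetical, not from the post).
train = rng.standard_t(df=4, size=(50, 3))

# A "fake" observation generated with the same shape as the real data.
fake = rng.standard_t(df=4, size=train.shape)

# Evaluating the same function on real and fake data gives same-shaped results.
logp_train = studentt_logp(train, mu=0.0, nu=4.0, sigma=1.0)
logp_fake = studentt_logp(fake, mu=0.0, nu=4.0, sigma=1.0)
print(logp_train.shape, logp_fake.shape)  # (50, 3) (50, 3)

def loglike_mu(mu_grid, y):
    """Summed log-likelihood of mu over a grid, with nu and sigma held fixed."""
    return np.array([studentt_logp(y, m, 4.0, 1.0).sum() for m in mu_grid])

# The likelihood as a function of mu, using few versus all observations.
mu_grid = np.linspace(-1.0, 1.0, 201)
ll_small = loglike_mu(mu_grid, train[:5])
ll_full = loglike_mu(mu_grid, train)
```

Plotting `ll_small` and `ll_full` against `mu_grid` should typically show a more sharply peaked curve for the full data set, which is the sense in which the number of observations changes the geometry of the likelihood.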

I hope this clears things up a bit.
