How can I quickly sample from my model's priors and evaluate those samples in the context of the model and data?
Use a data-less model to generate parameters, and a data-ful model to estimate the likelihood:
with pm.Model() as priors_only:
    μ = pm.Normal('mu', 0, 1, shape=k)
    L_Σ_vec = pm.LKJCholeskyCov('L_vec', n=k, eta=0.5, sd_dist=pm.HalfNormal.dist(1))
    sim_params = pm.sample(500)
with pm.Model() as data_model:
    μ = pm.Normal('mu', 0, 1, shape=k)
    L_Σ_vec = pm.LKJCholeskyCov('L_vec', n=k, eta=0.5, sd_dist=pm.HalfNormal.dist(1))
    L_Σ = pm.Deterministic('L', pm.expand_packed_triangular(k, L_Σ_vec))
    X = pm.MvNormal('obs', μ, chol=L_Σ, observed=data)

llik = data_model.logp  # this step is super important: it compiles the joint log-density
dliks = [llik({'mu': sim_params['mu'][i, :], 'L_vec': sim_params['L_vec'][i, :]})
         for i in range(500)]
# these are log P(X|theta) + log P(theta)
print('mean log joint density: %f' % np.mean(dliks))
Would this just be pm.sample_prior_predictive()?
I think this would generate a matrix of stuff that “should look like” the data.
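For intuition, here is a numpy-only sketch of the kind of output `sample_prior_predictive` produces for a model like the one above: one simulated dataset per prior draw. (Purely to keep the sketch dependency-free, the LKJ correlation prior is simplified away to an identity correlation; the real call would of course respect it.)

```python
import numpy as np

rng = np.random.default_rng(0)
k, n_draws, n_obs = 3, 500, 100

# One prior draw per simulated dataset:
# mu ~ Normal(0, 1) per dimension, sd ~ HalfNormal(1) per dimension.
mus = rng.normal(0.0, 1.0, size=(n_draws, k))
sds = np.abs(rng.normal(0.0, 1.0, size=(n_draws, k)))

# One fake dataset per prior draw, shape (n_draws, n_obs, k) --
# "stuff that should look like the data" if the priors are sensible.
fake_data = mus[:, None, :] + sds[:, None, :] * rng.normal(size=(n_draws, n_obs, k))
```

Plotting a few slices of `fake_data` against the real `data` is a quick visual prior check.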
As an aside, the mean computed above is a first-order, fairly dumb Monte Carlo integration. The whole point of NUTS is to do something smarter. Really all this gives you is a rough sense of (hand-waving) the KL divergence between the un-fit prior and the posterior of the model. From this perspective it’s difficult to distinguish between a “bad model” (NO settings of the parameters are particularly good) and “bad parameters” (the model is right, but your initial values stink).
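One more wrinkle: `np.mean(dliks)` averages log-densities, which (by Jensen's inequality) is not the log of the mean density. If what you want is a Monte Carlo estimate of the marginal likelihood p(X) from prior draws, you need to average in probability space, and only over the likelihood term log p(X|θᵢ) (the observed variable's logp, not the full joint). A sketch with toy numbers, using `logsumexp` for numerical stability:

```python
import numpy as np
from scipy.special import logsumexp

# Toy stand-ins for per-draw log-likelihoods log p(X | theta_i),
# where theta_i are draws from the prior.
log_liks = np.array([-10.0, -12.0, -11.5, -9.5])

# Mean *log*-likelihood -- what np.mean(dliks) computes.
mean_log_lik = np.mean(log_liks)

# Monte Carlo estimate of log p(X) = log( (1/N) * sum_i p(X | theta_i) ),
# computed stably in log space.
log_marginal = logsumexp(log_liks) - np.log(len(log_liks))
```

The gap between the two numbers grows with the spread of the log-likelihoods, which is exactly why the naive mean can be misleading.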
A more robust alternative might be to use ADVI (which should be far faster than NUTS) to better approximate the likelihood:
with data_model:
    sim_params = pm.ADVI().fit().sample(500)

dliks = [llik({'mu': sim_params['mu'][i, :], 'L_vec': sim_params['L_vec'][i, :]})
         for i in range(500)]
That should at least help distinguish between poor models and poor initial prior parameters.