Full outputs are in this notebook, but here's the summary: GroupLevelPredictions.ipynb · GitHub
The first model produces 81 divergences. The second model below produces something like ~1000 divergences, even though nothing has changed in any of the parameters used in the likelihood calculation. The logp at the test point is the same, so I'm curious why sampling is different. Maybe it's something as simple as the initial start point being different?
```python
with pm.Model() as model_hierarchical_salad_sales_predictions:
    σ = pm.HalfNormal("σ", 20)
    β_μ_hyperprior = pm.Normal("β_μ_hyperprior", 10, 10)
    β_σ_hyperprior = pm.HalfNormal("β_σ_hyperprior", 10)
    β_offset = pm.Normal("β_offset", mu=0, sd=1, shape=6)
    β = pm.Deterministic("β", β_μ_hyperprior + β_offset * β_σ_hyperprior)
    μ = pm.Deterministic("μ", β[location_category.codes] * hierarchical_salad_df.customers)
    sales = pm.Normal("sales", mu=μ, sd=σ, observed=hierarchical_salad_df.sales)

    trace_hierarchical_salad_sales_noncentered = pm.sample(random_seed=0)
```
```python
with pm.Model() as model_hierarchical_salad_sales_extra_nodes:
    # All this stuff is the same until the next comment
    σ = pm.HalfNormal("σ", 20)
    β_μ_hyperprior = pm.Normal("β_μ_hyperprior", 10, 10)
    β_σ_hyperprior = pm.HalfNormal("β_σ_hyperprior", 10)
    β_offset = pm.Normal("β_offset", mu=0, sd=1, shape=6)
    β = pm.Deterministic("β", β_μ_hyperprior + β_offset * β_σ_hyperprior)
    μ = pm.Deterministic("μ", β[location_category.codes] * customers)
    sales = pm.Normal("sales", mu=μ, sd=σ, observed=hierarchical_salad_df.sales)

    # Extra nodes for group and individual level predictions
    β_group = pm.Normal("group_beta_prediction", β_μ_hyperprior, β_σ_hyperprior)
    group_prediction = pm.Normal("group_prediction", β_group * out_of_sample_customers, σ)
    location_4_predictions = pm.Normal("location_4_predictions", β[4] * out_of_sample_customers, σ)

    trace_hierarchical_salad_sales_noncentered = pm.sample(random_seed=0)
```
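For what it's worth, here is a toy illustration of the start-point idea (this is not PyMC's actual initialization logic, just a hand-rolled random-walk Metropolis sketch): the extra unobserved `pm.Normal` nodes are free random variables, so the sampler works in a higher-dimensional space, and even with the same seed the random stream is consumed differently, so the draws for the shared parameters diverge even though their logp at the start point is unchanged.

```python
import numpy as np

def metropolis(logp, x0, n=500, seed=0):
    # Simple random-walk Metropolis: every dimension consumes random
    # draws, so adding a dimension shifts the whole random stream.
    rng = np.random.default_rng(seed)
    x = np.array(x0, float)
    out = [x.copy()]
    for _ in range(n):
        prop = x + rng.normal(0, 0.5, size=x.shape)
        if np.log(rng.uniform()) < logp(prop) - logp(x):
            x = prop
        out.append(x.copy())
    return np.array(out)

# Model A: one parameter. Model B: the same parameter plus an
# independent unobserved "prediction" node with its own prior.
logp_a = lambda x: -0.5 * x[0] ** 2
logp_b = lambda x: -0.5 * x[0] ** 2 - 0.5 * x[1] ** 2

a = metropolis(logp_a, [0.0])
b = metropolis(logp_b, [0.0, 0.0])

# Same seed, same logp for the shared parameter at the start point,
# yet its sampled chain differs because the proposal consumes extra
# random numbers for the added dimension.
print(np.allclose(a[:, 0], b[:, 0]))
```

This doesn't explain the divergences themselves, but it shows why identical test-point logp doesn't guarantee identical sampling once extra free RVs are in the model.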