Hello everyone,
I’m using pymc3 with for panel regressions and I’m puzzled by the sample_posterior_predictive function as it returns an error when the regressors used for forecasting don’t have the same length as the initial data I use for sampling. As I saw in previous posts, for hierarchical models, the trick is to supply the index that matches coefficients and groups as a pm.Data object and then to change the data and the index to compute the forecast.
with pm.Model() as hierarchical_model:
X1_shared = pm.Data('X1_shared', data.X1.values)
index = pm.Data('index', iso3c_index)
# Hyperpriors for group nodes
mu_a = pm.Normal(name='mu_a', mu=0., sigma=1)
sigma_a = pm.HalfNormal(name='sigma_a', sigma=.5)
mu_b = pm.Normal(name='mu_b', mu=.5, sigma=1)
sigma_b = pm.HalfNormal(name='sigma_b', sigma=.5)
a = pm.Normal('a', mu=mu_a, sigma=sigma_a, shape=n_countries)
b = pm.TruncatedNormal('b', mu=mu_b, sigma=sigma_b,lower=0,upper=1, shape=n_countries)
# Model error
eps = pm.HalfCauchy('eps', 5.)
index = index.get_value().astype('int64') # index as a vector of int
demand_est = a[index] + b[index] * X1_shared # specification of mu
# Data likelihood
demand_like = pm.Normal('demand_likelihood',
mu=demand_est,
sigma=eps,
observed=data.Total)
with hierarchical_model:
hierarchical_trace = pm.sample(2000, tune=2500)
with hierarchical_model:
pm.set_data({'X1_shared': forecast_data.X1.values, # get my regressor values
'index': forecast_data.iso3c.astype("category").cat.codes}) # create my index
ppc = pm.sample_posterior_predictive(hierarchical_trace, 2000)
return ppc
By doing this I get an error which states that the shape of my new data doesn’t match the shape the function sample_posterior_predictive needs (and the shape required corresponds to the previous dataset used for calibration). Do you have any idea why? The error in question:
Blockquote
ValueError: Input dimension mis-match. (input[0].shape[0] = 1534, input[2].shape[0] = 1416)
Apply node that caused the error: Elemwise{Composite{((i0 + (i1 * i2)) - (i3 * i4))}}[(0, 0)](AdvancedSubtensor1.0, AdvancedSubtensor1.0, X1_shared, AdvancedSubtensor1.0, X2_shared)
Toposort index: 5
Inputs types: [TensorType(float64, vector), TensorType(float64, vector), TensorType(float64, vector), TensorType(float64, vector), TensorType(float64, vector)]
Inputs shapes: [(1534,), (1534,), (1416,), (1534,), (1416,)]
Inputs strides: [(8,), (8,), (8,), (8,), (8,)]
Inputs values: [‘not shown’, ‘not shown’, ‘not shown’, ‘not shown’, ‘not shown’]
Outputs clients: [[‘output’]]
HINT: Re-running with most Theano optimization disabled could give you a back-trace of when this node was created. This can be done with by setting the Theano flag ‘optimizer=fast_compile’. If that does not work, Theano optimizations can be disabled with ‘optimizer=None’.
HINT: Use the Theano flag ‘exception_verbosity=high’ for a debugprint and storage map footprint of this apply node.
PS: I actually got another variable X2_shared in the error log but I simplified the model to post it on discourse