Forecasting hierarchical models with sample_posterior_predictive

Hello everyone,

I’m using pymc3 with for panel regressions and I’m puzzled by the sample_posterior_predictive function as it returns an error when the regressors used for forecasting don’t have the same length as the initial data I use for sampling. As I saw in previous posts, for hierarchical models, the trick is to supply the index that matches coefficients and groups as a pm.Data object and then to change the data and the index to compute the forecast.

        with pm.Model() as hierarchical_model:

            X1_shared = pm.Data('X1_shared', data.X1.values)
            index = pm.Data('index', iso3c_index)

            # Hyperpriors for group nodes
            mu_a = pm.Normal(name='mu_a', mu=0., sigma=1)
            sigma_a = pm.HalfNormal(name='sigma_a', sigma=.5)

            mu_b = pm.Normal(name='mu_b', mu=.5, sigma=1)
            sigma_b = pm.HalfNormal(name='sigma_b', sigma=.5)

            a = pm.Normal('a', mu=mu_a, sigma=sigma_a, shape=n_countries)

            b = pm.TruncatedNormal('b', mu=mu_b, sigma=sigma_b,lower=0,upper=1, shape=n_countries)

            # Model error
            eps = pm.HalfCauchy('eps', 5.)

            index = index.get_value().astype('int64')  # index as a vector of int

            demand_est = a[index] + b[index] * X1_shared  # specification of mu

            # Data likelihood
            demand_like = pm.Normal('demand_likelihood',

        with hierarchical_model:
            hierarchical_trace = pm.sample(2000, tune=2500)

        with hierarchical_model:

            pm.set_data({'X1_shared': forecast_data.X1.values, # get my regressor values
                         'index': forecast_data.iso3c.astype("category")}) # create my index
            ppc = pm.sample_posterior_predictive(hierarchical_trace, 2000)

        return ppc

By doing this I get an error which states that the shape of my new data doesn’t match the shape the function sample_posterior_predictive needs (and the shape required corresponds to the previous dataset used for calibration). Do you have any idea why? The error in question:

ValueError: Input dimension mis-match. (input[0].shape[0] = 1534, input[2].shape[0] = 1416)
Apply node that caused the error: Elemwise{Composite{((i0 + (i1 * i2)) - (i3 * i4))}}[(0, 0)](AdvancedSubtensor1.0, AdvancedSubtensor1.0, X1_shared, AdvancedSubtensor1.0, X2_shared)
Toposort index: 5
Inputs types: [TensorType(float64, vector), TensorType(float64, vector), TensorType(float64, vector), TensorType(float64, vector), TensorType(float64, vector)]
Inputs shapes: [(1534,), (1534,), (1416,), (1534,), (1416,)]
Inputs strides: [(8,), (8,), (8,), (8,), (8,)]
Inputs values: [‘not shown’, ‘not shown’, ‘not shown’, ‘not shown’, ‘not shown’]
Outputs clients: [[‘output’]]
HINT: Re-running with most Theano optimization disabled could give you a back-trace of when this node was created. This can be done with by setting the Theano flag ‘optimizer=fast_compile’. If that does not work, Theano optimizations can be disabled with ‘optimizer=None’.
HINT: Use the Theano flag ‘exception_verbosity=high’ for a debugprint and storage map footprint of this apply node.

PS: I actually got another variable X2_shared in the error log but I simplified the model to post it on discourse

By using shared variables with theano I managed to solve my issue thanks to this post: Prediction using sample_ppc in Hierarchical model