# Problem in posterior predictive sampling in the context of updating priors for sequential data

I followed the documentation on updating priors for sequential data (e.g. time series).
That documentation covers only the training pipeline, but when I extended it to prediction using posterior predictive sampling I found a bug.
In the training part, I modified the model so that it expects X with shape (3, 100) and trained it on that input.
But in posterior predictive sampling, I fed an X of shape (2, 100), and the model still produced predictions.
You can reproduce the error by running this notebook:

Weird, I'll try to look into what is going on and get back to you.


I found the source of the problem.

In the fourth cell, where you write:

```python
basic_model = Model()

with basic_model:
    # Priors for unknown model parameters
    alpha = Normal('alpha', mu=0, sd=1)
    beta0 = Normal('beta0', mu=12, sd=1)
    beta1 = Normal('beta1', mu=18, sd=1)
    beta2 = Normal('beta2', mu=15, sd=1)

    # Expected value of outcome
    mu = alpha + beta0 * x_all[0] + beta1 * x_all[1] + beta2 * x_all[2]

    # Likelihood (sampling distribution) of observations
    Y_obs = Normal('Y_obs', mu=mu, sd=1, observed=Y)

    # draw 1000 posterior samples
    trace = sample(1000)
```

you define `mu` in terms of `x_all` and not of `x`. The difference seems subtle, but it turns out to have huge consequences. `x_all` is a `np.ndarray`, and when it is multiplied and summed with `theano.tensor`s, it is interpreted as a `TensorConstant`! This means that its values can never be changed later on (maybe something could be done with `theano.clone`), and the `mu` tensor will always be computed using the initial `x_all`. That is why, when you changed `x` and ran

```python
ppc = pm.sample_ppc(trace, samples=50, model=basic_model)
```

`sample_ppc` could still sample even though the shapes were wrong.
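The constant-versus-shared distinction can be mimicked in plain Python/NumPy. This is a toy sketch of the semantics only, not Theano itself: a value baked in at graph-build time behaves like a `TensorConstant`, while a mutable box that is read at evaluation time behaves like a `theano.shared` variable.

```python
import numpy as np

class SharedBox:
    """Toy stand-in for a theano.shared variable: a mutable container."""
    def __init__(self, value):
        self._value = np.asarray(value)
    def set_value(self, value):
        self._value = np.asarray(value)
    def get_value(self):
        return self._value

def build_mu_constant(x_all, alpha, betas):
    # x_all's *current* values are baked in here, like a TensorConstant
    baked = x_all.copy()
    return lambda: alpha + sum(b * baked[i] for i, b in enumerate(betas))

def build_mu_shared(x, alpha, betas):
    # x is looked up at call time, like a shared variable in a theano graph
    return lambda: alpha + sum(b * x.get_value()[i] for i, b in enumerate(betas))

x_np = np.ones((3, 4))
x_shared = SharedBox(np.ones((3, 4)))

mu_const = build_mu_constant(x_np, 0.0, [1.0, 1.0, 1.0])
mu_live = build_mu_shared(x_shared, 0.0, [1.0, 1.0, 1.0])

# "update" the data after the model was built
x_np[:] = 5.0
x_shared.set_value(np.full((3, 4), 5.0))

print(mu_const().sum())  # 12.0 -- still the values baked in at build time
print(mu_live().sum())   # 60.0 -- sees the updated values
```

This mirrors why the posterior predictive samples silently kept using the original `x_all`: the graph never looks at the updated array at all.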

To solve this problem you should just change the first definition of `mu` to

```python
mu = alpha + beta0 * x[0] + beta1 * x[1] + beta2 * x[2]
```
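Independently of that fix, a defensive check before posterior predictive sampling can catch this class of bug early. The helper below is hypothetical (not part of PyMC3), just a plain-NumPy sketch of the idea:

```python
import numpy as np

def check_predictor_shape(x_new, expected_rows):
    """Raise early if the predictor matrix doesn't match what the model was trained on."""
    x_new = np.asarray(x_new)
    if x_new.shape[0] != expected_rows:
        raise ValueError(
            f"model was trained with {expected_rows} predictor rows, "
            f"got {x_new.shape[0]}"
        )
    return x_new

check_predictor_shape(np.random.randn(3, 100), 3)  # passes silently
# check_predictor_shape(np.random.randn(2, 100), 3) would raise ValueError
```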

As an unrelated side note, your last call to `sample_ppc` uses `basic_model`, not the last updated `model`, and I'm not sure whether you did that on purpose. If you had used the updated `model`, you would not have had the problem. In the eighth cell, where you define the updated model as

```python
for _ in range(10):
    # generate more data
    X1 = np.random.randn(size)
    X2 = np.random.randn(size) * 0.2
    X3 = np.random.randn(size) * 0.3
    Y = alpha_true + beta0_true * X1 + beta1_true * X2 + beta2_true * X3 + np.random.randn(size)

    x_temp = np.array([X1, X2, X3])
    x.set_value(x_temp)

    model = Model()
    with model:
        # Priors are posteriors from previous iteration
        alpha = from_posterior('alpha', trace['alpha'])
        beta0 = from_posterior('beta0', trace['beta0'])
        beta1 = from_posterior('beta1', trace['beta1'])
        beta2 = from_posterior('beta2', trace['beta2'])

        # Expected value of outcome
        mu = alpha + beta0 * x[0] + beta1 * x[1] + beta2 * x[2]

        # Likelihood (sampling distribution) of observations
        Y_obs = Normal('Y_obs', mu=mu, sd=1, observed=Y)

        # draw 1000 posterior samples
        trace = sample(1000)
    traces.append(trace)
```

you set `mu = alpha + beta0 * x[0] + beta1 * x[1] + beta2 * x[2]` using the shared `x`, not `x_temp` or `x_all`, so your later traces use the correctly updated `x`, and `sample_ppc` would have complained about the inconsistent shape of the later `x`.


Thanks a lot, @lucianopaz. That was indeed a silly mistake; this was just a prototype of the code I was working on. In my real code, I take the values from a Theano tensor instead of a `numpy.ndarray`. Still, your suggestion helped me find the real bug: I had to reshape an array of shape (21,) to (21, 1), and that solved the problem.
Also, I used `basic_model` because in prediction I had to treat indices 0, 1 and 2 as constants, while the `new_model` had a variable inside the for loop, and dealing with that was tedious. I used the updated trace samples with the basic model, so it is effectively the updated one.
Again, thanks for the help!
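For anyone hitting the same (21,) vs (21, 1) issue: the reshape matters because of NumPy broadcasting, where a 1-D array combined with a column vector silently expands into a full matrix. A minimal illustration:

```python
import numpy as np

y = np.random.randn(21)      # shape (21,)
w = np.random.randn(21, 1)   # shape (21, 1)

# Broadcasting (21,) against (21, 1) silently produces a (21, 21) matrix
print((y * w).shape)         # (21, 21) -- usually not what was intended

y_col = y.reshape(21, 1)     # explicit column vector
print((y_col * w).shape)     # (21, 1) -- elementwise, as intended
```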