Sample_posterior_predicitve not catching shape of new data

Hi all. I’m trying to predict on new data, but pm.sample_posterior_predicitve does not catch the new shape of data. I have tried an approach combining this and this posts, but to no avail. Here’s my latest attempt:

import numpy as np
import pymc as pm
rng = np.random.default_rng(18)

score = np.random.normal(0,1, (2,50)).flatten()
fi = np.array([np.arange(50),np.arange(50)]).flatten() #dates index
oi = np.array([np.zeros(50),np.ones(50)]).flatten() #number of options

#### Original model ####
def gen_mod(sco):
    with pm.Model() as mod:
        zs = pm.MutableData("zs", sco, dims="obs_id")
        F = int(zs.get_value().shape[0]/2)
        f = np.array([np.arange(F),np.arange(F)]).flatten() #dates index
        o = np.array([np.zeros(F).astype("int32"),np.ones(F).astype("int32")]).flatten() #number of options
        sd = pm.HalfNormal.dist(1.0)
        L, corr, std = pm.LKJCholeskyCov("L", n=2, eta=2.0, sd_dist=sd, compute_corr=True) 
        Σ = pm.Deterministic("Σ", L.dot(L.T))  
        w = pm.GaussianRandomWalk("w", shape=(F,2), init_dist=pm.Normal.dist(0,1))  
        B = pm.Deterministic("B", pm.math.matrix_dot(Σ,w.T))  
        α = pm.Normal("α", 0, 1.0, shape=F)  
        μ = pm.Deterministic("μ", α[f] + B[o,f]) 
        ϵ = pm.HalfNormal('ϵ', 1.0)+1
        y = pm.Normal("y", mu=μ, sigma=ϵ, observed=zs)  
    return mod

model = gen_mod(score)

with model:
    trace = pm.sample(100, chains=2, cores=12)

#### prediction
s1 = np.concatenate([score[:50], np.ones(20)])
s2 = np.concatenate([score[50:], np.ones(20)])

score_pred = np.array([s1,s2]).flatten()


model_pred = gen_mod(score_pred)
with model_pred:
    preds = pm.sample_posterior_predictive(trace)
preds = preds.posterior_predictive
preds = preds['y']
Only 100 samples in chain.
Auto-assigning NUTS sampler...
Initializing NUTS using jitter+adapt_diag...
Multiprocess sampling (2 chains in 12 jobs)
NUTS: [L, w, α, ϵ]
 |████████████| 100.00% [2200/2200 00:53<00:00 Sampling 2 chains, 4 divergences]Sampling 2 chains for 1_000 tune and 100 draw iterations (2_000 + 200 draws total) took 128 seconds.
There were 4 divergences after tuning. Increase `target_accept` or reparameterize.
The acceptance probability does not match the target. It is 0.5258, but should be close to 0.8. Try to increase the number of tuning steps.
 |████████████████████| 100.00% [200/200 00:00<00:00]

preds.shape
Out[2]: (2, 100, 100)

score_pred.shape
Out[3]: (140,)

So the shape of preds should be (2,100,140). I have tried using multidimensional arrays (i.e. 2 by 50) rather than flattened arrays, but nothing seems to work. Any help would be really appreciated.

Your model is always built on the data in score. The sco argument taken by your gen_mod() function is never used.

Thanks for spotting the typo/bug (sorry about that, my brain may be melting :disappointed:). I’ve updated my question with the bug fixed. Sadly, the outcome remains the same (as in previous attempts without that typo).

Are you attempting to define 2 separate models? Or are you just looking to alter zs in order to apply the posterior to new data? It looks like you are doing the former, but the latter just requires the use of model.set_data(). That being said, I do forget (off the top of my head how to alter the dimensions of a Mutable Data object.

Try this?

[Edit:] Which clearly won’t work. I would keep an eye on that issue.

1 Like

Many thanks for the help. I’ll keep trying to find a solution or to find what may be causing the issue.

I think I found the solution:

# -*- coding: utf-8 -*-
import numpy as np
import pymc as pm
import matplotlib.pyplot as plt
rng = np.random.default_rng(18)


score = np.random.normal(0,1, (2,50))
fi = np.arange(50)
oi = [0,1]
#### Original model ####
def gen_mod(score, fi, oi):
    with pm.Model() as mod:
        mod.add_coord("score", score, mutable = True)
        mod.add_coord("f", fi, mutable = True)
        mod.add_coord("o", oi, mutable = True)
        sd = pm.HalfNormal.dist(1.0)
        L, corr, std = pm.LKJCholeskyCov("L", n=2, eta=2.0, sd_dist=sd, compute_corr=True) 
        Σ = pm.Deterministic("Σ", L.dot(L.T))  
        w = pm.GaussianRandomWalk("w", init_dist=pm.Normal.dist(0,1), dims=("f","o"))  
        B = pm.Deterministic("B", pm.math.matrix_dot(Σ,w.T))  
        α = pm.Normal("α", 0, 0.01, dims="f") 
        μ = pm.Deterministic("μ", α + B) 
        ϵ = pm.HalfNormal('ϵ', 0.01)+1
        y = pm.Normal("y", mu=μ, sigma=ϵ, observed=score)  
    return mod

model = gen_mod(score, fi, oi)

with model:
    trace = pm.sample(1000, tune=1000, chains=4, cores=12)

#### prediction
s1 = np.concatenate([score.flatten()[:50], np.ones(20)])
s2 = np.concatenate([score.flatten()[50:], np.ones(20)])
score = np.array([s1,s2])
fi = np.arange(score.shape[1])
oi = [0,1]

model = gen_mod(score, fi, oi)

with model:
    preds = pm.sample_posterior_predictive(trace)
preds = preds.posterior_predictive
preds = preds['y']

score.shape
preds.shape

#### poltting
pre = preds.mean(axis=0).mean(axis=0)
presd = preds.mean(axis=0).std(axis=0)
plt.plot(fi, score[0], label='observed mean')
plt.plot(fi, pre[0], color='orange', label='predicted mean')
plt.fill_between(fi, pre[0]-presd[0], pre[0]+presd[0], alpha=0.2, color='orange', label='SD')
plt.grid(alpha=0.2)
plt.legend()
plt.show()
Auto-assigning NUTS sampler...
Initializing NUTS using jitter+adapt_diag...
Multiprocess sampling (4 chains in 12 jobs)
NUTS: [L, w, α, ϵ]
 |████████████| 100.00% [8000/8000 00:41<00:00 Sampling 4 chains, 1 divergences]Sampling 4 chains for 1_000 tune and 1_000 draw iterations (4_000 + 4_000 draws total) took 66 seconds.
There was 1 divergence after tuning. Increase `target_accept` or reparameterize.
 |████████████████████| 100.00% [4000/4000 00:00<00:00]
 

preds.shape
Out[2]: (4, 1000, 2, 70)

score.shape
Out[3]: (2, 70)

This image (pardon the horrible predictions) illustrates that in principle the predicted distribution is behaving correctly:
image

I think the trick was in re-initialising the model, rather than initialising a new model (i.e. with a different name) as @cluhmann you pointed out. But somehow with the set_data approach it wasn’t working well, as it is now with the add_coord approach. I guess everything is working as it should, but I’ll look in more detail and if there’s something off I’ll post it here. Thanks again.

My apologies. I didn’t read this one too carefully. Indeed, the add_coord argument seems to break the model when used with mutable=True. Here are some results for comparison.

Without add_coord:

import numpy as np
import pymc as pm
import matplotlib.pyplot as plt
rng = np.random.default_rng(18)


score = np.random.normal(0,1, (2,50))
fi = np.arange(50)
oi = [0,1]
#### Original model ####
def gen_mod(score, fi, oi):
    with pm.Model() as mod: 
        w = pm.GaussianRandomWalk("w", init_dist=pm.Normal.dist(0,1), shape=(50,2))  
        ϵ = pm.HalfNormal('ϵ', 1.0)
        y = pm.Normal("y", mu=w.T, sigma=ϵ, observed=score)  
    return mod

model = gen_mod(score, fi, oi)
    
with model:
    trace = pm.sample(1000, tune=1000, chains=4, cores=12)
    
with model:
    preds = pm.sample_posterior_predictive(trace)
preds = preds.posterior_predictive
preds = preds['y']

#### poltting
pre = preds.mean(axis=0).mean(axis=0)
presd = preds.mean(axis=0).std(axis=0)
plt.plot(fi, score[0], label='observed mean')
plt.plot(fi, pre[0], color='orange', label='predicted mean')
plt.fill_between(fi, pre[0]-presd[0], pre[0]+presd[0], alpha=0.2, color='orange', label='SD')
plt.grid(alpha=0.2)
plt.legend()
plt.show()

image

With add_coord:

import numpy as np
import pymc as pm
import matplotlib.pyplot as plt
rng = np.random.default_rng(18)


score = np.random.normal(0,1, (2,50))
fi = np.arange(50)
oi = [0,1]
#### Original model ####
def gen_mod(score, fi, oi):
    with pm.Model() as mod: 
        mod.add_coord("score", score, mutable=True)
        mod.add_coord("f", fi, mutable=True)
        mod.add_coord("o", oi, mutable=True)
        w = pm.GaussianRandomWalk("w", init_dist=pm.Normal.dist(0,1), dims=("f","o"))  
        ϵ = pm.HalfNormal('ϵ', 1.0)
        y = pm.Normal("y", mu=w.T, sigma=ϵ, observed=score)  
    return mod

model = gen_mod(score, fi, oi)
    
with model:
    trace = pm.sample(1000, tune=1000, chains=4, cores=12)
    
with model:
    preds = pm.sample_posterior_predictive(trace)
preds = preds.posterior_predictive
preds = preds['y']

#### poltting
pre = preds.mean(axis=0).mean(axis=0)
presd = preds.mean(axis=0).std(axis=0)
plt.plot(fi, score[0], label='observed mean')
plt.plot(fi, pre[0], color='orange', label='predicted mean')
plt.fill_between(fi, pre[0]-presd[0], pre[0]+presd[0], alpha=0.2, color='orange', label='SD')
plt.grid(alpha=0.2)
plt.legend()
plt.show()

image

1 Like

This may be a bit off-topic respect to my question, but just in case someone finds it a useful solution I’ll leave it here. I have written a very simple function to find the posterior predictive via sampling and it gives reasonable results for the example above. This implements a Gaussian random walk over a 2 dimensional variable (score: 50 datapoints randomly drawn from a standard Gaussian).

# -*- coding: utf-8 -*-
import numpy as np
import pymc as pm
import matplotlib.pyplot as plt
rng = np.random.default_rng(18)


score = np.random.normal(0,1, (2,50))
O,F = score.shape
times = np.arange(F)
#### Original model #### ##does not work with coords!!!
with pm.Model() as mod: 
    w = pm.GaussianRandomWalk("w", init_dist=pm.Normal.dist(0,1), shape=(F,O))  
    ϵ = pm.HalfNormal('ϵ', 1.0)
    y = pm.Normal("y", mu=w.T, sigma=ϵ, observed=score)  

with mod:
    trace = pm.sample(1000, tune=1000, chains=4, cores=12)
    
with mod:
    preds = pm.sample_posterior_predictive(trace, var_names=['y'])
preds = preds.posterior_predictive
preds = preds['y']



################# 
#### prediction
def predict(mu, f):
    y0 = []
    for chain in range(mu.shape[0]): 
        s_id = np.random.choice(sigma[chain].shape[0])
        s = sigma[chain][s_id]
        mm = []
        for time in range(mu.shape[2]+f): 
            if time < mu.shape[2]:
                t = time
            if time > mu.shape[2]:
                t = np.random.choice(np.arange(mu.shape[2]))
            m = mu[chain,:,t,0]
            p = np.random.normal(m,s,m.shape[0])
            mm.append(np.array(p))
        y0.append(np.array(mm))
    y0 = np.array(y0)
    y1 = []
    for chain in range(mu.shape[0]): 
        s_id = np.random.choice(sigma[chain].shape[0])
        s = sigma[chain][s_id]
        mm = []
        for time in range(mu.shape[2]+f): 
            if time < mu.shape[2]:
                t = time
            if time > mu.shape[2]:
                t = np.random.choice(np.arange(mu.shape[2]))
            m = mu[chain,:,t,1]
            p = np.random.normal(m,s,m.shape[0])
            mm.append(np.array(p))
        y1.append(np.array(mm))
    y1 = np.array(y1)
    return np.array([y0,y1])
    
mu = trace.posterior["w"]
sigma = trace.posterior["ϵ"]

preds = predict(mu,20)
preds = preds.mean(axis=1)

times2 = np.arange(70)
s1 = np.concatenate([score.flatten()[:50], np.random.normal(0,1,20)])
s2 = np.concatenate([score.flatten()[50:], np.random.normal(0,1,20)])
score2 = np.array([s1,s2])

#### poltting
pre = preds.mean(axis=2)
presd = preds.std(axis=2)
plt.plot(times, score[0], alpha=0.5, color='purple', label='observed')
plt.plot(times2[49:], score2[0,49:], alpha=0.5, linestyle=':', color='purple', label='new (not yet observed)')
plt.plot(times2, pre[0], color='green', label='predicted mean')
plt.fill_between(times2, pre[0]-presd[0], pre[0]+presd[0], alpha=0.2, color='green', label='SD')
plt.grid(alpha=0.2)
plt.legend()
plt.show()

pre = preds.mean(axis=2)
presd = preds.std(axis=2)
plt.plot(times, score[1], alpha=0.5, color='blue', label='observed')
plt.plot(times2[49:], score2[1,49:], alpha=0.5, linestyle=':', color='blue', label='new (not yet observed)')
plt.plot(times2, pre[1], color='crimson', label='predicted mean')
plt.fill_between(times2, pre[1]-presd[1], pre[1]+presd[1], alpha=0.2, color='crimson', label='SD')
plt.grid(alpha=0.2)
plt.legend()
plt.show()

First dimension of Gaussian random walk:
image
Second dimension of Gaussian random walk:
image

Although this works well, I’m still a bit unsure about my approach. I used random samples from the first 50 times (50 observed data points) to predict 20 new unobserved data points. I simulated random new data points and added them after predictions (dotted lines on images above) and they seem to be reasonable (though not super accurate), but maybe the way I performed these predictions is not entirely appropriate (?). Any comments would be greatly appreciated.

I’ll mark this as the solution, though it doesn’t really solve the underlying issue.

Your predictions don’t look right. In a time series model you should never have wiggles and dynamics in the average prediction unless you built them in, i.e. with seasonal components or AR terms. For a GRW, your best guess of tomorrow’s value is today’s value, so your predictions should be a straight line out from the last observed value, with increasing “cup shaped” variance as you move out from the observations in time (variance should be proportional to \sqrt{T + h})

To see this, here’s some math. The GRW model can be written:

y_{t+1} = y_t + \epsilon_{t+1}, \quad \epsilon_t \sim N(\mu, \sigma)

Important points:

  1. This is a so-called “Markov Process”, meaning that including information about past values does not improve your ability to forecast future values. Formally, \mathbb E [y_{t+1} | \{y_\tau\}_{\tau=0}^t] = \mathbb E [y_{t+1} | y_t], i.e. all that extra info is useless.
  2. The mean future prediction, then, is simply \mathbb E_t [y_{t+1}] = \mathbb E_t [y_t + \epsilon_{t+1}] = y_t + \mathbb E_t[\epsilon_{t+1}] = y_t + \mu
  3. Similarly, the forecast variance is just Var_t (y_{t+1}) = Var_t (y_t + \epsilon_t) = Var(\epsilon_t) = \sigma^2)
  4. By the “Law of Iterated Expectations”, all your best guesses about future values will boil down to only your knowledge today. It means that if we predict for time t+2, we’ll end up with \mathbb E_t [y_{t+2}] = \mathbb E_t [\mathbb E_t [y_{t+1}] + \epsilon_{t+2}] = \mathbb E_t[\mathbb E_t [y_t + \epsilon_{t+1}] + \epsilon_{t+2}] = y_t + \mathbb E_t [\mathbb E_t [\epsilon_{t+1}] + \epsilon_{t+2}]. The important point that the double expectation washes out, so you get \mathbb E_t [y_{t+2}] = y_t + 2\mu.

This generalizes to E_t [y_{t+h}] = y_t + h\mu, so your predicted mean should be the last observed value, plus a deterministic trend with slope \mu (0 in your case).

You can do the same thing for variance: Var_t (y_{t+2}) = Var_t(y_{t+1} + \epsilon_{t+2}) = Var_t(y_t + \epsilon_{t+1} + \epsilon_{t+2} The only important thing to know here is that variance is quadratic, so you can distribute it over addition, but little covariance babies pop out, as in Var(a + b) = Var(a) + Var(b) + 2 Cov(a, b). I appeal to Cov(\epsilon_t, \epsilon_s) = 0, \quad \forall t, s to ignore these, and you end up with: Var_t (y_{t+2}) = 2\sigma^2. So in general, the standard deviation of your prediction for y_{T+h} will be \sqrt{h} \cdot \sigma.

So far I’ve said nothing helpful about making predictions with your PyMC model… sorry about that. Basically, you want to sample from the posterior distribution over w[-1] and ϵ, construct a normal distribution centered on w[-1] with standard error ϵ, sample h values from this normal (where h is your forecast horizon) then take the cumsum to get your predictions. Do this a lot of times and you’ll get a nice cup shaped forecast distribution.

You can also skip the hassle and just use the formulas derived above.

Here’s the first approach applied to the first time series in score, starting after sampling the model you posted:

T = 50
h = 20
post = az.extract_dataset(trace)

# Note that the last prediction is *not* included in the cumsum, it is "x0", not the drift! 
forecasts = preds.stack(sample = ['chain', 'draw']).values[0, -1, :] + (post['ϵ'].values[None, :] * rng.normal(size=(h, 2000))).cumsum(axis=0)

fig, ax = plt.subplots(figsize=(14,4), dpi=77)
ax.plot(preds.stack(sample = ['chain', 'draw'])[0, :, :], color='0.5', alpha=0.05)
ax.plot(preds.stack(sample = ['chain', 'draw']).mean(dim='sample').values[0, :], color='tab:blue')
ax.plot(score[0, :], color='k')

ax.plot(np.arange(T, T + h), forecasts, color='tab:green', alpha=0.1)
ax.plot(np.arange(T, T + h), forecasts.mean(axis=-1), color='k', ls='--')
plt.show()

As a general comment, you can always make out-of-sample predictions by just re-implementing your model in Numpy, using the posterior samples together with whatever data you wish. In this case it’s a bit opaque how the pieces fit together, but that’s all we’re doing.

3 Likes

Thank you very much for the detailed explanation. Makes a lot of sense. I have corrected my approach following your advice.