Generative time series modelling

I’m running into difficulties producing a generative model for a time series. I have a model as follows:

import pymc3 as pm
import theano.tensor as tt

cutoff_idx = 1000
y_obs = np.ma.MaskedArray(Y, np.arange(N) > cutoff_idx)

interval = 100

with pm.Model() as genModel:

    sigma_mu = pm.HalfNormal('s_mu', sd=0.05)
    sigma = pm.HalfNormal('s', sd=0.3)

    mu = pm.GaussianRandomWalk('mu', mu=0.0, sd=sigma_mu, shape=interval)
    weights = tt.repeat(mu, y_obs.size // interval)

    y = pm.Normal('y', mu=weights, sd=sigma, observed=y_obs)

I want this model to learn the standard deviations sigma and sigma_mu, but not the individual mus in my gaussian random walk. This is because I want to be able to generate more data points (using the masked array) that simulate the underlying DGP, and I don’t want the model to learn a trend in the random walk (because I happen to know there is none – any such learning would be spurious).

In other words I noticed that, with the above code, if my generated data in Y happens to have data points mostly below 0, for example, then the mu variable will learn to generate mostly values below 0. This is not what I want: I’m using the random walk prior only to learn the variance of it, not any specific trend in it.

What would be the right way to achieve what I’m trying to achieve (assuming it’s possible)?

Almost as soon as I posted the above it occurred to me that there was a basic workaround, which is to grab the samples myself from the trace (and ignore the mus), and generate my data outside of the pyMC3 framework as follows:

with genModel:

    sigma_mus = trace['s_mu']
    sigma = trace['s']

    # generating from the obtained samples
    random_walk = 1.0
    generated_y = []
    for t in range(N/2):
        s_mu_idx = int(np.random.uniform(0, len(sigma_mus), 1)[0])
        s_idx = int(np.random.uniform(0, len(sigma), 1)[0])
        current_s_mu = sigma_mus[s_mu_idx]
        current_s = sigma[s_idx]

        random_walk = random_walk + np.random.normal(0.0, current_s_mu, 1)[0]
        current_y = random_walk + np.random.normal(0.0, current_s, 1)[0]

        generated_y.append(current_y)

    plt.plot(generated_y)
    plt.title("Generated time series")
    plt.show()

This seems to work (?), but I was aiming for a more elegant solution within the pyMC3 framework, such as having the option for the GaussianRandomWalk to not learn any trend (if it is indeed what it’s doing) in the innovations.

You definitely should not use the mu (or weight) because it fits to the local pattern of your data. Sampling from hyper parameters is much more appropriate and your solution looks fine to me.