Irregularly Spaced Observations for Gaussian Random Walk


I’m working on trying to infer the standard deviation of a random walk from some observed data. However the data is irregularly spaced (there could be 0, 1, 2 observations at any given time point), so I’m not sure how to setup a bayesian random walk model for this. The data is generated by a process similar to the following,

T = 100
std = 1.0
mu_array = np.zeros(T)
t = []
obs = []
for i in range(1, T):
    p = np.random.rand()
    mu_array[i] = std*np.random.randn() + mu_array[i-1]
    if p <= 0.1:
        obs.append(std*np.random.rand() + mu_array[i])
        obs.append(std*np.random.rand() + mu_array[i])
    if p <= 0.3:
        obs.append(std*np.random.rand() + mu_array[i])

Any advice on how to model this would be greatly appreciated! Thanks!

I note that there are in fact two components in the above block of code: a hidden random walk

z_t = z_{t-1} + \epsilon_t

and an observation

x_t^{(i)} = z_t + \nu_t^{(i)}

with \epsilon_t, \nu_t^{(i)} \sim_{iid} N(0, \sigma^2)

I note that the above code will either have 0, 2, or 3 observations (should the second if be an elif?)

The sampling model needs to be different from the inferential model; as the former can’t be conditioned on the number of observations at each time step:

k = 100
with pm.Model() as sim_model:
    z0 = pm.Normal('z0', 0, 0.5)
    et = pm.Normal('et', 0, 2, shape=k)
    zt = pm.Deterministic('zt', z0 + tt.cumsum(et))
    nt = pm.Categorical('nt', [0.7, 0.2, 0.1], shape=k)
    vt = pm.Normal('vt', 0, 1, shape=(k,3))
    xt = pm.Deterministic('xt', (vt.T + zt.T).T)
    prtr = pm.sample_prior_predictive(5)

data_x, data_n = list(), list()
for i in range(k):
    for j in range(prtr['nt'][0,i]):

For inference, indexing is your friend:

with pm.Model() as mod_inf:
    err_eps = pm.HalfNormal('s_rw', 1.)
    err_obs = pm.HalfNormal('s_os', 1.)
    zt = pm.GaussianRandomWalk('zt', mu=0, sd=err_eps, shape=k)
    zt_obs = zt[np.array(data_n)]
    xt = pm.Normal('xt', mu=zt_obs, sd=err_obs, observed=np.array(data_x))
    zt_idx = pm.Deterministic('zt_tp', zt[np.array([0, 25, 50, 75, 99])])
    tr = pm.sample(init='advi+adapt_diag', tune=1500, chains=3, cores=2)
pm.traceplot(tr, ['s_rw', 's_os', 'zt_tp']);

for j in range(1000)[::10]:
    plt.plot(tr['zt'][j,:], alpha=0.1, color='grey')
plt.scatter(data_n, data_x, color='red')



Thanks for the quick reply! This example is very detailed and helpful. Also good catch on the second elif, that’s my bad.

I have just two quick questions.

  1. What does the line zt_idx = pm.Deterministic('zt_tp', zt[np.array([0, 25, 50, 75, 99])]) mean/do?
  2. Why the choice of init='advi+adapt_diag'?

zt_idx is just a convenience for the traceplot. By default arviz will make one plot per component per variable, and I only wanted to see a few rather than 100 density plots.

advi+adapt_diag is just a personal preference.