Hi,
I am setting up an autoregressive Poisson model and I’m using synthetic data to ensure that I am recovering the parameters correctly. Unfortunately, that is not happening.
I am actually building a hierarchical model but let’s simplify for now with just one series. Here is the data generating process:
N = 52*6 # number of points per series
x = np.zeros((N))
# true k:
true_k = 0.6
# true process mean:
true_mu=5
# true variance of the innovation
true_tau = 8
x[0]=true_mu
for i in range(1, N):
x[i] = true_k * (x[i-1] - true_mu) + stats.norm.rvs(0, np.sqrt(1/true_tau)) + true_mu
y = np.random.poisson(np.exp(x))
_, ax = plt.subplots(1, 1, figsize=(8,5))
ax.plot(y);
I use a non-zero mean autoregressive process:
import theano as T
import theano.tensor as tt
import pymc3 as pm
from pymc3.distributions.distribution import generate_samples, draw_values
class NonzeroMeanAR1(pm.distributions.continuous.PositiveContinuous):
"""
Autoregressive process with 1 lag.
Parameters
----------
k : tensor
effect of lagged value on current value
tau_e : tensor
precision for innovations
"""
def __init__(self, k, tau_e, mu, *args, **kwargs):
super(NonzeroMeanAR1, self).__init__(*args, **kwargs)
self.k = k = tt.as_tensor_variable(k)
self.tau_e = tau_e = tt.as_tensor_variable(tau_e)
self.tau = tau_e * (1 - k ** 2)
self.mu = tt.as_tensor_variable(mu)
self.mean = self.mu
def logp(self, x):
k = self.k
tau_e = self.tau_e
mu = self.mu
x_im1 = x[:-1]
x_i = x[1:]
boundary = pm.Normal.dist(mu=mu, tau=tau_e).logp(x[0])
innov_like = pm.Normal.dist(mu=k * (x_im1 - mu) + mu, tau=tau_e).logp(x_i)
return boundary + tt.sum(innov_like)
First, you can find the model that assumes that the AR process is observed:
with pm.Model() as par_model:
k = pm.Normal('k', 0, 1)
tau = pm.HalfNormal('tau', sd=1)
mu = pm.Normal('mu', mu=5, sd=1)
lat = NonzeroMeanAR1('lat', k=k, tau_e=tau, mu=mu, observed=x)
#y_pred = pm.Poisson('y_pred', mu=tt.exp(lat), observed=y)
trace = pm.sample(
2000,
tune=2000,
target_accept=0.9,
init="advi+adapt_diag",
)
Next, you can find the model that assumes that the AR process is a latent process:
with pm.Model() as par_model:
k = pm.Normal('k', 0, 1)
tau = pm.HalfNormal('tau', sd=1)
mu = pm.Normal('mu', mu=5, sd=1)
lat = NonzeroMeanAR1('lat', k=k, tau_e=tau, mu=mu, shape=len(y))
y_pred = pm.Poisson('y_pred', mu=tt.exp(lat), observed=y)
trace = pm.sample(
2000,
tune=2000,
target_accept=0.9,
init="advi+adapt_diag",
)
I can recover the parameters for a purely AR process but when performing the AR-Poisson (AR is a latent process) I am unable to get the AR process parameters back. Here you have the trace plots:
It is mixing ok but the estimations are completely off. Already tried fewer data and more informative priors and the opposite. What am I missing?
Thank you!