Hi all, I’m a Stan user experimenting with PyMC for the first time.
I’ve been converting some state-space models from Stan to PyMC as an exercise. Today I converted one such model: a varying-intercepts model where each timepoint has its own intercept, and the prior for each intercept \alpha_t is \text{Normal}(\alpha_{t-1}, \sigma_\alpha), so the varying intercepts have a recursive mean structure.
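To spell the model out, writing y_t for the observed series (log_y in the code below) and \sigma_\text{irreg} for the observation noise:

y_t \sim \text{Normal}(\alpha_t, \sigma_\text{irreg})
\alpha_t \sim \text{Normal}(\alpha_{t-1}, \sigma_\alpha)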
I have fit the model in PyMC with no problem, using GaussianRandomWalk to capture this recursiveness:
import pymc as pm

with pm.Model() as model_2_2:
    # Priors on the scale parameters
    sigma_irreg = pm.HalfNormal('sigma_irreg', sigma=1)
    sigma_alpha = pm.HalfNormal('sigma_alpha', sigma=1)
    # GaussianRandomWalk encodes the recursive prior on the varying intercepts
    mu_seq = pm.GaussianRandomWalk('mu_seq', sigma=sigma_alpha, shape=len(dat))
    # Likelihood
    y_obs = pm.Normal('y_obs', mu=mu_seq, sigma=sigma_irreg, observed=dat['log_y'])
    # Go!
    trace = pm.sample(2000, chains=4, tune=1000, return_inferencedata=True)
But one thing I appreciate about the Stan syntax for this same model is that you can make the recursiveness explicit by defining priors using for loops. For example, in Stan you can do:
data {
  int<lower=1> n;
}
parameters {
  vector[n] mu;
  real<lower=0> sigma_alpha;
}
model {
  for (t in 2:n)
    mu[t] ~ normal(mu[t-1], sigma_alpha);
}
I find this Stan syntax nice for pedagogical reasons (i.e. Bayesian state-space models can be framed as iterative Bayesian inference with drift in the latent state between iterations, and I think a loop helps illustrate that), but I also think it is more flexible: it lets you model the relationship between adjacent timepoints using other variables, easily swap in other distributions for the transition, and so on.
I was able to use the for-loop approach in PyMC when defining this model for 10 observations, but for the full dataset (~200 observations) I ran into C++ compilation errors.
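For reference, here is a minimal sketch of the kind of explicit loop I mean (it mirrors the GaussianRandomWalk model above; names like model_loop and mu_0 are just illustrative):

import pymc as pm

n = len(dat['log_y'])

with pm.Model() as model_loop:
    # Priors on the scale parameters
    sigma_level = pm.HalfNormal('sigma_level', sigma=1)
    sigma_irreg = pm.HalfNormal('sigma_irreg', sigma=1)
    # Build the varying intercepts one timepoint at a time
    mu = [pm.Normal('mu_0', mu=0, sigma=1)]
    for t in range(1, n):
        # The transition here could just as easily use another distribution
        # or depend on other variables (that is the flexibility I mean)
        mu.append(pm.Normal(f'mu_{t}', mu=mu[t - 1], sigma=sigma_level))
    mu_seq = pm.math.stack(mu)
    # Likelihood
    y_obs = pm.Normal('y_obs', mu=mu_seq, sigma=sigma_irreg, observed=dat['log_y'])

I then tried a version that used pytensor.scan instead of the Python loop, but hit an error: ValueError: Random variables detected in the logp graph: {mu_t}. Here is that attempt: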
import pytensor
import pytensor.tensor as pt

with pm.Model() as model:
    n = len(dat['log_y'])
    y_observed = dat['log_y']
    # Priors for the standard deviations
    sigma_level = pm.HalfNormal('sigma_level', sigma=1)
    sigma_irreg = pm.HalfNormal('sigma_irreg', sigma=1)
    # Initial value of mu
    mu_init = pm.Normal('mu_init', mu=0, sigma=1)
    # Iterative computation of the timepoint-specific mu:
    # each step draws mu_t centred on the previous value
    def step_fn(mu_tm1, sigma_level):
        mu_t = pm.Normal('mu_t', mu=mu_tm1, sigma=sigma_level)
        return mu_t
    mu, updates = pytensor.scan(fn=step_fn,
                                outputs_info=[dict(initial=mu_init)],
                                non_sequences=sigma_level,
                                n_steps=n - 1)
    # Prepend the initial value to the mu sequence
    mu_seq = pt.concatenate([mu_init[None], mu])
    # Likelihood
    y_obs = pm.Normal('y_obs', mu=mu_seq, sigma=sigma_irreg, observed=y_observed)
    # Go!
    trace = pm.sample(2000, chains=4, tune=1000, return_inferencedata=True)
So I guess my question is: is it possible to declare priors using a loop in PyMC? I imagine I’m just overlooking some basic syntactic differences between Stan and PyMC. Thanks in advance for your help.