I’m attempting to implement a beta random walk for a parameter in a larger model that I expect to vary over time and am confused about whether I’m taking the right approach or not. I’ve tried copying the design of the GaussianRandomWalk
class and implementing a logp function, and also explicitly creating each distribution in a loop through the number of steps. They both “work”, but I get different results with each. If my logic is completely wrong, or I’m missing something obvious, please let me know! I’m still learning about all of this. Otherwise if someone can explain the discrepancies between approaches I’d appreciate it.
For testing I just have a dummy dataset of binomial samples with variable probabilities of success at each timepoint (I expect autocorrelation and the real parameter will be part of a larger model, so I don’t want beta-binomial distribution).
Beta Random walk:
class BetaRandomWalk(pm.distributions.continuous.UnitContinuous):
def __init__(self, delta=0.8, theta=20, *args, **kwargs):
kwargs.setdefault("shape", 1)
super().__init__(*args, **kwargs)
self.delta = tt.as_tensor_variable(delta)
self.theta = tt.as_tensor_variable(theta)
self.mode = 0.5
def logp(self, x):
x_im1 = x[:-1]
x_i = x[1:]
alpha = tt.max(tt.stack([self.theta * x_im1 + self.delta, tt.ones_like(x_im1)]), axis=0)
beta = tt.max(tt.stack([self.theta * (1 - x_im1) + self.delta, tt.ones_like(x_im1)]), axis=0)
innov_like = pm.Beta.dist(alpha=alpha, beta=beta).logp(x_i)
return tt.sum(innov_like)
The models
delta = 0.8
theta = 20
with pm.Model() as brw_model:
p = BetaRandomWalk("p", delta=delta, theta=theta, shape=num_steps)
likelihood = pm.Binomial("likelihood", p=p, n=50, observed=dummy_df["nHit"].values)
with pm.Model() as bloop_model:
arr_p = []
arr_p.append(pm.Beta("arr_p0", alpha=2, beta=2))
for s in np.arange(1, num_steps):
a = pm.math.maximum(theta * arr_p[s-1] + delta, 1)
b = pm.math.maximum(theta * (1 - arr_p[s-1]) + delta, 1)
arr_p.append(pm.Beta(f"arr_p{s}", alpha=a, beta=b))
p = pm.Deterministic("p", tt.as_tensor_variable(arr_p))
likelihood = pm.Binomial("likelihood", p=p, n=50, observed=dummy_df["nHit"].values)
True p
is:
[0.32, 0.32, 0.32, 0.32, 0.32, 0.32, 0.32, 0.32, 0.32, 0.32,
0.35, 0.38, 0.42, 0.45, 0.48, 0.51, 0.54, 0.58, 0.61, 0.64]
So they aren’t crazy off, but I don’t know if I’ve done either one correctly.
If anyone can help me understand, or point out how wrong I am about everything, I’d really appreciate it!