Beta random walk

I’m attempting to implement a beta random walk for a parameter in a larger model that I expect to vary over time, and I’m confused about whether I’m taking the right approach. I’ve tried copying the design of the GaussianRandomWalk class and implementing a logp function, and also explicitly creating each distribution in a loop over the number of steps. They both “work”, but I get different results from each. If my logic is completely wrong, or I’m missing something obvious, please let me know! I’m still learning about all of this. Otherwise, if someone can explain the discrepancy between the approaches, I’d appreciate it.

For testing I just have a dummy dataset of binomial samples with a varying probability of success at each timepoint (I expect autocorrelation, and the real parameter will be part of a larger model, so I don’t want a beta-binomial distribution).

Beta random walk:

import numpy as np
import pymc3 as pm
import theano.tensor as tt


class BetaRandomWalk(pm.distributions.continuous.UnitContinuous):
    def __init__(self, delta=0.8, theta=20, *args, **kwargs):
        kwargs.setdefault("shape", 1)
        super().__init__(*args, **kwargs)
        
        self.delta = tt.as_tensor_variable(delta)
        self.theta = tt.as_tensor_variable(theta)
        
        self.mode = 0.5
        
    def logp(self, x):
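        # score each step x[i] under a Beta whose parameters depend on x[i-1];
        # note there is no separate term here for the starting value x[0]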
        x_im1 = x[:-1]
        x_i = x[1:]
        
        alpha = tt.maximum(self.theta * x_im1 + self.delta, 1.0)
        beta = tt.maximum(self.theta * (1 - x_im1) + self.delta, 1.0)
        
        innov_like = pm.Beta.dist(alpha=alpha, beta=beta).logp(x_i)
        
        return tt.sum(innov_like)

The models:

delta = 0.8
theta = 20

with pm.Model() as brw_model:
    p = BetaRandomWalk("p", delta=delta, theta=theta, shape=num_steps)
    likelihood = pm.Binomial("likelihood", p=p, n=50, observed=dummy_df["nHit"].values)


with pm.Model() as bloop_model:
    arr_p = []
    arr_p.append(pm.Beta("arr_p0", alpha=2, beta=2))
    for s in np.arange(1, num_steps):
        a = pm.math.maximum(theta * arr_p[s-1] + delta, 1)
        b = pm.math.maximum(theta * (1 - arr_p[s-1]) + delta, 1)
        arr_p.append(pm.Beta(f"arr_p{s}", alpha=a, beta=b))
        
    p = pm.Deterministic("p", tt.as_tensor_variable(arr_p))
    
    likelihood = pm.Binomial("likelihood", p=p, n=50, observed=dummy_df["nHit"].values)
    

True p is:

[0.32, 0.32, 0.32, 0.32, 0.32, 0.32, 0.32, 0.32, 0.32, 0.32, 
 0.35, 0.38, 0.42, 0.45, 0.48, 0.51, 0.54, 0.58, 0.61, 0.64]

So they aren’t crazy off, but I don’t know if I’ve done either one correctly.

If anyone can help me understand, or point out how wrong I am about everything, I’d really appreciate it!

I didn’t look into your code, but one possibly easier implementation is to use the built-in GaussianRandomWalk and then wrap it inside tt.nnet.sigmoid() to squash it into (0, 1).
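
Something along these lines (an untested sketch; the sigma value is just a placeholder you would tune or put a prior on):

with pm.Model() as grw_model:
    # latent walk on the unconstrained (logit) scale
    raw_p = pm.GaussianRandomWalk("raw_p", sigma=0.1, shape=num_steps)
    # squash into (0, 1) so it can be used as a probability
    p = pm.Deterministic("p", tt.nnet.sigmoid(raw_p))
    likelihood = pm.Binomial("likelihood", p=p, n=50, observed=dummy_df["nHit"].values)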

Usage of maximum is discouraged, as it makes the gradient zero below the cutoff, which causes problems for gradient-based samplers like NUTS.


You’re right, that does appear to work quite well and matches up nicely with the loop model results. I checked what happens with more extreme values to see whether the nonlinearity in step size would be an issue at the tails; it doesn’t look like it, but it did highlight that my BetaRandomWalk class is definitely implemented incorrectly.

Would this be a general way to implement ARs for non-Gaussian distributions? I would also like to include an exponential AR(1), and it would certainly be simpler to wrap a GRW with pm.math.exp and feed that into pm.Exponential. I haven’t been able to find much information on estimating non-normal random walk parameters that I really understand!

In the case of maximum, would there be the same problem using tt.switch(a > 1, a, 1)? Maximum was just simpler to write, but obviously isn’t necessary.

It’s perfectly valid to do a GRW and then wrap it inside pm.math.exp().
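
For the exponential AR you mention, that might look roughly like this (again just a sketch, not tested; the sigma value and the observed array y are placeholders):

with pm.Model() as exp_model:
    # random walk on the log scale keeps the rate positive
    log_lam = pm.GaussianRandomWalk("log_lam", sigma=0.1, shape=num_steps)
    lam = pm.Deterministic("lam", pm.math.exp(log_lam))
    obs = pm.Exponential("obs", lam=lam, observed=y)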

tt.switch is very similar to maximum, so it too will have a zero gradient below 1. The issue is that if a is at -5 or 0.8, the model cannot “see” how close it is to being activated. One alternative approach is to use a soft rather than a hard boundary.

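For example, in your loop model you could replace the pm.math.maximum lines with a scaled softplus (a sketch; k is a made-up constant that controls how sharp the corner is):

# a "soft" stand-in for maximum(x, 1): smooth everywhere, so the gradient
# never vanishes below the cutoff; larger k gives a sharper corner
k = 10.0
a = 1 + tt.nnet.softplus(k * (theta * arr_p[s-1] + delta - 1)) / k
b = 1 + tt.nnet.softplus(k * (theta * (1 - arr_p[s-1]) + delta - 1)) / k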


Another approach is to do pm.Beta('x', ...).cumsum(); the only issue is that the innovation is also a beta, but you can append this random vector to a pm.Flat to get the same thing.


Any chance you could write a quick example model? I’m not understanding how to implement this, and I keep getting dimension errors trying to append to pm.Flat.