Is it possible to embed a deterministic variable as the mean of a Gaussian Random Walk?

I am implementing a model which is similar to the stochastic volatility example in the documentation. The model is as follows:

alpha ~ Normal(mu_alpha, sigma_alpha**2)
delta ~ TruncatedNormal(mu_delta, sigma_delta**2, a=-1, b=1)
beta_variance ~ InverseGamma(a_2, b_2)
stock_variance ~ InverseGamma(a_1, b_1)
beta_t ~ Normal(mu=alpha + delta * (beta_{t-1} - alpha), beta_variance)
returns_obs_t ~ Normal(mu=beta_t * returns_market_t, stock_variance)

Here is what I have so far:

import pymc3 as pm

with pm.Model() as model:
    beta_variance = pm.InverseGamma('beta_var', 1000, 200)
    sym_variance = pm.InverseGamma('sym_var', 1000, 100)

    alpha = pm.Normal('steady_state_beta', mu=1.0, sd=10.0)
    delta = pm.TruncatedNormal('mean_reversion_coeff', mu=0.3, sd=10.0, lower=-1, upper=1)

    # This is where I'm stuck: `beta` appears in its own mean, but it isn't defined yet.
    beta = pm.GaussianRandomWalk('beta', mu=alpha + delta * (beta - alpha),
                                 sd=pm.math.sqrt(beta_variance),
                                 shape=len(returns_df.index))
    returns = pm.Normal('stock_returns', mu=returns_df['excess_market'] * beta,
                        sd=pm.math.sqrt(sym_variance),
                        observed=returns_df['excess_symbol'])

Using a deterministic function for the mean of beta seems appropriate, but it's unclear how to combine the two concepts (a deterministic function as the mean of a Gaussian Random Walk). Any thoughts?

I’m not sure you want a GaussianRandomWalk here, or at least one with a non-zero mu. Mu for the random walk is the innovation drift. Is there a source paper/textbook from which you got this model?
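For intuition, a non-zero mu just adds a constant per-step drift to the walk; a minimal simulation sketch (the variable names here are illustrative, not from the model above):

import numpy as np

mu, sd, n = 0.1, 1.0, 500
# Each step is x[t] = x[t-1] + mu + Normal(0, sd**2), so a non-zero mu
# adds a constant drift per step rather than pulling x back toward a level.
x = np.cumsum(mu + sd * np.random.randn(n))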

If you’re looking for something like the mean-reverting vol as defined in formula (2) of section 2.1 here, you might try something like this:

import theano.tensor as T
import pymc3 as pm

class StandardStochasticVol(pm.distributions.distribution.Continuous):
    def __init__(self, phi, alpha, sigma_nu, *args, **kwargs):
        """
        Function (2), section 2.1 of 
        https://cran.r-project.org/web/packages/stochvol/vignettes/article.pdf
        Args:
            phi: persistence
            alpha: mean
            sigma_nu: volatility (of volatility, if you're using this for log vol)
        """
        super(StandardStochasticVol, self).__init__(*args, **kwargs)
        self.phi = phi = T.as_tensor_variable(phi)
        self.alpha = alpha = T.as_tensor_variable(alpha)
        self.sigma_nu = sigma_nu = T.as_tensor_variable(sigma_nu)
        self.mode = T.as_tensor_variable(0)

    def logp(self, x):
        phi = self.phi
        alpha = self.alpha
        sigma_nu = self.sigma_nu

        x_im1 = x[:-1]
        x_i = x[1:]
        boundary_term = pm.Normal.dist(alpha, sigma_nu).logp(x[0])
        err = (x_i - alpha) - phi * (x_im1 - alpha)
        x_i_term = pm.Normal.dist(0, sigma_nu).logp(err)
        return boundary_term + T.sum(x_i_term)
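If it helps, here is a minimal sketch of how this distribution could be dropped into a model; the data and priors below are placeholders, not from the paper:

import numpy as np
import pymc3 as pm

# Placeholder data standing in for the stock and market excess returns.
returns_market = np.random.randn(200)
returns_symbol = np.random.randn(200)

with pm.Model() as model:
    alpha = pm.Normal('alpha', mu=1.0, sd=10.0)
    phi = pm.TruncatedNormal('phi', mu=0.3, sd=10.0, lower=-1, upper=1)
    sigma_nu = pm.HalfNormal('sigma_nu', sd=1.0)
    sigma_eps = pm.HalfNormal('sigma_eps', sd=1.0)

    # Latent AR(1)-style beta series, one value per observation.
    beta = StandardStochasticVol('beta', phi=phi, alpha=alpha, sigma_nu=sigma_nu,
                                 shape=len(returns_market))
    pm.Normal('stock_returns', mu=beta * returns_market, sd=sigma_eps,
              observed=returns_symbol)
    trace = pm.sample(1000, tune=1000)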

@DanWeitzenfeld Here's the paper: Bayesian Analysis of Stochastic Betas. I attempted to implement a Gibbs sampler using the author's derivation of the conditional posteriors, but it was extremely suboptimal, and I was hoping to make use of PyMC3.

I think your sketched-out function is incorrect because each beta_t depends on beta_{t+1} from the previous iteration and beta_{t-1} from the current iteration. The updating process has a snaking effect, I believe.

I’m not sure what you mean - can you say more?

Here’s an example on toy data.

In the paper, the author decomposes the conditional posterior distribution of beta_t as

p(beta_t | alpha, delta, stock_variance, beta_variance, observations) ∝ p(beta_t | beta_{t-1}) * p(beta_{t+1} | beta_t) * p(observed returns | beta_t)

If I understand your logp function, it calculates err by shifting the index of x by one to the left, whereas the model requires drawing beta_t stepwise for each t? As in, you need to update the conditional posterior for each t, sample from it, and then use the newly sampled value to update and draw from the t+1 conditional posterior? Am I misunderstanding how logp works within the PyMC3 framework? That's where I found the bottleneck in optimizing the model. I will try to implement it using your code, and also try theano.scan, to see if I get similar convergence.

I think maybe the confusion here is that the model I'm suggesting is like Kalman smoothing, in that it estimates beta based on the entire time series, whereas you're looking for Kalman filtering, which estimates each beta_t using only the data available at time t. That said, I skimmed the paper, and I didn't see anything that made me think they were filtering rather than smoothing. Let me know if you disagree.

Using Kalman smoothing leads to a single estimate of beta for the entire time series in a given iteration? Filtering vs. smoothing are new concepts to me, so I'll review and compare to the paper. Alternatively, I applied a similar approach to your StandardStochasticVol class, where I modified the GaussianRandomWalk class to account for a more complicated mu term (in this case mu = alpha + delta * (beta_{t-1} - alpha)).
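For reference, a modification along those lines might look like the following sketch (a custom distribution whose step mean is alpha + delta * (beta_{t-1} - alpha); the class and variable names are illustrative, and the math is equivalent to the StandardStochasticVol logp above):

import theano.tensor as T
import pymc3 as pm

class MeanRevertingRandomWalk(pm.distributions.distribution.Continuous):
    def __init__(self, alpha, delta, sd, *args, **kwargs):
        super(MeanRevertingRandomWalk, self).__init__(*args, **kwargs)
        self.alpha = T.as_tensor_variable(alpha)
        self.delta = T.as_tensor_variable(delta)
        self.sd = T.as_tensor_variable(sd)
        self.mode = T.as_tensor_variable(0)

    def logp(self, x):
        # beta_t ~ Normal(alpha + delta * (beta_{t-1} - alpha), sd**2)
        x_im1, x_i = x[:-1], x[1:]
        init_term = pm.Normal.dist(self.alpha, self.sd).logp(x[0])
        step_mu = self.alpha + self.delta * (x_im1 - self.alpha)
        step_terms = pm.Normal.dist(step_mu, self.sd).logp(x_i)
        return init_term + T.sum(step_terms)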