Fitting Gaussian Random Walk to stock prices

I am trying to infer the underlying ‘hidden’ price of a stock by modelling the past 100 days of prices as a Gaussian random walk. This is the model I have:

import numpy as np
import pymc3 as pm

# y: array of observed closing prices (loaded earlier in the notebook)
n_samples = 500
with pm.Model() as model:
    # Scale the prior on the innovation size to the typical daily move
    σ = pm.Exponential('σ', 1 / np.abs(np.diff(y, axis=0)).mean())
    f = pm.GaussianRandomWalk('f', sd=σ, shape=len(y))

    # Bounded observation noise; note the testval must lie inside the bounds
    sd = pm.Bound(pm.Normal, lower=0.0, upper=0.1)('sd', mu=0.0, sd=1.0, testval=0.05)
    likelihood = pm.Normal('y', mu=f, sd=sd, observed=y)

    trace = pm.sample(n_samples, njobs=4)

However, ‘f’ simply ends up being zero. Bounding sd was an attempt to make f ‘stick’ to the observed prices a bit better; leaving it unbounded neither improved nor worsened the model. Any thoughts on how I could improve this model? The full code is here (see cells 27/28).

Your model is unidentifiable as it stands. Think of it this way: the variance of each observation in y comes from two places: the GaussianRandomWalk controlled by σ (which also induces covariance between neighbouring time points), and the Normal likelihood controlled by sd. With a flat prior, y could be explained entirely by the GaussianRandomWalk with σ ≈ std(diff(y)) and sd ≈ 0, or entirely by the Normal with sd ≈ std(y) and σ ≈ 0.
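To make the degeneracy concrete, here is a small simulation (not from the original post; the σ and sd values are arbitrary). Two processes with the same total variance budget, split differently between walk and noise, produce first differences with identical variance, so only the priors (plus the weak lag-1 autocorrelation the noise induces) decide the split:

import numpy as np

rng = np.random.default_rng(42)
T = 100_000  # long series so the empirical variances are stable
sigma, sd = 0.10, 0.05  # chosen so sigma**2 + 2 * sd**2 matches below

# Process 1: all variation comes from the random-walk innovations
y1 = np.cumsum(rng.normal(0, np.sqrt(sigma**2 + 2 * sd**2), T))

# Process 2: the same budget split between walk and observation noise
y2 = np.cumsum(rng.normal(0, sigma, T)) + rng.normal(0, sd, T)

# Var(diff(y)) = sigma^2 + 2 * sd^2 in both cases, ≈ 0.015
print(np.diff(y1).var(), np.diff(y2).var())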

You can put a more informative prior on the σ of the random walk (currently you use an Exponential, which puts too much weight on small values and thus pushes all the variance of y onto the Normal), but I prefer to instead model the total variance and distribute it between σ and sd:

with pm.Model() as model:
    # alpha in (0, 1) splits the total variance between the two sources
    smth_parm = pm.Uniform("alpha", lower=0, upper=1)
    mu2 = pm.Normal("mu", sd=100)
    tau2 = pm.Exponential("tau", 1.0 / 100)  # total precision of the series
    # the latent walk gets a (1 - alpha) share of the variance ...
    z2 = pm.GaussianRandomWalk("f",
                               mu=mu2,
                               tau=tau2 / (1.0 - smth_parm),
                               shape=y.shape)
    # ... and the observation noise gets the remaining alpha share
    obs = pm.Normal("obs",
                    mu=z2,
                    tau=tau2 / smth_parm,
                    observed=y)
    trace = pm.sample(1000, njobs=4, tune=1000)

Which source of variance contributes more is controlled by alpha; you can interpret it as a smoothing parameter.
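To check which regime the posterior landed in, something like this works (a sketch, assuming a recent PyMC where pm.sample returns an InferenceData, with arviz and matplotlib installed):

import arviz as az
import matplotlib.pyplot as plt

# alpha near 0: observation noise ~ 0, so f tracks y closely;
# alpha near 1: walk innovations ~ 0, so f is very smooth
az.plot_posterior(trace, var_names=["alpha", "tau"])

# Posterior mean of the latent path against the observed prices
f_mean = trace.posterior["f"].mean(dim=("chain", "draw"))
plt.figure()
plt.plot(y, "k.", alpha=0.3, label="observed y")
plt.plot(f_mean, "C1", label="posterior mean of f")
plt.legend()
plt.show()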

You can find more explanations here:


This is super helpful, even years later :sweat_smile: Thanks @junpenglao !
I remember you also mentioned it in your book with Osvaldo and Ravin; great chapter BTW, I always refer to it.

Reading about it made me wonder: would you recommend doing the same when the likelihood is not Normal?
So something like this:

with pm.Model() as model:
    intercept = pm.Normal("intercept")
    mu = pm.Normal("mu")
    smth_param = pm.Beta("smth_param", 2, 2)
    # total "diversity" budget, split between walk wiggliness and overdispersion
    diversity = pm.InverseGamma("diversity", 2, 2)
    scale = diversity * (1.0 - smth_param)

    rw = pm.GaussianRandomWalk(
        "rw",
        init_dist=pm.Normal.dist(),
        mu=mu,
        sigma=scale,
        steps=len(y) - 1,  # one value per observation
    )
    p = pm.math.invlogit(intercept + rw)

    # concentration parameter: larger theta means less overdispersion
    theta = diversity * smth_param

    pm.BetaBinomial(
        "y",
        alpha=p * theta,
        beta=(1.0 - p) * theta,
        n=N,
        observed=y,
    )
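(Before fitting, I'd sanity-check that parameterization with a prior predictive draw; a quick sketch, assuming N and y are already defined:)

with model:
    idata = pm.sample_prior_predictive(500)

# Prior draws of y should cover plausible counts; smth_param near 1 gives a
# nearly flat walk with a concentrated likelihood, near 0 a wiggly walk with
# heavy overdispersion
print(idata.prior_predictive["y"].quantile([0.05, 0.5, 0.95], dim=("chain", "draw")))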

I’m obviously not @junpenglao , but my interpretation of this model is a type of state-space model, with a GRW hidden state and non-Gaussian emissions. This is perfectly kosher. I think the marketing guys used to do something similar to this for time-varying parameters, but they now recommend using an HSGP instead; maybe @cetagostini can confirm?
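For reference, the HSGP version of a time-varying parameter looks roughly like this in current PyMC (a sketch, not anyone's official recipe; the kernel, the priors, and the m/c resolution settings are placeholders I picked):

import numpy as np
import pymc as pm

T = 100                    # placeholder series length
t = np.arange(T)[:, None]  # time index as the GP input

with pm.Model() as hsgp_model:
    intercept = pm.Normal("intercept")
    eta = pm.Exponential("eta", 1.0)          # GP amplitude
    ell = pm.Gamma("ell", alpha=2, beta=0.1)  # GP lengthscale
    cov = eta**2 * pm.gp.cov.Matern52(1, ls=ell)

    # Hilbert-space approximation: m basis functions, boundary factor c
    gp = pm.gp.HSGP(m=[30], c=4.0, cov_func=cov)
    f = gp.prior("f", X=t)  # drop-in replacement for the GRW hidden state

    p = pm.math.invlogit(intercept + f)
    # ... then the BetaBinomial emission as in the model above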

On that note, you might be interested in this post by @bwengals where he shows the equivalence between a GP with a certain kernel and a GRW.
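For intuition on that equivalence: a driftless GRW with innovation scale σ has Cov(f_s, f_t) = σ² · min(s, t), i.e. it is a GP with the Brownian-motion (Wiener) kernel. A quick numpy check (simulated walks, arbitrary σ):

import numpy as np

rng = np.random.default_rng(0)
sigma, T, n_draws = 0.5, 20, 200_000

# Many GRW paths: cumulative sums of iid Normal(0, sigma) innovations
paths = np.cumsum(rng.normal(0, sigma, size=(n_draws, T)), axis=1)

# Empirical covariance across draws vs. the Wiener kernel sigma^2 * min(s, t)
# (times run 1..T here, matching the cumsum indexing)
emp_cov = np.cov(paths, rowvar=False)
t = np.arange(1, T + 1)
wiener = sigma**2 * np.minimum.outer(t, t)
print(np.abs(emp_cov - wiener).max())  # small relative to wiener.max() == 5.0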


Thanks for the clarification :stuck_out_tongue_winking_eye:

Aaaah, so I think I’m starting to get the lingo: hidden state basically means latent parameter? And non-Gaussian emissions means likelihood ≠ Normal?

Ha ha yeah – in the end, everything is a GP
