NUTS sampler slowing down on a hierarchical model

Quick update: I tried fitting a GRW (Gaussian random walk) model on a single time series. The fit looks very good on visual inspection, but I keep getting some divergences during sampling.

Any thoughts on why I am getting divergences with GRW? Here’s the code:

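```python
import pymc as pm  # or `import pymc3 as pm` on PyMC3

# y: 1-D array of observed sales for a single time series (defined elsewhere)
with pm.Model() as model:
    # Overall scale, split between the walk and the observation noise by alpha
    sigma = pm.HalfNormal('sigma', 300)
    alpha = pm.Uniform('alpha', 0, 1)

    # Latent level: random walk with step size sigma * (1 - alpha)
    mu = pm.GaussianRandomWalk('mu',
                               sigma=sigma * (1. - alpha),
                               shape=len(y))

    # Observation noise with scale sigma * alpha
    likelihood = pm.Normal('sales',
                           mu=mu,
                           sigma=sigma * alpha,
                           observed=y)
```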
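One plausible cause of the divergences (an assumption, not a diagnosis) is the centered parameterization: every step of `mu` is tightly coupled to the step scale `sigma * (1 - alpha)`, which can create a funnel-shaped posterior that NUTS handles poorly. A non-centered version samples standard-normal innovations `z` and builds the walk deterministically as a scaled cumulative sum. In PyMC v4+ that would look roughly like `z = pm.Normal('z', 0, 1, shape=len(y))` and `mu = pm.Deterministic('mu', sigma * (1. - alpha) * pt.cumsum(z))` with `import pytensor.tensor as pt` (on PyMC3, the equivalent `cumsum` lives in `theano.tensor`). The NumPy sketch below only checks that the two forms produce the same path from the same innovations; the function name `simulate_grw_model` is hypothetical, and it assumes the walk starts at `step_sd * z[0]` rather than using PyMC's own initial distribution.

```python
import numpy as np


def simulate_grw_model(T, sigma, alpha, seed=0):
    """Hypothetical helper: simulate the posted model, a random walk with step
    scale sigma * (1 - alpha) observed with noise scale sigma * alpha."""
    rng = np.random.default_rng(seed)
    z = rng.standard_normal(T)        # standard-normal innovations
    step_sd = sigma * (1.0 - alpha)   # walk step scale, as in the posted model

    # Centered form: build the walk one step at a time.
    mu_centered = np.empty(T)
    mu_centered[0] = step_sd * z[0]   # assumed starting point (see lead-in)
    for t in range(1, T):
        mu_centered[t] = mu_centered[t - 1] + step_sd * z[t]

    # Non-centered form: the same walk as a scaled cumulative sum of z.
    mu_noncentered = step_sd * np.cumsum(z)

    # Observations: walk plus Gaussian noise with scale sigma * alpha.
    y = mu_noncentered + rng.standard_normal(T) * sigma * alpha
    return mu_centered, mu_noncentered, y


mu_c, mu_nc, y_sim = simulate_grw_model(T=100, sigma=300.0, alpha=0.2)
print(np.allclose(mu_c, mu_nc))
```

Up to floating-point rounding, both forms give the same path, so switching to the non-centered form changes only the geometry the sampler sees, not the model.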

Also, I am not sure how to scale this model to the entire dataset. Should I build a hierarchical model here as well, or use the multivariate GRW class (`MvGaussianRandomWalk`)?
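One way to scale up, sketched here under assumptions, is to keep a single model and batch the walks: stack the series into an `(n_series, T)` array and give each series its own step scale drawn from a shared hyperprior (partial pooling). In PyMC v4+, `pm.GaussianRandomWalk(..., shape=(n_series, T))` treats the last axis as time, so independent walks can be batched this way (check the docs for your version); `MvGaussianRandomWalk` is mainly useful when the innovations should be correlated across series. The NumPy sketch below only simulates that batched, non-centered generative structure; the function name `simulate_many_series` and the hyperprior scale `hyper_sd = 1.0` are hypothetical choices for illustration.

```python
import numpy as np


def simulate_many_series(n_series, T, hyper_sd=1.0, seed=0):
    """Hypothetical hierarchical setup: per-series step scales drawn from a
    shared half-normal hyperprior, with all walks built in one batch."""
    rng = np.random.default_rng(seed)
    # Half-normal draws: absolute value of zero-mean normal draws.
    step_sd = np.abs(rng.normal(0.0, hyper_sd, size=n_series))
    # Innovations with time on the last axis, one row per series.
    z = rng.standard_normal((n_series, T))
    # Non-centered batched walk: cumulative sum along time, scaled per series.
    mu = step_sd[:, None] * np.cumsum(z, axis=-1)
    return step_sd, z, mu


step_sd, z, mu = simulate_many_series(n_series=5, T=100)
print(mu.shape)
```

Because the per-series scales share one hyperprior, series with little data borrow strength from the others, while the batched array keeps everything in a single model.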

Btw, the SDE model is blazing fast with no divergences but, interestingly, its posterior samples look much noisier than the GRW's, and the fit is visually not as good.

Thank you.