My model looks like this:

```
import pymc3 as pm

M = X_train.shape[1]
with pm.Model() as model:
    ν = 1
    λ = pm.HalfCauchy('lambda', 1)
    τ2 = pm.InverseGamma('tau_sq', alpha=0.5 * ν, beta=0.5 * ν / λ, shape=M, testval=0.1)
    # β0 and β were referenced but not defined above; these priors are my guess
    # at the intended structure (per-coefficient scale sqrt(τ2))
    β0 = pm.Normal('beta0', 0., 10.)
    β = pm.Normal('beta', 0., pm.math.sqrt(τ2), shape=M)
    σ = pm.HalfNormal('σ', 1.)
    pred = pm.Data('pred', X_train)
    mu = pm.Deterministic('mu', β0 + pm.math.dot(pred, β))
    obs = pm.Normal('obs', mu, σ, observed=y_train)
```

This specification produces many divergences when sampling. I've tried the following:

- Using different initializations (helps a little)
- Reparametrizing the HalfCauchy. How would I do that here? Reparametrization seems to help with the full Cauchy: https://github.com/pymc-devs/pymc3/issues/1924
- Does it make sense to use a HalfStudentT with 2 degrees of freedom in place of the HalfCauchy?
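
On the reparametrization question: one standard trick for the HalfCauchy is the inverse-CDF transform, since if u ~ Uniform(0, 1) then β·tan(πu/2) is distributed HalfCauchy(β). The sketch below just checks that identity numerically with the standard library (no PyMC involved), by verifying that the empirical median of the transformed draws matches the HalfCauchy(1) median of 1:

```python
import math
import random

def half_cauchy_via_uniform(beta, rng):
    """Draw from HalfCauchy(beta) by transforming a Uniform(0, 1) draw.

    The HalfCauchy(beta) CDF is F(x) = (2/pi) * arctan(x / beta), so its
    inverse is F^{-1}(u) = beta * tan(pi * u / 2).
    """
    u = rng.random()
    return beta * math.tan(0.5 * math.pi * u)

rng = random.Random(42)
samples = sorted(half_cauchy_via_uniform(1.0, rng) for _ in range(200_000))
# The median of HalfCauchy(1) is exactly 1 (u = 0.5 gives tan(pi/4) = 1).
median = samples[len(samples) // 2]
print(f"empirical median: {median:.3f}")
```

In a PyMC3 model the same transform would look roughly like `u = pm.Uniform('u', 0, 1)` followed by `λ = pm.Deterministic('lambda', tt.tan(0.5 * np.pi * u))` (with `import theano.tensor as tt`), but whether it actually reduces divergences for this particular model is something you'd have to test.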

Also, any advice on how seriously to take divergences? In several models I've seen, the number of divergences doesn't affect the held-out MAPE much. Is there a rule of thumb, e.g. that 10 or more divergences across 2 chains with 1000 tuning steps and 1000 draws is not ignorable?