My model looks like this:
```python
M = X_train.shape[1]  # number of predictors (X_train.shape alone is a tuple)
with pm.Model() as model:
    ν = 1
    λ = pm.HalfCauchy('lambda', 1)
    τ2 = pm.InverseGamma('tau_sq', alpha=0.5 * ν, beta=0.5 * ν / λ, shape=M, testval=0.1)
    σ = pm.HalfNormal("σ", 1.)
    pred = pm.Data("pred", X_train)
    mu = pm.Deterministic('mu', β0 + pm.math.dot(pred, β))  # β0, β defined elsewhere
    obs = pm.Normal("obs", mu, σ, observed=y_train)
```
I find that there are many divergences with this specification. I’ve tried the following:
- Using different initializations (helps a little)
- Reparametrizing the HalfCauchy. How do I do this? A similar trick seems to help with the Cauchy: https://github.com/pymc-devs/pymc3/issues/1924
- Does it make sense to use a HalfStudentT with 2 degrees of freedom in place of the HalfCauchy?
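On the reparameterization point, the scale-mixture identity I have in mind (my reading of the linked issue, not quoted from it) writes a HalfCauchy(1) variable as |z|·√ω with z ~ Normal(0, 1) and ω ~ InverseGamma(0.5, 0.5), which should give the sampler a better-behaved geometry than sampling the heavy tail directly. A quick NumPy sanity check of the identity (variable names are mine):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000

# |z| with z ~ Normal(0, 1)
z = np.abs(rng.normal(size=n))
# ω ~ InverseGamma(0.5, 0.5): reciprocal of Gamma(shape=0.5, scale=1/0.5)
omega = 1.0 / rng.gamma(0.5, 2.0, size=n)
# scale-mixture construction; should be distributed as HalfCauchy(1)
lam = z * np.sqrt(omega)

# direct HalfCauchy(1) draws for comparison
halfcauchy = np.abs(rng.standard_cauchy(size=n))

# medians should both be near tan(pi/4) = 1
print(np.median(lam), np.median(halfcauchy))
```

If this identity holds, the PyMC model could sample z and ω instead of λ directly and combine them in a `Deterministic`.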
Also, any advice on how seriously to take divergences? In several models I see that the number of divergences does not affect the held-out MAPE much. Is there a rule of thumb, e.g. that 10 or more divergences across 2 chains with 1000 tuning steps and 1000 draws is not ignorable?