Does it make sense to use a HalfStudentT with 2 degrees of freedom in place of the HalfCauchy?
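For concreteness, here is a minimal sketch of the swap I have in mind (the variable name and scale are placeholders, not from my actual model):

```python
import pymc3 as pm

with pm.Model() as model:
    # Current prior: the HalfCauchy is heavy-tailed (it is the
    # HalfStudentT with nu=1) and has no finite moments.
    sigma = pm.HalfCauchy("sigma", beta=1.0)

    # Candidate replacement: nu=2 keeps heavy tails but is slightly
    # better behaved than the HalfCauchy.
    sigma_alt = pm.HalfStudentT("sigma_alt", nu=2, sigma=1.0)
```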
Also, any advice on how seriously to take divergences? In several of my models, the number of divergences does not affect the held-out MAPE much. So is there a rule of thumb, e.g., that 10 or more divergences across 2 chains with 1000 tuning steps and 1000 draws is not ignorable?
Can you show how you're defining β0 and β in your code? And where is τ² being used (as shown, it isn't actually used in the model)? That will help figure out what might be causing the divergences.
To paraphrase comments Aki Vehtari made after his keynote at last year's PyMCon: the rule of thumb is that divergences are nothing to worry about as long as you have fewer than about one of them. In other words, no number of divergences should be ignored.
With a small number of divergences, I typically assume I can fix things with small modifications to the sampling parameters (e.g., the target acceptance rate), but I always verify this by actually making those modifications and re-sampling. With more divergences, you may have a genuinely pathological posterior geometry. PyMC3/arviz provides plenty of tools for investigating divergences so that you can diagnose what is going on.
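For example, the first thing I usually try is raising the target acceptance rate, which makes NUTS take smaller steps (values below are illustrative; assumes your model context is named `model`):

```python
import pymc3 as pm

with model:
    # A higher target_accept shrinks the step size, which often
    # resolves a handful of stray divergences at the cost of
    # somewhat slower sampling.
    trace = pm.sample(draws=1000, tune=2000, target_accept=0.95)
```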
Thanks, that is useful. Is there a particular link I can focus on for examining divergences? I usually only find simple regression models with one covariate used for prior predictive checks; a full example with multiple covariates would be a great starting point.
Empirically, I've found the test performance of predictive models to be remarkably close for 0 versus, say, 5 divergences. So I suspect it matters which parameter(s) the divergences are associated with. Any comments on reparametrization would also be welcome.
You can find the PyMC3 notebook discussing the diagnosis of divergences here. It's slightly out of date, and many of the functions in that notebook have been (or are in the process of being) moved into arviz, but it should nonetheless give you good guidance on the general procedure.
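As a quick starting point in arviz, you can count the divergences and visualize where in parameter space they occurred (a sketch; assumes `trace` came from `pm.sample` and that `a` and `b` are parameter names in your model):

```python
import arviz as az

idata = az.from_pymc3(trace)

# Total number of divergent transitions across all chains.
print(int(idata.sample_stats.diverging.sum()))

# Pairwise scatterplots with divergent transitions highlighted.
# Divergences clustering in one region (e.g., the "funnel" near a
# small hierarchical scale parameter) point to the problematic geometry.
az.plot_pair(idata, var_names=["a", "b"], divergences=True)
```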
Indeed. Ultimately, divergences are simply an indication of what was going on during sampling. They tell you that the sampler was having trouble, and figuring out where in the parameter space sampling was difficult (i.e., where the divergences were observed) is a major part of the investigation. The notebook above specifically addresses this and presents a re-specified model that alleviates the uncovered sampling difficulties. Of course, you can get a stray divergence here or there for no obvious reason, but until you go digging, you'll never know why.
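To give a flavor of the kind of re-specification involved: the classic fix for divergences in a hierarchical model is the non-centered parameterization, which samples a standardized offset instead of sampling group effects directly inside the funnel-shaped prior. A minimal sketch (names, shapes, and data are illustrative, not from your model):

```python
import numpy as np
import pymc3 as pm

y = np.random.normal(size=8)  # placeholder data

with pm.Model() as noncentered:
    mu = pm.Normal("mu", 0.0, 5.0)
    tau = pm.HalfCauchy("tau", beta=2.0)

    # The centered version -- theta = pm.Normal("theta", mu, tau) --
    # often produces a funnel that NUTS diverges in. Instead, sample
    # a standardized offset and rescale it deterministically:
    theta_offset = pm.Normal("theta_offset", 0.0, 1.0, shape=8)
    theta = pm.Deterministic("theta", mu + tau * theta_offset)

    obs = pm.Normal("obs", mu=theta, sigma=1.0, observed=y)
```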