Hey everyone, just a heads up. Last week I was trying to work out why some simpler models would not finish sampling in Databricks, and I came across a thread on the Databricks forums ("Notebook cell gets hung up but code completes", Databricks Community, 67841).
Feedback in that thread suggests that Databricks has some restrictive memory constraints within cells that make it hard to gauge sampling performance: the progress widget itself can lock up or leak memory, giving the visual impression that sampling has stopped or frozen. A user may then be led to falsely believe that their model is misspecified, or that their posterior is practically intractable by MCMC.
In my testing with a basic BART Poisson regression of low-to-moderate dimensionality and complexity (a predefined strictly linear relationship, dims = (4000, 10), 50 trees with 1k burn-in/draws and 8 chains), my sampler consistently hangs at the 10-minute mark or so within Databricks, while the cell process keeps running. Without the progress bar, my sampling finishes in half the indicated "hang/stall" time, and the sampling rate stays far more stable.
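If you want to check this on your own model, here is a minimal sketch of how I would time the same sampling call with the progress bar on and off. It assumes your sampler takes a `progressbar` keyword, as PyMC's `pm.sample` does; the helper names `time_sampler` and `compare_progressbar` are made up for illustration and are not part of any library.

```python
import time


def time_sampler(sample_fn, **sample_kwargs):
    """Run sample_fn(**sample_kwargs) and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = sample_fn(**sample_kwargs)
    return result, time.perf_counter() - start


def compare_progressbar(sample_fn, **sample_kwargs):
    """Time the same sampler call with the progress bar on, then off.

    Assumes sample_fn accepts a boolean `progressbar` keyword.
    Returns a dict mapping the flag value to wall-clock seconds.
    """
    timings = {}
    for flag in (True, False):
        _, elapsed = time_sampler(sample_fn, progressbar=flag, **sample_kwargs)
        timings[flag] = elapsed
    return timings


# Hypothetical usage inside a PyMC model context (not run here):
# with model:
#     timings = compare_progressbar(pm.sample, draws=1000, tune=1000, chains=8)
#     print(timings)
```

Wall-clock time around the whole call is used on purpose: if the widget is what stalls, the rate shown in the progress bar is not a reliable measurement. If you only want the workaround, passing `progressbar=False` to the sampling call is enough.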
I'd be happy to provide benchmarks if the need arises, but I wanted to get this out as soon as possible in case others are left scratching their heads as to why everything still looks good after the fourth triple check.