I have a linear model whose predictors are all categorical variables turned into dummy variables, and I am wondering if someone has a recommended sampler. I have been avoiding NUTS and playing with Metropolis and HamiltonianMC, but there are many to pick from, so I am wondering if anyone has advice for this general setup.

Is there some specific reason you are avoiding NUTS? Models with dummy-coded predictors (typically) still have continuous parameters (coefficients), so NUTS would be my default (and is probably selected automatically by `pm.sample()`). But if you have some reason to look elsewhere, knowing what that reason is might help point you in the right direction.

Ah, I was under the impression that NUTS does not play well with categorical data and thus should be avoided!

NUTS does not play well with categorical *parameters*, but is otherwise fine with categorical data. So if your data is dummy-coded and you have continuous coefficients, you should be fine. In general, `pm.sample()` will automatically try to select a reasonable sampling scheme by inspecting your model. NUTS tends to be much better than the alternative MCMC algorithms when it’s available.
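
A minimal sketch of the distinction between categorical data and categorical parameters, in plain Python rather than PyMC. The level names, observations, and coefficient values are all made up for illustration. It shows that the dummy-coded columns contain only 0/1 values, while the quantities a sampler would explore (the coefficients) are ordinary real numbers.

```python
# Hypothetical toy data: one categorical predictor with three levels.
levels = ["red", "green", "blue"]
observations = ["green", "blue", "red", "green"]


def one_hot(value, levels):
    """Dummy-code a single categorical value as a row of 0/1 indicators."""
    return [1.0 if value == level else 0.0 for level in levels]


# Design matrix: the *data* are discrete indicators.
X = [one_hot(obs, levels) for obs in observations]

# The *parameters* are one real-valued coefficient per level.
# These numbers are invented; in a real model they would be sampled.
beta = [0.5, -1.2, 2.0]

# Linear predictor computed from the dummy-coded design matrix ...
mu_dummy = [sum(x_j * b_j for x_j, b_j in zip(row, beta)) for row in X]

# ... equals simply indexing the coefficient vector by category.
idx = [levels.index(obs) for obs in observations]
mu_index = [beta[i] for i in idx]

print(mu_dummy)
print(mu_index)
```

Both forms give the same linear predictor, and in both the only unknowns are the continuous entries of `beta`, which is the situation gradient-based samplers such as NUTS are designed for.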

Thanks so much!

Just a side note from experience: if you have a lot of categorical features and/or features with many levels, leading to many linear parameters, you’ll probably be well served by introducing partial pooling, e.g. as in the “GLM: Hierarchical Linear Regression” example in the PyMC3 3.11.2 documentation.

If I had missing categorical predictors or categorical features, which sampler would be best for that?

Ultimately, the best sampling scheme depends on what your model is. Just knowing what your data is like doesn’t give you any strong hints about how best to sample. So if you have a specific model you have questions about, feel free to post it and someone can weigh in on how best to proceed.

I’m not sure the sampler matters so much as the model construction, and then your sampler would simply be a consequence of that.

Handling missing values in a categorical is an interesting problem. AFAIK under the current missing value imputation, the features are imputed independently - so you probably wouldn’t want to try to impute a {0,1} for each column in a one-hot-encoded set of columns because you’d also have to enforce them to sum to 1. (Though I suppose you could do that with a switch and Potential)
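
To illustrate why independent per-column imputation is awkward for one-hot columns, here is a small plain-Python sketch. The number of levels is a hypothetical choice. It enumerates every 0/1 pattern that independent imputation could produce for one row and counts how many of them are valid one-hot encodings.

```python
from itertools import product

n_levels = 3  # hypothetical number of categories in the feature

# Every 0/1 pattern that imputing each column independently could produce.
all_patterns = list(product([0, 1], repeat=n_levels))

# Only patterns with exactly one active column are valid one-hot rows.
valid_patterns = [p for p in all_patterns if sum(p) == 1]

print(len(all_patterns), "possible patterns,", len(valid_patterns), "valid")
```

With K levels there are 2^K possible patterns but only K valid ones, so without an explicit sum-to-one constraint most imputed rows would not correspond to any category.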

If you’re partial-pooling the categorical factors, then you have an indexing feature, so perhaps you could try to impute that, requiring that it comes from a discrete uniform distribution {0, …, max_index_value}. Naturally you’d have to use a discrete-friendly sampler for that particular feature.

Marginalizing discrete variables is a nice solution to avoid subpar samplers. Once you have done the marginalization, you can always sample the discrete variables afterwards from the marginal probabilities.
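
A rough sketch of that idea for a single observation whose category is missing, in plain Python rather than PyMC. It assumes a Normal likelihood with one mean per category, and the means, noise scale, prior, and observed value below are all invented; in a real fit the means and scale would come from the current posterior draw. The missing category is summed out of the likelihood, and the same terms then give the probabilities needed to draw the category afterwards.

```python
import math
import random


def normal_logpdf(y, mu, sigma):
    """Log density of Normal(mu, sigma) evaluated at y."""
    z = (y - mu) / sigma
    return -0.5 * z * z - math.log(sigma) - 0.5 * math.log(2.0 * math.pi)


def marginalize_category(y, mus, sigma, prior):
    """Sum the missing category out of the likelihood.

    Returns the log marginal likelihood of y and the conditional
    probability of each category given y.
    """
    log_terms = [
        math.log(p) + normal_logpdf(y, mu, sigma) for p, mu in zip(prior, mus)
    ]
    # Log-sum-exp for numerical stability.
    m = max(log_terms)
    log_marginal = m + math.log(sum(math.exp(t - m) for t in log_terms))
    posterior = [math.exp(t - log_marginal) for t in log_terms]
    return log_marginal, posterior


# Hypothetical values standing in for one posterior draw.
mus = [0.0, 2.0, 5.0]        # one mean per category
sigma = 1.0                  # observation noise
prior = [1 / 3, 1 / 3, 1 / 3]  # prior over the missing category
y_obs = 1.8                  # outcome for the row with the missing category

log_marginal, posterior = marginalize_category(y_obs, mus, sigma, prior)

# Recover a draw of the discrete variable from those probabilities.
rng = random.Random(0)
imputed = rng.choices(range(len(mus)), weights=posterior, k=1)[0]

print(log_marginal, posterior, imputed)
```

Only `log_marginal` would enter the model's log probability, so the sampler sees continuous parameters only; the discrete draw happens as a post-processing step.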

Yeah, cool - I hadn’t thought of it that way. Imputing missing values in a categorical feature is a bit like (I think) assigning a latent single-member cluster label, and the non-missing values act to seed the cluster labels. Maybe…
