PR proposal -- API for Multinomial-Softmax and Cholesky decomposition

AlexAndorra · November 11, 2019, 11:54am

Hi there! I had two ideas of PR for PyMC but was wondering if it would be useful:

I noticed lots of people over-parametrize their Multinomial-Softmax models: they put all the categories in the model, while only N-1 of them are useful and the last one is entirely determined by the others. Over-parametrizing generally slows down sampling. At the same time, the over-parametrization is more intuitive: the trick of fixing the value of one category always feels weird.
So, what do you think of modifying the API so that people can parametrize with all the categories but PyMC fixes one of them under the hood? That way, the API becomes more intuitive and the sampling speed is always optimized, without people having to artificially complicate their models.
Same principle, but applied to the API for MvNormal with Cholesky decomposition. Doing this decomposition is quasi unavoidable for sampling efficiency, but I find the API adds boiler plate code that harms models’ readability:

sd_dist = pm.Exponential.dist(1)
packed_chol = pm.LKJCholeskyCov(f"chol_cov_{p}", eta=4, n=2, sd_dist=sd_dist)
chol = pm.expand_packed_triangular(2, packed_chol, lower=True)

# extract rho and sds:
cov = pm.math.dot(chol, chol.T)
sigma_ab = pm.Deterministic("sigma_cluster", tt.sqrt(tt.diag(cov)))
corr = tt.diag(sigma_ab ** -1).dot(cov.dot(tt.diag(sigma_ab ** -1)))
r = pm.Deterministic("Rho", corr[np.triu_indices(2, k=1)])

Having an API that does the decomposition under the hood and returns the standard deviations and correlations directly in the trace (as Stan does) would be great I think! What’s your take on that?

I’d be happy to work on these PRs! With the guidance of a mentor if possible, given the complexity of the code base.
Happy to discuss all that with you! PyMCheers

junpenglao · November 11, 2019, 4:34pm

I think both would be really helpful contributions! I think both could be either adding an additional kwarg to the distribution, or creating a new distribution that subclass the original distribution. Thoughts?

AlexAndorra · November 11, 2019, 8:56pm

Yeah I was thinking about new kwargs too. Something like:

For Multinomial, the user would provide the index of the pivot category (as the value of the parameters changes depending on which category acts as pivot):

alpha = pm.Normal("alpha", -1.8, 0.5, shape=Ncategories)
p = tt.nnet.softmax(alpha)
R = pm.Multinomial("R", n=N, p=p, pivot_id=0, observed=sim_R)

PyMC then operates the transformation backstage, i.e the pivot_idth column of alpha in reality gets replaced by a constant equal to the mean of alpha, which means p is indeed computed thanks to Ncategories - 1 categories.

For Cholesky, something like this would be great:

sd_dist = pm.Exponential.dist(1)
packed_chol = pm.LKJCholeskyCov("chol_cov", eta=4, n=2, sd_dist=sd_dist)
ab_cluster = pm.MvNormal(
            "ab_cluster", mu=[-1.8, -0.5], chol=packed_chol, 
             extract_corr=True, shape=(Nclusters, 2)
           )

The transformation into Cholesky factor and covariance matrix, and computation of correlations and standard deviations would be done under the hood. The trace would return the stds and correlations, as users are in general interested in them, not in the Cholesky factor per se.

Of course, these are high-level proposals, as I don’t have your knowledge of the code base and of the design choices, but I hope this is a start

Topic		Replies	Views
Multi-Variate Normal in pymc3 vs. R Questions	9	986	August 4, 2020
Does pymc3 have a cholesky decomposition function? Questions	5	1119	October 15, 2019
Advice for speeding up large multivariate models Questions	17	2147	April 12, 2018
Using LKJCorr together with MvNormal version agnostic	44	717	January 16, 2024
`LKJCholeskyCov` requires `dist_params`? (v4.0) Questions	2	787	September 22, 2021

PR proposal -- API for Multinomial-Softmax and Cholesky decomposition

Related topics