I think the most likely cause of the low ESS is the Metropolis step. Metropolis (or Gibbs) steps are the only way to sample discrete variables such as K, but they mix poorly, and mixing gets even worse as the number of dimensions (in your case, the number of classes) increases. As a result, the trace for K will show long autocorrelation: consecutive draws are strongly correlated, so each new sample adds little independent information and the effective sample size stays low. A typical way to deal with this is to thin the chain after sampling, so that each element of the thinned chain has little to no autocorrelation with the previous one. The downside is that you have to draw many more samples than the ones you end up keeping. In your case thinning by a factor of 10 might work, but you should have a look at the autocorrelation plots first.
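Something like the following minimal sketch could help you check the autocorrelation and thin the trace with ArviZ (assuming your sampler returns an InferenceData called idata and the discrete variable is named "K"; adjust the names and the thinning factor to your case):

```python
import arviz as az

# Inspect how quickly the autocorrelation of K decays
az.plot_autocorr(idata, var_names=["K"])

# Effective sample size before thinning
print(az.ess(idata, var_names=["K"]))

# Keep every 10th draw; pick the step based on the autocorrelation plot
idata_thinned = idata.sel(draw=slice(None, None, 10))
print(az.ess(idata_thinned, var_names=["K"]))
```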
The only alternative I can think of, to avoid Metropolis, is to marginalize out K. This means writing down the probability distribution of your MvHypergeometric conditional on K and then summing over all possible values of K, weighted by its prior. This leads to a Mixture of MvHypergeometrics with fixed K's, so you would no longer be inferring discrete values; you would be left with only the continuous p and could sample using NUTS. However, mixtures are also quite hard to sample from, so I would go with chain thinning first and see how it goes.