Implementation of Multivariate Hypergeometric distribution not sampling correctly

I just figured out what the problem was. When I set up the model, I chose to represent each component of K as a univariate BetaBinomial:

    import numpy as np
    import pymc3 as pm

    # random_mv_hypergeom and MvHypergeometric are the custom multivariate
    # hypergeometric implementations defined earlier in this thread
    if __name__ == '__main__':

        trueK = np.asarray([40, 60])    # true counts of each category
        N = trueK.sum()                 # total population size

        fn = 0.9                        # fraction of the population that gets sampled
        n = (N*fn).astype('int32')
        obs = random_mv_hypergeom(K=trueK, n=n)[0]
        p = obs/n                       # observed proportions

        b = 10
        a = (1+(b-2)*p)/(1-p)           # places the mode of Beta(a, b) at p

        with pm.Model() as ex:
            K = pm.BetaBinomial('K', n=N, alpha=a, beta=b,
                                testval=(N*p).astype('int32'), shape=p.shape)
            p = pm.Deterministic('p', K/N)
            hyp = MvHypergeometric(name='hypergeom', K=K, n=n, observed=obs)

            tex1 = pm.sample(draws=10000, chains=2)
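For reference, one quick way to see the problem is to check whether the draws of K in the trace actually sum to N (a minimal sketch, assuming the model above has been sampled into tex1):

    # with independent BetaBinomial priors, nothing forces each draw of K to sum to N
    K_draws = tex1['K']                  # shape (draws * chains, 2)
    row_sums = K_draws.sum(axis=-1)
    print((row_sums == N).mean())        # fraction of draws whose components sum to N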

It is important that the components of K sum to N, but since the BetaBinomials are independent of one another, there is no guarantee of that. Changing the specification so that K has a Multinomial prior solved the issue:

    with pm.Model() as ex:
        pp = pm.Dirichlet('p', a=N*np.repeat(1/len(trueK), repeats=len(trueK)),
                          shape=trueK.shape,
                          testval=obs/n)   # start at the observed proportions
        KK = pm.Multinomial('K', n=N, p=pp, shape=trueK.shape)
        hyp = MvHypergeometric(name='hypergeom', K=KK, n=n,
                               observed=obs)

        tex2 = pm.sample(draws=10000, chains=2)   # same sampling settings as before
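The difference between the two priors is easy to see with plain numpy draws as well. This is just an illustrative sketch (the N and p below are made-up values, not the model's variables):

    import numpy as np

    rng = np.random.default_rng(0)
    N, p = 100, np.array([0.4, 0.6])

    # independent Binomial draws: each component varies on its own,
    # so the totals only hover around N
    indep = rng.binomial(N, p, size=(5, 2))
    print(indep.sum(axis=1))      # varies from draw to draw

    # Multinomial draws: the N trials are shared across the components,
    # so every draw sums to exactly N
    joint = rng.multinomial(N, p, size=5)
    print(joint.sum(axis=1))      # always N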

There are still warnings about divergences and the number of effective samples, and the traces for the K variable don’t look very well behaved, but that seems outside the scope of this question. I’ll start another question in the forum about it.

Anyway, thanks for your attention and the tips!
