Efficiency of a Multivariate Poisson distribution


#1

What might be the most effective way to write a Multivariate Poisson distribution with some 50 RVs?

The Wikipedia entry for the distribution defines this for a bivariate case but I imagine the efficiency will reduce significantly if this is followed for large number of RVs.

Is there any literature for doing this efficiently with PyMC3? Alternatively, are there any approximations/transformations that may be useful for this purpose?


#2

The Poisson only has a single parameter, the rate, so moving from N independent poisson processes to N dependent poisson processes means having a model for the joint distribution of the rates.

A multivariate log-normal approach might look like:

d=50
rate_chol_unpacked = pm.LKJCholeskyCov('rates_chol_UP', n=d, eta=1, sd_dist=pm.HalfCauchy.dist(1.))
m_rates = pm.Cauchy('mu_rates', 0., 1., shape=(d,))
log_rates = pm.MvNormal('log_rates', m_rates, 
                        chol=pm.expand_packed_triangular(d, rate_chol_unpacked), 
                        shape=d)
rates = pm.Deterministic('rates', tt.exp(log_rates))
counts = pm.Poisson('counts', mu=rates, observed=data)

Edit: Also, try to avoid using MvNormal directly. It’s almost always more efficient to use an equivalent non-centered parametrization.