Efficiency of a Multivariate Poisson distribution

What might be the most effective way to write a Multivariate Poisson distribution with some 50 RVs?

The Wikipedia entry for the distribution defines this for a bivariate case but I imagine the efficiency will reduce significantly if this is followed for large number of RVs.

Is there any literature for doing this efficiently with PyMC3? Alternatively, are there any approximations/transformations that may be useful for this purpose?

The Poisson only has a single parameter, the rate, so moving from N independent poisson processes to N dependent poisson processes means having a model for the joint distribution of the rates.

A multivariate log-normal approach might look like:

d=50
rate_chol_unpacked = pm.LKJCholeskyCov('rates_chol_UP', n=d, eta=1, sd_dist=pm.HalfCauchy.dist(1.))
m_rates = pm.Cauchy('mu_rates', 0., 1., shape=(d,))
log_rates = pm.MvNormal('log_rates', m_rates, 
                        chol=pm.expand_packed_triangular(d, rate_chol_unpacked), 
                        shape=d)
rates = pm.Deterministic('rates', tt.exp(log_rates))
counts = pm.Poisson('counts', mu=rates, observed=data)

Edit: Also, try to avoid using MvNormal directly. It’s almost always more efficient to use an equivalent non-centered parametrization.

2 Likes

I have an additional question on this topic:

How would you code the above covariances if the rate of each Poisson depended on say two parameters? i.e. log_rates = constant + coeff * data

Where we can assume constant and coeff are drawn from something like:

constant = pm.Normal('const', mu=0., sigma=1.)
coeff = pm.Normal('b0', mu=0., sigma=1.)