What might be the most effective way to write a Multivariate Poisson distribution with some 50 RVs?
The Wikipedia entry for the distribution defines this for a bivariate case but I imagine the efficiency will reduce significantly if this is followed for large number of RVs.
Is there any literature for doing this efficiently with PyMC3? Alternatively, are there any approximations/transformations that may be useful for this purpose?
The Poisson only has a single parameter, the rate, so moving from N independent poisson processes to N dependent poisson processes means having a model for the joint distribution of the rates.
A multivariate log-normal approach might look like:
rate_chol_unpacked = pm.LKJCholeskyCov('rates_chol_UP', n=d, eta=1, sd_dist=pm.HalfCauchy.dist(1.))
m_rates = pm.Cauchy('mu_rates', 0., 1., shape=(d,))
log_rates = pm.MvNormal('log_rates', m_rates,
rates = pm.Deterministic('rates', tt.exp(log_rates))
counts = pm.Poisson('counts', mu=rates, observed=data)
Edit: Also, try to avoid using MvNormal directly. It’s almost always more efficient to use an equivalent non-centered parametrization.
I have an additional question on this topic:
How would you code the above covariances if the rate of each Poisson depended on say two parameters? i.e. log_rates = constant + coeff * data
Where we can assume constant and coeff are drawn from something like:
constant = pm.Normal('const', mu=0., sigma=1.)
coeff = pm.Normal('b0', mu=0., sigma=1.)