Including predictors in Dirichlet-multinomial models

I am trying to build a DM model based on the example in the documentation, but with some predictor variables, and I get different errors each time I adjust my code.

I have some data adjusted from the example:

import pymc as pm
import pandas as pd
import numpy as np
import scipy as sp
import scipy.stats
import arviz

rng = np.random.default_rng(0)

# Sample data structure
true_conc = 6.0
true_frac = np.array([0.5, 0.25, 0.25])
k = len(true_frac)  # Number of different tree species
n = 10  # Number of forests
total_count = 50 # Number of trees in each forest
true_p = sp.stats.dirichlet(true_conc * true_frac).rvs(size=n)
observed_counts = np.vstack([sp.stats.multinomial(n=total_count, p=p_i).rvs() for p_i in true_p])
x = np.random.normal(0,1,n)

print(observed_counts)

I then run the code on the example, and get three frac posteriors and one conc parameter:

# Version 1 - intercept only
with pm.Model() as model_dm_1:
    frac = pm.Dirichlet("frac", a=np.ones(k))
    conc = pm.Lognormal("conc", mu=1, sigma=1)
    counts = pm.DirichletMultinomial("counts", n=total_count, a=frac * conc, shape=(n, k), observed=observed_counts)
    # Sample
    trace_dm_1 = pm.sample(chains = 4, return_inferencedata = True)

pm.model_to_graphviz(model_dm_1)
arviz.plot_posterior(trace_dm_1)

When I try to add in a predictor to the fraction, I get an error on sample. When I run model_to_graphviz, the dimensions do not match up (again varying each time), so I assume that it is failing on some matrix multiplication, but I can’t figure out why.

# Version 2 - some predictors
with pm.Model() as model_dm_2:
    b0 = pm.Normal("b0", shape = k)
    b1 = pm.Normal("b1", shape = k)
    y = b0 + b1 * [x, x, x]
    
    frac = pm.Dirichlet("frac", a=pm.math.exp(y))
    conc = pm.Lognormal("conc", mu=1, sigma=1)
    counts = pm.DirichletMultinomial("counts", n=total_count, a=frac * conc, shape=(n, k), observed=observed_counts)
    
    # Sample
    trace_dm_2 = pm.sample(chains = 4, return_inferencedata = True)

Gives me Incompatible Elemwise input shapes [(3, 10), (10, 3)]
Another change (removing [x, x, x] and just putting x gives me SpecifyShape: dim 0 of input has shape 3, expected 10.
Another change to replace frac gave me Incompatible Elemwise input shapes [(10, 3), (1, 10)]

The y = b0 + b1 * x part works on a Poisson regression and on a linear regression, so where am I going wrong on Dirichlet regression?

I haven’t checked your model carefully, but multivariate distributions are very specific about the other of their dimensions. This guide might be of help: Distribution Dimensionality — PyMC 0+untagged.345.g2bd0611.dirty documentation

1 Like