Hi everyone,
I’m still trying to wrap my head around the various shapes a parameter can have, and today I was slowly able to start understanding better the coordinates and dimensions and how they are used in practice. Where I’m currently stuck is with the following problem - I have a dataset with multiple output counts data which I wish to model as (for the moment) independent Poisson GLMs. However, I would like to share some of the parameters between the GLMs. Trying to do that leaves me quite confused regarding how to specify the dimentions correctly.
Here is the model I’m trying to fit:
Y_{ij} \sim \text{Poisson}(\lambda_{ij}); \quad j \in \text{Outcomes}, i \in \text{Rows} \\ \log (\lambda_{ij}) = \alpha_0 + \sigma_M \alpha_{\text{PRODUCT[i]}} + \sigma_{\text{PRODUCT[i]}} \alpha_{\text{PRODUCT[i], OUTCOME[j]}} + \log(\text{exposure}_i) \\ \alpha_0 \sim \text{Normal}(-4, 1.3) \\ \alpha_\text{PRODUCT[i]} \sim \text{Normal}(0, 1) \\ \sigma_M \sim \text{Exponential}(5) \\ \sigma_\text{PRODUCT[i]} \sim \text{Exponential}(5) \\ \alpha_\text{PRODUCT[i], OUTCOME[j]} \sim \text{Normal}(0, 1) \\
This is a non-centered hierarchical model with a common global intercept, an intercept for each product and, for each product, an intercept for each outcome of counts.
I managed to fit this model without the outcomes part quite easily (as a single GLM), but now when I extend to multiple outcomes I get stuck.
Here is a simple example of the code I’m trying to fit:
product_idx, product_name = pd.factorize(data.product)
exposure = data.exposure
outcomes = ['y_0', 'y_1', 'y_2']
coords = {
'obs_id': list(range(data.shape[0])),
'Product': product_name,
'Outcome': outcomes
}
with pm.Model(coords=coords) as model:
product_idx = pm.Data('product_idx', product_idx, dims="obs_id")
exposure = pm.Data('exposure', exposure, dims="obs_id")
alpha_0 = pm.Normal('alpha_0', -4, 1.3) # from prior predictive checks
sd_m = pm.Exponential('sd_m', 5)
alpha_p = pm.Normal('alpha_p', 0, 1, dims="Product")
scaled_alpha_p = pm.Deterministic('scaled_alpha_p', sd_m * alpha_p, dims='Product')
# Here be dragons
sd_m_p = pm.Exponential('sd_m_p', 5, dims='Product')
alpha_m_p = pm.Normal('alpha_m_p', 0, 1, dims=('Product', 'Outcome'))
# Now I'm not sure how to proceed. One option:
scaled_alpha_m_p = pm.Deterministic('scaled_alpha_m_p', sd_m_p * alpha_m_p, dims=('Product', 'Outcome'))
# second option:
scaled_alpha_m_p = pm.Deterministic('scaled_alpha_m_p', pm.math.dot(sd_m_p, alpha_m_p), dims=('Product', 'Outcome'))
# both don't really work.
# And here I'm really unsure of how to specify the likelihood:
linpart = alpha_0 + scaled_alpha[product_idx] + scaled_alpha_m_p[product_idx, ???] + pm.math.log(exposure)
y = pm.Poisson('y', mu=pm.math.exp(linpart), observed=data[outcomes], dims=("obs_id", "Outcome"))
The last part of the code, following “Here be dragons” does not work, though it would if I remove all mention of “Outcome” and the scaled_alpha_m_p
part.
My questions are:
-
How can I make the multiplication for
scaled_alpha_m_p
work? I want there to be something with the dimensions of observations x outcomes but I don’t know how to achieve this. Seemed quite easy for the scalar x vector multiplication, but here I’m stuck. -
I thought of using a loop over outcomes to create a likelihood for each outcome via the standard Poisson, but there I got stuck since I don’t know how to access the “outcome” dimension of
alpha_m_p
. Is there a way to make that work? -
Is there a better way altogether for this? In the end the model will be much more complicated with probable dependencies between the outcomes and more regressors added, so I wish to make this foundation as solid as possible.
Thanks for any help, and for the good work on PyMC3!
Cheers, Omri