Total beginner here, and not that familiar with the theory, so I might be completely off base. I am trying to model data that is basically occurrence counts over integers. I’m starting from a graphical model from the literature of my academic field, which gives me a distribution over integers. The function is somewhat complex, with a utility and a softmax, and it involves an idiosyncratic term with no simple form. Taking inspiration in particular from this example, I was able to express it in PyMC, but I can’t quite figure out how to feed it my data. What I have looks like this:
with model:
    ...
    n = pm.Data("n", value=np.arange(max_n))
    weird_factor = pm.Data("w", value=...)
    ...
    dist = ...  # random vector with length max_n
    count = pm.Categorical("count", p=dist, observed=??)
So:
- Does this look reasonable? It seems to me that, because of the idiosyncratic term, I can’t do better than treating the integers as “data” and the final distribution as a categorical variable. Is there a plausible alternative here?
- How do I actually feed my counts or frequencies as observations to the model? Just putting the counts as observed data did not seem to work. Or is there a better way to fit an expected distribution to an observed one than using Categorical like this?
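To make the second question concrete, here is a small NumPy-only sketch of the two encodings I can think of for the observations. The `counts` and `p` arrays are made-up placeholders (`p` stands in for `dist`), not my real data or model, and I am not sure either encoding is the recommended one.

```python
import numpy as np
from math import lgamma

# Hypothetical data: counts[k] = number of times integer k was observed.
counts = np.array([3, 0, 5, 2])
# Hypothetical stand-in for the model's probability vector `dist`.
p = np.array([0.2, 0.1, 0.5, 0.2])

# Encoding 1: expand the counts into one category index per observation,
# which is the kind of input a Categorical likelihood works with.
draws = np.repeat(np.arange(len(counts)), counts)

# Categorical log-likelihood of the expanded observations.
loglik_categorical = np.log(p[draws]).sum()

# Encoding 2: keep the count vector and use a Multinomial likelihood
# with n equal to the total number of observations.
n_total = int(counts.sum())
log_coef = lgamma(n_total + 1) - sum(lgamma(c + 1) for c in counts)
loglik_multinomial = log_coef + (counts * np.log(p)).sum()

# The two log-likelihoods differ only by the multinomial coefficient,
# which does not depend on p.
print(loglik_multinomial - loglik_categorical, log_coef)
```

If this reasoning is right, the model would take either `pm.Categorical("count", p=dist, observed=draws)` or `pm.Multinomial("count", n=counts.sum(), p=dist, observed=counts)`, and both should lead to the same posterior since the likelihoods differ only by a constant in `p`.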