I’m new to PyMC, so I’m still getting my head around shaping the input data. I’m running a trial that has k possible outcomes, and varying two inputs, and want to find the probability of each outcome. My results are formatted like so:
| Run | Var1 | Var2 | OutcomeA | OutcomeB | OutcomeC |
|-----|------|------|----------|----------|----------|
| 1 | X | M | 1 | 0 | 0 |
| 2 | X | N | 0 | 1 | 0 |
| 3 | Y | M | 1 | 0 | 0 |
| 4 | Y | N | 0 | 0 | 1 |
and so on. I’ve been able to get a model that works without subdividing the groups, aka calling sum()
on the entire results table doing the following:
coords = {"outcome": ["A", "B", "C"]}
with pm.Model(coords=coords) as model:
p = pm.Dirichlet("p", a=np.ones(3), dims="outcome")
obs = pm.Multinomial("obs", n=n, p=p, observed=data["OutcomeA", "OutcomeB", "OutcomeC"].sum())
However I want to investigate how the different combinations of variables affect the distribution of p. I’m using data.groupby(["Var1", "Var2"]).sum()
to get a summary table, but I can’t figure out how I pass that table into a PyMC model. I’ve tried passing shape=n*m
(n and m are the number of different Var1 and Var2 inputs respectively) to p but that just gives me the same results as above.
What I want to get out is a set of n x m vectors of length k, where n x m is the number of combinations of Var1 and Var2 and k is the number of possible outcomes, like so:
("X", "M"): [0.2, 0.4, 0.4]
("X", "N"): [0.25, 0.3, 0.45]
etc. Can anyone give me a hint on how to properly set this up, or point me at an example of a similar problem? Thanks