Pandas groupby with PyMC

I’m new to PyMC, so I’m still getting my head around shaping the input data. I’m running a trial that has k possible outcomes, and varying two inputs, and want to find the probability of each outcome. My results are formatted like so:

| Run | Var1 | Var2 | OutcomeA | OutcomeB | OutcomeC |
|-----|------|------|----------|----------|----------|
| 1   | X    | M    | 1        | 0        | 0        |
| 2   | X    | N    | 0        | 1        | 0        |
| 3   | Y    | M    | 1        | 0        | 0        |
| 4   | Y    | N    | 0        | 0        | 1        |

and so on. I’ve been able to get a model that works without subdividing the groups, aka calling sum() on the entire results table doing the following:

coords = {"outcome": ["A", "B", "C"]}
with pm.Model(coords=coords) as model:
    p = pm.Dirichlet("p", a=np.ones(3), dims="outcome")
    obs = pm.Multinomial("obs", n=n, p=p, observed=data["OutcomeA", "OutcomeB", "OutcomeC"].sum())

However I want to investigate how the different combinations of variables affect the distribution of p. I’m using data.groupby(["Var1", "Var2"]).sum() to get a summary table, but I can’t figure out how I pass that table into a PyMC model. I’ve tried passing shape=n*m (n and m are the number of different Var1 and Var2 inputs respectively) to p but that just gives me the same results as above.

What I want to get out is a set of n x m vectors of length k, where n x m is the number of combinations of Var1 and Var2 and k is the number of possible outcomes, like so:

("X", "M"): [0.2, 0.4, 0.4]
("X", "N"): [0.25, 0.3, 0.45]

etc. Can anyone give me a hint on how to properly set this up, or point me at an example of a similar problem? Thanks

1 Like