Pandas groupby with PyMC

b_apollo · October 25, 2022, 9:38am

I’m new to PyMC, so I’m still getting my head around shaping the input data. I’m running a trial that has k possible outcomes, and varying two inputs, and want to find the probability of each outcome. My results are formatted like so:

| Run | Var1 | Var2 | OutcomeA | OutcomeB | OutcomeC |
|-----|------|------|----------|----------|----------|
| 1   | X    | M    | 1        | 0        | 0        |
| 2   | X    | N    | 0        | 1        | 0        |
| 3   | Y    | M    | 1        | 0        | 0        |
| 4   | Y    | N    | 0        | 0        | 1        |

and so on. I’ve been able to get a model that works without subdividing the groups, aka calling sum() on the entire results table doing the following:

coords = {"outcome": ["A", "B", "C"]}
with pm.Model(coords=coords) as model:
    p = pm.Dirichlet("p", a=np.ones(3), dims="outcome")
    obs = pm.Multinomial("obs", n=n, p=p, observed=data["OutcomeA", "OutcomeB", "OutcomeC"].sum())

However I want to investigate how the different combinations of variables affect the distribution of p. I’m using data.groupby(["Var1", "Var2"]).sum() to get a summary table, but I can’t figure out how I pass that table into a PyMC model. I’ve tried passing shape=n*m (n and m are the number of different Var1 and Var2 inputs respectively) to p but that just gives me the same results as above.

What I want to get out is a set of n x m vectors of length k, where n x m is the number of combinations of Var1 and Var2 and k is the number of possible outcomes, like so:

("X", "M"): [0.2, 0.4, 0.4]
("X", "N"): [0.25, 0.3, 0.45]

etc. Can anyone give me a hint on how to properly set this up, or point me at an example of a similar problem? Thanks

Topic		Replies	Views
Why my multinomial model with categorical predictors and response differs so much from results in R? version agnostic modeling	5	88	January 26, 2025
PyMC v5.10.3 prediction stuff v5 modeling	11	449	January 15, 2024
Need help with basic mode - starting point	6	163	April 29, 2024
How to create model with series of multinomials, each of which has observations on only a subset of components? v5 modeling	1	339	November 27, 2022
Distribution shape controlled by discrete random variable v5 modeling	2	387	July 26, 2022

Pandas groupby with PyMC

Related topics