Categories in another column, how to use dynamically?

I am very new to pymc3 and trying to get my feet wet with it. If I have a basic dataset, like:

import pandas as pd
import pymc3 as pm
df = pd.DataFrame({
    'labels': list('aabbcc'),
    'values': [0,1,1,1,0,0]
})

with pm.Model() as model:
    mu_a = pm.Beta('mu_a', 1, 3)
    mu_b = pm.Beta('mu_b', 1, 3)
    mu_c = pm.Beta('mu_c', 1, 3)

    # n=1 because each observed value is a single 0/1 trial
    likelihood_a = pm.Binomial('likelihood_a', n=1, p=mu_a, observed=df[df['labels'] == 'a']['values'])
    likelihood_b = pm.Binomial('likelihood_b', n=1, p=mu_b, observed=df[df['labels'] == 'b']['values'])
    likelihood_c = pm.Binomial('likelihood_c', n=1, p=mu_c, observed=df[df['labels'] == 'c']['values'])

    diff_a_b = pm.Deterministic('diff_a_b', mu_a - mu_b)
    diff_a_c = pm.Deterministic('diff_a_c', mu_a - mu_c)
    diff_b_c = pm.Deterministic('diff_b_c', mu_b - mu_c)

I feel like I am doing something very wrong, and there has to be an easier way to define the probabilities and calculate the differences.

Hi,
Yeah, it seems like some vectorization is in order here, but it depends on what you’re trying to do.
I noticed you don’t use the values column. So, what are you studying with your model?

The missing usage of the values column was a typo when I was typing the example code. I have corrected the question. I also noticed I was calculating the diffs incorrectly and adjusted that.

At the moment, I would be looking for the differences in means between the label categories.

What I have there runs, but it feels like there is a better way to write it. And maybe a better way to scale it. If I had another column of new categories, say, “country,” I might want to compare across countries and labels without having to write a number of lines equal to

df.countries.nunique() * df.labels.nunique()

Times however many lines the model definition is.

I suspect that I need to be using the shape kwarg in the model definition, but I don’t yet understand what exactly shape does and when to use it.

To be sure I understand what you’re trying to study:

  • Is there one experimental condition, in which people have to choose between three categories (a, b and c)?
  • Or are there three different experimental conditions (a, b and c) in which people have to make a binary choice (0 or 1)?

It’s usually quite intuitive. This section of the quickstart notebook should help you.
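To make the shape idea concrete, here is a minimal sketch of how the three separate priors could collapse into one vector. It assumes the values column holds single 0/1 trials, so it swaps in pm.Bernoulli (equivalent to Binomial with n=1). The names label_idx, label_names and build_model are hypothetical, and pymc3 is imported inside the function only so the indexing part runs on its own.

```python
import pandas as pd

df = pd.DataFrame({
    'labels': list('aabbcc'),
    'values': [0, 1, 1, 1, 0, 0],
})

# One integer code per row, plus the label each code stands for.
label_idx, label_names = pd.factorize(df['labels'])


def build_model(label_idx, values, n_labels):
    """Hypothetical helper: vectorized version of the per-label model."""
    import pymc3 as pm  # lazy import so the indexing above works without pymc3

    with pm.Model() as model:
        # shape=n_labels turns mu into a vector with one Beta prior per label.
        mu = pm.Beta('mu', alpha=1, beta=3, shape=n_labels)
        # mu[label_idx] picks, for every row, the mu that belongs to its label.
        pm.Bernoulli('likelihood', p=mu[label_idx], observed=values)
    return model
```

Calling build_model(label_idx, df['values'].values, len(label_names)) should give a model whose mu has one entry per label, in the order stored in label_names, so column i of the sampled mu corresponds to label_names[i].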

Thanks @AlexAndorra for the help thus far. In this case, it would be something more similar to the former. People are choosing from a drop down.

I figured I would have to use “shape” to do this, but what I am still missing is how to retain the labels in the posterior. The quickstart shows me how to index by integer once shape is defined, and I can understand that.

I’m also thinking ahead to next steps, when it becomes a multilevel model and I want to compare across the users’ countries of origin to see if there is a difference in behavior. I am missing how to do that without losing the label information to integer indexing.
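One possible way to keep the labels is to do the integer indexing for the model but store the lookup tables, then re-attach them to the draws afterwards. The sketch below uses only numpy and pandas. The country column is hypothetical (it is not in the example data), and draws is a random stand-in for posterior samples of a parameter declared with shape=(n_countries, n_labels), not real model output.

```python
import itertools

import numpy as np
import pandas as pd

# Hypothetical data: the 'country' column is made up for illustration.
df = pd.DataFrame({
    'country': ['US', 'US', 'DE', 'DE', 'US', 'DE'],
    'labels': list('aabbcc'),
    'values': [0, 1, 1, 1, 0, 0],
})

# Integer codes for the model, lookup tables for the labels.
country_idx, countries = pd.factorize(df['country'])
label_idx, labels = pd.factorize(df['labels'])

# Stand-in for posterior draws with shape (n_draws, n_countries, n_labels).
rng = np.random.default_rng(0)
draws = rng.beta(1, 3, size=(500, len(countries), len(labels)))

# Same lookup a model would do with mu[country_idx, label_idx]: one value per row.
per_row = draws[:, country_idx, label_idx]

# Re-attach the labels: one column per (country, label) pair.
columns = pd.MultiIndex.from_product([countries, labels], names=['country', 'labels'])
posterior = pd.DataFrame(draws.reshape(len(draws), -1), columns=columns)

# All pairwise label differences within one country, named by label.
us = posterior['US']
diffs = pd.DataFrame({
    f'{a} - {b}': us[a] - us[b]
    for a, b in itertools.combinations(labels, 2)
})
```

With this layout the number of lines stays the same no matter how many countries or labels there are, and diffs.describe() summarizes each difference under its label-based name.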

Yeah, OK, so I think you’re looking for Multinomial instead of Binomial. Here is an example that should help you: it’s a multinomial regression with several predictors, plus prior and posterior predictive sampling.
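As a much smaller starting point than the regression example mentioned above (this is not that example), a Multinomial likelihood takes the count of picks per option. Here is a rough sketch, assuming each row of labels is one drop-down choice. The build_multinomial helper and the flat Dirichlet prior are assumptions for illustration, and pymc3 is imported lazily so the counting step runs by itself.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'labels': list('aabbcc')})

# How many times each option was picked, indexed by its label.
counts = df['labels'].value_counts().sort_index()


def build_multinomial(counts):
    """Hypothetical helper: one probability per option, flat Dirichlet prior."""
    import pymc3 as pm  # lazy import so the counting above works without pymc3

    with pm.Model() as model:
        # p is a vector of choice probabilities that sums to 1.
        p = pm.Dirichlet('p', a=np.ones(len(counts)))
        # The observed data are the counts of each option out of n total choices.
        pm.Multinomial('choices', n=int(counts.sum()), p=p, observed=counts.values)
    return model
```

Because counts keeps the labels as its index, entry i of the sampled p lines up with counts.index[i], so the labels are not lost.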