Categories in another column, how to use dynamically?

jorwoods · June 24, 2020, 7:14pm

I am very new to pymc3 and trying to get my feet wet with it. If I have a basic dataset, like:

import pandas as pd
import pymc3 as pm
df = pd.DataFrame({
    'labels': list('aabbcc'),
    'values': [0,1,1,1,0,0]
})

with pm.Model() as model:
    mu_a = pm.Beta('mu_a', 1, 3)
    mu_b = pm.Beta('mu_b', 1, 3)
    mu_c = pm.Beta('mu_c', 1, 3)

    # n=1 treats each row as a single 0/1 trial (equivalent to a Bernoulli)
    likelihood_a = pm.Binomial('likelihood_a', n=1, p=mu_a, observed=df[df['labels'] == 'a']['values'])
    likelihood_b = pm.Binomial('likelihood_b', n=1, p=mu_b, observed=df[df['labels'] == 'b']['values'])
    likelihood_c = pm.Binomial('likelihood_c', n=1, p=mu_c, observed=df[df['labels'] == 'c']['values'])

    diff_a_b = pm.Deterministic('diff_a_b', mu_a - mu_b)
    diff_a_c = pm.Deterministic('diff_a_c', mu_a - mu_c)
    diff_b_c = pm.Deterministic('diff_b_c', mu_b - mu_c)

I feel like I am doing something very wrong, and there has to be an easier way to define the probabilities and calculate the differences.

AlexAndorra · June 25, 2020, 9:07am

Hi,
Yeah, it seems like some vectorization is in order here, but it depends on what you’re trying to do.
I noticed you don’t use the values column. So, what are you studying with your model?

jorwoods · June 25, 2020, 11:25am

The missing usage of the values column was a typo when I was typing the example code. I have corrected the question. I also noticed I was calculating the diffs incorrectly and adjusted that.

At the moment, I would be looking for the differences in means between the label categories.

What I have there runs, but it feels like there is a better way to write it. And maybe a better way to scale it. If I had another column of new categories, say, “country,” I might want to compare across countries and labels without having to write a number of lines equal to

df.countries.nunique() * df.labels.nunique()

times however many lines the model definition is.

I suspect that I need to be using the shape kwarg in the model definition, but I don’t yet understand what exactly shape does and when to use it.

AlexAndorra · June 26, 2020, 12:47pm

To be sure I understand what you’re trying to study:

Is there one experimental condition, in which people have to choose between three categories (a, b and c)?
Or are there three different experimental conditions (a, b and c) in which people have to make a binary choice (0 or 1)?

Using shape is usually quite intuitive. This section of the quickstart notebook should help you.
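
As a rough sketch of what shape buys you here (not necessarily how the quickstart does it): one Beta prior with shape=n_labels replaces the three separate mu_* variables, and an integer index per row picks the right entry for each observation. The names label_idx, label_names, pairs and build_model are made up for illustration, and pm.Bernoulli is assumed as the likelihood for the 0/1 values column.

```python
from itertools import combinations

import pandas as pd

df = pd.DataFrame({
    'labels': list('aabbcc'),
    'values': [0, 1, 1, 1, 0, 0],
})

# Map each label to an integer code; sort=True makes the order a, b, c.
label_idx, label_names = pd.factorize(df['labels'], sort=True)
n_labels = len(label_names)

# Every unordered pair of label positions, e.g. (0, 1) for a vs. b.
pairs = list(combinations(range(n_labels), 2))


def build_model(df):
    # Imported here so the data preparation above runs without PyMC3.
    import pymc3 as pm

    with pm.Model() as model:
        # One vector-valued prior instead of mu_a, mu_b, mu_c.
        mu = pm.Beta('mu', alpha=1, beta=3, shape=n_labels)
        # mu[label_idx] holds one probability per row of df.
        pm.Bernoulli('obs', p=mu[label_idx], observed=df['values'].values)
        # Pairwise differences, named after the original labels.
        for i, j in pairs:
            name = f'diff_{label_names[i]}_{label_names[j]}'
            pm.Deterministic(name, mu[i] - mu[j])
    return model
```

Calling build_model(df) and then pm.sample() inside the returned model's context should give a trace where trace['mu'] has one column per label, in the order of label_names.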

jorwoods · June 26, 2020, 1:10pm

Thanks @AlexAndorra for the help thus far. In this case, it would be something more similar to the former. People are choosing from a drop down.

I figured I would have to use “shape” to do this, but what I am still missing is how to retain the labels in the posterior. The quickstart shows me how to index by integer once shape is defined, and I understand that part.
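
One possible way to keep the labels around, as a sketch: pd.factorize returns both the integer codes and the unique labels, so the labels can be reattached as column names to whatever array of draws comes out of sampling. Here draws is a placeholder array of zeros standing in for something like trace['mu'] with one column per label; it is not real output, and the names index_of and posterior are invented for the example.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'labels': list('aabbcc'),
    'values': [0, 1, 1, 1, 0, 0],
})

# codes[i] is the integer position of row i's label in `uniques`.
codes, uniques = pd.factorize(df['labels'], sort=True)

# Lookup from label to integer position (the reverse is uniques[i]).
index_of = {label: i for i, label in enumerate(uniques)}

# Placeholder for posterior draws of a shape-3 variable (NOT real results).
draws = np.zeros((4, len(uniques)))

# Reattach the labels as column names so columns can be selected by label.
posterior = pd.DataFrame(draws, columns=uniques)
diff_a_b = posterior['a'] - posterior['b']
```

With the labels as column names, comparisons can be written by label rather than by integer position.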

I’m also thinking ahead to the next step, a multilevel model in which I compare across users’ countries of origin to see if there is a difference in behavior. What I am missing is how to do that without losing the label information and falling back to integer indexing.
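
A sketch of how a second categorical column might be folded in, assuming a hypothetical country column (the data in this thread has none, so the values below are invented purely to make the example run): factorize each column separately and give the prior a two-dimensional shape. This is an unpooled sketch rather than a full multilevel model; a hierarchical version would add shared hyperpriors on top.

```python
import pandas as pd

# Hypothetical data: the 'country' column is invented for illustration.
df = pd.DataFrame({
    'country': ['US', 'US', 'US', 'FR', 'FR', 'FR'],
    'labels': list('abcabc'),
    'values': [0, 1, 1, 1, 0, 0],
})

# One integer index per categorical column, plus the labels they stand for.
country_idx, countries = pd.factorize(df['country'], sort=True)
label_idx, label_names = pd.factorize(df['labels'], sort=True)


def build_model(df):
    # Imported here so the indexing code above runs without PyMC3.
    import pymc3 as pm

    with pm.Model() as model:
        # One probability per (country, label) cell.
        mu = pm.Beta('mu', alpha=1, beta=3,
                     shape=(len(countries), len(label_names)))
        # Pick each row's cell with a pair of integer index arrays.
        p = mu[country_idx, label_idx]
        pm.Bernoulli('obs', p=p, observed=df['values'].values)
    return model
```

The number of model lines stays the same no matter how many countries or labels there are; only the shape changes.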

AlexAndorra · June 26, 2020, 1:40pm

Yeah, OK, so I think you’re looking for Multinomial instead of Binomial. Here is an example that should help you: it’s a multinomial regression with several predictors, plus prior and posterior predictive sampling.
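
As a rough illustration of the Multinomial framing (a sketch, not the linked example): if each row is one drop-down choice, the data reduce to a count per label, with a Dirichlet prior over the choice probabilities. The flat prior a=np.ones(...) and the names counts, label_names and build_model are assumptions made for this sketch.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'labels': list('aabbcc')})

# Number of times each label was chosen, ordered alphabetically by label.
counts = df['labels'].value_counts().sort_index()
label_names = counts.index.tolist()


def build_model(counts):
    # Imported here so the counting code above runs without PyMC3.
    import pymc3 as pm

    k = len(counts)
    with pm.Model() as model:
        # Probability of choosing each label; the entries sum to 1.
        p = pm.Dirichlet('p', a=np.ones(k), shape=k)
        # All choices together form a single multinomial observation.
        pm.Multinomial('obs', n=int(counts.sum()), p=p,
                       observed=counts.values)
    return model
```

The entries of p follow the order of label_names, so the labels can be matched back to the posterior columns after sampling.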
