Can't fit binomial hierarchical model on proportion data

mihagazvoda · April 9, 2023, 9:34pm

Let’s say I have proportion data where each trial outcome can be anything between 0 and 1:

import pandas as pd
import bambi as bmb # version 0.10.0

df = pd.DataFrame({
    'group': ['A', 'B', 'C'],
    'y': [10.6, 0.7, 84.13],
    'n': [100000, 10, 900000]
})

m = bmb.Model('p(y, n) ~ (1|group)', data=df, family='binomial')
idata = m.fit()

How can I model such data since I get the error that successes need to be integers? If it’s not possible in bambi, pymc is also okay. Thanks!

tcapretto · April 10, 2023, 11:26am

Could you explain a little bit more about why y is not integer? What does it represent? What does n represent?

mihagazvoda · April 10, 2023, 2:49pm

Let’s say I want to model star ratings (of Amazon products, for example) from 1 to 5. In my opinion, it makes sense to estimate the proportion of stars received out of all possible stars (for one rating). So 3 stars = 0.5, 4 stars = 0.75, etc.
n represents all ratings received per product.
For example, if a product received ratings of 3, 5, and 1 stars, this is converted to y = 0.5 + 1 + 0 = 1.5 and n = 3.
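
To make the conversion concrete, here is a minimal sketch of how I compute y and n from raw ratings. The helper names (`stars_to_proportion`, `summarize`) are just for illustration and not part of bambi or pymc; the mapping assumes ratings run from 1 to 5, so that 1 star = 0 and 5 stars = 1.

```python
from typing import List, Tuple


def stars_to_proportion(stars: int, max_stars: int = 5) -> float:
    """Map a rating in 1..max_stars onto [0, 1] (1 star -> 0.0, max_stars -> 1.0)."""
    if not 1 <= stars <= max_stars:
        raise ValueError(f"rating must be between 1 and {max_stars}, got {stars}")
    return (stars - 1) / (max_stars - 1)


def summarize(ratings: List[int]) -> Tuple[float, int]:
    """Return (y, n): the summed proportions and the number of ratings."""
    y = sum(stars_to_proportion(s) for s in ratings)
    n = len(ratings)
    return y, n


# The example from this post: ratings of 3, 5, and 1 stars.
y, n = summarize([3, 5, 1])
print(y, n)  # 1.5 3
```

Since y is a sum of fractions, it is generally not an integer, which is where the binomial error comes from.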

tcapretto · April 11, 2023, 11:32am

I’m not so sure that’s a good idea. The stars represent an ordinal outcome (1 < 2 < 3 < 4 < 5), so you could model them as such (e.g., with ordered logistic regression). Or, if you still want to model the proportion of stars, you can use a Beta distribution for the response. However, I’m not much in favor of that option either, because each rating can take only 5 possible values (0, 0.25, 0.5, 0.75, 1).
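
To give a rough idea of what the ordinal approach means, below is a small sketch of the category probabilities in a cumulative-logit (ordered logistic) model, where P(Y <= k) = sigmoid(c_k - eta). It only evaluates probabilities for given values; it does not fit anything. The function name, the cutpoints, and the value of the linear predictor `eta` are made up for illustration and are not taken from bambi or pymc.

```python
import math
from typing import List


def ordered_logit_probs(eta: float, cutpoints: List[float]) -> List[float]:
    """Return P(Y = k) for k = 1..K, where K = len(cutpoints) + 1.

    Uses the cumulative-logit form P(Y <= k) = sigmoid(c_k - eta).
    """
    if any(a >= b for a, b in zip(cutpoints, cutpoints[1:])):
        raise ValueError("cutpoints must be strictly increasing")
    # Cumulative probabilities for categories 1..K-1, then 1.0 for category K.
    cdf = [1.0 / (1.0 + math.exp(-(c - eta))) for c in cutpoints] + [1.0]
    # Differences of consecutive cumulative probabilities give each category.
    return [cdf[0]] + [cdf[k] - cdf[k - 1] for k in range(1, len(cdf))]


# Hypothetical cutpoints separating 1|2, 2|3, 3|4, and 4|5 stars.
cutpoints = [-2.0, -1.0, 0.0, 1.0]

# Hypothetical linear predictor for one product (e.g. a group intercept).
probs = ordered_logit_probs(eta=0.5, cutpoints=cutpoints)
for stars, p in enumerate(probs, start=1):
    print(f"P({stars} stars) = {p:.3f}")
```

In a hierarchical version, `eta` would contain the per-product effect, so a larger `eta` shifts probability mass toward higher star ratings while the cutpoints are shared across products.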

Remember the binomial distribution applies when your variable can be represented as “the number of successes in ‘n’ independent trials”. Every time you use a Binomial, think about what’s the success and what’s the failure.
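
As an illustration of that exercise, here is a sketch that turns star ratings into integer counts a Binomial can accept. The definition of "success" used here (a rating of exactly 5 stars) is purely an assumption for the example, and the helper name is made up; note that collapsing ratings this way throws away the distinction between 1 and 4 stars.

```python
from typing import List, Tuple


def binomial_counts(ratings: List[int], success_stars: int = 5) -> Tuple[int, int]:
    """Return (successes, trials) for one product.

    Each rating is one trial; a trial is a success when the rating
    equals `success_stars` (a hypothetical definition for this example).
    """
    successes = sum(1 for s in ratings if s == success_stars)
    trials = len(ratings)
    return successes, trials


# Ratings of 3, 5, and 1 stars: one success (the 5) out of three trials.
successes, trials = binomial_counts([3, 5, 1])
print(successes, trials)  # 1 3
```

With counts like these, the number of successes is always an integer between 0 and the number of trials, which is what the binomial family expects.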
