Let’s say I have proportion data where each trial outcome can be anything between 0 and 1:
import pandas as pd
import bambi as bmb  # version 0.10.0

df = pd.DataFrame({
    'group': ['A', 'B', 'C'],
    'y': [10.6, 0.7, 84.13],
    'n': [100000, 10, 900000]
})
m = bmb.Model('p(y, n) ~ (1|group)', data=df, family='binomial')
idata = m.fit()
How can I model such data since I get the error that successes need to be integers? If it’s not possible in bambi, pymc is also okay. Thanks!
Could you explain a little bit more about why y is not an integer? What does it represent? What does n represent?
Let’s say I want to model star ratings (of Amazon products, for example) from 1 to 5. In my opinion, it makes sense to estimate the proportion of stars received out of all possible stars (for one rating). So 3 stars = 0.5, 4 stars = 0.75, etc.
n represents the number of ratings received per product.
For example, if a product received ratings of 3, 5, and 1 stars, this is converted to y = 0.5 + 1 + 0 = 1.5 and n = 3.
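To make the conversion concrete, here is a minimal sketch of it in plain Python. The helper name `stars_to_proportion` and the sample ratings are only for illustration.

```python
def stars_to_proportion(stars, max_stars=5):
    """Map a 1..max_stars rating onto [0, 1]: 1 star -> 0, max_stars -> 1."""
    return (stars - 1) / (max_stars - 1)

ratings = [3, 5, 1]  # illustrative ratings for a single product

# y is the sum of the per-rating proportions, n is the number of ratings
y = sum(stars_to_proportion(s) for s in ratings)
n = len(ratings)

print(y, n)  # 1.5 3
```

Note that y is a sum of fractions, so in general it is not an integer.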
I’m not so sure that’s a good idea. The stars represent an ordinal outcome (1 < 2 < 3 < 4 < 5), so you could model them as such (e.g., with ordered logistic regression). If you still want to model the proportion of stars, you could use a Beta distribution for the response. However, I’m not much in favor of that option either, because the response takes only 5 possible values (0, 0.25, 0.5, 0.75, 1).
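As a rough illustration of the ordinal idea, the sketch below shows how an ordered logistic model turns a latent score and four cutpoints into probabilities for the five star levels. The cutpoint values and the function name `ordered_logit_probs` are made up for the example; in PyMC the corresponding likelihood is `pm.OrderedLogistic`, where the cutpoints would be estimated rather than fixed.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def ordered_logit_probs(eta, cutpoints):
    """Category probabilities for an ordered logistic model.

    Uses P(Y <= k) = sigmoid(c_k - eta) with increasing cutpoints c_k,
    then takes differences of the cumulative probabilities.
    """
    cdf = [sigmoid(c - eta) for c in cutpoints] + [1.0]
    return [cdf[0]] + [cdf[k] - cdf[k - 1] for k in range(1, len(cdf))]

# Illustrative cutpoints: 4 cutpoints separate 5 star levels
cutpoints = [-2.0, -0.5, 0.5, 2.0]

probs_neutral = ordered_logit_probs(eta=0.0, cutpoints=cutpoints)
probs_good = ordered_logit_probs(eta=1.5, cutpoints=cutpoints)

for stars, (p0, p1) in enumerate(zip(probs_neutral, probs_good), start=1):
    print(f"{stars} stars: eta=0.0 -> {p0:.3f}, eta=1.5 -> {p1:.3f}")
```

A larger latent score shifts probability mass toward the higher star levels, while the ordering 1 < 2 < 3 < 4 < 5 is preserved without assuming the levels are equally spaced.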
Remember the binomial distribution applies when your variable can be represented as “the number of successes in ‘n’ independent trials”. Every time you use a Binomial, think about what’s the success and what’s the failure.
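To illustrate, here is a small sketch in plain Python. It assumes one possible definition of a success, "the rating is 4 stars or higher", which is only an example and not something taken from your data. With a definition like that, the number of successes is an integer count, whereas a sum of fractions such as 1.5 is not.

```python
from math import comb

def binomial_pmf(k, n, p):
    """P(K = k) for K ~ Binomial(n, p); k must be an integer count."""
    if not isinstance(k, int) or not 0 <= k <= n:
        raise ValueError("k must be an integer between 0 and n")
    return comb(n, k) * p**k * (1 - p) ** (n - k)

ratings = [3, 5, 1]  # illustrative ratings for a single product

# Assumed definition of success: a rating of 4 stars or higher
successes = sum(1 for s in ratings if s >= 4)
n = len(ratings)

pmf = binomial_pmf(successes, n, p=0.5)
print(successes, n, pmf)

# A fractional "number of successes" has no binomial probability
try:
    binomial_pmf(1.5, n, p=0.5)
except ValueError as err:
    print("Error:", err)
```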