I have data where each row corresponds to a single review of an item. Items can be reviewed multiple times (anywhere from 1 to ~20), but each item has only one final outcome. An example subset:
ID | Reviewer | p1 | p2 | p3 | outcome |
---|---|---|---|---|---|
1 | A | 1.50 | 1.6 | 0.9 | 2.4 |
1 | B | 1.45 | 1.35 | 1.46 | 2.4 |
1 | C | 1.55 | 1.51 | 1.51 | 2.4 |
2 | B | 1.05 | 1.02 | 1.00 | 1.5 |
2 | C | 1.06 | 1.16 | 0.80 | 1.5 |
3 | C | 0.86 | 1.5 | 0.46 | 0.8 |
I’m interested in doing a regression at the ID/outcome level. In principle, I could just do a `groupby("ID").mean()`, but a lot of information is lost in doing so. It would be nice to capture the idea that many reviews with similar values ought to make us more confident in that value. Is there a standard way to build a likelihood at the “group” level without rolling everything up into a summary statistic? (The long-term goal is a hierarchical model that captures a reviewer-level effect, since reviewers should be similar but may have some variance between them, but one step at a time.)
There is an old, semi-related thread, but I couldn’t extract anything applicable from it: Modelling groups with different number of observations - #3 by falk