The above distributions show the differences in \mu, \sigma, the expected value (mean) and the log of the mean. So the parameters of the two groups are clearly different. Does my model make sense? Based on the distribution on the bottom left, can I conclude that ‘c2’ has on average 15 more page views per day than ‘c1’?
It is fair to say that “given this model, the means are probably different by 10 - 20”. A few other random thoughts if you want to go deeper:
It probably won’t change much, but best practice is now to use something like pm.HalfNormal('sd1', 10) instead of a Uniform('sd1', 0, 500).
Why a log normal? If that changes to some other distribution, do you get the same results? (you could get into model comparison with this)
Are your convergence results fine? That is: no divergences, rhat very near 1, and a reasonable effective sample size?
In the first plot you show, the distributions look very similar visually! I’d be prepared when presenting these results to put this in context: what is the raw difference in the means? I would guess that the dataset is pretty large, so maybe that is swamping the priors. Which is fine! It means you could add more features to the model, or it could just give you confidence that you have enough data to not need Bayes here.
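To make the prior suggestion above a bit more concrete, here is a minimal sketch comparing how much prior mass each choice puts on large values of the scale parameter. It assumes the second argument of `pm.HalfNormal('sd1', 10)` is the scale; the thresholds are arbitrary illustration values and nothing here uses the data from this thread.

```python
import math

def halfnormal_tail(x, scale):
    """P(sd > x) under a HalfNormal prior with the given scale."""
    # For X = |Z| * scale with Z standard normal, P(X > x) = erfc(x / (scale * sqrt(2))).
    return math.erfc(x / (scale * math.sqrt(2.0)))

def uniform_tail(x, lower, upper):
    """P(sd > x) under a Uniform(lower, upper) prior."""
    if x <= lower:
        return 1.0
    if x >= upper:
        return 0.0
    return (upper - x) / (upper - lower)

# Thresholds are hypothetical, chosen only to show the contrast.
for threshold in (10, 50, 100):
    hn = halfnormal_tail(threshold, scale=10)
    un = uniform_tail(threshold, lower=0, upper=500)
    print(f"P(sd > {threshold:>3}): HalfNormal(10) = {hn:.2e}, Uniform(0, 500) = {un:.2f}")
```

The uniform prior spreads most of its mass over very large values, while the half-normal keeps it near plausible ones. With tens of thousands of records the likelihood will likely dominate either way, which is consistent with "it probably won't change much".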
Thank you @colcarroll for your reply and comments!
Convergence is fine, i.e. no divergences and rhat virtually one.
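For anyone curious what "rhat virtually one" is measuring, below is a minimal sketch of the basic Gelman-Rubin statistic on simulated chains. This is an assumption-laden illustration: it is the classic non-split version, not the rank-normalized split R-hat that ArviZ reports, and the chains are synthetic rather than samples from the model discussed here.

```python
import random
import statistics

def gelman_rubin(chains):
    """Basic (non-split) Gelman-Rubin R-hat for equal-length chains of one parameter."""
    n = len(chains[0])
    chain_means = [statistics.fmean(c) for c in chains]
    # W: average of the within-chain sample variances.
    within = statistics.fmean([statistics.variance(c) for c in chains])
    # B / n: sample variance of the chain means.
    between_over_n = statistics.variance(chain_means)
    var_hat = (n - 1) / n * within + between_over_n
    return (var_hat / within) ** 0.5

rng = random.Random(0)

# Four synthetic chains that all sample the same distribution.
mixed = [[rng.gauss(0.0, 1.0) for _ in range(1000)] for _ in range(4)]

# Four synthetic chains where two are stuck around a different value.
stuck = [[rng.gauss(offset, 1.0) for _ in range(1000)] for offset in (0.0, 0.0, 3.0, 3.0)]

print("well-mixed chains:", round(gelman_rubin(mixed), 3))
print("disagreeing chains:", round(gelman_rubin(stuck), 3))
```

Chains that agree give a value close to 1, while chains that settle in different places push it well above 1.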
The dataset is large, 50K records in cluster 1 and 10K in cluster 2. What exactly do you mean by “it means you could add more features to the model”?
The large sample size probably doesn’t warrant Bayes here, as you mention. But in terms of interpretability of results I think Bayes has advantages over e.g. a Wilcoxon rank-sum test (or hypothesis testing in general), since it also gives you the magnitude of the difference. What are your thoughts on this?
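As a rough sketch of what "the magnitude of the difference" looks like in practice: given posterior draws of the log-normal parameters for each group, one can compute the implied mean per draw, using the log-normal mean exp(mu + sigma^2 / 2), and summarize the difference with an interval. All draws below are synthetic stand-ins with made-up values (the names `mu1`, `sd1`, `mu2`, `sd2` are hypothetical), not output from the actual model.

```python
import math
import random

rng = random.Random(42)
n_draws = 4000

# Synthetic stand-ins for posterior draws; every number here is hypothetical.
mu1 = [rng.gauss(3.00, 0.01) for _ in range(n_draws)]
sd1 = [rng.gauss(1.00, 0.01) for _ in range(n_draws)]
mu2 = [rng.gauss(3.37, 0.01) for _ in range(n_draws)]
sd2 = [rng.gauss(1.00, 0.01) for _ in range(n_draws)]

def lognormal_mean(mu, sd):
    """Expected value of a log-normal with log-scale parameters mu and sd."""
    return math.exp(mu + sd ** 2 / 2.0)

# Difference in expected value, computed draw by draw and then sorted.
diff = sorted(
    lognormal_mean(m2, s2) - lognormal_mean(m1, s1)
    for m1, s1, m2, s2 in zip(mu1, sd1, mu2, sd2)
)

# Equal-tailed 94% interval from the sorted differences.
lo = diff[int(0.03 * n_draws)]
hi = diff[int(0.97 * n_draws) - 1]
prob_positive = sum(d > 0 for d in diff) / n_draws

print(f"94% interval for the difference in means: ({lo:.1f}, {hi:.1f})")
print(f"P(difference > 0) = {prob_positive:.3f}")
```

With real samples, the per-draw pairing matters: taking mu and sigma from the same draw preserves any posterior correlation between them, which the independent synthetic draws here do not have.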
I was mostly trying to say things that might be useful, since “you did great” seemed boring!
You make a good point about interpretability, even in the presence of large data.
I was thinking about how, as your dataset grows, your posterior densities will concentrate pretty tightly around the MLE (I couldn’t find a good reference for this, but would be interested if anyone has one!). Conversely, adding more parameters will (usually) admit more uncertainty over the parameters.
This is a pretty hand-wavy way of saying that bigger datasets can support more complex, expressive models.
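The first half of that claim can be sketched with the simplest conjugate case: a normal mean with known data standard deviation and a normal prior. This is an illustration under those assumptions, not the log-normal model from this thread, and all numbers are made up. The general result usually cited for this behavior is the Bernstein-von Mises theorem.

```python
import math

def normal_posterior(prior_mean, prior_sd, data_sd, sample_mean, n):
    """Posterior mean and sd for a normal mean with known data_sd (conjugate update)."""
    prior_prec = 1.0 / prior_sd ** 2      # precision contributed by the prior
    data_prec = n / data_sd ** 2          # precision contributed by n observations
    post_prec = prior_prec + data_prec
    post_mean = (prior_prec * prior_mean + data_prec * sample_mean) / post_prec
    return post_mean, math.sqrt(1.0 / post_prec)

# Hypothetical setup: vague prior centered at 0, observed sample mean of 3.
for n in (10, 1_000, 50_000):
    mean, sd = normal_posterior(
        prior_mean=0.0, prior_sd=10.0, data_sd=1.0, sample_mean=3.0, n=n
    )
    print(f"n = {n:>6}: posterior mean = {mean:.5f}, posterior sd = {sd:.5f}")
```

As n grows, the posterior mean moves to the sample mean (the MLE in this setup) and the posterior sd shrinks roughly like data_sd / sqrt(n), so the prior's influence fades.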