Say I have some data where the weekly total varies, but the daily split within the week is usually quite similar.
E.g.:
weekly = np.array([100, 120, 90, 80, 70, 100, 119, 110, 95, 90])
true_props = np.array([.1, .1, .2, .1, .05, .05, .3])
data = np.outer(weekly, true_props)
so that data
looks like this:
array([[10. , 10. , 20. , 10. , 5. , 5. , 30. ],
[12. , 12. , 24. , 12. , 6. , 6. , 36. ],
[ 9. , 9. , 18. , 9. , 4.5 , 4.5 , 27. ],
[ 8. , 8. , 16. , 8. , 4. , 4. , 24. ],
[ 7. , 7. , 14. , 7. , 3.5 , 3.5 , 21. ],
[10. , 10. , 20. , 10. , 5. , 5. , 30. ],
[11.9 , 11.9 , 23.8 , 11.9 , 5.95, 5.95, 35.7 ],
[11. , 11. , 22. , 11. , 5.5 , 5.5 , 33. ],
[ 9.5 , 9.5 , 19. , 9.5 , 4.75, 4.75, 28.5 ],
[ 9. , 9. , 18. , 9. , 4.5 , 4.5 , 27. ]])
How would I go about creating a Bayesian model to estimate the daily proportions? What I’m interested in recovering is true_props
.
Here’s something I tried, though the result is off so it’s evidently wrong:
observed_proportions = data / data.sum(axis=1)[:, np.newaxis]
with pm.Model(coords={'days': np.arange(7)}) as model:
props = pm.HalfNormal('probs', sigma=1, dims='days')
pm.Dirichlet('likelihood', a=props, observed=true_props)
with model:
trace = pm.sample(return_inferencedata=True)
probs = trace.posterior['probs'].mean(dim=('chain', 'draw'))
probs / probs.sum()
This gives
array([0.13391884, 0.13609767, 0.16678547, 0.13733828, 0.11443165,
0.11289476, 0.19853334])
which isn’t really close.
Any suggestions for how else to build the model / which distribution to use?