Estimate daily proportions of weekly total

MarcoGorelli · June 29, 2021, 2:15pm

Say I have some data where the weekly total varies, but the daily split within the week is usually quite similar.

E.g.:

import numpy as np

weekly = np.array([100, 120, 90, 80, 70, 100, 119, 110, 95, 90])
true_props = np.array([.1, .1, .2, .1, .05, .05, .3])
data = np.outer(weekly, true_props)

so that data looks like this:

array([[10.  , 10.  , 20.  , 10.  ,  5.  ,  5.  , 30.  ],
       [12.  , 12.  , 24.  , 12.  ,  6.  ,  6.  , 36.  ],
       [ 9.  ,  9.  , 18.  ,  9.  ,  4.5 ,  4.5 , 27.  ],
       [ 8.  ,  8.  , 16.  ,  8.  ,  4.  ,  4.  , 24.  ],
       [ 7.  ,  7.  , 14.  ,  7.  ,  3.5 ,  3.5 , 21.  ],
       [10.  , 10.  , 20.  , 10.  ,  5.  ,  5.  , 30.  ],
       [11.9 , 11.9 , 23.8 , 11.9 ,  5.95,  5.95, 35.7 ],
       [11.  , 11.  , 22.  , 11.  ,  5.5 ,  5.5 , 33.  ],
       [ 9.5 ,  9.5 , 19.  ,  9.5 ,  4.75,  4.75, 28.5 ],
       [ 9.  ,  9.  , 18.  ,  9.  ,  4.5 ,  4.5 , 27.  ]])
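Because every row of `data` is a rescaled copy of `true_props`, the daily shares are identical from week to week. A quick numpy check (reusing the arrays above) also shows that `true_props` as written sums to 0.9 rather than 1, so anything built on each week's observed shares, like the `observed_proportions` computed in the attempt below, sees `true_props / 0.9` rather than `true_props` itself:

```python
import numpy as np

# Values copied from the question above
weekly = np.array([100, 120, 90, 80, 70, 100, 119, 110, 95, 90])
true_props = np.array([.1, .1, .2, .1, .05, .05, .3])
data = np.outer(weekly, true_props)

# Each row's share of its own weekly total
observed_proportions = data / data.sum(axis=1, keepdims=True)

# true_props sums to 0.9, so every row of shares equals true_props / 0.9
print(true_props.sum())
print(np.allclose(observed_proportions, true_props / true_props.sum()))
```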

How would I go about creating a Bayesian model to estimate the daily proportions? What I’m interested in recovering is true_props.

Here’s something I tried, though the result is off so it’s evidently wrong:

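import pymc3 as pm

# Share of each week's total falling on each day
observed_proportions = data / data.sum(axis=1)[:, np.newaxis]

with pm.Model(coords={'days': np.arange(7)}) as model:
    # Dirichlet concentration parameters (named 'probs', but they are not shares)
    props = pm.HalfNormal('probs', sigma=1, dims='days')
    # observed is the single vector true_props; observed_proportions above is unused
    pm.Dirichlet('likelihood', a=props, observed=true_props)

with model:
    trace = pm.sample(return_inferencedata=True)

# Posterior mean of the concentrations, normalised to sum to 1
probs = trace.posterior['probs'].mean(dim=('chain', 'draw'))
probs / probs.sum()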

This gives

array([0.13391884, 0.13609767, 0.16678547, 0.13733828, 0.11443165,
       0.11289476, 0.19853334])

which isn’t really close.
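The last two lines normalise the posterior mean of `probs` because that variable is the Dirichlet's concentration vector, not a vector of shares: the mean of Dirichlet(a) is `a / a.sum()`. A quick Monte Carlo check with numpy (the concentration values below are arbitrary, chosen only for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
# Arbitrary illustrative concentration vector (not taken from the thread)
a = np.array([2.0, 2.0, 4.0, 2.0, 1.0, 1.0, 8.0])

# Draw many share vectors and compare their average to a / a.sum()
samples = rng.dirichlet(a, size=200_000)
mc_mean = samples.mean(axis=0)
analytic_mean = a / a.sum()

print(np.round(mc_mean, 3))
print(np.round(analytic_mean, 3))
```

The overall scale `a.sum()` only controls how tightly draws cluster around that mean. With a single observed vector, the posterior over `a` plausibly stays close to its symmetric HalfNormal prior, which would explain why the normalised estimate above is pulled toward 1/7 ≈ 0.143.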

Any suggestions for how else to build the model / which distribution to use?
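For reference, if the daily values were integer counts, one standard option is the conjugate Dirichlet-multinomial model: with a Dirichlet(alpha) prior on the seven shares and multinomial counts, the posterior is Dirichlet(alpha + column totals), so no sampler is needed. The data here are not integers, so treating them as pseudo-counts is an assumption made purely for illustration, and the helper `dirichlet_posterior` is hypothetical:

```python
import numpy as np

def dirichlet_posterior(counts, alpha):
    """Conjugate update (hypothetical helper): Dirichlet(alpha) prior plus
    multinomial rows gives a Dirichlet posterior with these concentrations.

    counts: (n_weeks, n_days) array, treated as multinomial counts
    alpha:  (n_days,) prior concentration
    """
    return alpha + counts.sum(axis=0)

weekly = np.array([100, 120, 90, 80, 70, 100, 119, 110, 95, 90])
true_props = np.array([.1, .1, .2, .1, .05, .05, .3])
data = np.outer(weekly, true_props)  # non-integer: used as pseudo-counts (assumption)

post_alpha = dirichlet_posterior(data, alpha=np.ones(7))
post_mean = post_alpha / post_alpha.sum()

# 94% equal-tailed intervals from posterior draws
draws = np.random.default_rng(0).dirichlet(post_alpha, size=10_000)
lower, upper = np.percentile(draws, [3, 97], axis=0)
print(np.round(post_mean, 3))
```

The posterior mean lands near `true_props / true_props.sum()`, since shares are all this model can see. The interval widths depend on pretending the numbers are counts, so they should not be read as calibrated uncertainty for continuous data.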

MarcoGorelli · June 29, 2021, 2:53pm

This seems to work:

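with pm.Model(check_bounds=False) as model:
    # Daily shares constrained to the simplex (flat Dirichlet prior)
    probs = pm.Dirichlet('probs', np.ones(7))
    # Expected value for each day: that week's total times the day's share
    mu = probs[np.newaxis, :] * weekly[:, np.newaxis]
    pm.Normal('likelihood', mu=mu, sigma=1, observed=data)
    trace = pm.sample(return_inferencedata=True)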

I get this as the summary of my trace:

          mean     sd  hdi_3%  hdi_97%  mcse_mean  mcse_sd  ess_bulk  ess_tail  r_hat
probs[0]  0.10  0.003   0.095    0.106        0.0      0.0    3262.0    2966.0    1.0
probs[1]  0.10  0.003   0.095    0.105        0.0      0.0    3314.0    2847.0    1.0
probs[2]  0.20  0.003   0.194    0.205        0.0      0.0    4419.0    2688.0    1.0
probs[3]  0.10  0.003   0.094    0.105        0.0      0.0    3217.0    2735.0    1.0
probs[4]  0.05  0.003   0.044    0.055        0.0      0.0    2237.0    2362.0    1.0
probs[5]  0.05  0.003   0.045    0.055        0.0      0.0    2829.0    2758.0    1.0
probs[6]  0.40  0.003   0.394    0.406        0.0      0.0    3851.0    3777.0    1.0
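Because a Dirichlet with all concentrations equal to 1 is flat on the simplex, the posterior mode of this model (with sigma fixed at 1) is simply least squares subject to the shares summing to 1. That has a closed form via a Lagrange multiplier, which makes it easy to check what the model can recover. A minimal numpy sketch; the helper name is hypothetical, the non-negativity constraint is not enforced, and the second `true_props` variant (last entry 0.4) is an assumption:

```python
import numpy as np

def simplex_least_squares(weekly, data):
    """Minimise sum((data - np.outer(weekly, p))**2) subject to p.sum() == 1.

    Hypothetical helper: closed-form Lagrange-multiplier solution.
    Non-negativity of p is NOT enforced; check the result before using it.
    """
    # Unconstrained least-squares slope for each day: sum_i w_i x_ij / sum_i w_i^2
    p_free = weekly @ data / np.sum(weekly ** 2)
    # The equality constraint shifts every day by the same amount
    return p_free + (1.0 - p_free.sum()) / data.shape[1]

weekly = np.array([100, 120, 90, 80, 70, 100, 119, 110, 95, 90])
props_question = np.array([.1, .1, .2, .1, .05, .05, .3])  # as in the question (sums to 0.9)
props_variant = np.array([.1, .1, .2, .1, .05, .05, .4])   # assumed variant summing to 1

est_question = simplex_least_squares(weekly, np.outer(weekly, props_question))
est_variant = simplex_least_squares(weekly, np.outer(weekly, props_variant))
print(est_question.round(3))
print(est_variant.round(3))
```

With `true_props` as written, the fit spreads the missing 0.1 evenly across the seven days (roughly `true_props + 0.014`), which does not match the summary above; with the last entry set to 0.4 it reproduces `true_props` exactly, consistent with the 0.40 reported for `probs[6]`. That suggests the summary came from a run where the shares summed to 1, though this is an inference rather than something stated in the thread.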
