I’m trying to model a proportion for each area. My input data frame has:
var1: the number of sampled people in each area who answered yes to a question (number of successes)
var2: the total number of sampled people in each area (trial_sizes)
There are about 6,400 rows in total, and for each row I want an estimate of the expected proportion of people saying yes. My model is as follows:
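    import pymc3 as pm
    import theano.tensor as tt

    with pm.Model() as model:
        # Prior mean of the area-level proportions
        u = pm.Uniform('u', lower=0.0, upper=1.0)
        # Prior on the log of the concentration v = alpha + beta
        log_v = pm.Exponential('log_v', lam=1.5)
        # v = exp(log_v) keeps v, and hence alpha and beta, positive
        v = pm.Deterministic('v', tt.exp(log_v))
        alpha = pm.Deterministic('alpha', u * v)
        beta = pm.Deterministic('beta', v * (1 - u))
        # One latent proportion per area (row)
        p_output = pm.Beta("p_output", alpha=alpha, beta=beta, shape=prop_data.shape[0])
        r = pm.Binomial("r", n=use_data.trial_sizes.values,
                        p=p_output,
                        observed=use_data.loc[:, var1].values)
        trace = pm.sample(draws=5000, tune=2500, njobs=4)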
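For reference, my understanding of the parameterisation is that with alpha = u*v and beta = v*(1-u), u is the mean of the Beta prior and v is its concentration (alpha + beta). A quick plain-Python check of that, separate from the model (the helper name `beta_params` and the numbers are just for illustration):

```python
import math

def beta_params(u, v):
    """Map a mean u in (0, 1) and a concentration v > 0 to Beta(alpha, beta)."""
    alpha = u * v
    beta = v * (1 - u)
    return alpha, beta

# Illustrative values only; not estimates from my data.
alpha, beta = beta_params(u=0.3, v=10.0)

# The mean of Beta(alpha, beta) is alpha / (alpha + beta), which recovers u,
# and alpha + beta recovers v.
prior_mean = alpha / (alpha + beta)
concentration = alpha + beta
print(math.isclose(prior_mean, 0.3), math.isclose(concentration, 10.0))
```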
I’d like to incorporate a weight for each p_output. Because the population size of each area is different, each area’s effect on the global mean proportion should differ (i.e. a high proportion in a small area should have less effect than a high proportion in a bigger area). Is there a way to incorporate this in PyMC3?
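To make concrete what I mean by weighting, here is a minimal plain-Python sketch with made-up numbers. The proportions and population sizes are hypothetical, not taken from my data, and this is not PyMC3 code, just the quantity I would like the model to reflect:

```python
# Hypothetical per-area values, for illustration only.
area_proportions = [0.80, 0.20, 0.50]      # proportion saying yes in each area
area_populations = [1_000, 50_000, 9_000]  # population size of each area

def weighted_mean(values, weights):
    """Mean of `values` where each entry counts in proportion to its weight."""
    total_weight = sum(weights)
    return sum(v * w for v, w in zip(values, weights)) / total_weight

# Treats every area equally, regardless of size.
unweighted = sum(area_proportions) / len(area_proportions)

# Lets large areas dominate, which is the behaviour I am after.
weighted = weighted_mean(area_proportions, area_populations)

print(unweighted, weighted)
```

In this toy case the small area with a high proportion barely moves the weighted figure, while it pulls the unweighted mean up noticeably.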
Many thanks,