Newbie here.
I’m trying to address a limitation in Bayesian A/B tests. The usual model assumes every observation is independent, but we often observe each customer more than once. I have a model that works on synthetic data, but it is very slow at realistic data volumes (BetaBinomial is a poor fit, so I don’t want to rely on it with conjugate updates or Gibbs sampling).
I think there is redundancy in the example below because the conversion likelihood for each individual customer is sampled in the posterior. But I can’t find a way to draw independent samples from a uniform prior and feed those into the Binomial p parameter. If I give the uniform prior no shape, the model will assume all customers have the same conversion likelihood, p.
import pymc as pm
import pandas as pd
import numpy as np

observations = [30]*20  # attempts; each entry represents a new customer
occurrences = [2]*10 + [28]*10  # successes for the corresponding customers
posterior_sample, burn = 2000, 1000  # placeholder values

with pm.Model() as model:
    # one conversion probability per customer
    p_t = pm.Uniform("test_distn", 0, 1, shape=(len(observations),))
    p_t_mean = pm.Deterministic("test", p_t.mean())
    obs_test = pm.Binomial("obs_test", n=np.array(observations), p=p_t, observed=occurrences)
    step = pm.Metropolis()
    trace = pm.sample(posterior_sample + burn, chains=2, step=step)
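One observation on the model above, offered as a sketch rather than a definitive fix: with an independent Uniform(0, 1) prior per customer (which is the same as Beta(1, 1)) and a Binomial likelihood, each customer's posterior is Beta(1 + successes, 1 + attempts - successes) in closed form. That is per-customer conjugacy, not a pooled BetaBinomial model, and it only holds for this exact independent-prior setup. Under that assumption the per-customer draws can be taken directly with NumPy instead of Metropolis. The names `rng`, `n_draws`, `alpha`, `beta`, `p_t_draws` and `p_t_mean_draws` are illustrative and not from the code above.

```python
import numpy as np

observations = np.array([30] * 20)            # attempts per customer
occurrences = np.array([2] * 10 + [28] * 10)  # successes per customer

rng = np.random.default_rng(0)  # arbitrary seed, for reproducibility only
n_draws = 4000                  # illustrative number of posterior draws

# Closed-form posterior for customer i: Beta(1 + successes_i, 1 + failures_i)
alpha = 1 + occurrences
beta = 1 + observations - occurrences

# One column per customer; shape is (n_draws, n_customers)
p_t_draws = rng.beta(alpha, beta, size=(n_draws, len(observations)))

# Counterpart of the "test" Deterministic: mean across customers, per draw
p_t_mean_draws = p_t_draws.mean(axis=1)
print(p_t_mean_draws.mean())
```

Because the customers are independent here, the cost grows linearly with the number of customers and no burn-in is needed. If the priors were later tied together (for example through shared hyperparameters), this shortcut would no longer apply.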
I also end up randomly flattening “test_distn” from the trace to see how the customers’ success likelihood is distributed. This works, but feels ugly/hacky.
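A possible way to do the pooling more explicitly, as a sketch: assuming the trace is an InferenceData object (the default in recent PyMC), `trace.posterior["test_distn"].values` should be an array laid out as (chain, draw, customer), and a single reshape pools every draw for every customer. The array below is simulated as a stand-in so the snippet runs without sampling; `samples`, `pooled` and the Beta parameters are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(1)  # arbitrary seed

# Stand-in for trace.posterior["test_distn"].values: (chains, draws, customers)
n_chains, n_draws, n_customers = 2, 1000, 20
a = np.array([3] * 10 + [29] * 10)   # illustrative Beta parameters
b = np.array([29] * 10 + [3] * 10)
samples = rng.beta(a, b, size=(n_chains, n_draws, n_customers))

# Pool all chains, draws and customers into one 1-D array
pooled = samples.reshape(-1)

# Histogram of the pooled conversion probabilities across customers
counts, edges = np.histogram(pooled, bins=10, range=(0.0, 1.0))
print(counts / counts.sum())
```

Pooling this way weights every customer equally and uses every draw, so no random subsampling step is needed.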
This attempt to remove the redundancy doesn’t work. The posterior for “test_distn” is clearly wrong.
import pymc as pm
import pandas as pd
import numpy as np

observations = [30]*20  # attempts; each entry represents a new customer
occurrences = [2]*10 + [28]*10  # successes for the corresponding customers
posterior_sample, burn = 2000, 1000  # placeholder values

with pm.Model() as model:
    p_t = pm.Uniform("test_distn", 0, 1)  # scalar prior, no shape
    # attempt: draw one value per customer from the scalar prior
    p_t_n = pm.draw(p_t, draws=len(observations))
    p_t_mean = pm.Deterministic("test", p_t.mean())
    obs_test = pm.Binomial("obs_test", n=np.array(observations), p=p_t_n, observed=occurrences)
    step = pm.Metropolis()
    trace = pm.sample(posterior_sample + burn, chains=2, step=step)
Any help would be most appreciated. An efficient solution here seems widely applicable, so hopefully others could benefit too.