Help With A/B Testing For a Learner

Hello all, I’ve been trying to learn and apply Bayesian sampling approaches, particularly to A/B tests, and I am in the weeds. I am trying to follow the BEST approach documented in the PyMC documentation here, but I am running into problems and have questions. I hope I can ask them here to get unstuck.

The setup: I have heavily right-skewed duration-time data for a control group and a test group. This is my attempt at simulating data for that.

```python
import numpy as np
import scipy as sc
import scipy.stats  # make sc.stats available

ctrl_ = sc.stats.skewnorm(loc=2.8, scale=3, a=25).rvs(10000)
ctrl = np.r_[ctrl_, np.random.randint(20, high=500, size=300)]

test_ = sc.stats.skewnorm(loc=1.4, scale=2.8, a=16).rvs(10000)
test = np.r_[test_, np.random.randint(20, high=380, size=300)]
```

The goal is to determine whether the test group is some percentage smaller than the control group in a statistically meaningful way.

Since the data is so badly right skewed, I decided to take the logarithm of each group.

```python
ctrl_log = np.log(ctrl)
test_log = np.log(test)
pooled_mn = np.r_[ctrl_log, test_log].mean()
pooled_std = np.r_[ctrl_log, test_log].std()
df = np.min([ctrl_log.size, test_log.size]) - 1

with pm.Model() as m1:
    ctrl_mu = pm.Normal("ctrl_mu", mu=pooled_mn, sigma=pooled_std * 2)
    test_mu = pm.Normal("test_mu", mu=pooled_mn, sigma=pooled_std * 2)

    nu = pm.Uniform("nu", lower=int(df / 4), upper=df)

    ctrl_sigma = pm.HalfCauchy("ctrl_sigma", beta=2)
    ctrl_lam = ctrl_sigma ** -2
    test_sigma = pm.HalfCauchy("test_sigma", beta=2)
    test_lam = test_sigma ** -2

    ctrl_duration = pm.StudentT("ctrl_duration", nu=nu, mu=ctrl_mu, lam=ctrl_lam, observed=ctrl_log)
    test_duration = pm.StudentT("test_duration", nu=nu, mu=test_mu, lam=test_lam, observed=test_log)

    idata = pm.sample(2000, tune=5000, target_accept=0.95, return_inferencedata=True)
```

So I had some questions that I am struggling with:

1. When I run this model, the posteriors for `ctrl_mu` and `test_mu` are very close to `np.mean(ctrl_log)` and `np.mean(test_log)`, respectively. Is that what I am going for, or should they be closer to the skew-normal location parameters, log(2.8) and log(1.4)?
2. When I try to convert the log-transformed posteriors for the mu’s back with `np.exp()`, they are very different from the untransformed data for ctrl and test. Is that a sign that the model is not good, or that I am doing a naive back-conversion of the transformed variable?
3. My completely fabricated strategy was to produce the posterior of the difference between `ctrl_mu` and `test_mu` and, if that difference did not include 0, claim the two are different. Then I would produce the posterior predictive using `pred = pm.sample_posterior_predictive(idata, var_names=["ctrl_duration", "test_duration"])` and plot the posterior of `np.exp(pred["test_duration"])/np.exp(pred["ctrl_duration"]) - 1` to see the % change. Is that a correct way to show the percent-change posterior?
4. I originally computed the percent change of the posteriors as `test_mu/ctrl_mu - 1`, but since they are log transformed, it seemed that was different from what I really wanted: the distribution of the percent change of the original-scale mu’s.
5. Does it matter which transformation I use (e.g. square root), or whether I standardize? The BEST document doesn’t cover that, and for some really badly skewed data the sampler seems to die on me unless I transform.
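For concreteness, here is the strategy from (3) sketched in plain NumPy, with made-up normal draws standing in for the real posterior samples of `ctrl_mu` and `test_mu` (in practice I would pull them from `idata`):

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical stand-ins for posterior draws of the log-scale means;
# the real draws would come from idata.posterior.
ctrl_mu = rng.normal(1.0, 0.02, size=4000)
test_mu = rng.normal(0.8, 0.02, size=4000)

# Step 1: posterior of the difference -- does its interval exclude 0?
diff = test_mu - ctrl_mu
lo, hi = np.quantile(diff, [0.03, 0.97])
print("94% interval of the difference:", (lo, hi))

# Step 2: percent change on the original scale via exp()
pct_change = np.exp(test_mu) / np.exp(ctrl_mu) - 1
print("posterior mean % change:", pct_change.mean())
```
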

I hope all of this made sense, and that some kind soul can help me better use and understand Bayesian analysis for A/B testing. I really don’t want to go back to `scipy.stats.ttest_ind`…

Thank you !!!

Hi! Thanks for posting your simulation code, it makes your question much easier to understand. Since durations are positive and right skewed, one simple option is to model them directly with an exponential likelihood:

```python
with pm.Model() as m2:
    p = pm.Exponential("p", 1, shape=2)

    y1 = pm.Exponential("ctrl_duration", p[0], observed=ctrl)
    y2 = pm.Exponential("test_duration", p[1], observed=test)

    t2 = pm.sample()
```

The exponential distribution also turns out to be a nice way to specify the prior on the rate, because it is constrained to be positive, just as the rate must be. This model samples really fast and confirms that the average duration for the ctrl group is longer than for the test group. I hope you find this approach more appealing. I didn’t transform the data at all for this example.
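To compare the groups on the original scale, note that the mean of an Exponential with rate `p` is `1/p`. A minimal sketch, using made-up draws as stand-ins for the real trace (in recent PyMC you would pull something like `t2.posterior["p"]` instead):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-ins for posterior draws of p[0] (ctrl rate) and p[1] (test rate)
p_ctrl = rng.normal(0.12, 0.005, size=4000)
p_test = rng.normal(0.15, 0.005, size=4000)

# Mean duration of an Exponential(rate) is 1/rate
mean_ctrl = 1.0 / p_ctrl
mean_test = 1.0 / p_test

# Posterior of the percent change in mean duration, test vs ctrl
pct_change = mean_test / mean_ctrl - 1
print("posterior mean % change:", pct_change.mean())
print("94% interval:", np.quantile(pct_change, [0.03, 0.97]))
```

A higher rate means shorter durations, so the test group’s larger rate translates directly into a negative percent change in mean duration.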

> 1. When I run this model the posterior for `ctrl_mu` and `test_mu` are very close to the `average(ctrl_log)` and `average(test_log)`, respectively. Is that what I am going for or should it be closer to the parameters in the skew norm, log(2.8) and log(1.4)?

Yeah, I think they should be close to the averages of ctrl_log and test_log. The skew-normal parameters aren’t going to be recovered exactly because you added all those randints into the data.

> 3. My completely fabricated strategy was to produce the posterior of the difference between `ctrl_mu` and `test_mu` and if that difference did not include 0, claim the two are different. Then I would produce the posterior predictive using `pred = pm.sample_posterior_predictive(idata, var_names=["ctrl_duration", "test_duration"])` and plot the posterior of `np.exp(pred["test_duration"])/np.exp(pred["ctrl_duration"]) - 1` to see the % change. Is that a correct way to show the percent change posterior?

There isn’t a unique correct way of comparing posteriors, but I like this approach just fine. Alternatively, just subtract the two predictive distributions and plot a histogram to show the expected difference between users in each group.
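That subtraction looks like this; the arrays below are made-up stand-ins for the flattened posterior predictive draws (exponentiated first if you modelled on the log scale):

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical stand-ins for pred["ctrl_duration"] and pred["test_duration"],
# already back on the original (duration) scale
ctrl_draws = np.exp(rng.normal(1.5, 0.5, size=8000))
test_draws = np.exp(rng.normal(1.2, 0.5, size=8000))

# Per-draw difference between a test user and a ctrl user
diff = test_draws - ctrl_draws
print("mean difference:", diff.mean())
print("P(test shorter than ctrl):", (diff < 0).mean())
# diff is what you'd histogram to show the expected difference per user
```
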


Thanks so much, Daniel, for the response and the help. It is super helpful and exactly what I was looking for.
