Help With A/B Testing For a Learner

Hello all, I’ve been trying to learn and apply Bayesian sampling approaches, particularly to A/B tests, and I am in the weeds. I am trying to follow the BEST approach documented in the PyMC documentation here, but I am running into problems and have questions. I hope I can ask them here to get unstuck.

The setup: I have heavily right-skewed duration-time data for a control group and a test group. This is my attempt at simulating data for that.

```python
import numpy as np
import scipy as sc
import scipy.stats  # make sc.stats available

ctrl_ = sc.stats.skewnorm(loc=2.8, scale=3, a=25).rvs(10000)
ctrl = np.r_[ctrl_, np.random.randint(20, high=500, size=300)]

test_ = sc.stats.skewnorm(loc=1.4, scale=2.8, a=16).rvs(10000)
test = np.r_[test_, np.random.randint(20, high=380, size=300)]
```

The goal is to determine whether the test group is some percentage smaller than the control group in a statistically meaningful way.

Since the data is so badly right skewed, I decided to take the logarithm of each group.

```python
ctrl_log = np.log(ctrl)
test_log = np.log(test)
pooled_mn = np.r_[ctrl_log, test_log].mean()
pooled_std = np.r_[ctrl_log, test_log].std()
df = np.min([ctrl_log.size, test_log.size]) - 1

with pm.Model() as m1:
    ctrl_mu = pm.Normal("ctrl_mu", mu=pooled_mn, sigma=pooled_std * 2)
    test_mu = pm.Normal("test_mu", mu=pooled_mn, sigma=pooled_std * 2)

    nu = pm.Uniform("nu", lower=int(df / 4), upper=df)

    ctrl_sigma = pm.HalfCauchy("ctrl_sigma", beta=2)
    ctrl_lam = ctrl_sigma ** -2
    test_sigma = pm.HalfCauchy("test_sigma", beta=2)
    test_lam = test_sigma ** -2

    ctrl_duration = pm.StudentT("ctrl_duration", nu=nu, mu=ctrl_mu, lam=ctrl_lam, observed=ctrl_log)
    test_duration = pm.StudentT("test_duration", nu=nu, mu=test_mu, lam=test_lam, observed=test_log)

    idata = pm.sample(2000, tune=5000, target_accept=0.95, return_inferencedata=True)
```

So I had some questions that I am struggling with:

1. When I run this model, the posteriors for `ctrl_mu` and `test_mu` are very close to `np.mean(ctrl_log)` and `np.mean(test_log)`, respectively. Is that what I am going for, or should they be closer to the skew-normal location parameters, log(2.8) and log(1.4)?
2. When I try to convert the log-transformed posteriors for the mu’s back with `np.exp()`, they are very different from the untransformed data for ctrl and test. Is that a sign that the model is not good, or that I am doing a naive back-conversion of the transformed variable?
3. My completely fabricated strategy was to produce the posterior of the difference between `ctrl_mu` and `test_mu` and, if that difference did not include 0, claim the two are different. Then I would produce the posterior predictive using `pred = pm.sample_posterior_predictive(idata, var_names=["ctrl_duration", "test_duration"])` and plot the posterior of `np.exp(pred["test_duration"])/np.exp(pred["ctrl_duration"]) - 1` to see the % change. Is that a correct way to show the percent-change posterior?
4. I originally computed the percent change of the posteriors as `test_mu/ctrl_mu - 1`, but since they are log transformed, it seemed that was different from what I really wanted: the distribution of the percent change of the original-scale mu’s.
5. Does it matter which transformation I use (e.g. square root), or whether I standardize? The BEST document doesn’t cover that, and for some really badly skewed data the sampler seems to die on me unless I transform.
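For concreteness, here is the strategy from (3) sketched in plain NumPy, with made-up normal draws standing in for the real posterior samples of `ctrl_mu` and `test_mu` (in practice I would pull them from `idata`):

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical stand-ins for posterior draws of the log-scale means;
# the real draws would come from idata.posterior.
ctrl_mu = rng.normal(1.0, 0.02, size=4000)
test_mu = rng.normal(0.8, 0.02, size=4000)

# Step 1: posterior of the difference -- does its interval exclude 0?
diff = test_mu - ctrl_mu
lo, hi = np.quantile(diff, [0.03, 0.97])
print("94% interval of the difference:", (lo, hi))

# Step 2: percent change on the original scale via exp()
pct_change = np.exp(test_mu) / np.exp(ctrl_mu) - 1
print("posterior mean % change:", pct_change.mean())
```
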

I hope all of this made sense, and that some kind soul can help me better use and understand Bayesian analysis for A/B testing. I really don’t want to go back to `scipy.stats.ttest_ind`…

Thank you !!!

Hi! Thanks for posting your simulation code, it makes your question much easier to understand. Since durations are positive and right skewed, one simple option is to model them directly with an exponential likelihood:

```python
with pm.Model() as m2:
    p = pm.Exponential("p", 1, shape=2)

    y1 = pm.Exponential("ctrl_duration", p[0], observed=ctrl)
    y2 = pm.Exponential("test_duration", p[1], observed=test)

    t2 = pm.sample()
```

The exponential distribution also turns out to be a nice way to specify the prior on the rate, because it is constrained to be positive, just as the rate must be. This model samples really fast and confirms that the average duration for the ctrl group is longer than for the test group. I hope you find this approach more appealing. I didn’t transform the data at all for this example.
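To compare the groups on the original scale, note that the mean of an Exponential with rate `p` is `1/p`. A minimal sketch, using made-up draws as stand-ins for the real trace (in recent PyMC you would pull something like `t2.posterior["p"]` instead):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-ins for posterior draws of p[0] (ctrl rate) and p[1] (test rate)
p_ctrl = rng.normal(0.12, 0.005, size=4000)
p_test = rng.normal(0.15, 0.005, size=4000)

# Mean duration of an Exponential(rate) is 1/rate
mean_ctrl = 1.0 / p_ctrl
mean_test = 1.0 / p_test

# Posterior of the percent change in mean duration, test vs ctrl
pct_change = mean_test / mean_ctrl - 1
print("posterior mean % change:", pct_change.mean())
print("94% interval:", np.quantile(pct_change, [0.03, 0.97]))
```

A higher rate means shorter durations, so the test group’s larger rate translates directly into a negative percent change in mean duration.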

> 1. When I run this model the posterior for `ctrl_mu` and `test_mu` are very close to the `average(ctrl_log)` and `average(test_log)`, respectively. Is that what I am going for or should it be closer to the parameters in the skew norm, log(2.8) and log(1.4)?

Yeah, I think they should be close to the averages of ctrl_log and test_log. The skew-normal parameters aren’t going to be recovered exactly because you added all those randints into the data.

> 3. My completely fabricated strategy was to produce the posterior of the difference between `ctrl_mu` and `test_mu` and if that difference did not include 0, claim the two are different. Then I would produce the posterior predictive using `pred = pm.sample_posterior_predictive(idata, var_names=["ctrl_duration", "test_duration"])` and plot the posterior of `np.exp(pred["test_duration"])/np.exp(pred["ctrl_duration"]) - 1` to see the % change. Is that a correct way to show the percent change posterior?

There isn’t a unique correct way of comparing posteriors, but I like this approach just fine. Alternatively, just subtract the two predictive distributions and plot a histogram to show the expected difference between users in each group.
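That subtraction looks like this; the arrays below are made-up stand-ins for the flattened posterior predictive draws (exponentiated first if you modelled on the log scale):

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical stand-ins for pred["ctrl_duration"] and pred["test_duration"],
# already back on the original (duration) scale
ctrl_draws = np.exp(rng.normal(1.5, 0.5, size=8000))
test_draws = np.exp(rng.normal(1.2, 0.5, size=8000))

# Per-draw difference between a test user and a ctrl user
diff = test_draws - ctrl_draws
print("mean difference:", diff.mean())
print("P(test shorter than ctrl):", (diff < 0).mean())
# diff is what you'd histogram to show the expected difference per user
```
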


Thanks so much, Daniel, for the response and the help. It is super helpful and exactly what I was looking for.
