Help With A/B Testing For a Learner

Hello all, I’ve been trying to learn and apply Bayesian sampling approaches, particularly to A/B tests, and I am in the weeds. I am trying to follow the BEST approach documented in the PyMC documentation here, but I am running into problems and have questions. I hope I can ask them here to get unstuck.

The setup is that I have heavily right-skewed data: duration times for a control group and a test group. This is my attempt at simulating data for that.

import numpy as np
import scipy as sc
import scipy.stats  # make sure sc.stats is available

# skew-normal bulk plus a few hundred large uniform outliers per group
ctrl_ = sc.stats.skewnorm(loc=2.8, scale=3, a=25).rvs(10000)
ctrl = np.r_[ctrl_, np.random.randint(20, high=500, size=300)]

test_ = sc.stats.skewnorm(loc=1.4, scale=2.8, a=16).rvs(10000)
test = np.r_[test_, np.random.randint(20, high=380, size=300)]

The goal is to determine whether the test group is some percentage smaller than the control group in a way that is statistically meaningful.

So since the data is pretty badly right skewed, I decided to take the logarithm of each group.

ctrl_log = np.log(ctrl)
test_log = np.log(test)
pooled_mn = np.r_[ctrl_log, test_log].mean()
pooled_std = np.r_[ctrl_log, test_log].std()
df = np.min([ctrl_log.size, test_log.size]) - 1

import pymc as pm

with pm.Model() as m1:
    ctrl_mu = pm.Normal("ctrl_mu", mu=pooled_mn, sigma=pooled_std * 2)
    test_mu = pm.Normal("test_mu", mu=pooled_mn, sigma=pooled_std * 2)

    # shared normality parameter, bounded by the smaller group's df
    nu = pm.Uniform("nu", lower=int(df / 4), upper=df)

    ctrl_sigma = pm.HalfCauchy("ctrl_sigma", beta=2)
    ctrl_lam = ctrl_sigma ** -2
    test_sigma = pm.HalfCauchy("test_sigma", beta=2)
    test_lam = test_sigma ** -2

    ctrl_duration = pm.StudentT("ctrl_duration", nu=nu, mu=ctrl_mu, lam=ctrl_lam, observed=ctrl_log)
    test_duration = pm.StudentT("test_duration", nu=nu, mu=test_mu, lam=test_lam, observed=test_log)

    idata = pm.sample(2000, tune=5000, target_accept=0.95)

So I had some questions that I am struggling with:

  1. When I run this model the posterior for ctrl_mu and test_mu are very close to the average(ctrl_log) and average(test_log), respectively. Is that what I am going for or should it be closer to the parameters in the skew norm, log(2.8) and log(1.4)?
  2. When I try to convert the log transformed posterior for the mu’s back with np.exp(), they are very different from the untransformed data for ctrl and test. Is that a sign that the model is not good or that I am doing a naive conversion back of the transformed variable?
  3. My completely fabricated strategy was to produce the posterior of the difference between ctrl_mu and test_mu and, if that difference did not include 0, claim the two are different. Then I would produce the posterior predictive using pred = pm.sample_posterior_predictive(idata, var_names=["ctrl_duration", "test_duration"]) and plot the posterior of np.exp(pred["test_duration"])/np.exp(pred["ctrl_duration"]) - 1 to see the % change. Is that a correct way to show the percent change posterior?
  4. I originally computed the percent change of the posteriors as test_mu/ctrl_mu - 1, but since they are log transformed it seemed that was different from what I really wanted: the distribution of the percent change of the original-scale mu’s.
  5. Does it matter what transformation I use (e.g. square root) or whether I standardize? The BEST document doesn’t say, and for some really badly skewed data the sampler seems to die on me unless I transform.

I hope all of this made sense and some kind soul would help me understand how to better use and understand Bayesian analysis for A/B testing. I really don’t want to go back to stats.ttest_ind . . . .

Thank you !!!

Hi! Thanks for posting your simulation code, it makes your question much easier to understand.

A bunch of your questions are about transformation, so I’ll group those together. In general, instead of transforming your data to play nicely with your statistical model, it can be easier and more informative to change your statistical model to play nicely with the data. In your case, your data is constrained to be positive: people can only ever have positive wait times, never zero or negative. This makes the normal distribution a poor choice, because it doesn’t know about the boundary at 0. Also, durations usually exhibit exponential behavior. Once someone leaves a website (or whatever your data is about), they cannot leave again, so the bulk of the duration times stacks up near zero and only a few people hang around for a long time. So we can just use the exponential distribution as our outcome distribution. (There is also an information-entropy argument for using the exponential.)

with pm.Model() as m2:
    # one rate per group; an Exponential prior keeps the rates positive
    p = pm.Exponential("p", 1, shape=2)

    y1 = pm.Exponential("ctrl_duration", p[0], observed=ctrl)
    y2 = pm.Exponential("test_duration", p[1], observed=test)

    t2 = pm.sample()

The exponential distribution also turns out to be a nice way to specify the prior, because the mean of an exponential must also be positive. This model samples really fast and confirms that the average duration for the ctrl group is longer than for the test group. I hope you find this approach more appealing; I didn’t transform the data at all for this example.
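On your questions 2 and 4, about exponentiating the log-scale posterior: exponentiating the mean of the logs gives the geometric mean, not the arithmetic mean, and for right-skewed data those two can be far apart. A quick standalone illustration (using a hypothetical lognormal sample, not your simulated data):

```python
import numpy as np

rng = np.random.default_rng(0)
# hypothetical right-skewed sample standing in for the duration data
x = rng.lognormal(mean=1.0, sigma=0.8, size=100_000)

arith_mean = x.mean()                # plain average of the raw data
geo_mean = np.exp(np.log(x).mean())  # what exp(posterior mu) estimates

# For skewed positive data the geometric mean sits well below the
# arithmetic mean, so exp(mu) will not match the raw-data average.
print(arith_mean, geo_mean)
```

So the mismatch you saw is the expected behavior of the naive back-transform, not by itself a sign of a bad model.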

  1. When I run this model the posterior for ctrl_mu and test_mu are very close to the average(ctrl_log) and average(test_log), respectively. Is that what I am going for, or should it be closer to the parameters in the skew norm, log(2.8) and log(1.4)?

Yeah I think they should be close to the average ctrl_log and test_log. The parameters in the skew normal aren’t going to quite be recovered because you add all those randints into the data.
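To see why, note that the skew-normal’s mean is not its loc: the skew parameter pulls the mean up by nearly scale * sqrt(2/pi) when a is large. A quick check with your control settings (not part of your original code):

```python
import scipy.stats as st

# mean of skewnorm(loc, scale, a) is loc + scale * delta * sqrt(2/pi),
# with delta = a / sqrt(1 + a**2); for a=25, delta is about 0.999
ctrl_mean = st.skewnorm(loc=2.8, scale=3, a=25).mean()
print(ctrl_mean)  # well above the loc of 2.8
```

And the randint outliers you append shift the group average further still.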

  3. My completely fabricated strategy was to produce the posterior of the difference between ctrl_mu and test_mu and, if that difference did not include 0, claim the two are different. Then I would produce the posterior predictive using pred = pm.sample_posterior_predictive(idata, var_names=["ctrl_duration", "test_duration"]) and plot the posterior of np.exp(pred["test_duration"])/np.exp(pred["ctrl_duration"]) - 1 to see the % change. Is that a correct way to show the percent change posterior?

There isn’t a unique correct way of comparing the difference between posteriors but I like this approach just fine. Alternatively, just subtract the two predictive distributions and plot a histogram to show the mean expected difference between users in each group.
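As a side benefit of the exponential model, the percent-change posterior can even be sketched without MCMC: with an Exponential(1) prior on the rate, the posterior is the conjugate Gamma(n + 1, sum(y) + 1). A standalone sketch on synthetic data (stand-ins for your ctrl and test, not your actual trace):

```python
import numpy as np

rng = np.random.default_rng(1)
# synthetic stand-ins for the thread's `ctrl` and `test` durations
ctrl = rng.exponential(scale=5.0, size=2000)
test = rng.exponential(scale=4.0, size=2000)

def rate_posterior(y, n_draws=10_000):
    # y_i ~ Exponential(rate), rate ~ Exponential(1) = Gamma(1, 1)
    # => rate | y ~ Gamma(n + 1, sum(y) + 1)  (shape/rate parameterization)
    return rng.gamma(shape=y.size + 1, scale=1.0 / (y.sum() + 1.0), size=n_draws)

ctrl_rate = rate_posterior(ctrl)
test_rate = rate_posterior(test)

# the mean duration is 1 / rate, so the percent-change posterior is:
pct_change = (1.0 / test_rate) / (1.0 / ctrl_rate) - 1.0
print(pct_change.mean())  # negative here: test durations are shorter on average
```

Plotting a histogram of pct_change gives you the same kind of percent-change posterior you were after, directly on the original scale.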


Thanks so much, Daniel, for the response and the help. It is super helpful and exactly what I was looking for.
