I have payment-amount data for two years (2018 and 2019). Plotting it shows that the data is heavily right-skewed. I thought a LogNormal likelihood might be a good fit, since applying
np.log to the amounts and plotting the result produces something that roughly resembles a Normal distribution.
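A minimal sketch of that kind of check, using synthetic data rather than my real payments (the `skewness` helper and the parameter values 3.0 and 1.0 are made up for illustration): if the amounts are roughly LogNormal, the skewness should be large on the raw scale and close to zero after taking logs.

```python
import math
import random
import statistics


def skewness(values):
    """Population skewness: the mean of the cubed z-scores."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return sum(((v - mean) / sd) ** 3 for v in values) / len(values)


random.seed(0)

# Synthetic stand-in for payment amounts (assumed, not real data).
amounts = [random.lognormvariate(3.0, 1.0) for _ in range(20_000)]
log_amounts = [math.log(a) for a in amounts]

raw_skew = skewness(amounts)
log_skew = skewness(log_amounts)

print(f"skewness of raw amounts: {raw_skew:.2f}")
print(f"skewness of log amounts: {log_skew:.2f}")
```

On real data the log-scale skewness will not be exactly zero, so this is only a rough sanity check alongside the histogram.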
Here’s the histogram for the “raw” data at different “zoom levels” (x-axis is payment amount):
Here is the model (I used Exponential priors to try to keep the values small; is that OK?):
with pm.Model() as model:
    mu1 = pm.Exponential('mu1', lam=1)
    sigma1 = pm.Exponential('sigma1', lam=1)
    mu2 = pm.Exponential('mu2', lam=1)
    sigma2 = pm.Exponential('sigma2', lam=1)

    payments_2018 = pm.Lognormal('payments_2018', mu=mu1, sigma=sigma1,
                                 observed=bq_data_payments_2018['transaction_amount'])
    payments_2019 = pm.Lognormal('payments_2019', mu=mu2, sigma=sigma2,
                                 observed=bq_data_payments_2019['transaction_amount'])

    diff = pm.Deterministic('diff', mu2 - mu1)
    lift = pm.Deterministic('lift', mu2 / mu1)

    trace = pm.sample(10000, tune=2000)
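One thing worth spelling out about this parametrisation: in a LogNormal, mu and sigma are the mean and standard deviation of log(amount), so diff and lift in the model are on the log scale, and the mean payment itself is exp(mu + sigma^2 / 2). The sketch below checks this on simulated data with only the standard library; the values of `mu` and `sigma` are arbitrary and not estimates from my data.

```python
import math
import random
import statistics

random.seed(1)

# Arbitrary illustrative log-scale parameters (assumptions, not fitted values).
mu, sigma = 3.0, 0.5

draws = [random.lognormvariate(mu, sigma) for _ in range(200_000)]

# mu is the mean of the logged draws, not of the draws themselves.
mean_of_logs = statistics.fmean([math.log(d) for d in draws])

# The mean on the original scale follows exp(mu + sigma^2 / 2).
sample_mean = statistics.fmean(draws)
analytic_mean = math.exp(mu + sigma ** 2 / 2)

print(f"mean of log(draws): {mean_of_logs:.3f} (mu = {mu})")
print(f"sample mean:        {sample_mean:.3f}")
print(f"exp(mu + sigma^2/2): {analytic_mean:.3f}")
```

So a comparison of mean payment amounts between years would need both mu and sigma, not mu alone.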
Here’s the trace plot (it looks ok, at least to me):
But the posterior predictive check (PPC) plot looks weird:
Any ideas why this is happening? I can’t wrap my head around it.
Also, is there a better way to model this kind of data? I’m mainly looking to compare the two years in an “A/B experiment” fashion.
Thanks in advance!