Hi there,
I have payment amount data for two different years (2018 and 2019). Plotting the data shows that it's heavily skewed. I thought LogNormal might be a good fit, since applying np.log and plotting the result produces something that roughly resembles a Normal distribution.
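To make the log-transform check concrete, here is a minimal sketch of the kind of comparison I mean. It uses synthetic draws as a stand-in for the real payment amounts, so `amounts`, its parameters, and the `skewness` helper are all assumptions for illustration, not my actual data or code. Note that np.log only works if every amount is strictly positive.

```python
import numpy as np

def skewness(x):
    """Sample skewness: third central moment divided by std cubed."""
    x = np.asarray(x, dtype=float)
    centered = x - x.mean()
    return (centered ** 3).mean() / x.std() ** 3

# Synthetic stand-in for the payment amounts (assumed parameters, not real data).
rng = np.random.default_rng(0)
amounts = rng.lognormal(mean=3.0, sigma=1.0, size=10_000)

raw_skew = skewness(amounts)          # strongly positive for skewed data
log_skew = skewness(np.log(amounts))  # near zero if LogNormal is a reasonable fit

print(f"skewness raw: {raw_skew:.2f}, skewness after log: {log_skew:.2f}")
```

If the skewness after the log transform is still far from zero on the real data, that would suggest LogNormal is only a rough approximation.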
Here’s the histogram for the “raw” data at different “zoom levels” (x-axis is payment amount):
Here is the model (I used Exponential priors trying to keep the values small, is that ok?):
with pm.Model() as model:
    mu1 = pm.Exponential('mu1', lam=1)
    sigma1 = pm.Exponential('sigma1', lam=1)
    mu2 = pm.Exponential('mu2', lam=1)
    sigma2 = pm.Exponential('sigma2', lam=1)
    payments_2018 = pm.Lognormal('payments_2018', mu=mu1, sigma=sigma1, observed=bq_data_payments_2018['transaction_amount'])
    payments_2019 = pm.Lognormal('payments_2019', mu=mu2, sigma=sigma2, observed=bq_data_payments_2019['transaction_amount'])
    diff = pm.Deterministic('diff', mu2 - mu1)
    lift = pm.Deterministic('lift', mu2 / mu1)
    trace = pm.sample(10000, tune=2000)
Here’s the trace plot (it looks ok, at least to me):
But the PPC check (plotted) looks weird:
Any ideas why this is happening? I can't wrap my head around it.
Also, is there a better way to model this kind of data? I'm mainly looking to compare the two years in an "A/B experiment" fashion.
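One thing I'm unsure about for the comparison: mu in a LogNormal lives on the log scale, so exp(mu) is the median payment and exp(mu + sigma^2 / 2) is the mean payment. Below is a rough sketch of what a comparison on the payment scale might look like. The posterior draws here are made-up placeholder arrays (not output from my model), and the helper names are hypothetical.

```python
import numpy as np

def lognormal_mean(mu, sigma):
    """Mean of a LogNormal(mu, sigma) on the original (payment) scale."""
    return np.exp(mu + 0.5 * sigma ** 2)

def lognormal_median(mu):
    """Median of a LogNormal(mu, sigma) on the original (payment) scale."""
    return np.exp(mu)

# Placeholder posterior draws (assumed values, standing in for real trace samples).
rng = np.random.default_rng(1)
n_draws = 4000
mu1_draws = rng.normal(3.00, 0.01, size=n_draws)
mu2_draws = rng.normal(3.05, 0.01, size=n_draws)
sigma1_draws = np.abs(rng.normal(1.00, 0.01, size=n_draws))
sigma2_draws = np.abs(rng.normal(1.00, 0.01, size=n_draws))

# Per-draw ratio of mean payments (2019 over 2018) and of median payments.
mean_lift = lognormal_mean(mu2_draws, sigma2_draws) / lognormal_mean(mu1_draws, sigma1_draws)
median_lift = lognormal_median(mu2_draws) / lognormal_median(mu1_draws)

# Share of draws in which the 2019 mean payment exceeds the 2018 one.
prob_2019_higher = (mean_lift > 1).mean()
print(f"P(mean payment higher in 2019) ~ {prob_2019_higher:.3f}")
```

With a real trace, the placeholder arrays would be replaced by the sampled values of mu1, mu2, sigma1 and sigma2. Would that be a more sensible quantity to compare than mu2 - mu1 or mu2 / mu1?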
Thanks in advance!