I have purchase data that looks close to lognormal, but on the log scale it still shows some skewness. (I’m assuming this is what’s referred to as a log-skew-normal distribution, but I could be wrong here.)
I’m trying to fit a model that estimates the expected revenue per purchase and the expected total revenue of the dataset, but it doesn’t seem to fit the data well or recover the proper parameters. Is there a better way to fit this model?
Also, I think the expectation is wrong, since I’m basing it on the expectation of a lognormal distribution, exp(mu + sigma^2 / 2). How would I improve it (the variables RevA and total_revA)?
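For what it's worth, if X is skew normal with location loc, scale scale and shape alpha, the skew-normal moment generating function evaluated at t = 1 gives E[exp(X)] = 2 * exp(loc + scale^2 / 2) * Phi(delta * scale), with delta = alpha / sqrt(1 + alpha^2). The sketch below is a minimal check of that formula against simulation, assuming the log-scale data really is skew normal. The helper names log_skewnormal_mean and lognormal_mean are hypothetical, not part of any library.

```python
import numpy as np
from scipy import stats


def log_skewnormal_mean(loc, scale, alpha):
    """Mean of exp(X) for X ~ SkewNormal(loc, scale, alpha) (hypothetical helper).

    Uses the skew-normal MGF at t = 1:
    2 * exp(loc + scale**2 / 2) * Phi(delta * scale).
    """
    delta = alpha / np.sqrt(1.0 + alpha**2)
    return 2.0 * np.exp(loc + 0.5 * scale**2) * stats.norm.cdf(delta * scale)


def lognormal_mean(mu, sigma):
    """Mean of a plain lognormal, i.e. the alpha = 0 special case."""
    return np.exp(mu + 0.5 * sigma**2)


# Same parameters as the simulated data in this post
alpha, loc, scale = 1.5, 4.8, 1.013

# Large sample so the Monte Carlo mean is reasonably stable
rng = np.random.default_rng(0)
x = stats.skewnorm(alpha, loc=loc, scale=scale).rvs(200_000, random_state=rng)

simulated = np.exp(x).mean()
skew_mean = log_skewnormal_mean(loc, scale, alpha)
plain_mean = lognormal_mean(loc, scale)

print("simulated mean of exp(x):", simulated)
print("log-skew-normal formula :", skew_mean)
print("plain lognormal formula :", plain_mean)
```

With a positive alpha the plain lognormal formula comes out lower than the simulated mean, which would explain why a Revenue term built on exp(mu + sigma^2 / 2) looks off.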
import pandas as pd
import numpy as np
import scipy.stats as scs
import pymc3 as pm
import arviz as az
alpha, loc, scale = 1.5, 4.8, 1.013
d_log_scale = scs.skewnorm(alpha, loc=loc, scale=scale).rvs(1000)
d = np.exp( d_log_scale )
print( d_log_scale.mean(), d_log_scale.std() )
with pm.Model() as model:
    alpha_theta_a = pm.Normal('alpha_theta_a', 1, 1)
    sig_theta_a = pm.Exponential('sig_theta_a', 1)
    mu_theta_a = pm.Normal('mu_theta_a', 2, 2)
    theta_a = pm.SkewNormal('theta_a', mu=mu_theta_a, sigma=sig_theta_a, alpha=alpha_theta_a)
    sig_a = pm.Exponential('sig_a', 1)
    Ra = pm.Lognormal('Ra', theta_a, sig_a, observed=d)
    RevA = pm.Deterministic('Revenue', np.exp(theta_a + 0.5*sig_a**2))
    total_revA = pm.Deterministic('total_revenue', RevA * len(d))
    # sampling
    prior = pm.sample_prior_predictive()
    trace = pm.sample(1000, tune=1000)
    posterior_predictive = pm.sample_posterior_predictive(trace)
# Save results to xarray object
data = az.from_pymc3(
    trace=trace,
    prior=prior,
    posterior_predictive=posterior_predictive,
    model=model
)
pm.summary(data.posterior)
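As a cross-check outside PyMC3, one option is to fit a skew normal directly to the log of the data by maximum likelihood and compare the fitted moments with the sample moments. This is only a rough sketch, assuming the log-scale data is skew normal. It uses scipy.stats.skewnorm.fit rather than the Bayesian model above, and the seed and variable names are illustrative.

```python
import numpy as np
from scipy import stats

# Simulate data the same way as in the post (seed chosen only for reproducibility)
rng = np.random.default_rng(42)
alpha, loc, scale = 1.5, 4.8, 1.013
d = np.exp(stats.skewnorm(alpha, loc=loc, scale=scale).rvs(1000, random_state=rng))

# Treat the skew normal as the likelihood of log(d) and fit it by maximum likelihood
log_d = np.log(d)
alpha_hat, loc_hat, scale_hat = stats.skewnorm.fit(log_d)
fitted = stats.skewnorm(alpha_hat, loc=loc_hat, scale=scale_hat)

print("true   (alpha, loc, scale):", (alpha, loc, scale))
print("fitted (alpha, loc, scale):", (alpha_hat, loc_hat, scale_hat))

# The fitted mean and std on the log scale should be close to the sample values
print("sample mean/std:", log_d.mean(), log_d.std())
print("fitted mean/std:", fitted.mean(), fitted.std())
```

With only 1000 points the shape parameter can be estimated quite loosely even when the fitted mean and standard deviation match well, so some mismatch in alpha is not necessarily a sign that the fit failed.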