Hi,

I am trying to model a skewed metric. Sampling runs pretty slowly for smaller datasets (3k-5k observations), but stalls entirely for larger datasets of around 50-60k.

The distribution looks like this:

The modelling approach:

```
import numpy as np
import pymc as pm

# Bootstrap the sample mean to set the location and scale of the prior on mu
s = np.array([np.random.choice(obs_all, size=1000, replace=True).mean() for _ in range(500)])
mu_prior = s.mean()
mu_3std = max(s.std() * 3, 15)
mu_1half_std = max(s.std() * 1.5, 7.5)
prior_lower = mu_prior - mu_3std
prior_upper = mu_prior + mu_3std
test_val_lower = mu_prior - mu_1half_std
test_val_upper = mu_prior + mu_1half_std

with pm.Model() as ab_model:
    lam = pm.Uniform('lam', -0.99, 0.99)    # skew parameter in [-1, 1]
    q = pm.Uniform('q', 2.001, 30)          # degrees of freedom, q > 2
    sigma = pm.Uniform('sigma', 0.5, 1000)  # sigma > 0
    mu = pm.StudentT('mu', nu=4, mu=mu_prior, sigma=mu_3std)
    # mu = pm.Uniform('mu', prior_lower, prior_upper)
    SkewedStudentT('SkewedStudentT', lam, q, sigma, mu, observed=obs_all)
    # Note: pm.sample's init expects a string (e.g. 'auto'); the step method goes in step=
    step = pm.Slice()
    trace = pm.sample(draws=1000, tune=500, step=step,
                      discard_tuned_samples=True, compute_convergence_checks=True)
```
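As a side note, the bootstrap step at the top can be vectorized, which avoids the 500-iteration Python loop. This is just a sketch, using a synthetic stand-in array for `obs_all` (the real data isn't shown here):

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic stand-in for obs_all: a skewed (exponential) sample
obs_all = rng.exponential(scale=10.0, size=50_000)

# Draw all 500 bootstrap resamples at once as a (500, 1000) array,
# then average along axis 1 to get 500 bootstrap means in one call.
s = rng.choice(obs_all, size=(500, 1000), replace=True).mean(axis=1)
mu_prior = s.mean()
mu_3std = max(s.std() * 3, 15)
```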

Can you recommend any settings I may have overlooked, or another modelling approach that may be better suited to problems like this?

Thanks

A