Skewed metric, SkewedStudentT attempt

Hi,

I am trying to model a skewed metric. Sampling runs pretty slowly for smaller datasets (3k-5k observations) but stalls completely for bigger datasets of around 50-60k.

The distribution looks like this:

[image: histogram of the observed metric]

The modelling approach:

import numpy as np
import pymc3 as pm

# bootstrap distribution of the sample mean, used to centre the mu prior
s = np.array([np.random.choice(obs_all, size=1000, replace=True).mean() for _ in range(500)])
mu_prior = s.mean()
mu_3std = max(s.std() * 3, 15)
mu_1half_std = max(s.std() * 1.5, 7.5)
prior_lower = mu_prior - mu_3std
prior_upper = mu_prior + mu_3std

test_val_lower = mu_prior - mu_1half_std
test_val_upper = mu_prior + mu_1half_std

with pm.Model() as ab_model:
    lam = pm.Uniform('lam', -0.99, 0.99)  # skew parameter in (-1, 1)
    q = pm.Uniform('q', 2.001, 30)  # degrees of freedom, q > 2
    sigma = pm.Uniform('sigma', 0.5, 1000)  # sigma > 0
    mu = pm.StudentT('mu', nu=4, mu=mu_prior, sigma=mu_3std)
    # mu = pm.Uniform('mu', prior_lower, prior_upper)

    SkewedStudentT('SkewedStudentT', lam, q, sigma, mu, observed=obs_all)

    step = pm.Slice()
    trace = pm.sample(draws=1000, tune=500, step=step,
                      discard_tuned_samples=True, compute_convergence_checks=True)

[link: SkewedStudentT implementation]

I’d like to ask if you can recommend any settings I may have overlooked, or another modelling approach that may be better suited to problems like this.

Thanks
A

Hi Andras,

I don’t have much experience with the skewed T distribution, so I didn’t go through the .logp formula. But am I correct in thinking that when lam=0 it should reduce to a standard StudentT?
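
If so, one quick way to test it is to compare logp values directly. This is a minimal sketch; SkewedStudentT.dist(...) and its (lam, q, sigma, mu) argument order are assumptions based on how you call it in the model, so adapt as needed:

import numpy as np
import pymc3 as pm

# At lam=0 the custom logp should coincide with the standard T logp.
# SkewedStudentT.dist(...) and its argument order are assumed here.
x = np.linspace(-5, 5, 11)
skew_lp = SkewedStudentT.dist(0.0, 7.0, 2.0, 4.0).logp(x).eval()  # lam, q, sigma, mu
t_lp = pm.StudentT.dist(nu=7.0, mu=4.0, sigma=2.0).logp(x).eval()
print(np.allclose(skew_lp, t_lp))  # expect True if the reduction holds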

When I set the observed data to draws from a standard T and use your custom distribution, I don’t get back the expected parameter values; I do when I use the built-in PyMC3 StudentT.

import numpy as np
import pymc3 as pm
import arviz as az

N, loc, df, scale = 8000, 4, 7, 2
obs_all = loc + scale * np.random.standard_t(df, size=N)

with pm.Model() as ab_model:
    lam = pm.Uniform('lam', -0.99, 0.99)  # skew parameter in (-1, 1)
    q = pm.Uniform('q', 2.001, 30)  # degrees of freedom, q > 2
    sigma = pm.Uniform('sigma', 0.5, 1000)  # sigma > 0
    mu = pm.StudentT('mu', nu=4, mu=obs_all.mean(), sigma=max(obs_all.std() * 3, 15))

    obs = SkewedStudentT('obs', lam, q, sigma, mu, observed=obs_all)
    # obs = pm.StudentT('obs', nu=q, mu=mu, sigma=sigma, observed=obs_all)

    idata = pm.sample(return_inferencedata=True)
az.plot_posterior(idata, ref_val=[loc, 0, df, scale]);

[posterior plot using your custom distribution: both the q and sigma reference values fall outside their respective posteriors]

[posterior plot using PyMC3’s StudentT: the reference values are recovered]

A few more general points:

  • Uniform priors are generally discouraged; you should use more informative priors (see the sketch after this list).
      • For a scale parameter this can be Exponential, HalfNormal or HalfCauchy.
      • For the StudentT df parameter you can use Gamma(2.5, 0.1).
      • For mu you can probably just use Normal rather than StudentT; your sigma setting is already wide enough.
      • You can get more information here.

  • You are changing the default settings in pm.sample(); unless you have a strong reason to do this, it is best to keep the defaults. In particular you should use the NUTS sampler whenever possible.
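
Putting those suggestions together, the model could look something like this. A sketch only: it assumes obs_all and your SkewedStudentT class are in scope, and the name q_minus_2 is just an illustration of shifting the Gamma prior to respect your q > 2 constraint:

import pymc3 as pm

with pm.Model() as ab_model:
    lam = pm.Uniform('lam', -0.99, 0.99)                 # skew parameter
    # Gamma(2.5, 0.1) prior on the df, shifted by 2 so that q > 2 always holds
    q = 2 + pm.Gamma('q_minus_2', alpha=2.5, beta=0.1)
    sigma = pm.HalfNormal('sigma', sigma=obs_all.std())  # informative scale prior
    mu = pm.Normal('mu', mu=obs_all.mean(), sigma=max(obs_all.std() * 3, 15))

    obs = SkewedStudentT('obs', lam, q, sigma, mu, observed=obs_all)

    # keep the pm.sample() defaults: NUTS with adaptive tuning
    idata = pm.sample(return_inferencedata=True)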


In PyMC 5 you can now model a skew Student-t distribution like so:

import pymc as pm
import matplotlib.pyplot as plt

def logp_skewt(value, nu, mu, sigma, alpha):
    # Azzalini-style skew-t: f(x) = (2 / sigma) * t_nu(z) * T_nu(alpha * z),
    # where z = (x - mu) / sigma, t_nu is the standard T pdf and T_nu its cdf
    z = (value - mu) / sigma
    return (
        pm.math.log(2)
        + pm.logp(pm.StudentT.dist(nu, mu=0, sigma=1), z)
        + pm.logcdf(pm.StudentT.dist(nu, mu=0, sigma=1), alpha * z)
        - pm.math.log(sigma)
    )

with pm.Model() as model:
    # draw from the skew-t with nu=1, mu=0, sigma=3, alpha=-10 (strong left skew)
    skewt = pm.CustomDist('likelihood', 1, 0, 3, -10, logp=logp_skewt)
    model_trace = pm.sample(
        nuts_sampler="numpyro",
        draws=10_000,
        chains=1,
    )
    
samples = model_trace.posterior.likelihood.to_numpy()

plt.hist(samples[(samples < 30) & (samples > -100)], bins=100, density=True)
plt.show()
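
To sanity-check the draws, you can overlay exp(logp) on the histogram. A quick sketch reusing logp_skewt with the same parameter values; the comparison is only approximate because the histogram is clipped to (-100, 30):

import numpy as np

# evaluate the implied density on a grid and overlay it on the histogram
grid = np.linspace(-100, 30, 500)
dist = pm.CustomDist.dist(1, 0, 3, -10, logp=logp_skewt)
density = np.exp(pm.logp(dist, grid).eval())

plt.hist(samples[(samples < 30) & (samples > -100)], bins=100, density=True)
plt.plot(grid, density)
plt.show()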