Skewed metric, SkewedStudentT attempt

Hi,

I am trying to model a skewed metric. Sampling runs pretty slowly for smaller datasets (3k-5k observations) but stalls completely for bigger datasets of around 50-60k.

The distribution looks like this:

[image: histogram of the observed metric]

The modelling approach:

import numpy as np
import pymc3 as pm

# bootstrap distribution of the sample mean, used to centre the mu prior
s = np.array([np.random.choice(obs_all, size=1000, replace=True).mean() for _ in range(500)])
mu_prior = s.mean()
mu_3std = max(s.std() * 3, 15)
mu_1half_std = max(s.std() * 1.5, 7.5)
prior_lower = mu_prior - mu_3std
prior_upper = mu_prior + mu_3std

test_val_lower = mu_prior - mu_1half_std
test_val_upper = mu_prior + mu_1half_std

with pm.Model() as ab_model:
    lam = pm.Uniform('lam', -0.99, 0.99)  # skew parameter in (-1, 1)
    q = pm.Uniform('q', 2.001, 30)  # degrees of freedom, q > 2
    sigma = pm.Uniform('sigma', 0.5, 1000)  # sigma > 0
    mu = pm.StudentT('mu', nu=4, mu=mu_prior, sigma=mu_3std)
    # mu = pm.Uniform('mu', prior_lower, prior_upper)

    SkewedStudentT('SkewedStudentT', lam, q, sigma, mu, observed=obs_all)

    step = pm.Slice()
    trace = pm.sample(draws=1000, tune=500, step=step,
                      discard_tuned_samples=True, compute_convergence_checks=True)

[link: SkewedStudentT implementation]

I’d like to ask if you can recommend any settings I may have overlooked, or another modelling approach that may be better suited to problems like this.

Thanks
A

Hi Andras,

I don’t have much experience with the skewed T distribution, so I didn’t go through the .logp formula. But am I correct in thinking that when lam=0 it should reduce to a standard StudentT?
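
If so, one quick way to test it is to compare logp values directly. This is a minimal sketch; SkewedStudentT.dist(...) and its (lam, q, sigma, mu) argument order are assumptions based on how you call it in the model, so adapt as needed:

import numpy as np
import pymc3 as pm

# At lam=0 the custom logp should coincide with the standard T logp.
# SkewedStudentT.dist(...) and its argument order are assumed here.
x = np.linspace(-5, 5, 11)
skew_lp = SkewedStudentT.dist(0.0, 7.0, 2.0, 4.0).logp(x).eval()  # lam, q, sigma, mu
t_lp = pm.StudentT.dist(nu=7.0, mu=4.0, sigma=2.0).logp(x).eval()
print(np.allclose(skew_lp, t_lp))  # expect True if the reduction holds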

When I set the observed data to draws from a standard T and use your custom distribution, I don’t get back the expected parameter values; I do when I use the built-in PyMC3 StudentT.

import numpy as np
import pymc3 as pm
import arviz as az

N, loc, df, scale = 8000, 4, 7, 2
obs_all = loc + scale * np.random.standard_t(df, size=N)

with pm.Model() as ab_model:
    lam = pm.Uniform('lam', -0.99, 0.99)  # skew parameter in (-1, 1)
    q = pm.Uniform('q', 2.001, 30)  # degrees of freedom, q > 2
    sigma = pm.Uniform('sigma', 0.5, 1000)  # sigma > 0
    mu = pm.StudentT('mu', nu=4, mu=obs_all.mean(), sigma=max(obs_all.std() * 3, 15))

    obs = SkewedStudentT('obs', lam, q, sigma, mu, observed=obs_all)
    # obs = pm.StudentT('obs', nu=q, mu=mu, sigma=sigma, observed=obs_all)

    idata = pm.sample(return_inferencedata=True)
az.plot_posterior(idata, ref_val=[loc, 0, df, scale]);

[posterior plot using your custom distribution: both the q and sigma reference values fall outside their respective posteriors]

[posterior plot using PyMC3’s StudentT: the reference values are recovered]

A few more general points:

  • Uniform priors are generally discouraged; you should use more informative priors (see the sketch after this list).
      • For a scale parameter this can be Exponential, HalfNormal or HalfCauchy.
      • For the StudentT df parameter you can use Gamma(2.5, 0.1).
      • For mu you can probably just use Normal rather than StudentT; your sigma setting is already wide enough.
      • You can get more information here.

  • You are changing the default settings in pm.sample(); unless you have a strong reason to do this, it is best to keep the defaults. In particular you should use the NUTS sampler whenever possible.
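
Putting those suggestions together, the model could look something like this. A sketch only: it assumes obs_all and your SkewedStudentT class are in scope, and the name q_minus_2 is just an illustration of shifting the Gamma prior to respect your q > 2 constraint:

import pymc3 as pm

with pm.Model() as ab_model:
    lam = pm.Uniform('lam', -0.99, 0.99)                 # skew parameter
    # Gamma(2.5, 0.1) prior on the df, shifted by 2 so that q > 2 always holds
    q = 2 + pm.Gamma('q_minus_2', alpha=2.5, beta=0.1)
    sigma = pm.HalfNormal('sigma', sigma=obs_all.std())  # informative scale prior
    mu = pm.Normal('mu', mu=obs_all.mean(), sigma=max(obs_all.std() * 3, 15))

    obs = SkewedStudentT('obs', lam, q, sigma, mu, observed=obs_all)

    # keep the pm.sample() defaults: NUTS with adaptive tuning
    idata = pm.sample(return_inferencedata=True)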


In PyMC 5 you can now model a skew Student-t distribution like so:

import pymc as pm
import matplotlib.pyplot as plt

def logp_skewt(value, nu, mu, sigma, alpha):
    # Azzalini-style skew-t: f(x) = (2 / sigma) * t_nu(z) * T_nu(alpha * z),
    # where z = (x - mu) / sigma, t_nu is the standard T pdf and T_nu its cdf
    z = (value - mu) / sigma
    return (
        pm.math.log(2)
        + pm.logp(pm.StudentT.dist(nu, mu=0, sigma=1), z)
        + pm.logcdf(pm.StudentT.dist(nu, mu=0, sigma=1), alpha * z)
        - pm.math.log(sigma)
    )

with pm.Model() as model:
    # draw from the skew-t with nu=1, mu=0, sigma=3, alpha=-10 (strong left skew)
    skewt = pm.CustomDist('likelihood', 1, 0, 3, -10, logp=logp_skewt)
    model_trace = pm.sample(
        nuts_sampler="numpyro",
        draws=10_000,
        chains=1,
    )
    
samples = model_trace.posterior.likelihood.to_numpy()

plt.hist(samples[(samples < 30) & (samples > -100)], bins=100, density=True)
plt.show()
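
To sanity-check the draws, you can overlay exp(logp) on the histogram. A quick sketch reusing logp_skewt with the same parameter values; the comparison is only approximate because the histogram is clipped to (-100, 30):

import numpy as np

# evaluate the implied density on a grid and overlay it on the histogram
grid = np.linspace(-100, 30, 500)
dist = pm.CustomDist.dist(1, 0, 3, -10, logp=logp_skewt)
density = np.exp(pm.logp(dist, grid).eval())

plt.hist(samples[(samples < 30) & (samples > -100)], bins=100, density=True)
plt.plot(grid, density)
plt.show()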