Robust positive continuous likelihood

opherdonchin · May 7, 2024, 5:56am

This may be the wrong place to ask this, so feel free to redirect.

I have data that is positive and continuous but has outliers. I normally use the Gamma distribution to model positive continuous data or parameters. Is there some equivalent robust version the way that the Student’s T is used as a robust likelihood in place of the Normal distribution? If not based on Gamma, is there any standard way to do a robust likelihood for positive continuous variables?

(Added in edit:) There is a generalized Gamma distribution which has an extra shape parameter p. I also thought that the easiest thing to do (programmatically) is just to use a truncated Student’s t.

Opher

jessegrabowski · May 7, 2024, 6:10am

You could try a LogStudentT distribution. The generative graph is just the exp of a Student’s t, so it’s easy to implement using a custom dist:

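import pymc as pm

def log_student_t(mu, sigma, nu, size=None):
    # exp of a Student's t variable, so log(y) is Student's t distributed
    return pm.math.exp(pm.StudentT.dist(mu=mu, sigma=sigma, nu=nu, size=size))

with pm.Model() as m:
    # priors for mu, sigma, nu
    # CustomDist takes the variable name as its first argument
    y_hat = pm.CustomDist("y_hat", mu, sigma, nu, dist=log_student_t, observed=y_data)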
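For intuition about what this likelihood looks like, here is a minimal standard-library sketch of the density of exp(StudentT), obtained by a change of variables. It is not part of the PyMC snippet, and the function names are hypothetical, chosen only for illustration.

```python
import math


def student_t_logpdf(z, mu, sigma, nu):
    """Log density of a location-scale Student's t evaluated at z."""
    u = (z - mu) / sigma
    return (
        math.lgamma((nu + 1) / 2)
        - math.lgamma(nu / 2)
        - 0.5 * math.log(nu * math.pi)
        - math.log(sigma)
        - (nu + 1) / 2 * math.log1p(u * u / nu)
    )


def log_student_t_logpdf(y, mu, sigma, nu):
    """Log density of y when log(y) ~ StudentT(mu, sigma, nu)."""
    if y <= 0:
        return -math.inf  # the distribution only has support on y > 0
    log_y = math.log(y)
    # Change of variables: p_Y(y) = p_Z(log y) / y, so subtract log(y).
    return student_t_logpdf(log_y, mu, sigma, nu) - log_y


# Hypothetical parameter values, for illustration only.
print(log_student_t_logpdf(2.5, mu=0.0, sigma=1.0, nu=4.0))
```

The `- log_y` term is the Jacobian of the exp transform; everything else is the ordinary Student's t density on the log scale, which is why the heavy tails carry over to large positive outliers.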

For the prior on nu, @bwengals was doing some work on a model like this with a special prior for nu, but the specifics escape me. Hopefully he can chime in himself.

We also recently experimented with a mixture of LogNormals, where each datapoint was classified via a latent variable into either a low-sigma or a high-sigma distribution, each with the same mean. The idea was an analogy to the Student’s t as an infinite scale mixture of normals.
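A rough standard-library sketch of that two-component idea follows. Note that it marginalises the per-datapoint indicator instead of sampling it as a latent variable, which is a swapped-in simplification and not necessarily what was done in the experiment. The names `sigma_low`, `sigma_high` and `w_high` are hypothetical.

```python
import math


def lognormal_logpdf(y, mu, sigma):
    """Log density of LogNormal(mu, sigma) at y > 0."""
    log_y = math.log(y)
    return (
        -log_y
        - math.log(sigma)
        - 0.5 * math.log(2 * math.pi)
        - 0.5 * ((log_y - mu) / sigma) ** 2
    )


def two_scale_lognormal_logpdf(y, mu, sigma_low, sigma_high, w_high):
    """Log density of a two-component LogNormal mixture with a shared mu.

    w_high is the (assumed) probability, strictly between 0 and 1, that a
    point comes from the wide "outlier" component.
    """
    a = math.log1p(-w_high) + lognormal_logpdf(y, mu, sigma_low)
    b = math.log(w_high) + lognormal_logpdf(y, mu, sigma_high)
    m = max(a, b)
    # log-sum-exp of the two weighted component log densities
    return m + math.log(math.exp(a - m) + math.exp(b - m))


# An extreme point is penalised far less under the mixture than under
# the narrow component alone (illustrative values only).
narrow_only = lognormal_logpdf(50.0, 0.0, 0.3)
mixture = two_scale_lognormal_logpdf(50.0, 0.0, 0.3, 2.0, 0.1)
print(narrow_only, mixture)
```

Marginalising the indicator keeps the likelihood continuous in all parameters, which tends to be easier for gradient-based samplers than a discrete per-point label.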

But as to the question of whether there’s a “standard” approach here, I haven’t seen one if there is.

opherdonchin · May 7, 2024, 7:29am

Thanks, this is helpful. I’ve edited my post above to add some ideas I’d had earlier. If there’s nothing standard, we’ll just pick something and go with it.

bwengals · May 14, 2024, 6:37am

Re the prior on nu: the PC prior is a good choice. There’s a PR for it in PyMC experimental if you’re brave and not on Windows. Or, to keep it simple, you can use Gamma(alpha=2, beta=0.1). That’s very close to the PC prior that says, “I think there’s a 50% chance that nu is greater than 30”, where 30 is the commonly cited point beyond which Student’s t distributions look very normal.
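To see where the simple Gamma(alpha=2, beta=0.1) prior itself places its mass, here is a small standard-library check. It assumes beta is a rate parameter, as in PyMC's `Gamma`, and it describes only the Gamma prior, not the PC prior. The helper name is hypothetical.

```python
import math


def gamma_shape2_sf(x, beta):
    """P(X > x) for X ~ Gamma(alpha=2, rate=beta).

    For shape 2 the survival function has the closed form
    exp(-beta * x) * (1 + beta * x).
    """
    return math.exp(-beta * x) * (1.0 + beta * x)


beta = 0.1
prior_mean = 2 / beta                   # mean of Gamma(alpha, rate) is alpha / rate
p_above_30 = gamma_shape2_sf(30.0, beta)
p_below_2 = 1.0 - gamma_shape2_sf(2.0, beta)

print(f"prior mean of nu: {prior_mean:.1f}")
print(f"P(nu > 30):       {p_above_30:.3f}")
print(f"P(nu < 2):        {p_below_2:.3f}")
```

Under these assumptions the Gamma has mean 20, puts roughly a fifth of its mass above nu = 30, and puts very little mass on the extremely heavy-tailed region below nu = 2.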
