How to define StudentT parameters?

Hello! I’ve been taking baby steps in developing my first model, as I’ve posted questions about it previously here and here. Given the sensitivity of the application, I’m proceeding with caution. While I’ve successfully identified effective priors for my parameters using TruncateNormal or Normal distribution, I’m encountering a challenge with the likelihood. I have tried the lognormal, the TruncatedNormal, Normal, and StudentT. It seems that the StudentT is the best, as displayed in the image below, and I believe it’s true because it’s more robust for outliers, which the data sometimes has.

My problem is that I don’t know how to set the StudentT parameters. Therefore, I performed a sensitivity analysis on the StudentT parameter. I’m not a statistician (and I’m saddened that I didn’t learn Bayesian Inference at university; it needed me to enter the industry to realize how important it is — that’s life). So, I was following the robust regression from here, where they define the nu parameter for the t-student as an exponential. My reasoning was trying to figure out what this parameter does, so I did a loop modifying this parameter for the values kk = [0.1,1,5,10,20,30,50,100].

trace_lam = []
post_pred_lam = []
prior_pred_lam = []
for lam_vals in kk:
    with model_t:
        # Switch out the sigma
        pm.set_data({"lam_test": lam_vals})
        trace_i = pm.sample(2500, tune=2000,chains=4, cores=2, random_seed=rng,idata_kwargs={"log_likelihood": True})
        prior_pred_lam.append(pm.sample_prior_predictive(samples=1000, random_seed=rng))  

and this what I got:

My point is: Does the t-student distribution always have a sharp peak? It seems that the parameter kk=50 is the one getting closer to the observed data, but even this is not close enough (I guess). How do I know if it’s close enough? Is it possible to constrain it to be only positive? I was watching this video that helped me a lot to understand my priors, and what I know is that 95% of my observed data (y) should be between 50 - 600 and should never be lower than 0 or higher than 4000. How do I introduce this information into my t-student distribution? I have tried the pymc.find_constrained_prior, but since it has 3 parameters and I must set one, I guess the one I’m fixing is not good, because when comparing this model is worse than my actual model.

This is my complete actual model, just in case:

with pm.Model() as model_t:
    y_obs = pm.MutableData('Y',y_prior,dims='observation')
    X = pm.MutableData('X',x_prior,dims='observation')
    lam = pm.MutableData('lam_test',kk[0])
    intercept = pm.TruncatedNormal('Intercept', mu=mu_B, sigma=std_B,lower=0)
    slope = pm.Normal('Slope', mu=mu_A, sigma=sigma_A)
    sigma = pm.HalfNormal('sigma', sigma=sigma_y)
    ν_ = pm.Exponential('ν_', 1/lam)
    ν = pm.Deterministic('ν', ν_ + 1)
    y_model = pm.Deterministic('y_model',slope * X + intercept,dims='observation')

    y = pm.StudentT('y', mu=y_model,
                         sigma=sigma, nu=ν, observed=y_obs)

    trace_3 = pm.sample(3500, tune=2000,chains=4, cores=2, random_seed=rng,idata_kwargs={"log_likelihood": True})
    prior_pred_3 = pm.sample_prior_predictive(samples=1000, random_seed=rng) 
    post_pred_3 = pm.sample_posterior_predictive(trace_3,extend_inferencedata=True,random_seed=rng)

Very very general comments here.

Be careful when comparing prior predictive distributions to observed data. That’s super-tricky, better to compare prior predictive samples to prior knowledge that observed data. Otherwise, you may be doing the Bayesian update yourself and setting the prior in your model to get close to the observed data. Priors should be informed by prior knowledge (either from your domain or from statistical/empirical knowledge). PreliZ offers some tools to explore priors (some of them still experimental), see for instance Predictive elicitation — PreliZ 0.3.6 documentation

It is very common that you will not have enough information to set the parameter nu. So in practice people use default priors like exponential 1/30 or something like pz.Gamma(2.7, 0.05) to put less weight on too low values. And usually, those priors are good enough.

If you want to avoid values below zero you can use a distribution that is restricted to positive values or choose a distribution and truncate it at 0. You can also sometimes restrict the intercept and slope in order to favor positive values. And you can set the value of sigma for the student t to be in the order of what you expected, like halfnormal(300) which will put 95 of the mass for values below ~600. Usually, this quick guessing will be enough to set priors. If you want to spend more time setting priors then you could use tool like those offered by PreliZ.


@aloctavodia Thank you for your insightful comments. I appreciate the guidance on avoiding the comparison of prior predictive distributions directly to observed data. I will stop trying to force my priors and let it go! I’ll also keep the default parameters for the t-student distribution as an exponential with 1/30. And thanks for pointing me toward resources like PreliZ for further exploration of priors; I’ll definitely check it :blush:

1 Like