Help with modelling strictly positive real values

ni1s · August 29, 2017, 11:11am

Hi, I’m new to probabilistic programming and have a probably newbie question that I couldn’t quite figure out how to solve yet:

I’m modelling sales data based on a set of variables as a linear regression problem, and I based it on the ‘Robust Regression’ example. As in the example, I’ve used a Normal distribution for the weights and a StudentT for the observed values. While this actually gives OK results, it’s not an accurate model: sales numbers can only be positive or zero, so the sampling yields impossible traces.

I’ve tried reformulating it as a Poisson distribution but that didn’t work as well, and I’ve looked into censoring the data with pm.Bound/Potent, but I couldn’t quite figure out how that works.

So, long story short, how would you model regression coefficients and the observed variable that are bounded [0,+inf)?

For context, my simple model is below:

with model:
    # intercept
    alpha = pm.Normal('alpha', mu=0, sd=10)

    # coefficients for regression
    beta = pm.Normal('beta', mu=0, sd=10, shape=D)

    lam = pm.HalfCauchy('lam', beta=10, testval=1.)

    # Expected value of outcome
    mu = alpha + beta.dot(X_shared.T)

    Y_obs = pm.StudentT('Y_obs', nu=1, mu=mu, lam=lam, observed=Y_shared)

Thank you!

cshenton · August 30, 2017, 12:10am

Your Y_obs is distributed as Student's t, which permits any value in (-inf, inf).

You have a few different options:

Use a distribution that only permits positive values (inverse gamma comes to mind); however, this is pretty non-standard, and any econometrician would balk at it.
Use a link function to map your output from (-inf, inf) to (0, inf); in this case, softplus would be an appropriate choice.

A good starting point for you would be a logistic regression, where the regression model generates normally/Student-t distributed logits on (-inf, inf), and the logistic function maps those logits to (0, 1), i.e. to probabilities. It’s the same idea here, except you’re mapping to a different support, using softplus instead of the logistic function.

Interpretation-wise, think of your variable on the real axis as a latent variable that represents ‘sales potential’. Positive values are just the predicted sales, while negative values let the model express stronger and stronger degrees of ‘no one wants to buy this’. You’re then transforming that latent ‘sales potential’ variable into an observed ‘sales’ variable.
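
The softplus idea can be sketched in a few lines of plain Python. This is only an illustration of the link function, not a full PyMC3 model: predict_sales is a hypothetical helper, and alpha, beta and the feature values are made-up placeholders.

```python
import math

def softplus(x):
    """Softplus link: log(1 + exp(x)), mapping the real line to positive values."""
    # Numerically stable form: max(x, 0) + log1p(exp(-|x|))
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))

def predict_sales(alpha, beta, features):
    """Hypothetical helper: linear 'sales potential' on the real line, then softplus."""
    latent = alpha + sum(b * f for b, f in zip(beta, features))
    return softplus(latent)

# Strongly negative potential maps to sales near zero;
# large positive potential passes through almost unchanged.
for latent in (-20.0, 0.0, 20.0):
    print(latent, softplus(latent))
```

In a PyMC3 model the same transform would be applied to mu before it is passed to the likelihood; which likelihood to pair it with is a separate modelling choice.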

junpenglao · August 30, 2017, 6:09am

You can also model it as censored data; there is a detailed example here: https://github.com/pymc-devs/pymc3/blob/master/pymc3/examples/censored_data.py
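
For orientation, here is a rough sketch of what a censored likelihood looks like. It is not taken from the linked example; it assumes a normal latent variable that is left-censored at zero, and the function names are made up for illustration. Observations above the bound contribute the usual log-density, while observations at the bound contribute the log-probability of the latent value falling at or below it.

```python
import math

def normal_logpdf(y, mu, sigma):
    """Log-density of Normal(mu, sigma) at y."""
    z = (y - mu) / sigma
    return -0.5 * z * z - math.log(sigma) - 0.5 * math.log(2.0 * math.pi)

def normal_logcdf(y, mu, sigma):
    """Log of P(Y <= y) for Normal(mu, sigma); not robust in the extreme lower tail."""
    z = (y - mu) / sigma
    return math.log(0.5 * math.erfc(-z / math.sqrt(2.0)))

def censored_loglik(ys, mus, sigma, lower=0.0):
    """Log-likelihood of data left-censored at `lower` (assumed to be 0 for sales)."""
    total = 0.0
    for y, mu in zip(ys, mus):
        if y <= lower:
            # Censored point: we only know the latent value was <= lower.
            total += normal_logcdf(lower, mu, sigma)
        else:
            # Fully observed point: ordinary normal log-density.
            total += normal_logpdf(y, mu, sigma)
    return total

print(censored_loglik([0.0, 2.5], [-1.0, 2.0], sigma=1.0))
```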

AustinRochford · August 30, 2017, 1:14pm

You could also standardize your data (subtract mean, divide by standard deviation), which would make it no longer all positive.
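
A minimal sketch of that standardization, using made-up sales figures and hypothetical helper names. The mean and standard deviation are returned so that predictions can be mapped back to the original scale.

```python
import statistics

def standardize(values):
    """Subtract the mean and divide by the sample standard deviation."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    return [(v - mean) / sd for v in values], mean, sd

def unstandardize(z, mean, sd):
    """Map a standardized value back to the original scale."""
    return z * sd + mean

sales = [0.0, 3.0, 5.0, 12.0]  # made-up numbers, for illustration only
z_scores, mean, sd = standardize(sales)
print(z_scores)
print([unstandardize(z, mean, sd) for z in z_scores])
```

Note that the transform is linear, so a prediction that is very negative on the standardized scale can still map back to a negative sales figure.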

ni1s · September 5, 2017, 9:03am

Thank you for the suggestions! I will definitely look into approaching it as a logistic regression problem. Right now I ended up using a Gamma distribution for the observed values, which seems to work pretty OK.
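
The Gamma model itself isn't shown in the thread, so the following is only a guess at the kind of bookkeeping involved: converting a mean and standard deviation into the Gamma shape (alpha) and rate (beta), and evaluating the log-density. The function names are made up. As far as I know, PyMC3's Gamma also accepts mu and sd directly, which does the same conversion internally.

```python
import math

def gamma_shape_rate(mu, sd):
    """Convert mean/sd to shape alpha and rate beta (mean = alpha/beta, var = alpha/beta**2)."""
    alpha = (mu / sd) ** 2
    beta = mu / sd ** 2
    return alpha, beta

def gamma_logpdf(y, alpha, beta):
    """Log-density of Gamma(alpha, beta) at y; only defined for y > 0."""
    return (alpha * math.log(beta) - math.lgamma(alpha)
            + (alpha - 1.0) * math.log(y) - beta * y)

alpha, beta = gamma_shape_rate(mu=2.0, sd=1.0)
print(alpha, beta)
print(gamma_logpdf(1.5, alpha, beta))
```

One thing to keep in mind: the Gamma density is only defined for strictly positive values, so observations that are exactly zero need separate handling.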
