Beta Regression in PyMC3

I am trying to fit a regression model in PyMC3 to estimate a percentage, 'sleepEf' (scaled to the interval (0, 1)), from a single predictor variable, 'tst'. Since a percentage is bounded, I believe I should use a beta-distributed outcome variable with a logit transform. Here is my model:

import pymc3 as pm

with pm.Model() as pooled_model:
    b_intr = pm.Normal('b_intr', mu=0.8, sd=100 ** 2)
    b_tst = pm.Normal('b_tst', mu=0.0, sd=100 ** 2)

    model_err = pm.HalfNormal('model_err', sd=50)

    # Expected value (inverse logit keeps the mean in (0, 1))
    y_est = pm.math.invlogit(b_intr + b_tst * data['tst'])

    # Data likelihood
    y_like = pm.Beta('y_like', mu=y_est, sd=model_err, observed=data['sleepEf'])

I have made sure that the sleepEf variable is never exactly 0 or 1. Here is a histogram of it:
[Histogram of sleepEf]

When I try to find the MAP estimate and then sample, I get errors:

import scipy as sp
import scipy.optimize  # makes sp.optimize available

with pooled_model:
    start = pm.find_MAP(fmin=sp.optimize.fmin_powell)
    hierarchical_trace = pm.sample(2000, step=pm.Metropolis(), start=start, tune=1000)

ValueError: Optimization error: max, logp or dlogp at max have
non-finite values. Some values may be outside of distribution support.
max: {'b_intr': array(3.3879289615458417), 'b_tst':
array(2.587928961545842)} logp: array(-inf) dlogp: array([
-2.58792896e-08, -2.58792896e-08]) Check that 1) you don't have
hierarchical parameters, these will lead to points with infinite
density. 2) your distribution logp's are properly specified. Specific
issues:

Have I incorrectly specified the model?

Also note that if I do not run find_MAP before sampling, my parameter values never change from their starting values.

Thanks!

I have also asked this question here:

The problem is that model_err is too large: after the internal transform, the alpha and beta of the pm.Beta become negative, which is outside the distribution's support.

You can check the values of mu and sd by doing:

y_est.tag.test_value
model_err.tag.test_value

And transform the mu and sd to alpha and beta by doing:

kappa = mu * (1 - mu) / sd**2 - 1
alpha = mu * kappa
beta = (1 - mu) * kappa

This is what PyMC3 does internally.
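To see how that can fail, here is a plain NumPy-free sketch of the same mu/sd-to-alpha/beta conversion (the function name and the example values are made up for illustration, not PyMC3 internals):

```python
def beta_mu_sd_to_alpha_beta(mu, sd):
    # Convert a (mu, sd) parameterization of the beta distribution
    # into the standard (alpha, beta) shape parameters.
    kappa = mu * (1 - mu) / sd**2 - 1
    return mu * kappa, (1 - mu) * kappa

# With a large sd (the kind a HalfNormal(sd=50) prior easily produces),
# kappa goes negative, so alpha and beta fall outside the beta's support:
alpha, beta = beta_mu_sd_to_alpha_beta(mu=0.8, sd=5.0)
print(alpha, beta)  # both negative -> logp is -inf

# With a small sd, the shape parameters are valid:
alpha, beta = beta_mu_sd_to_alpha_beta(mu=0.8, sd=0.1)
print(alpha, beta)  # both positive
```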


Parameterizing the beta distribution via mu and sd is always a bit of a mess. It might work much better if you logit-transform your dataset instead, and then use a normal likelihood.
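For what it's worth, here is a minimal sketch of that idea outside PyMC3, using an ordinary least-squares fit on the logit scale as a stand-in for the normal likelihood (the data here are made up):

```python
import numpy as np
from scipy.special import logit, expit

# Hypothetical data: a predictor tst and a bounded outcome in (0, 1),
# generated from a linear model on the logit scale.
tst = np.array([5.0, 6.0, 6.5, 7.0, 7.5, 8.0])
sleep_ef = expit(-1.0 + 0.3 * tst)

# Logit-transform the bounded outcome, then fit a plain linear model
# (the analogue of a normal likelihood on the transformed scale).
y = logit(sleep_ef)
X = np.column_stack([np.ones_like(tst), tst])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
print(coef)  # recovers roughly [-1.0, 0.3]

# Predictions come back to the (0, 1) scale via the inverse logit.
pred = expit(X @ coef)
```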


Thanks guys! I got it to work with a smaller sd.

@junpenglao, maybe there should be something in the documentation on the limits of sd (currently it only states sd > 0). Is this the kind of thing I would open a GitHub issue for?

@aseyboldt, I think I'm going to stick with the beta regression, because the units of my data are readily interpretable and it's working now. However, I'm curious: I believe log-transforming the data would solve the bounds issue on the mean, but would the model error be accurate with Normal(mu > 0.8 or mu < 0.2, sd=model_err), i.e. when mu gets close to the bounds?


Yeah: for a beta distribution with mean mu, the variance must be smaller than mu * (1 - mu). A PR to edit the docstring, and also a better check for this, would be great.
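A quick numeric check of that constraint (the values here are illustrative, not from the model above):

```python
# For a beta distribution with mean mu, the variance sd**2 must
# satisfy sd**2 < mu * (1 - mu), otherwise alpha/beta go negative.
mu, sd = 0.8, 5.0  # the kind of sd a HalfNormal(sd=50) prior can produce
print(sd**2 < mu * (1 - mu))  # False: invalid parameterization

mu, sd = 0.8, 0.1
print(sd**2 < mu * (1 - mu))  # True: valid
```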


PR is here: https://github.com/pymc-devs/pymc3/pull/2534
I just edited the docstring.


@bdyetton Thanks for the PR!

Just for clarification: I didn’t mean a log transform, but a logit transform.
I'm not sure what "accurate" means in this context; a logit-normal model is a bit different from the beta distribution, but unless you have a good reason why the beta would somehow be the right one, I don't see a clear advantage to using it. They are in fact quite similar: a normal distribution and the logit of a beta aren't that different, though the beta has somewhat longer tails. But the interpretation of the mean and standard deviation is, I think, a bit more natural in the logit-normal case.


Just chipping in with a “hooray!!!” for a PR to PyMC3, @bdyetton!
