Beta distribution model gives initial evaluation error with scaled observed values

I’m building a model where the posterior is a beta distribution.

To simplify the modeling, I hardcoded the parameters of the beta distribution based on what I saw in the actual data. I had initially tried passing the raw data as observed, but saw in this post that, in lieu of specifying the loc and scale parameters of the beta distribution, the correct approach is to scale the data to [0, 1]. However, this does not seem to be making a difference.

I also have specifications for the alpha and beta priors based on a Laplace distribution, but it appears I need to address this issue first before specifying those priors.

import numpy as np
import pymc3 as pm  # PyMC3 here; `import pymc as pm` on newer versions

data = np.array([1.726567e+06, 1.589836e+06, 1.643981e+06, 1.584314e+06])

# min-max scale the observations into [0, 1]
data_norm = (data - data.min()) / (data.max() - data.min())

with pm.Model() as model:
    pred = pm.Beta("Response", alpha=1.05, beta=0.83, observed=data_norm)
    trace = pm.sample(1000, tune=800)

I get the following error:

SamplingError: Initial evaluation of model at starting point failed!
Starting values:
{}

Initial evaluation results:
Input_var NaN

From what I know, the Beta distribution models numbers in the interval (0, 1). Your array, data, contains values on the order of millions, not in the (0, 1) range. That’s my first guess as to why you are getting NaN values in the error: the log-likelihood of 1.72e6 under a Beta distribution is undefined.
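To illustrate with scipy.stats (the same density, just easier to poke at interactively): the Beta log-density evaluates to minus infinity for any value outside (0, 1), which is essentially what the sampler is choking on at its starting point:

from scipy import stats

# the same Beta(1.05, 0.83) the model uses, evaluated outside and inside (0, 1)
print(stats.beta(a=1.05, b=0.83).logpdf(1.726567e6))  # -inf: outside the support
print(stats.beta(a=1.05, b=0.83).logpdf(0.5))         # finite: inside the support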

I did normalize the data to the [0, 1] range and am using that vector (data_norm) as the observed values.

Ahhh, you’re right! I’m sorry, I wasn’t reading carefully when I responded.

I think the PyMC implementation of the Beta distribution is undefined at exactly 0 and 1, and min-max scaling always produces an exact 0 (at the minimum) and an exact 1 (at the maximum). If you apply a clip transform:

data_norm = np.clip(data_norm, 0.01, 0.99)

then sampling might work without issues.

OK, that seems to have partly done the trick, thank you. I now get a

ValueError: The model does not contain any free variables.

which kind of makes sense, given that PyMC3 then has nothing to sample. Interestingly, once I uncommented my priors (still keeping the alpha and beta parameters hardcoded), it ran.
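For context, the version with the priors uncommented looks roughly like this (just a sketch; the HalfNormal priors are stand-ins, since my real priors are Laplace-based):

with pm.Model() as model:
    # placeholder positive priors so the model has free variables to sample;
    # my actual specification uses Laplace-based priors instead
    alpha = pm.HalfNormal("alpha", sigma=2.0)
    beta = pm.HalfNormal("beta", sigma=2.0)
    pred = pm.Beta("Response", alpha=alpha, beta=beta, observed=data_norm)
    trace = pm.sample(1000, tune=800)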

I think I will update the question entirely to address the next error I get, which occurs when I actually specify my priors.

Please don’t; one of the goals of Discourse is to store the conversations. If the question you asked in the first post has been solved, you should mark it as solved and open a new topic with the new question. If the question is still not solved and needs further back and forth, it should happen in the comments so the conversation is stored. This will allow people who have the same question in the future to read the thread and follow it, hopefully finding the answer to their question too.


A follow-up question that is related to the answer: once I’ve sampled from the posterior predictive, how do I transform back to values on the original scale? The problem is that I scaled based on only 4 values, which constrains the distribution of all my posteriors to lie within the minimum and maximum of those values. It’s not really plausible to have values only within this range, so I may need a strategy other than scaling to the [0, 1] range.
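The only inverse transform I have so far is the min-max scaling run backwards (a sketch; ppc_norm is a hypothetical array of posterior predictive draws on the [0, 1] scale):

# map [0, 1] draws back to the original units by inverting the min-max scaling
# ppc_norm is a hypothetical array of posterior predictive samples
ppc_original = ppc_norm * (data.max() - data.min()) + data.min()

but by construction that can only produce values between the observed minimum and maximum, which is exactly the limitation I mentioned.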