How to model Beta distribution with a given prior

huwendi2006 · May 13, 2020, 8:27am

Hi, I have a data list like this:
data = [0.0152, 0.0075, 0.0095, 0.0071, 0.0017, 0.0038, 0.005 , 0.0015,
0.0042, 0.0014, 0.0022, 0.0011, 0.008 , 0.0055, 0.0027, 0.0011,
0.0013, 0.0035, 0.0024, 0.0012, 0.0015, 0.0009, 0.0027, 0.0029,
0.0034, 0.0009, 0.0048, 0.0031, 0.0022, 0.0026, 0.0033, 0.0037,
0.0025, 0.0017, 0.0016, 0.003 , 0.0012, 0.0024, 0.0028, 0.0031,
0.0021, 0.0038, 0.0025, 0.0012, 0.0014, 0.0049, 0.0014, 0.0014,
0.0014, 0.0008, 0.0009, 0.0012, 0.0023, 0.002 , 0.002 , 0.0015,
0.001 , 0.001 , 0.0023, 0.0015, 0.0025, 0.0014, 0.001 , 0.0008,
0.001 , 0.0013, 0.002 , 0.0013, 0.0017, 0.0023, 0.002 , 0.0008,
0.0011, 0.0008, 0.0014, 0.0013, 0.0018, 0.0013, 0.001 , 0.0023,
0.0024, 0.001 , 0.0008, 0.0013, 0.0015, 0.0014, 0.0012, 0.0008,
0.0009, 0.0009, 0.0009, 0.0012, 0.0012, 0.0015, 0.0015, 0.0016,
0.0009, 0.0008, 0.0009, 0.0017]
All the data are close to zero, and my prior is that the worst data should be less than 0.001. I want to use a model like this:

with pm.Model() as model_beta:
    mu = pm.Uniform('mu',0, 0.001)
    sd = pm.Uniform('sd',0, 1)
    y = pm.Beta('y', mu=mu, sd=sd, observed=data)
    trace_beta = pm.sample(1100)
chain_beta = trace_beta[100:]

PyMC3 returns a "Bad initial energy…" error.
How should I set the prior properly? Thanks.
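A note on the model above: with the mean/sd parametrization, a Beta distribution only exists when sd² < mu·(1 − mu). The sketch below uses the standard conversion from (mu, sd) to the shape parameters (alpha, beta) to show that most values allowed by `sd ~ Uniform(0, 1)` are invalid once mu is at most 0.001, which may be what triggers the error. The helper names are hypothetical and only for illustration; this is not a diagnosis confirmed in the thread.

```python
def beta_shape_from_mean_sd(mu, sd):
    """Convert a (mean, sd) pair to Beta shape parameters (alpha, beta)."""
    kappa = mu * (1.0 - mu) / sd**2 - 1.0
    return mu * kappa, (1.0 - mu) * kappa


def max_valid_sd(mu):
    """Upper limit on sd for a Beta with mean mu: sqrt(mu * (1 - mu))."""
    return (mu * (1.0 - mu)) ** 0.5


# Example point: the midpoints of the two Uniform prior ranges in the model.
alpha, beta = beta_shape_from_mean_sd(mu=0.0005, sd=0.5)
print(alpha, beta)          # both negative, so this is not a valid Beta
print(max_valid_sd(0.001))  # about 0.0316, far below the prior's upper limit of 1
```

Under this reading, a much tighter prior on sd (well below 0.03 for these means) would at least keep the shape parameters positive.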

AlexAndorra · May 14, 2020, 8:35am

Hi,
I think the Beta will have a hard time modeling your data, as they are super close to zero, which is the Beta’s lower bound.
Maybe I’d transform the data so that they span a larger range, like [-1, 1], and use a Normal likelihood. You could also see whether log-transforming or exponentiating the data helps.
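The two transformations mentioned here could look roughly like the following. This is a minimal sketch in plain Python: `rescale` is a hypothetical helper that does a min-max rescaling based on the observed minimum and maximum (an assumption, other rescalings are possible), and `sample` is just the first few values from the question.

```python
import math


def rescale(values, lo=-1.0, hi=1.0):
    """Min-max rescale values so the smallest maps to lo and the largest to hi."""
    vmin, vmax = min(values), max(values)
    span = vmax - vmin
    return [lo + (hi - lo) * (v - vmin) / span for v in values]


# First few observations from the question, for illustration only.
sample = [0.0152, 0.0075, 0.0095, 0.0071, 0.0017, 0.0038, 0.005, 0.0015]

scaled = rescale(sample)                # now spans [-1, 1]
logged = [math.log(v) for v in sample]  # requires strictly positive values

print(min(scaled), max(scaled))
print(logged)
```

Either transformed list could then be passed as `observed` to a Normal likelihood. Note that the log transform fails on exact zeros.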

huwendi2006 · May 15, 2020, 2:38am

Thanks AlexAndorra,
All the data I have are either very close to zero, like the values I gave, or very close to 1, like this:
[0.9985, 0.9987, 0.9993, 0.9989, 0.9993, 0.9991, 0.9993, 0.9983,
0.9983, 0.9995, 0.9991, 0.9991, 0.9992, 0.9995, 0.9993, 0.9995,
0.9995, 0.9992, 0.9994, 0.9992, 0.9995, 0.9992, 0.9994, 0.9995,
0.9993, 0.9992, 0.9993, 0.9993, 0.9993, 0.9993, 0.9993, 0.9995,
0.9983, 0.9995, 0.9995, 0.9995, 0.9995, 0.9966, 0.9992, 0.9992,
0.9992, 0.9992, 0.9991, 0.9991, 0.9992, 0.9994, 0.9991, 0.9989,
0.9986, 0.999 ]
I tried a Student-t first, but I don’t know how to truncate a t or Normal distribution to [0, 1]. As for the logit, my data contain lots of 0s and 1s, so I chose the Beta distribution.
Log-transforming doesn’t seem useful in my case, since it still has a boundary.

AlexAndorra · May 15, 2020, 9:29am

I’m not sure the problem is the boundaries per se. Rather, 0 and 1 are the limits of the Beta’s support, so I’m guessing the sampler will have a hard time there, especially since your data are very close to each other (it will be very hard for the sampler to distinguish between 0.9992 and 0.9991, for instance).
As I said above, something I’d try is transforming the data so that they span a larger range, like [−1,1], and using a Normal likelihood. If you have two distinct populations, as seems to be the case (one centered on 0, the other on 1), then extending to a Normal mixture could be worth investigating.

huwendi2006 · May 16, 2020, 2:37am

Hi Alex,
I’m using a logit transform and a t distribution now. It’s not a great way, but it could work. Thanks.
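The thread does not show the final code, so the following is only a sketch of what a logit transform could look like when the data contain exact 0s and 1s. Clipping to `[eps, 1 - eps]` is an assumed workaround (the logit is undefined at 0 and 1), and the `eps` value and the helper names are hypothetical.

```python
import math


def logit(p, eps=1e-6):
    """Logit of p, with p clipped to [eps, 1 - eps] so that 0 and 1 stay finite."""
    p = min(max(p, eps), 1.0 - eps)
    return math.log(p / (1.0 - p))


def inv_logit(x):
    """Map a value on the logit scale back to (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


# A few values in the style of the two data sets above, plus exact 0 and 1.
values = [0.0, 0.0009, 0.0152, 0.9966, 0.9995, 1.0]
transformed = [logit(v) for v in values]

print(transformed)
print(inv_logit(logit(0.0152)))  # recovers the original value up to rounding
```

The transformed values are unbounded, so a Student-t likelihood can be fit to them without truncation. The choice of `eps` affects how far the exact 0s and 1s sit in the tails.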
