# How to model Beta distribution with a given prior

Hi, I have a data list like this:
data = [0.0152, 0.0075, 0.0095, 0.0071, 0.0017, 0.0038, 0.005 , 0.0015,
0.0042, 0.0014, 0.0022, 0.0011, 0.008 , 0.0055, 0.0027, 0.0011,
0.0013, 0.0035, 0.0024, 0.0012, 0.0015, 0.0009, 0.0027, 0.0029,
0.0034, 0.0009, 0.0048, 0.0031, 0.0022, 0.0026, 0.0033, 0.0037,
0.0025, 0.0017, 0.0016, 0.003 , 0.0012, 0.0024, 0.0028, 0.0031,
0.0021, 0.0038, 0.0025, 0.0012, 0.0014, 0.0049, 0.0014, 0.0014,
0.0014, 0.0008, 0.0009, 0.0012, 0.0023, 0.002 , 0.002 , 0.0015,
0.001 , 0.001 , 0.0023, 0.0015, 0.0025, 0.0014, 0.001 , 0.0008,
0.001 , 0.0013, 0.002 , 0.0013, 0.0017, 0.0023, 0.002 , 0.0008,
0.0011, 0.0008, 0.0014, 0.0013, 0.0018, 0.0013, 0.001 , 0.0023,
0.0024, 0.001 , 0.0008, 0.0013, 0.0015, 0.0014, 0.0012, 0.0008,
0.0009, 0.0009, 0.0009, 0.0012, 0.0012, 0.0015, 0.0015, 0.0016,
0.0009, 0.0008, 0.0009, 0.0017]

All the values are close to zero. My prior belief is that even the worst value should be less than 0.001, and I want to use a model like this:

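```python
import pymc3 as pm

with pm.Model() as model_beta:
    # Priors on the mean and standard deviation of the Beta
    mu = pm.Uniform('mu', 0, 0.001)
    sd = pm.Uniform('sd', 0, 1)
    # Likelihood: Beta parameterized by mean and sd
    y = pm.Beta('y', mu=mu, sd=sd, observed=data)
    trace_beta = pm.sample(1100)

# Discard the first 100 draws
chain_beta = trace_beta[100:]
```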

PyMC3 returns a "Bad initial energy…" error.
How should I set the prior properly? Thanks…

Hi,
I think the Beta will have a hard time modeling your data, as they are super close to zero, which is the Beta’s lower bound.
Maybe I’d transform the data so that they span a larger range, like [-1, 1], and use a Normal likelihood. You could also see whether log-transforming or exponentiating the data helps.
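
Separately from the transformation idea, here is a side note on the original model, offered as a rough sketch rather than a confirmed diagnosis. As far as I know, the `mu`/`sd` form of the Beta is converted to `alpha`/`beta` with the usual moment formulas, which only give positive values when `sd**2 < mu * (1 - mu)`. With `mu` below 0.001 that caps `sd` at roughly 0.03, so most of a `Uniform(0, 1)` prior on `sd` would be invalid, which could be one source of the bad initial energy. The helper names `beta_params` and `max_valid_sd` are made up for illustration.

```python
import math

def beta_params(mu, sd):
    """Moment-match a Beta: return (alpha, beta) for a given mean and sd."""
    kappa = mu * (1 - mu) / sd**2 - 1
    return mu * kappa, (1 - mu) * kappa

def max_valid_sd(mu):
    """Largest sd for which alpha and beta stay positive (exclusive bound)."""
    return math.sqrt(mu * (1 - mu))

# Midpoints of the two Uniform priors in the question (illustrative values)
a_bad, b_bad = beta_params(0.0005, 0.5)

# Same mean with a much smaller sd (an assumed value, not a recommendation)
a_ok, b_ok = beta_params(0.0005, 0.001)

print("sd=0.5   ->", a_bad, b_bad)
print("sd=0.001 ->", a_ok, b_ok)
print("sd must stay below", max_valid_sd(0.001), "when mu = 0.001")
```

If this is the issue, tightening the upper bound of the `sd` prior so it stays under that limit might be worth a try.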

Thanks AlexAndorra,
All the data I have are either very close to zero, like the list I gave, or very close to 1, like this:
[0.9985, 0.9987, 0.9993, 0.9989, 0.9993, 0.9991, 0.9993, 0.9983,
0.9983, 0.9995, 0.9991, 0.9991, 0.9992, 0.9995, 0.9993, 0.9995,
0.9995, 0.9992, 0.9994, 0.9992, 0.9995, 0.9992, 0.9994, 0.9995,
0.9993, 0.9992, 0.9993, 0.9993, 0.9993, 0.9993, 0.9993, 0.9995,
0.9983, 0.9995, 0.9995, 0.9995, 0.9995, 0.9966, 0.9992, 0.9992,
0.9992, 0.9992, 0.9991, 0.9991, 0.9992, 0.9994, 0.9991, 0.9989,
0.9986, 0.999 ]
I tried a Student's t first, but I don’t know how to truncate a t or Normal distribution to [0, 1]. As for logit, my data contain lots of 0s or 1s… so I chose the Beta distribution.
Log-transforming doesn't seem useful in my case, since it still has a boundary.

I’m not sure the problem is the boundaries per se. It’s more that 0 and 1 are the limits of the Beta’s support, so I’m guessing the sampler will have a hard time there, especially since your data are very close to each other (it will be very hard for the sampler to tell 0.9992 from 0.9991, for instance).
As I said above, something I’d try is transforming the data so that they span a larger range, like [−1, 1], and using a Normal likelihood. If you have two distinct populations, as seems to be the case (one centered on 0, the other on 1), then extending to a Normal mixture could be worth investigating.
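
To make the transformation idea concrete, here is a minimal sketch of one possible reading of it: a min-max rescale of the observed values onto [-1, 1], plus a plain log transform for comparison. It uses only the first few values from the question, and the function name `rescale` is hypothetical.

```python
import math

# First few values from the data in the question
sample = [0.0152, 0.0075, 0.0095, 0.0071, 0.0017, 0.0038, 0.005, 0.0015]

def rescale(values, lo=-1.0, hi=1.0):
    """Linearly map values so the smallest lands on lo and the largest on hi."""
    vmin, vmax = min(values), max(values)
    return [lo + (hi - lo) * (v - vmin) / (vmax - vmin) for v in values]

scaled = rescale(sample)                 # now spans [-1, 1]
logged = [math.log(v) for v in sample]   # all negative, spread over a wider range

print(scaled)
print(logged)
```

Either transformed list could then be passed as the observed data to a Normal likelihood. Keep in mind that the rescale depends on the sample minimum and maximum, so posterior estimates need to be mapped back to the original scale before interpreting them.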

Hi Alex,
I'm using a logit transform and a t distribution now. It's not a great approach, but it could work. Thanks!
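
For anyone landing here with the same problem, here is a rough sketch of how a logit transform can cope with exact 0s and 1s. Clipping into `(eps, 1 - eps)` before transforming is a common workaround that I am assuming here; it is not necessarily what was done above, and the value of `eps` is arbitrary.

```python
import math

def logit(p, eps=1e-6):
    """Logit with clipping so that exact 0 and 1 stay finite (eps is an assumed value)."""
    p = min(max(p, eps), 1 - eps)
    return math.log(p / (1 - p))

# Illustrative values, including the exact boundaries
values = [0.0, 0.0015, 0.9985, 1.0]
transformed = [logit(v) for v in values]

print(transformed)
```

The transformed values live on the whole real line, so they can go into an unbounded likelihood such as `pm.StudentT`. The choice of `eps` decides how far out the clipped points land, so it is worth checking that the results don't change much when it is varied.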