Hi everyone,
I’m pretty new to PyMC3 and Bayesian computation in general, and am struggling with some of the
implementation details of a model I’m trying to build.
What I’m trying to accomplish, very concisely, is the following: given an array of betas defined like so
with pm.Model() as model:
betas = pm.Beta('betas', alpha=1.5, beta=1, shape=3)
Is it possible to set beta[3] to be equal to 1 exactly (i.e. not have it as a distribution)?
I actually have a second dimension to these betas, and depending on the value of the second dimension I would like to exclude some of the betas in the first dimension.
Is this doable? Or should I simply create a list of lists, and define each Beta distribution variable separately?
Thanks!
Omri
1 Like
If you’re setting Beta 3 to a point value then its not a distribution anymore like you said. In this case why not do
betas = pm.Beta('betas', alpha=1.5, beta=1, shape=2)
since it sound like you only have two distributions (and one constant)? Can you share what the rest of your model looks like?
Hi Ravin,
thank you for your reply! Actually I tried to put in the most concise part of the model, since the general question is about manipulating these arrays of distributions.
The reason I try to have them together, instead of separate, is that in my input data I have an index variable for each row that says which of these distributions should be used. I want to make it easy to use as in the hierarchical model examples in the pymc3 documentation, e.g. where a county_idx is used to specify which parameter fits with which row of the data. This is why I would actually like to be able to compose these arrays from different distributions, or sometimes constant values.
I may even want to set some of these distributions to have different initial parameters.
Let me then refine the question: how can I get an array of distributions which can then be accessed using an index, as is done in the Radon example in the documentation?
To be specific, assume I have the following data and model:
data = pd.DataFrame(dict(a=[1,2,3,0,2,3,1,...]))
with pm.Model() as model:
alpha0 = pm.Beta('alpha0', alpha=1, beta=1.5)
alpha1 = pm.Normal('alpha1', mu=0, sigma=4)
alpha2 = pm.Normal('alpha2', mu=0, sigma=2)
alpha3 = pm.Deterministic('alpha3', 1)
# How can i do now something like:
alphas = [alpha0, alpha1, alpha2, alpha3]
# such that I can then do
alpha = alphas[data.a]
I hope this clarifies it a bit. The reason being, I will probably have different domain knowledge for each of the alphas, which may allow me to set some of them to constants, or have tighter prior distributions, but they all are mutually exclusive based on the data.
Thanks!
Omri
You could initialize all distributions and then use a tt.switch
to replace the ones you don’t want by the constant value.
import theano.tensor as tt
with pm.Model() as model:
betas = pm.Beta('betas', 1, 1, shape=n)
used_betas = tt.switch(index_expression, betas, 1)
Where index_expression
is a variable that evaluates to 1/True for the betas you want to use and 0/False for those you want replaced by the constant of 1. You can also use different constants for each replaced beta but then you need a vector as large as the size of betas
.
Theano might actually be clever enough to know your switch is constant and not even include the non-used betas in the final graph expression.
1 Like
Oh I see than in your last example you actually want to reuse the variables? You have to be careful to make sure it is doing what you think it does. If you stack 3 variables as in:
dists = pm.math.stack([alpha1, alpha2, alpha3])
used = dists[np.array([0, 0, 0, 1, 1, 2, 2, 2])
keep in mind that the used distributions will not be “copies” of each other but the exact same distribution used repeatedly. This means you cannot infer different values for used[0:3], because they are exactly the same object.
Finally you can use a very narrow prior instead of having a real constant.
with pm.Model() as m:
betas = pm.Beta('betas', alpha=[1, 1, 1, 100, 1], beta=[1, 1, 1, 0.1, 1], shape=5)
Your fourth beta will behave basically as a constant of 1. This might be a bit though for the sampler though.
Hi Ricardo,
thanks for the thorough responses! I am especially intrigued by the stack option you presented. I am aware that it is then the same object being matched - that’s exactly what I want. I assume that the effect of each alpha is the same, and I wish to infer that alpha based on the data. It is in this respect exactly the same as in the radon hierarchical model example, where the county effect is the same effect, not different copies of it. At least, that’s what I think it is.
what I was missing was how to stack them, and it seems that pm.math.stack
is the way to go for me. I’ll check it out and report back in case I have more questions.
Thank you very much for the help!
Omri