Hello, I recently purchased Allen Downey’s Think Bayes - I’m working through Chapter 20 - Counting Cells (the chapter can be opened in Google Colab here). There is a question I’m pondering that affects some of my models. It seems that we do not want all random variables to be adaptive (i.e., have changeable parameters). For example, in the snippet below, we want yeast_conc
to be adaptive because that is what we are inferring. But we do not want shaker1_vol
to be adaptive because it is measured outside of the model; it is a stochastic, but we don’t want mu
and sd
to be changed by sampling.
with pm.Model() as model:
yeast_conc = pm.Normal("yeast conc",
mu=2 * billion, sd=0.4 * billion)
shaker1_vol = pm.Normal("shaker1 vol",
mu=9.0, sd=0.05)
My question is two-part, one related to this example, and one general:
- For anyone who has worked this example, do you agree that
shaker1_vol
should be a frozen/ non-adaptive distribution rather than a random variable, as presently coded?
- How would we go about coding a non-adaptive stochastic variable into a model in pymc3? I considered adding a hierarchical layer where
mu
and sd
are really narrow truncated priors to constrain shaker1_vol
. But maybe there is a better way if this is a common problem.
Thanks!
Greg
Haven’t looked through this particular example. so take this for what it’s worth. If shaker volume is measured, then you can simply use the measured values. No need for a parameter at all. The fact that volume is/was a random variable is not relevant if you have realized values of that variable (unless I am misunderstanding something).
Thanks for the response! Yes, the shaker volume is measured (at 9.0), but there is measurement error, so the shaker1_vol
variable is included in the model to add noise (~0.5%). So another way to frame my question is whether there is a typical way to add a set amount of noise to a model. From my understanding, the code above will allow the “noise term” to drift during sampling and might hone in on more or less noise than 0.5%. Maybe I’m missing something, though. . .
So I guess it depends on what exactly you mean by this. Are the measurements assumed to reflect/include this noise? If so, how do you know the noise is normally distributed? How do you know what the mean and standard deviation of this noise is? If these questions are answered using prior knowledge rather than data, then you would typically allow the data to inform the answers to these questions. In other words, you can’t know what the answers to these questions are, so you are uncertain, so you shouldn’t be particularly surprised when your posterior differs from your prior. You might, for example, learn that the data implies a SD that is much tighter than you previously expected.
Sorry for the slow response - I think you would need to look at the Downey example for this to make sense (Colab link above). He is fitting a single data point with a bunch of priors based on the precision of laboratory procedures, and the goal is to calculate the error. I think that the pm.Normal("shaker1 vol", .. . )
variable could be a misspecification because mu
and sigma
can drift during sampling, but they should be fixed based on the setup of the question.
Could be. But I read the model as implying that the volumes of the shakers and the samples from each shaker are uncertain quantities. So the “specification” (prior) suggests that these quantities are normally distributed with a specific mean and SD, but the modeler is open to the possibility of data altering any of these beliefs.
I still have not figured out if it is possible to “freeze” a distribution so that the posterior is the same as the prior. However, I explored how a careful specification of priors can help. I added a notebook here.
Check out @junpenglao’s answers in this thread. Does that give you what you need?
Thank you Christian. That thread with the updated API yields the following code, which will fix the shaker volume variables so they do not update:
import theano.tensor as tt
rand_stream = tt.random.utils.RandomStream()
with pm.Model() as model:
yeast_conc = pm.Normal("yeast conc",
mu=2 * billion, sd=0.4 * billion)
shaker1_vol = pm.Deterministic("shaker1 vol",
rand_stream.normal(9.0, 0.05))
shaker2_vol = pm.Deterministic("shaker2 vol",
rand_stream.normal(9.0, 0.05))
shaker3_vol = pm.Deterministic("shaker3 vol",
rand_stream.normal(9.0, 0.05))
I added similar code to an example in the notebook link (previous post) in case others need to solve the same problem.
Thanks,
Greg
2 Likes