I want to model Bayesian updating in PyMC3, but with a probability distribution for the observations, not actual observed data.
E.g. for some parameters N (sample count), \alpha, and \beta (prior distribution parameters), we know the analytical expression for updating a Beta prior with Binomial data:
X \sim B(N, \theta_{true})
\theta_{prior} \sim Beta(\alpha, \beta)
\theta_{post} \sim Beta(\alpha + X, \beta + N - X)
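(For concreteness, here is that conjugate update spelled out in scipy for one made-up draw of X; the numbers are arbitrary:)

from scipy import stats

# made-up example values, just to instantiate the formulas above
N, theta_true = 20, 1/3
alpha, beta = 1, 1

x = stats.binom(N, theta_true).rvs(random_state=42)   # one concrete draw of X
theta_post = stats.beta(alpha + x, beta + N - x)      # analytical posterior for that X
print(x, theta_post.mean(), theta_post.std())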
How would I simulate this update step in PyMC3 if I don’t know an analytical expression for \theta_{post}?
I would like to do something like the following:
with pm.Model() as model:
    obs_distribution = pm.Binomial("X", N, theta_true)
    theta = pm.Beta("theta", alpha, beta)
    obs_sampled = pm.Binomial("y", N, theta, observed=obs_distribution)
But of course this doesn’t work, since observed= needs a concrete array of values, not a random variable.
Welcome!
Could you say more about the scenario you are envisioning? You state that you don’t have “actual observed data”. Can you/do you want to simulate data from the binomial?
Sorry for the confusing question. I do want to simulate data from the binomial.
The question was conflating two concepts:
- Modeling outcomes for one random sample from B(N, \theta_{true}). In this case I can just draw a sample from that distribution and plug it into observed=sampled_data. That’s not what I want.
- Marginalizing out the binomial RV to obtain samples for \theta_{posterior} | \theta_{true}, N, \alpha, \beta. This is what I’m interested in, but I’m not sure how to model that with pymc3.
For my problem, I can get the marginalized posterior distribution above as \sum_{i=0}^N P(\theta | X=i, N, \alpha, \beta) P(X=i | N, \theta_{true}). But first, that sum is cumbersome, and second, that approach falls apart if the posterior is not analytically tractable.
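To make that sum concrete, here is a rough scipy sketch of it (this only works because, in the conjugate case, each P(\theta | X=i, N, \alpha, \beta) happens to be a known Beta density):

import numpy as np
from scipy import stats

N, theta_true, alpha, beta = 20, 1/3, 1, 1
theta_grid = np.linspace(0, 1, 501)

# weight each conditional posterior Beta(alpha + i, beta + N - i)
# by the probability of observing X = i under Binomial(N, theta_true)
weights = stats.binom(N, theta_true).pmf(np.arange(N + 1))
marginal_post = sum(
    w * stats.beta(alpha + i, beta + N - i).pdf(theta_grid)
    for i, w in enumerate(weights)
)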
So my question is: how should I model my problem so that pymc3 generates samples from \theta_{posterior} | \theta_{true}, N, \alpha, \beta? I’m specifically interested in an approach that will work even if I’m not working with a conjugate prior.
I think this sort of thing could work. Note that the conjugacy is ignored/irrelevant here.
import arviz as az
import matplotlib.pyplot as plt
import pymc3 as pm
import theano.tensor as tt

rand_stream = tt.random.utils.RandomStream(1234)

theta_true = 1/3
N = 20
alpha = 1
beta = 1

with pm.Model() as model:
    # symbolic draw from Binomial(N, theta_true), used in place of fixed observed data
    credible_data = rand_stream.binomial(N, theta_true)
    theta = pm.Beta("theta", alpha=alpha, beta=beta)
    y = pm.Binomial("y", N, theta, observed=credible_data)
    trace = pm.sample(return_inferencedata=True)

print(az.hdi(trace))
az.plot_posterior(trace)
plt.show()
There could be other, potentially more straightforward, ways of doing this. But that might get you started.
Well, MCMC is all about doing integration without analytical expressions. The (histogram) marginal distribution of a parameter x, given a well-sampled posterior, is found in trace["x"] (that is, the sampled values of x, ignoring all other variables).
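For example, with the trace from the snippet above (assuming return_inferencedata=True as in @cluhmann’s code, so trace is an InferenceData; with a plain MultiTrace it would just be trace["theta"]):

import matplotlib.pyplot as plt

theta_samples = trace.posterior["theta"].values.flatten()  # pool all chains and draws
plt.hist(theta_samples, bins=50, density=True)
plt.show()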
Are you trying to do MCMC inside MCMC? I couldn’t quite follow your idea
Let’s say I have a model: \theta \sim Beta(\alpha, \beta) and X_{model} \sim Binom(n, \theta). Now, if I have some concrete observations for X, I can pass these into pm.Binomial(..., observed=x_obs). Easy.
But what if I don’t have concrete values for X, and instead know that in reality it’s sampled from another distribution X_{obs} \sim Binom(m, \theta_{true})? I can’t (easily, see @cluhmann’s answer) pass a probability distribution into observed=.
@cluhmann: Thanks a lot for your answer. Seems to be working nicely 
I don’t quite understand how it is working yet. I guess that under the hood pymc3 / theano recognizes that RandomStream is not a tensor, but a tensor-generating function? I have never worked with pymc3 or Theano before, so I don’t have a good grasp of the mechanics here.
There’s a bit more information about random streams in the documentation for aesara, the successor to theano (e.g., here and here).
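As a rough illustration of what the stream gives you (this is my understanding under theano-pymc, the theano fork recent pymc3 uses, not an authoritative description of the internals): the binomial node is a symbolic variable whose value is re-drawn each time the graph is evaluated, rather than a fixed tensor.

import theano.tensor as tt

srng = tt.random.utils.RandomStream(1234)
x = srng.binomial(20, 1/3)   # symbolic Binomial(n=20, p=1/3) draw

# each evaluation advances the stream's internal RNG state,
# so repeated evaluations should return different values
print(x.eval())
print(x.eval())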
I have the feeling that if PyMC does anything sensible, it would give you back the prior distribution?
In any case, I would check the outputs very carefully, since you are totally in undefined-PyMC-behavior territory. I have no idea what NUTS does when the logp and its derivatives change at every evaluation.
Data aren’t random, because you already observed them. It doesn’t make sense to set random variables as observed.
MCMC also isn’t an optimization algorithm. You don’t find the maximum of anything; you’re sampling from a fixed model. The posterior distribution is exactly whatever you write down. The posterior distribution also is NOT the likelihood function; it’s the likelihood times the prior. You can set the likelihood as normal, but the posterior will be whatever it is, given the priors and data. If the posterior were simply a Student T, you wouldn’t need MCMC to figure out how to get samples from it!
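In symbols: P(\theta | y) \propto P(y | \theta) P(\theta), and MCMC gives you samples from that, whatever form it takes.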
Finally, what do you mean by “checking if samples lie within the Student T distribution of the test data”? The Student T distribution has support on the whole real line, so everything lies within a Student T distribution.