I want to model Bayesian updating in PyMC3, but with a probability distribution for the observations, not actual observed data.
E.g. for some parameters N (sample count), \alpha, and \beta (prior distribution parameters), we know the analytical expression for updating a Beta prior with Binomial data:
X \sim B(N, \theta_{true})
\theta_{prior} \sim Beta(\alpha, \beta)
\theta_{post} \sim Beta(\alpha + X, \beta + N - X)
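(For concreteness, here is that conjugate update spelled out in scipy for one made-up draw of X; the numbers are arbitrary:)

from scipy import stats

# made-up example values, just to instantiate the formulas above
N, theta_true = 20, 1/3
alpha, beta = 1, 1

x = stats.binom(N, theta_true).rvs(random_state=42)   # one concrete draw of X
theta_post = stats.beta(alpha + x, beta + N - x)      # analytical posterior for that X
print(x, theta_post.mean(), theta_post.std())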
How would I simulate this update step in PyMC3 if I don’t know an analytical expression for \theta_{post}?
I would like to do something like the following:
with pm.Model() as model:
    obs_distribution = pm.Binomial("X", N, theta_true)
    theta = pm.Beta("theta", alpha, beta)
    obs_sampled = pm.Binomial("y", N, theta, observed=obs_distribution)
But of course this doesn’t work, since observed= needs a concrete array of values, not a random variable.
Welcome!
Could you say more about the scenario you are envisioning? You state that you don’t have “actual observed data”. Can you/do you want to simulate data from the binomial?
Sorry for the confusing question. I do want to simulate data from the binomial.
The question was conflating two concepts:
- Modeling outcomes for one random sample from B(N, \theta_{true}). In this case I can just draw a sample from that distribution and plug it into observed=sampled_data. That’s not what I want.
- Marginalizing out the binomial RV to obtain samples for \theta_{posterior} | \theta_{true}, N, \alpha, \beta. This is what I’m interested in, but I’m not sure how to model that with pymc3.
For my problem, I can get the marginalized posterior distribution above as \sum_{i=0}^N P(\theta | X=i, N, \alpha, \beta) P(X=i | N, \theta_{true}). But first, that sum is cumbersome, and second, that approach falls apart if the posterior is not analytically tractable.
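To make that sum concrete, here is a rough scipy sketch of it (this only works because, in the conjugate case, each P(\theta | X=i, N, \alpha, \beta) happens to be a known Beta density):

import numpy as np
from scipy import stats

N, theta_true, alpha, beta = 20, 1/3, 1, 1
theta_grid = np.linspace(0, 1, 501)

# weight each conditional posterior Beta(alpha + i, beta + N - i)
# by the probability of observing X = i under Binomial(N, theta_true)
weights = stats.binom(N, theta_true).pmf(np.arange(N + 1))
marginal_post = sum(
    w * stats.beta(alpha + i, beta + N - i).pdf(theta_grid)
    for i, w in enumerate(weights)
)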
So my question is: how should I model my problem so that pymc3 generates samples from \theta_{posterior} | \theta_{true}, N, \alpha, \beta? I’m specifically interested in an approach that will work even if I’m not working with a conjugate prior.
I think this sort of thing could work. Note that the conjugacy is ignored/irrelevant here.
import arviz as az
import matplotlib.pyplot as plt
import pymc3 as pm
import theano.tensor as tt

rand_stream = tt.random.utils.RandomStream(1234)

theta_true = 1/3
N = 20
alpha = 1
beta = 1

with pm.Model() as model:
    # symbolic draw from Binomial(N, theta_true), used in place of fixed observed data
    credible_data = rand_stream.binomial(N, theta_true)
    theta = pm.Beta("theta", alpha=alpha, beta=beta)
    y = pm.Binomial("y", N, theta, observed=credible_data)
    trace = pm.sample(return_inferencedata=True)

print(az.hdi(trace))
az.plot_posterior(trace)
plt.show()
There could be other, potentially more straightforward, ways of doing this. But that might get you started.
Well, MCMC is all about doing integration without analytical expressions. The (histogram) marginal distribution of a parameter x, given a well-sampled posterior, is found in trace["x"] (that is, the sampled values of x, ignoring all other variables).
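For example, with the trace from the snippet above (assuming return_inferencedata=True as in @cluhmann’s code, so trace is an InferenceData; with a plain MultiTrace it would just be trace["theta"]):

import matplotlib.pyplot as plt

theta_samples = trace.posterior["theta"].values.flatten()  # pool all chains and draws
plt.hist(theta_samples, bins=50, density=True)
plt.show()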
Are you trying to do MCMC inside MCMC? I couldn’t quite follow your idea
Let’s say I have a model: \theta \sim Beta(\alpha, \beta) and X_{model} \sim Binom(n, \theta). Now, if I have some concrete observations for X, I can pass these into pm.Binomial(..., observed=x_obs). Easy.
But what if I don’t have concrete values for X, and instead know that in reality it’s sampled from another distribution X_{obs} \sim Binom(m, \theta_{true})? I can’t (easily, see @cluhmann’s answer) pass a probability distribution into observed=.
@cluhmann: Thanks a lot for your answer. Seems to be working nicely 
I don’t quite understand how it is working yet. I guess that under the hood pymc3 / theano recognizes that RandomStream is not a tensor, but a tensor-generating function? I have never worked with pymc3 or Theano before, so I don’t have a good grasp of the mechanics here.
There’s a bit more information about random streams in the documentation for aesara, the successor to theano (e.g., here and here).
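As a rough illustration of what the stream gives you (this is my understanding under theano-pymc, the theano fork recent pymc3 uses, not an authoritative description of the internals): the binomial node is a symbolic variable whose value is re-drawn each time the graph is evaluated, rather than a fixed tensor.

import theano.tensor as tt

srng = tt.random.utils.RandomStream(1234)
x = srng.binomial(20, 1/3)   # symbolic Binomial(n=20, p=1/3) draw

# each evaluation advances the stream's internal RNG state,
# so repeated evaluations should return different values
print(x.eval())
print(x.eval())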
I have the feeling that if PyMC does anything sensible, it would give you back the prior distribution?
In any case, I would check the outputs very carefully, since you are totally in undefined-PyMC-behavior territory. I have no idea what NUTS does when the logp and its derivatives change at every evaluation.
Data aren’t random, because you already observed them. It doesn’t make sense to set random variables as observed.
MCMC also isn’t an optimization algorithm. You don’t find the maximum of anything; you’re sampling from a fixed model. The posterior distribution is exactly whatever you write down. The posterior distribution also is NOT the likelihood function; it’s the likelihood times the prior. You can set the likelihood as normal, but the posterior will be whatever it is, given the priors and data. If the posterior were simply a Student T, you wouldn’t need MCMC to figure out how to get samples from it!
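In symbols: P(\theta | y) \propto P(y | \theta) P(\theta), and MCMC gives you samples from that, whatever form it takes.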
Finally, what do you mean by “checking if samples lie within the Student T distribution of the test data”? The Student T distribution has support on the whole real line, so everything lies within a Student T distribution.