Unknown Random Variable

Hi all,

This is perhaps a non-standard question. I am building a model that has an unknown parameter Z, on which I have placed a prior Z \sim N(0, 1). The model has several other parameters, \theta.

Usually, inference would compute p(Z, \theta | \mathcal{D}). However, even though \mathcal{D} might statistically provide evidence about the value of Z, I want to force the posterior over Z to be the same as the prior; put differently, I want to compute p(\theta | \mathcal{D}, Z \sim N(0, 1)). The reason I want to do this is that the model is mis-specified, and so I don’t think the evidence should be used to choose Z.

If I build my model as usual, the posterior on Z will not be equal to the prior, because \mathcal{D} does provide evidence. The most naive way of doing this would be to sample from p(Z), compute p(\theta | \mathcal{D}, Z) for each draw, and average the results. This would require many runs of NUTS, so I was wondering whether there is a way to do this using PyMC3?

I think this amounts to somehow removing Z from the computational graph, but I’m not sure how to do this.
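To make that concrete, the closest thing I can see in PyMC3 is to pass Z in as data rather than as a free random variable, although that still means one NUTS run per prior draw. A minimal sketch, assuming a recent PyMC3 with pm.Data (the model and variable names below are made up for illustration, not my actual model):

import numpy as np
import pymc3 as pm

obs = np.random.randn(50)  # placeholder data standing in for \mathcal{D}

def sample_given_z(z_value):
    # Z enters the graph as a constant (pm.Data), so it has no prior or posterior;
    # NUTS only samples theta conditioned on the supplied value of Z.
    with pm.Model():
        z = pm.Data('z', z_value)
        theta = pm.Normal('theta', mu=0.0, sigma=1.0)
        pm.Normal('obs', mu=theta + z, sigma=1.0, observed=obs)
        return pm.sample(progressbar=False)

# Pooling over prior draws Z_s ~ N(0, 1) still needs a separate NUTS run per draw.
traces = [sample_given_z(z_s) for z_s in np.random.randn(10)]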

Alternatively:

p(\theta | \mathcal{D}) = \int p(\theta, Z | \mathcal{D}) dZ = \int p(\theta| \mathcal{D}, Z) p(Z | \mathcal{D}) dZ,

but I want to set p(Z | \mathcal{D}) = p(Z), i.e., to discard the evidence that \mathcal{D} provides about Z. Therefore,

p(\theta | \mathcal{D}) = \int p(\theta | \mathcal{D}, Z) p(Z) dZ,

which can be computed by sampling Z from the prior, running NUTS for each sampled value, and then averaging across the runs. I would like to achieve this with a single NUTS run.
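In other words, the naive procedure is just a vanilla Monte Carlo estimate of this last integral,

p(\theta | \mathcal{D}) \approx \frac{1}{S} \sum_{s=1}^{S} p(\theta | \mathcal{D}, Z_s), \quad Z_s \sim N(0, 1),

i.e., pool the NUTS samples of \theta obtained with Z held fixed at each prior draw Z_s.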

Thanks!

Perhaps this is trivial and not very useful, but if you run NUTS and obtain samples of both \theta and Z, then simply ignoring the samples of Z leaves you with samples of \theta drawn from the marginal distribution p(\theta \vert \mathcal{D}); they are not specific to any single value of Z. That marginal distribution is still affected by the choice of prior via the integral you listed, however, so a N(0,10) prior will give different results from a N(0,1) prior.

Also, note that sampling Z, fixing its value in the model, and running NUTS for each of these sampled values should give you the same results as just running NUTS once on the joint model. If you do the former, you’re mixing MCMC and vanilla Monte Carlo, but you won’t be changing anything fundamental about the target distribution. Edit: not right.

Hi Chris,

Thanks for the input.

You are right that ignoring Z amounts to having samples from p(\theta | \mathcal{D}). However, this marginalises over the posterior of Z, rather than the prior over Z, so this isn’t the integral that I actually want to perform.
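Explicitly, dropping the Z samples from a joint NUTS run gives draws from

\int p(\theta | \mathcal{D}, Z) p(Z | \mathcal{D}) dZ,

whereas the integral I want uses p(Z) in place of p(Z | \mathcal{D}).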

Because of this, I don’t believe that the sampling procedure I suggested and a single MCMC run over the joint model should give the same results.

Interesting. You’re right, but I’m not quite sure how to get what you want. For anyone else as confused as I was, the following code produces the desired distribution by brute force (a NUTS run per prior draw) and compares it to the joint-model posterior, so you can see that the two distributions are different.

import pymc3 as pm
import numpy as np
import matplotlib.pyplot as plt

# Draws of the "cut" variable from its N(0, 1) prior.
prior_draws = np.random.randn(100)
traces = []

z_observation = 2.0

# Approach 1: fix each prior draw, run NUTS conditioned on it, and pool the runs.
for mu in prior_draws:
    with pm.Model() as individual_model:
        y = pm.Normal('y', mu=mu)
        z = pm.Normal('z', mu=y, observed=z_observation)
        traces.append(pm.sample(progressbar=False))

individual_trace = np.concatenate([trace['y'].ravel() for trace in traces])

# Approach 2: a single joint model in which the data are allowed to update x.
with pm.Model() as combined_model:
    x = pm.Normal('x')
    y = pm.Normal('y', mu=x)
    z = pm.Normal('z', mu=y, observed=z_observation)

    combined_trace = pm.sample(draws=20000)

# The two marginal distributions of y visibly differ.
plt.hist(individual_trace, bins=100, density=True, alpha=0.5)
plt.hist(combined_trace['y'], bins=100, density=True, alpha=0.5);
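For this toy model the two distributions can also be worked out by hand (assuming I haven’t slipped in the Gaussian algebra; I write N(m, v) for mean m and variance v). In the combined model, y has marginal prior N(0, 2) and the observation z = 2 has unit variance, so

p(y | z = 2) = N(4/3, 2/3).

In the loop, y given \mu and z = 2 is N((\mu + 2)/2, 1/2), and averaging over \mu \sim N(0, 1) gives

N(1, 3/4),

so the pooled histogram should be centred a little lower and be slightly wider than the combined one.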