Generating data from marginalized model with fixed value for the parameter of interest

Hello,

I’m new to PyMC so I apologize in advance if my question is trivial or if it has already been documented somewhere (I took some time to search but it was not successful, hence my message on this forum).

I’d like to do some inference in the context of radioactivity measurement. In its simplest form, the model I want to work with is the product of two Poisson probabilities, schematically:

P(Nb, Ng | a, b, w) = Pois(Nb | λb) · Pois(Ng | λg)

where the rates λb and λg are functions of the parameters a, b, and w.
As you can see, there are:

  • two observables: Nb and Ng
  • three parameters: a, b, and w. The parameter of interest is a; the other two are nuisance parameters.

The priors I’m considering are:

  • uniform priors for b and a
  • normal prior for w: w~Normal(mu=4.1,sigma=0.6)

My attempt at implementing this model is shown in the attached script. Running it gives a good result for the posterior distribution of my parameter of interest a (by “good result” I mean that it agrees with what I obtain with independent software not based on PyMC). As I’m new to PyMC, this script might not be written in its most optimal form, so don’t hesitate to correct it if you see something weirdly implemented :slight_smile:
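For reference, here is a minimal sketch of the kind of model I have in mind. The rate expressions, the observed count for Nb, and the bounds of the uniform priors are placeholders, not the actual values from toymodel.py:

```python
import pymc as pm

nb_obs = 100      # PLACEHOLDER observed count for Nb
ng_obs = 21670    # observed count for Ng, from the post above

with pm.Model() as model:
    # priors: uniform on a and b (bounds are PLACEHOLDERS), normal on w
    a = pm.Uniform("a", lower=0.0, upper=100.0)
    b = pm.Uniform("b", lower=0.0, upper=50.0)
    w = pm.Normal("w", mu=4.1, sigma=0.6)

    # product of two Poisson likelihoods; the rate expressions below are
    # PLACEHOLDERS standing in for the actual lambda_b and lambda_g
    Nb = pm.Poisson("Nb", mu=b * w, observed=nb_obs)
    Ng = pm.Poisson("Ng", mu=a + b * w, observed=ng_obs)

    idata = pm.sample()
```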

What I’d like to do now is:

  1. To draw the observable Ng from the model marginalized over b and w, with a fixed value for the parameter of interest a. That is, I want to draw Ng from this probability law:

     P(Ng | a) = ∫∫ P(Ng | a, b, w) p(b) p(w) db dw

     At the end of this step, the goal is to have a rather large sample of values of Ng: {Ng}.
  2. Once step 1 is done, I’d like to use the sample {Ng} instead of the single value ng=21670 as the “observed data” in my model, and repeat the inference for a.

Any help regarding these questions would be greatly appreciated.

Best regards,
Emmanuel

toymodel.py (800 Bytes)

I don’t quite understand your motivation, but if you want to treat random variables as observables, there is no direct way to do it. One way people seem to have done it is using potentials.

I don’t know if there are any newer methods for this. As for what you are asking: are you imagining a situation where you do all of this on top of the old model, like so?

with pm.Model() as model:
    # previous model's priors (b, a, w) and likelihoods (P1, P2)
    # RV = random observables sampled from some distribution that depends on a
    # P1(Ng=RV | a, b, w) used as a likelihood

If this is what you want to do, in my very limited Bayesian modelling experience, this looks like a very unexpected form for a model.

On the other hand, if you want to sample from a distribution first and then use that “static” data to train your model, I imagine that wouldn’t be very different from your first model (modulo the details of constructing that distribution)? Is the question how to construct Pmarginal?

Hello,

I’ll try to explain my motivations more clearly.

I’d like to know what the data Ng would look like under the hypothesis that the parameter a is equal to some value (for example a=0, which for me corresponds to the “background only” hypothesis, or null hypothesis let’s say). In my understanding of Bayesian statistics, this should be done by building a “marginal model” with a=constant, where the marginalization is done over the nuisance parameters.

If possible, I’d like to build this marginal model from the model that I already have (given in the script I attached in my first message).

To me, this “marginal model” thus describes the probability law of Ng given a (it’s what I called Pmarginal). So yes, my question is essentially about how to construct Pmarginal by fixing the value of some parameters. In other words, how do we marginalize over some parameters in PyMC while fixing the value of other parameters?

Once I have this marginal model, I’d like to generate the Ng observable from it and then inject this new set of Ng values (generated under the hypothesis a=constant) as the observed data in the non-marginal model. This new set of Ng values is what we could call a pseudo-dataset under the null hypothesis.

The ultimate goal is to have a posterior like this:

P(a|Ng(a=0))

where Ng(a=0) represents my null hypothesis.

Perhaps the “do” operator is the tool for the job?

The functionality is already in PyMC

https://www.pymc.io/projects/docs/en/stable/api/model/generated/pymc.model.transform.conditioning.do.html
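Roughly, it could be combined with prior predictive sampling to get draws of Ng at a fixed a. A sketch (untested; it assumes `model` is the model from toymodel.py, with variables named a, b, w, and Ng):

```python
import pymc as pm
from pymc.model.transform.conditioning import do

# `model` is assumed to be the model built in toymodel.py,
# with free variables "a", "b", "w" and observed variable "Ng"
null_model = do(model, {"a": 0.0})  # intervene: fix the parameter of interest

with null_model:
    # b and w are still drawn from their priors, so the prior predictive
    # draws of Ng are samples from P(Ng | a=0), marginalized over b and w
    idata = pm.sample_prior_predictive(draws=5000, random_seed=1)

ng_pseudo = idata.prior_predictive["Ng"].values.ravel()
```

The resulting ng_pseudo could then be attached as the observed data in the original (non-intervened) model, e.g. by rebuilding the model with observed=ng_pseudo.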

I don’t think I know (in my limited experience) how to do it starting from the model that you already have, but if I understand your question, I would simply build Pmarginal myself as a histogram and then sample data from it. I’ve never used Ricardo’s suggestion of the do operator, though that also seems like an interesting option. In more detail:

1- Since you know the exact forms of P1 and P2, and your priors are simple (the uniform ones basically just set the boundaries of your integrals), I would first check whether the integral you have given can be calculated exactly (if you are lucky you might get something in terms of incomplete gamma functions). If not, I would integrate it numerically. In both cases you get a density (up to scale) for Pmarginal depending on ng. Using that density as a histogram, you can generate random Ng observables; see the sketch after this list for instance.

2- Then you use these randomly generated observables just as you have used ng in your model. This being said, I have never attempted such a thing myself so I don’t have any feeling on how the resolution you choose for your integration and histogram sampling would affect the results you would get, so that should be something that is also tested.
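Here is a minimal sketch of the numerical-integration-plus-histogram route. The rate function lam_g and the prior bounds on b are purely illustrative assumptions standing in for the actual ones in toymodel.py:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

def lam_g(a, b, w):
    # HYPOTHETICAL: stands in for the actual Poisson rate of Ng in toymodel.py
    return a + b * w

a0 = 0.0                                   # fixed value of the parameter of interest
b_grid = np.linspace(0.0, 50.0, 101)       # ASSUMED support of the uniform prior on b
w_grid = np.linspace(4.1 - 4 * 0.6, 4.1 + 4 * 0.6, 101)
w_weight = stats.norm.pdf(w_grid, loc=4.1, scale=0.6)

# tabulate the (unnormalized) marginal pmf of Ng:
# P(ng | a0) ∝ sum over the (b, w) grid of Pois(ng | lam_g) * p(b) * p(w)
B, W = np.meshgrid(b_grid, w_grid, indexing="ij")
rates = lam_g(a0, B, W).ravel()
weights = np.broadcast_to(w_weight, B.shape).ravel()  # uniform prior on b -> constant factor

ng_vals = np.arange(0, int(2 * rates.max()))
pmf = stats.poisson.pmf(ng_vals[:, None], rates[None, :]) @ weights
pmf /= pmf.sum()

# sample a pseudo-dataset {Ng} from the tabulated marginal distribution
ng_pseudo = rng.choice(ng_vals, size=5000, p=pmf)
```

For step 2, these ng_pseudo values would then take the place of ng=21670 as the observed data of Ng when rebuilding the model.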

ps:
Note that in pymc_experimental there is marginalization over finite discrete parameters

https://www.pymc.io/projects/experimental/en/latest/generated/pymc_experimental.MarginalModel.html

but I suspect that would not be sufficient for you, since you want to marginalize over continuous parameters. Still, I’m putting it here in case you find it useful somehow.

Thank you all for your answers and ideas. I’ll take a look at your suggestions in the coming days and get back to you once I’ve figured out how to do this.