Applying a selection function

Hello everyone,

I have a complicated problem that involves applying a selection function, and I currently can’t figure out how to do this in PyMC. To give a little more context, I am working with a sample that was selected according to an observed property. I know that I am not observing every possible value, because many existing cases fall below the sensitivity of my experiment, and I need to correct for the missing data. For this I have a selection function, which gives the probability that a value is included in my sample as a function of that value (the selection function looks like an error function).

What I want to do is to draw some values from a distribution and then accept or reject the drawn values according to a given probability that depends on the drawn value (the selection function described above). I tried to do this with “switch” but with no luck. How could I implement that?
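For readers following along, here is a minimal NumPy sketch of the accept/reject process described above. It is only an illustration: the normal population, the `threshold` and `width` values, and the helper name `selection_probability` are all assumptions, not the actual experiment.

```python
import numpy as np
from scipy.special import erf

rng = np.random.default_rng(0)


def selection_probability(x, threshold=1.0, width=0.5):
    """Hypothetical erf-shaped selection function: P(selected | x)."""
    return 0.5 * (1.0 + erf((x - threshold) / (np.sqrt(2.0) * width)))


# Underlying population (a normal distribution is assumed for illustration).
population = rng.normal(loc=1.0, scale=1.0, size=100_000)

# Keep each drawn value with probability selection_probability(value).
accepted = rng.uniform(size=population.size) < selection_probability(population)
observed = population[accepted]

print(f"kept {observed.size} of {population.size} draws")
print(f"population mean {population.mean():.3f}, observed mean {observed.mean():.3f}")
```

The observed sample is biased towards larger values, which is the effect the model has to correct for.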

Can you share some numpy code that generates data according to your model?

Hi @ricardoV94

Sure. Thanks for your answer.

Sounds like you have an upper truncated observation process. To handle it correctly you’ll need to figure out the CDF of your likelihood. The density of each observation would then need to be renormalized by the CDF evaluated at the truncation point.

You can check the pdf description of Truncated distributions in PyMC: pymc.Truncated — PyMC dev documentation

If you click on source and scroll down you’ll see how it’s implemented in truncated_logprob
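To make the renormalization idea concrete, here is a small sketch that uses SciPy rather than PyMC, so it is not the code behind truncated_logprob. It assumes a normal likelihood, and `truncated_normal_logpdf` is a hypothetical helper: the untruncated log-density minus the log of the probability mass between the bounds.

```python
import numpy as np
from scipy.stats import norm, truncnorm


def truncated_normal_logpdf(x, mu, sigma, lower=-np.inf, upper=np.inf):
    """Log-density of a normal renormalized to the interval [lower, upper]."""
    x = np.asarray(x, dtype=float)
    log_density = norm.logpdf(x, loc=mu, scale=sigma)
    # Probability mass that survives the truncation, from the CDF at the bounds.
    mass = norm.cdf(upper, loc=mu, scale=sigma) - norm.cdf(lower, loc=mu, scale=sigma)
    inside = (x >= lower) & (x <= upper)
    return np.where(inside, log_density - np.log(mass), -np.inf)


x = np.array([0.5, 1.0, 2.5])
manual = truncated_normal_logpdf(x, mu=1.0, sigma=1.0, lower=0.0)

# Cross-check against SciPy's truncated normal (bounds are in standardized units).
reference = truncnorm.logpdf(x, a=(0.0 - 1.0) / 1.0, b=np.inf, loc=1.0, scale=1.0)
print(np.allclose(manual, reference))
```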

Thanks for the answer @ricardoV94. The issue I see here is that this puts a sharp cut below and above some pre-defined value, whereas in my case I would like to implement a gradual cut, as shown in the above code.

The truncation point does not need to be fixed; it can also be an unobserved variable with an arbitrary prior, which I think represents your model.
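One way to read this, sketched below under assumptions: if each observation has its own hard cut point drawn from a normal distribution, then the probability of passing the cut as a function of the value is a normal CDF, i.e. an erf-shaped gradual selection. The normal population, the parameter values, and the helper name `selected_normal_logpdf` are hypothetical. For a normal population the cut point can be integrated out in closed form.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(1)
mu, sigma = 1.0, 1.0          # assumed population parameters
threshold, width = 1.5, 0.5   # assumed centre and spread of the random cut point


def selected_normal_logpdf(x, mu, sigma, threshold, width):
    """Log-density of x given selection, with the random cut point integrated out."""
    log_density = norm.logpdf(x, loc=mu, scale=sigma)
    # P(cut point < x) for a Normal(threshold, width) cut point.
    log_select = norm.logcdf((x - threshold) / width)
    # Overall selected fraction: P(value - cut point > 0).
    log_norm = norm.logcdf((mu - threshold) / np.hypot(sigma, width))
    return log_density + log_select - log_norm


# Monte Carlo check: a random hard cut reproduces the analytic selected fraction.
values = rng.normal(mu, sigma, size=200_000)
cut_points = rng.normal(threshold, width, size=values.size)
selected_fraction = np.mean(values > cut_points)
analytic_fraction = norm.cdf((mu - threshold) / np.hypot(sigma, width))
print(selected_fraction, analytic_fraction)
```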

Whether the model is identifiable or not I am not sure.

If every observation has an independent truncation point you might not have enough information to determine your parameters. If they share a common prior/hyper-parameters then it might be fine. You should be able to figure this out with parameter recovery experiments.
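A parameter recovery experiment could look roughly like the sketch below. It is a simplified stand-in, not a PyMC model: it assumes a normal population, a selection function that is known exactly (`threshold` and `width` are fixed, hypothetical values), and a maximum likelihood fit with SciPy instead of posterior sampling. The idea is to simulate with known parameters, apply the selection, fit, and check that the fit lands near the truth.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

rng = np.random.default_rng(2)
true_mu, true_sigma = 1.0, 1.0
threshold, width = 1.0, 0.5  # selection function assumed known here

# Simulate the population and apply the gradual selection.
population = rng.normal(true_mu, true_sigma, size=100_000)
keep = rng.uniform(size=population.size) < norm.cdf((population - threshold) / width)
observed = population[keep]


def negative_log_likelihood(params):
    """Selection-corrected negative log-likelihood for (mu, log sigma)."""
    mu, log_sigma = params
    sigma = np.exp(log_sigma)
    log_density = norm.logpdf(observed, loc=mu, scale=sigma)
    log_select = norm.logcdf((observed - threshold) / width)
    # Normalizing constant: overall probability of being selected.
    log_norm = norm.logcdf((mu - threshold) / np.hypot(sigma, width))
    return -np.sum(log_density + log_select - log_norm)


# Start from the naive (biased) estimates and let the optimizer correct them.
start = np.array([observed.mean(), np.log(observed.std())])
fit = minimize(negative_log_likelihood, start, method="Nelder-Mead")
recovered_mu, recovered_sigma = fit.x[0], np.exp(fit.x[1])

print(f"naive mean {observed.mean():.3f}, recovered mu {recovered_mu:.3f}")
print(f"naive std {observed.std():.3f}, recovered sigma {recovered_sigma:.3f}")
```

If the selection parameters were also unknown, they would become extra free parameters in the fit, and this is where repeating the experiment with different settings would show whether they can be recovered.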