Applying a selection function

deckert · September 16, 2022, 3:17pm

Hello everyone,

I am trying to apply a selection function and currently can’t figure out how to do this in PyMC. For context, I am working with a sample that was selected according to an observed property. Many existing cases fall below the sensitivity of my experiment and are therefore never observed, so I need to correct for the missing data. For this I have a selection function, which gives the probability that a value is included in my sample as a function of that value (the selection function looks like an error function).

What I want to do is to draw some values from a distribution and then accept or reject the drawn values according to a given probability that depends on the drawn value (the selection function described above). I tried to do this with “switch” but with no luck. How could I implement that?

ricardoV94 · September 16, 2022, 7:47pm

Can you share some numpy code that generates data according to your model?

deckert · September 17, 2022, 8:34am

Hi @ricardoV94

Sure. Thanks for your answer.

ricardoV94 · September 17, 2022, 10:31am

Sounds like you have an upper truncated observation process. To handle it correctly you’ll need to figure out the CDF of your likelihood. The density of each observation then needs to be divided by the CDF evaluated at the truncation point, so that the truncated density still integrates to one.

You can check the PDF description of truncated distributions in the PyMC dev documentation, under pymc.Truncated.

If you click on source and scroll down you’ll see how it’s implemented in truncated_logprob.
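
The rescaling described in this reply can be sketched in a few lines. This is a minimal illustration and not PyMC's implementation: it assumes a normal likelihood, and the names `upper_truncated_normal_logpdf`, `mu`, `sigma` and `upper` are made up for the example. It uses only the Python standard library so that it stays dependency-free.

```python
import math


def normal_logpdf(x, mu, sigma):
    """Log-density of a normal distribution with mean mu and std sigma."""
    z = (x - mu) / sigma
    return -0.5 * z * z - math.log(sigma) - 0.5 * math.log(2.0 * math.pi)


def normal_cdf(x, mu, sigma):
    """CDF of a normal distribution, written with the error function."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def upper_truncated_normal_logpdf(x, mu, sigma, upper):
    """Log-density of a normal restricted to x <= upper (assumed sharp cut).

    The untruncated log-density is shifted by minus the log of the CDF at
    the truncation point, which renormalises the density on (-inf, upper].
    """
    if x > upper:
        return -math.inf  # values above the cut cannot be observed
    return normal_logpdf(x, mu, sigma) - math.log(normal_cdf(upper, mu, sigma))


# Example call with arbitrary illustrative numbers.
print(upper_truncated_normal_logpdf(0.2, mu=0.0, sigma=1.0, upper=0.5))
```

For a cut from below instead of above, the same idea applies with the normalising term replaced by the probability of exceeding the lower bound, that is 1 minus the CDF at that bound.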

deckert · September 18, 2022, 4:20pm

Thanks for the answer @ricardoV94. The issue I see here is that this puts a sharp cut below and above some pre-defined value, whereas in my case I would like to implement a gradual cut as shown in the above code.
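
To make the contrast with a sharp cut concrete, a gradual cut of the kind described here could be simulated as follows. This is not the code shared earlier in the thread, only an illustrative sketch: it assumes the underlying values are normally distributed and that the selection function is an error function with a hypothetical `threshold` and `width`.

```python
import math
import random


def selection_probability(x, threshold, width):
    """Erf-shaped probability that a value x ends up in the sample.

    Equals 0.5 at the threshold and rises smoothly from 0 to 1 over a
    range set by width. As width shrinks toward zero this approaches a
    sharp cut at the threshold.
    """
    return 0.5 * (1.0 + math.erf((x - threshold) / (width * math.sqrt(2.0))))


def simulate_selected_sample(n_draws, mu, sigma, threshold, width, seed=0):
    """Draw from a normal, then keep each draw with its selection probability."""
    rng = random.Random(seed)
    kept = []
    for _ in range(n_draws):
        x = rng.gauss(mu, sigma)
        if rng.random() < selection_probability(x, threshold, width):
            kept.append(x)
    return kept


# All numbers below are arbitrary and chosen only for illustration.
sample = simulate_selected_sample(20000, mu=0.0, sigma=1.0, threshold=0.5, width=0.3)
print(len(sample), sum(sample) / len(sample))
```

The mean of the kept values is pulled above the true mean of the underlying distribution, which is the bias the selection function is meant to correct.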

ricardoV94 · September 18, 2022, 4:33pm

The truncation point does not need to be fixed. It can also be an unobserved variable with an arbitrary prior, which I think represents your model.
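
One way to read this suggestion, under assumptions that are not stated in the thread: suppose a value x is kept only when it exceeds its own truncation point T, and T has a normal prior with mean `t0` and standard deviation `w`. Integrating T out gives P(T <= x) = Phi((x - t0) / w), which is an erf-shaped selection function. If the underlying values are also normal, the density of the selected values is the normal density times that selection probability, divided by a normalising constant with the closed form Phi((mu - t0) / sqrt(sigma^2 + w^2)). A minimal sketch of that log-density, with all names hypothetical:

```python
import math

SQRT2 = math.sqrt(2.0)


def log_std_normal_cdf(z):
    """Log of the standard normal CDF, using erfc for accuracy in the lower tail."""
    p = 0.5 * math.erfc(-z / SQRT2)
    return math.log(p) if p > 0.0 else -math.inf


def normal_logpdf(x, mu, sigma):
    """Log-density of a normal distribution with mean mu and std sigma."""
    z = (x - mu) / sigma
    return -0.5 * z * z - math.log(sigma) - 0.5 * math.log(2.0 * math.pi)


def soft_truncated_normal_logpdf(x, mu, sigma, t0, w):
    """Log-density of normal values that survived an erf-shaped selection.

    Assumes values are kept with probability Phi((x - t0) / w), which is
    what a normally distributed, marginalised truncation point produces.
    """
    log_select = log_std_normal_cdf((x - t0) / w)
    # Closed-form probability that a random draw is selected at all.
    log_norm = log_std_normal_cdf((mu - t0) / math.sqrt(sigma**2 + w**2))
    return normal_logpdf(x, mu, sigma) + log_select - log_norm


# Example call with arbitrary illustrative numbers.
print(soft_truncated_normal_logpdf(1.0, mu=0.0, sigma=1.0, t0=0.5, w=0.3))
```

If values are instead lost above the truncation point, the inequality flips and the selection term becomes 1 minus the CDF. For a non-normal likelihood the normalising constant generally has no closed form and would need numerical integration.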

I am not sure whether the model is identifiable.

If every observation has an independent truncation point you might not have enough information to determine your parameters. If they share a common prior/hyper-parameters then it might be fine. You should be able to figure this out with parameter recovery experiments.
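
A parameter recovery experiment of the kind mentioned here can be sketched as: simulate data with known parameters, fit the model, and check that the fit returns the values used in the simulation. The toy below only shows the workflow. It assumes a normal population, an erf-shaped selection with known `t0` and `w`, and recovers a single parameter (`mu`) by a grid search over the likelihood rather than by sampling in PyMC. All names and numbers are hypothetical.

```python
import math
import random

SQRT2 = math.sqrt(2.0)
TRUE_MU, SIGMA, T0, W = 0.0, 1.0, 0.5, 0.3  # illustrative values only


def std_normal_cdf(z):
    """Standard normal CDF via the complementary error function."""
    return 0.5 * math.erfc(-z / SQRT2)


def simulate(n_draws, mu, sigma, t0, w, seed=1):
    """Draw from a normal and keep each draw with probability Phi((x - t0) / w)."""
    rng = random.Random(seed)
    kept = []
    for _ in range(n_draws):
        x = rng.gauss(mu, sigma)
        if rng.random() < std_normal_cdf((x - t0) / w):
            kept.append(x)
    return kept


def log_likelihood(data, mu, sigma, t0, w):
    """Log-likelihood of the selected sample, including the selection correction."""
    # Log-probability that a random draw is selected (closed form for a normal).
    log_norm = math.log(std_normal_cdf((mu - t0) / math.hypot(sigma, w)))
    total = 0.0
    for x in data:
        z = (x - mu) / sigma
        total += -0.5 * z * z - math.log(sigma) - 0.5 * math.log(2.0 * math.pi)
        total += math.log(std_normal_cdf((x - t0) / w)) - log_norm
    return total


data = simulate(5000, TRUE_MU, SIGMA, T0, W)
mu_grid = [-1.0 + 0.05 * i for i in range(41)]  # candidate values for mu
mu_hat = max(mu_grid, key=lambda m: log_likelihood(data, m, SIGMA, T0, W))

print("naive sample mean:", sum(data) / len(data))
print("recovered mu:", mu_hat, "true mu:", TRUE_MU)
```

Repeating this with the selection parameters also treated as unknown, and with independent versus shared truncation points, is what would show whether the full model has enough information to pin down its parameters.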
