 # Sum of two Binomial Distributions

I’m relatively new to PyMC, so I might be making a very simple error. A little bit about the problem I’m trying to model. There is food that is potentially contaminated which is being tested. We know the false positive and false negative rates of the testing procedure. I’m trying to get some bounds on the actual number of contaminated samples.
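In symbols, the generative story I have in mind is roughly (my own notation, with `c` the unknown number of contaminated samples out of the `n` tested):

```latex
y \;=\; \underbrace{\mathrm{Binomial}(c,\; 1 - fn)}_{\text{true positives}}
\;+\; \underbrace{\mathrm{Binomial}(n - c,\; fp)}_{\text{false positives}}
```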

```python
import pymc3 as pm

n = 100
y = 50
fn = .2
fp = .1

with pm.Model() as food_model:
    positives = pm.DiscreteUniform("positives", lower=0, upper=n)
    true_positives = pm.Binomial("true_positives", n=positives, p=(1 - fn))
    false_positives = pm.Binomial("false_positives", n=y - positives, p=fp)
    model = pm.Deterministic('model', true_positives + false_positives, observed=y)

with food_model:
    # Draw samples
    trace = pm.sample(1000, njobs=2)
    # Plot parameter
    pm.forestplot(trace, varnames=['positives'])
```

The error is:

```
TypeError                                 Traceback (most recent call last)
in ()
      7 true_positives = pm.Binomial("true_positives", n = positives, p = (1 - fn))
      8 false_positives = pm.Binomial("false_positives", n = y - positives, p = fp)
----> 9 model = pm.Deterministic('model', true_positives + false_positives, observed = y)

TypeError: Deterministic() got an unexpected keyword argument 'observed'
```

For reference, I used the advice given here: https://github.com/pymc-devs/pymc3/issues/1108


@ggluck `Deterministic` is not a random variable; it represents a deterministic variable (a combination of several RVs). We use deterministic variables to capture transformations of the random variables and store them in the trace.

However, if you fix the deterministic with `observed`, then there is no point in storing it in the trace.

Please see this GitHub issue for more details: https://github.com/pymc-devs/pymc3/issues/1971

Why do you want to use a deterministic at the end? And how are you going to determine the bounds from a single observed value? I think you should provide the actual number of contaminated samples from multiple experiments.


Thanks for the prompt response. I was using one data point just to test if the code was working.
Here are some simulated data which should make the probabilistic model clearer.

```python
import numpy as np
import pymc3 as pm

fn = .2
fp = .05
n = 100  # total number of samples tested
contaminated = np.random.binomial(100, .4)
not_contaminated = 100 - contaminated
true_positive = np.random.binomial(contaminated, (1 - fn))
false_positive = np.random.binomial(not_contaminated, fp)
y = true_positive + false_positive

with pm.Model() as food_model:
    positives = pm.DiscreteUniform("positives", lower=0, upper=n)
    proportion_positives = pm.Deterministic("proportion_positives", positives / 100.0)
    true_positives = pm.Binomial("true_positives", n=positives, p=(1 - fn))
    false_positives = pm.Binomial("false_positives", n=y - positives, p=fp)
    model = pm.Deterministic('model', true_positives + false_positives, observed=y)

with food_model:
    # Draw samples
    trace = pm.sample(1000, njobs=2)
    # Plot parameter
    pm.forestplot(trace, varnames=['proportion_positives'])

print(pm.stats.hpd(trace['proportion_positives'], .05))
```

Of course things should work with one data point (running without an error does not mean that you'll get the right answer). The problem is that you can't use a deterministic as the likelihood. I don't have sufficient knowledge of your exact use case to decide on a suitable likelihood for it.

Assume that a suitable likelihood is the Normal distribution.

```python
import pymc3 as pm

n = 100
y = 50
fn = .2
fp = .1

with pm.Model() as food_model:
    positives = pm.DiscreteUniform("positives", lower=0, upper=n)
    true_positives = pm.Binomial("true_positives", n=positives, p=(1 - fn))
    false_positives = pm.Binomial("false_positives", n=y - positives, p=fp)
    model = pm.Normal('model', true_positives + false_positives, observed=y)

with food_model:
    # Draw samples
    trace = pm.sample(1000, njobs=2)
    # Plot parameter
    pm.forestplot(trace, varnames=['positives'])
```

Now this will work, because a deterministic cannot be the likelihood of a probabilistic model.

When I asked for the probabilistic model, I meant the equations (or conditional relationships) between the random variables and how they should be mapped to a likelihood.

Let's see if someone else has a better insight on this problem.

@Nadheesh is right: the problem here is that you cannot assign `observed` to `pm.Deterministic`, and in this case it is also difficult to get a closed-form likelihood for the observed positives.
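As an aside (not from the thread): for a fixed latent count of contaminated samples, the observed total is the sum of two independent binomials, so its exact PMF can be computed by convolving the two binomial PMFs. A sketch with NumPy/SciPy (the function name `observed_pmf` is my own):

```python
import numpy as np
from scipy.stats import binom

def observed_pmf(positives, n, fn, fp):
    """Exact PMF of (true positives + false positives) for a fixed latent count."""
    # PMF of true positives: Binomial(positives, 1 - fn), support 0..positives
    tp = binom.pmf(np.arange(positives + 1), positives, 1 - fn)
    # PMF of false positives: Binomial(n - positives, fp), support 0..n - positives
    fpv = binom.pmf(np.arange(n - positives + 1), n - positives, fp)
    # Convolution gives the PMF of the sum, support 0..n
    return np.convolve(tp, fpv)

pmf = observed_pmf(40, 100, 0.2, 0.1)
```

This gives a proper likelihood for each candidate value of `positives`, which could in principle be used for exact (grid-based) posterior computation in a problem this small.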

We are working on implementing a likelihood-free algorithm (Approximate Bayesian Computation) for cases like this. Currently, this is an ongoing project by @agustinaarroyuelo, @aloctavodia, and me; hopefully it will be selected as one of the GSoC 2018 projects.

In this relatively simple case, you can try rejection sampling implemented in NumPy:

```python
import numpy as np
import matplotlib.pyplot as plt

n = 100
y = 50
fn = .2
fp = .1

# rejection sampling
Nsample = 1000
yhat = 0
TrueP = []
for i in range(Nsample):
    # only accept if the simulated data is the same as the observed
    while yhat != y:
        # simulate the latent true-positive count; can be replaced with an informative prior
        trueP = np.random.randint(0, n)
        # simulate observation
        yhat = np.random.binomial(trueP, 1 - fn) + \
               np.random.binomial(n - trueP, fp)
    TrueP.append(trueP)
    yhat = 0
TrueP = np.asarray(TrueP)

# display result
plt.hist(TrueP, np.arange(n));
```
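The one-at-a-time loop can also be vectorized in NumPy (a sketch with my own variable names, not from the thread): propose latent counts in batches, simulate an observation for each, and keep only the proposals whose simulated data match the observed `y`.

```python
import numpy as np

n, y, fn, fp = 100, 50, 0.2, 0.1
Nsample = 1000
rng = np.random.default_rng(0)

accepted = np.empty(0, dtype=int)
while accepted.size < Nsample:
    # propose a batch of latent true-positive counts from the uniform prior
    trueP = rng.integers(0, n, size=20 * Nsample)
    # simulate an observation for each proposal
    yhat = rng.binomial(trueP, 1 - fn) + rng.binomial(n - trueP, fp)
    # keep only proposals whose simulated observation matches the data
    accepted = np.concatenate([accepted, trueP[yhat == y]])
accepted = accepted[:Nsample]
```

Since the acceptance rate is roughly the prior predictive probability of observing `y`, batching the proposals is much faster than rejecting one draw at a time.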

Thanks for all your help. I appreciate the response.

@ggluck: how did you end up modeling your problem?