How to limit a discrete Geometric distribution to a range

Hi mates!
I’m a PyMC3 beginner. Over the last few weeks I have been reading your posts for answers to my questions, but I ran into a problem with this distribution that I want to share. If you can help me, I would be grateful.

The issue arises when I sample and plot a plain discrete Geometric distribution:

(In a .py program)

import pymc3 as pm
with pm.Model():
    nombres_clientes = pm.Geometric('clients_names', p=0.02685)
    trace = pm.sample(10000, cores=1)

My result is:


What I want is to restrict the distribution to [1, 100]; I don’t want any values above 100. I only want to draw values in that interval because each number maps to a real name, and my database holds 100 names.
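To see why values above 100 appear at all: a Geometric on {1, 2, ...} with p = 0.02685 puts probability (1 - p)^100 on values greater than 100. A quick check in plain NumPy (a sketch independent of PyMC3; NumPy's `geometric` sampler also starts its support at 1):

```python
import numpy as np

p = 0.02685

# P(X > 100) for a Geometric on {1, 2, ...} is (1 - p) ** 100
tail_exact = (1 - p) ** 100

# Monte Carlo estimate with NumPy's geometric sampler (support also starts at 1)
rng = np.random.default_rng(0)
draws = rng.geometric(p, size=100_000)
tail_mc = np.mean(draws > 100)

print(f"exact: {tail_exact:.4f}, simulated: {tail_mc:.4f}")
```

So roughly 6-7% of the unbounded draws land above 100.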

It may be a silly problem, but I’ve spent two hours trying to solve it. I thought about writing a custom logp function, but I don’t really know how to write one.

Thank you, community.
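For the custom-logp idea, the quantity to compute is the Geometric log-pmf restricted to {1, ..., 100} and renormalized: log P(k) = log p + (k - 1) log(1 - p) - log(1 - (1 - p)^100). A NumPy sketch of that formula; `truncated_geometric_logp` is a hypothetical helper name, not a PyMC3 function:

```python
import numpy as np

def truncated_geometric_logp(k, p, upper):
    """Hypothetical helper: log-pmf of Geometric(p) restricted to {1, ..., upper}, renormalized."""
    k = np.asarray(k)
    # Log of the untruncated pmf p * (1 - p) ** (k - 1)
    logp = np.log(p) + (k - 1) * np.log1p(-p)
    # Subtract log P(X <= upper) = log(1 - (1 - p) ** upper) so the pmf sums to 1
    logp = logp - np.log1p(-((1 - p) ** upper))
    # Values outside the support get zero probability (log-probability -inf)
    inside = (k >= 1) & (k <= upper)
    return np.where(inside, logp, -np.inf)

ks = np.arange(1, 101)
probs = np.exp(truncated_geometric_logp(ks, p=0.02685, upper=100))
print(probs.sum())  # 1 up to floating-point error
```

For the original goal of mapping each value to one of 100 names, these normalized probabilities could also be passed as the `p` of a `pm.Categorical`, keeping in mind that Categorical values start at 0 rather than 1.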

You can use pm.Bound() to achieve this:

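import pymc3 as pm

with pm.Model() as model:
    names1 = pm.Geometric('names1', p=0.02685)  # unbounded, for comparison
    # Bound rejects values above 100; Geometric support already starts at 1
    names2 = pm.Bound(pm.Geometric, upper=100)('names2', p=0.02685)
    trace = pm.sample(10000, cores=1)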
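With a fixed p, bounding just removes the values above 100, so the bounded variable should behave like Geometric draws with everything above 100 thrown away. A NumPy sketch of that rejection-sampling view (no PyMC3 involved), comparing the empirical mean with the exact mean of the truncated pmf:

```python
import numpy as np

p, upper = 0.02685, 100
rng = np.random.default_rng(1)

# Draw from the unbounded Geometric and keep only values in [1, upper]
draws = rng.geometric(p, size=200_000)
kept = draws[draws <= upper]

# Exact mean of the Geometric truncated to {1, ..., upper}
ks = np.arange(1, upper + 1)
pmf = p * (1 - p) ** (ks - 1)
pmf /= pmf.sum()
exact_mean = np.sum(ks * pmf)

print(f"kept {kept.size} draws, max = {kept.max()}")
print(f"empirical mean: {kept.mean():.2f}, exact truncated mean: {exact_mean:.2f}")
```

If I read the PyMC3 `Bound` docs correctly, the bounded distribution is not renormalized. That is harmless here because p and the bounds are constants, but it would bias inference on p if p were itself a random variable.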

Or use a Potential:

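import numpy as np
import pymc3 as pm
import theano.tensor as tt
with pm.Model():
    g = pm.Geometric('g', p=.01)
    # Add -inf to the model log-probability whenever g falls outside [1, 100]
    pm.Potential('constrain', tt.switch((g < 1) | (g > 100), -np.inf, 0.))
    trace = pm.sample()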
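The Potential adds 0 to the model’s log-probability inside [1, 100] and -inf outside, which is what `tt.switch` computes. The same logic in plain NumPy (`np.where` standing in for `tt.switch`), showing that inside the range the result differs from the properly normalized truncated log-pmf only by a constant, which MCMC ignores when p is fixed:

```python
import numpy as np

p, upper = 0.01, 100
ks = np.arange(1, 151)

# Log-pmf of the unbounded Geometric(p) on {1, 2, ...}
geom_logp = np.log(p) + (ks - 1) * np.log1p(-p)

# Mirror of tt.switch((g < 1) | (g > 100), -np.inf, 0.)
potential = np.where((ks < 1) | (ks > upper), -np.inf, 0.0)
total_logp = geom_logp + potential

# Normalized truncated log-pmf on {1, ..., upper}
inside = ks <= upper
trunc_logp = geom_logp[inside] - np.log1p(-((1 - p) ** upper))

# Inside the range the two differ by a constant; outside, total_logp is -inf
offset = total_logp[inside] - trunc_logp
print(np.allclose(offset, offset[0]), np.all(np.isinf(total_logp[~inside])))
```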

Thank you both! That solved my problem :slight_smile:

@junpenglao @nkaimcaudle Sorry for reopening the post, but what should I use if I want to limit a continuous distribution, again between 0 and 101?

I tried this:

import numpy as np
import pymc3 as pm
import theano.tensor as tt

with pm.Model() as nombre_Agapito:
    edad = pm.Weibull("edad", alpha=2.2309, beta=0.02945)
    dist = pm.Potential('dist', tt.switch((edad < 1) | (edad > 100), -np.inf, 0.))
    trace1 = pm.sample(10000, cores=1)


and I get the error:

SamplingError: Bad initial energy

In theory the same two techniques still work here. However, I think your parameters put the sampler’s starting value for the Weibull outside the 1-100 range, so the Potential is -inf from the very first step. If I change to alpha=5. and beta=2., it does (sometimes) work.
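One way to check this: assuming PyMC3 starts sampling from a central value of the Weibull (its default test value, such as the median or mean; I have not confirmed which one Weibull uses), both are far below 1 for alpha=2.2309, beta=0.02945, so the Potential is -inf at the starting point. A sketch using the closed-form Weibull median and mean, with `alpha` as the shape and `beta` as the scale, as in PyMC3’s Weibull:

```python
import math

def weibull_median(alpha, beta):
    # Median of a Weibull with shape alpha and scale beta
    return beta * math.log(2) ** (1 / alpha)

def weibull_mean(alpha, beta):
    # Mean of a Weibull with shape alpha and scale beta
    return beta * math.gamma(1 + 1 / alpha)

for alpha, beta in [(2.2309, 0.02945), (5.0, 2.0)]:
    med, mean = weibull_median(alpha, beta), weibull_mean(alpha, beta)
    inside = 1 <= med <= 100 and 1 <= mean <= 100
    print(f"alpha={alpha}, beta={beta}: median={med:.4f}, mean={mean:.4f}, inside [1, 100]: {inside}")
```

Supplying a starting value inside the range (e.g. via the `testval` argument) would avoid the -inf start, although with the original parameters almost all of the Weibull’s mass would still lie below 1.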

How did you choose these parameter values? They are very specific and poorly suited to covering the range 1-100.

np.mean(pm.Weibull.dist(alpha=2.2309, beta=0.02945).random(size=10_000) > 1)  # equals 0.0

If I draw 10,000 random samples from the Weibull with those parameters, none of them is above 1.
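The same conclusion follows from the Weibull CDF, F(x) = 1 - exp(-(x / beta)^alpha): the probability mass between 1 and 100 is F(100) - F(1). A small sketch comparing the original parameters with alpha=5, beta=2:

```python
import math

def weibull_cdf(x, alpha, beta):
    # CDF of a Weibull with shape alpha and scale beta
    return 1.0 - math.exp(-((x / beta) ** alpha))

def mass_between(lo, hi, alpha, beta):
    # Probability that a Weibull(alpha, beta) draw falls in [lo, hi]
    return weibull_cdf(hi, alpha, beta) - weibull_cdf(lo, alpha, beta)

print(mass_between(1, 100, alpha=2.2309, beta=0.02945))  # essentially 0
print(mass_between(1, 100, alpha=5.0, beta=2.0))         # about 0.97
```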


Hi there!
Just jumping in because I tried implementing @junpenglao’s solution with a Potential:

bike_count = pm.Geometric('bike_count', p, observed=bike_data["count"])
pm.Potential('constraint', tt.switch(bike_count > 1100, -np.inf, 0.))

This samples perfectly, but looking at the posterior predictive checks (PPCs), I’m not sure the constraint was applied:

idata = az.from_pymc3(trace=trace_bike_3, prior=prior_samples, posterior_predictive=post_samples)

This gives:

The PPCs still seem to go far outside the range of the observations, don’t they?

Just reviving this post: do you think it’s a bug, or did I make a mistake somewhere?

Oh yeah, it seems like a bug: posterior predictive sampling cannot account for the Potential.

We should add a warning to sample_posterior_predictive whenever we detect that a model contains a Potential; in most cases the result will not be correct.
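Until such a warning exists, one possible workaround (my own sketch, not a PyMC3 feature) is to apply the constraint yourself when generating predictive draws. It assumes a hypothetical array `p_draws` of posterior samples of `p`; the values below are made up for illustration and are not taken from the bike data. Instead of drawing from the unbounded Geometric, it draws from the Geometric truncated to {1, ..., 1100} by inverting the truncated CDF:

```python
import numpy as np

def truncated_geometric_draws(p, upper, rng):
    """Draw one value per entry of p from Geometric(p) truncated to {1, ..., upper}."""
    p = np.asarray(p, dtype=float)
    # Untruncated CDF at the upper bound: F(upper) = 1 - (1 - p) ** upper
    cdf_upper = -np.expm1(upper * np.log1p(-p))
    u = rng.random(p.shape)
    # Smallest k with F(k) >= u * F(upper): the inverse CDF of the truncated distribution
    k = np.ceil(np.log1p(-u * cdf_upper) / np.log1p(-p))
    # Guard against u == 0 and floating-point round-off at the boundaries
    return np.clip(k, 1, upper).astype(int)

rng = np.random.default_rng(2)
# Hypothetical posterior draws of p, for illustration only
p_draws = rng.uniform(0.001, 0.003, size=5_000)

unbounded = rng.geometric(p_draws)  # ignores the constraint, like the PPCs above
bounded = truncated_geometric_draws(p_draws, upper=1100, rng=rng)

print("share above 1100 (unbounded):", np.mean(unbounded > 1100))
print("share above 1100 (truncated):", np.mean(bounded > 1100))
```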

Thanks, Junpeng! I’ll open an issue on GitHub then.
