I’m trying to use PyMC3 to work out an example (exercise 3.3, p. 47) from David MacKay’s great book “Information Theory, Inference, and Learning Algorithms”.

I was able to use a crude (manual) grid search approach and it worked OK. I was hoping to learn more about PyMC3 by using it for the same purpose. Here’s what I tried:

```
import numpy as np
import pymc3 as pm
import arviz as az
import theano.tensor as tt
print(f"Running on PyMC3 v{pm.__version__}")
np.random.seed(451)
x = np.random.exponential(3, size=500)
minx = 1
maxx = 20
obs = x[(x >= minx) & (x <= maxx)]  # remove values outside the observable range

with pm.Model() as expModel:
    λ = pm.Exponential("λ", lam=5)
    x = pm.Exponential('x', lam=1/λ, observed=obs)
    clip_exp = pm.Potential("clip_exp", tt.switch((x < minx) | (x > maxx), -np.inf, 0))
    trace = pm.sample(2000, tune=1000, return_inferencedata=True)

az.summary(trace)
```

Sadly it doesn’t appear to have worked:

```
       mean     sd  hdi_3%  hdi_97%  mcse_mean  mcse_sd  ess_mean  ess_sd
λ     3.683  0.188   3.333    4.051      0.003    0.002    3268.0  3268.0
```
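For comparison, the crude grid search that did work just maximizes the truncated-exponential log-likelihood over a grid of λ values. Here is a minimal numpy sketch (not my exact code) of that approach:

```python
import numpy as np

np.random.seed(451)
x = np.random.exponential(3, size=500)
minx, maxx = 1, 20
obs = x[(x >= minx) & (x <= maxx)]  # keep only the observable window

def truncated_loglik(lam, data):
    # log-likelihood of Exponential(mean=lam) truncated to [minx, maxx]:
    # sum of log-pdf terms, minus N * log of the probability mass in the window
    logpdf = -data / lam - np.log(lam)
    log_mass = np.log(np.exp(-minx / lam) - np.exp(-maxx / lam))
    return logpdf.sum() - len(data) * log_mass

grid = np.linspace(0.5, 10, 2000)
loglik = np.array([truncated_loglik(g, obs) for g in grid])
lam_hat = grid[np.argmax(loglik)]
print(f"grid-search estimate of λ: {lam_hat:.2f}")
```

This lands close to the true mean of 3, which is what I was hoping the PyMC3 model would reproduce.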

Any suggestions would be appreciated!

thank you,

-Steve

I think you want to use a bound on `lambda` (or `1/lambda`)?

`x` is not a random variable since you observed it, so the `pm.Potential` is not doing anything (`x` will always equal `obs`, and the `clip_exp` will always return `0`).

Ah, OK. Thank you for that insight. The idea is that I’m observing x, which is exponentially distributed, but I’m only able to observe values of x between minx and maxx; values outside this range are not available to me. In my model λ has the same units as x, so I need to specify `lam` (which has units of 1/x) as the reciprocal of λ. I looked into `pm.Bound`, but my understanding is that it cannot be used for observed values. Is there a simple way to describe observed values that are censored in this way?
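To make the problem concrete: if the truncation is ignored, the sample mean of the kept values overshoots the true mean of 3, which is roughly what the posterior above shows. A quick numpy check (same seed and window as my script):

```python
import numpy as np

np.random.seed(451)
x = np.random.exponential(3, size=500)
minx, maxx = 1, 20
obs = x[(x >= minx) & (x <= maxx)]

# For an untruncated exponential the MLE of the mean is just the sample mean,
# but discarding the values below minx pulls that mean upward
print(f"naive mean of kept values: {obs.mean():.2f}")  # well above the true mean of 3
```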

thanks,

-Steve

According to Wikipedia it seems you need to add a correction term equal to `cdf(max) - cdf(min)`. My guess would be that a potential like this might do the job:

```
λ = pm.Exponential("λ", lam=5)
x = pm.Exponential('x', lam=1/λ, observed=obs)
exp_dist = pm.Exponential.dist(lam=1/λ)  # not part of the model, just used to get the logcdf
clip_exp = pm.Potential("clip_exp", tt.log(tt.exp(exp_dist.logcdf(maxx)) - tt.exp(exp_dist.logcdf(minx))))
```

Ah, I see. It’s not independent of λ, just of x. This didn’t seem to work, but I’ll poke around a bit more. Thanks for the help.

I wrongly used the argument `mu` in `exp_dist` instead of `lam` in my code snippet above (changed it now). Maybe that’s why it was not working?

The other thing is whether the value in the `Potential` should be negated (I never remember this).

This Stan page may be helpful for understanding the general approach: 7.4 Sampling Statements | Stan Reference Manual

Thanks. Here’s what I’ve got right now:

```
import numpy as np
import pymc3 as pm
import arviz as az
import theano.tensor as tt
print(f"Running on PyMC3 v{pm.__version__}")
np.random.seed(451)
x = np.random.exponential(3, size=500)
minx = 1
maxx = 20
obs = x[(x >= minx) & (x <= maxx)]  # remove values outside the observable range

with pm.Model() as expModel:
    λ = pm.Exponential("λ", lam=1/5)  # prior exponential with mean of 5
    x = pm.Exponential('x', lam=1/λ, observed=obs)  # obs exponential with mean of λ
    exp_dist = pm.Exponential.dist(lam=1/λ)  # not part of the model, just used to get the logcdf
    clip_exp = pm.Potential("clip_exp", -pm.math.logdiffexp(exp_dist.logcdf(maxx), exp_dist.logcdf(minx)))
    trace = pm.sample(2000, tune=1000, return_inferencedata=True)

az.summary(trace)
```

Still getting:

```
       mean     sd  hdi_3%  hdi_97%  mcse_mean  mcse_sd  ess_mean  ess_sd
λ     3.874  0.206   3.484    4.257      0.004    0.003    3279.0  3279.0
```

I’m expecting λ to be around 3.

The sign of the `-pm.math...` term doesn’t seem to matter (+ or -, makes no difference). Not sure how to approach debugging this. Any ideas?

Maybe the `Potential` needs to be multiplied by the number of observations (I am running out of ideas here)?

Edit: There is one example with censored data in this Notebook (resources/DataAnalysis.ipynb at master · pymc-devs/resources · GitHub), section 5.5. It strengthens my conviction that the correction term has to be proportional to the number of observations.

`clip_exp = pm.Potential("clip_exp", -pm.math.logdiffexp(exp_dist.logcdf(maxx), exp_dist.logcdf(minx)) * x.size)`

Edit: It seems to be working in this Colab notebook (?)
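The reason the window probability enters once per observation can be checked numerically: each kept data point’s density is divided by the same normalizer Z = cdf(max) - cdf(min), so the total log-likelihood picks up -log Z exactly N times. A standalone numpy check (not using the model above):

```python
import numpy as np

rng = np.random.default_rng(0)
minx, maxx, lam = 1.0, 20.0, 3.0
data = rng.uniform(minx, maxx, size=5)  # stand-in observations inside the window

# per-observation truncated log-density: log(pdf(x) / Z), Z = F(maxx) - F(minx)
Z = np.exp(-minx / lam) - np.exp(-maxx / lam)
per_obs = (-data / lam - np.log(lam)) - np.log(Z)

# the same quantity written as "untruncated logp plus one correction per observation"
untruncated = (-data / lam - np.log(lam)).sum()
corrected = untruncated - len(data) * np.log(Z)

print(np.isclose(per_obs.sum(), corrected))  # the two forms agree
```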

Oh, wow. I never would’ve guessed that! Thank you. I’ll give that a try.

I think that worked! Here’s what I have now:

```
import numpy as np
import pymc3 as pm
import arviz as az
import theano.tensor as tt
print(f"Running on PyMC3 v{pm.__version__}")
np.random.seed(451)
x = np.random.exponential(3, size=500)
minx = 1
maxx = 20
obs = x[(x >= minx) & (x <= maxx)]  # remove values outside the observable range

def censor_correct(λ):
    # -N * log(F(maxx) - F(minx)), with F(x) = 1 - exp(-x/λ) the exponential CDF
    return -tt.log(tt.exp(-minx/λ) - tt.exp(-maxx/λ)) * len(obs)

with pm.Model() as expModel:
    λ = pm.Exponential("λ", lam=1/5)  # prior exponential with mean of 5
    clip_exp = pm.Potential("clip_exp", censor_correct(λ))
    x = pm.Exponential('x', lam=1/λ, observed=obs)  # obs exponential with mean of λ
    trace = pm.sample(2000, tune=1000, return_inferencedata=True)

az.summary(trace)
```

```
       mean     sd  hdi_3%  hdi_97%
λ     2.895  0.159   2.603    3.199
```

Thank you!
