Issue getting started with simple example


I am brand new to the pymc3 package and am just trying to implement an example from a course on measurement uncertainty that I’m taking. (Note: this is an optional employee-education course through work, not a graded class where I shouldn’t find answers online.) The course uses R, but I find Python preferable.

The (simple) problem is posed as follows:

Say you have an end gauge of actual (unknown) length at room temperature “len”, and measured length “m”. The relationship between the two is:

len = m/(1+alpha*dT),

where “alpha” is an expansion coefficient, “dT” is the deviation from room temperature, and “m” is the measured quantity. The goal is to find the posterior distribution on “len” in order to determine its expected value and standard deviation (i.e. the measurement uncertainty).

The problem specifies prior distributions on alpha and dT (Gaussians with small standard deviation) and a loose prior on “len” (Gaussian with large standard deviation). The problem specifies that “m” was measured 25 times with an average of 50.000215 and standard deviation of 5.8e-6. We assume that the measurements of “m” are normally distributed with mean equal to the true value of “m”.
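As a rough sanity check on the size of the correction (my own arithmetic, not from the course), plugging the prior means used in the code below (alpha ≈ 1.15e-5, dT ≈ -0.1) into the formula gives:

```python
# Back-of-envelope check of len = m / (1 + alpha*dT), using the prior
# means from the model below and the measured average m = 50.000215
m = 50.000215
alpha = 1.15e-5
dT = -0.1

length = m / (1 + alpha * dT)
print(length)  # ~50.0002725, i.e. a correction of roughly 6e-5
```

So the posterior mean of “len” should land very close to 50.00027 if the sampler is working.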

One issue I had is that the likelihood doesn’t seem to be specifiable from just these summary statistics in pymc3, so I generated some dummy measurement data (I ended up doing 1000 measurements instead of 25). Again, the goal is to get a posterior distribution on “len” (and in the process, although of less interest, updated posteriors on “alpha” and “dT”).

Here’s my code, which is not working and has convergence issues:

from IPython.core.pylabtools import figsize
import numpy as np
from matplotlib import pyplot as plt
import scipy.stats as stats
import pymc3 as pm
import theano.tensor as tt

basic_model = pm.Model()

xdata = np.random.normal(50.000215,5.8e-6,1000)
with basic_model:
    #prior distributions
    theta = pm.Normal('theta',mu=-.1,sd=.04)
    alpha = pm.Normal('alpha',mu=.0000115,sd=.0000012)
    length = pm.Normal('length',mu=50,sd=1)

mumeas = length*(1+alpha*theta)

with basic_model:
    obs = pm.Normal('obs',mu=mumeas,sd=5.8e-6,observed=xdata)
    start = pm.find_MAP()
    step = pm.Metropolis()
    trace = pm.sample(10000, tune=200000,step=step,start=start,njobs=1)

length_samples = trace['length']

plt.hist(length_samples, histtype='stepfilled', bins=30, alpha=0.85,
         label=r"posterior of $\lambda_1$", color="#A60628", density=True)

I would really appreciate any help as to why this isn’t working. The course instructor isn’t familiar with pymc3 and isn’t sure what the bug is. I tried the default sampler (NUTS, I think) as well as Metropolis, but it completely failed with a zero-gradient error. The (relevant) course slides are below.

OK, let’s use \LaTeX. Let L_T be the actual (unknown) length at room temperature and let L_o be the measured length. The relation between the two is:

L_T = \dfrac{L_o}{1 + \alpha \cdot \Delta t},

with \alpha the expansion coefficient and \Delta t the deviation from room temperature. L_o was measured 25 times and you have the mean and the standard deviation. You say

the question is to get a posterior distribution on “len”

which means the objective is to get the posterior distribution of L_T. According to the image you show

L_T (1 + \alpha \Delta t) = L_o.

You have priors for \alpha, \Delta t and L_T. With pymc3, you will find the posterior distribution of L_T (together with those of \alpha and \Delta t), conditioned on the observed values of L_o. With my notation, the code is

with pm.Model() as basic_model:
    #prior distributions
    delta_t = pm.Normal('delta_t', mu=-0.1, sd=0.04)
    alpha = pm.Normal('alpha', mu=.0000115, sd=.0000012)
    L_T = pm.Normal('L_T', mu=50, sd=1)

    L_o = pm.Normal('L_o', mu=L_T * (1 + alpha*delta_t), sd=5.8e-6, observed=xdata)

Wait! You have data, 25 measurements. How do you include that in the variable xdata?

mu_t = 50.000215
sig_t = 5.8e-6

xdata = [mu_t] * 23
xdata.append(-2 * np.sqrt(3) * sig_t + mu_t)
xdata.append(2 * np.sqrt(3) * sig_t + mu_t)
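A quick self-contained check (plain numpy, my addition) that these 25 points reproduce the target statistics:

```python
import numpy as np

mu_t = 50.000215
sig_t = 5.8e-6

# 23 copies of the mean plus two points placed symmetrically
# at +/- 2*sqrt(3)*sig_t, exactly as above
xdata = [mu_t] * 23
xdata.append(-2 * np.sqrt(3) * sig_t + mu_t)
xdata.append(2 * np.sqrt(3) * sig_t + mu_t)

print(np.mean(xdata))          # recovers mu_t
print(np.std(xdata, ddof=1))   # recovers sig_t
```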

You can check np.std(xdata, ddof=1) and np.mean(xdata). I leave it to you to work out how I did it. And now, sample:

with basic_model:
    trace = pm.sample(draws=2000, tune=2000, step=pm.NUTS())

Without step=pm.NUTS(), it crashes and I don’t know why. You can see pm.traceplot(trace) and pm.summary(trace).

df = pm.summary(trace)
df[['mean', 'sd']].style.format('{:.11f}')
                   mean             sd
delta_t  -0.10037946095  0.04040735948
alpha     0.00001149310  0.00000121483
L_T      50.00027270495  0.00002410124
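If all you need is the expected value and standard deviation of L_T (the measurement uncertainty the original question asks for), you can compute them directly from the posterior samples. Here is a stand-in sketch using synthetic draws that match the summary table; a real run would use trace['L_T'] instead:

```python
import numpy as np

# synthetic stand-in for trace['L_T'], matching the summary table;
# with a real trace, replace this with: L_T_samples = trace['L_T']
rng = np.random.default_rng(0)
L_T_samples = rng.normal(50.00027270495, 0.00002410124, size=4000)

print(L_T_samples.mean())        # expected value, ~50.00027
print(L_T_samples.std(ddof=1))   # measurement uncertainty, ~2.4e-5
```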

After all of this, I can answer your question.

# pull the posterior samples straight from the trace
# (pm.sample_ppc with vars=[L_T, alpha, delta_t] would redraw these
# free RVs from their priors, not their posteriors)
post_c = {v: trace[v] for v in ['L_T', 'alpha', 'delta_t']}
plt.figure(figsize=(10, 6))
_, _, _ = plt.hist(post_c['L_T'], bins=20, edgecolor='w', density=True, color="#A60628")
plt.title("Posterior of $L_T$")

And for \alpha and \Delta t

plt.figure(figsize=(10, 6))
_, _, _ = plt.hist(post_c['alpha'], bins=20, edgecolor='w', density=True, color='C1')
plt.title(r"Posterior of $\alpha$")

plt.figure(figsize=(10, 6))
_, _, _ = plt.hist(post_c['delta_t'], bins=20, edgecolor='w', density=True, color='C5')
plt.title(r"Posterior of $\Delta t$")
# plt.savefig('hist_a.png', dpi=150)


By the way, if the course uses R, what packages are you using?
