The 95% interval of PyMC3 trace doesn't cove the real values

Well I have another question about how the final trace in PyMC3 works. I used hpd() function to find 95% interval but almost all true values are outside of this interval. What that means? how can I improve the results?

Before running the model, I standardized all columns and then reversed to the original sale (using scikit-learn MinMaxScaler)

bd = trace[-1000:]['y_est']
predictions = pd.DataFrame()
predictions['pred'] = bd.mean(axis=0)
predictions = predictions[~predictions.pred.isnull()]
predictions['y'] = new_data.y
predictions['residual'] = predictions.pred - predictions.y

bd_hpd = pm3.hpd(bd)

predictions['prctile_95'] = bd_hpd[:, 1]
predictions['prctile_5'] = bd_hpd[:,0]

Could you please also provide your model and some (simulation) data?

1 Like

Trying to find a toy data/example based on my experiments but takes some time. In meantime, what this figure shows? Do we always expect the trace cover the real data? if not, what it means?

It depends, that’s why it is a bit difficult to tell without more information…

Thanks for your help. Here is an example of hierarchal model with toy data (attachment):
sample.csv (379.7 KB)

import numpy as np
import pandas as pd
import pymc3 as pm3
import matplotlib.pylab as plt

samples = pd.read_csv('sample.csv', sep=',')

# some records may have missing values. So, skip them during the modelling
masked = samples[~samples.gc.isnull()]
num_ind = len(masked.idx.unique())
idx = np.asarray(masked.idx.astype(int))

with pm3.Model() as hierarchical_model:
    mu_intercept = pm3.Normal('mu_intercept', mu=0, sd=30)
    mu_d = pm3.Normal('mu_d', mu=0, sd=30)
    sigma_intercept = pm3.Uniform('sigma_intercept', lower=0, upper=10)
    sigma_d = pm3.Uniform('sigma_d', lower=0, upper=500)
    sigma_gc = pm3.Uniform('sigma_gc', lower=0, upper=500)
    sigma_p = pm3.Uniform('sigma_p', lower=0, upper=500)
    intercept_prior = pm3.Normal('intercept_prior',
                                 mu=mu_intercept,
                                 sd=sigma_intercept,
                                 shape=num_ind)
    d_prior = pm3.Normal('d_prior',
                         mu=mu_d,
                         sd=sigma_d,
                         shape=num_ind)
    gc_prior = pm3.HalfNormal('gc_prior',
                              sd=sigma_gc,
                              shape=num_ind)

    p0_prior = pm3.Normal('p0_prior', mu=0, sd=sigma_p, shape=num_ind)
    p1_prior = pm3.Normal('p1_prior', mu=0, sd=sigma_p, shape=num_ind)
    p2_prior = pm3.Normal('p2_prior', mu=0, sd=sigma_p, shape=num_ind)
    p3_prior = pm3.Normal('p3_prior', mu=0, sd=sigma_p, shape=num_ind)
    p4_prior = pm3.Normal('p4_prior', mu=0, sd=sigma_p, shape=num_ind)

    y_est = (
        intercept_prior[idx] +
        d_prior[idx] * masked.d +
        gc_prior[idx] * masked.gc +
        p0_prior[idx] * masked.p0 +
        p1_prior[idx] * masked.p1 +
        p2_prior[idx] * masked.p2 +
        p3_prior[idx] * masked.p3 +
        p4_prior[idx] * masked.p4
    )

    y_est = pm3.Deterministic('y_est', y_est)

    sigma = pm3.HalfNormal('sigma', sd=100)
    epsilon = pm3.HalfNormal('epsilon', sd=sigma)
    pm3.Normal('y_pred',
               mu=y_est,
               sd=epsilon,
               observed=masked.y)

with hierarchical_model:
    start, sds, elbo = pm3.variational.advi(n=20000)
    step = pm3.Metropolis()
    hierarchical_trace = pm3.sample(300000, start=start, step=step)

varnames = ['mu_intercept', 'sigma_intercept', 'sigma_p', 'epsilon', 'sigma']
pm3.traceplot(hierarchical_trace[-1000:], varnames)
pm3.plot_posterior(hierarchical_trace[-1000:], varnames)
pm3.autocorrplot(hierarchical_trace[-1000:], varnames=varnames)

bd = hierarchical_trace[-1000:]['y_est']
bd_hpd = pm3.hpd(bd)
masked['pred'] = bd.mean(axis=0)
masked['prctile_95'] = bd_hpd[:, 1]
masked['prctile_5'] = bd_hpd[:,0]

for idx, sample in masked.groupby('idx'):
    plt.figure(figsize=(20,15))
    plt.plot(sample.d, sample.y, '*', markersize=30)
    plt.plot(sample.d, sample.pred, '-', markersize=20, color='g')
    plt.fill_between(sample.d, sample.prctile_5, sample.prctile_95, color='r', alpha=0.5);
    plt.title('Actual and Prediction for sample#{}'.format(idx), fontsize=30, fontweight='bold')
    plt.yticks(size=20)
    plt.xticks(size=20)
    plt.legend(fontsize=30)

NUTS sampling is too slow for this problem so I had to use Metropolis.

Try something like this:

import numpy as np
import pandas as pd
import pymc3 as pm3
import matplotlib.pylab as plt
import seaborn as sb

%matplotlib inline

samples = pd.read_csv('~/Downloads/sample.csv', sep=',')

# some records may have missing values. So, skip them during the modelling
masked = samples[~samples.gc.isnull()]
num_ind = len(masked.idx.unique())
idx = np.asarray(masked.idx.astype(int))

with pm3.Model() as hierarchical_model:
    mu_intercept = pm3.Normal('mu_intercept', mu=0, sd=30)
    mu_d = pm3.Normal('mu_d', mu=0, sd=30)
    sigma_intercept = pm3.HalfCauchy('sigma_intercept', beta=2.5)
    sigma_d = pm3.HalfCauchy('sigma_d', beta=2.5)
    sigma_gc = pm3.HalfCauchy('sigma_gc', beta=2.5)
    sigma_p = pm3.HalfCauchy('sigma_p', beta=50)
    
    intercept_prior = pm3.Normal('intercept_prior_raw',
                                 mu=0,
                                 sd=1,
                                 shape=num_ind)
    intercept_prior = pm3.Deterministic(
        'intercept_prior', mu_intercept + sigma_intercept * intercept_prior)
    
    d_prior = pm3.Normal('d_prior_raw', mu=0, sd=1, shape=num_ind)
    d_prior = pm3.Deterministic(
        'd_prior', mu_d + sigma_d * d_prior)
    

    gc_prior = pm3.HalfNormal('gc_prior_raw',
                              sd=1,
                              shape=num_ind)
    gc_prior = pm3.Deterministic('gc_prior', sigma_gc * gc_prior)

    p_prior = pm3.Normal('p_prior_raw', mu=0, sd=1, shape=(5, num_ind))
    p_prior = pm3.Deterministic('p_prior', sigma_p * p_prior)

    y_est = (
        intercept_prior[idx] +
        d_prior[idx] * masked.d +
        gc_prior[idx] * masked.gc +
        p_prior[0, idx] * masked.p0 +
        p_prior[1, idx] * masked.p1 +
        p_prior[2, idx] * masked.p2 +
        p_prior[3, idx] * masked.p3 +
        p_prior[4, idx] * masked.p4
    )

    y_est = pm3.Deterministic('y_est', y_est)

    epsilon = pm3.HalfCauchy('epsilon', beta=2.5)
    pm3.StudentT('y_pred',
                 nu=7,
                 mu=y_est,
                 sd=epsilon,
                 observed=masked.y)

I mainly changed switched to a non-centered parametrization, and fixed a problem with epsilon. It runs with NUTS now (pm.sample(1000, njobs=4)), although there still is some trouble with the posterior I haven’t figured out. The it reaches max_treedepth a lot.

Don’t switch to metropolis if nuts doesn’t work well. If NUTS doesn’t work, then Metropolis most likely won’t either, it just hides this much better.

1 Like

And about the comparison between prediction and samples: You are comparing the wrong thing. The interval is the uncertainty in the predictor, it ignores the likelihood. You probably want to have a look at the predictive posterior instead. (What values would we expect for new measurements). You can get a sample from that using pm.sample_ppc(trace). Something like this:

with hierarchical_model:
    ppc = pm3.sample_ppc(trace)

bd = ppc['y_pred']
bd_hpd = pm3.hpd(bd)
masked['pred'] = np.median(bd, axis=0)
masked['prctile_95'] = bd_hpd[:, 1]
masked['prctile_5'] = bd_hpd[:,0]

That’s great. Thanks.

Glad I could help. You might want to think about the predictors for the p-values (I guess that’s what they are?) a bit. They give the sampler a hard time, and in many cases that is an indication that the model might not fit well. Don’t ask me why there would be a relationship between posteriors that are nice to sample and good models, but from experience there seems to be one.

1 Like