The 95% interval of PyMC3 trace doesn't cove the real values

nikzad · July 17, 2017, 4:45am

Well I have another question about how the final trace in PyMC3 works. I used hpd() function to find 95% interval but almost all true values are outside of this interval. What that means? how can I improve the results?

Before running the model, I standardized all columns and then reversed to the original sale (using scikit-learn MinMaxScaler)

bd = trace[-1000:]['y_est']
predictions = pd.DataFrame()
predictions['pred'] = bd.mean(axis=0)
predictions = predictions[~predictions.pred.isnull()]
predictions['y'] = new_data.y
predictions['residual'] = predictions.pred - predictions.y

bd_hpd = pm3.hpd(bd)

predictions['prctile_95'] = bd_hpd[:, 1]
predictions['prctile_5'] = bd_hpd[:,0]

junpenglao · July 17, 2017, 5:57am

Could you please also provide your model and some (simulation) data?

nikzad · July 20, 2017, 1:35pm

Trying to find a toy data/example based on my experiments but takes some time. In meantime, what this figure shows? Do we always expect the trace cover the real data? if not, what it means?

junpenglao · July 20, 2017, 1:48pm

It depends, that’s why it is a bit difficult to tell without more information…

nikzad · July 21, 2017, 2:03am

Thanks for your help. Here is an example of hierarchal model with toy data (attachment):
sample.csv (379.7 KB)

import numpy as np
import pandas as pd
import pymc3 as pm3
import matplotlib.pylab as plt

samples = pd.read_csv('sample.csv', sep=',')

# some records may have missing values. So, skip them during the modelling
masked = samples[~samples.gc.isnull()]
num_ind = len(masked.idx.unique())
idx = np.asarray(masked.idx.astype(int))

with pm3.Model() as hierarchical_model:
    mu_intercept = pm3.Normal('mu_intercept', mu=0, sd=30)
    mu_d = pm3.Normal('mu_d', mu=0, sd=30)
    sigma_intercept = pm3.Uniform('sigma_intercept', lower=0, upper=10)
    sigma_d = pm3.Uniform('sigma_d', lower=0, upper=500)
    sigma_gc = pm3.Uniform('sigma_gc', lower=0, upper=500)
    sigma_p = pm3.Uniform('sigma_p', lower=0, upper=500)
    intercept_prior = pm3.Normal('intercept_prior',
                                 mu=mu_intercept,
                                 sd=sigma_intercept,
                                 shape=num_ind)
    d_prior = pm3.Normal('d_prior',
                         mu=mu_d,
                         sd=sigma_d,
                         shape=num_ind)
    gc_prior = pm3.HalfNormal('gc_prior',
                              sd=sigma_gc,
                              shape=num_ind)

    p0_prior = pm3.Normal('p0_prior', mu=0, sd=sigma_p, shape=num_ind)
    p1_prior = pm3.Normal('p1_prior', mu=0, sd=sigma_p, shape=num_ind)
    p2_prior = pm3.Normal('p2_prior', mu=0, sd=sigma_p, shape=num_ind)
    p3_prior = pm3.Normal('p3_prior', mu=0, sd=sigma_p, shape=num_ind)
    p4_prior = pm3.Normal('p4_prior', mu=0, sd=sigma_p, shape=num_ind)

    y_est = (
        intercept_prior[idx] +
        d_prior[idx] * masked.d +
        gc_prior[idx] * masked.gc +
        p0_prior[idx] * masked.p0 +
        p1_prior[idx] * masked.p1 +
        p2_prior[idx] * masked.p2 +
        p3_prior[idx] * masked.p3 +
        p4_prior[idx] * masked.p4
    )

    y_est = pm3.Deterministic('y_est', y_est)

    sigma = pm3.HalfNormal('sigma', sd=100)
    epsilon = pm3.HalfNormal('epsilon', sd=sigma)
    pm3.Normal('y_pred',
               mu=y_est,
               sd=epsilon,
               observed=masked.y)

with hierarchical_model:
    start, sds, elbo = pm3.variational.advi(n=20000)
    step = pm3.Metropolis()
    hierarchical_trace = pm3.sample(300000, start=start, step=step)

varnames = ['mu_intercept', 'sigma_intercept', 'sigma_p', 'epsilon', 'sigma']
pm3.traceplot(hierarchical_trace[-1000:], varnames)
pm3.plot_posterior(hierarchical_trace[-1000:], varnames)
pm3.autocorrplot(hierarchical_trace[-1000:], varnames=varnames)

bd = hierarchical_trace[-1000:]['y_est']
bd_hpd = pm3.hpd(bd)
masked['pred'] = bd.mean(axis=0)
masked['prctile_95'] = bd_hpd[:, 1]
masked['prctile_5'] = bd_hpd[:,0]

for idx, sample in masked.groupby('idx'):
    plt.figure(figsize=(20,15))
    plt.plot(sample.d, sample.y, '*', markersize=30)
    plt.plot(sample.d, sample.pred, '-', markersize=20, color='g')
    plt.fill_between(sample.d, sample.prctile_5, sample.prctile_95, color='r', alpha=0.5);
    plt.title('Actual and Prediction for sample#{}'.format(idx), fontsize=30, fontweight='bold')
    plt.yticks(size=20)
    plt.xticks(size=20)
    plt.legend(fontsize=30)

NUTS sampling is too slow for this problem so I had to use Metropolis.

aseyboldt · July 21, 2017, 8:34am

Try something like this:

import numpy as np
import pandas as pd
import pymc3 as pm3
import matplotlib.pylab as plt
import seaborn as sb

%matplotlib inline

samples = pd.read_csv('~/Downloads/sample.csv', sep=',')

# some records may have missing values. So, skip them during the modelling
masked = samples[~samples.gc.isnull()]
num_ind = len(masked.idx.unique())
idx = np.asarray(masked.idx.astype(int))

with pm3.Model() as hierarchical_model:
    mu_intercept = pm3.Normal('mu_intercept', mu=0, sd=30)
    mu_d = pm3.Normal('mu_d', mu=0, sd=30)
    sigma_intercept = pm3.HalfCauchy('sigma_intercept', beta=2.5)
    sigma_d = pm3.HalfCauchy('sigma_d', beta=2.5)
    sigma_gc = pm3.HalfCauchy('sigma_gc', beta=2.5)
    sigma_p = pm3.HalfCauchy('sigma_p', beta=50)
    
    intercept_prior = pm3.Normal('intercept_prior_raw',
                                 mu=0,
                                 sd=1,
                                 shape=num_ind)
    intercept_prior = pm3.Deterministic(
        'intercept_prior', mu_intercept + sigma_intercept * intercept_prior)
    
    d_prior = pm3.Normal('d_prior_raw', mu=0, sd=1, shape=num_ind)
    d_prior = pm3.Deterministic(
        'd_prior', mu_d + sigma_d * d_prior)
    

    gc_prior = pm3.HalfNormal('gc_prior_raw',
                              sd=1,
                              shape=num_ind)
    gc_prior = pm3.Deterministic('gc_prior', sigma_gc * gc_prior)

    p_prior = pm3.Normal('p_prior_raw', mu=0, sd=1, shape=(5, num_ind))
    p_prior = pm3.Deterministic('p_prior', sigma_p * p_prior)

    y_est = (
        intercept_prior[idx] +
        d_prior[idx] * masked.d +
        gc_prior[idx] * masked.gc +
        p_prior[0, idx] * masked.p0 +
        p_prior[1, idx] * masked.p1 +
        p_prior[2, idx] * masked.p2 +
        p_prior[3, idx] * masked.p3 +
        p_prior[4, idx] * masked.p4
    )

    y_est = pm3.Deterministic('y_est', y_est)

    epsilon = pm3.HalfCauchy('epsilon', beta=2.5)
    pm3.StudentT('y_pred',
                 nu=7,
                 mu=y_est,
                 sd=epsilon,
                 observed=masked.y)

I mainly changed switched to a non-centered parametrization, and fixed a problem with epsilon. It runs with NUTS now (pm.sample(1000, njobs=4)), although there still is some trouble with the posterior I haven’t figured out. The it reaches max_treedepth a lot.

Don’t switch to metropolis if nuts doesn’t work well. If NUTS doesn’t work, then Metropolis most likely won’t either, it just hides this much better.

aseyboldt · July 21, 2017, 8:40am

And about the comparison between prediction and samples: You are comparing the wrong thing. The interval is the uncertainty in the predictor, it ignores the likelihood. You probably want to have a look at the predictive posterior instead. (What values would we expect for new measurements). You can get a sample from that using pm.sample_ppc(trace). Something like this:

with hierarchical_model:
    ppc = pm3.sample_ppc(trace)

bd = ppc['y_pred']
bd_hpd = pm3.hpd(bd)
masked['pred'] = np.median(bd, axis=0)
masked['prctile_95'] = bd_hpd[:, 1]
masked['prctile_5'] = bd_hpd[:,0]

nikzad · July 21, 2017, 11:12am

aseyboldt:

with pm3.Model() as hierarchical_model:
mu_intercept = pm3.Normal(‘mu_intercept’, mu=0, sd=30)
mu_d = pm3.Normal(‘mu_d’, mu=0, sd=30)
sigma_intercept = pm3.HalfCauchy(‘sigma_intercept’, beta=2.5)
sigma_d = pm3.HalfCauchy(‘sigma_d’, beta=2.5)
sigma_gc = pm3.HalfCauchy(‘sigma_gc’, beta=2.5)
sigma_p = pm3.HalfCauchy(‘sigma_p’, beta=50)

intercept_prior = pm3.Normal('intercept_prior_raw',
                             mu=0,
                             sd=1,
                             shape=num_ind)
intercept_prior = pm3.Deterministic(
    'intercept_prior', mu_intercept + sigma_intercept * intercept_prior)

d_prior = pm3.Normal('d_prior_raw', mu=0, sd=1, shape=num_ind)
d_prior = pm3.Deterministic(
    'd_prior', mu_d + sigma_d * d_prior)


gc_prior = pm3.HalfNormal('gc_prior_raw',
                          sd=1,
                          shape=num_ind)
gc_prior = pm3.Deterministic('gc_prior', sigma_gc * gc_prior)

p_prior = pm3.Normal('p_prior_raw', mu=0, sd=1, shape=(5, num_ind))
p_prior = pm3.Deterministic('p_prior', sigma_p * p_prior)

y_est = (
    intercept_prior[idx] +
    d_prior[idx] * masked.d +
    gc_prior[idx] * masked.gc +
    p_prior[0, idx] * masked.p0 +
    p_prior[1, idx] * masked.p1 +
    p_prior[2, idx] * masked.p2 +
    p_prior[3, idx] * masked.p3 +
    p_prior[4, idx] * masked.p4
)

y_est = pm3.Deterministic('y_est', y_est)

epsilon = pm3.HalfCauchy('epsilon', beta=2.5)
pm3.StudentT('y_pred',
             nu=7,
             mu=y_est,
             sd=epsilon,
             observed=masked.y)

That’s great. Thanks.

aseyboldt · July 21, 2017, 11:26am

Glad I could help. You might want to think about the predictors for the p-values (I guess that’s what they are?) a bit. They give the sampler a hard time, and in many cases that is an indication that the model might not fit well. Don’t ask me why there would be a relationship between posteriors that are nice to sample and good models, but from experience there seems to be one.

Topic		Replies	Views
Prediction using sample_ppc in Hierarchical model Questions from_github	6	4990	December 14, 2017
Computing Prediction Intervals without sampling Questions	4	1046	May 16, 2018
Sample method of pymc3 Questions	4	682	July 15, 2018
Sampler occasionally gets static in pymc3.1, but not in pymc3.0 Questions development	19	2224	July 5, 2018
PyMC3-Sampling not taking place for the entire range of values between the given bounds of a Uniform prior Questions	7	1592	January 14, 2018

The 95% interval of PyMC3 trace doesn't cove the real values

Related topics