Related to: Using WAIC to compute Log Predictive Accuracy
Can someone please explain this behavior?
I used NUTS to train a linear regression model on a generated dataset that has 100 data instances and 100 dimensions.
If we compute WAIC for the training data, it decreases as the number of samples (samples drawn from the target distribution) increases, as expected. However, when I calculate WAIC for the test data, it starts at a very low value and then increases, as shown in the figure below (x-axis is time taken, y-axis is WAIC).
It should be noted that, even though the WAIC is very high when the number of samples is increased, the model converges to the true parameters. The MSE on the test set also converges to 0 over time, as expected. I also checked the traceplot and observed that the posterior variance of the model decreases as more samples are drawn. However, when I ran the same experiment with the number of dimensions reduced to 10, the WAIC converged as expected.
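For reference, WAIC is the log pointwise predictive density minus a variance-based penalty for the effective number of parameters. Below is a minimal numpy sketch on the deviance scale; waic_sketch and log_like are hypothetical names, and log_like stands in for the (n_samples, n_points) pointwise log-likelihood matrix that pm.waic extracts from the trace internally:

import numpy as np
from scipy.special import logsumexp

def waic_sketch(log_like):
    """WAIC on the deviance scale from a pointwise log-likelihood matrix.
    log_like: array of shape (n_samples, n_points) -- assumed input."""
    n_samples = log_like.shape[0]
    # lppd: log of the posterior-mean likelihood, summed over data points
    lppd = np.sum(logsumexp(log_like, axis=0) - np.log(n_samples))
    # p_waic: posterior variance of the log-likelihood, summed over points
    p_waic = np.sum(np.var(log_like, axis=0, ddof=1))
    return -2 * (lppd - p_waic)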
Moreover, when I compute LOO for the test data, the process fails with an error during the _psislw computation (this does not happen when computing LOO on the train data).
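For context, _psislw computes the Pareto-smoothed log importance weights that stabilize LOO. Plain importance-sampling LOO, without that smoothing step, can be sketched as follows; loo_is_sketch and log_like are hypothetical names, with log_like again the (n_samples, n_points) pointwise log-likelihood matrix:

import numpy as np
from scipy.special import logsumexp

def loo_is_sketch(log_like):
    """Plain importance-sampling LOO (deviance scale), without the
    Pareto tail-smoothing that _psislw applies to these weights.
    log_like: array of shape (n_samples, n_points) -- assumed input."""
    # raw log importance ratios: log 1/p(y_i | theta_s)
    log_w = -log_like
    # self-normalize the weights per data point
    log_w = log_w - logsumexp(log_w, axis=0, keepdims=True)
    # pointwise elpd_loo, summed over points, then deviance scale
    elpd = np.sum(logsumexp(log_w + log_like, axis=0))
    return -2 * elpd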
It would be a great help if someone could explain this behavior. Does this mean we can't use the current implementations of WAIC and LOO to calculate predictive accuracy?
I have attached the relevant script for your reference.
import pymc3 as pm
import numpy as np
from sklearn.model_selection import train_test_split
from theano import shared, tensor as tt
from sklearn.metrics import mean_squared_error
seed = 7
np.random.seed(seed)
def operator(x, w, c, noise):
    """
    Operator used to generate artificial datasets.
    :param x: features
    :param w: weights
    :param c: intercept
    :param noise: additive noise
    :return: targets
    """
    return x * w + c + noise
n = 120               # number of training instances
d = 100               # number of dimensions
_n = n
n = int(n + n*0.2)    # total instances: train plus 20% held out for testing
X = np.matrix([np.random.uniform(0, 1, d) for i in range(n)])
_w = np.matrix(np.random.uniform(-10, 10, d)).T  # true weights
_c = float(np.random.uniform(-10, 10, 1))        # true intercept
y = operator(x=X, w=_w, c=_c, noise=np.matrix(np.random.normal(scale=0.5, size=n)).T)
y = np.array(y).flatten()
train_X, test_X, train_y, test_y = train_test_split(X, y, test_size=int(_n*0.2), shuffle=True)
def adding_intecept(X):
    """
    Add a column of ones for the intercept.
    :param X: features
    :return: features with an appended intercept column
    """
    return np.c_[X, np.ones(X.shape[0])]
train_X, test_X = adding_intecept(train_X), adding_intecept(test_X)
# Theano shared variables, so the data can later be swapped for test-set evaluation
shared_X = shared(train_X)
shared_y = shared(train_y)
with pm.Model() as model:
    # Define hyper-prior
    alpha = pm.Gamma("alpha", alpha=1e-2, beta=1e-4)
    # Define priors
    w = pm.Normal("w", mu=0, sd=alpha, shape=train_X.shape[1])
    sigma = pm.HalfCauchy("sigma", beta=10)
    mu = tt.dot(w, shared_X.T)
    # Define likelihood
    likelihood = pm.Normal("y", mu=mu, sd=sigma, observed=shared_y)
    trace = pm.sample(500, discard_tuned_samples=False, random_seed=seed, njobs=1, chains=1)
trace1 = trace[:50]       # first 50 draws, taken from the tuning phase
trace2 = trace[500:1000]  # the 500 post-tuning draws (the trace keeps the 500 tuning draws first)
print("True coefficients")
print(np.array(_w).flatten().tolist()+[_c])
print("Coefficients inferred ")
print(trace1['w'].tolist())
print(trace2['w'].tolist())
shared_X.set_value(test_X)
print("MSE")
print(mean_squared_error(test_y, pm.sample_ppc(trace1, 10000, model)['y'].mean(axis=0)))
print(mean_squared_error(test_y, pm.sample_ppc(trace2, 10000, model)['y'].mean(axis=0)))
shared_X.set_value(train_X)
print("WAIC for train data")
print(pm.waic(trace1, model).WAIC)
print(pm.waic(trace2, model).WAIC)
shared_X.set_value(test_X)
shared_y.set_value(test_y)
print("WAIC for test data")
print(pm.waic(trace1, model).WAIC)
print(pm.waic(trace2, model).WAIC)
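For completeness, the failing LOO computation mentioned above is invoked like this (shared_X and shared_y are already set to the test data at this point; pm.loo mirrors the pm.waic call, and I print the whole returned object rather than assuming its attribute names):

print("LOO for test data")
# this is where the error inside _psislw occurs
print(pm.loo(trace1, model))
print(pm.loo(trace2, model))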