I tried training a linear regression model using both ADVI and MCMC. ADVI is expected to be less accurate than MCMC once both have converged. However, I found that ADVI has unacceptable accuracy compared to MCMC sampling.
The script used for evaluation:
import time
import numpy as np
import pymc3 as pm
from sklearn.datasets import load_diabetes, load_boston
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from theano import tensor as tt, shared


def add_intercept(X):
    """
    Append a column of ones to the features for the intercept.
    :param X: features
    :return: features with an intercept column
    """
    return np.c_[X, np.ones(X.shape[0])]


data = load_diabetes()  # load the diabetes dataset
# data = load_boston()  # load the boston dataset
data["data"] = add_intercept(data["data"])
X, X_test, y, y_test = train_test_split(data["data"], data["target"], test_size=0.25, shuffle=False)

shared_X = shared(X)

# TODO: add hyper-priors
# Define the model
with pm.Model() as regression_model:
    # Define priors
    sigma = pm.HalfCauchy('sigma', beta=10)
    w = pm.Normal("w", mu=0, sd=20, shape=X.shape[1])
    mu = tt.dot(w, shared_X.T)

    # Define likelihood
    likelihood = pm.Normal("y", mu=mu, sd=sigma, observed=y)

    # Inference
    t_start = time.time()
    trace = pm.sample(1000)
    time_taken_mcmc = int(round((time.time() - t_start) * 1000))

    t_start = time.time()
    apprx = pm.fit(10000)
    time_taken_advi = int(round((time.time() - t_start) * 1000))

# Swap in the test data for posterior predictive checks
shared_X.set_value(X_test)

# MCMC error
ppc = pm.sample_ppc(trace, model=regression_model, samples=50000)
pred_mcmc = ppc['y'].mean(axis=0)
mse_mcmc = mean_squared_error(y_test, pred_mcmc)
print("Inference : MCMC, wall time : %ims, mse: %f" % (time_taken_mcmc, mse_mcmc))

# ADVI error
ppc = pm.sample_ppc(apprx.sample(50000), model=regression_model, samples=50000)
pred_advi = ppc['y'].mean(axis=0)
mse_advi = mean_squared_error(y_test, pred_advi)
print("Inference : ADVI, wall time : %ims, mse: %f" % (time_taken_advi, mse_advi))
The output:
100%|██████████| 1500/1500 [00:03<00:00, 466.80it/s]
Average Loss = 2,413.6: 100%|██████████| 10000/10000 [00:01<00:00, 6276.26it/s]
Finished [100%]: Average Loss = 2,410.3
Inference : MCMC, wall time : 4532ms, mse: 5440.780652
Inference : ADVI, wall time : 6138ms, mse: 29495.179671
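For completeness, here is a quick way to inspect the ADVI fit, reusing the variables from the script above (a minimal sketch; `apprx.hist` is the negative-ELBO history that PyMC3 records during `pm.fit`):

import matplotlib.pyplot as plt

# Negative ELBO recorded at each pm.fit() iteration;
# a flat tail suggests the optimization has converged
plt.plot(apprx.hist)
plt.xlabel("iteration")
plt.ylabel("negative ELBO")
plt.show()

# Compare the posterior means of the weights under both methods
advi_trace = apprx.sample(5000)
print("MCMC w mean:", trace["w"].mean(axis=0))
print("ADVI w mean:", advi_trace["w"].mean(axis=0))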
Since I was not happy with the ADVI results, I extended the evaluation to linear regression with CAVI (coordinate ascent variational inference).
The extended script:
from skbayes.linear_models import VBLinearRegression

lr = VBLinearRegression(n_iter=100)
t_start = time.time()
lr.fit(X, y)
time_taken_vi = int(round((time.time() - t_start) * 1000))

pred_vi = lr.predict(X_test)
mse_vi = mean_squared_error(y_test, pred_vi)
print("Inference : VI, wall time : %ims, mse: %f" % (time_taken_vi, mse_vi))
Output:
Inference : VI, wall time : 1ms, mse: 2823.810371
As you can observe, CAVI performs as expected (it outperforms MCMC as well, perhaps because of the small number of draws in the MCMC run). I'm interested in knowing why ADVI failed to perform as desired when CAVI can achieve better accuracy.
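To check the draw-count explanation, one could rerun the sampler with more draws and tuning steps (a sketch that is not part of the timings above; note that shared_X has to be pointed back at the training data before sampling):

shared_X.set_value(X)  # point the shared variable back at the training data
with regression_model:
    trace_long = pm.sample(5000, tune=2000)

shared_X.set_value(X_test)  # switch back to the test data for prediction
ppc_long = pm.sample_ppc(trace_long, model=regression_model, samples=50000)
print("mse:", mean_squared_error(y_test, ppc_long["y"].mean(axis=0)))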
I would appreciate any help in understanding the following observations:
- ADVI has very poor accuracy compared to MCMC for linear regression. What could be the reasons for that?
- Why did ADVI fail to achieve sufficient accuracy when CAVI is capable of better accuracy?
- Why is ADVI so much slower than CAVI? Is it due to the automatic differentiation of expressions?
- Why is ADVI slower than MCMC? Is the overhead of automatic differentiation higher than the gain from stochastic VI in this particular case?
- What is the total time taken to perform ADVI? Is it the wall time measured by my script (6138ms) or the time logged by PyMC3 (about 1s)? (See the sketch below.)
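On the last point: my guess (an assumption, not verified against the PyMC3 internals) is that the wall time additionally includes Theano graph compilation, while the logged time covers only the optimization loop. A rough way to check is to fit twice in the same process, since Theano's compilation cache should make the second call cheap:

with regression_model:
    pm.fit(10000)            # first call pays the Theano compilation cost
    t_start = time.time()
    apprx2 = pm.fit(10000)   # second call should be mostly optimization
print("second fit wall time : %ims" % int(round((time.time() - t_start) * 1000)))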
Note: I used skbayes to perform the linear regression with CAVI.