I tried training a linear regression model using both ADVI and MCMC. ADVI is expected to have less accuracy than MCMC when both models are converged. However, I found that ADVI has unacceptable accuracy compare to the MCMC sampling.
The script that was used for evaluation :
import time import numpy as np import pymc3 as pm from sklearn.datasets import load_diabetes, load_boston from sklearn.metrics import mean_squared_error from sklearn.model_selection import train_test_split from theano import tensor as tt, shared def adding_intecept(X): """ Adding a column for intercept :param X: features :return: """ return np.c_[X, np.ones(X.shape)] data = load_diabetes() # load diabetes dataset # data = load_boston() # load boston dataset data["data"] = adding_intecept(data["data"]) X, X_test, y, y_test = train_test_split(data["data"], data["target"], test_size=0.25, shuffle=False) shared_X = shared(X) # TODO : add hyper prior # Define the model with pm.Model() as regression_model: # Define priors sigma = pm.HalfCauchy('sigma', beta=10) w = pm.Normal("w", mu=0, sd=20, shape=X.shape) mu = tt.dot(w, shared_X.T) # Define likelihood likelihood = pm.Normal("y", mu=mu, sd=sigma, observed=y) # Inference t_start = time.time() trace = pm.sample(1000) time_taken_mcmc = (int(round((time.time() - t_start) * 1000))) t_start = time.time() apprx = pm.fit(10000) time_taken_advi = (int(round((time.time() - t_start) * 1000))) shared_X.set_value(X_test) # MCMC error ppc = pm.sample_ppc(trace, model=regression_model, samples=50000) pred_mcmc = ppc['y'].mean(axis=0) mse_mcmc = mean_squared_error(y_test, pred_mcmc) print("Inference : MCMC, wall time : %ims, mse: %f" % (time_taken_mcmc, mse_mcmc)) # ADVI error ppc = pm.sample_ppc(apprx.sample(50000), model=regression_model, samples=50000) pred_advi = ppc['y'].mean(axis=0) mse_advi = mean_squared_error(y_test, pred_advi) print("Inference : ADVI, wall time : %ims, mse: %f" % (time_taken_advi, mse_advi))
The output :
100%|██████████| 1500/1500 [00:03<00:00, 466.80it/s] Average Loss = 2,413.6: 100%|██████████| 10000/10000 [00:01<00:00, 6276.26it/s] Finished [100%]: Average Loss = 2,410.3 Inference : MCMC, wall time : 4532ms, mse: 5440.780652 Inference : ADVI, wall time : 6138ms, mse: 29495.179671
Since I was not happy with the ADVI results I extended the evaluation by evaluating the linear regression using CAVI.
The extended script :
from skbayes.linear_models import VBLinearRegression lr = VBLinearRegression(n_iter=100) t_start = time.time() lr.fit(X,y) time_taken_vi = (int(round((time.time() - t_start) * 1000))) pred_vi = lr.predict(X_test) mse_vi = mean_squared_error(y_test, pred_vi) print("Inference : VI, wall time : %ims, mse: %f" % (time_taken_vi, mse_vi))
Inference : VI, wall time : 1ms, mse: 2823.810371
As you can observe, the CAVI performs as expected (It outperform MCMC aswell. May be due to the less number of samples draws during MCMC). I’m interested in knowing why ADVI failed to perform as desired, if CAVI can achieve better accuracy.
I appreciate any help to understand the following observations,
- ADVI has very poor accuracy compared to MCMC for linear regression. What could be the reasons for that?
- Why did ADVI fail to achieve sufficient accuracy if CAVI is capable of providing better accuracy ?
- Why is ADVI much slower than CAVI (is it due to the automatic differentiation of expressions) ?
- Why is ADVI slower than MCMC? Is it because overhead due to automatic differentiation higher than gain of it stochastic VI in this particular case ?
- What is the complete time taken to perform ADVI? Is it the wall time measured by my script (6138ms) or the time logged by PyMC3 (1s)?
Note : used skbayes to perform linear regression with CAVI