Using ADVI (with mini-batch) with streaming data

I’m trying to figure out how we can use the current ADVI (with mini-batch) implementation with streaming data. Usually, in ADVI, we have to provide the complete dataset before training starts. However, I’m interested in using streaming datasets to perform Bayesian inference.

Please let me know: is it possible to use ADVI with a streaming dataset (as an online ML algorithm)? If so, how can I proceed?

What do you mean by streaming data? I guess you can use theano.shared and call .set_value() when new data comes in.
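Something like this, perhaps (a minimal sketch; the model is a placeholder, and new_x/new_y stand for whatever your stream delivers):

import numpy as np
import pymc3 as pm
import theano.tensor as tt
from theano import shared

# wrap the data in shared variables so they can be swapped later
shared_x = shared(np.zeros((100, 5)))
shared_y = shared(np.zeros(100))

with pm.Model() as model:
    w = pm.Normal('w', mu=0., sd=1., shape=5)
    pm.Normal('y', mu=tt.dot(shared_x, w), sd=1., observed=shared_y)

# when new data comes in, swap it into the same compiled model
shared_x.set_value(new_x)
shared_y.set_value(new_y)
with model:
    apprx = pm.fit(10000, method='advi')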

Thanks for the suggestion. I thought of that, but then I felt it might not continuously update the posterior.

# initiate
advi = pm.ADVI()
for _x, _y in batches:
    shared_x.set_value(_x)
    shared_y.set_value(_y)
    minibatch_x = pm.Minibatch(_x, batch_size=100)
    minibatch_y = pm.Minibatch(_y, batch_size=100)
    apprx = advi.fit(100, more_replacements={shared_x: minibatch_x, shared_y: minibatch_y})

So will this update the mean and std approximations of ADVI incrementally for each batch? Note that I completely discard each previous batch once a new batch arrives.

No, it will not. I see what you mean now - basically what you want is a way to incrementally update the approximation, so that if you train batch by batch the end result will be identical to training on the whole dataset at once. Is that right?

If that’s the case, I don’t think there is an out-of-the-box solution yet. I have been thinking about how to do this without recompiling a new model with a new prior, but so far I have no good answer.


Yup.

I see. So is this a limitation due to the way ADVI is implemented in PyMC3?

Because, in the paper, the way the authors present ADVI gives the impression that it can be used for streaming data as well.

I think this is a limitation of almost all frameworks currently; they are not built to handle the Bayesian filtering problem.

Unless I missed it somewhere, in the paper they are referring to mini-batching the data input to reduce the computational demand (which is what you can already do with Minibatch in PyMC3).

I don’t think you missed it - I didn’t see a place where they explicitly say that either, but I got the impression from the way they present mini-batching.

Can you please clarify that statement for me again? Does it mean that mini-batch ADVI is not meant for training with streaming data? Is this a limitation of the ADVI framework?

No, it’s not a limitation of ADVI alone; you have the same problem doing sampling. I see this question come up quite a bit - for example, after you do inference via posterior sampling, is it possible to update the trace according to new observations? I have not yet seen any solution that is general and easy to implement.

As for ADVI specifically, there is a scaling that you need to define via the total_size kwarg. This means that if you have future data of unknown length, you need to properly reweight the scaling - simply replacing the minibatch values is not valid.
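For reference, the scaling is attached to the observed variable, something like this (a sketch with made-up sizes):

import numpy as np
import pymc3 as pm

data = np.random.randn(500)                # pretend this is the full dataset
batch = pm.Minibatch(data, batch_size=100)

with pm.Model():
    mu = pm.Normal('mu', mu=0., sd=10.)
    # total_size rescales the minibatch log-likelihood up to the full
    # dataset; with streaming data of unknown length this weight is wrong
    pm.Normal('obs', mu=mu, sd=1., observed=batch, total_size=500)
    apprx = pm.fit(10000, method='advi')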


Thanks for the clarification @junpenglao. This is exactly what I wanted to know.

I think Edward has managed to extend ADVI for online learning. @junpenglao, do you know how this works?

That page describes the same thing as Minibatch in PyMC3.


Yup, it seems I misunderstood what they said. They are proposing to use Bayesian filtering.

Anyway, you mentioned that we use the size of the dataset for the ADVI computations. Is it only used to compute the scaling (the total_size kwarg), which defines how much the computations on the mini-batch are scaled up?

Yep that’s the only application right now.
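Roughly, the minibatch log-likelihood gets multiplied by total_size / batch_size, e.g.:

total_size = 500   # assumed full dataset size
batch_size = 100   # assumed minibatch size
scale = total_size / batch_size  # the batch logp is multiplied by 5, so
                                 # 100 points stand in for all 500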


I tried to implement this using PyMC3: https://discourse.edwardlib.org/t/iterative-estimators-bayes-filters-in-edward/104/4

This is my code:

import numpy as np
import pymc3 as pm
import theano.tensor as tt
from theano import shared

seed = 7
np.random.seed(seed)

d = 5                                   # number of features
m = 10                                  # observations per batch
coeffs = np.random.uniform(-10, 10, d)  # true regression coefficients

def mean_absolute_percentage_error(y_true, y_pred):
    y_true, y_pred = np.array(y_true), np.array(y_pred)
    return np.mean(np.abs((y_true - y_pred) / y_true)) * 100

def next_set(w, m=10, d=3):
    # generate the next batch of noisy linear-regression data
    x = np.random.rand(m, d)
    y = np.dot(x, w) + np.random.normal(0, 0.1, m)
    return x, y

test_x, test_y = next_set(coeffs, 50, d)  # held-out test set

model = pm.Model()
# prior hyperparameters kept in shared variables, updated after each batch
shared_w_mu = shared(np.full(d, 0.0))
shared_w_sd = shared(np.full(d, 1.0))
shared_sigma_mu = shared(0.0)
shared_sigma_sd = shared(1.0)

x, y = next_set(coeffs, m, d)
shared_x = shared(x)
shared_y = shared(y)

with model:
    w = pm.Normal('w', mu=shared_w_mu.get_value(), sd=shared_w_sd.get_value(), shape=d)
    sigma = pm.Normal("s", shared_sigma_mu.get_value(), sd=shared_sigma_sd.get_value())

    mu = tt.dot(shared_x, w)
    pm.Normal("y", mu=mu, sd=sigma, observed=shared_y)

    advi = pm.ADVI(total_size=500)

for i in range(50):
    with model:
        apprx = advi.fit(50)

    x, y = next_set(coeffs, m, d)
    shared_x.set_value(x)
    shared_y.set_value(y)

    # read the fitted posterior mean and std and map them back to
    # per-variable dictionaries
    mu_dic = apprx.groups[0].bij.rmap(apprx.mean.eval())
    sd_dic = apprx.groups[0].bij.rmap(apprx.std.eval())
    
    shared_w_mu.set_value(mu_dic['w'])
    shared_w_sd.set_value(sd_dic['w'])

    shared_sigma_mu.set_value(mu_dic['s'])
    shared_sigma_sd.set_value(sd_dic['s'])

    pred = np.dot(test_x, mu_dic['w'])  # use the posterior mean directly instead of sample_ppc, for speed
    print(mean_absolute_percentage_error(test_y, pred))

Here, I try to extend ADVI to streaming ML using Bayesian filtering, assuming that we know the total_size. However, the accuracy of the estimated coefficients improves much more slowly than in the Edward implementation. What am I doing wrong?

The total_size should be specified on the observed variable, i.e. pm.Normal("y", mu=mu, sd=sigma, observed=shared_y, total_size=500), not in pm.ADVI().

I changed that, but it did not improve.

apprx = advi.fit(50) is probably too few iterations for the approximation to converge; also check the optimizer.
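For example (the iteration count and learning rate here are arbitrary, and pm.adam is one of the built-in optimizers):

with model:
    apprx = advi.fit(
        10000,                                      # many more iterations per batch
        obj_optimizer=pm.adam(learning_rate=0.01),  # e.g. Adam, as in Edward
    )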

I increased the number of iterations and set the optimizer to the same one used in Edward (Adam). However, the PyMC3 model still does not improve.

I think the model is not updated with the new mean and std once a single batch is trained. Is there a way to change the mean and std of each FreeRV dynamically?

Not sure what you mean - when you move on to a new batch, the mean and std of the ADVI approximation are not reinitialized; it starts from the mean and std resulting from training on the previous batch.
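You can check that yourself with something like this (a sketch using the advi object from your loop):

before = advi.approx.mean.eval().copy()
with model:
    advi.fit(50)
after = advi.approx.mean.eval()
# `after` continues from `before`: the approximation parameters live in
# shared variables and are not reinitialized between fit() calls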

Try removing .get_value() in the model definition:

with model:
    w = pm.Normal('w', mu=shared_w_mu, sd=shared_w_sd, shape=d)
    sigma = pm.Normal("s", shared_sigma_mu, sd=shared_sigma_sd)
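That way the priors depend on the shared variables symbolically, so .set_value() will propagate into the model without recompiling it.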