Performance speedup for updating posterior with new data

Hi All

I’ve been developing a Poisson regression model with the capability to incrementally update the posterior as new data becomes available (online learning).

I’ve been using @davidbrochart’s Updating Priors notebook as a guide for how to do this in PyMC3. I’ve attached an example to this post.
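For reference, the prior-updating step in that notebook builds an Interpolated prior from the previous round’s posterior samples via a Gaussian KDE, roughly like this (simplified sketch of the notebook’s helper, to be called inside a model context):

import numpy as np
import pymc3 as pm
from scipy import stats

def from_posterior(param, samples):
    # Approximate the previous posterior with a Gaussian KDE evaluated on a grid,
    # then wrap it in an Interpolated distribution to use as the new prior.
    smin, smax = np.min(samples), np.max(samples)
    width = smax - smin
    x = np.linspace(smin, smax, 100)
    y = stats.gaussian_kde(samples)(x)
    # Pad the support so the new prior is non-zero a bit beyond the observed samples.
    x = np.concatenate([[x[0] - 3 * width], x, [x[-1] + 3 * width]])
    y = np.concatenate([[0], y, [0]])
    return pm.Interpolated(param, x, y)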

The problem I’m having is that although the initial sampling takes 1 to 2 seconds (on 100 data points), each update jumps to about 15 seconds per new data point (I’m feeding in just one new point for each update of the posterior). I’ve run this on a more powerful virtual machine (non-GPU) and the minimum I’ve got it down to is 6 to 10 seconds.

I’m wondering if there’s something I’m missing that could speed up the updates. It seems like a lot of effort is spent initialising a new model every time; is there a way to bypass this, given that we’ve just run a similar model?

Super appreciate any help on this.

Thanks.
poisson_regression_online_learning.py (3.7 KB)


I had a look at your code. Unfortunately, I don’t see much more room to improve the speed: the Interpolated RV always needs to be re-initialized, and the bottleneck in evaluating the SciPy spline smoothing function is also difficult to improve without rewriting everything in Theano.

Alternatively, if you are happy with using some other distribution as an approximation, it would simplify the code a lot:

import pandas as pd
import theano
import pymc3 as pm

# `generate_data`, `trace` and `traces` come from the initial fit in the attached
# script: `trace` holds the posterior samples of the first model and `traces`
# collects the trace of every update.
new_data = pd.DataFrame(data=[generate_data() for _ in range(1)],
                        columns=generate_data().keys())
output_shared = theano.shared(new_data['output'].values)
x1_shared = theano.shared(new_data['x1'].values)

# Summarise the previous posterior in shared variables so they can be swapped
# out later without rebuilding the model.
mu0 = theano.shared(trace['alpha_0'].mean(), name='hyper_mu0')
sd0 = theano.shared(trace['alpha_0'].std(), name='hyper_sd0')
mu1 = theano.shared(trace['alpha_1'].mean(), name='hyper_mu1')
sd1 = theano.shared(trace['alpha_1'].std(), name='hyper_sd1')

with pm.Model() as update_model:
    # Student-t priors centred on the previous posterior mean/std.
    alpha_0 = pm.StudentT('alpha_0', mu=mu0, sd=sd0, nu=1, testval=0.)
    alpha_1 = pm.StudentT('alpha_1', mu=mu1, sd=sd1, nu=1, testval=0.)
    theta = alpha_0 + alpha_1 * x1_shared

    likelihood = pm.Poisson(
        'output',
        mu=pm.math.exp(theta),
        observed=output_shared,
    )
    trace = pm.sample(1000)
    traces.append(trace)

for _ in range(1, 10):
    # Simulate one new observation and update the shared data and priors in place.
    new_data = pd.DataFrame(data=[generate_data() for _ in range(1)],
                            columns=generate_data().keys())
    output_shared.set_value(new_data['output'].values)
    x1_shared.set_value(new_data['x1'].values)
    mu0.set_value(trace['alpha_0'].mean())
    sd0.set_value(trace['alpha_0'].std())
    mu1.set_value(trace['alpha_1'].mean())
    sd1.set_value(trace['alpha_1'].std())

    # Re-sample the same model; only the shared values have changed.
    with update_model:
        trace = pm.sample(1000)
        traces.append(trace)

Thanks for looking at the code. To be honest, your code changes have dropped the runtime per update from 15 s down to a few seconds, so that’s a pretty awesome speedup in any case! Shame about the other bottlenecks, though; I figured not much can be done if the cost comes from elsewhere.

In the original code, the priors of alpha_0 and alpha_1 are normal distributions. When 9f0sdau90 updates the priors, they fit a Gaussian kernel density estimate to the posterior samples. Given that, I’m just curious why you chose a Student-t distribution for the updated priors instead of a normal distribution, e.g. like this:

alpha_0 = pm.Normal('alpha_0', mu=mu0, sd=sd0)
alpha_1 = pm.Normal('alpha_1', mu=mu1, sd=sd1)

Using a Student-t is more robust to outliers, as it has heavier tails.
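For intuition, here is a quick tail comparison with SciPy (illustrative only, not from the thread):

from scipy import stats

# Probability of falling more than 5 scale units above the centre:
print(stats.norm.sf(5))      # ~2.9e-07 for the normal
print(stats.t.sf(5, df=1))   # ~0.063 for the Student-t with nu=1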

Is there a way of doing such an update for a vectorised Normal RV?

Here is a code example of what I’d like to do :slight_smile:

import numpy as np
import pandas as pd
import pymc3 as pm
from scipy.special import expit as logistic  # assuming `logistic` is the standard sigmoid

DRAWS = 1000
TUNE = 100

# Simulate three predictors and a binary outcome from a logistic model.
Xw1 = np.random.normal(0, 25, 1000)
Xw2 = np.random.normal(0, 25, 1000)
Xw3 = np.random.normal(0, 25, 1000)
a = 0.5
b = np.array([0.8, -0.25, 0.1])
y = 1 * (logistic(a + b[0] * Xw1 + b[1] * Xw2 + b[2] * Xw3) > 0.5)

data = pd.DataFrame({
    "Xw1": Xw1,
    "Xw2": Xw2,
    "Xw3": Xw3,
    "y": y
})

labels = ["Xw1", "Xw2", "Xw3"]

X = data[labels].to_numpy()
y = data["y"].to_numpy()

with pm.Model() as logistic_model:
    alpha = pm.Normal("alpha", mu=0.0, sigma=10)
    # <-- for the next training/inference I'd like to set the posteriors from the
    # first batch of data points as the priors for the new data points
    betas = pm.Normal("betas", mu=0.0, sigma=10, shape=X.shape[1])

    # set predictors as shared variable to change them for PPCs:
    predictors = pm.Data("predictors", X)
    p = pm.Deterministic("p", pm.math.invlogit(alpha + predictors @ betas))

    outcome = pm.Bernoulli("outcome", p=p, observed=y)

    # draw posterior samples using NUTS
    trace = pm.sample(draws=DRAWS,
                      tune=TUNE,
                      cores=2,
                      return_inferencedata=True)

I’m not a fan of naming beta coefficients one by one; it makes models hard to change, right?

You can try modeling betas as:

betas = pm.Normal("betas", mu=np.zeros(X.shape[1]), sigma=np.ones(X.shape[1]) * 10, shape=X.shape[1])

and replace mu and sigma with values computed from the posterior samples for the next batch.
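A minimal sketch of how that could look (illustrative, reusing X, y, DRAWS and TUNE from your snippet): keep mu and sigma in shared variables so the same model can be re-sampled for each batch.

import numpy as np
import theano
import pymc3 as pm

# Shared prior parameters for the vectorised betas; start from the original prior.
mu_betas = theano.shared(np.zeros(X.shape[1]), name="mu_betas")
sd_betas = theano.shared(np.ones(X.shape[1]) * 10, name="sd_betas")

with pm.Model() as logistic_model:
    alpha = pm.Normal("alpha", mu=0.0, sigma=10)
    betas = pm.Normal("betas", mu=mu_betas, sigma=sd_betas, shape=X.shape[1])
    predictors = pm.Data("predictors", X)
    p = pm.Deterministic("p", pm.math.invlogit(alpha + predictors @ betas))
    outcome = pm.Bernoulli("outcome", p=p, observed=y)
    trace = pm.sample(draws=DRAWS, tune=TUNE, return_inferencedata=True)

# Before the next batch: summarise the posterior and push it into the shared priors.
# (The new batch itself would be swapped in via pm.set_data / a Data container for y,
# and alpha's prior could be updated the same way with scalar shared variables.)
post_betas = trace.posterior["betas"]
mu_betas.set_value(post_betas.mean(dim=("chain", "draw")).values)
sd_betas.set_value(post_betas.std(dim=("chain", "draw")).values)

with logistic_model:
    trace = pm.sample(draws=DRAWS, tune=TUNE, return_inferencedata=True)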


Thank you, it should work.

Another quick question (maybe off-topic): is pm.Normal initialised this way (with shape > 1) equivalent to a multivariate normal with only the main diagonal (i.e. a diagonal covariance) in pymc3?

Yes.
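The two parameterisations describe the same density, since a vector of independent normals factorises into exactly the diagonal-covariance multivariate normal. A quick check of the densities with SciPy (illustrative):

import numpy as np
from scipy import stats

val = np.array([1.0, -2.0, 0.5])
sd = 10.0

# Sum of independent normal log-densities vs. a multivariate normal with covariance sd**2 * I.
logp_indep = stats.norm(loc=0.0, scale=sd).logpdf(val).sum()
logp_mv = stats.multivariate_normal(mean=np.zeros(3), cov=np.eye(3) * sd**2).logpdf(val)
print(logp_indep, logp_mv)  # identical up to floating point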
