Posterior sample from an approximation with Minibatches?


#1

If I train a model with Minibatch(es), how do I replace its/their value after approximation so that I can sample the posterior for a fixed and known subset of the data?

I think I’m missing something obvious, but am going circles to no avail. Is there an example somewhere that would cover this?


#2

After trainning, you can set the value using .set_value:

In [3]: x = pm.Minibatch(np.random.randn(100,1000))

In [4]: x.eval()
Out[4]: 
array([[-0.34829082,  0.45134496, -1.03519675, ..., -1.80109584,
         0.57008269, -0.67598918],
       [ 0.21424994,  1.34934134, -1.06525594, ..., -0.22126869,
        -0.41794257,  1.98838051],
       [ 3.36126028, -0.31755657, -0.82344513, ..., -0.96290175,
         0.27808139, -0.8185643 ],
       ...,
       [-0.02245564, -0.278961  , -0.73791364, ..., -0.45770139,
         0.53545464,  1.73343965],
       [-2.04156804, -0.72927505,  0.4760791 , ...,  0.99587714,
         1.86240101,  0.92661746],
       [ 1.9240239 , -0.75818123, -0.22159818, ..., -0.05474053,
         0.40436321, -0.68701168]])

In [5]: x.eval()
Out[5]: 
array([[-0.30117326,  0.34028205,  0.91282827, ...,  0.35780587,
        -1.12898777, -0.82772432],
       [-1.04008581, -2.16347976, -1.49979547, ..., -0.96488536,
        -0.74512995,  1.99544914],
       [-1.38044767,  0.70221734,  0.30531702, ...,  0.13047399,
        -2.17276674,  1.09855923],
       ...,
       [-0.59638083, -0.43653478,  0.93582342, ..., -0.69513752,
        -1.87323037, -0.53201399],
       [-0.95627072,  2.23213058, -0.67099718, ...,  1.0537604 ,
        -0.97095504, -0.40000886],
       [ 1.0656682 , -1.14207618, -1.41437202, ..., -0.45646884,
         0.64780939,  0.0238533 ]])

where each evaluation is different. After seting the value, now each evaluation returns the same:

In [11]: x.set_value(np.zeros((100, 128)))

In [12]: x.eval()
Out[12]: 
array([[0., 0., 0., ..., 0., 0., 0.],
       [0., 0., 0., ..., 0., 0., 0.],
       [0., 0., 0., ..., 0., 0., 0.],
       ...,
       [0., 0., 0., ..., 0., 0., 0.],
       [0., 0., 0., ..., 0., 0., 0.],
       [0., 0., 0., ..., 0., 0., 0.]])

#3

Thanks for the pointer. Am I correct in understanding that your example works because they’re all zeros?

I need each row to be a different defined and tractable series so as to sample the posterior parameters on specific realisations.

An example might help.

Say the data is the realisation of a known gaussian process, but each realisation has some (normally distributed) shift. We aim to find the parameters of that shift, and wish to obtain a posterior for the shift of any individual realisation.

Data generation and model:

n_subjects = int(1e3)
n_obs = 200
cov = 2. * pm.gp.cov.RatQuad(1, 0.2, 0.1)
X = np.linspace(0, 2, n_obs)[:,None]
K = cov(X).eval()

fulldata = pm.MvNormal.dist(mu=np.zeros(K.shape[0]), cov=K).random(size=n_subjects) + \
           pm.Normal.dist(2, 3).random(size=(n_subjects,1))
batchsize= 128
batches = Minibatch(fulldata, batchsize)

with Model():
    mu = pm.Normal('hyper_mu', 0, 1e2)
    tau = pm.HalfCauchy('hyper_tau', 2.5)
    shifts = pm.Normal('shifts', mu, tau=tau, shape=batchsize, total_size=nobs)
    observed = batches - shifts.dimshuffle(0, 'x')
    pm.MvNormal('realisations', mu=np.zeros(K.shape[0]), cov=K, observed=observed, shape=(batchsize, n_obs), total_size=(n_subjects, n_obs))
    approximation = pm.fit(30000)

Now how do I get a posterior sample on the shift for fulldata[0,:] or even for all 128 first realisations, ie fulldata[:128,:]?


#4

You are right. Using set_value is not the right way to do. In this case please try using more_replacement and sample_node (more details here: doc), there is also more information in the Minibatch docstring.


#5

Although according to the docstring .set_value() really should work, @ferrine?


#6

Looking at more_replacement, this :

sample_shifts = approximation.sample_node(shifts, more_replacements={batches: fulldata[:128,:]}, size=100)
posteriorshifts = sample_shifts.eval()

…does return something. I’ll check if the data makes any sense and update.

Thanks!


#7

Myeah no, I’m just getting a noisy average, all the samples look identical. I can also see the hyper_tau's estimate is way too low. So, I don’t know if this is only an error during inference, or also during posterior sampling.

The above model, when run with a subset of the data (removing any total_size) works wonderfully in both NUTS and ADVI. The means of the samples on shifts have impressive precision (pearsonr of 0.94 on a subset of 128 subjects), and the hyper parameters are well recovered.

As soon as I switch to minibatch, the standard deviation of the shifts reaches near-0 levels. The mean is about ok. The fit between sampled posteriors for a sample and their real values gives a pearsonr of 0.

So I’m guessing minibatches either doesn’t play well with hierarchical models, or I’m not doing it right.


#8

I see. Yes I would say this is probably more a modelling problem. One thing you should be aware of is that Minibatch by design samples randomly with replacement, which means that you can have the same data point being selected more than once in one .eval() call during one training step. This is not a very big deal in some model, but for a spatial model and GP it would create problem - You can get around with it by using more informative prior (as you have much less information in one minibatch, your model is more likely to become degenerative).

In your case, the problem is shifts = pm.Normal('shifts', mu, tau=tau, shape=batchsize, total_size=nobs). Notice here that they are the same for each realization of the minibatch, which is not the same model when you are only using a subset of data.


#9

Yes I was getting around to it that it’s seeing those shifts as being fixed.

So, I thought I had it figured out with this:

batches = Minibatch(fulldata, 50) 

with pm.Model():
  # an informed prior
  mu = pm.Normal('shiftmu', np.mean(fulldata), 10)
  sigma = pm.HalfCauchy('sigma', 2.5)
  cov = 2. * pm.gp.cov.RatQuad(1, 0.2, 0.1)
  gp = pm.gp.Latent(cov_func=cov)
  f = gp.prior('f', X=X)
  shifts = pm.Deterministic('shifts', tt.mean(batches-f.T, axis=1))
  pm.Normal('infer', mu, sigma, observed = shifts, total_size= n_subjects)
  approximation = pm.fit()

Which would have then allowed me to sample the posterior for the shift of selected record(s) with the same method as previously

sample_shifts = approximation.sample_node(shifts,  more_replacements={batches: fulldata[:128,:]}, size=100)
posteriorshifts = sample_shifts.eval()

This time around, taking into account that, as you said, the priors here weigh heavily on the posterior, the informed unbiassed prior strongly improves the inference of the hyperparameters.

However no matter what data I use in the posterior sampling, the results correlate strongly from subset to subset, but not at all with their actual values.

So I’m deducing that even in this setup the shifts are seen as being somehow fixed, which really makes no sense since they are now the observed variable.

It can’t be that the replacement is not taking place, otherwise it would still randomly mini-batch and results would not correlate between calls with different data.

I’m confused. Could it be the Deterministic that ADVI (still) expects to have fixed values?


#10

I am a bit confused by your model. In my opinion, you should not do

    shifts = pm.Deterministic('shifts', tt.mean(batches-f.T, axis=1))
    pm.Normal('infer', mu, sigma, observed = shifts, total_size= n_subjects)

I cant quite wrap my head around why you are taking the deterministic shifts as the observed… What are you intend to do here?

Also, are you sure you want to compute the mean over axis=1? If you are modelling the shift over n_obs you should compute the mean across axis=0