Using multi-multivariate PDFs when defining the model's likelihood and sampling step method


#1

TL;DR:

I want to draw samples from the posterior PDF of some parameters using Metropolis, the parameters are many vectors that are statistically independent. How can I define the step method such that it draws proposals from many low dimensional normal PDFs?

My data includes a set of a set of bidimensional (plane) coordinates, an array of size (n,m,2). There is a likelihood for every 2-D coordinate, expressed as a bivaluate mean and a specific covariance. How can I define in my model many multivariate normal distributions asociated with the data?

In Extensum:

I have some data and a probabilistic model with parameters. The model involves some calculations on the data that I can’t write using Theano so I created a new Theano Op. The model depends on many parameters: an 8 dimensional vector and thirty-three 6 dimensional vectors, so in total 8+33*6=206 scalar parameters. Also I know that all this vectors are statistically independent but I expect statistical dependence between the elements of a single vector. So no inter-vector dependance but yes intra-vector dependance. When fed some input data and the parameters my model outputs the means and covariances of a set of 2 dimensional coordinates, so the likelihood is actually many 2-D gaussians.

First I wrote my own implementation of Metropolis using numpy and scipy. It works but it took a heck lot of work to test and to deal with the complexities of the model and the data. Also I would prefer to relay on a more standardised implementation. This way I can just cite a known library instead of explaning in length something that is not really central to my work.

So I came upon PyMC3, inmediatelly re-coding stuff to adapt my code to the library. It works but is amazingly slow! It takes about 40secs for a single iteration when my own implementation took about two orders of magnitude less. I propose the following sources of slowness:

1- Having to draw proposals from a 206-D gaussian. I had to stack all parameters into one long vector because I don’t know to define a step function for many variables each with it’s own covariance in PyMC3. How can I define a step function step = pm.Metropolis(....) that could then be given to pm.step function as argument? I want this step function to draw proposals from one 8-D and thirty-three 6-D normal distributions. I guess that this should be faster. Also it’s a better representation of the model I think. (This other post talks about defining a custom step function but I just want to compose a list/dictionay/whatever of 34 existing step functions).

2- Defining multi-multivariate likelihood. The probability of the data (n*m bidimensional coordinates) is evaluated as the probability of a very long vector with all the n*m*2 (33*54*2=3564) values given normal distribution in 3564 dimensions. In my own implementation I could evaluate n*m=1782 separate bidimensional gaussians. The prblem is that I have many (bivaluate) means each with it’s own covariance. I tried to follow th examples and but it results in error because when defining the likelihood as multivariate normal It only accepts a matrix of two indexes as covariance. How can I define the likelihood as many multivariate normal distributions?

3- Maybe Theano is just slow? … I don’t know.

So basically stacking everything into huge multidimensional vectors is working but I figure PyMC3 might have some way to do it more afficiently and elegantly.

The relevant part of code

class ProjectionT(theano.Op):
    '''
    here I just wrap some numeric python routine that uses numpy and others
    so that it can be called by theano
    '''
    # xInt, xExternal are sized (8,) and (33,6) respectively
    itypes = [T.dvector, T.dmatrix]
    # xyM, cM are sized (33,54,2) and (33,54,2,2) respectively
    otypes = [T.dtensor3, T.dtensor4]

    # Python implementation:
    def perform(self, node, inputs_storage, output_storage):

        Xint, Xext = inputs_storage
        xyM, cM = output_storage

        # do something with data and inputs to calculate xy, cm
        # this is something I can't port to theano because of reasons...
        xy = ...
        cm = ...

        xyM[0] = xy # write to outputs
        cM[0] = cm

    # optional:
    check_input = True


# ==== model definition
projectionModel = pm.Model()
projTheanoWrap = ProjectionT() # instance of the function written in python

with projectionModel:
    # Priors for unknown model parameters, uniform PDF between xLow and xUpp
    xAll = pm.Uniform('xAll', lower=xLow, upper=xUpp, shape=allLow.shape)

    xIn = xAll[:Ns[-1]] # reshape to 8-D vector
    xEx = xAll[Ns[-1]:].reshape((n, 6)) # reshape to n=33 6-D vectors

    xyM, cM = projTheanoWrap(xIn, xEx) # apply function written in python


    # the following is the way I found of defining the PDF of the data
    # It should be 54*33=1782 bidimensional gaussians, each with it's own
    # mean and covariance. But I stack everithing into une big vector
    mu = T.reshape(xyM, (-1,)) # the data's PDF should be centered here

    # A big covariance with smaller covariance block in the diagonal
    bigC = T.zeros((nTot, nTot))
    c3Diag = T.reshape(cM, (-1, 2, 2))  # list of 2x2 covariance blocks
    bigC = T.set_subtensor(bigC[xInxs, yInxs], c3Diag)

    # this is the PDF that the data (3564 dimensions) should follow 
    Y_obs = pm.MvNormal('Y_obs', mu=mu, cov=bigC, observed=yObs)
    # I would prefer to express this likelihood as a list of 2D gaussians


# ==== Metropolis sampling
start = {'xAll' : xAll0}

nDraws = 500
nChains = 6

Scov = allUpp - allLow # some 

with projectionModel:
    # define one big step function fo all 206 parameters
    step = pm.Metropolis(S=Scov, scaling=1e-8)
    # I would prefer a list os steps i.e.:
    # step = [pm.Metropolis(S=...) for i in range(...)]

    trace = pm.sample(draws=nDraws, step=step, start=start, njobs=nChains,
                      chains=nChains)

#2

You can try profiling your code to find out where is it slow. I think the theanoOp might be one (you said you can’t port to theano, so I guess you are calling an outside library?), the other one is the MvNormal with large Cov matrix. So to answer your question:

1, when you said “define a step function for many variables each with it’s own covariance in PyMC3”, you meant the cov of the proposal right? You can try the following:

    xIn = pm.Uniform('xAll', lower=xLow, upper=xUpp, shape=8)
    xEx = pm.Uniform('xAll', lower=xLow, upper=xUpp, shape=(n, 6))
    
    ...
    step1 = pm.Metropolis(vars=[xIn], S=Scov1)
    step2 = pm.Metropolis(vars=[xEx], S=Scov2)
    
    pm.sample(step=[step1, step2])

Few point to be careful:

  • Scov2 is a 33*6 by 33*6 cov matrix, since you would like thirty-three 6-D normal distributions, you just need to perform Kronecker product between a 33D identity matrix and a 6D cov matrix.
  • PyMC3 automatically transform parameters into the real line. So you should either scale your proposal, or do something like xIn = pm.Uniform('xAll', lower=xLow, upper=xUpp, shape=8, transform=None)

2, yes a large cov matrix is likely very slow. But since you have “n*m=1782 separate bidimensional gaussians”, you can either try the new pm.MatrixNormal distribution, or write your own logp function to compute the logp more efficiently.


#3

1, I implemented the changes and now I use two step functions just like in you code snippet. So thanks, I like it that way much more. For Scov2 I gave the standar deviations for each element, so a (33,6) matrix was enough and it works perfectly.

2, oukey, I wrote my logp function because I thought that MatrixNormal would be faster but not as fast as It couldget (and figuring out how to calculate U and V was going to be as much work as learning to write my logp function). But I’m having some issues with some tensors shapes, I’ll get it to work eventually.

3, profiling? I’ll check it out.


#4

2, I circumvented the problem of writing the logp function. Instead of having many bidimensional vectors to be evaluated in many corresponding bivaluate nomal PDFs I project all the vectors into the basis vectors that diagonalise each PDF, effectively obtaining many points that must be evaluated against a standard normal distribution (mean zero and covariance of one).


#5

linear algebra for the win :wink: