Matrix operations involving observed and latent variables

Richard_Border · March 9, 2022, 2:00am

Hi all,

Long time Stan user giving PyMC3 a shot. I’m trying to wrap my head around implementing a custom log likelihood that involves matrix operations with observed data and latent variables. Specifically, I don’t understand how to do math with random variables and TensorVariables.

Here’s an dumb model that illustrates my confusion (this is not the model I actually want to implement but illustrates where I’m getting tripped up). The generative model is

vg, ve ~ priors
b | vg ~ MVN(0, C*vg)
y | b, ve ~ MVN(Xb, I * ve)

where C is a known m-by-m matrix and b is a vector of latent effects. I simulate these data via:

import numpy as np
import pymc3 as pm
from pymc3 import math as pmm

n=200
m=100
vg=.4
ve=.6
X = np.random.multivariate_normal(np.zeros(m), np.eye(m), size=n)
u = .1
C = (np.eye(m) + np.outer(u,u))
S = C*vg/m
beta = np.random.multivariate_normal(np.zeros(m), S)
e = np.random.multivariate_normal(np.zeros(n), np.eye(n)*ve, size=1)
y = X @ beta + e

I have no trouble implementing this using the built-in MVN functions. Here’s a working PyMC3 implementation:

with pm.Model() as lmm:
    ## theta
    vg  = pm.HalfCauchy(name='vg', beta=1.0);
    ve  = pm.HalfCauchy(name='ve', beta=1.0);
    
    ## b | theta 
    b = pm.MvNormal(name='b',mu=np.zeros(m),cov=C*vg/m,shape=(1,m))
    linpred = b @ X.T

    ## y | Xb, theta 
    loglik = pm.Normal(name="loglik", mu=linpred, sigma=ve, observed=y, shape =n)

Even though b is a latent vector and X is some observed data, there’s no problem computing linpred. However, when I try to implement the MVN lpdf manually, I run into TypeError: unsupported operand type(s) for @: 'TensorVariable' and 'MultiObservedRV'. Here’s my broken attempt to implement this manually:

with pm.Model() as lmm2:
    ## theta
    vg  = pm.HalfCauchy(name='vg', beta=1.0);
    ve  = pm.HalfCauchy(name='ve', beta=1.0);
    
    ## b | theta
    def mvn_lpdf(C):
        cov=C*vg/m 
        return -(pmm.logdet(cov)+ pmm.dot(pmm.matrix_inverse(cov) @ b,b))
    b = pm.DensityDist(name='b',logp=mvn_lpdf,observed={"C":C})
    linpred = b @ X.T

    ## y | Xb, theta 
    loglik = pm.Normal(name="loglik", mu=linpred, sigma=ve, observed=y, shape =n)

What’s causing the type errors in the return -(pmm.logdet(cov)+ pmm.dot(pmm.matrix_inverse(cov) @ b,b)) line?

Thanks,
Richard

cluhmann · March 10, 2022, 1:21am

Welcome!

I can’t get your example to run at all because the b in that same line is undefined. what am I missing?

Richard_Border · March 10, 2022, 8:08pm

Quick note:

in this line both bs should be b.T but I don’t appear to be able to edit this.

To answer your question, my understanding of PyMC3 is worse than I realized! I didn’t know that a variable defined in the context of one model lmm is within scope for a separate model lmm2. Since I was running lmm first, I didn’t even get the undefined error, so I’ve been trying to debug the wrong thing.

b is supposed to be a latent variable with density mvn_lpdf. I would like model lmm2 to be the same as lmm. It’s not clear to me how I define a latent variable in terms of its density. FWIW, in Stan, I’d construct a function mvn_lpdf(b,C, vg), declare b as a vector parameter, and then call target+=mvn_lpdf( b | C, vg) in the model block.

cluhmann · March 12, 2022, 7:24pm

You can check out the documentation for DensityDist here and the developers guide here for some info about the signature of the logp function passed when creating a DensityDist object.

I believe that the following is a working model that is roughly halfway between the two models you presented. It uses a DensityDist to define a custom RV, but instead of hand-coding the logp(), I just used the logp() from pm.MvNormal. But hopefully that illustrates how to use the DensityDist. Now you just need to replace the logp() with your own custom function. The random argument is used during prior predictive sampling and may not be needed in your case (not 100% on that).

with pm.Model():
    ## theta
    vg  = pm.HalfCauchy(name='vg', beta=1.0);
    ve  = pm.HalfCauchy(name='ve', beta=1.0);

    mvnormdist = pm.MvNormal.dist(mu=np.zeros(m),
                                  cov=C*vg/m,
                                  shape=(1,m))
    b = pm.DensityDist(
        'b',
        logp=mvnormdist.logp,
        random = mvnormdist.random,
        shape=(1,m)
    )
    linpred = b @ X.T

    ## y | Xb, theta 
    loglik = pm.Normal(name="loglik", mu=linpred, sigma=ve, observed=y, shape=n)

OriolAbril · March 13, 2022, 10:41am

The main difference between the two original codes is the use of the observed kwarg. There are two things to be aware of. DensityDist objects are very flexible and powerful and at the same time (or because of that) are tricky to use.

Observed variables don’t take different values for each draw while sampling. You said you have used Stan. Stan does the same. When using the ~ notation in Stan, both observed and latent variables are declared with the x ~ distribution() syntax, and being observed or not depends on whether x is defined in the data block or not. Like in PyMC, you can use x at any other point in the model too, but if it’s observed its values will be constant. Your first b definition sets a multivariate normal prior given a mu and a covariance, the second doesn’t.

In PyMC <4.0, DensityDist can take a single array as observed or can take a dictionary. When taking a dictionary it doesn’t matter if the dictionary has a single item, the generated object is this MultiObservedRV object which has potentially multiple observed variables in it (that are also completely arbitrary) and therefore can’t be used for any linear algebra operation.

Bonus note. DensityDist is generally used for observed variables, that only need the logp term to sample from the posterior. To still be able to do forward sampling (i.e. sample from the prior, prior predictive or posterior predictive distributions) you need to also use the random kwarg as @cluhmann showed in his example which also avoids using the observed kwarg and is the recommended way to use DensityDist.

Richard_Border · March 14, 2022, 6:07pm

Thanks @cluhmann and @OriolAbril for the help so far, but I’m still having difficulty seeing how to define a custom density. Using a built in logpdf such as mvnormdist.logp in @cluhmann 's example works fine, but my reason for wanting to use DensityDist() is to implement a distribution that isn’t built in. The only example of a custom lpdf I can find in the documentation is the exponential survival distribution, but this is a simple univariate density. How can I implement a lpdf that involves matrix operations?

Here’s a simpler example that hopefully helps get my question across. First I generate the data

import numpy as np
import pymc3 as pm
from pymc3 import math as pmm

n=200 ## sample size
m=100 ## number of random effects
var_beta=.5/np.sqrt(m) ## variance of  latent random effects
var_resid=.5 ## residual variance

## design matrix
X = np.random.multivariate_normal(np.zeros(m), np.eye(m), size=n)

## latent random effects
beta = np.random.multivariate_normal(np.zeros(m), np.eye(m)*var_beta, size=1).T

## residuals
e = np.random.multivariate_normal(np.zeros(n), np.eye(n)*var_resid, size=1).T

## outcome
y = X @ beta + e

I can estimate the variance components using the builtin MVN lpdf without issue:

with pm.Model() as lmm:
    ## priors on variance components
    var_beta_hat = pm.HalfCauchy(name='var_beta_hat', beta=1.0);
    var_resid_hat  = pm.HalfCauchy(name='var_resid_hat', beta=1.0);
    
    ## distribution of latent random effects beta given variance components
    mvnormdist = pm.MvNormal.dist(mu=np.zeros(m),
                                  cov=np.eye(m)*var_beta_hat/m,
                                  shape=(1,m))
    ## DensityDist with built in MVN logp
    beta_hat = pm.DensityDist('beta_hat', logp=mvnormdist.logp, shape=(1,m))

    ## Do some linear algebra with the latent variable
    linear_predictor = beta_hat @ X.T
    
    ## distribution of outcome given everything else
    y_hat = pm.Normal(name="y_hat", mu=linear_predictor, sigma=var_resid_hat, observed=y, shape=n)

However, I’m unable to figure out how to code the density (-.5*(log det C + b^T C^-1 b + m log 2 pi)) by hand:

with pm.Model() as lmm2:
    ## priors on variance components
    var_beta_hat = pm.HalfCauchy(name='var_beta_hat', beta=1.0);
    var_resid_hat  = pm.HalfCauchy(name='var_resid_hat', beta=1.0);
    
    ###########  What can I do to make this work without using a built lpdf
    ## distribution of latent random effects beta given variance components
    def custom_logp(beta_hat):
        mu_beta = np.zeros(m) ## mean of latent variable
        cov_beta = np.eye(m)*var_beta_hat2 ## variance of latent variable
        ## components of MVN lpdf
        tmp = np.linalg.solve(cov_beta, beta_hat)
        qform = tmp @ beta_hat
        ldet = np.log(np.linalg.det(cov_beta))
        return -.5*(qform + ldet + m * log(2*np.pi)
    ###########
    
    ## DensityDist with built in MVN logp
    beta_hat = pm.DensityDist('beta_hat', logp=custom_logp, shape=(1,m))

    ## Do some linear algebra with the latent variable
    linear_predictor =  beta_hat.T @ X
    
    ## distribution of outcome given everything else
    y_hat = pm.Normal(name="y_hat", mu=linear_predictor, sigma=var_resid_hat, observed=y, shape=n)

How can I rewrite the emphasized block of code above to make this work? Again, I explicitly want to avoid using the built-in MVN lpdf as, though it works for this toy example, my intended application involves a niche distribution that isn’t part of PyMC. Thanks again

OriolAbril · March 14, 2022, 7:38pm

Doesn’t the custom_logp you have (but using theano and pymc.math operations) work already?

I have modified your 2nd model to this:

with pm.Model() as lmm2:
    ## priors on variance components
    var_beta_hat = pm.HalfCauchy(name='var_beta_hat', beta=1.0);
    var_resid_hat  = pm.HalfCauchy(name='var_resid_hat', beta=1.0);
    
    ###########  What can I do to make this work without using a built lpdf
    ## distribution of latent random effects beta given variance components
    def custom_logp(beta_hat):
        cov_beta = tt.eye(m)*var_beta_hat ## variance of latent variable
        ## components of MVN lpdf
        qform = pmm.dot(beta_hat, pmm.dot(pmm.matrix_inverse(cov_beta), beta_hat.T))
        ldet = tt.log(pmm.det(cov_beta))
        return -.5*(qform + ldet + m * tt.log(2*np.pi))
    ###########
    
    ## DensityDist with built in MVN logp
    beta_hat = pm.DensityDist('beta_hat', logp=custom_logp, shape=(1,m))

    ## Do some linear algebra with the latent variable
    linear_predictor =  pmm.dot(X, beta_hat.T)
    
    ## distribution of outcome given everything else
    y_hat = pm.Normal(name="y_hat", mu=linear_predictor, sigma=var_resid_hat, observed=y, shape=n)

which compiles and seems to sample. I am not sure I have converted everything right, my theano knowledge is very basic (I suppose there is some solve there too, but I don’t know about it so I haven’t used it) and it looks quite slow (not sure what could be done about this though).

cluhmann · March 14, 2022, 8:06pm

I understand that you don’t want to use one of the built-in distributions. My intention was to simply illustrate how your logp() function would need to operate. You can also just check out the logp() method of pymc’s MVNormal class (defined here) if you want to see how that works.

Richard_Border · March 14, 2022, 10:07pm

@OriolAbril Aha I see–now I understand your first comment about observed being the problem. This is very helpful, I should be able to implement my desired model. And, re:the model running slowly, that’s not too surprising given the way I wrote the model. E.g., it explicitly constructs a diagonal matrix and inverts it, doesn’t use the Cholesky used by pm.MvNormal, explicitly computes the determinant of an isotropic matrix, etc. So hopefully I should be able to get something speedier in my application.

@cluhmann Apologies, I hope I didn’t sound combative at all, your recommendations, especially pointing me toward the dev documentation, are greatly appreciated!

Richard_Border · March 14, 2022, 10:20pm

And just for completeness, here’s a working version of the second model with a more efficient representation of the custom lpdf (a speed up of 2 orders of magnitiude relative to my naive use of pm.MvNormal in lmm above):

import pymc3 as pm
from pymc3 import math as pmm
from theano import tensor as tt

with pm.Model() as lmm3:
    ## priors on variance components
    var_beta_hat = pm.HalfCauchy(name='var_beta_hat', beta=1.0);
    var_resid_hat  = pm.HalfCauchy(name='var_resid_hat', beta=1.0);
    
    ## distribution of latent random effects beta given variance components
    def custom_logp(beta_hat):
        qform = pmm.dot(beta_hat.T,beta_hat)/var_beta_hat
        ldet = m*tt.log(var_beta_hat)
        return -.5*(qform + ldet + m * tt.log(2*np.pi))
    ###########
    
    ## DensityDist with built in MVN logp
    beta_hat = pm.DensityDist('beta_hat', logp=custom_logp, shape=(m,1))

    ## Do some linear algebra with the latent variable
    linear_predictor =  pmm.dot(X, beta_hat)
    
    ## distribution of outcome given everything else
    y_hat = pm.Normal(name="y_hat", mu=linear_predictor, sigma=var_resid_hat, observed=y, shape=n)

Topic		Replies	Views
Custom Likelihood with Loops Questions	0	561	August 10, 2018
GP Latent implmenentation in PyMC version agnostic gaussian_process	1	13	February 22, 2025
Comparing Models with Marginalized Latent Discrete Variables in PyMC v5 modeling	2	161	March 19, 2024
Observed Deterministic Questions	12	9637	June 19, 2022
Tensorflow probability help: Irrelevant to PyMC3 Questions	2	519	October 20, 2020

Matrix operations involving observed and latent variables

Related topics