Call MvNormal.random() method inside model

Hello!

I’d like to generate a random sample of the multivariate normal distribution inside a model. The idea behind is to pass the generated sample further as observed values for some likelihood.

Code is following:

n = 10000 # sample size
m = 5 # number of covariates

model = pm.Model()

with model:
    packed_L = pm.LKJCholeskyCov('packed_L', n=m, eta=2., sd_dist=pm.HalfCauchy.dist(2.5))
    L = pm.expand_packed_triangular(m, packed_L)
    Sigma = pm.Deterministic('Sigma', L.dot(L.T))
    mu = pm.Normal('mu', mu=0., sd=1., shape=m)
    Y = pm.MvNormal('Y', mu=mu, chol=L, shape=(n,m))
    sample = Y.random()

And I got following error:

KeyError                                  Traceback (most recent call last)
<ipython-input-57-b968c007a358> in <module>()
     10     mu = pm.Normal('mu', mu=0., sd=1., shape=m)
     11     Y = pm.MvNormal('Y', mu=mu, chol=L, shape=(n,m))
---> 12     sample = Y.random()
     13 

/usr/local/lib/python2.7/site-packages/pymc3-3.5rc1-py2.7.egg/pymc3/model.pyc in __call__(self, *args, **kwargs)
     40 
     41     def __call__(self, *args, **kwargs):
---> 42         return getattr(self.obj, self.method_name)(*args, **kwargs)
     43 
     44 

/usr/local/lib/python2.7/site-packages/pymc3-3.5rc1-py2.7.egg/pymc3/distributions/multivariate.pyc in random(self, point, size)
    247             return dist.rvs(size)
    248         elif self._cov_type == 'chol':
--> 249             mu, chol = draw_values([self.mu, self.chol_cov], point=point, size=size)
    250             if mu.shape[-1] != chol[0].shape[-1]:
    251                 raise ValueError("Shapes for mu and chol don't match")

/usr/local/lib/python2.7/site-packages/pymc3-3.5rc1-py2.7.egg/pymc3/distributions/distribution.pyc in draw_values(params, point, size)
    297                 # the stack of nodes to try to draw from. We exclude the
    298                 # nodes in the `params` list.
--> 299                 stack.extend([node for node in named_nodes_parents[next_]
    300                               if node is not None and
    301                               node.name not in stored and

KeyError: packed_L

Is it conceptually possible to do like this in the framework of PyMC3?

Thanks,
Best Regards

Alex

[but why].gif

Joke aside, really why would you want to do that? Passing random generation from one of the node to downstream as observed sounds like a bad idea.

Since .random() wont be directly called in a pm.Model, if you really want you can generate it from theano random generator (already wrapped in pymc3):

import pymc3 as pm
from pymc3 import tt_rng

n = 10000 # sample size
m = 5 # number of covariates

with pm.Model() as model:
    packed_L = pm.LKJCholeskyCov('packed_L', n=m, eta=2., sd_dist=pm.HalfCauchy.dist(2.5))
    L = pm.expand_packed_triangular(m, packed_L)
    Sigma = pm.Deterministic('Sigma', L.dot(L.T))
    mu = pm.Normal('mu', mu=0., sd=1., shape=m)
    Y = pm.MvNormal('Y', mu=mu, chol=L, shape=(n,m))
    sample = mu + tt_rng().normal(size=(n,m)).dot(L)
1 Like

Thank for the quick response!
Honestly, it’s a bit hard to explain - “why?”:slight_smile: Above question was quite simplified. I’ll try to explain in more details.

I’m still struggling with a problem which I described here Dealing with imputation of left censored multivariate normal distribution.

So I’m trying to replicate well known “Imputation-Prediction” algorithm for missing values reconstruction for multivariate normal with unknown covariance and unknown mean (the last condition could be relaxed and we can assume zero means). Based on the literature, usually, people solve this problem with Gibbs sampler. And the idea is to implement the algorithm using PyMC.

Assuming that we have a very simple pattern of “missingness” - only one covariate is partly observed (my real problem has many patterns with a different combination of “missingness”).
So, I see a formulation of a problem in such way:

  1. Define RV for covariance matrix Sigma
  2. Define RV for mean vector Mu
  3. Calculate conditional means and covariances for each missing values based on the observed part using the property of MVNormal
  4. Draw random samples from that conditional distributions. That’s why I asked about random() inside a model.
  5. Substitute samples to the original dataset. (I’m not sure at all if it’s possible to do inside model)
  6. Define Likelihood with Mu, Sigma and updated observed…and define Potential to take into account censoring factor.

I’m not sure if it’s possible to implement these steps using only out-of-box PyMC features. And moreover, I’m not sure about these steps at all.

I would really appreciate any suggestions!
Thanks a lot,
Alex

Hmm, your use case does sound reasonable. For censored distribution usually we integrate the truncated tail area, but since it is difficult for multivariate distribution I think your solution is good to try.