Implementing a new distribution (loop over x and y in logp)

Is there any way to use theano.scan inside logp? I am trying to loop over the observed variable x and the error covariance, which are both fixed and have the same shape[0], but I cannot make this work.

import numpy as np
import theano
import theano.tensor as tt
import pymc3 as pm

mu = tt.vector("mu")
cov = tt.dmatrix("cov")
covErr = tt.dmatrix("covErr")  # stack of per-observation error covariances
x = tt.dmatrix("x")

def map_func(i, x, mu, cov, covErr):
    # logp of the i-th observation under a MvNormal whose covariance
    # is the shared cov plus the i-th error covariance
    return pm.MvNormal.dist(mu=mu, cov=cov + covErr[i]).logp(x[i])

result, updates = theano.scan(fn=map_func,
                              sequences=[tt.arange(x.shape[0])],
                              non_sequences=[x, mu, cov, covErr])

final_result = tt.sum(result)

calculate_logp = theano.function(inputs=[x, mu, cov, covErr],
                                 outputs=final_result, updates=updates)

class MvNormalwErr2(pm.distributions.Continuous):

    def __init__(self, mu, cov, covErr, *args, **kwargs):
        super(MvNormalwErr2, self).__init__(*args, **kwargs)
        self.mu = mu = tt.as_tensor_variable(mu)
        self.cov = cov = tt.as_tensor_variable(cov)
        self.covErr = covErr = tt.as_tensor_variable(covErr)
        self.mean = self.median = self.mode = mu

    def logp(self, x):
        # this is the part that breaks: calculate_logp is a compiled
        # function, but logp is called with symbolic tensors
        return calculate_logp(x, self.mu, self.cov, self.covErr)

PS: I have tried something like the following in my model, but it is super slow; I am trying to get to the speed of MvNormal. With 200 observations it already struggles to run, so it will not scale to 100k observations.

with pm.Model() as model3:
    packed_L = pm.LKJCholeskyCov('packed_L', n=2,
                                 eta=2., sd_dist=pm.HalfCauchy.dist(2.5))

    L = pm.expand_packed_triangular(2, packed_L)
    sig = pm.Deterministic('sig', L.dot(L.T))
    mu = pm.Normal('mu', 0., 10., shape=2,
                   testval=x.mean(axis=0))
    # obs = MvNormalwErr2('obs', mu=mu, cov=sig, covErr=covErr, observed=x)
    # one MvNormal per observation -- runs, but is very slow
    obs = []
    for i in range(x.shape[0]):
        obs.append(pm.MvNormal('obs' + str(i), mu=mu, cov=sig + covErr[i],
                               observed=x[i]))
    obs = tt.stack(obs)

IIUC, you have two loops: for each slice of x, you loop over covErr, which is some error matrix/scalar?
I think it is going to be slow in any case if there are a lot of elements in covErr. But looking at your code above, it seems covErr and x have the same length along the dimension you want to loop over? If that's the case, you can build it in one theano.scan loop, as sketched below.
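
Something like this minimal sketch (assuming PyMC3 on the Theano backend; the class name MvNormalwErr and the inner step function are just illustrative names). The key change from your version is that the scan is built symbolically inside logp and the summed log-probability is returned as a Theano expression, rather than calling a compiled theano.function:

import theano
import theano.tensor as tt
import pymc3 as pm

class MvNormalwErr(pm.distributions.Continuous):
    """MvNormal whose covariance gets a per-observation error term."""

    def __init__(self, mu, cov, covErr, *args, **kwargs):
        super(MvNormalwErr, self).__init__(*args, **kwargs)
        self.mu = tt.as_tensor_variable(mu)
        self.cov = tt.as_tensor_variable(cov)
        self.covErr = tt.as_tensor_variable(covErr)
        self.mean = self.median = self.mode = self.mu

    def logp(self, x):
        # one scan over both x and covErr at once; each step evaluates
        # the MvNormal logp with the per-observation covariance
        def step(x_i, covErr_i, mu, cov):
            return pm.MvNormal.dist(mu=mu, cov=cov + covErr_i).logp(x_i)

        logps, _ = theano.scan(fn=step,
                               sequences=[x, self.covErr],
                               non_sequences=[self.mu, self.cov])
        # logp stays symbolic, so PyMC3 can differentiate through it
        return tt.sum(logps)

That should let you replace the per-observation loop in your model with a single observed variable, much like the line you commented out:

obs = MvNormalwErr('obs', mu=mu, cov=sig, covErr=covErr, observed=x)

Note this is still a sequential scan, so it avoids the per-RV graph overhead of the Python loop but is unlikely to match the speed of a single vectorized MvNormal.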