Initial evaluation of model at starting point failed

I am running the code below. It works for a very small dataset, but when I increase the size, the initial evaluation fails. I have tried different initializations; it only works about once every ten attempts.

import numpy as np
import theano.tensor as tt
import pymc3 as pm

# simulated data
np.random.seed(20090425)
n = 10    # subjects
p = 50    # observed dimensions
k = 10    # latent factors
t = 200   # time points per subject

lamb = np.diag(np.random.normal(10, .2, size=k))  # diagonal latent covariance
W = np.random.normal(.1, .5, size=(p, k))         # loading matrix
beta = 0.1
y = np.zeros((n, t, p))
for i in range(n):
    for T in range(t):
        z = np.random.multivariate_normal(np.zeros(k), lamb)
        y[i, T, :] = np.random.multivariate_normal(np.matmul(W, z), np.eye(p))

def beta_spike_slab(shape, spike):
    # spike-and-slab prior: mix a narrow "spike" around zero with a wide
    # "slab", selected elementwise by a Bernoulli inclusion indicator
    inclusion_prob = 0.7
    beta_spike = pm.Normal('beta_spike', 0, spike, shape=shape)
    beta_slab = pm.Normal('beta_slab', 10, shape=shape)
    gamma = pm.Bernoulli('gamma', inclusion_prob, shape=shape)
    return pm.Deterministic('beta_spike_slab',
                            beta_spike * (1 - gamma) + beta_slab * gamma)

with pm.Model() as model:
    beta = pm.HalfCauchy("beta", 4)    # observation noise scale
    alpha = pm.HalfCauchy("alpha", 4)  # global scale on A

    W_t = beta_spike_slab((k, p), 0.1)

    A = pm.MvNormal("A", mu=np.zeros(k), cov=np.eye(k), shape=(n, k))
    A = A * (A > 0)  # zero out negative entries
    Ad = pm.Deterministic('Ad', A * alpha)

    for i in range(n):
        # per-subject latent trajectories, scaled by that subject's row of Ad
        z = pm.MvNormal("Z" + str(i), mu=np.zeros(k), cov=np.eye(k), shape=(t, k))
        zd = pm.Deterministic("Zd" + str(i),
                              pm.math.dot(z, tt.diag(tt.squeeze(Ad[i, :]))))

        y_obs = pm.MatrixNormal("y_obs" + str(i), mu=pm.math.dot(zd, W_t),
                                rowcov=np.eye(t), colcov=beta * np.eye(p),
                                observed=y[i], shape=(t, p))

with model:
    tr = pm.sample(100, tune=100)

Almost every time, Z shows up with -inf in the results. Is there something wrong with the model formulation or with the code?

It’s the model formulation. You need to make sure that the covariance matrix argument to your MvNormals is guaranteed to be symmetric positive definite. Putting a multivariate normal prior on A and then using A as a covariance matrix won’t work.
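For illustration, here are two standard ways to get a covariance that is valid by construction - a sketch, not drop-in code for your model (compute_corr=True needs a reasonably recent PyMC3):

import numpy as np
import theano.tensor as tt
import pymc3 as pm

k = 10  # latent dimension, matching your question

with pm.Model():
    # Option 1: diagonal covariance built from strictly positive scales
    sd = pm.HalfNormal("sd", 1.0, shape=k)
    x1 = pm.MvNormal("x1", mu=np.zeros(k), cov=tt.diag(sd ** 2), shape=k)

    # Option 2: full covariance via an LKJ prior on the Cholesky factor
    chol, _, _ = pm.LKJCholeskyCov("chol", n=k, eta=2.0,
                                   sd_dist=pm.HalfNormal.dist(1.0),
                                   compute_corr=True)
    x2 = pm.MvNormal("x2", mu=np.zeros(k), chol=chol, shape=k)

Both keep the covariance symmetric positive definite for every draw, so the log-probability stays finite.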

Also, I can help you faster in the future if you strip your code down to the bare minimum - try to include only the essentials! For an example of what that looks like, check out this gist.

Thanks a lot! I missed that. I cleaned up the code and updated the question. I also modified the code to use a non-centered parameterization, which I read can speed up sampling. That improved the sampling rate, but even so, when I try to scale it to n = 100, or even with n = 10 as here, sampling is extremely slow - more than 3 seconds per sample. Am I still missing something?
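The non-centered change looks roughly like this inside the per-subject loop (a sketch of the idea, not my full model):

# draw standard-normal "raw" latents, then scale them deterministically,
# instead of putting the scale inside the prior
z_raw = pm.Normal("z_raw" + str(i), mu=0.0, sigma=1.0, shape=(t, k))
zd = pm.Deterministic("Zd" + str(i), z_raw * Ad[i, :])  # row broadcasts over (t, k)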

I suspect that the for-loop here is the problem - try to vectorize everything if you can. You could probably rework this so that y_obs has shape (n, t, p) or (t, p, n) and avoid instantiating n separate MatrixNormal objects; see the sketch below.
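Since your rowcov is the identity and colcov is a scaled identity, the MatrixNormal factorizes into independent normals, so a single vectorized likelihood over the whole (n, t, p) array should be equivalent. Roughly like this, reusing n, t, k, p, y, and beta_spike_slab from your code above, and with a HalfNormal swapped in for your truncated A just to keep the sketch simple and positive:

with pm.Model() as vec_model:
    sigma = pm.HalfCauchy("sigma", 4)               # observation noise scale
    W_t = beta_spike_slab((k, p), 0.1)

    A = pm.HalfNormal("A", 1.0, shape=(n, k))       # positive per-subject scales
    z = pm.Normal("z", 0.0, 1.0, shape=(n, t, k))   # all latents in one variable

    zd = z * A[:, None, :]                          # broadcast to (n, t, k)
    mu = tt.dot(zd, W_t)                            # (n, t, k) x (k, p) -> (n, t, p)

    # identity row/column covariances make the MatrixNormal equivalent to
    # elementwise independent normals
    y_obs = pm.Normal("y_obs", mu=mu, sigma=sigma, observed=y)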

Also, it looks like you have a discrete latent variable (the Bernoulli gamma) in this model. Discrete variables require a sampler appropriate for them, such as the binary-optimized Gibbs sampler PyMC3 assigns (BinaryGibbsMetropolis), so the samples may not converge to the posterior as rapidly as for a model with only continuous latent variables.
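One common fix is to marginalize the Bernoulli out analytically: marginally, each spike-and-slab coefficient is just a two-component normal mixture. A sketch using the parameters from your beta_spike_slab (mixture shape handling can be finicky across PyMC3 versions, so treat this as a starting point):

def beta_spike_slab_marginal(shape, spike):
    # gamma ~ Bernoulli(0.7) marginalized out: 30% spike, 70% slab
    return pm.NormalMixture('beta_spike_slab',
                            w=np.array([0.3, 0.7]),
                            mu=np.array([0.0, 10.0]),
                            sigma=np.array([spike, 1.0]),
                            shape=shape)

With the mixture, the model is fully continuous and NUTS can sample every parameter.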

I was thinking of vectorizing, but the problem is that each n has a different value of t, so I would have to keep the loop. In the worst case, I will make every t the same. I realized that I can pass any shape as the mean of a multivariate normal, so I changed the matrix normal to a plain multivariate normal, changed the discrete latent variables to continuous random variables, and switched to ADVI.
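For reference, the ADVI fit looks roughly like this:

with model:
    # fit a mean-field approximation with ADVI, then draw posterior samples
    approx = pm.fit(n=30000, method='advi')
    tr = approx.sample(1000)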

Do you think there is any other way to avoid instantiating n instances? I tried n = 400 and instantiation alone takes forever; I would probably have to make all the t values the same.
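One workaround I am considering for the ragged t values is to pad every series to the longest length and mask the padding - PyMC3 imputes masked observed entries automatically, so a single vectorized likelihood could still cover all subjects. A sketch, where t_list and y_list are hypothetical per-subject lengths and (t_i, p) arrays:

t_max = max(t_list)
y_pad = np.zeros((n, t_max, p))
mask = np.ones((n, t_max, p), dtype=bool)   # True = missing
for i, ti in enumerate(t_list):
    y_pad[i, :ti, :] = y_list[i]            # real observations
    mask[i, :ti, :] = False                 # mark them as observed

y_masked = np.ma.masked_array(y_pad, mask=mask)
# passing observed=y_masked to pm.Normal makes PyMC3 treat the padded
# entries as missing data and impute them

The downside is that every padded entry becomes an extra latent variable, so this only pays off when the t values are not too different.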