Sampler initialization stuck

Hello,

I’m inferring the latent parameters of a generalized linear mixed model. The data are partial observations of a categorical random field that depends on three latent Gaussian random fields. I was following this example notebook. The sampler seems stuck at the initial position and does not move, while CPU utilisation goes up to 100%.

Sampling: [choice, coeffs1, coeffs2, coeffs3, sof_x, sof_z, stddev]
  0%|          | 0/2000 [00:00<?, ?it/s]
  0%|          | 0/2000 [00:00<?, ?it/s]
  0%|          | 0/2000 [00:00<?, ?it/s]
  0%|          | 0/2000 [00:00<?, ?it/s]

Below is a reproducible code snippet. I’m running the script in Spyder on a Windows 10 PC.

import numpy as np
import pandas as pd
import pymc as pm
import pytensor.tensor as pt

#%%
def generate_cov(stddev, coords, sof):
    # anisotropic squared-exponential covariance; sof = [vertical, horizontal] scales of fluctuation
    distance = np.zeros([coords.shape[0], coords.shape[0], 2])
    for i, coord in enumerate(coords):
        distance[i, :, :] = coords - coord
    cov = stddev**2 * np.exp(-np.pi*(distance[:, :, 0]**2/sof[1]**2 + distance[:, :, 1]**2/sof[0]**2))
    return cov

#%% Synthetic data
domain_x, domain_z = 80, 30
dx, dz = 1, 1
x_coords, z_coords = np.arange(0, domain_x, dx), np.arange(0, domain_z, dz)
xpos, zpos = np.meshgrid(x_coords, z_coords)
xpos, zpos = np.atleast_2d(xpos.ravel()).T, np.atleast_2d(zpos.ravel()).T
Coords = np.hstack([xpos, zpos])

nxpoints, nzpoints = x_coords.size, z_coords.size
ncells = nxpoints*nzpoints

dd = np.zeros([ncells, 4])+100
dd[:, 0] = np.tile(x_coords, z_coords.size)
dd[:, 2] = np.repeat(z_coords, x_coords.size)
dd[10::x_coords.size, -1] = [0]*2+[1]*17+[2]*4+[100]*7       
dd[40::x_coords.size, -1] = [0]*7+[1]*8+[2]*13+[100]*2
dd[70::x_coords.size, -1] = [0]*5+[1]*12+[2]*6+[100]*7
obs_idx = np.array(np.where(dd[:, -1]!=100))

data = pd.DataFrame(dd, columns =['X', 'Y', 'Dep', 'Code'])
data = data.astype({'Code': int})

#%% Model
K = 3 
n_kl = 100
Beta = np.zeros(3)
sof_x_mu, sof_x_std = 30, 5
sof_z_mu, sof_z_std = [5,5,5], [1,1,1]

observed1 = np.array(data[data.Code!=100].Code, dtype=int)
coords1 = {'choices': ['type1', 'type2', 'type3'], 'obs': range(observed1.size),}
with pm.Model(coords=coords1) as model:
    
    sof_x = pm.Normal('sof_x', sof_x_mu, sof_x_std, shape=K)
    sof_z = pm.Normal('sof_z', sof_z_mu, sof_z_std, shape=K)
    stddev = pm.HalfCauchy('stddev', beta=1)
    
    cov1 = generate_cov(stddev, Coords, sof=[sof_z[0], sof_x[0]])
    cov2 = generate_cov(stddev, Coords, sof=[sof_z[1], sof_x[1]])
    cov3 = generate_cov(stddev, Coords, sof=[sof_z[2], sof_x[2]])
    
    # K-L expansion of covariance matrices
    eigvals1, eigvecs1 = pt.linalg.eigh(cov1)    # eigvals is in ascending order, keep the last n_kl columns of eigvecs
    eigvals2, eigvecs2 = pt.linalg.eigh(cov2) 
    eigvals3, eigvecs3 = pt.linalg.eigh(cov3)  
    eigvals1 = eigvals1[-n_kl:]
    eigvals2 = eigvals2[-n_kl:]
    eigvals3 = eigvals3[-n_kl:]
    eigvecs1_scaled = eigvecs1[:, -n_kl:] * np.sqrt(eigvals1)  
    eigvecs2_scaled = eigvecs2[:, -n_kl:] * np.sqrt(eigvals2)                               
    eigvecs3_scaled = eigvecs3[:, -n_kl:] * np.sqrt(eigvals3)
    
    # sample proposal K-L coefficients
    coeffs1 = pm.Normal('coeffs1', 0, 1, shape=n_kl)
    coeffs2 = pm.Normal('coeffs2', 0, 1, shape=n_kl)
    coeffs3 = pm.Normal('coeffs3', 0, 1, shape=n_kl)
    
    # construct latent utility fields from the sampled prior parameters and K-L coefficients
    u1 = pt.sum(eigvecs1_scaled * coeffs1, axis=1) + Beta[0]
    u2 = pt.sum(eigvecs2_scaled * coeffs2, axis=1) + Beta[1]
    u3 = pt.sum(eigvecs3_scaled * coeffs3, axis=1) + Beta[2]
    s = pm.math.stack([u1, u2, u3]).T
    s1 = s[obs_idx].squeeze()
    p_ = pm.Deterministic('p', pm.math.softmax(s1, axis=1), dims=('obs', 'choices'))
    
    # likelihood
    choice = pm.Categorical('choice', p=p_, observed=observed1, dims='obs')
    
    # sampling
    idata = pm.sample_prior_predictive()
    idata.extend(pm.sample(cores=1, nuts_sampler="numpyro", idata_kwargs={"log_likelihood": True}, random_seed=101))
    idata.extend(pm.sample_posterior_predictive(idata))

The model uses a Karhunen-Loève (K-L) expansion to simulate the Gaussian random fields, for which I need to compute the covariance matrices and then factorize them to get eigenvalues and eigenvectors.
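In other words, each latent utility field is reconstructed from the truncated expansion u(x) ≈ β + Σ_{i=1}^{n_kl} √λ_i φ_i(x) ξ_i with ξ_i ~ N(0, 1), where λ_i and φ_i are the n_kl largest eigenvalues and corresponding eigenvectors of the covariance matrix, and the ξ_i are the standard-normal coeffs variables.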

I’ve tried the suggestion here of setting cores=1 in pm.sample(), but the sampling process still seems stuck.

Auto-assigning NUTS sampler…
Initializing NUTS using jitter+adapt_diag…
Sequential sampling (2 chains in 1 job)
NUTS: [sof_x, sof_z, stddev, coeffs1, coeffs2, coeffs3]
Sampling chain 0, 0 divergences ---------------------------------- 0% -:--:--

I suspect your computational graph is too large/inefficient and compilation as well as function evaluation takes a prohibitively long time.
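One quick way to check is to time the compiled logp and its gradient directly. An untested sketch, using the model context from your snippet:

import time

t0 = time.time()
logp_fn = model.compile_logp()    # compile the log-probability function
dlogp_fn = model.compile_dlogp()  # and its gradient
print(f"compile time: {time.time() - t0:.1f}s")

point = model.initial_point()
t0 = time.time()
logp_fn(point)
dlogp_fn(point)
print(f"one logp + dlogp evaluation: {time.time() - t0:.1f}s")

If either number is huge, the sampler isn’t really "stuck", it’s just evaluating a very expensive graph.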

My first guess would be to do away with those Python loops and try to achieve the same thing with numpy-style vectorization, or, if that isn’t possible, with a PyTensor scan, which defines a symbolic loop without growing the graph with the number of iterations.
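For the record, an untested sketch of the scan variant, using the Coords array from your snippet, could look like this:

import pytensor
import pytensor.tensor as pt

coords_t = pt.as_tensor_variable(Coords.astype("float64"))

def row_difference(coord, all_coords):
    # differences between one coordinate and every coordinate
    return all_coords - coord

distance, _ = pytensor.scan(
    fn=row_difference,
    sequences=[coords_t],
    non_sequences=[coords_t],
)
# distance has shape (ncells, ncells, 2), like the result of the numpy loop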

You also need to be careful about assuming the outputs of eigh are sorted; I think I found here that they weren’t necessarily. You can also see some comments about eigenvalue problems in that thread; they’re non-trivial for gradient-based samplers to work with.
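If you want to be safe, you can sort explicitly instead of relying on the returned order. An untested sketch, using cov1 and n_kl from your snippet:

eigvals1, eigvecs1 = pt.linalg.eigh(cov1)
order = pt.argsort(eigvals1)                # ascending eigenvalue order
eigvals1 = eigvals1[order][-n_kl:]          # keep the n_kl largest
eigvecs1 = eigvecs1[:, order][:, -n_kl:]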

Thanks a lot for your suggestion! I’ve replaced the for loop in the generate_cov function with vectorized calculations, and it now looks like this. The issue remains, though: the initialization stage still seems to take forever.

def generate_cov(stddev, coords, sof):
    # pairwise differences via broadcasting instead of a Python loop
    distance = coords[:, np.newaxis, :] - coords[np.newaxis, :, :]
    squared_norms = np.square(distance[:, :, 0]) / sof[1]**2 + np.square(distance[:, :, 1]) / sof[0]**2
    cov = stddev**2 * np.exp(-np.pi * squared_norms)
    return cov

I tried a simplified model that ignores the dependencies of p on sof_x, sof_z, and stddev. The PyMC sampler works perfectly for this simplified model.
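(Roughly, that means fixing stddev, sof_x, and sof_z at placeholder values and precomputing the eigendecomposition in numpy outside the model, e.g.:)

cov1_fixed = generate_cov(1.0, Coords.astype(float), sof=[sof_z_mu[0], sof_x_mu])  # placeholder stddev
eigvals1_np, eigvecs1_np = np.linalg.eigh(cov1_fixed)
eigvecs1_scaled_np = eigvecs1_np[:, -n_kl:] * np.sqrt(eigvals1_np[-n_kl:])
# only the coeffs variables remain random: u1 = pt.sum(eigvecs1_scaled_np * coeffs1, axis=1) + Beta[0]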


However, whenever I sample from the full model, the initialization issue appears. I wonder whether the matrix factorization could make the gradients difficult to compute, or grow the computational graph in an inefficient way?

One thing that jumps out immediately is that you have way more parameters than observations; I suspect your model is hard to sample regardless of how fast the logp evaluation is.
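(If I’m counting right, the snippet has 3 × 100 K-L coefficients plus 3 + 3 + 1 hyperparameters, i.e. about 307 free parameters, against only about 74 observed cells.)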

Thank you for bringing this to my attention. As I only need the eigenvectors corresponding to the largest eigenvalues, is there an existing linear algebra Op that does the same thing as scipy.sparse.linalg.eigsh, so that I can specify the number of eigenvalues and eigenvectors to compute? That way I wouldn’t need to compute and then discard the unnecessary ones.

That’s true, but I’m using synthetic data here, so I can actually use as many observations as I want. The problem is that even when I set the shape of the coeffs variables to 5, the sampler still gets stuck at the initialization stage. :smiling_face_with_tear:

You can wrap arbitrary scipy code in a PyTensor Op for use in PyMC models. The wrinkle will be computing the gradients :slight_smile:
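An untested sketch of what that could look like for eigsh, with no gradient defined (so a gradient-based sampler like NUTS couldn’t use it as-is; the class name TruncatedEigh is just illustrative):

import numpy as np
import pytensor.tensor as pt
from pytensor.graph.basic import Apply
from pytensor.graph.op import Op
from scipy.sparse.linalg import eigsh


class TruncatedEigh(Op):
    """Return the k largest eigenvalues/eigenvectors of a symmetric matrix."""

    def __init__(self, k):
        self.k = k  # number of leading eigenpairs to compute

    def make_node(self, cov):
        cov = pt.as_tensor_variable(cov)
        eigvals = pt.vector(dtype=cov.dtype)
        eigvecs = pt.matrix(dtype=cov.dtype)
        return Apply(self, [cov], [eigvals, eigvecs])

    def perform(self, node, inputs, outputs):
        (cov,) = inputs
        # k largest-magnitude eigenpairs of the (dense) symmetric covariance
        vals, vecs = eigsh(cov, k=self.k, which="LM")
        outputs[0][0] = vals
        outputs[1][0] = vecs


# usage inside the model: eigvals1, eigvecs1 = TruncatedEigh(n_kl)(cov1)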