Sampler initialization stuck

Hello,

I’m inferring the latent parameters of a generalized linear mixed model. The data are partial observations of a categorical random field that depends on three latent Gaussian random fields. I was following this example notebook. The sampler seems stuck at the initial position and does not move, while CPU utilisation goes up to 100%.

Sampling: [choice, coeffs1, coeffs2, coeffs3, sof_x, sof_z, stddev]
  0%|          | 0/2000 [00:00<?, ?it/s]
  0%|          | 0/2000 [00:00<?, ?it/s]
  0%|          | 0/2000 [00:00<?, ?it/s]
  0%|          | 0/2000 [00:00<?, ?it/s]

Below is a reproducible code snippet. I’m running the script in Spyder on a Windows 10 PC.

import numpy as np
import pandas as pd
import pymc as pm
import pytensor.tensor as pt

#%%
def generate_cov(stddev, coords, sof):
    # anisotropic squared-exponential covariance; sof = [vertical, horizontal] scales of fluctuation
    distance = np.zeros([coords.shape[0], coords.shape[0], 2])
    for i, coord in enumerate(coords):
        distance[i, :, :] = coords - coord
    cov = stddev**2 * np.exp(-np.pi*(distance[:, :, 0]**2/sof[1]**2 + distance[:, :, 1]**2/sof[0]**2))
    return cov

#%% Synthetic data
domain_x, domain_z = 80, 30
dx, dz = 1, 1
x_coords, z_coords = np.arange(0, domain_x, dx), np.arange(0, domain_z, dz)
xpos, zpos = np.meshgrid(x_coords, z_coords)
xpos, zpos = np.atleast_2d(xpos.ravel()).T, np.atleast_2d(zpos.ravel()).T
Coords = np.hstack([xpos, zpos])

nxpoints, nzpoints = x_coords.size, z_coords.size
ncells = nxpoints*nzpoints

dd = np.zeros([ncells, 4])+100
dd[:, 0] = np.tile(x_coords, z_coords.size)
dd[:, 2] = np.repeat(z_coords, x_coords.size)
dd[10::x_coords.size, -1] = [0]*2+[1]*17+[2]*4+[100]*7       
dd[40::x_coords.size, -1] = [0]*7+[1]*8+[2]*13+[100]*2
dd[70::x_coords.size, -1] = [0]*5+[1]*12+[2]*6+[100]*7
obs_idx = np.array(np.where(dd[:, -1]!=100))

data = pd.DataFrame(dd, columns =['X', 'Y', 'Dep', 'Code'])
data = data.astype({'Code': int})

#%% Model
K = 3 
n_kl = 100
Beta = np.zeros(3)
sof_x_mu, sof_x_std = 30, 5
sof_z_mu, sof_z_std = [5,5,5], [1,1,1]

observed1 = np.array(data[data.Code!=100].Code, dtype=int)
coords1 = {'choices': ['type1', 'type2', 'type3'], 'obs': range(observed1.size),}
with pm.Model(coords=coords1) as model:
    
    sof_x = pm.Normal('sof_x', sof_x_mu, sof_x_std, shape=K)
    sof_z = pm.Normal('sof_z', sof_z_mu, sof_z_std, shape=K)
    stddev = pm.HalfCauchy('stddev', beta=1)
    
    cov1 = generate_cov(stddev, Coords, sof=[sof_z[0], sof_x[0]])
    cov2 = generate_cov(stddev, Coords, sof=[sof_z[1], sof_x[1]])
    cov3 = generate_cov(stddev, Coords, sof=[sof_z[2], sof_x[2]])
    
    # K-L expansion of covariance matrices
    eigvals1, eigvecs1 = pt.linalg.eigh(cov1)    # eigvals is in ascending order, keep the last n_kl columns of eigvecs
    eigvals2, eigvecs2 = pt.linalg.eigh(cov2) 
    eigvals3, eigvecs3 = pt.linalg.eigh(cov3)  
    eigvals1 = eigvals1[-n_kl:]
    eigvals2 = eigvals2[-n_kl:]
    eigvals3 = eigvals3[-n_kl:]
    eigvecs1_scaled = eigvecs1[:, -n_kl:] * np.sqrt(eigvals1)  
    eigvecs2_scaled = eigvecs2[:, -n_kl:] * np.sqrt(eigvals2)                               
    eigvecs3_scaled = eigvecs3[:, -n_kl:] * np.sqrt(eigvals3)
    
    # sample proposal K-L coefficients
    coeffs1 = pm.Normal('coeffs1', 0, 1, shape=n_kl)
    coeffs2 = pm.Normal('coeffs2', 0, 1, shape=n_kl)
    coeffs3 = pm.Normal('coeffs3', 0, 1, shape=n_kl)
    
    # construct latent utility fields from the sampled prior parameters and K-L coefficients
    u1 = pt.sum(eigvecs1_scaled * coeffs1, axis=1) + Beta[0]
    u2 = pt.sum(eigvecs2_scaled * coeffs2, axis=1) + Beta[1]
    u3 = pt.sum(eigvecs3_scaled * coeffs3, axis=1) + Beta[2]
    s = pm.math.stack([u1, u2, u3]).T
    s1 = s[obs_idx].squeeze()
    p_ = pm.Deterministic('p', pm.math.softmax(s1, axis=1), dims=('obs', 'choices'))
    
    # likelihood
    choice = pm.Categorical('choice', p=p_, observed=observed1, dims='obs')
    
    # sampling
    idata = pm.sample_prior_predictive()
    idata.extend(pm.sample(cores=1, nuts_sampler="numpyro", idata_kwargs={"log_likelihood": True}, random_seed=101))
    idata.extend(pm.sample_posterior_predictive(idata))

The model uses a Karhunen-Loève (K-L) expansion to simulate the Gaussian random fields, for which I need to compute the covariance matrices and then factorize them to get eigenvalues and eigenvectors.
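In other words, each latent utility field is reconstructed from the truncated expansion u(x) ≈ β + Σ_{i=1}^{n_kl} √λ_i φ_i(x) ξ_i with ξ_i ~ N(0, 1), where λ_i and φ_i are the n_kl largest eigenvalues and corresponding eigenvectors of the covariance matrix, and the ξ_i are the standard-normal coeffs variables.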

I’ve tried the suggestion here of setting cores=1 in pm.sample(), but the sampling process still seems stuck.

Auto-assigning NUTS sampler…
Initializing NUTS using jitter+adapt_diag…
Sequential sampling (2 chains in 1 job)
NUTS: [sof_x, sof_z, stddev, coeffs1, coeffs2, coeffs3]
Sampling chain 0, 0 divergences ---------------------------------- 0% -:--:--

I suspect your computational graph is too large/inefficient and compilation as well as function evaluation takes a prohibitively long time.
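One quick way to check is to time the compiled logp and its gradient directly. An untested sketch, using the model context from your snippet:

import time

t0 = time.time()
logp_fn = model.compile_logp()    # compile the log-probability function
dlogp_fn = model.compile_dlogp()  # and its gradient
print(f"compile time: {time.time() - t0:.1f}s")

point = model.initial_point()
t0 = time.time()
logp_fn(point)
dlogp_fn(point)
print(f"one logp + dlogp evaluation: {time.time() - t0:.1f}s")

If either number is huge, the sampler isn’t really "stuck", it’s just evaluating a very expensive graph.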

My first guess would be to do away with those Python loops and try to achieve the same thing with numpy-style vectorization, or, if that isn’t possible, with a PyTensor scan, which defines a symbolic loop without growing the graph with the number of iterations.
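For the record, an untested sketch of the scan variant, using the Coords array from your snippet, could look like this:

import pytensor
import pytensor.tensor as pt

coords_t = pt.as_tensor_variable(Coords.astype("float64"))

def row_difference(coord, all_coords):
    # differences between one coordinate and every coordinate
    return all_coords - coord

distance, _ = pytensor.scan(
    fn=row_difference,
    sequences=[coords_t],
    non_sequences=[coords_t],
)
# distance has shape (ncells, ncells, 2), like the result of the numpy loop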

You also need to be careful about assuming the outputs of eigh are sorted; I think I found here that they weren’t necessarily. You can also see some comments about eigenvalue problems in that thread; they’re non-trivial for gradient-based samplers to work with.
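If you want to be safe, you can sort explicitly instead of relying on the returned order. An untested sketch, using cov1 and n_kl from your snippet:

eigvals1, eigvecs1 = pt.linalg.eigh(cov1)
order = pt.argsort(eigvals1)                # ascending eigenvalue order
eigvals1 = eigvals1[order][-n_kl:]          # keep the n_kl largest
eigvecs1 = eigvecs1[:, order][:, -n_kl:]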

Thanks a lot for your suggestion! I’ve replaced the for loop in the generate_cov function with vectorized calculations, and it now looks like this. The issue remains, though: the initialization stage still seems to take forever.

def generate_cov(stddev, coords, sof):
    # pairwise differences via broadcasting instead of a Python loop
    distance = coords[:, np.newaxis, :] - coords[np.newaxis, :, :]
    squared_norms = np.square(distance[:, :, 0]) / sof[1]**2 + np.square(distance[:, :, 1]) / sof[0]**2
    cov = stddev**2 * np.exp(-np.pi * squared_norms)
    return cov

I tried a simplified model that ignores the dependencies of p on sof_x, sof_z, and stddev. The PyMC sampler works perfectly for this simplified model.
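(Roughly, that means fixing stddev, sof_x, and sof_z at placeholder values and precomputing the eigendecomposition in numpy outside the model, e.g.:)

cov1_fixed = generate_cov(1.0, Coords.astype(float), sof=[sof_z_mu[0], sof_x_mu])  # placeholder stddev
eigvals1_np, eigvecs1_np = np.linalg.eigh(cov1_fixed)
eigvecs1_scaled_np = eigvecs1_np[:, -n_kl:] * np.sqrt(eigvals1_np[-n_kl:])
# only the coeffs variables remain random: u1 = pt.sum(eigvecs1_scaled_np * coeffs1, axis=1) + Beta[0]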


However, whenever I sample from the full model, the initialization issue appears. I wonder whether the matrix factorization could make the gradients difficult to compute, or grow the computational graph in an inefficient way?

One thing that jumps out immediately is that you have way more parameters than observations; I suspect your model is hard to sample regardless of how fast the logp evaluation is.
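(If I’m counting right, the snippet has 3 × 100 K-L coefficients plus 3 + 3 + 1 hyperparameters, i.e. about 307 free parameters, against only about 74 observed cells.)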

Thank you for bringing this to my attention. As I only need the eigenvectors corresponding to the largest eigenvalues, is there an existing linear algebra Op that does the same thing as scipy.sparse.linalg.eigsh, so that I can specify the number of eigenvalues and eigenvectors to compute? That way I wouldn’t need to compute and then discard the unnecessary ones.

That’s true, but I’m using synthetic data here, so I can actually use as many observations as I want. The problem is that even when I set the shape of the coeffs variables to 5, the sampler still gets stuck at the initialization stage. :smiling_face_with_tear:

You can wrap arbitrary scipy code in a PyTensor Op for use in PyMC models. The wrinkle will be computing the gradients :slight_smile:
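An untested sketch of what that could look like for eigsh, with no gradient defined (so a gradient-based sampler like NUTS couldn’t use it as-is; the class name TruncatedEigh is just illustrative):

import numpy as np
import pytensor.tensor as pt
from pytensor.graph.basic import Apply
from pytensor.graph.op import Op
from scipy.sparse.linalg import eigsh


class TruncatedEigh(Op):
    """Return the k largest eigenvalues/eigenvectors of a symmetric matrix."""

    def __init__(self, k):
        self.k = k  # number of leading eigenpairs to compute

    def make_node(self, cov):
        cov = pt.as_tensor_variable(cov)
        eigvals = pt.vector(dtype=cov.dtype)
        eigvecs = pt.matrix(dtype=cov.dtype)
        return Apply(self, [cov], [eigvals, eigvecs])

    def perform(self, node, inputs, outputs):
        (cov,) = inputs
        # k largest-magnitude eigenpairs of the (dense) symmetric covariance
        vals, vecs = eigsh(cov, k=self.k, which="LM")
        outputs[0][0] = vals
        outputs[1][0] = vecs


# usage inside the model: eigvals1, eigvecs1 = TruncatedEigh(n_kl)(cov1)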