Hierarchical model where convergence is difficult: chains look bad. Any tips?

Dear community,
happy NY!

I tried to fit a CAR model with the Sparse_CAR distribution written here (https://docs.pymc.io/notebooks/PyMC3_tips_and_heuristic.html). However, I have problems with convergence.

I have the CAR variable and 5 other variables in the regression, which are correlated. These variables have Student-t prior and I am sampling with NUTS-advi for around 40K iterations. The chains look bad, and I have also the warning “The estimated number of effective samples is smaller than 200 for some parameters.”.

I tried to move from the Student-t prior to Normal, and then to reparametrize. Nothing changes. I tried the default sampler, and also to have ~80K iterations. Do you have any suggestion?

thank you!

PS: sorry for the general title

Is rho suppose to be the correlation? it is super high… Maybe you can try some custom tuning (dense mass matrix adaptation).
There is an example here:
https://exoplanet.dfm.io/en/latest/tutorials/pymc3-extras/#custom-tuning-schedule

Yes it is the spatial auto-correlation: I’m fitting the crime in micro-areas, so I suppose that they are pretty similar to near areas.

Nice! I did not know it!!! I tried but it seem rather unstable, especially because I’m full of errors of float32:

File “/home/nadai/.local/share/virtualenvs/crime-environment-aZbKKJXb/lib/python3.6/site-packages/pymc3/step_methods/hmc/integration.py”, line 28, in compute_state
raise ValueError(‘Invalid dtype. Must be %s’ % self._dtype)
ValueError: Invalid dtype. Must be float32

And I do not understand how to debug them. Any tips?
Sorry for all these questions…

Difficult to say without the model and the data :frowning:
Also, you can try setting more informative prior.

I tried to debug the code and the error comes from a non positive definite covariance matrix, as you can see from the error. Any tips? I here attached the code I tried.

import pandas as pd
import numpy as np
import theano.tensor as tt
import theano
import pymc3 as pm
from pymc3.distributions import distribution
import scipy
import exoplanet as xo


class Sparse_CAR(distribution.Continuous):
	"""
	Sparse Conditional Autoregressive (CAR) distribution

	Parameters
	----------
	alpha : spatial smoothing term
	W : adjacency matrix
	tau : precision at each location
	"""

	def __init__(self, alpha, W, tau, *args, **kwargs):
		self.alpha = tt.as_tensor_variable(alpha)
		self.tau = tt.as_tensor_variable(tau)
		D = W.sum(axis=0)
		n, m = W.shape
		self.n = n
		self.median = self.mode = self.mean = 0
		super(Sparse_CAR, self).__init__(*args, **kwargs)

		# eigenvalues of D^−1/2 * W * D^−1/2
		Dinv_sqrt = np.diag(1 / np.sqrt(D))
		DWD = np.matmul(np.matmul(Dinv_sqrt, W), Dinv_sqrt)
		self.lam = scipy.linalg.eigvalsh(DWD)

		# sparse representation of W
		w_sparse = scipy.sparse.csr_matrix(W)
		self.W = theano.sparse.as_sparse_variable(w_sparse)
		self.D = tt.as_tensor_variable(D)

		# Presicion Matrix (inverse of Covariance matrix)
		# d_sparse = scipy.sparse.csr_matrix(np.diag(D))
		# self.D = theano.sparse.as_sparse_variable(d_sparse)
		# self.Phi = self.tau * (self.D - self.alpha*self.W)

	def logp(self, x):
		logtau = self.n * tt.log(self.tau)
		logdet = tt.log(1 - self.alpha * self.lam).sum()

		# tau * ((phi .* D_sparse)' * phi - alpha * (phit_W * phi))
		Wx = theano.sparse.dot(self.W, x)
		tau_dot_x = self.D * x.T - self.alpha * Wx.ravel()
		logquad = self.tau * tt.dot(x.ravel(), tau_dot_x.ravel())

		# logquad = tt.dot(x.T, theano.sparse.dot(self.Phi, x)).sum()
		return 0.5*(logtau + logdet - logquad)


def main():
	print("DEVICE", theano.config.device)

	X = pd.read_csv('pymc_X.csv')
	y = pd.read_csv('pymc_y.csv', header=None).values
	weights = pd.read_csv('pymc_weights.csv').values
	features = X.columns
	print(X.shape, y.shape, weights.shape)
	# Float32
	X = X.astype('float32')
	y = y.astype('float32')
	amat = weights.astype('float32')

	sampler = xo.PyMC3Sampler()  # start=500, finish=500, window=500

	niters = 10000
	with pm.Model() as pooled_model:
		X = tt.as_tensor_variable(X)
		coeffs = []
		for f in features:
			beta = pm.StudentT(f, nu=4)
			coeffs.append(beta)

		coeffs = tt.stack(coeffs, axis=0)

		# Priors for spatial random effects
		alpha = pm.Uniform('rho', lower=0, upper=1)
		tau = pm.Gamma('tau', alpha=2., beta=2.)
		phi = Sparse_CAR('phi2', alpha, amat, tau, shape=(len(y), 1)).ravel()

		m = X.dot(coeffs) + phi

		# Mean model
		mu = pm.Deterministic('mu', tt.exp(m))
		ncrimes = pm.Poisson('ncrimes', mu=mu.ravel(), observed=y)

		# Run the burn-in and learn the mass matrix
		step_kwargs = dict(target_accept=0.9)
		sampler.tune(tune=100, regular_window=50, step_kwargs=step_kwargs)

		pooled_trace = sampler.sample(niters, njobs=2, tune=1000)

	print("RESULTS:")
	waic_result = pm.stats.waic(model=pooled_model, trace=pooled_trace[-2000:])
	print("WAIC", waic_result)


if __name__ == '__main__':
	main()

Lauched gives:

python3 mcmc_pooled3.py

DEVICE cpu
(617, 5) (617, 1) (617, 617)
Sampling 4 chains: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 308/308 [00:04<00:00, 64.60draws/s]
Sampling 4 chains: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 108/108 [00:02<00:00, 47.90draws/s]
Traceback (most recent call last):
  File "mcmc_pooled3.py", line 109, in <module>
    main()
  File "mcmc_pooled3.py", line 99, in main
    sampler.tune(tune=100, regular_window=50, step_kwargs=step_kwargs)
  File "/home/nadai/.local/share/virtualenvs/crime-environment-aZbKKJXb/lib/python3.6/site-packages/exoplanet/sampling.py", line 176, in tune
    **kwargs)
  File "/home/nadai/.local/share/virtualenvs/crime-environment-aZbKKJXb/lib/python3.6/site-packages/exoplanet/sampling.py", line 158, in extend_tune
    **step_kwargs)
  File "/home/nadai/.local/share/virtualenvs/crime-environment-aZbKKJXb/lib/python3.6/site-packages/exoplanet/sampling.py", line 84, in get_step_for_trace
    potential = quad.QuadPotentialFull(cov)
  File "/home/nadai/.local/share/virtualenvs/crime-environment-aZbKKJXb/lib/python3.6/site-packages/pymc3/step_methods/hmc/quadpotential.py", line 434, in __init__
    self.L = scipy.linalg.cholesky(A, lower=True)
  File "/home/nadai/.local/share/virtualenvs/crime-environment-aZbKKJXb/lib/python3.6/site-packages/scipy/linalg/decomp_cholesky.py", line 91, in cholesky
    check_finite=check_finite)
  File "/home/nadai/.local/share/virtualenvs/crime-environment-aZbKKJXb/lib/python3.6/site-packages/scipy/linalg/decomp_cholesky.py", line 40, in _cholesky
    "definite" % info)
numpy.linalg.linalg.LinAlgError: 94-th leading minor of the array is not positive definite

The data is attached to repeat it.

I could set more informative prior but where? The covariates are Student-t in order to do some regularization. The priors of the CAR are standard. I tried also a Beta(2,2) prior on rho. No changes… Is it too difficult to solve?

Thank you for your time

pymc_y.csv (1.8 KB)
pymc_X.csv (59.3 KB)
pymc_weights.csv (1.5 MB)

The error is related to the high correlation (rho). Have you tried standardizing your input X?

yeah, it is already Z-score standardized :frowning: