Hierarchical model where convergence is difficult: chains look bad. Any tips?


#1

Dear community,
happy NY!

I tried to fit a CAR model with the Sparse_CAR distribution written here (https://docs.pymc.io/notebooks/PyMC3_tips_and_heuristic.html). However, I have problems with convergence.

I have the CAR variable and 5 other variables in the regression, which are correlated. These variables have Student-t prior and I am sampling with NUTS-advi for around 40K iterations. The chains look bad, and I have also the warning “The estimated number of effective samples is smaller than 200 for some parameters.”.

I tried to move from the Student-t prior to Normal, and then to reparametrize. Nothing changes. I tried the default sampler, and also to have ~80K iterations. Do you have any suggestion?

thank you!

PS: sorry for the general title


#2

Is rho suppose to be the correlation? it is super high… Maybe you can try some custom tuning (dense mass matrix adaptation).
There is an example here:
https://exoplanet.dfm.io/en/latest/tutorials/pymc3-extras/#custom-tuning-schedule


#3

Yes it is the spatial auto-correlation: I’m fitting the crime in micro-areas, so I suppose that they are pretty similar to near areas.

Nice! I did not know it!!! I tried but it seem rather unstable, especially because I’m full of errors of float32:

File “/home/nadai/.local/share/virtualenvs/crime-environment-aZbKKJXb/lib/python3.6/site-packages/pymc3/step_methods/hmc/integration.py”, line 28, in compute_state
raise ValueError(‘Invalid dtype. Must be %s’ % self._dtype)
ValueError: Invalid dtype. Must be float32

And I do not understand how to debug them. Any tips?
Sorry for all these questions…


#4

Difficult to say without the model and the data :frowning:
Also, you can try setting more informative prior.


#5

I tried to debug the code and the error comes from a non positive definite covariance matrix, as you can see from the error. Any tips? I here attached the code I tried.

import pandas as pd
import numpy as np
import theano.tensor as tt
import theano
import pymc3 as pm
from pymc3.distributions import distribution
import scipy
import exoplanet as xo


class Sparse_CAR(distribution.Continuous):
	"""
	Sparse Conditional Autoregressive (CAR) distribution

	Parameters
	----------
	alpha : spatial smoothing term
	W : adjacency matrix
	tau : precision at each location
	"""

	def __init__(self, alpha, W, tau, *args, **kwargs):
		self.alpha = tt.as_tensor_variable(alpha)
		self.tau = tt.as_tensor_variable(tau)
		D = W.sum(axis=0)
		n, m = W.shape
		self.n = n
		self.median = self.mode = self.mean = 0
		super(Sparse_CAR, self).__init__(*args, **kwargs)

		# eigenvalues of D^−1/2 * W * D^−1/2
		Dinv_sqrt = np.diag(1 / np.sqrt(D))
		DWD = np.matmul(np.matmul(Dinv_sqrt, W), Dinv_sqrt)
		self.lam = scipy.linalg.eigvalsh(DWD)

		# sparse representation of W
		w_sparse = scipy.sparse.csr_matrix(W)
		self.W = theano.sparse.as_sparse_variable(w_sparse)
		self.D = tt.as_tensor_variable(D)

		# Presicion Matrix (inverse of Covariance matrix)
		# d_sparse = scipy.sparse.csr_matrix(np.diag(D))
		# self.D = theano.sparse.as_sparse_variable(d_sparse)
		# self.Phi = self.tau * (self.D - self.alpha*self.W)

	def logp(self, x):
		logtau = self.n * tt.log(self.tau)
		logdet = tt.log(1 - self.alpha * self.lam).sum()

		# tau * ((phi .* D_sparse)' * phi - alpha * (phit_W * phi))
		Wx = theano.sparse.dot(self.W, x)
		tau_dot_x = self.D * x.T - self.alpha * Wx.ravel()
		logquad = self.tau * tt.dot(x.ravel(), tau_dot_x.ravel())

		# logquad = tt.dot(x.T, theano.sparse.dot(self.Phi, x)).sum()
		return 0.5*(logtau + logdet - logquad)


def main():
	print("DEVICE", theano.config.device)

	X = pd.read_csv('pymc_X.csv')
	y = pd.read_csv('pymc_y.csv', header=None).values
	weights = pd.read_csv('pymc_weights.csv').values
	features = X.columns
	print(X.shape, y.shape, weights.shape)
	# Float32
	X = X.astype('float32')
	y = y.astype('float32')
	amat = weights.astype('float32')

	sampler = xo.PyMC3Sampler()  # start=500, finish=500, window=500

	niters = 10000
	with pm.Model() as pooled_model:
		X = tt.as_tensor_variable(X)
		coeffs = []
		for f in features:
			beta = pm.StudentT(f, nu=4)
			coeffs.append(beta)

		coeffs = tt.stack(coeffs, axis=0)

		# Priors for spatial random effects
		alpha = pm.Uniform('rho', lower=0, upper=1)
		tau = pm.Gamma('tau', alpha=2., beta=2.)
		phi = Sparse_CAR('phi2', alpha, amat, tau, shape=(len(y), 1)).ravel()

		m = X.dot(coeffs) + phi

		# Mean model
		mu = pm.Deterministic('mu', tt.exp(m))
		ncrimes = pm.Poisson('ncrimes', mu=mu.ravel(), observed=y)

		# Run the burn-in and learn the mass matrix
		step_kwargs = dict(target_accept=0.9)
		sampler.tune(tune=100, regular_window=50, step_kwargs=step_kwargs)

		pooled_trace = sampler.sample(niters, njobs=2, tune=1000)

	print("RESULTS:")
	waic_result = pm.stats.waic(model=pooled_model, trace=pooled_trace[-2000:])
	print("WAIC", waic_result)


if __name__ == '__main__':
	main()

Lauched gives:

python3 mcmc_pooled3.py

DEVICE cpu
(617, 5) (617, 1) (617, 617)
Sampling 4 chains: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 308/308 [00:04<00:00, 64.60draws/s]
Sampling 4 chains: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 108/108 [00:02<00:00, 47.90draws/s]
Traceback (most recent call last):
  File "mcmc_pooled3.py", line 109, in <module>
    main()
  File "mcmc_pooled3.py", line 99, in main
    sampler.tune(tune=100, regular_window=50, step_kwargs=step_kwargs)
  File "/home/nadai/.local/share/virtualenvs/crime-environment-aZbKKJXb/lib/python3.6/site-packages/exoplanet/sampling.py", line 176, in tune
    **kwargs)
  File "/home/nadai/.local/share/virtualenvs/crime-environment-aZbKKJXb/lib/python3.6/site-packages/exoplanet/sampling.py", line 158, in extend_tune
    **step_kwargs)
  File "/home/nadai/.local/share/virtualenvs/crime-environment-aZbKKJXb/lib/python3.6/site-packages/exoplanet/sampling.py", line 84, in get_step_for_trace
    potential = quad.QuadPotentialFull(cov)
  File "/home/nadai/.local/share/virtualenvs/crime-environment-aZbKKJXb/lib/python3.6/site-packages/pymc3/step_methods/hmc/quadpotential.py", line 434, in __init__
    self.L = scipy.linalg.cholesky(A, lower=True)
  File "/home/nadai/.local/share/virtualenvs/crime-environment-aZbKKJXb/lib/python3.6/site-packages/scipy/linalg/decomp_cholesky.py", line 91, in cholesky
    check_finite=check_finite)
  File "/home/nadai/.local/share/virtualenvs/crime-environment-aZbKKJXb/lib/python3.6/site-packages/scipy/linalg/decomp_cholesky.py", line 40, in _cholesky
    "definite" % info)
numpy.linalg.linalg.LinAlgError: 94-th leading minor of the array is not positive definite

The data is attached to repeat it.

I could set more informative prior but where? The covariates are Student-t in order to do some regularization. The priors of the CAR are standard. I tried also a Beta(2,2) prior on rho. No changes… Is it too difficult to solve?

Thank you for your time

pymc_y.csv (1.8 KB)
pymc_X.csv (59.3 KB)
pymc_weights.csv (1.5 MB)


#6

The error is related to the high correlation (rho). Have you tried standardizing your input X?


#7

yeah, it is already Z-score standardized :frowning: