Non-convergence of Conditional Autoregressive model

Dear community,
I’m trying to implement a model described in this paper http://onlinelibrary.wiley.com/doi/10.1002/1097-0258(20000915/30)19:17/18<2421::AID-SIM579>3.0.CO;2-C/full#references.

I’m using this model to fit 26 discrete counts (y), with an offset and a spatial-effect component; see equation (1) of the paper together with equations (7) and (8).
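Concretely, the likelihood part as I read equation (1), with the population offset entering as an exposure term, is

$$y_i \sim \operatorname{Poisson}(\mu_i), \qquad \log \mu_i = \beta_0 + \phi_i + \log(\mathrm{offset}_i),$$

where $\phi$ gets the CAR prior of equations (7)–(8). The relevant code is: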

import pickle

import scipy.sparse
import theano
import theano.sparse
import theano.tensor as tt
import pymc3 as pm
from pymc3.distributions import continuous, distribution


class CAR3(distribution.Continuous):
    """
    Conditional Autoregressive (CAR) with 2nd order neighbors
    """

    def __init__(self, adjacency, tau, rho, adjacency2, rho2, *args, **kwargs):
        super(CAR3, self).__init__(*args, **kwargs)
        n, m = adjacency.shape
        self.n = n

        # store both adjacency matrices as sparse Theano variables
        adjacency_sparse = scipy.sparse.csr_matrix(adjacency)
        self.adjacency = theano.sparse.as_sparse_variable(adjacency_sparse)
        adjacency_sparse2 = scipy.sparse.csr_matrix(adjacency2)
        self.adjacency2 = theano.sparse.as_sparse_variable(adjacency_sparse2)

        # number of 1st- and 2nd-order neighbors of each area
        self.n_neighbors = tt.as_tensor_variable(adjacency.sum(1))
        self.n_neighbors2 = tt.as_tensor_variable(adjacency2.sum(1))
        self.mean = tt.zeros(n)
        self.median = self.mean
        self.tau = tt.as_tensor_variable(tau)
        self.rho = tt.as_tensor_variable(rho)
        self.rho2 = tt.as_tensor_variable(rho2)


with open('dataset.pkl', "rb") as input_file:
    y = pickle.load(input_file)
    offset = pickle.load(input_file)
    adjacency = pickle.load(input_file)
    adjacency2 = pickle.load(input_file)

with pm.Model() as pooled_model:
    b0 = pm.Normal('intercept', mu=0, tau=0.001)

    rho = pm.Uniform('rho', 0, 1)
    rho2 = pm.Uniform('rho2', 0, 1)
    tau = pm.Gamma('tau', alpha=0.001, beta=0.001)
    mu_phi = CAR3('mu_phi', adjacency=adjacency, rho=rho, adjacency2=adjacency2, rho2=rho2, tau=tau, shape=len(y))
    phi2 = pm.Deterministic('phi', mu_phi - tt.mean(mu_phi))

    mu = tt.exp(b0 + phi2 + tt.log(offset))

    ncrimes = pm.Poisson('ncrimes', mu=mu, observed=y)

    pooled_trace = pm.sample(3000, njobs=3, tune=5000)

    plot_traces(pooled_trace)

My traces look very weird, and I get these warnings:

UserWarning: The acceptance probability in chain 1 does not match the target. It is 0.0537608781563, but should be close to 0.8. Try to increase the number of tuning steps.
UserWarning: Chain 0 reached the maximum tree depth. Increase max_treedepth, increase target_accept or reparameterize.


Dataset: https://www.dropbox.com/s/a3h1iz02ehqbze0/dataset.pkl?dl=0

Am I doing something very wrong? I am not an expert on reparametrization: is there a way to do it here? Do you have any suggestions?

I’m using the advi+adapt_diag initialization and the latest version of PyMC3.

Could you post the whole definition of CAR3? Also, pickle is not a good way to share data. If I open a malicious pickle file it could execute arbitrary code on my machine.

It looks like one of the chains is stuck for some reason. My best guess from the traces is that the high correlation of the mu_phi values is a problem for the sampler. Also, there is a very high level of uncertainty in rho and rho2; the data doesn’t seem to tell us very much about those.
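If you just want to push the sampler harder while you debug this, you can raise target_accept and max_treedepth as the warning suggests, e.g. through the nuts_kwargs argument of pm.sample (a sketch; this usually just trades the warnings for longer runtimes rather than fixing the underlying geometry):

    pooled_trace = pm.sample(3000, njobs=3, tune=5000,
                             nuts_kwargs=dict(target_accept=0.9, max_treedepth=12))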

On a side note, using a gamma(ε, ε) prior for scale parameters is usually not a good idea, although that choice is very common in older literature. It is usually better to use a HalfCauchy or HalfStudentT.
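For example, swapping just the prior line (the scale 2.5 below is an arbitrary weakly-informative choice, not something from the paper):

    tau = pm.HalfCauchy('tau', beta=2.5)
    # or, with slightly lighter tails:
    tau = pm.HalfStudentT('tau', nu=3, sd=2.5)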

I already posted the whole definition (?). Here are the main equations of the CAR effect:
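Writing them out as I read them off the logp below (with $n_i$ and $n^{(2)}_i$ the first- and second-order neighbor counts of area $i$, and $j \sim i$, $j \approx i$ denoting first- and second-order neighbors):

$$\phi_i \mid \phi_{-i} \sim \mathcal{N}\!\left(\frac{\rho \sum_{j \sim i} \phi_j + \rho_2 \sum_{j \approx i} \phi_j}{1 - \rho - \rho_2 + \rho\, n_i + \rho_2\, n^{(2)}_i},\ \left[\tau \left(1 - \rho - \rho_2 + \rho\, n_i + \rho_2\, n^{(2)}_i\right)\right]^{-1}\right)$$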

Here is the new source:

class CAR3(distribution.Continuous):
    """
    Conditional Autoregressive (CAR) with 2nd order neighbors
    """

    def __init__(self, adjacency, tau, rho, adjacency2, rho2, *args, **kwargs):
        super(CAR3, self).__init__(*args, **kwargs)
        n, m = adjacency.shape
        self.n = n

        adjacency_sparse = scipy.sparse.csr_matrix(adjacency)
        self.adjacency = theano.sparse.as_sparse_variable(adjacency_sparse)
        adjacency_sparse2 = scipy.sparse.csr_matrix(adjacency2)
        self.adjacency2 = theano.sparse.as_sparse_variable(adjacency_sparse2)

        self.n_neighbors = tt.as_tensor_variable(adjacency.sum(1))
        self.n_neighbors2 = tt.as_tensor_variable(adjacency2.sum(1))
        self.mean = tt.zeros(n)
        self.median = self.mean
        self.tau = tt.as_tensor_variable(tau)
        self.rho = tt.as_tensor_variable(rho)
        self.rho2 = tt.as_tensor_variable(rho2)

    def logp(self, x):
        # conditional prior precision: tau * (1 - rho - rho2 + rho * n_i + rho2 * n2_i)
        priorvardenom = 1 - (self.rho + self.rho2) + self.rho * self.n_neighbors + self.rho2 * self.n_neighbors2
        priorvar = self.tau * priorvardenom

        # weighted sums over 1st- and 2nd-order neighbors
        Wx = theano.sparse.dot(self.adjacency, x.reshape((self.n, 1)))
        Wx2 = theano.sparse.dot(self.adjacency2, x.reshape((self.n, 1)))
        mu_w = (self.rho * tt.sum(Wx, axis=1) + self.rho2 * tt.sum(Wx2, axis=1)) / priorvardenom

        return tt.sum(continuous.Normal.dist(mu=mu_w, tau=priorvar).logp(x))

data_df = pd.read_csv('dataset.csv')
adjacency = np.genfromtxt("adjacency.csv", delimiter=",")
adjacency2 = np.genfromtxt("adjacency2.csv", delimiter=",")

y = data_df['ncrimes'].values.astype('float32')
offset = data_df['population'].values.astype('float32')


with pm.Model() as pooled_model:
    b0 = pm.Normal('intercept', mu=0, tau=0.001)

    rho = pm.Uniform('rho', 0, 1)
    rho2 = pm.Uniform('rho2', 0, 1)
    tau = pm.HalfCauchy('tau', 2.5)
    mu_phi = CAR3('mu_phi', adjacency=adjacency, rho=rho, adjacency2=adjacency2, rho2=rho2, tau=tau, shape=len(y))
    phi2 = pm.Deterministic('phi', mu_phi - tt.mean(mu_phi))

    mu = tt.exp(b0 + phi2 + tt.log(offset))

    ncrimes = pm.Poisson('ncrimes', mu=mu, observed=y)

    pooled_trace = pm.sample(3000, njobs=3, tune=5000)

    plot_traces(pooled_trace)

Thanks

adjacency2.csv (14.1 KB)
adjacency.csv (14.1 KB)
dataset.csv (608 Bytes)

Isn’t the logp function of CAR3 missing?

Sorry!! I didn’t realize it :). I edited the previous post.

I fixed a bug in the definition of the CAR model. Here is the new source:

class CAR3(distribution.Continuous):
    """
    Conditional Autoregressive (CAR) with 2nd order neighbors
    """

    def __init__(self, adjacency, tau, rho, adjacency2, rho2, *args, **kwargs):
        super(CAR3, self).__init__(*args, **kwargs)
        n, m = adjacency.shape
        self.n = n

        adjacency_sparse = scipy.sparse.csr_matrix(adjacency)
        self.adjacency = theano.sparse.as_sparse_variable(adjacency_sparse)
        adjacency_sparse2 = scipy.sparse.csr_matrix(adjacency2)
        self.adjacency2 = theano.sparse.as_sparse_variable(adjacency_sparse2)

        self.n_neighbors = tt.as_tensor_variable(adjacency.sum(1))
        self.n_neighbors2 = tt.as_tensor_variable(adjacency2.sum(1))
        self.mean = tt.zeros(n)
        self.median = self.mean
        self.tau = tt.as_tensor_variable(tau)
        self.rho = tt.as_tensor_variable(rho)
        self.rho2 = tt.as_tensor_variable(rho2)

    def logp(self, x):
        priorvardenom = 1 - (self.rho + self.rho2) + self.rho * self.n_neighbors + self.rho2 * self.n_neighbors2
        # priorvar = self.tau * priorvardenom
        priorvar = tt.sqrt(priorvardenom / self.tau)

        Wx = theano.sparse.dot(self.adjacency, x.reshape((self.n, 1)))
        Wx2 = theano.sparse.dot(self.adjacency2, x.reshape((self.n, 1)))
        mu_w = (self.rho * tt.sum(Wx, axis=1) + self.rho2 * tt.sum(Wx2, axis=1)) / priorvardenom

        return tt.sum(continuous.Normal.dist(mu=mu_w, tau=priorvar).logp(x))

And the new warnings:

Auto-assigning NUTS sampler...
Initializing NUTS using advi+adapt_diag...
Average Loss = 199.77:  15%|█▌        | 30391/200000 [00:15<01:24, 1995.75it/s] 
Convergence archived at 30500
Interrupted at 30,500 [15%]: Average Loss = 12,235
 99%|█████████▉| 5954/6000 [27:56<00:16,  2.83it/s]/home/nadai/.local/lib/python3.4/site-packages/pymc3/step_methods/hmc/nuts.py:459: UserWarning: Chain 2 reached the maximum tree depth. Increase max_treedepth, increase target_accept or reparameterize.
  'reparameterize.' % self._chain_id)
100%|█████████▉| 5999/6000 [28:08<00:00,  3.58it/s]/home/nadai/.local/lib/python3.4/site-packages/pymc3/step_methods/hmc/nuts.py:459: UserWarning: Chain 0 reached the maximum tree depth. Increase max_treedepth, increase target_accept or reparameterize.
  'reparameterize.' % self._chain_id)
100%|██████████| 6000/6000 [28:08<00:00,  3.55it/s]
/home/nadai/.local/lib/python3.4/site-packages/pymc3/step_methods/hmc/nuts.py:459: UserWarning: Chain 1 reached the maximum tree depth. Increase max_treedepth, increase target_accept or reparameterize.
  'reparameterize.' % self._chain_id)

And the new traces

Could you help me please?

I think your centering of mu_phi in phi2 = pm.Deterministic('phi', mu_phi - tt.mean(mu_phi)) makes the model unidentifiable.

From what I know, the sum-to-zero constraint is there to avoid linear dependence between the intercept term and the spatial random effects…
However, if I remove it:

I was thinking more of normalising it in the definition of logp:

class ...
    def logp(self, x):
        ...
        mu_w = mu_w - mu_w.mean()

        return pm.Normal.dist(mu=mu_w, tau=priorvar).logp(x)

and with

    mu = tt.exp(b0 + mu_phi + tt.log(offset))
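Spelled out against the CAR3 above, the whole logp would read something like this (a sketch, untested):

    def logp(self, x):
        priorvardenom = 1 - (self.rho + self.rho2) + self.rho * self.n_neighbors + self.rho2 * self.n_neighbors2
        priorvar = self.tau * priorvardenom

        Wx = theano.sparse.dot(self.adjacency, x.reshape((self.n, 1)))
        Wx2 = theano.sparse.dot(self.adjacency2, x.reshape((self.n, 1)))
        mu_w = (self.rho * tt.sum(Wx, axis=1) + self.rho2 * tt.sum(Wx2, axis=1)) / priorvardenom
        # center the conditional mean rather than the sampled values
        mu_w = mu_w - mu_w.mean()

        return tt.sum(continuous.Normal.dist(mu=mu_w, tau=priorvar).logp(x))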

However, doing so seems to render rho unidentifiable.

In the original paper they use a quasi-likelihood based estimator, because the random effects are not independent. Might ADVI/NUTS have a problem with this?

PS: if I remove the intercept, everything works :confused:

Well, that’s more or less what I meant about the unidentifiable parameters. Basically, in the linear model mu = tt.exp(b0 + mu_phi + tt.log(offset)), if mu_phi is not centered on zero, the overall level gets distributed between b0 and the mean of mu_phi. But if you center mu_phi as in the original code, the trace of mu_phi is not interpretable anymore, as all the mu_phi values can move up and down together; what matters is phi.

In both cases your model’s predictions don’t change: once you combine everything in the linear model mu = tt.exp(b0 + mu_phi + tt.log(offset)), the results will be the same. To make the parameters more identifiable you should reparameterize your model.
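One standard option (a sketch of the usual soft sum-to-zero trick, not something specific to this model) is to keep the intercept but shrink the mean of mu_phi toward zero inside the model, instead of subtracting it afterwards:

    with pm.Model() as model:
        b0 = pm.Normal('intercept', mu=0, tau=0.001)
        ...
        mu_phi = CAR3('mu_phi', adjacency=adjacency, rho=rho,
                      adjacency2=adjacency2, rho2=rho2, tau=tau, shape=len(y))
        # soft sum-to-zero constraint: penalize a nonzero mean of mu_phi,
        # so the overall level is carried by b0 alone
        pm.Potential('phi_center',
                     pm.Normal.dist(mu=0., sd=0.001).logp(tt.mean(mu_phi)))
        mu = tt.exp(b0 + mu_phi + tt.log(offset))
        ncrimes = pm.Poisson('ncrimes', mu=mu, observed=y)

That way b0 and the level of mu_phi can no longer trade off freely.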

I did not understand the last paragraph, sorry :frowning:
However, I have the same problem even if I avoid the centering. Do you have suggestions and/or papers to look at? It seems that in all the CAR papers I read, everything is fine :confused:

Thanks!

@junpenglao Now I understood the “problem”. Thanks!!


@marcodena did you ever find a way of incorporating the constraint in your model? I am having a similar problem implementing the ICAR model here: Including a multivariate constrained improper prior in a hierarchical model