Non-convergence of Conditional Autoregressive model

Dear community,
I’m trying to implement a model described in this paper http://onlinelibrary.wiley.com/doi/10.1002/1097-0258(20000915/30)19:17/18<2421::AID-SIM579>3.0.CO;2-C/full#references.

I’m using this model to fit 26 discrete counts (y), with an offset and a spatial-effect component; see equation (1) of the paper together with equations (7) and (8).
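Concretely, the likelihood part as I read equation (1), with the population offset entering as an exposure term, is

$$y_i \sim \operatorname{Poisson}(\mu_i), \qquad \log \mu_i = \beta_0 + \phi_i + \log(\mathrm{offset}_i),$$

where $\phi$ gets the CAR prior of equations (7)–(8). The relevant code is: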

import pickle

import scipy.sparse
import theano
import theano.sparse
import theano.tensor as tt
import pymc3 as pm
from pymc3.distributions import continuous, distribution


class CAR3(distribution.Continuous):
    """
    Conditional Autoregressive (CAR) with 2nd order neighbors
    """

    def __init__(self, adjacency, tau, rho, adjacency2, rho2, *args, **kwargs):
        super(CAR3, self).__init__(*args, **kwargs)
        n, m = adjacency.shape
        self.n = n

        # store both adjacency matrices as sparse Theano variables
        adjacency_sparse = scipy.sparse.csr_matrix(adjacency)
        self.adjacency = theano.sparse.as_sparse_variable(adjacency_sparse)
        adjacency_sparse2 = scipy.sparse.csr_matrix(adjacency2)
        self.adjacency2 = theano.sparse.as_sparse_variable(adjacency_sparse2)

        # number of 1st- and 2nd-order neighbors of each area
        self.n_neighbors = tt.as_tensor_variable(adjacency.sum(1))
        self.n_neighbors2 = tt.as_tensor_variable(adjacency2.sum(1))
        self.mean = tt.zeros(n)
        self.median = self.mean
        self.tau = tt.as_tensor_variable(tau)
        self.rho = tt.as_tensor_variable(rho)
        self.rho2 = tt.as_tensor_variable(rho2)


with open('dataset.pkl', "rb") as input_file:
    y = pickle.load(input_file)
    offset = pickle.load(input_file)
    adjacency = pickle.load(input_file)
    adjacency2 = pickle.load(input_file)

with pm.Model() as pooled_model:
    b0 = pm.Normal('intercept', mu=0, tau=0.001)

    rho = pm.Uniform('rho', 0, 1)
    rho2 = pm.Uniform('rho2', 0, 1)
    tau = pm.Gamma('tau', alpha=0.001, beta=0.001)
    mu_phi = CAR3('mu_phi', adjacency=adjacency, rho=rho, adjacency2=adjacency2, rho2=rho2, tau=tau, shape=len(y))
    phi2 = pm.Deterministic('phi', mu_phi - tt.mean(mu_phi))

    mu = tt.exp(b0 + phi2 + tt.log(offset))

    ncrimes = pm.Poisson('ncrimes', mu=mu, observed=y)

    pooled_trace = pm.sample(3000, njobs=3, tune=5000)

    plot_traces(pooled_trace)

My traces look very weird, and I get these warnings:

UserWarning: The acceptance probability in chain 1 does not match the target. It is 0.0537608781563, but should be close to 0.8. Try to increase the number of tuning steps.
UserWarning: Chain 0 reached the maximum tree depth. Increase max_treedepth, increase target_accept or reparameterize.


Dataset: https://www.dropbox.com/s/a3h1iz02ehqbze0/dataset.pkl?dl=0

Am I doing something very wrong? I am not an expert on reparametrization: is there a way to do it here? Do you have any suggestions?

I’m using the advi+adapt_diag initialization and the latest version of PyMC3.

Could you post the whole definition of CAR3? Also, pickle is not a good way to share data. If I open a malicious pickle file it could execute arbitrary code on my machine.

It looks like one of the chains is stuck for some reason. My best guess from the traces is that the high correlation of the mu_phi values is a problem for the sampler. Also, there is a very high level of uncertainty in rho and rho2; the data doesn’t seem to tell us very much about those.
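If you just want to push the sampler harder while you debug this, you can raise target_accept and max_treedepth as the warning suggests, e.g. through the nuts_kwargs argument of pm.sample (a sketch; this usually just trades the warnings for longer runtimes rather than fixing the underlying geometry):

    pooled_trace = pm.sample(3000, njobs=3, tune=5000,
                             nuts_kwargs=dict(target_accept=0.9, max_treedepth=12))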

On a side note, using a gamma(ε, ε) prior for scale parameters is usually not a good idea, although that choice is very common in older literature. It is usually better to use a HalfCauchy or HalfStudentT.
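For example, swapping just the prior line (the scale 2.5 below is an arbitrary weakly-informative choice, not something from the paper):

    tau = pm.HalfCauchy('tau', beta=2.5)
    # or, with slightly lighter tails:
    tau = pm.HalfStudentT('tau', nu=3, sd=2.5)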

I already posted the whole definition (?). Here are the main equations of the CAR effect:
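Writing them out as I read them off the logp below (with $n_i$ and $n^{(2)}_i$ the first- and second-order neighbor counts of area $i$, and $j \sim i$, $j \approx i$ denoting first- and second-order neighbors):

$$\phi_i \mid \phi_{-i} \sim \mathcal{N}\!\left(\frac{\rho \sum_{j \sim i} \phi_j + \rho_2 \sum_{j \approx i} \phi_j}{1 - \rho - \rho_2 + \rho\, n_i + \rho_2\, n^{(2)}_i},\ \left[\tau \left(1 - \rho - \rho_2 + \rho\, n_i + \rho_2\, n^{(2)}_i\right)\right]^{-1}\right)$$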

Here is the new source:

class CAR3(distribution.Continuous):
    """
    Conditional Autoregressive (CAR) with 2nd order neighbors
    """

    def __init__(self, adjacency, tau, rho, adjacency2, rho2, *args, **kwargs):
        super(CAR3, self).__init__(*args, **kwargs)
        n, m = adjacency.shape
        self.n = n

        adjacency_sparse = scipy.sparse.csr_matrix(adjacency)
        self.adjacency = theano.sparse.as_sparse_variable(adjacency_sparse)
        adjacency_sparse2 = scipy.sparse.csr_matrix(adjacency2)
        self.adjacency2 = theano.sparse.as_sparse_variable(adjacency_sparse2)

        self.n_neighbors = tt.as_tensor_variable(adjacency.sum(1))
        self.n_neighbors2 = tt.as_tensor_variable(adjacency2.sum(1))
        self.mean = tt.zeros(n)
        self.median = self.mean
        self.tau = tt.as_tensor_variable(tau)
        self.rho = tt.as_tensor_variable(rho)
        self.rho2 = tt.as_tensor_variable(rho2)

    def logp(self, x):
        # conditional prior precision: tau * (1 - rho - rho2 + rho * n_i + rho2 * n2_i)
        priorvardenom = 1 - (self.rho + self.rho2) + self.rho * self.n_neighbors + self.rho2 * self.n_neighbors2
        priorvar = self.tau * priorvardenom

        # weighted sums over 1st- and 2nd-order neighbors
        Wx = theano.sparse.dot(self.adjacency, x.reshape((self.n, 1)))
        Wx2 = theano.sparse.dot(self.adjacency2, x.reshape((self.n, 1)))
        mu_w = (self.rho * tt.sum(Wx, axis=1) + self.rho2 * tt.sum(Wx2, axis=1)) / priorvardenom

        return tt.sum(continuous.Normal.dist(mu=mu_w, tau=priorvar).logp(x))

data_df = pd.read_csv('dataset.csv')
adjacency = np.genfromtxt("adjacency.csv", delimiter=",")
adjacency2 = np.genfromtxt("adjacency2.csv", delimiter=",")

y = data_df['ncrimes'].values.astype('float32')
offset = data_df['population'].values.astype('float32')


with pm.Model() as pooled_model:
    b0 = pm.Normal('intercept', mu=0, tau=0.001)

    rho = pm.Uniform('rho', 0, 1)
    rho2 = pm.Uniform('rho2', 0, 1)
    tau = pm.HalfCauchy('tau', 2.5)
    mu_phi = CAR3('mu_phi', adjacency=adjacency, rho=rho, adjacency2=adjacency2, rho2=rho2, tau=tau, shape=len(y))
    phi2 = pm.Deterministic('phi', mu_phi - tt.mean(mu_phi))

    mu = tt.exp(b0 + phi2 + tt.log(offset))

    ncrimes = pm.Poisson('ncrimes', mu=mu, observed=y)

    pooled_trace = pm.sample(3000, njobs=3, tune=5000)

    plot_traces(pooled_trace)

Thanks

adjacency2.csv (14.1 KB)
adjacency.csv (14.1 KB)
dataset.csv (608 Bytes)

Isn’t the logp function of CAR3 missing?

Sorry!! I didn’t realize it :). I edited the previous post.

I fixed a bug in the definition of the CAR model. Here is the new source:

class CAR3(distribution.Continuous):
    """
    Conditional Autoregressive (CAR) with 2nd order neighbors
    """

    def __init__(self, adjacency, tau, rho, adjacency2, rho2, *args, **kwargs):
        super(CAR3, self).__init__(*args, **kwargs)
        n, m = adjacency.shape
        self.n = n

        adjacency_sparse = scipy.sparse.csr_matrix(adjacency)
        self.adjacency = theano.sparse.as_sparse_variable(adjacency_sparse)
        adjacency_sparse2 = scipy.sparse.csr_matrix(adjacency2)
        self.adjacency2 = theano.sparse.as_sparse_variable(adjacency_sparse2)

        self.n_neighbors = tt.as_tensor_variable(adjacency.sum(1))
        self.n_neighbors2 = tt.as_tensor_variable(adjacency2.sum(1))
        self.mean = tt.zeros(n)
        self.median = self.mean
        self.tau = tt.as_tensor_variable(tau)
        self.rho = tt.as_tensor_variable(rho)
        self.rho2 = tt.as_tensor_variable(rho2)

    def logp(self, x):
        priorvardenom = 1 - (self.rho + self.rho2) + self.rho * self.n_neighbors + self.rho2 * self.n_neighbors2
        # priorvar = self.tau * priorvardenom
        priorvar = tt.sqrt(priorvardenom / self.tau)

        Wx = theano.sparse.dot(self.adjacency, x.reshape((self.n, 1)))
        Wx2 = theano.sparse.dot(self.adjacency2, x.reshape((self.n, 1)))
        mu_w = (self.rho * tt.sum(Wx, axis=1) + self.rho2 * tt.sum(Wx2, axis=1)) / priorvardenom

        return tt.sum(continuous.Normal.dist(mu=mu_w, tau=priorvar).logp(x))

And the new warnings:

Auto-assigning NUTS sampler...
Initializing NUTS using advi+adapt_diag...
Average Loss = 199.77:  15%|█▌        | 30391/200000 [00:15<01:24, 1995.75it/s] 
Convergence archived at 30500
Interrupted at 30,500 [15%]: Average Loss = 12,235
 99%|█████████▉| 5954/6000 [27:56<00:16,  2.83it/s]/home/nadai/.local/lib/python3.4/site-packages/pymc3/step_methods/hmc/nuts.py:459: UserWarning: Chain 2 reached the maximum tree depth. Increase max_treedepth, increase target_accept or reparameterize.
  'reparameterize.' % self._chain_id)
100%|█████████▉| 5999/6000 [28:08<00:00,  3.58it/s]/home/nadai/.local/lib/python3.4/site-packages/pymc3/step_methods/hmc/nuts.py:459: UserWarning: Chain 0 reached the maximum tree depth. Increase max_treedepth, increase target_accept or reparameterize.
  'reparameterize.' % self._chain_id)
100%|██████████| 6000/6000 [28:08<00:00,  3.55it/s]
/home/nadai/.local/lib/python3.4/site-packages/pymc3/step_methods/hmc/nuts.py:459: UserWarning: Chain 1 reached the maximum tree depth. Increase max_treedepth, increase target_accept or reparameterize.
  'reparameterize.' % self._chain_id)

And the new traces

Could you help me please?

I think your centering of mu_phi in phi2 = pm.Deterministic('phi', mu_phi - tt.mean(mu_phi)) makes the model unidentifiable.

From what I know, the sum-to-zero constraint is there to avoid linear dependence between the intercept term and the spatial random effects…
However, if I remove it:

I was thinking more of normalising it in the definition of logp:

class ...
    def logp(self, x):
        ...
        mu_w = mu_w - mu_w.mean()

        return pm.Normal.dist(mu=mu_w, tau=priorvar).logp(x)

and with

    mu = tt.exp(b0 + mu_phi + tt.log(offset))
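Spelled out against the CAR3 above, the whole logp would read something like this (a sketch, untested):

    def logp(self, x):
        priorvardenom = 1 - (self.rho + self.rho2) + self.rho * self.n_neighbors + self.rho2 * self.n_neighbors2
        priorvar = self.tau * priorvardenom

        Wx = theano.sparse.dot(self.adjacency, x.reshape((self.n, 1)))
        Wx2 = theano.sparse.dot(self.adjacency2, x.reshape((self.n, 1)))
        mu_w = (self.rho * tt.sum(Wx, axis=1) + self.rho2 * tt.sum(Wx2, axis=1)) / priorvardenom
        # center the conditional mean rather than the sampled values
        mu_w = mu_w - mu_w.mean()

        return tt.sum(continuous.Normal.dist(mu=mu_w, tau=priorvar).logp(x))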

However, doing so seems to render rho unidentifiable.

In the original paper they use a quasi-likelihood based estimator, because the random effects are not independent. Might ADVI/NUTS have a problem with this?

PS: if I remove the intercept, everything works :confused:

Well, that’s more or less what I meant about the unidentifiable parameters. Basically, in the linear model mu = tt.exp(b0 + mu_phi + tt.log(offset)), if mu_phi is not centered on zero, the overall level gets distributed between b0 and the mean of mu_phi. But if you center mu_phi as in the original code, the trace of mu_phi is not interpretable anymore, as all the mu_phi values can move up and down together; what matters is phi.

In both cases your model’s predictions don’t change: once you combine everything in the linear model mu = tt.exp(b0 + mu_phi + tt.log(offset)), the results will be the same. To make the parameters more identifiable you should reparameterize your model.
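One standard option (a sketch of the usual soft sum-to-zero trick, not something specific to this model) is to keep the intercept but shrink the mean of mu_phi toward zero inside the model, instead of subtracting it afterwards:

    with pm.Model() as model:
        b0 = pm.Normal('intercept', mu=0, tau=0.001)
        ...
        mu_phi = CAR3('mu_phi', adjacency=adjacency, rho=rho,
                      adjacency2=adjacency2, rho2=rho2, tau=tau, shape=len(y))
        # soft sum-to-zero constraint: penalize a nonzero mean of mu_phi,
        # so the overall level is carried by b0 alone
        pm.Potential('phi_center',
                     pm.Normal.dist(mu=0., sd=0.001).logp(tt.mean(mu_phi)))
        mu = tt.exp(b0 + mu_phi + tt.log(offset))
        ncrimes = pm.Poisson('ncrimes', mu=mu, observed=y)

That way b0 and the level of mu_phi can no longer trade off freely.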

I did not understand the last paragraph, sorry :frowning:
However, I have the same problem even if I avoid the centering. Do you have suggestions and/or papers to look at? It seems that in all the CAR papers I read, everything is fine :confused:

Thanks!

@junpenglao Now I understood the “problem”. Thanks!!


@marcodena did you ever find a way of incorporating the constraint in your model? I am having a similar problem implementing the ICAR model here: Including a multivariate constrained improper prior in a hierarchical model