Including a multivariate constrained improper prior in a hierarchical model

[image: the ICAR improper prior for eta, $p(\eta \mid \tau) \propto \tau^{(n-1)/2} \exp\!\left(-\tfrac{\tau}{2}\,\eta^\top Q\,\eta\right)$, subject to the constraint $\sum_i \eta_i = 0$]
I am wondering about the most effective way of incorporating the above improper prior for eta into a hierarchical model as a spatial random effect. The Q matrix is singular, with rank (n-1), where n is the number of data points. The precision tau has a Gamma(0.5, 0.005) prior. The constraint is added onto the improper joint distribution so that eta is identifiable. This is what I've tried to do:

```python
with pm.Model() as m:
    ...  # other parameter definitions before this line
    tau = pm.Gamma('tau', alpha=0.5, beta=0.005)
    # eta = pm.MvNormal('eta', mu=0, tau=tau*Q, shape=(1, n))  # fails: Q is singular
    eta = pm.DensityDist('eta',
                         lambda value: 0.5 * (n - 1) * tt.log(tau)
                                       - 0.5 * tau * tt.dot(tt.transpose(value), tt.dot(Q, value)),
                         shape=(n, 1))  # shape added so eta is an n-vector (column)
    # eta_c = pm.Deterministic('eta_c', eta - tt.mean(eta))
    eta = eta - tt.mean(eta)  # my attempt at the sum-to-zero constraint
    ...  # use eta as an input to the expression for another parameter's distribution
```

At first it seems to run, and shortly after it gives this output to the console:

```
Auto-assigning NUTS sampler...
Initializing NUTS using jitter+adapt_diag...
Multiprocess sampling (2 chains in 2 jobs)
NUTS: [eta, tau__log__, alpha, beta]
Could not pickle model, sampling singlethreaded.
Sequential sampling (2 chains in 1 job)
NUTS: [eta, tau__log__, alpha, beta]
```

Then shortly after it throws this error:

```
ValueError: Mass matrix contains zeros on the diagonal. Some derivatives might always be zero.
```

From the output it seems to be sampling log(tau) instead of tau, and the eta posterior samples are not needed anyway.

My question now is: what am I doing wrong in the implementation? I can't use the available PyMC3 distributions, since the prior is improper.

BTW, this formulation could be used as an extension to a model like the one in this link (Problem with hierarchical occupancy model), where eta is added as a spatial random effect on the x*beta expression modelling psi.

This error usually happens when you have priors that are too uninformative, which results in small gradients. You can have a look at the usual suggestions: Search results for 'Mass matrix contains zeros on the diagonal' - PyMC Discourse

I tried different values, but it doesn't seem to help at all. It just jams and throws the error at around 7% every time.

Do you think I implemented the improper prior and the constraint correctly, given my code above? Maybe I could change some things to make it work better?

It is possible the logp is not implemented correctly (I did not check). Did you verify the implementation in numpy/scipy?
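For example, something like this (a minimal sketch with toy values for n, tau, Q and a trial eta, not the real data) would compare the theano expression against a direct numpy computation:

```python
import numpy as np
import theano.tensor as tt

# Toy inputs, purely illustrative
n = 5
tau_val = 2.0
rng = np.random.RandomState(0)
eta_val = rng.randn(n)
A = np.ones((n, n)) - np.eye(n)      # fully connected toy graph
Q_val = np.diag(A.sum(axis=1)) - A   # Laplacian Q = D - A, rank n - 1

# Direct numpy computation of the unnormalized logp
logp_np = 0.5 * (n - 1) * np.log(tau_val) - 0.5 * tau_val * eta_val @ Q_val @ eta_val

# The same expression written with theano, as in the DensityDist
value = tt.vector('value')
logp_tt = 0.5 * (n - 1) * np.log(tau_val) - 0.5 * tau_val * tt.dot(value, tt.dot(Q_val, value))

print(logp_np, logp_tt.eval({value: eta_val}))  # the two numbers should match
```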

I removed the tau^((n-1)/2) coefficient, since it doesn't contain the variable eta. I ran the model without the line eta = eta - tt.mean(eta), which I believe enforces the constraint sum(eta) = 0. It ran without error, but the output was wrong for the beta parameter, which includes the spatial random effect introduced by the improper prior: the traces were very wide or very tall.

If I include the line eta = eta - tt.mean(eta) to enforce the constraint (which I believe should fix the bad output), I get an error at around 30% which reads: ValueError: Mass matrix contains non-finite values.

Do you have any suggestions for an effective way of including the sum(eta) = 0 constraint without encountering errors?

Usually you don't include a normalization within the model block, as it makes the parameters unidentifiable. Besides, in your case eta comes from a zero-mean distribution and is multiplied by a matrix, which should still satisfy mean 0.

It would be easier if you could share your code and data in a notebook.

The data and code are the same as the ones here (https://github.com/junpenglao/Planet_Sakaar_Data_Science/blob/master/PyMC3QnA/discourse_1702.ipynb), except for the following changes:

```python
import pymc3 as pm
import theano.tensor as tt


class ICAR(pm.Continuous):
    """Improper ICAR prior with unnormalized logp -0.5 * tau * x' Q x."""

    def __init__(self, tau, Q, *args, **kwargs):
        super(ICAR, self).__init__(*args, **kwargs)
        self.tau = tau
        self.Q = Q
        self.mode = 0.

    def logp(self, x):
        tau = self.tau
        Q = self.Q
        # I removed the tau^((n-1)/2) coefficient from the log-density
        return -0.5 * tau * tt.dot(tt.transpose(x), tt.dot(Q, x))
```

```python
with pm.Model() as m:
    beta = pm.Normal('beta', mu=0, sd=10, shape=(2, 1))
    alpha = pm.Normal('alpha', mu=0, sd=10, shape=(2, 1))
    tau = pm.Gamma('tau', alpha=0.5, beta=0.005)
    eta = ICAR('eta', tau=tau, Q=Q, shape=(n, 1))  # shape added so eta matches x_b
    # eta = pm.DensityDist('eta', lambda value: 0.5*(n-1)*tt.log(tau) - 0.5*tau*
    #                      tt.dot(tt.transpose(value), tt.dot(Q, value)))
    x_b = pm.math.dot(X, beta)
    # constr = pm.Potential('constr', eta - tt.mean(eta))
    # eta = eta - tt.mean(eta)
    psi = pm.math.invlogit(x_b + eta).flatten()  # note the eta spatial random effect added to x_b
    w_a = pm.math.dot(W_r, alpha)
    d = pm.math.invlogit(w_a)
    # z = pm.Bernoulli('z', psi, shape=n)
    # prob = z[index] * d
    prob = psi[index] * d
    obs = pm.Bernoulli('y', p=prob, observed=y_r)
    trace = pm.sample(10000, tune=1000, cores=1, chains=1)  # it throws a process-pool error if I don't set cores=1
```

So I ran the model without the constraint, and the alpha0, alpha1 and beta1 parameters return reasonable values, but the beta0 samples are completely off, and the standard error is huge, as can be seen from the plots above. Also, the tail of tau is unusually long; the mean should be very close to zero instead of being that large. I am sure that correctly including the constraint in the model, without errors being thrown, would improve the beta0 samples (as confirmed by the direct Gibbs sampler I built a while ago, using mathematical derivations of the full conditional distributions to sample from).

So what do you think can be done to improve the sampling while making sure to incorporate the constraint?

Looking at beta_0, it seems it is just sampled from the prior. I don't have time right now, but for CAR models you can have a look at this detailed doc: http://docs.pymc.io/notebooks/PyMC3_tips_and_heuristic.html. Maybe you can get some inspiration.

Unfortunately, I've already read the doc you linked and it hasn't been of much help. CAR is fairly easy to implement, since the prior for eta is a proper Normal distribution. The ICAR model I'm dealing with is much more of a challenge, since the prior is improper, and implementing the sum-to-zero constraint is my biggest concern right now.

Thanks for your time; you can always come back and offer input when you can.

I tried to run your code, but I don't have the information for Q.
There is some good discussion of this on the Stan discourse: http://discourse.mc-stan.org/t/test-soft-vs-hard-sum-to-zero-constrain-choosing-the-right-prior-for-soft-constrain/3884/9

Maybe you can try a similar strategy here and apply a soft constraint, using the Potential class (this is a term that gets added to the model logp):

```python
import theano

s = theano.shared(0.001)
with pm.Model() as m:
    eta = ...  # the definition of your (unconstrained) eta
    # soft sum-to-zero constraint, added to the model logp
    pm.Potential('soft_sum', pm.Normal.dist(0., s * n).logp(eta.sum()))
```
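For concreteness, a sketch of how this could slot into the model from the earlier post (assuming the ICAR class, Q and n defined there; the shape is illustrative):

```python
import theano
import pymc3 as pm

s = theano.shared(0.001)
with pm.Model() as m:
    tau = pm.Gamma('tau', alpha=0.5, beta=0.005)
    eta = ICAR('eta', tau=tau, Q=Q, shape=(n, 1))  # unconstrained improper prior
    # soft sum-to-zero constraint on eta, added to the model logp
    pm.Potential('soft_sum', pm.Normal.dist(0., s * n).logp(eta.sum()))
    # ... rest of the model (psi, d, prob, obs) as before
```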

Sorry, I forgot to include the code I used to generate the Q matrix. Here it is:

```python
import numpy as np

# Sparse random adjacency matrix, symmetrized, with an empty diagonal
A = np.random.choice([0, 1], size=n * n, p=[0.995, 0.005]).reshape(n, n)
A = np.tril(A).T + np.tril(A)
A[np.diag_indices_from(A)] = 0
# Q is the graph Laplacian: degree matrix minus adjacency
D = np.diag(A.sum(axis=1))
Q = D - A
Q = Q.astype(np.float64, copy=False)
```
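As a sanity check (a sketch), Q built this way is a graph Laplacian: its row sums are zero, and its rank is n minus the number of connected components of the graph, so it equals n - 1 only when the graph is connected:

```python
import numpy as np

assert np.allclose(Q.sum(axis=1), 0)          # Laplacian rows sum to zero
print('rank(Q) =', np.linalg.matrix_rank(Q))  # n - 1 for a connected graph
print('n - 1   =', n - 1)
```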

I looked into the link you posted and the discussion was very interesting. I incorporated the soft-sum constraint as you suggested, and it seems to have fixed my issue, at least for the simulated dataset I used.

When I tried it on a real dataset, it provided very different estimates compared to the ICAR model I built and the Restricted Spatial Regression model output (both of which share very similar posterior estimates). I will have to experiment with different soft-sum scales on the real data and see if the estimation can be improved.
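Since s is a theano shared variable, one way to experiment (a sketch; the scales below are illustrative, not recommendations) is to reset it and resample without rebuilding the model:

```python
# `s` is the theano shared variable and `m` the model from the snippet above
for scale in [0.0001, 0.001, 0.01]:
    s.set_value(scale)  # tighten or loosen the soft sum-to-zero constraint
    with m:
        trace = pm.sample(1000, tune=1000, cores=1)
    print(scale)
    print(pm.summary(trace, varnames=['beta', 'tau']))
```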

By the way, is there a way to run the sampler in parallel without running into this error?

```
BrokenProcessPool: A process in the executor was terminated abruptly while the future was running or pending.
```

Thank you so much for your help!


Just want to jump in and ask a question about the identifiability issue with including a normalization within the model block. In the book "Data Analysis Using Regression and Multilevel/Hierarchical Models", Chapters 19.4 and 19.5, they suggest adding "redundant parameters" to intentionally make hierarchical models non-identifiable, in order to speed up the convergence of the Gibbs sampler. And in this blog post (http://twiecki.github.io/blog/2017/02/08/bayesian-hierchical-non-centered/), a similar reparameterization trick was applied to solve the "funnel" problem in MCMC convergence. I'm a little fuzzy about when these reparameterization tricks will cause a problem and how to implement them properly. Thank you!
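For reference, the trick in that blog post is the non-centered parameterization, which looks roughly like this (a sketch with hypothetical names):

```python
import pymc3 as pm

with pm.Model():
    mu = pm.Normal('mu', mu=0., sd=10.)
    sigma = pm.HalfNormal('sigma', sd=1.)
    # Sample a standard-normal offset, then scale and shift it. This changes
    # the geometry the sampler sees, not the model itself, unlike an in-model
    # normalization such as eta - mean(eta).
    theta_offset = pm.Normal('theta_offset', mu=0., sd=1., shape=8)
    theta = pm.Deterministic('theta', mu + sigma * theta_offset)
```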

In Thomas' blog post the reparameterization is not a normalization. And I have not read the book, but speeding up a Gibbs sampler is quite different from speeding up HMC; it likely depends on different heuristics. Maybe you should open a new post to discuss this in more depth.

Hi,
I am trying to implement an ICAR model as well at the moment. Since it seems you have fixed the issue, would you mind pasting your working code as an example? I am convinced many other PyMC3 users would benefit greatly from it.
Thank you very much!