Poor sample results and slow sampling speed with a multivariate normal distribution

Hi! I’m estimating a longitudinal item response theory model, and sampling is very slow.
My model is as follows:

\theta_{tn} \sim MN(\mu_{\theta}, \Sigma_\theta)
logit(P_{tni})= \alpha_i\theta_{tn}+\beta_{i}
Y_{tni} \sim Bernoulli(P_{tni})

Y_{tni} is the response of student n (n = 1, …, N) to item i (i = 1, …, I) at time point t (t = 1, …, T).

To identify the model, I need to constrain the mean and variance at the first time point. So I constrain the first element of the mean vector to 0 and the first standard deviation to 1, like this:

\mu_\theta = (0,\mu_{\theta2},...,\mu_{\theta T})

I then decompose the covariance matrix into a diagonal standard deviation matrix S and a correlation matrix \Omega, and constrain the first diagonal element of S to 1:

\mathbf\Sigma = \mathbf S \mathbf\Omega \mathbf S
\mathbf S = \begin{pmatrix} 1 & & & \\ & \sigma_{\theta 2} & & \\ & & \ddots & \\ & & & \sigma_{\theta T} \end{pmatrix}
\mathbf\Omega = \begin{pmatrix} 1 & \rho_{12} & \cdots & \rho_{1T} \\ \rho_{12} & 1 & \cdots & \rho_{2T} \\ \vdots & \vdots & \ddots & \vdots \\ \rho_{1T} & \rho_{2T} & \cdots & 1 \end{pmatrix}
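To make the constraint concrete, here is a small NumPy sketch of how \Sigma is assembled from S and \Omega with the first standard deviation fixed to 1 (the parameter values here are made up for illustration, not taken from the model):

```python
import numpy as np

T = 3
# Hypothetical free parameters: sigma_theta2..sigma_thetaT and the
# upper-triangular correlations rho_12, rho_13, rho_23
sigma_free = np.array([1.5, 2.0])
rho = np.array([0.9, 0.81, 0.9])

# S: diagonal standard deviation matrix, first element constrained to 1
S = np.diag(np.concatenate([[1.0], sigma_free]))

# Omega: symmetric correlation matrix with unit diagonal
Omega = np.eye(T)
Omega[np.triu_indices(T, k=1)] = rho
Omega = Omega + np.triu(Omega, k=1).T

# Sigma = S Omega S; Sigma[0, 0] == 1 fixes the first time point's variance
Sigma = S @ Omega @ S
```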

The observed variable Y is generated by the following code:

import numpy as np

T = 5   # time points
N = 300 # students
I = 10  # items
theta_mean = np.zeros(T)
theta_cov = [[1.    , 0.9   , 0.81  , 0.729 , 0.6561],
             [0.9   , 1.    , 0.9   , 0.81  , 0.729 ],
             [0.81  , 0.9   , 1.    , 0.9   , 0.81  ],
             [0.729 , 0.81  , 0.9   , 1.    , 0.9   ],
             [0.6561, 0.729 , 0.81  , 0.9   , 1.    ]]
theta = np.random.multivariate_normal(theta_mean, theta_cov, size=N)  # (N, T)
beta = np.random.normal(0, 1, (1, 1, I))
alpha = np.random.normal(1, 0.15, (1, 1, I))
logit_p = alpha * theta.T.reshape(T, N, 1) + beta  # (T, N, I)
Y = np.random.binomial(1, np.exp(logit_p) / (1 + np.exp(logit_p)))

PyMC3 model code, using LKJCorr as the prior for the correlation matrix:

import pymc3 as pm
import theano.tensor as tt

triu_idx = np.triu_indices(T, k=1)
with pm.Model() as long_irt:
    # LKJCorr returns the flattened upper triangle of the correlation matrix
    corr_array = pm.LKJCorr("corr", n=T, eta=1.0)
    cor_tri = tt.set_subtensor(tt.zeros((T, T))[triu_idx], corr_array)
    corr_mat = cor_tri + tt.eye(T) + cor_tri.T
    # The standard deviations of theta, with the first one constrained to 1
    sigma = tt.concatenate([[1], pm.HalfNormal("s", 1, shape=(T - 1))])
    sigma_mat = tt.diag(sigma)
    cov = pm.Deterministic("cov", sigma_mat.dot(corr_mat).dot(sigma_mat))
    # Mean of the MvNormal, with theta_mu[0] constrained to 0
    theta_mu = tt.concatenate([[0], pm.Normal("theta_mu", 0, 10, shape=(T - 1))])

    theta = pm.MvNormal("theta", theta_mu, cov=cov, shape=(N, T))

    alpha_mu = pm.HalfNormal("alpha_mu", 5)
    beta_mu = pm.Normal("beta_mu", 0, 5)
    beta_sigma = pm.HalfNormal("beta_sigma", 5)
    alpha_sigma = pm.HalfNormal("alpha_sigma", 5)
    alpha = pm.Lognormal("alpha", alpha_mu, alpha_sigma, shape=(1, 1, I))
    beta = pm.Normal("beta", beta_mu, beta_sigma, shape=(1, 1, I))

    pm.Bernoulli("Y", logit_p=theta.T.reshape((T, N, 1)) * alpha + beta, observed=Y)

With this PyMC3 model, I got poor sampling results and very slow sampling.

But the model seems to work well when I set independent normal distributions as the prior for theta. I also noticed that this tutorial sets the prior that way: Fitting a Multidimensional Item-Response Theory Model in Stan

with pm.Model() as long_irt:
    theta = pm.Normal("theta",0,1,shape=(N,T))

    alpha_mu = pm.HalfNormal("alpha_mu",5)
    beta_mu = pm.Normal("beta_mu",0,5)
    beta_sigma = pm.HalfNormal("beta_sigma",5)
    alpha_sigma = pm.HalfNormal("alpha_sigma",5)
    alpha = pm.Lognormal("alpha",alpha_mu,alpha_sigma,shape=(1,1,I))
    beta = pm.Normal("beta",beta_mu,beta_sigma,shape=(1,1,I))
    
    pm.Bernoulli("Y",logit_p=theta.T.reshape((T,N,1))*alpha+beta,observed=Y)

However, I need to model the correlation of theta through the covariance structure. Why is MvNormal sampling so slow, and how can I get valid samples from the multivariate normal in this model?

I have a strong suspicion that this all stems from the use of a centered parameterization. What happens if, instead of using an MvNormal, you did something like:

# Cholesky factor of the full covariance, so that L L^T = Sigma
L_mat = tt.slinalg.cholesky(cov)
theta_offsets = pm.Normal('theta_offsets', 0, 1, shape=(T, N))
theta = pm.Deterministic('theta', theta_mu[:, None] + tt.dot(L_mat, theta_offsets))

which gives each column of theta (one student's length-T trajectory) covariance LL^T = \Sigma and row means \theta_\mu.
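A quick way to convince yourself of this claim, outside of PyMC3: draw standard-normal offsets and check that the transformed samples recover the target mean and covariance. A NumPy-only sketch with made-up values:

```python
import numpy as np

rng = np.random.default_rng(0)
T, N = 3, 200_000

# Target covariance (0.9^|i-j| structure, as in the question) and mean
Sigma = np.array([[1.0, 0.9, 0.81],
                  [0.9, 1.0, 0.9],
                  [0.81, 0.9, 1.0]])
mu = np.array([0.0, 0.5, 1.0])

L = np.linalg.cholesky(Sigma)           # L @ L.T == Sigma
offsets = rng.standard_normal((T, N))   # non-centered: z ~ N(0, I)
theta = mu[:, None] + L @ offsets       # columns ~ MvNormal(mu, Sigma)

# Empirical moments should be close to the targets
emp_mean = theta.mean(axis=1)
emp_cov = np.cov(theta)
```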


Thanks, this helps me a lot! I never thought a multivariate distribution could use a non-centered parameterization.
I would like to learn more about non-centered parameterization; is there any recommended reference material?

The Stan user guide includes an excellent discussion of reparameterization, including centered vs. non-centered. This blog post by Michael Betancourt might also interest you; it touches on reparameterization in a broader discussion of model identification, sampler degeneracy, and what to do when NUTS is slow or diverging.
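For intuition, here is the univariate version of the trick in plain NumPy (arbitrary values): instead of sampling x directly from N(mu, sigma) (centered), sample a standard-normal offset and deterministically shift and scale it (non-centered). Both produce the same distribution:

```python
import numpy as np

rng = np.random.default_rng(1)
mu, sigma, n = 2.0, 0.5, 100_000

# Centered: sample x directly from N(mu, sigma)
x_centered = rng.normal(mu, sigma, size=n)

# Non-centered: sample z ~ N(0, 1), then shift and scale.
# In a hierarchical model, this removes the funnel-shaped dependence
# between x and its hyperparameters that slows NUTS down.
z = rng.standard_normal(n)
x_noncentered = mu + sigma * z
```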

For PyMC specifically, the multilevel modeling case study notebook has examples of centered and non-centered implementations for both the univariate and multivariate cases. I got a lot out of this notebook, but it’s a bit hidden!
