Poor sample results and slow sampling speed with a multivariate normal distribution

Hi! I’m estimating a longitudinal item response theory model, and sampling is very slow.
My model is as follows:

\theta_{tn} \sim MN(\mu_{\theta}, \Sigma_\theta)
logit(P_{tni})= \alpha_i\theta_{tn}+\beta_{i}
Y_{tni} \sim Bernoulli(P_{tni})

Y_{tni} is the response of student n (n = 1, …, N) to item i (i = 1, …, I) at time point t (t = 1, …, T).

To identify the model, I need to constrain the mean and variance at the first time point. So I constrain the first element of the mean vector to 0 and the first standard deviation to 1, like this:

\mu_\theta = (0,\mu_{\theta2},...,\mu_{\theta T})

I then decompose the covariance matrix into a diagonal standard deviation matrix S and a correlation matrix \Omega, and constrain the first diagonal element of S to 1:

\mathbf\Sigma = \mathbf S \mathbf\Omega \mathbf S
\mathbf S = \begin{pmatrix} 1 & & & \\ & \sigma_{\theta 2} & & \\ & & \ddots & \\ & & & \sigma_{\theta T} \end{pmatrix}
\mathbf\Omega = \begin{pmatrix} 1 & \rho_{12} & \cdots & \rho_{1T} \\ \rho_{12} & 1 & \cdots & \rho_{2T} \\ \vdots & \vdots & \ddots & \vdots \\ \rho_{1T} & \rho_{2T} & \cdots & 1 \end{pmatrix}
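To make the constraint concrete, here is a small NumPy sketch of how \Sigma is assembled from S and \Omega with the first standard deviation fixed to 1 (the parameter values here are made up for illustration, not taken from the model):

```python
import numpy as np

T = 3
# Hypothetical free parameters: sigma_theta2..sigma_thetaT and the
# upper-triangular correlations rho_12, rho_13, rho_23
sigma_free = np.array([1.5, 2.0])
rho = np.array([0.9, 0.81, 0.9])

# S: diagonal standard deviation matrix, first element constrained to 1
S = np.diag(np.concatenate([[1.0], sigma_free]))

# Omega: symmetric correlation matrix with unit diagonal
Omega = np.eye(T)
Omega[np.triu_indices(T, k=1)] = rho
Omega = Omega + np.triu(Omega, k=1).T

# Sigma = S Omega S; Sigma[0, 0] == 1 fixes the first time point's variance
Sigma = S @ Omega @ S
```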

The observed variable Y is generated by the following code:

import numpy as np

T = 5   # time points
N = 300 # students
I = 10  # items
theta_mean = np.zeros(T)
theta_cov = [[1.    , 0.9   , 0.81  , 0.729 , 0.6561],
             [0.9   , 1.    , 0.9   , 0.81  , 0.729 ],
             [0.81  , 0.9   , 1.    , 0.9   , 0.81  ],
             [0.729 , 0.81  , 0.9   , 1.    , 0.9   ],
             [0.6561, 0.729 , 0.81  , 0.9   , 1.    ]]
theta = np.random.multivariate_normal(theta_mean, theta_cov, size=N)  # (N, T)
beta = np.random.normal(0, 1, (1, 1, I))
alpha = np.random.normal(1, 0.15, (1, 1, I))
logit_p = alpha * theta.T.reshape(T, N, 1) + beta  # (T, N, I)
Y = np.random.binomial(1, np.exp(logit_p) / (1 + np.exp(logit_p)))

PyMC3 model code, using LKJCorr as the prior for the correlation matrix:

import pymc3 as pm
import theano.tensor as tt

triu_idx = np.triu_indices(T, k=1)
with pm.Model() as long_irt:
    # LKJCorr returns the flattened upper triangle of the correlation matrix
    corr_array = pm.LKJCorr("corr", n=T, eta=1.0)
    cor_tri = tt.set_subtensor(tt.zeros((T, T))[triu_idx], corr_array)
    corr_mat = cor_tri + tt.eye(T) + cor_tri.T
    # The standard deviations of theta, with the first one constrained to 1
    sigma = tt.concatenate([[1], pm.HalfNormal("s", 1, shape=(T - 1))])
    sigma_mat = tt.diag(sigma)
    cov = pm.Deterministic("cov", sigma_mat.dot(corr_mat).dot(sigma_mat))
    # Mean of the MvNormal, with theta_mu[0] constrained to 0
    theta_mu = tt.concatenate([[0], pm.Normal("theta_mu", 0, 10, shape=(T - 1))])

    theta = pm.MvNormal("theta", theta_mu, cov=cov, shape=(N, T))

    alpha_mu = pm.HalfNormal("alpha_mu", 5)
    beta_mu = pm.Normal("beta_mu", 0, 5)
    beta_sigma = pm.HalfNormal("beta_sigma", 5)
    alpha_sigma = pm.HalfNormal("alpha_sigma", 5)
    alpha = pm.Lognormal("alpha", alpha_mu, alpha_sigma, shape=(1, 1, I))
    beta = pm.Normal("beta", beta_mu, beta_sigma, shape=(1, 1, I))

    pm.Bernoulli("Y", logit_p=theta.T.reshape((T, N, 1)) * alpha + beta, observed=Y)

With this PyMC3 model, I got poor sampling results and very slow sampling.

But the model seems to work well when I set independent normal distributions as the prior for theta. I also noticed that this tutorial sets the prior that way: Fitting a Multidimensional Item-Response Theory Model in Stan

with pm.Model() as long_irt:
    theta = pm.Normal("theta",0,1,shape=(N,T))

    alpha_mu = pm.HalfNormal("alpha_mu",5)
    beta_mu = pm.Normal("beta_mu",0,5)
    beta_sigma = pm.HalfNormal("beta_sigma",5)
    alpha_sigma = pm.HalfNormal("alpha_sigma",5)
    alpha = pm.Lognormal("alpha",alpha_mu,alpha_sigma,shape=(1,1,I))
    beta = pm.Normal("beta",beta_mu,beta_sigma,shape=(1,1,I))
    
    pm.Bernoulli("Y",logit_p=theta.T.reshape((T,N,1))*alpha+beta,observed=Y)

However, I need to model the correlation of theta through the covariance structure. Why is MvNormal sampling so slow, and how can I get valid samples from the multivariate normal in this model?

I have a strong suspicion that this all stems from the use of a centered parameterization. What happens if, instead of using an MvNormal, you did something like:

# Cholesky factor of the full covariance, so that L L^T = Sigma
L_mat = tt.slinalg.cholesky(cov)
theta_offsets = pm.Normal('theta_offsets', 0, 1, shape=(T, N))
theta = pm.Deterministic('theta', theta_mu[:, None] + tt.dot(L_mat, theta_offsets))

which gives each column of theta (one student's length-T trajectory) covariance LL^T = \Sigma and row means \theta_\mu.
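A quick way to convince yourself of this claim, outside of PyMC3: draw standard-normal offsets and check that the transformed samples recover the target mean and covariance. A NumPy-only sketch with made-up values:

```python
import numpy as np

rng = np.random.default_rng(0)
T, N = 3, 200_000

# Target covariance (0.9^|i-j| structure, as in the question) and mean
Sigma = np.array([[1.0, 0.9, 0.81],
                  [0.9, 1.0, 0.9],
                  [0.81, 0.9, 1.0]])
mu = np.array([0.0, 0.5, 1.0])

L = np.linalg.cholesky(Sigma)           # L @ L.T == Sigma
offsets = rng.standard_normal((T, N))   # non-centered: z ~ N(0, I)
theta = mu[:, None] + L @ offsets       # columns ~ MvNormal(mu, Sigma)

# Empirical moments should be close to the targets
emp_mean = theta.mean(axis=1)
emp_cov = np.cov(theta)
```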


Thanks, this helps me a lot! I never thought a multivariate distribution could use a non-centered parameterization.
I would like to learn more about non-centered parameterization; is there any recommended reference material?

The Stan user guide includes an excellent discussion of reparameterization, including centered vs. non-centered. This blog post by Michael Betancourt might also interest you; it touches on reparameterization in a broader discussion of model identification, sampler degeneracy, and what to do when NUTS is slow or diverging.
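For intuition, here is the univariate version of the trick in plain NumPy (arbitrary values): instead of sampling x directly from N(mu, sigma) (centered), sample a standard-normal offset and deterministically shift and scale it (non-centered). Both produce the same distribution:

```python
import numpy as np

rng = np.random.default_rng(1)
mu, sigma, n = 2.0, 0.5, 100_000

# Centered: sample x directly from N(mu, sigma)
x_centered = rng.normal(mu, sigma, size=n)

# Non-centered: sample z ~ N(0, 1), then shift and scale.
# In a hierarchical model, this removes the funnel-shaped dependence
# between x and its hyperparameters that slows NUTS down.
z = rng.standard_normal(n)
x_noncentered = mu + sigma * z
```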

For PyMC specifically, the multilevel modeling case study notebook has examples of centered and non-centered implementations for both the univariate and multivariate cases. I got a lot out of this notebook, but it’s a bit hidden!
