Mixture of multivariate Gaussian distributions

Did you see this error at the first sample? If so, setting init='adapt_diag' should help. The jitter sometimes makes the starting point go outside the support; we are currently fixing this.

Thank you again. It didn't appear at the first sample, and I tried:

with model:
    trace = pm.sample(2000, init='adapt_diag', njobs=1)

but got the same error:

NUTS: [beta_logodds__, rho_interval__, sigma_w_interval__, M_log__]
0%| | 0/2500 [00:00<?, ?it/s]
... (some content omitted) ...
File "C:\Users\NPhard\Anaconda3\lib\site-packages\pymc3\step_methods\hmc\base_hmc.py", line 117, in astep
    'might be misspecified.' % start.energy)
ValueError: Bad initial energy: inf. The model might be misspecified.

You are right. Take a closer look at the model and follow the diagnostics (see our FAQ):

for var in model.basic_RVs:
    # log-probability of each free RV evaluated at the test point
    print(var.name, var.logp(model.test_point))

The obs returns inf. The second thing is to check the inputs to obs:

w.tag.test_value          # the test value fed to the Mixture weights
np.sum(w.tag.test_value)  # should be 1.0 if the weights are normalized

It turns out the weights do not sum to one. A quick workaround is to normalize the weights before passing them into Mixture:

    w0 = stick_breaking(beta)
    w = pm.Deterministic('w', w0 / w0.sum())

Otherwise you can use a Dirichlet distribution for w, which uses stick-breaking internally.
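For example, a minimal sketch of the Dirichlet version (the symmetric concentration np.ones(K) is my assumption, not from your model):

    import numpy as np
    import pymc3 as pm

    K = 6  # number of mixture components, as above

    with pm.Model():
        # Dirichlet weights sum to one by construction,
        # so no manual normalization is needed
        w = pm.Dirichlet('w', a=np.ones(K))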


Thank you very much! It works! :tada::grinning:
But setting njobs=1 will affect the convergence and make my model's performance bad. Is there a better way to do it?

You can still sample multiple chains with njobs=1; the chains will just be sampled sequentially.
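For example (a sketch; the chains keyword assumes a reasonably recent PyMC3 version):

    with model:
        # four chains, drawn one after another in a single job
        trace = pm.sample(2000, chains=4, njobs=1)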

The convergence problem is more related to the difficulty of doing inference on mixture models. You can search this forum; there are quite a few posts discussing the issues with mixture models. The key recommended reading is Identifying Bayesian Mixture Models (a Stan case study); you can find a PyMC3 port here.

Multivariate mixtures are even more difficult to do inference on, as it is often difficult to put constraints on the mixture components. A WIP notebook on multivariate mixtures can be found on my Gist.


I tried to embed my mixture model above into a hierarchical model. The specific code is as follows:

import numpy as np
import pymc3 as pm
from theano import tensor as tt

K = 6        # number of mixture components
N = 200      # number of observed individuals
n = 5        # each individual is observed 5 times

e = np.ones(n)
I = np.eye(n)
mu0 = np.zeros(n)

# simulate observation data: variance 10.0 on the diagonal,
# covariance 7.0 off the diagonal (compound symmetry)
C1 = np.full((n, n), 7.0)
np.fill_diagonal(C1, 10.0)

dataSet1 = np.random.multivariate_normal(mu0, C1, size=N)

# observation times (all individuals are observed at the same times)
time_observed = [-4.0, -1.0, 0.0, 1.5, 2.0]
time = np.array(time_observed)

# H0[i, j] = |t_j - t_i|
H0 = np.abs(time[None, :] - time[:, None])

def stick_breaking(beta):
    # w_k = beta_k * prod_{j<k} (1 - beta_j)
    portion_remaining = tt.concatenate(
        [[1], tt.extra_ops.cumprod(1 - beta)[:-1]])
    return beta * portion_remaining


with pm.Model() as model:
    M = pm.Gamma('M', 1., 1.)

    sigma_w = pm.Uniform('sigma_w', 0.0, 1.5, shape=K)
    rho = pm.Uniform('rho', 0.0, 0.1, shape=K)

    beta = pm.Beta('beta', 1., M, shape=K)
    w0 = stick_breaking(beta)
    w = pm.Deterministic('w', w0 / w0.sum())

    compdist = []
    for i in range(K):
        compdist.append(pm.MvNormal.dist(
            mu=mu0, cov=sigma_w[i]**2 * tt.pow(rho[i], H0)))

    # the n-dimensional latent omega follows the mixture distribution
    omega = pm.Mixture('omega', w, compdist, shape=N)

    mu = pm.Normal('mu', mu=0.0, sd=1.0)
    sigma = pm.Uniform('sigma', 0.0, 1.0)
    xi = pm.InverseGamma('xi', 1.0, 1.0)
    b = pm.Normal('b', mu=0.0, sd=tt.sqrt(xi), shape=N)

    mu2 = []
    for i in range(N):
        mu2.append((mu + b[i]) * e + omega[i])

    # y_i ~ N((mu + b_i) * e + omega_i, sigma**2 * I)
    y = pm.MvNormal('y', mu=mu2, cov=sigma**2 * I, observed=dataSet1)

with model:
    trace = pm.sample(1000, njobs=1)

but I get the following error:

ValueError: operands could not be broadcast together with shapes (200,) (5,)

Shape errors are the most challenging thing in the beginning.
Since the code above works fine with the observed shape, you can similarly do:

omega = pm.Mixture('omega', w, compdist, shape=dataSet1.shape)

Also, using a for-loop to create mu2 by appending is inefficient; using broadcasting or matrix operations is usually much better:

# notice the shape change here for the multiplication to work
b = pm.Normal('b', mu=0.0, sd=tt.sqrt(xi), shape=(N, 1))

mu2 = (mu + b) * e + omega

Finally, since the likelihood is an MvNormal with a (scaled) identity covariance matrix, replacing it with a Normal will be much more efficient:

# y_i ~ N((mu + b_i) * e + omega_i, sigma**2 * I)
# sd is the standard deviation, so cov = sigma**2 * I corresponds to sd=sigma
y = pm.Normal('y', mu=mu2, sd=sigma, observed=dataSet1)

I expect you will have quite a lot of difficulty doing inference on this model. Using a mixture distribution as a latent RV is challenging, as the model is likely unidentifiable.
There are a few places you should consider improving:
1. Priors: I guess you are trying to replicate another model, but a Uniform prior (or an inverse-Gamma, for that matter) is not a good choice for sigma.
2. In mu2 = (mu + b) * e + omega, both mu and b are Gaussian, so you should consider combining the two into one Gaussian; see the sketch below.
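For point 2, a minimal sketch of one way to do this (my own illustration, reusing the shapes of your model): give b a hierarchical mean mu, so that a single Gaussian a_i ~ N(mu, sqrt(xi)) plays the role of mu + b_i:

    import numpy as np
    import pymc3 as pm
    from theano import tensor as tt

    N, n = 200, 5
    e = np.ones(n)

    with pm.Model():
        mu = pm.Normal('mu', mu=0.0, sd=1.0)
        xi = pm.InverseGamma('xi', 1.0, 1.0)
        # one Gaussian per subject, centered on the global mean,
        # instead of adding two separate Gaussians
        a = pm.Normal('a', mu=mu, sd=tt.sqrt(xi), shape=(N, 1))
        # downstream: mu2 = a * e + omega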

Thank you very much for your help and advice. My theoretical derivation shows that using such priors and latent variables makes the Bayesian model more accurate, so I used them, and I will improve them to get the code working.
In addition, to adapt the model, I modified the code for the variable omega:

with pm.Model() as model:
    ...
    omega = pm.Mixture('omega', w, compdist)
    b = pm.Normal('b', mu=0.0, sd=tt.sqrt(xi), shape=(N, 1))
    mu2 = (mu + b) * e + omega
    y = pm.Normal('y', mu=mu2, sd=sigma, observed=dataSet1)

In other words, y_i ~ N(mu=(mu + b_i) * e + omega, cov=sigma**2 * I). But why do I get the following error:

TypeError: For compute_test_value, one input test value does not have the requested type.
The error when converting the test value to that variable type:
Wrong number of dimensions: expected 0, got 1 with shape (5,).

I think it makes more sense to do this as well, i.e., having one latent vector for all subjects reduces the number of latent parameters and also helps with inference.
However, you still need to be careful with the shapes (this part is still not very automatic, and we are working on it). Try omega = pm.Mixture('omega', w, compdist, shape=(1, n)).

You are so smart!
When writing code, I am always confused by the shapes of variables, which are so different from the expressions in pure theory. Thank you for checking and answering my doubts so patiently. :grin:


I have a question about how to improve model sampling speed. I sampled the model I mentioned above, and the speed is as follows:

Auto-assigning NUTS sampler…
Initializing NUTS using jitter+adapt_diag…
Sequential sampling (2 chains in 1 job)
NUTS: [b, xi_log__, sigma_interval__, mu, omega, beta_logodds__, rho_interval__, sigma_w_interval__, M_log__]
1%| | 11/1500 [00:35<1:20:52, 3.26s/it]

   3.26s/it

It's too slow!!! Where is the problem? Also, the effective sample size is less than 25%.

The problem is that you have too many free parameters and too little information from your data/priors. As I said above, mixture models are difficult to identify, so you need to be very careful with the model assumptions.

You should start with:

0. Use simulated data: simulate datasets with known parameters.
1. Do not use a mixture at first: start with a single MvNormal latent (even an MvNormal with a diagonal cov, i.e., a Normal distribution); see the sketch after this list.
2. Fix some of the parameters: for example, fix sigma to a constant and see whether convergence improves.
3. Improve your priors. See https://github.com/stan-dev/stan/wiki/Prior-Choice-Recommendations
4. Profile your model. See http://docs.pymc.io/notebooks/profiling.html
5. Use other inference methods. This is not recommended, but it can sometimes give you insight, and if you know what you are doing you can design custom inference. However, it is useful to keep The Folk Theorem of Statistical Computing in mind: if your model is slow, it is most likely because something is wrong with it.
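As a concrete starting point for items 0-2, a rough sketch (my own simplification; it reuses mu0, H0, e, N, n and dataSet1 from the code above, and the priors here are placeholders):

    import pymc3 as pm
    from theano import tensor as tt

    with pm.Model() as simple_model:
        # item 1: a single MvNormal latent instead of the mixture
        sigma_w = pm.HalfNormal('sigma_w', 1.)
        rho = pm.Beta('rho', 2., 2.)  # keeps rho in (0, 1)
        omega = pm.MvNormal('omega', mu=mu0,
                            cov=sigma_w**2 * tt.pow(rho, H0),
                            shape=(1, n))

        mu = pm.Normal('mu', mu=0.0, sd=1.0)
        b = pm.Normal('b', mu=0.0, sd=1.0, shape=(N, 1))
        mu2 = (mu + b) * e + omega

        # item 2: sigma fixed to a constant
        y = pm.Normal('y', mu=mu2, sd=1.0, observed=dataSet1)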

I tried to avoid using Uniform priors and improved them. The specific changes are as follows:

with pm.Model() as model:

    M = pm.Gamma('M', 1., 1.)

    # column 0 plays the role of sigma_w, column 1 the role of rho
    sigma_rho = pm.Normal('sigma_w', mu=np.array([2.0, 0.5]), sd=0.5, shape=(K, 2))
    # rho = pm.Uniform('rho', 0.0, 1.0, shape=K)

    w = pm.Dirichlet('w', np.ones(K))

    compdist = []
    for i in range(K):
        compdist.append(pm.MvNormal.dist(
            mu=mu0, cov=sigma_rho[i, 0]**2 * tt.pow(sigma_rho[i, 1], H0)))

    omega = pm.Mixture('omega', w, compdist, shape=(1, n))

    mu = pm.Normal('mu', mu=0.0, sd=1.0)
    xi = pm.InverseGamma('xi', 1.0, 1.0)
    b = pm.Normal('b', mu=0.0, sd=tt.sqrt(xi), shape=(N, 1))

    mu2 = (mu + b) * e + omega

    y = pm.Normal('y', mu=mu2, sd=2.0, observed=dataSet1)

with model:
    trace = pm.sample(2000, njobs=1)

I ran into the following problem: the first chain was not sampled, the second chain quickly stopped sampling, and an error occurred:

Auto-assigning NUTS sampler...
Initializing NUTS using jitter+adapt_diag...
Sequential sampling (2 chains in 1 job)
NUTS: [b, xi_log__, mu, omega, w_stickbreaking__, sigma_w, M_log__]
  0%|          | 0/2500 [00:00<?, ?it/s]  C:\Users\Administrator\Anaconda3\lib\site-packages\numpy\core\numeric.py:492: ComplexWarning: Casting complex values to real discards the imaginary part
  return array(a, dtype, copy=False, order=order)
  4%|▍         | 103/2500 [00:19<07:44,  5.16it/s]
Traceback (most recent call last):
......  ... ..
ValueError: Mass matrix contains zeros on the diagonal. Some derivatives might always be zero.

What's wrong with it? Can you give me some specific suggestions for revision?

There is some discussion of the mass-matrix-zeros problem here: Model diagnostics for "Mass matrix contains zeros on the diagonal"

In short, you have a numerical overflow problem; likely some parameters are getting too large. I would try changing the prior for xi (try xi = pm.HalfNormal('xi', 1.) and sd=xi in b) or fixing xi to a constant.
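That is, something like this inside the model (a sketch; N as in your code):

    # HalfNormal keeps xi positive with a lighter tail than InverseGamma
    xi = pm.HalfNormal('xi', 1.)
    b = pm.Normal('b', mu=0.0, sd=xi, shape=(N, 1))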

Ehh, setting xi = pm.HalfNormal('xi', 1.) and sd=xi in b, or fixing xi to a constant, gives the same error as above.

Well, check the other parts of your model, e.g., sigma_rho[i, 0]**2 * tt.pow(sigma_rho[i, 1], H0). I would simulate sigma_rho in NumPy and plug it into cov = sigma_rho[i, 0]**2 * np.power(sigma_rho[i, 1], H0) to check how large or small cov can get.
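For instance, a quick NumPy check along those lines (a sketch; H0 is rebuilt from the observation times above):

    import numpy as np

    time = np.array([-4.0, -1.0, 0.0, 1.5, 2.0])
    H0 = np.abs(time[None, :] - time[:, None])  # |t_j - t_i|

    # draw (sigma_w, rho) pairs from the Normal prior above and inspect cov
    draws = np.random.normal(loc=[2.0, 0.5], scale=0.5, size=(1000, 2))
    for sigma, rho in draws[:10]:
        cov = sigma**2 * np.power(rho, H0)
        # a negative rho raised to a non-integer power is not real-valued
        # (NaN in NumPy); this is likely the source of the ComplexWarning
        # in the log above
        print(rho, np.nanmin(cov), np.nanmax(cov))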

Otherwise, try reparameterizing the cov using its Cholesky factor, which is usually more numerically stable.
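For illustration, one way this could look inside the model above (a sketch: theano.tensor.slinalg.cholesky factors the matrix numerically at every evaluation; it is not the closed-form factor discussed below):

    from theano.tensor import slinalg

    compdist = []
    for i in range(K):
        cov_i = sigma_w[i]**2 * tt.pow(rho[i], H0)
        # MvNormal also accepts the Cholesky factor directly via chol=
        compdist.append(pm.MvNormal.dist(mu=mu0, chol=slinalg.cholesky(cov_i)))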

When the covariance is known, how do I use the Cholesky decomposition to get the lower triangular matrix? In my model, the covariance is as follows:

with pm.Model() as model:
    ...
    compdist = []
    for i in range(K):
        compdist.append(pm.MvNormal.dist(
            mu=mu0, cov=sigma_w[i]**2 * tt.pow(rho[i], H0)))

How can I specify the Cholesky factor of the covariance instead of cov, to make it more numerically stable?

Actually, are you sure you want to scale the whole cov matrix with sigma_w**2, and not just the diagonal?

I think the key is to find the Cholesky factor or the eigendecomposition of \rho^{H_0} analytically, but I don't know how either (disclaimer: not a mathematician).

Oh, I'm sorry, I asked a bad question. I didn't know that Theano doesn't have a closed-form method for this kind of matrix decomposition. :joy: