Mixed multivariate Gauss distribution

Cabbagee · March 27, 2018, 2:47am

Hello,everyone.
I’m trying to carry out a sampling of the multiple Gauss mixed model，The distribution of variables is as follows(Mixed multivariate Gauss distribution)：

y_i=∑(k=1)^K{ω_kN(O,σ_k^2 H(ρ_k ))} ----multivariate Gauss distribution mixture_
ω_k=β_k∏(j=1)^(k-1){1-β_j } ----weights,stick-breaking process_
β_i iid Beta(1,M)
σ_k iid U(0,1.5) prior
ρ_k iid U(0,1) prior

I did the following the code but it is not working：

import numpy as np
import numpy as np
import pymc3 as pm
from theano import tensor as tt

K = 6        # the number of components
N = 200  # the number of observed individuals
n = 9  # each individual was observed 9 times

mu0 = np.linspace(0.0, 0.0, num=n)

# simulate observation data

C1 = np.zeros((n, n))
for i in range(0, n):
    for j in range(0, n):
        if i == j:
            C1[i, j] = 10.0
        else:
            C1[i, j] = 7.0

dataSet1 = np.random.multivariate_normal(mu0, C1, size=N)

# observation time(all individuals are observed at the same times)

time_obseved = [-5.0, -4.5, -3.0, -2.5, -1.0, 0.0, 1.5, 2.5, 3.0, 4.0, 5.5]
time = np.array(time_obseved)

#I missed it before, and I'm mending it now
def H(rho):
    H=np.zeros((n,n))
    for i in range(0,n):
        for j in range(0,n):
            H[i,j]=np.power(rho,np.abs(time[j]-time[i]))
    return H


# stick-breaking process
def stick_breaking(beta):
    portion_remaining = tt.concatenate(
        [[1], tt.extra_ops.cumprod(1 - beta)[:-1]])

    return beta * portion_remaining


with pm.Model() as model:
    M = pm.Gamma('M', 1., 1.)
    sigma_w = pm.Uniform('sigma_w', 0.0, 1.5, shape=K)
    rho = pm.Uniform('rho', 0.0, 0.1, shape=K)

    beta = pm.Beta('beta', 1., M, shape=K)
    w = pm.Deterministic('w', stick_breaking(beta))

    obs = pm.Mixture('obs', w, pm.MvNormal.dist(
        mu=mu0, cov=sigma_w**2 * H(rho)), observed=dataSet1)

I get the following error:

    runfile('F:/spyder/Mixed multivariate Gauss distribution.py')
    WARNING (theano.configdefaults): g++ not available, if using conda: `conda install m2w64-toolchain`
    C:\Users\Administrator\Anaconda3\lib\site-packages\theano\configdefaults.py:560: UserWarning: DeprecationWarning: there is no c++ compiler.This is deprecated and with Theano 0.11 a c++ compiler will be mandatory
      warnings.warn("DeprecationWarning: there is no c++ compiler."
    WARNING (theano.configdefaults): g++ not detected ! Theano will be unable to execute optimized C-implementations (for both CPU and GPU) and will default to Python implementations. Performance will be severely degraded. To remove this warning, set Theano flags cxx to an empty string.
    WARNING (theano.tensor.blas): Using NumPy C-API based implementation for BLAS functions.
    C:\Users\Administrator\Anaconda3\lib\site-packages\h5py\__init__.py:36: FutureWarning: Conversion of the second argument of issubdtype from `float` to `np.floating` is deprecated. In future, it will be treated as `np.float64 == np.dtype(float).type`.
    from ._conv import register_converters as _register_converters
    Traceback (most recent call last):

    File "<ipython-input-1-a62ff05957e7>", line 1, in <module>
    runfile('F:/spyder/Mixed multivariate Gauss distribution.py')

    File "C:\Users\Administrator\Anaconda3\lib\site-packages\spyder\utils\site\sitecustomize.py", line 705, in runfile
    execfile(filename, namespace)

    File "C:\Users\Administrator\Anaconda3\lib\site-packages\spyder\utils\site\sitecustomize.py", line 102, in execfile
    exec(compile(f.read(), filename, 'exec'), namespace)

    File "F:/spyder/Mixed multivariate Gauss distribution.py", line 75, in <module>
    obs=pm.Mixture('obs',w,pm.MvNormal.dist(mu=mu0,cov=sigma_w**2*H(rho)),observed=dataSet1)

    File "F:/spyder/Mixed multivariate Gauss distribution.py", line 54, in H
    H[i,j]=np.power(rho,np.abs(time[j]-time[i]))

    ValueError: setting an array element with a sequence.

How could I make this work?

junpenglao · March 27, 2018, 5:06am

There seems to be some information missing: what is the H in H(rho)?

Cabbagee · March 27, 2018, 5:25am

Oh, I missed it.

def H(rho):
    H=np.zeros((n,n))
    for i in range(0,n):
        for j in range(0,n):
             H[i,j]=np.power(rho,np.abs(time[j]-time[i]))
    return H

It is used to construct covariance matrix of mixed components.

junpenglao · March 27, 2018, 6:05am

OK I see, and this is where the error comes from. You need to rewrite the function H into a theano function. Something like below should work:

# instead of computing H(rho), do most of the computation outside of theano
H0 = np.zeros((n, n))
for i in range(0, n):
    for j in range(0, n):
        H0[i, j] = np.abs(time[j] - time[i])

with pm.Model() as model:
    M = pm.Gamma('M', 1., 1.)
    sigma_w = pm.Uniform('sigma_w', 0.0, 1.5, shape=K)
    rho = pm.Uniform('rho', 0.0, 0.1, shape=K)

    beta = pm.Beta('beta', 1., M, shape=K)
    w = pm.Deterministic('w', stick_breaking(beta))

    # MvNormal only accept one cov at a time, you need to construct 
    # each component one by one
    compdist = []
    for i in range(K):
        compdist.append(pm.MvNormal.dist(
            mu=mu0, cov=sigma_w[i]**2 * tt.pow(rho[i], H0)))

    obs = pm.Mixture('obs', w, compdist, observed=dataSet1)

You can find a similar example discussed in this post: Properly sampling mixture models

Cabbagee · March 27, 2018, 6:30am

Thank you very much for answering my doubts，I just tried the modified code, and there was such a mistake,It’s related to the dimension I set up

runfile('F:/spyder/Mixed multivariate Gauss distribution.py')
Traceback (most recent call last):

File "<ipython-input-2-a62ff05957e7>", line 1, in <module>
runfile('F:/spyder/Mixed multivariate Gauss distribution.py')

File "C:\Users\Administrator\Anaconda3\lib\site-packages\spyder\utils\site\sitecustomize.py", line 705, in runfile
execfile(filename, namespace)

File "C:\Users\Administrator\Anaconda3\lib\site-packages\spyder\utils\site\sitecustomize.py", line 102, in execfile
exec(compile(f.read(), filename, 'exec'), namespace)

File "F:/spyder/Mixed multivariate Gauss distribution.py", line 75, in <module>
obs = pm.Mixture('obs', w, compdist, observed=dataSet1)

File "C:\Users\Administrator\Anaconda3\lib\site-packages\pymc3\distributions\distribution.py", line 36, in __new__
dist = cls.dist(*args, **kwargs)

File "C:\Users\Administrator\Anaconda3\lib\site-packages\pymc3\distributions\distribution.py", line 47, in dist
dist.__init__(*args, **kwargs)

File "C:\Users\Administrator\Anaconda3\lib\site-packages\pymc3\distributions\mixture.py", line 93, in __init__
comp_mode_logps = self.logp(comp_modes)
File "C:\Users\Administrator\Anaconda3\lib\site-packages\pymc3\distributions\mixture.py", line 141, in logp
return bound(logsumexp(tt.log(w) + self._comp_logp(value), axis=-1).sum(),
          
File "C:\Users\Administrator\Anaconda3\lib\site-packages\pymc3\distributions\mixture.py", line 112, in _comp_logp
return tt.stack([comp_dist.logp(value) for comp_dist in comp_dists],

File "C:\Users\Administrator\Anaconda3\lib\site-packages\pymc3\distributions\mixture.py", line 112, in <listcomp>
return tt.stack([comp_dist.logp(value) for comp_dist in comp_dists],

File "C:\Users\Administrator\Anaconda3\lib\site-packages\pymc3\distributions\multivariate.py", line 270, in logp
quaddist, logdet, ok = self._quaddist(value)

File "C:\Users\Administrator\Anaconda3\lib\site-packages\pymc3\distributions\multivariate.py", line 89, in _quaddist
delta = value - mu

File "C:\Users\Administrator\Anaconda3\lib\site-packages\theano\tensor\var.py", line 147, in __sub__
return theano.tensor.basic.sub(self, other)

File "C:\Users\Administrator\Anaconda3\lib\site-packages\theano\gof\op.py", line 674, in __call__
required = thunk()

File "C:\Users\Administrator\Anaconda3\lib\site-packages\theano\gof\op.py", line 892, in rval
r = p(n, [x[0] for x in i], o)

File "C:\Users\Administrator\Anaconda3\lib\site-packages\theano\tensor\elemwise.py", line 790, in perform
variables = ufunc(*ufunc_args, **ufunc_kwargs)

ValueError: operands could not be broadcast together with shapes (9,6) (1,9)

I want to realize multiple observations of multiple individuals，These individuals are independent, but their distribution is not the same,and each individual, observed several times.

junpenglao · March 27, 2018, 6:38am

In this case you should build separate models for each individual.

Cabbagee · March 27, 2018, 7:01am

Thank you for your advice，But I’m still confused about why the revised code is wrong with the dimensions,It is a multicomponent Gauss mixture
why not work?

import numpy as np
import pymc3 as pm
from theano import tensor as tt

K=6       # the number of mixed components
N=200     # number of samples
n=9       #each sample point is multiple，9 dimensional
mu0=np.linspace(0.0,0.0,num=n)

#simulate observation data
C1=np.zeros((n,n))
for i in range(0,n):
    for j in range(0,n):
        if i==j:
             C1[i,j]=10.0
        else:
             C1[i,j]=7.0

dataSet1=np.random.multivariate_normal(mu0,C1,size=N)


#observation time(all individuals are observed at the same times)
time_obseved=[-5.0,-4.5,-3.0,-2.5,-1.0,0.0,1.5,2.5,3.0,4.0,5.5]
time=np.array(time_obseved)


H0 = np.zeros((n, n))
 for i in range(0, n):
      for j in range(0, n):
          H0[i, j] = np.abs(time[j] - time[i])

def stick_breaking(beta):
    portion_remaining = tt.concatenate([[1], tt.extra_ops.cumprod(1 - beta)[:-1]])

    return beta * portion_remainin
with pm.Model() as model:
    M = pm.Gamma('M', 1., 1.)
    sigma_w = pm.Uniform('sigma_w', 0.0, 1.5, shape=K)
    rho = pm.Uniform('rho', 0.0, 0.1, shape=K)

    beta = pm.Beta('beta', 1., M, shape=K)
    w = pm.Deterministic('w', stick_breaking(beta))

   # MvNormal only accept one cov at a time, you need to construct 
   # each component one by one
    compdist = []
    for i in range(K):
        compdist.append(pm.MvNormal.dist(
            mu=mu0, cov=sigma_w[i]**2 * tt.pow(rho[i], H0)))

    obs = pm.Mixture('obs', w, compdist, observed=dataSet1)

there was such a mistake

ValueError: operands could not be broadcast together with shapes (9,6) (1,9)

junpenglao · March 27, 2018, 7:04am

Oh right to use the MvNormal mixture you need to update to master branch of PyMC3 (the fix is recently push).

Cabbagee · March 27, 2018, 7:10am

Thank you very much for helping me to solve the problems I met，

I’m going to update and allow code.

Cabbagee · March 27, 2018, 8:04am

Well, excuse me, could you tell me how to update to the main branch of pymc3，I am using the following environment：
windows7
Python: 3.6
theano: 1.0.1
pymc3: 3.3

enter the following commands in the Anaconda prompt？？？

conda install git+https://github.com/pymc-devs/pymc3

junpenglao · March 27, 2018, 8:17am

I think you cannot conda install master.

pip3 install git+https://github.com/pymc-devs/pymc3

Cabbagee · March 27, 2018, 11:23am

I have also tried ,but get the same error:

ValueError: operands could not be broadcast together with shapes (9,6) (1,9)

junpenglao · March 27, 2018, 11:52am

Oops, sorry it should be:

pip3 install https://github.com/pymc-devs/pymc3@branch

Cabbagee · March 27, 2018, 12:08pm

error

Error

junpenglao · March 27, 2018, 12:14pm

LOL, sorry I should really double check what I pasted down…
pip3 install git+https://github.com/pymc-devs/pymc3@master

Cabbagee · March 28, 2018, 1:38am

As shown above，I’ve updated it，But when I run the code, I still have athe mistake，It is a part of the code is wrong?：

ValueError: operands could not be broadcast together with shapes (9,6) (1,9)

junpenglao · March 28, 2018, 5:53am

make sure that the pip installed version is in your conda enviorment. I am running the same code (copied below) and there is no problem.

import numpy as np
import numpy as np
import pymc3 as pm
from theano import tensor as tt

K = 6        # the number of components
N = 200  # the number of observed individuals
n = 9  # each individual was observed 9 times

mu0 = np.linspace(0.0, 0.0, num=n)

# simulate observation data

C1 = np.zeros((n, n))
for i in range(0, n):
    for j in range(0, n):
        if i == j:
            C1[i, j] = 10.0
        else:
            C1[i, j] = 7.0

dataSet1 = np.random.multivariate_normal(mu0, C1, size=N)

# observation time(all individuals are observed at the same times)

time_obseved = [-5.0, -4.5, -3.0, -2.5, -1.0, 0.0, 1.5, 2.5, 3.0, 4.0, 5.5]
time = np.array(time_obseved)

H0 = np.zeros((n, n))
for i in range(0, n):
    for j in range(0, n):
        H0[i, j] = np.abs(time[j] - time[i])

def stick_breaking(beta):
    portion_remaining = tt.concatenate(
        [[1], tt.extra_ops.cumprod(1 - beta)[:-1]])

    return beta * portion_remaining


with pm.Model() as model:
    M = pm.Gamma('M', 1., 1.)
    sigma_w = pm.Uniform('sigma_w', 0.0, 1.5, shape=K)
    rho = pm.Uniform('rho', 0.0, 0.1, shape=K)

    beta = pm.Beta('beta', 1., M, shape=K)
    w = pm.Deterministic('w', stick_breaking(beta))

    compdist = []
    for i in range(K):
        compdist.append(pm.MvNormal.dist(
            mu=mu0, cov=sigma_w[i]**2 * tt.pow(rho[i], H0)))

    obs = pm.Mixture('obs', w, compdist, observed=dataSet1)

Cabbagee · April 2, 2018, 3:55am

Hi, I think I am running into another bug where when I try to fit this model it stops at 0 iterations,just like this:

with model:
trace=pm.sample(2000)

The running results are as follows

runfile(‘E:/Cabbage/spyder/DPM of multi guass.py’)
WARNING (theano.configdefaults): g++ not available, if using conda: conda install m2w64-toolchain
C:\Users\NPhard\Anaconda3\lib\site-packages\theano\configdefaults.py:560: UserWarning: DeprecationWarning: there is no c++ compiler.This is deprecated and with Theano 0.11 a c++ compiler will be mandatory
warnings.warn(“DeprecationWarning: there is no c++ compiler.”
WARNING (theano.configdefaults): g++ not detected ! Theano will be unable to execute optimized C-implementations (for both CPU and GPU) and will default to Python implementations. Performance will be severely degraded. To remove this warning, set Theano flags cxx to an empty string.
WARNING (theano.tensor.blas): Using NumPy C-API based implementation for BLAS functions.
C:\Users\NPhard\Anaconda3\lib\site-packages\h5py_init_.py:36: FutureWarning: Conversion of the second argument of issubdtype from float to np.floating is deprecated. In future, it will be treated as np.float64 == np.dtype(float).type.
from .conv import register_converters as register_converters
Auto-assigning NUTS sampler…
Initializing NUTS using jitter+adapt_diag…
Multiprocess sampling (4 chains in 4 jobs)
NUTS: [beta_logodds, rho_interval__, sigma_w_interval__, M_log__ ]

It stops here and doesn’t make any progress，I don’t know why is the program running so slow,could you please give me some advice,thank you.

junpenglao · April 2, 2018, 6:13am

It stop and no output error? Try setting the njobs to 1:

trace = pm.sample(2000, njobs=1)

Cabbagee · April 2, 2018, 6:50am

Not completely stopped, is always in the running state，but it worked just like the following,and no further progress:

Auto-assigning NUTS sampler…
Initializing NUTS using jitter+adapt_diag…
Multiprocess sampling (4 chains in 4 jobs)
NUTS: [beta_logodds, rho_interval__, sigma_w_interval__, M_log__ ]

I tried add njobs=1,

but there’s a new error:

File “C:\Users\NPhard\Anaconda3\lib\site-packages\pymc3\step_methods\hmc\base_hmc.py”, line 117, in astep
‘might be misspecified.’ % start.energy)

ValueError: Bad initial energy: inf. The model might be misspecified.

Topic		Replies	Views
Multivariate Gaussian Processes Questions	2	560	July 27, 2020
Mixture of Linear models Questions	3	1187	July 21, 2018
Modelling a Mixture Model modeling	1	88	June 19, 2024
Several seconds per iteration with GPU Questions	5	731	June 9, 2018
2D Gaussian Mixture Model Questions	7	2071	August 25, 2017

Mixed multivariate Gauss distribution

Related topics