Mixed multivariate Gauss distribution


#1

Hello,everyone.
I’m trying to carry out a sampling of the multiple Gauss mixed model,The distribution of variables is as follows(Mixed multivariate Gauss distribution):

y_i=∑(k=1)^K{ω_kN(O,σ_k^2 H(ρ_k ))} ----multivariate Gauss distribution mixture_
ω_k=β_k∏(j=1)^(k-1){1-β_j } ----weights,stick-breaking process_
β_i iid Beta(1,M)
σ_k iid U(0,1.5) prior
ρ_k iid U(0,1) prior

I did the following the code but it is not working:

import numpy as np
import numpy as np
import pymc3 as pm
from theano import tensor as tt

K = 6        # the number of components
N = 200  # the number of observed individuals
n = 9  # each individual was observed 9 times

mu0 = np.linspace(0.0, 0.0, num=n)

# simulate observation data

C1 = np.zeros((n, n))
for i in range(0, n):
    for j in range(0, n):
        if i == j:
            C1[i, j] = 10.0
        else:
            C1[i, j] = 7.0

dataSet1 = np.random.multivariate_normal(mu0, C1, size=N)

# observation time(all individuals are observed at the same times)

time_obseved = [-5.0, -4.5, -3.0, -2.5, -1.0, 0.0, 1.5, 2.5, 3.0, 4.0, 5.5]
time = np.array(time_obseved)

#I missed it before, and I'm mending it now
def H(rho):
    H=np.zeros((n,n))
    for i in range(0,n):
        for j in range(0,n):
            H[i,j]=np.power(rho,np.abs(time[j]-time[i]))
    return H


# stick-breaking process
def stick_breaking(beta):
    portion_remaining = tt.concatenate(
        [[1], tt.extra_ops.cumprod(1 - beta)[:-1]])

    return beta * portion_remaining


with pm.Model() as model:
    M = pm.Gamma('M', 1., 1.)
    sigma_w = pm.Uniform('sigma_w', 0.0, 1.5, shape=K)
    rho = pm.Uniform('rho', 0.0, 0.1, shape=K)

    beta = pm.Beta('beta', 1., M, shape=K)
    w = pm.Deterministic('w', stick_breaking(beta))

    obs = pm.Mixture('obs', w, pm.MvNormal.dist(
        mu=mu0, cov=sigma_w**2 * H(rho)), observed=dataSet1)

I get the following error:

    runfile('F:/spyder/Mixed multivariate Gauss distribution.py')
    WARNING (theano.configdefaults): g++ not available, if using conda: `conda install m2w64-toolchain`
    C:\Users\Administrator\Anaconda3\lib\site-packages\theano\configdefaults.py:560: UserWarning: DeprecationWarning: there is no c++ compiler.This is deprecated and with Theano 0.11 a c++ compiler will be mandatory
      warnings.warn("DeprecationWarning: there is no c++ compiler."
    WARNING (theano.configdefaults): g++ not detected ! Theano will be unable to execute optimized C-implementations (for both CPU and GPU) and will default to Python implementations. Performance will be severely degraded. To remove this warning, set Theano flags cxx to an empty string.
    WARNING (theano.tensor.blas): Using NumPy C-API based implementation for BLAS functions.
    C:\Users\Administrator\Anaconda3\lib\site-packages\h5py\__init__.py:36: FutureWarning: Conversion of the second argument of issubdtype from `float` to `np.floating` is deprecated. In future, it will be treated as `np.float64 == np.dtype(float).type`.
    from ._conv import register_converters as _register_converters
    Traceback (most recent call last):

    File "<ipython-input-1-a62ff05957e7>", line 1, in <module>
    runfile('F:/spyder/Mixed multivariate Gauss distribution.py')

    File "C:\Users\Administrator\Anaconda3\lib\site-packages\spyder\utils\site\sitecustomize.py", line 705, in runfile
    execfile(filename, namespace)

    File "C:\Users\Administrator\Anaconda3\lib\site-packages\spyder\utils\site\sitecustomize.py", line 102, in execfile
    exec(compile(f.read(), filename, 'exec'), namespace)

    File "F:/spyder/Mixed multivariate Gauss distribution.py", line 75, in <module>
    obs=pm.Mixture('obs',w,pm.MvNormal.dist(mu=mu0,cov=sigma_w**2*H(rho)),observed=dataSet1)

    File "F:/spyder/Mixed multivariate Gauss distribution.py", line 54, in H
    H[i,j]=np.power(rho,np.abs(time[j]-time[i]))

    ValueError: setting an array element with a sequence.

How could I make this work?


#2

There seems to be some information missing: what is the H in H(rho)?


#3

Oh, I missed it.

def H(rho):
    H=np.zeros((n,n))
    for i in range(0,n):
        for j in range(0,n):
             H[i,j]=np.power(rho,np.abs(time[j]-time[i]))
    return H

It is used to construct covariance matrix of mixed components.


#4

OK I see, and this is where the error comes from. You need to rewrite the function H into a theano function. Something like below should work:

# instead of computing H(rho), do most of the computation outside of theano
H0 = np.zeros((n, n))
for i in range(0, n):
    for j in range(0, n):
        H0[i, j] = np.abs(time[j] - time[i])

with pm.Model() as model:
    M = pm.Gamma('M', 1., 1.)
    sigma_w = pm.Uniform('sigma_w', 0.0, 1.5, shape=K)
    rho = pm.Uniform('rho', 0.0, 0.1, shape=K)

    beta = pm.Beta('beta', 1., M, shape=K)
    w = pm.Deterministic('w', stick_breaking(beta))

    # MvNormal only accept one cov at a time, you need to construct 
    # each component one by one
    compdist = []
    for i in range(K):
        compdist.append(pm.MvNormal.dist(
            mu=mu0, cov=sigma_w[i]**2 * tt.pow(rho[i], H0)))

    obs = pm.Mixture('obs', w, compdist, observed=dataSet1)

You can find a similar example discussed in this post: Properly sampling mixture models


#5

Thank you very much for answering my doubts,I just tried the modified code, and there was such a mistake,It’s related to the dimension I set up

runfile('F:/spyder/Mixed multivariate Gauss distribution.py')
Traceback (most recent call last):

File "<ipython-input-2-a62ff05957e7>", line 1, in <module>
runfile('F:/spyder/Mixed multivariate Gauss distribution.py')

File "C:\Users\Administrator\Anaconda3\lib\site-packages\spyder\utils\site\sitecustomize.py", line 705, in runfile
execfile(filename, namespace)

File "C:\Users\Administrator\Anaconda3\lib\site-packages\spyder\utils\site\sitecustomize.py", line 102, in execfile
exec(compile(f.read(), filename, 'exec'), namespace)

File "F:/spyder/Mixed multivariate Gauss distribution.py", line 75, in <module>
obs = pm.Mixture('obs', w, compdist, observed=dataSet1)

File "C:\Users\Administrator\Anaconda3\lib\site-packages\pymc3\distributions\distribution.py", line 36, in __new__
dist = cls.dist(*args, **kwargs)

File "C:\Users\Administrator\Anaconda3\lib\site-packages\pymc3\distributions\distribution.py", line 47, in dist
dist.__init__(*args, **kwargs)

File "C:\Users\Administrator\Anaconda3\lib\site-packages\pymc3\distributions\mixture.py", line 93, in __init__
comp_mode_logps = self.logp(comp_modes)
File "C:\Users\Administrator\Anaconda3\lib\site-packages\pymc3\distributions\mixture.py", line 141, in logp
return bound(logsumexp(tt.log(w) + self._comp_logp(value), axis=-1).sum(),
          
File "C:\Users\Administrator\Anaconda3\lib\site-packages\pymc3\distributions\mixture.py", line 112, in _comp_logp
return tt.stack([comp_dist.logp(value) for comp_dist in comp_dists],

File "C:\Users\Administrator\Anaconda3\lib\site-packages\pymc3\distributions\mixture.py", line 112, in <listcomp>
return tt.stack([comp_dist.logp(value) for comp_dist in comp_dists],

File "C:\Users\Administrator\Anaconda3\lib\site-packages\pymc3\distributions\multivariate.py", line 270, in logp
quaddist, logdet, ok = self._quaddist(value)

File "C:\Users\Administrator\Anaconda3\lib\site-packages\pymc3\distributions\multivariate.py", line 89, in _quaddist
delta = value - mu

File "C:\Users\Administrator\Anaconda3\lib\site-packages\theano\tensor\var.py", line 147, in __sub__
return theano.tensor.basic.sub(self, other)

File "C:\Users\Administrator\Anaconda3\lib\site-packages\theano\gof\op.py", line 674, in __call__
required = thunk()

File "C:\Users\Administrator\Anaconda3\lib\site-packages\theano\gof\op.py", line 892, in rval
r = p(n, [x[0] for x in i], o)

File "C:\Users\Administrator\Anaconda3\lib\site-packages\theano\tensor\elemwise.py", line 790, in perform
variables = ufunc(*ufunc_args, **ufunc_kwargs)

ValueError: operands could not be broadcast together with shapes (9,6) (1,9) 

I want to realize multiple observations of multiple individuals,These individuals are independent, but their distribution is not the same,and each individual, observed several times.


#6

In this case you should build separate models for each individual.


#7

Thank you for your advice,But I’m still confused about why the revised code is wrong with the dimensions,It is a multicomponent Gauss mixture
why not work?

import numpy as np
import pymc3 as pm
from theano import tensor as tt

K=6       # the number of mixed components
N=200     # number of samples
n=9       #each sample point is multiple,9 dimensional
mu0=np.linspace(0.0,0.0,num=n)

#simulate observation data
C1=np.zeros((n,n))
for i in range(0,n):
    for j in range(0,n):
        if i==j:
             C1[i,j]=10.0
        else:
             C1[i,j]=7.0

dataSet1=np.random.multivariate_normal(mu0,C1,size=N)


#observation time(all individuals are observed at the same times)
time_obseved=[-5.0,-4.5,-3.0,-2.5,-1.0,0.0,1.5,2.5,3.0,4.0,5.5]
time=np.array(time_obseved)


H0 = np.zeros((n, n))
 for i in range(0, n):
      for j in range(0, n):
          H0[i, j] = np.abs(time[j] - time[i])

def stick_breaking(beta):
    portion_remaining = tt.concatenate([[1], tt.extra_ops.cumprod(1 - beta)[:-1]])

    return beta * portion_remainin
with pm.Model() as model:
    M = pm.Gamma('M', 1., 1.)
    sigma_w = pm.Uniform('sigma_w', 0.0, 1.5, shape=K)
    rho = pm.Uniform('rho', 0.0, 0.1, shape=K)

    beta = pm.Beta('beta', 1., M, shape=K)
    w = pm.Deterministic('w', stick_breaking(beta))

   # MvNormal only accept one cov at a time, you need to construct 
   # each component one by one
    compdist = []
    for i in range(K):
        compdist.append(pm.MvNormal.dist(
            mu=mu0, cov=sigma_w[i]**2 * tt.pow(rho[i], H0)))

    obs = pm.Mixture('obs', w, compdist, observed=dataSet1)

there was such a mistake

ValueError: operands could not be broadcast together with shapes (9,6) (1,9)

#8

Oh right to use the MvNormal mixture you need to update to master branch of PyMC3 (the fix is recently push).


#9

Thank you very much for helping me to solve the problems I met,

I’m going to update and allow code.:hugs:


#10

Well, excuse me, could you tell me how to update to the main branch of pymc3,I am using the following environment:
windows7
Python: 3.6
theano: 1.0.1
pymc3: 3.3

enter the following commands in the Anaconda prompt???

conda install git+https://github.com/pymc-devs/pymc3

#11

I think you cannot conda install master.

pip3 install git+https://github.com/pymc-devs/pymc3

#12

I have also tried ,but get the same error:

ValueError: operands could not be broadcast together with shapes (9,6) (1,9)


#13

Oops, sorry it should be:

pip3 install https://github.com/pymc-devs/pymc3@branch

#14

error

Error
:joy::joy::joy:


#15

LOL, sorry I should really double check what I pasted down…
pip3 install git+https://github.com/pymc-devs/pymc3@master


#16


As shown above,I’ve updated it,But when I run the code, I still have athe mistake,It is a part of the code is wrong?:

ValueError: operands could not be broadcast together with shapes (9,6) (1,9)


#17

make sure that the pip installed version is in your conda enviorment. I am running the same code (copied below) and there is no problem.

import numpy as np
import numpy as np
import pymc3 as pm
from theano import tensor as tt

K = 6        # the number of components
N = 200  # the number of observed individuals
n = 9  # each individual was observed 9 times

mu0 = np.linspace(0.0, 0.0, num=n)

# simulate observation data

C1 = np.zeros((n, n))
for i in range(0, n):
    for j in range(0, n):
        if i == j:
            C1[i, j] = 10.0
        else:
            C1[i, j] = 7.0

dataSet1 = np.random.multivariate_normal(mu0, C1, size=N)

# observation time(all individuals are observed at the same times)

time_obseved = [-5.0, -4.5, -3.0, -2.5, -1.0, 0.0, 1.5, 2.5, 3.0, 4.0, 5.5]
time = np.array(time_obseved)

H0 = np.zeros((n, n))
for i in range(0, n):
    for j in range(0, n):
        H0[i, j] = np.abs(time[j] - time[i])

def stick_breaking(beta):
    portion_remaining = tt.concatenate(
        [[1], tt.extra_ops.cumprod(1 - beta)[:-1]])

    return beta * portion_remaining


with pm.Model() as model:
    M = pm.Gamma('M', 1., 1.)
    sigma_w = pm.Uniform('sigma_w', 0.0, 1.5, shape=K)
    rho = pm.Uniform('rho', 0.0, 0.1, shape=K)

    beta = pm.Beta('beta', 1., M, shape=K)
    w = pm.Deterministic('w', stick_breaking(beta))

    compdist = []
    for i in range(K):
        compdist.append(pm.MvNormal.dist(
            mu=mu0, cov=sigma_w[i]**2 * tt.pow(rho[i], H0)))

    obs = pm.Mixture('obs', w, compdist, observed=dataSet1)

#18

Hi, I think I am running into another bug where when I try to fit this model it stops at 0 iterations,just like this:

with model:
trace=pm.sample(2000)

The running results are as follows

runfile(‘E:/Cabbage/spyder/DPM of multi guass.py’)
WARNING (theano.configdefaults): g++ not available, if using conda: conda install m2w64-toolchain
C:\Users\NPhard\Anaconda3\lib\site-packages\theano\configdefaults.py:560: UserWarning: DeprecationWarning: there is no c++ compiler.This is deprecated and with Theano 0.11 a c++ compiler will be mandatory
warnings.warn(“DeprecationWarning: there is no c++ compiler.”
WARNING (theano.configdefaults): g++ not detected ! Theano will be unable to execute optimized C-implementations (for both CPU and GPU) and will default to Python implementations. Performance will be severely degraded. To remove this warning, set Theano flags cxx to an empty string.
WARNING (theano.tensor.blas): Using NumPy C-API based implementation for BLAS functions.
C:\Users\NPhard\Anaconda3\lib\site-packages\h5py_init_.py:36: FutureWarning: Conversion of the second argument of issubdtype from float to np.floating is deprecated. In future, it will be treated as np.float64 == np.dtype(float).type.
from .conv import register_converters as register_converters
Auto-assigning NUTS sampler…
Initializing NUTS using jitter+adapt_diag…
Multiprocess sampling (4 chains in 4 jobs)
NUTS: [beta_logodds
, rho_interval__, sigma_w_interval__, M_log__ ]

It stops here and doesn’t make any progress,I don’t know why is the program running so slow,could you please give me some advice,thank you.


#19

It stop and no output error? Try setting the njobs to 1:

trace = pm.sample(2000, njobs=1)

#20

Not completely stopped, is always in the running state,but it worked just like the following,and no further progress:

Auto-assigning NUTS sampler…
Initializing NUTS using jitter+adapt_diag…
Multiprocess sampling (4 chains in 4 jobs)
NUTS: [beta_logodds, rho_interval__, sigma_w_interval__, M_log__ ]

I tried add njobs=1,

but there’s a new error:

File “C:\Users\NPhard\Anaconda3\lib\site-packages\pymc3\step_methods\hmc\base_hmc.py”, line 117, in astep
‘might be misspecified.’ % start.energy)

ValueError: Bad initial energy: inf. The model might be misspecified.