Indexing using a free random variable

Alright thanks!

I have run this model with 8 million iterations and 1 million tuning steps and still have a Gelman-Rubin statistic above 1.4 and a lot of divergences. Do you have any idea of what causes that problem? Increasing target_accept doesn’t help either.

More iterations don’t help when there are problems in your model. In this case, a few suggestions:

  1. Increasing target_accept leads to a smaller step size, but it only helps when you are seeing <10 divergences. When there are lots of divergences, it means something is wrong with the model.
  2. Looking at the output warning, one chain reported 49 divergences, while the other one reported 5e6 divergences(!). It seems that chain is stuck in some high-curvature area. A better starting point could help.
  3. However, the sampling is likely still problematic, as you are mixing the CategoricalGibbsSampler with NUTS. Mixing discrete parameters with continuous ones under NUTS is not well tested. Reading your code, it seems you are trying to learn a sparse weight coefficient in
    aCH = aCH_[0, ncomp_aCH]
    aCOH = aCOH_[0, ncomp_aCOH]
    out = b1*aCH + aCOH
  • A better way is to model b1 as a vector with a regularizing prior that imposes sparseness, e.g. a regularized horseshoe, and do a matrix multiplication; a sketch of that idea follows below.
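For illustration, a minimal sketch of that idea with placeholder data (not your actual model; X, Y and sigma below are stand-ins for your candidate matrix, observations and precision):

import numpy as np
import pymc3 as pm

n = 20                          # number of candidate components
X = np.random.randn(100, n)     # placeholder design matrix
Y = np.random.randn(100)        # placeholder observations
sigma = 1.0                     # placeholder precision

with pm.Model() as sparse_model:
    # regularized-horseshoe-style prior on a weight vector of length n
    tau = pm.HalfCauchy('tau', beta=1.)                         # global shrinkage
    lam = pm.HalfCauchy('lam', beta=1., shape=n)                # local shrinkage
    c2 = pm.InverseGamma('c2', alpha=1., beta=1.)               # slab scale
    lam_tilde = lam * pm.math.sqrt(c2 / (c2 + tau**2 * lam**2))
    b1 = pm.Normal('b1', mu=0., sd=tau * lam_tilde, shape=n)    # sparse weight vector
    out = pm.math.dot(X, b1)                                    # matrix multiplication instead of indexing
    Y_obs = pm.Normal('Y_obs', mu=out, tau=sigma, observed=Y)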

Thanks for the advice! Concerning the b1 coefficient, it really should be a scalar and not a vector: it is a physical property that should be the same for each sample. So instead I’m going to try mixing the CategoricalGibbsSampler with Metropolis instead of NUTS, as these sampling methods are closer.
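Something along these lines, I suppose (a sketch, assuming the model object and variable names from this thread):

import pymc3 as pm

with basic_model:
    # continuous parameter with Metropolis, discrete indices with the categorical Gibbs sampler
    step_cont = pm.Metropolis([b1])
    step_disc = pm.CategoricalGibbsMetropolis([ncomp_aCH, ncomp_aCOH])
    trace = pm.sample(step=[step_cont, step_disc])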

If I use Metropolis, then the variables updated by Categorical Gibbs Sampler are not updated at all. This is very strange. Do you have any idea why?

Try different start values.

I tried. It doesn’t move from the starting values.

I had another look at the model, with the following code:

with pm.Model() as basic_model:
    # Priors for unknown model parameters
    b1 = pm.Uniform('b1', lower=0.3, upper=0.5, testval=0.45)
    ncomp_aCH = pm.Categorical('ncomp_aCH', p=np.ones(n)/n)
    ncomp_aCOH = pm.Categorical('ncomp_aCOH', p=np.ones(n)/n)

    aCH = aCH_[0, ncomp_aCH]
    aCOH = aCOH_[0, ncomp_aCOH]

    out = b1*aCH + aCOH

    # Likelihood (sampling distribution) of observations
    Y_obs = pm.Normal('Y_obs', mu=out, tau=sigma, observed=Y)
    trace = pm.sample(discard_tuned_samples=False)

which gives me a trace:

So looking at the trace, the problem seems to be that ncomp_aCH is stuck after a few steps. Either that value is highly likely (think of the posterior being almost like a delta function), or there is some problem with the sampler.

I verified the logp of ncomp_aCH using the following:

point = trace[-1]
point
# {'b1': 0.4980582094475776,
#  'b1_interval__': 4.624950462026797,
#  'ncomp_aCH': 14,
#  'ncomp_aCOH': 17}
for i in range(n):
    point['ncomp_aCH'] = i
    print(point['ncomp_aCH'], basic_model.logp(point))

0 -1110.9799301118057
1 -1118.7101039762993
2 -1128.3039141889512
3 -1125.591418583891
4 -1133.5723103615805
5 -1134.252739856408
6 -1131.6183448546024
7 -1132.7872247881817
8 -1111.4298570895123
9 -1134.4647708083562
10 -1144.1176821443162
11 -1141.1537636543512
12 -1097.8710713143184
13 -1086.384805832985
14 -1038.9504119754274
15 -1053.7949113856769
16 -1102.021203844887
17 -1102.374012701112
18 -1104.585462412219
19 -1148.0232245001032

Looking at the output, ncomp_aCH=14 does have the highest logp. In fact, the acceptance probability for a move from ncomp_aCH=14 to ncomp_aCH=15 is np.exp(1038.9504119754274 - 1053.7949113856769) = 3.573681299085053e-07, which is why all the proposed values are rejected.
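As a quick sanity check on that number, using the two logp values printed above:

import numpy as np

logp_14 = -1038.9504119754274   # logp at the current value ncomp_aCH=14
logp_15 = -1053.7949113856769   # logp at the proposed value ncomp_aCH=15
# acceptance probability of a symmetric Metropolis move from 14 to 15
print(np.exp(logp_15 - logp_14))   # ~3.57e-07, so the proposal is essentially always rejected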
I think you should double check your model, as the output from the sampler seems to be correct.

I think there is a way to marginalize instead of indexing. The thought behind it is that when you compute the posterior expectation of ncomp_aCH or ncomp_aCOH, it’s a weight vector with n elements that sum to 1. So instead of restricting the vector to take values of only 0 or 1 (i.e. an index vector), let it take values between 0 and 1:

with pm.Model() as basic_model:
    # Priors for unknown model parameters
    b1 = pm.Uniform('b1', lower=0.3, upper=0.5)
    ncomp_aCH = pm.Dirichlet('ncomp_aCH', a=np.ones(n)/n)
    ncomp_aCOH = pm.Dirichlet('ncomp_aCOH', a=np.ones(n)/n)

    aCH = aCH_.dot(ncomp_aCH)
    aCOH = aCOH_.dot(ncomp_aCOH)

    out = b1 * aCH + aCOH

    # Likelihood (sampling distribution) of observations
    Y_obs = pm.Normal('Y_obs', mu=out, tau=sigma, observed=Y)
    trace = pm.sample(1000, tune=2000)

By using a Dirichlet with alpha < 1., the resulting weights will be sparse (with the extreme being an index vector).
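As a quick numpy-only illustration of that sparsity (not part of the model):

import numpy as np

np.random.seed(0)
# a single draw from a Dirichlet with concentration 1/n per component:
# almost all of the mass piles up on a handful of components
print(np.round(np.random.dirichlet(np.ones(20) / 20.), 3))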
However, the above model is difficult to sample (lots of divergences), so here is a version with some reparameterization, using the fact that independent Gamma variables normalized by their sum follow a Dirichlet distribution:

import theano.tensor as tt
Dirprior = 1./n
with pm.Model() as basic_model:
    # Priors for unknown model parameters
    b1 = pm.Uniform('b1', lower=0.3, upper=0.5)

    beta1 = pm.Gamma('beta1', alpha=Dirprior, beta=1., shape=n)
    ncomp_aCH = pm.Deterministic('ncomp_aCH', beta1/tt.sum(beta1))
    beta2 = pm.Gamma('beta2', alpha=Dirprior, beta=1., shape=n)
    ncomp_aCOH = pm.Deterministic('ncomp_aCOH', beta2/tt.sum(beta2))

    aCH = aCH_.dot(ncomp_aCH)
    aCOH = aCOH_.dot(ncomp_aCOH)

    out = b1 * aCH + aCOH

    # Likelihood (sampling distribution) of observations
    Y_obs = pm.Normal('Y_obs', mu=out, tau=sigma, observed=Y)
    trace = pm.sample(1000, tune=2000)

pm.forestplot(trace,varnames=['b1', 'ncomp_aCH', 'ncomp_aCOH']);

[forest plot of b1, ncomp_aCH and ncomp_aCOH]
The result is quite similar to using indexing, but IMO this model is much better.

This sounds like a very interesting option. I will give it a try and come back to you. Thank you!

Could you specify how you modified aCH_ and aCOH_? The way I sent it to you, it doesn’t have the right dimension: its shape is (20, 92). I tried to transpose it using numpy.transpose, but it doesn’t work.

Also, you set n=20, right?

I put everything in a gist: https://gist.github.com/junpenglao/fde55c4ac0f4607e47ea98305c8ffa8c

That’s great! Thank you very much for all the time you spent playing with the model!

I wrote similar code following the suggestion above.

import numpy as np
import pymc3 as pm

sigma_list = np.linspace(0.1, 0.5, 6)
obs = [1, 2, 3, 5]
with pm.Model() as model:
    z = pm.Categorical('z', p=np.ones(6)/6)
    x = sigma_list[z]
    Y = pm.Normal('Y', mu=0, tau=x, observed=obs)
    trace = pm.sample(1000)

But I get the following error:

IndexError                                Traceback (most recent call last)
<ipython-input-38-c73f2c1a7037> in <module>
     23     #z = pm.DiscreteUniform('z',lower =0,upper=4)
     24     z = pm.Categorical('z', p=np.ones(6)/6)
---> 25     x = sigma_list[z]
     26     Y = pm.Normal('Y',mu=0,tau=x,observed=obs)
     27     #like = pm.DensityDist('like', my_likelihood(x), observed=obs)

IndexError: only integers, slices (`:`), ellipsis (`...`), numpy.newaxis (`None`) and integer or boolean arrays are valid indices

Could you please tell me what is wrong here?

z is a theano tensor (all PyMC3 random variables are theano tensors), and indexing a numpy array with a theano tensor is not allowed.
Wrapping your numpy array in a theano shared variable should fix it:

import theano
sigma_list = theano.shared(np.linspace(0.1,0.5,6))
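Putting the fix into your snippet gives something like:

import numpy as np
import pymc3 as pm
import theano

sigma_list = theano.shared(np.linspace(0.1, 0.5, 6))
obs = [1, 2, 3, 5]
with pm.Model() as model:
    z = pm.Categorical('z', p=np.ones(6)/6)
    x = sigma_list[z]   # indexing a theano shared variable with a theano tensor works
    Y = pm.Normal('Y', mu=0, tau=x, observed=obs)
    trace = pm.sample(1000)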

Perfect. That works. Thanks for the quick reply.

I have a similar question with a slightly different use case.

I have a tensor which stores a bunch of potential new alpha parameters for a given model (temp_dirchlet_info in the code sample). My model generates a free random variable index z_ij and then tries to use temp_dirchlet_info[z_ij, j, :] as the alpha parameter of a Dirichlet. PyMC3 doesn’t like alpha being this subtensor, so I was wondering if there is another approach to achieve the same effect.

Repro and error below

import pymc3 as pm
import numpy as np
import theano

#Some parameters
n_features = 5
n_clusters = 5
n_fval = 2
n_points = 10
alpha = np.ones(n_clusters)  # uniform Dirichlet prior over clusters

#Some fake data for now
in_data = np.random.randint(0, n_fval, size=(n_points, n_features))
#Dummy potential Dirichlet alpha vectors
temp_dirchlet_info = np.random.random(size=(n_clusters, n_features, n_fval))
temp_dirchlet_info = theano.shared(temp_dirchlet_info)
with pm.Model() as m:
    #Sampling per point
    for i in range(n_points):
        #Point mixture
        pi_i = pm.Dirichlet(f"pi_{i}", a=alpha, shape=n_clusters)
        for j in range(n_features):
            #Pick out cluster
            z_ij = pm.Categorical(f"z_{i}{j}", p=pi_i)
            print(z_ij)
            #Get distribution of feature values
            phi_sj = pm.Dirichlet(f"phi_{i}{j}", a=temp_dirchlet_info[z_ij, j])
            #Sample feature value
            x_ij = pm.Categorical(f"x_{i}{j}", p=phi_sj, shape=1, observed=in_data[i, j])

/usr/local/lib/python3.6/dist-packages/pymc3/distributions/distribution.py in __init__(self, shape, dtype, testval, defaults, transform, broadcastable)
     61         self.shape = np.atleast_1d(shape)
     62         if False in (np.floor(self.shape) == self.shape):
---> 63             raise TypeError("Expected int elements in shape")
     64         self.dtype = dtype
     65         self.type = TensorType(self.dtype, self.shape, broadcastable)

TypeError: Expected int elements in shape

Looks like you are trying to do latent Dirichlet allocation. You should try to reparameterize your model into a marginalized version: Frequently Asked Questions
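The general pattern is to sum the Categorical prior weights against the per-component likelihoods instead of sampling an index. A toy sketch with made-up shapes (not your full model):

import numpy as np
import pymc3 as pm
import theano.tensor as tt

K, V, N = 3, 2, 50                            # made-up sizes: clusters, outcome values, data points
x_obs = np.random.randint(0, V, size=N)       # fake categorical observations

with pm.Model() as marginal_model:
    pi = pm.Dirichlet('pi', a=np.ones(K))                        # mixture weights (prior over z)
    phi = pm.Dirichlet('phi', a=np.ones((K, V)), shape=(K, V))   # per-cluster outcome probabilities
    # marginal log-likelihood: log p(x) = logsumexp_k( log pi_k + log phi[k, x] ), with z summed out
    logp_x = pm.math.logsumexp(tt.log(pi)[:, None] + tt.log(phi[:, x_obs]), axis=0)
    pm.Potential('marginal_loglike', tt.sum(logp_x))
    trace = pm.sample(1000)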

@junpenglao - sort of! Looking at a slight modification of LDA

[plate diagram of the modified LDA model]

The part I’m having a problem with is the $$x_{ij}$$. I tried to store the $$\phi$$ vectors in an S x J matrix, but that can’t really be indexed into, and creating each (i, j) entry on the fly seems a bit expensive. Is there at least a way to compute $$x_i$$ from $$z_i$$ easily in this setup?

Narrowed it down a bit - see below.

#Assuming input data is all discrete - one hot encoding - can be expanded once the one hot case is working
alpha = np.ones(n_clusters) 
max_n_val = 2 #Discrete one hot encoding 
num_values = [max_n_val] * n_features # Extend later 
f_ind = np.tile(np.arange(dat.shape[1]), nPoints)
x_index = np.repeat(np.arange(dat.shape[0]), dat.shape[1])

with pm.Model() as model:
    # Picking out the important features per cluster
    omega = pm.Bernoulli('omega', p=q, shape=(n_clusters, n_features))
    # Picking out prototypes per cluster
    ps = pm.DiscreteUniform('p', 1, nPoints, shape = n_clusters)
    #Pick a Dirichlet (mixing proportions) for each point (a weight vector per point: how much weight goes to each cluster)
    pis = pm.Dirichlet('pi', a = alpha, shape = (nPoints, n_clusters))
    # For each feature of each point, sample a cluster from which to choose.
    z = pm.Categorical("z", p = pis[x_index, :], shape = (nPoints * n_features))
    phis_clusters = np.zeros((n_clusters, n_features, 2))
    #Generate the distribution of feature outcomes phi_s
    for s in range(n_clusters):
      for j in range(n_features):
        n_val = num_values[j]
        g_vc = np.zeros(n_val)
        # Note that this only really works right now for binary/discrete features - it has to be a bit more
        # general for the many-discrete-feature case, and for the continuous case we will have to come up with something else
        for v in range(n_val):
          g_vc[v] = lam *(1 + c *(omega[s, j] == 1 and all_data[ps[s]][j] == v))
        phis_clusters[s, j] = g_vc    
    #Flatten
    phis_flat = np.reshape(phis_clusters, (phis_clusters.shape[0] * phis_clusters.shape[1], phis_clusters.shape[2]))
    phi_index = np.repeat(np.arange(phis_clusters.shape[0]), phis_clusters.shape[1])
    #Generate relevant dirchlet's for each cluster x feature pair
    g = pm.Dirichlet("g", a = phis_flat, shape = phis_flat.shape)
    # Categorical datapoints given the observed one-hot encoding - for each x, sample based on the selected
    # cluster/feature
    x = pm.Categorical("x", p = g[z * n_features + f_ind], observed = np.reshape(dat, dat.shape[0] * dat.shape[1]))

Struggling with how to marginalize over z here, given that z comes from a Categorical with Dirichlet weights of dimension k (so z takes values 0 to k-1) and is then used to index into the distributions that generate the observed Categorical data.