Stick-breaking Indian buffet process for an infinite latent feature model

I already asked a question related to this topic on this forum. I am implementing the stick-breaking construction of the Indian buffet process (IBP). I have a couple of problems with the implementation and would appreciate it if somebody could explain solutions to them:

  1. I use adaptive rejection sampling (ARS) to sample the stick lengths mu(k) for new features k, using equation 25 of the paper above. The log function that I feed into the ARS is:

    cdef double muARS(double x, double alpha, long N):
        # equation 25, Teh, Gorur, Ghahramani 2007; simplified using
        # equation B.16 of "Nonparametric Bayesian Discrete Latent Variable
        # Models for Unsupervised Learning" (Dilan Görür's thesis)
        cdef double val = <double>N*log(1-x) - log(x)
        return val

    cdef double primeMuARS(double x, double alpha, long N):
        # derivative of equation 25, Teh, Gorur, Ghahramani 2007
        cdef double val = -1/x - <double>N/(1-x)
        return val

    def NewMuARS(int ns, int m, double emax,
                 np.ndarray[ndim=1, dtype=np.float64_t] x,
                 np.ndarray[ndim=1, dtype=np.float64_t] hx,
                 np.ndarray[ndim=1, dtype=np.float64_t] hpx,
                 int num,
                 double alpha,
                 long N):
        # calling the adaptive rejection sampler
        cdef double xlb = 0.0
        cdef double xub = 0.0
        cdef int lb = 1
        cdef int ub = 0
        return py_ars(ns, m, emax, x, hx, hpx, num, wrap_f, wrap_fprime,
                      alpha, xlb, xub, lb, ub, N)
    cdef double mu_new_features_update(double mu_prev, double alpha, long N):
        # equation 25, Teh et al. 2007 (stick-breaking construction for the IBP)
        cdef double mu
        cdef int ns = 100
        cdef int m  = 2
        cdef double emax = 64
        cdef np.ndarray[FTYPE_t, ndim=1, mode='c'] x   = np.zeros(2, float)
        cdef np.ndarray[FTYPE_t, ndim=1, mode='c'] hx  = np.zeros(2, float)
        cdef np.ndarray[FTYPE_t, ndim=1, mode='c'] hpx = np.zeros(2, float)
        x[0]   = 0.1
        x[1]   = 0.98
        hx[0]  = muARS(x[0], alpha, N)
        hx[1]  = muARS(x[1], alpha, N)
        hpx[0] = primeMuARS(x[0], alpha, N)
        hpx[1] = primeMuARS(x[1], alpha, N)
        # resample until the draw respects the ordering mu <= mu_prev
        while True:
            mu = NewMuARS(ns, m, emax, x, hx, hpx, 1, alpha, N)[0]
            if 0 <= mu <= mu_prev:
                return mu
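As a quick sanity check on the (log density, derivative) pair fed to the ARS, here is a plain-Python sketch of `muARS`/`primeMuARS` above (names lower-cased; not the Cython code itself), comparing the analytic derivative against a central finite difference:

```python
import math

def mu_ars(x, N):
    # simplified log density h(mu) = N*log(1-mu) - log(mu) (eq. 25 / B.16)
    return N * math.log(1.0 - x) - math.log(x)

def prime_mu_ars(x, N):
    # analytic derivative of mu_ars
    return -1.0 / x - N / (1.0 - x)

# central finite-difference check at a few interior points
for x in (0.1, 0.5, 0.9):
    eps = 1e-6
    fd = (mu_ars(x + eps, 10) - mu_ars(x - eps, 10)) / (2.0 * eps)
    assert abs(fd - prime_mu_ars(x, 10)) < 1e-4
```

If this check passes, any ARS failure is not a bug in the derivative but a property of the density itself.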

When mu_prev is very small (e.g. 1e-9), mu_new_features_update tends to get stuck sampling small stick sizes, because the accept condition 0 <= mu <= mu_prev almost never fires. How can I avoid this problem?

  2. Since the stick-breaking implementation is given in order to deal with non-conjugacy in the model, I am wondering what would happen if someone used the stick-breaking IBP prior on a conjugate case, such as the linear-Gaussian infinite latent feature model for which public code is available, to compare their performance?

  3. Algorithm 15 of this thesis says to "Sample parameters for the new represented features from the prior". Does this refer to using the Bernoulli prior to sample the Z values for the new stick sizes, or to other free parameters in the model?


This is pretty much outside my knowledge, so I will just offer some comments:

  1. If I understand correctly, when mu is small it does not contribute much to the final mixture; that is why the stick size (i.e. the component weight) is near 0. This is common when you have more components/features than you need.

  2. When you are doing sampling, I don't think conjugacy is important. You cannot get a closed-form solution, but you can still design an efficient sampler for your problem.

  3. It refers to the other free parameters in the model, since this step essentially adds a new mixture component/feature to the model.
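To make point 3 concrete, here is a sketch with a hypothetical helper of my own (not from the thread), assuming the linear-Gaussian latent feature model mentioned above, where each feature k has a weight row A_k with prior N(0, sigma_A^2 I):

```python
import numpy as np

def sample_new_feature_params(k_new, D, sigma_A=1.0, seed=None):
    # "Sample parameters for the new represented features from the prior":
    # in a linear-Gaussian latent feature model, each of the k_new features
    # gets a D-dimensional weight row A_k ~ N(0, sigma_A^2 I), drawn from
    # the prior because no data is associated with the feature yet.
    rng = np.random.default_rng(seed)
    return rng.normal(0.0, sigma_A, size=(k_new, D))
```

The Z entries for the new features are a separate step; this draw covers only the feature parameters.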

Thanks @junpenglao for the reply and explanation. About your first point: the main difference between the stick-breaking construction and other implementations is that you do not impose any truncation level on the stick sizes; the sampler is supposed to learn the proper lower limit of the stick size during sampling. What surprises me is that equation 25 in the paper is not log-concave for small values close to zero, whereas log-concavity is a prerequisite for adaptive rejection sampling. For small x values, the adaptive rejection sampler returns this error message when I run the sampler:
Trap: non-logcocavity detected by ARS update function.

Here is the plot I made to show that this function is not log-concave:

import numpy as np
import pylab as plt

def g(x, N, alpha):
    # full log density of a new stick length (eq. 25, Teh et al. 2007)
    s = np.zeros_like(x)
    for i in range(1, N + 1):
        s += (1. - x)**i / i
    return alpha*s + (alpha - 1)*np.log(x) + N*np.log(1. - x)

def h(x, N, alpha):
    # simplified log density, eq. (A.11) of the thesis
    return N*np.log(1. - x) - np.log(x)

x = np.linspace(1e-4, 1 - 1e-4, 1000)
plt.plot(x, g(x, 10, 1.))
plt.plot(x, h(x, 10, 1.))
plt.show()



The h function is based on the simplification given in eq. (A.11) of the thesis.
Do you have any suggestions for avoiding the problem of getting stuck in the log-convex part?
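The non-concavity can also be checked directly from the second derivative: for h(x) = N*log(1-x) - log(x) we get h''(x) = 1/x^2 - N/(1-x)^2, which is positive (log-convex) for x below 1/(1 + sqrt(N)) and negative above it. A small numerical check of this claim:

```python
import math

def h_second_deriv(x, N):
    # second derivative of h(x) = N*log(1-x) - log(x)
    return 1.0 / x**2 - N / (1.0 - x)**2

N = 10
x_star = 1.0 / (1.0 + math.sqrt(N))  # sign change of h'': concavity boundary
assert h_second_deriv(0.01, N) > 0   # convex near 0, where the ARS trap fires
assert h_second_deriv(0.9, N) < 0    # concave on the right part of (0, 1)
assert abs(h_second_deriv(x_star, N)) < 1e-6
```

So the ARS complaint is expected whenever the sampler has to evaluate points left of 1/(1 + sqrt(N)).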


This is the address of my Bitbucket repository. The main structure of the code is in sampling.pyx; the main body of the sampler starts at the UnCollapseSampler class. I tried to follow the instructions given in section 4 of the paper and added comments for each function. I was also looking at the thesis of the second author (Dilan Gorur), page 93 (algorithm 15). I have a real problem with the pseudocode, because one is supposed to sample the stick length for every object in the data. Should this part be placed inside the loop over objects?

I would appreciate it if someone could help me implement the stick-breaking IBP.



This GitHub repo implements the stick-breaking IBP using variational inference.


I’m also trying the stick-breaking IBP for my research.
To sample a new stick size mu from its conditional probability (eq. 4.51 of the thesis), I recommend using a naive rejection sampler, because it does not require log-concavity.
Concretely, in eq. 4.51, the first and second terms

mu ^ {a - 1} * (1 - mu) ^ N

are regarded as an unnormalized pdf of Beta(a, N + 1). Then, the third term

exp(a \sum_{i=1}^{N} (1 - mu)^i / i)

has an upper bound

exp(a * Hn),

where Hn is the N-th harmonic number; the bound holds since 0 <= mu <= 1 guarantees (1 - mu)^i <= 1.
In summary, you can sample the new stick size mu using a rejection sampler with proposal distribution Beta(a, N + 1) and envelope constant B(a, N + 1) * exp(a * Hn), where B(x, y) is the Beta function.
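A minimal sketch of this sampler under the assumptions above (my own function name; `a` is the IBP mass parameter and N the number of objects):

```python
import numpy as np

def sample_new_stick(a, N, rng=None):
    # Rejection sampler for the new stick length (eq. 4.51 of the thesis).
    # Target (unnormalized): mu^(a-1) * (1-mu)^N * exp(a * sum_i (1-mu)^i / i)
    # Proposal: Beta(a, N+1); the exp factor is bounded above by exp(a * Hn),
    # so the acceptance probability is exp(a * (S(mu) - Hn)) <= 1.
    if rng is None:
        rng = np.random.default_rng()
    i = np.arange(1, N + 1)
    Hn = np.sum(1.0 / i)                 # N-th harmonic number
    while True:
        mu = rng.beta(a, N + 1)
        S = np.sum((1.0 - mu) ** i / i)  # sum in the third term of eq. 4.51
        if np.log(rng.uniform()) < a * (S - Hn):
            return mu
```

Since S(mu) approaches Hn as mu -> 0, draws in the problematic small-mu region are accepted with probability near 1, so this avoids the ARS log-concavity trap entirely; the acceptance rate only degrades slowly as a * Hn grows.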

@wasyro @neuron do either of you have a working implementation?