How to define a custom likelihood on a bounded support (on a simplex)

Hi, I am new to pymc3. I want to define a custom likelihood in my model. The likelihood that I need should be constrained to a simplex (all dimensions are bounded on [0, 1], and sum to 1). However, I don’t know how to impose this kind of constraint. Is it possible in pymc3 to define the support of a custom likelihood?

My questions are:

  1. How can I define the support of my custom probability density function? (For instance, what should I do if I want to define a constrained multivariate normal likelihood which has positive density iff the input vector x lies on a simplex, i.e. all dimensions are bounded on [0, 1] and sum to 1?)
  2. Is it OK that my custom likelihood does not integrate to 1 (the normalizing constant is intractable)?

Thanks for any kind of reply. Suggestions with code are greatly preferred!

The short answer is that you should use the Dirichlet distribution.
The longer answer:

In theory, you can use any base distribution defined on the reals and add a transformation (you can use either SumTo1 or StickBreaking; see the transforms available in pymc3.distributions.transforms for more).

An unnormalized density is valid for MCMC, so yes, you can.
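To see why, here is a minimal pure-numpy sketch (not pymc3 code, and the function names here are made up): a Metropolis sampler targeting an unnormalized Gaussian density. The acceptance step only ever uses a ratio of densities, so the normalizing constant cancels out.

```python
import numpy as np

def log_unnorm(x):
    # Log-density of N(2, 1) with the 1/sqrt(2*pi) constant dropped.
    return -0.5 * (x - 2.0) ** 2

def metropolis(logp, n_samples=20000, step=1.0, seed=0):
    rng = np.random.default_rng(seed)
    x = 0.0
    samples = np.empty(n_samples)
    for i in range(n_samples):
        proposal = x + step * rng.normal()
        # Accept with probability min(1, p(proposal) / p(x)):
        # only the *ratio* appears, so normalization is irrelevant.
        if np.log(rng.uniform()) < logp(proposal) - logp(x):
            x = proposal
        samples[i] = x
    return samples

samples = metropolis(log_unnorm)
```

Despite the missing constant, after burn-in the chain recovers the moments of N(2, 1) up to Monte Carlo error.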


Thanks for the reply!!
The Dirichlet is of course the default option; however, it does not work in my model. Let me briefly introduce it: it is a hidden Markov model whose latent states are points on a simplex, and whose observed data is a chain of real numbers. To model the data, I imposed a Dirichlet transition distribution and a normal emission distribution:

P(w_t | w_{t-1}) ~ Dirichlet(w_{t-1}); P(X_t | w_t) ~ N(w_t · z_t)

Here w_t, the latent state at time t, lies on a d-dimensional simplex; z_t is a d-dimensional vector and X_t is a scalar; z_t and X_t are observed. However, with this parameterization the Dirichlet transition tends to ‘pull’ w_t towards the uniform distribution when t is large, and fails to fit the data. This is perhaps because the Dirichlet distribution is most stable around the uniform point, like [1/3, 1/3, 1/3]: the variance of Dirichlet(alpha_1, alpha_2, …, alpha_d) is large on the dimensions whose alpha_k is large, and small on those whose alpha_k is small. So I want to turn to a truncated normal constrained to a simplex.
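For concreteness, here is a small forward simulation of the model just described (the sizes, alpha_zero, and sigma below are made-up values; the concentration formula matches the one used in the model code):

```python
import numpy as np

rng = np.random.default_rng(1)
d, T = 3, 50                # simplex dimension and chain length (made up)
alpha_zero = 30.0           # hypothetical total concentration
sigma = 0.1                 # hypothetical emission noise

z = rng.normal(size=(T, d))        # observed d-dimensional vectors z_t
w = np.empty((T, d))
w[0] = np.full(d, 1.0 / d)         # start at the uniform point of the simplex
for t in range(1, T):
    # Transition: concentration centred on the previous state, with a
    # "+ 1" floor so that every alpha_k stays >= 1.
    w[t] = rng.dirichlet(w[t - 1] * (alpha_zero - d) + 1.0)
# Emission: X_t ~ N(w_t . z_t, sigma^2)
X = np.einsum('td,td->t', w, z) + sigma * rng.normal(size=T)
```

Every simulated state stays on the simplex by construction, which is exactly the constraint the likelihood needs to respect.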

Your advice of adding a stick-breaking transformation sounds great. I am new to pymc3, so can you give some detailed suggestions? For example, how to write code that defines a truncated normal distribution constrained to a simplex, with a given mean vector MU and diagonal covariance matrix I. My previous code, which defines the structure of my model, is pasted here for your reference. So grateful for your help!!!

with pm.Model() as BGM:
    Fund_Return_ser = []
    # initial state: the observed industry weights on the first date
    w_ser = [industry_df.loc[date_list[0]].values]
    for i, date in enumerate(date_list):
        if i == 0:
            continue  # the initial state is already seeded above
        # transition: concentration centred on the previous state
        # local_alphas = w_ser[i-1] * (alpha_zero - len(w_ser[0])) + 1
        w_ser.append(pm.Dirichlet('w_' + str(date)[:10],
                                  w_ser[i - 1] * (alpha_zero - len(w_ser[0])) + 1,
                                  shape=len(w_ser[0])))
        # emission mean: dot product of weights and industry returns
        # (the dot-product call was lost in the paste; pm.math.dot assumed)
        # R_mu = pm.math.dot(w_ser[i], ind_return.loc[date].values)
        Fund_Return_ser.append(pm.Normal('Fund_Return' + str(date)[:10],
                                         mu=pm.math.dot(w_ser[i], ind_return.loc[date].values),
                                         sigma=(sigma_sqr ** 0.5) / 10,
                                         observed=factitious_return.loc[date]))

A simple example goes like this:

with pm.Model() as m:
    # the base distribution is assumed to be a standard Normal here;
    # any real-valued distribution works with the transform
    x = pm.Normal('x', 0., 1., shape=3,
                  testval=[0.5, .25, .25],
                  transform=pm.distributions.transforms.stick_breaking)
#                 transform=pm.distributions.transforms.sum_to_1
    trace = pm.sample()
np.testing.assert_almost_equal(trace['x'].sum(axis=1), np.ones(len(trace['x'])))

I would not truncate it for now: you are already applying a transformation, and I am not sure exactly how truncation would interact with a simplex-like transformation. However, if you want to also truncate it, that is possible by using a chain transformation linking sum_to_1 with an interval transformation.
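To make the chaining idea concrete, here is a pure-numpy sketch of what composing sum_to_1 with an interval (logit) transform does. This mirrors the maps only; pymc3's actual Chain transform also accumulates the log-Jacobian terms, and the function names here are hypothetical:

```python
import numpy as np

def sum_to_1_forward(x):
    # A point on the d-simplex is determined by its first d-1 coordinates,
    # so the "free" representation simply drops the last one.
    return x[:-1]

def sum_to_1_backward(y):
    return np.concatenate([y, [1.0 - y.sum()]])

def interval_forward(y, a=0.0, b=1.0):
    # Logit-style map from the interval (a, b) onto the whole real line.
    return np.log(y - a) - np.log(b - y)

def interval_backward(z, a=0.0, b=1.0):
    return a + (b - a) / (1.0 + np.exp(-z))

def chain_forward(x):
    # Constrained -> unconstrained: drop the last coordinate, then logit.
    return interval_forward(sum_to_1_forward(x))

def chain_backward(z):
    # Unconstrained -> constrained: inverse maps applied in reverse order.
    return sum_to_1_backward(interval_backward(z))

x = np.array([0.5, 0.25, 0.25])
z = chain_forward(x)        # two unconstrained reals
x_back = chain_backward(z)  # recovers the original simplex point
```

The sampler then works in the unconstrained z-space while the model sees values that sum to 1 with each of the first d-1 coordinates in (0, 1).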

Also, since you are working with HMM, make sure you check out this post:

Thanks for the reply.
I have tried your example and have some questions about it.

  1. How can I get the probability density of a given point in the transformed distribution? A ‘TransformedRV’ object has no attribute ‘logp’.
  2. It is not clear to me what the transformation is doing here. Is there any documentation on the stick-breaking transformation that I can refer to?
  3. Is the following statement true, and is it exactly what the example is doing? We have a random variable x whose support is a simplex, and x has a pdf proportional to p(x). We transform it to another variable y with a function y = f(x), which can be transformed back to x (in a one-to-one manner). Then a distribution q is automatically imposed on the transformed variable y such that p(x1)/p(x2) == q(y1)/q(y2) for any x1 and x2 (where y1 = f(x1), y2 = f(x2)). Finally, drawing y from q(y) and transforming it back to x is equivalent to drawing x from p(x). The program automatically uses q(y) to generate the joint likelihood, draws y’s from the posterior of y, and transforms them back to x; these samples can be viewed as drawn from the posterior of x. As a result, working with the (d-1)-dimensional transformed variable is exactly the same as working with the original d-dimensional random variable whose support is a simplex.
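Regarding question 2: stick-breaking maps a point on the d-simplex to d-1 unconstrained reals by repeatedly logit-transforming the fraction of the "remaining stick" that each coordinate takes. Here is a simplified plain-numpy sketch of the two maps (pymc3's StickBreaking differs in details, e.g. vectorization, and also tracks the log-Jacobian):

```python
import numpy as np

def stick_breaking_backward(z):
    # Map d-1 unconstrained reals to a point on the d-simplex: each z_k
    # gives (through a sigmoid) the fraction of the remaining stick taken.
    v = 1.0 / (1.0 + np.exp(-z))
    x = np.empty(len(z) + 1)
    stick = 1.0
    for k, vk in enumerate(v):
        x[k] = stick * vk
        stick -= x[k]
    x[-1] = stick                 # whatever is left of the stick
    return x

def stick_breaking_forward(x):
    # Inverse map: simplex point -> unconstrained reals (logit of fractions).
    z = np.empty(len(x) - 1)
    stick = 1.0
    for k in range(len(z)):
        v = x[k] / stick
        z[k] = np.log(v) - np.log(1.0 - v)
        stick -= x[k]
    return z

z = np.array([0.3, -1.2])
x = stick_breaking_backward(z)     # sums to 1, every entry in (0, 1)
z_back = stick_breaking_forward(x)
```

Any real vector z maps to a valid simplex point, which is why the sampler can explore an unconstrained space while the model only ever sees simplex values.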