Sampling from a simplex with inequality constraints

I’m trying to uniformly sample a vector of random variables a such that \sum_i a_i = 1 and a_i < c_i for given bounds c_i.

I can easily fulfill the equality constraint by sampling x_i \sim \mathrm{Exponential}(\lambda=1) and then normalizing, a = x / \sum_i x_i.

However, I’m having problems implementing the inequality constraint with pm.Potential:

import numpy as np
import pymc3 as pm
import theano.tensor as tt

with pm.Model() as model:
    x = [pm.Exponential(f"x{i}", lam=1) for i in range(3)]
    a = pm.Deterministic("a", x / tt.sum(x))
    c0 = pm.Potential('c0', tt.switch(tt.le(a[0], 0.3), 0, -np.inf))  # require that a0 < 0.3
    trace = pm.sample()

gives

Auto-assigning NUTS sampler...
Initializing NUTS using jitter+adapt_diag...
Multiprocess sampling (2 chains in 2 jobs)
NUTS: [x2, x1, x0]
Sampling 2 chains:   0%|          | 0/2000 [00:00<?, ?draws/s]/usr/lib/python3.7/site-packages/numpy/core/fromnumeric.py:3118: RuntimeWarning: Mean of empty slice.
  out=out, **kwargs)
/usr/lib/python3.7/site-packages/numpy/core/fromnumeric.py:3118: RuntimeWarning: Mean of empty slice.
  out=out, **kwargs)

Bad initial energy, check any log probabilities that are inf or -inf, nan or very small:
Series([], )

Any ideas what could be wrong?

Is using a Dirichlet better than using Exponentials and then normalizing? The two should be equivalent when the concentration parameters of the Dirichlet are all 1. https://math.stackexchange.com/questions/502583/uniform-sampling-of-points-on-a-simplex
https://stats.stackexchange.com/questions/244917/what-exactly-is-the-alpha-in-the-dirichlet-distribution
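
A quick numerical check of that equivalence (a minimal sketch; the sample size and seed are arbitrary):

import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Exponential(1) draws, normalized to sum to 1
x = rng.exponential(1.0, size=(n, 3))
a_norm = x / x.sum(axis=1, keepdims=True)

# direct Dirichlet(1, 1, 1) draws (uniform on the simplex)
a_dir = rng.dirichlet(np.ones(3), size=n)

# the two should agree in distribution, e.g. in the first coordinate
print(a_norm[:, 0].mean(), a_dir[:, 0].mean())  # both ~ 1/3
print(a_norm[:, 0].var(),  a_dir[:, 0].var())   # both ~ 1/18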

Can you use https://docs.pymc.io/api/bounds.html?

I think using that kind of infinite and discrete potential can cause problems. What if one of the starting points is beyond 0.3?

The documentation says a potential adds a term to the overall potential energy. I would think solving the KKT conditions (the Lagrange multiplier method for inequality constraints) could work: solving the equations by hand converts the constrained problem into an unconstrained one. I asked a question about Lagrange multipliers a few days ago: Add linear constraint with Lagrange multiplier method

Judging by the error message, which says:

check any log probabilities that are inf or -inf

It may have something to do with your potential being -inf:

c0 = pm.Potential('c0', tt.switch(tt.le(a[0], 0.3), 0, -np.inf))

It is almost always better to solve this kind of problem by re-parameterizing rather than trying to impose constraints. You can easily adapt the stick_breaking approach used here:

https://docs.pymc.io/notebooks/dp_mix.html

to do what you want. You’ll need to do it as a process (the constraint on each beta will change as pieces are drawn: c for the first piece, \min(1, \frac{c}{\mathrm{remaining}}) for every subsequent piece).

Thanks for pointing me to the Dirichlet distribution. I wasn’t aware of it. It is more concise; however, the problems with setting the inequality constraint remain.

I tried bounds. Apart from not knowing how to specify an individual upper bound for each variable, the following error is thrown:

import numpy as np
import pymc3 as pm

with pm.Model() as model:
    a = pm.Bound(pm.Dirichlet, upper=[0.7])('a', a=np.array([1, 1, 1]))
    trace = pm.sample()

Auto-assigning NUTS sampler...
Initializing NUTS using jitter+adapt_diag...
Multiprocess sampling (2 chains in 2 jobs)    
NUTS: [a]
Sampling 2 chains:   4%|▍         | 87/2000 [00:00<00:04, 415.28draws/s]/usr/lib/python3.7/site-packages/numpy/core/fromnumeric.py:3118: RuntimeWarning: Mean of empty slice.
  out=out, **kwargs)
Sampling 2 chains:   6%|▌         | 122/2000 [00:00<00:04, 430.42draws/s]
Bad initial energy, check any log probabilities that are inf or -inf, nan or very small:
a_upperbound__   -inf    

Thanks! I still have to wrap my head around the Dirichlet process bit. I’m not sure if the following sampling procedure correctly implements the stick_breaking approach you propose.
To give a concrete example, I’m drawing 3 random variables with x_0 < 0.6, x_1 < 0.8 and x_2 < 0.1, repeating 1000 times, and scatter-plotting the resulting distribution.

import numpy as np

n = 1000
a = np.zeros((n, 3))
for i in range(n):
    # draw each piece uniformly, capped by its bound and by what is left of the stick
    x0 = np.random.uniform(0, 0.6)
    x1 = np.random.uniform(0, min(1-x0, 0.8))
    x2 = np.random.uniform(0, min(1-x1-x0, 0.1))
    a[i] = [x0, x1, x2]

[scatter plot of the samples from the procedure above]
While the constraints are indeed fulfilled, the sampling is no longer uniform. For 3 variables this would be tolerable, but I need to go to 15 variables, and in that case the variables sampled later in the procedure have no stick left to break.
Am I implementing the idea correctly?

I also tried fulfilling the inequality constraints via rejection sampling, but again in 15 dimensions the acceptance rate is ~0.
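
For illustration, a minimal rejection-sampling sketch along these lines (the bounds c here are placeholders; the acceptance rate printed at the end shrinks rapidly as the dimension grows or the bounds tighten):

import numpy as np

rng = np.random.default_rng(0)
k = 15
c = np.full(k, 0.1)            # placeholder bounds, tight relative to 1/k
n_proposals = 100_000

X = rng.dirichlet(np.ones(k), size=n_proposals)  # uniform proposals on the simplex
accepted = X[(X < c).all(axis=1)]                # keep only points satisfying a_i < c_i
print(len(accepted) / n_proposals)               # acceptance rate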

Your procedure works but generates a non-uniform density subject to the constraints.

My procedure does neither – in particular the last piece (1-\sum_{j=1}^{k-1}x_j) is not actually constrained.

The Dirichlet distribution has an efficient sampler because of a bijection between the k-simplex and \mathbb{R}^{k-1}. This is not true for a general polytope (which is what it seems you’d like to have).

Honestly, I would use your approach, but swap the uniform distributions with hyper-parametrized betas:

x_i \sim \mathrm{Beta}(\alpha_i, \beta_i) \times \mathrm{min}(\mathrm{resid}, c_i)
\alpha_i \sim \Gamma(a_i^{\alpha}, b_i^{\alpha})
\beta_i \sim \Gamma(a_i^{\beta}, b_i^{\beta})

[something like gamma(2, 2); or you could use an optimizer to make the distribution look uniform] and live with the fact that the prior isn’t uniform.
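
A minimal PyMC3 sketch of that construction (assuming the three example bounds used earlier in the thread; the Gamma(2, 2) hyperpriors are just the placeholder suggested above):

import numpy as np
import pymc3 as pm
import theano.tensor as tt

c = np.array([0.6, 0.8, 0.1])   # example bounds from the earlier post
k = len(c)

with pm.Model() as model:
    alpha = pm.Gamma("alpha", alpha=2.0, beta=2.0, shape=k)
    beta = pm.Gamma("beta", alpha=2.0, beta=2.0, shape=k)
    frac = pm.Beta("frac", alpha=alpha, beta=beta, shape=k)

    pieces = []
    resid = tt.constant(1.0)
    for i in range(k):
        # each piece is a Beta-distributed fraction of min(what is left, c_i)
        piece = frac[i] * tt.minimum(resid, c[i])
        pieces.append(piece)
        resid = resid - piece
    # note: as with the uniform version above, sum(a) is not forced to equal 1
    a = pm.Deterministic("a", tt.stack(pieces))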

The most general case, sampling from an arbitrary (convex, hopefully!) polytope, is not trivial, and even estimating volumes is hard (see http://www.math.cmu.edu/~af1p/Teaching/MCC17/Papers/BF.pdf).

This could be one of those cases where using a potential would work; you could use something like

tt.switch(tt.le(f_constr(x), val), 0, -(f_constr(x) - val)*K)

then you could subsequently prune the trace for values outside of the constraint set (since the potentials are imperfect). The alternative is to implement your own interior point method; see https://mathoverflow.net/questions/9854/uniformly-sampling-from-convex-polytopes/162327 for a relevant discussion, and a recent paper:

http://www.jmlr.org/papers/volume19/18-158/18-158.pdf

discussing interior point methods for polytope sampling.
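
Coming back to the potential option, a minimal sketch for the original three-variable example (assuming the a_0 \le 0.3 bound from the first post and an arbitrary penalty scale K):

import numpy as np
import pymc3 as pm
import theano.tensor as tt

K = 1e4  # arbitrary penalty scale; larger pushes samples harder into the constraint set

with pm.Model() as model:
    a = pm.Dirichlet("a", a=np.ones(3))
    # non-flat penalty: 0 inside the constraint set, linear in the violation outside,
    # so NUTS still has a gradient to follow
    pm.Potential("c0_soft", -K * tt.maximum(a[0] - 0.3, 0.0))
    trace = pm.sample()

# prune any remaining samples that violate the constraint
a_samples = trace["a"]
a_kept = a_samples[a_samples[:, 0] <= 0.3]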

Edit:

And using the StickBreaking transformation won’t work either. A simple constraint on the k-simplex winds up being not so simple on the (k-1)-dimensional embedding:

import numpy as np
import scipy as sp
import scipy.stats
import matplotlib.pyplot

X = sp.stats.dirichlet.rvs(0.45*np.ones(3), 15000)
max_lim = 0.7 * np.ones(3)
within_max = [all(X[i,:] < max_lim) for i in range(X.shape[0])]
# forward_val: the stick-breaking bijection from the simplex to R^{k-1} (see below)
Z = forward_val(X)

i = np.where(within_max)[0]
j = np.where(np.logical_not(within_max))[0]

matplotlib.pyplot.scatter(Z[j,0], Z[j,1])
matplotlib.pyplot.scatter(Z[i,0], Z[i,1])

[scatter plot of the transformed points, colored by whether they satisfy the constraint]


Thanks Chris for your detailed reply!

  • The paper http://www.jmlr.org/papers/volume19/18-158/18-158.pdf does look very promising. I’ll follow up on this, using one of the two implementations on GitHub.
  • I had tried a simple finite potential, say pm.Potential('c0', tt.switch(tt.le(a[0], 0.5), 0, -1E10)), which, as you say, leads to samples outside the constraints. They could be pruned; however, even for a low-dimensional problem the mixing looked dubious to me and the acceptance rate (for the pruning) was low.
  • For constraints Ax = b and Cx < d the resulting polytope should always be convex.
  • Out of curiosity, I don’t understand the last code snippet, specifically what forward_val does. If I take it out, the result looks as expected.
import numpy as np
import scipy as sp
import scipy.stats
import matplotlib.pyplot as plt

X = sp.stats.dirichlet.rvs(0.45*np.ones(3), 1000)
max_lim = 0.7 * np.ones(3)
within_max = [all(X[i,:] < max_lim) for i in range(X.shape[0])]
# Z = forward_val(X)
Z = X

i = np.where(within_max)[0]
j = np.where(np.logical_not(within_max))[0]

plt.scatter(Z[j,0], Z[j,1])
plt.scatter(Z[i,0], Z[i,1])

[scatter plot of the points on the simplex, colored by whether they satisfy the constraint]


The forward_val is the same as pm.transforms.StickBreaking.forward_val, which is the bijection from the simplex to \mathbb{R}^{k-1}. I thought that, perhaps, a simple axis-parallel constraint on the simplex would correspond to an easy-to-implement constraint on \mathbb{R}^{k-1}, but the surface looks somewhat nasty.
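
For reference, it can be pulled out directly (a minimal sketch, assuming the same row-wise usage as in the snippet above):

import numpy as np
import pymc3 as pm

# the stick-breaking map from the 3-simplex to R^2 (the space NUTS actually samples in)
forward_val = pm.distributions.transforms.stick_breaking.forward_val

X = np.array([[0.2, 0.3, 0.5],
              [0.1, 0.1, 0.8]])  # two points on the 3-simplex
Z = forward_val(X)               # their coordinates in R^2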

The stick-breaking transform is built into pymc3’s Dirichlet by default:

def __init__(self, a, transform=transforms.stick_breaking,
                 *args, **kwargs):

Also, try using a non-flat potential for the constraint set – this will give NUTS a gradient to follow, and it should help acceptance.