I *think* something based on cumsum cannot work, unfortunately. The argument why involves some convex geometry, and that’s not really my field, so take this with a grain of salt…

We can find out a bit more about the space of vectors u that fulfill your requirements:

First, we transform u into a one-dimensional vector x (i.e. just `u.ravel()`). We can then find a matrix A such that x is valid if and only if Ax > 0. The matrix A has one row per constraint, and each row contains one -1 and one 1:

```
import numpy as np
from scipy import sparse

N = 50
idxs = np.arange(2 * N, dtype=int).reshape((N, 2))
# Find the indices of `x` for each inequality
ineqs = []
ineqs.extend(zip(idxs[:, 0].ravel(), idxs[:, 1].ravel()))
ineqs.extend(zip(idxs[0:-1, 0].ravel(), idxs[1:, 0].ravel()))
ineqs.extend(zip(idxs[0:-1, 1].ravel(), idxs[1:, 1].ravel()))
# Collect all inequalities in a sparse matrix
A = sparse.lil_array((len(ineqs), 2 * N), dtype=np.int8)
for i, (a_idx, b_idx) in enumerate(ineqs):
    A[i, a_idx] = -1
    A[i, b_idx] = 1
```

In this case `A.shape == (148, 100)`, so we have 148 inequalities and n = 100 = 2N dimensions.
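As a quick sanity check (a sketch with a toy N = 3, where the valid and invalid `u` below are just hand-picked examples), A built this way does separate valid from invalid u:

```python
import numpy as np
from scipy import sparse

# Same construction as above, but with N = 3 so we can check it by hand
N = 3
idxs = np.arange(2 * N, dtype=int).reshape((N, 2))
ineqs = []
ineqs.extend(zip(idxs[:, 0], idxs[:, 1]))      # u[i, 0] < u[i, 1] in each row
ineqs.extend(zip(idxs[:-1, 0], idxs[1:, 0]))   # column 0 increasing
ineqs.extend(zip(idxs[:-1, 1], idxs[1:, 1]))   # column 1 increasing
A = sparse.lil_array((len(ineqs), 2 * N), dtype=np.int8)
for i, (a_idx, b_idx) in enumerate(ineqs):
    A[i, a_idx] = -1
    A[i, b_idx] = 1

u_valid = np.array([[0.0, 1.0], [2.0, 3.0], [4.0, 5.0]])  # all constraints hold
u_bad = np.array([[1.0, 0.0], [2.0, 3.0], [4.0, 5.0]])    # row 0 violates u[0,0] < u[0,1]
print(np.all(A @ u_valid.ravel() > 0))  # True
print(np.all(A @ u_bad.ravel() > 0))    # False
```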

Sets like C = \{x\in \mathbb{R}^n\mid Ax > 0\} are called polyhedral (convex) cones and are studied in convex geometry, so there is quite a bit of literature (and software) for them (see Convex cone - Wikipedia).

There are two common ways to represent those cones: one is using the matrix A, but we can also find a set of generating vectors v_i such that

C = \{ x\in \mathbb{R}^n \mid x = \sum_i \lambda_i v_i,\ \lambda_i > 0 \}

So those v_i act like a basis, and you can reach every point in C using a linear combination of the v_i with positive coefficients. But there is a crucial difference between a basis and this set of generating vectors: with a basis, there is only ever one way to reach each point in the space, while for a polyhedral cone the coefficients might not be unique. There might also be (considerably) more vectors v_i than n. We can find the minimal set of v_i using specialized software like ppl or normaliz, for instance:

```
import ppl

variables = [ppl.Variable(i) for i in range(2 * N)]
cs = ppl.Constraint_System()
for idx1, idx2 in ineqs:
    cs.insert(variables[idx2] - variables[idx1] >= 0)
generators = ppl.C_Polyhedron(cs).minimized_generators()
line, point, *rays = generators
assert line.is_line()
assert point.is_point()
for ray in rays:
    assert ray.is_ray()
rays = np.array([ray.coefficients() for ray in rays]).astype(float)
line = np.array(line.coefficients()).astype(float)
```

And here `rays.shape == (1324, 100)`, so unfortunately we have quite a few more generators than n. (We also get an additional line generator, with which ppl is telling us that we can also just add a real number to all elements of x, and a point generator. I had no clue what that one is about, it looks like this: `point(0/1, 0/1, 0/1, 0/1,...)` (*edit* It means the origin point, and uses rational numbers to show it…) and I’d be curious if someone knows more.) For each individual point x\in C we can always use only n of those generators to reach it (i.e. set all but n coefficients to zero), but which ones we are allowed to set to zero will depend on x.
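To see how a minimal generating set can be larger than n, and how coefficients can fail to be unique, a standard small example (not specific to this problem) is the cone over a square in \mathbb{R}^3, which needs 4 extreme rays even though n = 3:

```python
import numpy as np

# Cone over a square: C = {x in R^3 | x3 >= |x1|, x3 >= |x2|}.
# Its minimal generating set has 4 rays, although n = 3.
rays = np.array([
    [ 1.0,  1.0, 1.0],
    [ 1.0, -1.0, 1.0],
    [-1.0,  1.0, 1.0],
    [-1.0, -1.0, 1.0],
])

# The same point can be reached with two different coefficient vectors:
lam1 = np.array([1.0, 0.0, 0.0, 1.0])  # rays 0 and 3
lam2 = np.array([0.0, 1.0, 1.0, 0.0])  # rays 1 and 2
print(lam1 @ rays)  # [0. 0. 2.]
print(lam2 @ rays)  # [0. 0. 2.]
```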

Ideally we would like to find a linear bijective function that maps C to \mathbb{R}_+^n, which we could use as a transformation during sampling. (`cumsum` is linear, so what you are trying to do with it should be a special case of such a linear transformation.) But because the number of generators is larger than n, such a linear function can’t exist (if it did, then the preimages of the basis vectors e_i would form a smaller generating set than the one we found).
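Spelled out, that parenthetical is a simple counting argument (a sketch, using the numbers from above):

```latex
\text{Suppose } T:\mathbb{R}^n \to \mathbb{R}^n \text{ were linear and bijective with } T(C) = \mathbb{R}_+^n.
\text{Every } y \in \mathbb{R}_+^n \text{ can be written as } y = \textstyle\sum_{i=1}^n \lambda_i e_i \text{ with } \lambda_i > 0, \text{ so}
x \in C \iff x = T^{-1}(y) = \textstyle\sum_{i=1}^n \lambda_i \, T^{-1}(e_i).
\text{Then the } n = 100 \text{ vectors } T^{-1}(e_i) \text{ would generate } C,
\text{contradicting the minimal generating set of 1324 rays that ppl found.}
```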

So I guess that leaves four options to solve this problem:

- Simply don’t map the parameter space to something that is unconstrained. You can use a custom dist or a potential to set the logp to -inf if a draw is outside the allowed parameter space (so something like `pm.Potential("constrain_u", pt.switch(pt.all(pt.geq(A @ u.ravel(), 0)), 0, -np.inf))`) and provide a starting point for the sampler that is valid. You then pretty much have to pray that the posterior is far away from the border of the valid parameter space, because otherwise the sampler will diverge.
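In plain numpy, the check behind that potential looks like this (a sketch with a small N; the way of building a valid starting point at the end is just one hypothetical choice):

```python
import numpy as np
from scipy import sparse

# Rebuild the constraint matrix from above for a small case (N = 4)
N = 4
idxs = np.arange(2 * N, dtype=int).reshape((N, 2))
ineqs = []
ineqs.extend(zip(idxs[:, 0], idxs[:, 1]))
ineqs.extend(zip(idxs[:-1, 0], idxs[1:, 0]))
ineqs.extend(zip(idxs[:-1, 1], idxs[1:, 1]))
A = sparse.lil_array((len(ineqs), 2 * N), dtype=np.int8)
for i, (a_idx, b_idx) in enumerate(ineqs):
    A[i, a_idx] = -1
    A[i, b_idx] = 1

def constraint_logp(u):
    """0 inside the cone, -inf outside: what the Potential adds to the logp."""
    return 0.0 if np.all(A @ u.ravel() > 0) else -np.inf

# One simple way to get a valid starting point: make column 0 increasing
# and obtain column 1 by adding a positive offset.
u0 = np.stack([np.arange(N, dtype=float), np.arange(N) + 0.5], axis=1)
print(constraint_logp(u0))  # 0.0
```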
- Use the generating set of vectors (especially if N is not too big). This is an overparametrization, so if your dataset constrains the parameters `u` very well, this will probably lead to sampling issues because of posterior correlations. You can get draws of `u` using something like this (but you should probably do some prior predictive sampling to figure out good priors on `lam` and `const`):

```
lam_log = pm.Normal("lam_log", shape=len(rays))
lam = pt.exp(lam_log)
const = pm.Normal("const")
u = ((rays.T @ lam) + const * line).reshape((N, 2))
# or plain numpy to get a feel for it:
lam = np.exp(np.random.randn(len(rays)))
const = np.random.randn()
(rays.T @ lam + const * line).reshape((N, 2))
```

```
array([[-23.42348561, -19.39325472],
[-21.19383067, -12.53969962],
[-19.18474236, -0.54548926],
[-12.09226225, 11.05295243],
[ 1.92443848, 13.05133014]])
```
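The reason this parametrization can never leave the valid region is that the cone is closed under positive linear combinations. A quick standalone check with a toy 2D cone (hand-picked vectors standing in for the rays):

```python
import numpy as np

# Toy cone in R^2: x is valid iff x[0] > 0 and x[1] > x[0]
A = np.array([[ 1.0, 0.0],
              [-1.0, 1.0]])

x1 = np.array([1.0, 2.0])
x2 = np.array([0.5, 3.0])
assert np.all(A @ x1 > 0) and np.all(A @ x2 > 0)

# A positive combination of valid points is valid again, because
# A @ (a*x1 + b*x2) = a*(A @ x1) + b*(A @ x2) > 0 whenever a, b > 0.
a, b = np.exp(np.random.randn(2))
combo = a * x1 + b * x2
print(np.all(A @ combo > 0))  # True
```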

- Somehow figure out a non-linear transformation?
- Prove me wrong and show that there is a linear one after all

Personally, I’d probably try to get away with the second option if N isn’t too large, and only if that doesn’t work fall back to the first.