How to uniformly sample two variables in a lower triangular area (y<x)?

I am sampling multiple parameters associated with a 3D Gaussian:

import pymc as pm

with pm.Model() as model:

    mu_1 = pm.Uniform('mu_1', lower=0, upper=1)
    mu_2 = pm.Uniform('mu_2', lower=0, upper=1)
    mu_3 = pm.Uniform('mu_3', lower=-1, upper=1)
    std_1 = pm.Uniform('std_1', lower=0.01, upper=1.0)
    std_2 = pm.Uniform('std_2', lower=0.01, upper=1.0)
    std_3 = pm.Uniform('std_3', lower=0.01, upper=1.0)
    rho = pm.Uniform('rho', lower=-1, upper=1)

    ##### rest of the model code afterwards...

My problem is that I want mu_2 <= mu_1, whereas the above code puts independent uniform priors over 0-1 on both (so you can end up with mu_2 > mu_1). mu_3 can be anything between -1 and 1. What I want is illustrated in the figure below: if you plot mu_2 vs. mu_1, the sampled points should always fall below the one-to-one diagonal line, i.e., uniformly in the lower triangular area.

Does PyMC have such a lower triangular Uniform prior that’s easy to use to enforce mu_2 <= mu_1?

For example, can I simply use mu_1 as the upper bound for mu_2, so that in each iteration the sampler first picks mu_1 and then chooses mu_2 below it, or is that disallowed / ill-behaved? My guess is that this would require sequential sampling, whereas I want to use NUTS, so I need something else that jointly draws (mu_1, mu_2) from a uniform lower triangular support.

    mu_1 =  pm.Uniform('mu_1', lower=0,upper=1) 
    mu_2 =  pm.Uniform('mu_2', lower=0,upper=mu_1) 

Sampling from a triangle without a built-in sampler/distribution is a bit tricky. Your naive idea can actually be sampled with NUTS, but you will probably get samples biased towards the (0,0) corner: with mu_2 | mu_1 ~ Uniform(0, mu_1), the joint density is 1/mu_1, so mass piles up where mu_1 is small.

I also tried to just use the ordered transform, but it still doesn’t look uniform.

I got an interesting pattern by sampling from a uniform Dirichlet with 3 components and ignoring one of the dimensions. This is not what you wanted, but it's a triangle :smiley:
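
For illustration, a quick NumPy check (not part of the model): a flat Dirichlet is uniform on the simplex, so dropping the third coordinate leaves points uniform on a triangle.

import numpy as np

rng = np.random.default_rng(42)
# Dirichlet(1, 1, 1) is uniform on the 3-simplex; keeping only the first two
# coordinates gives points uniform on the triangle {x >= 0, y >= 0, x + y <= 1}
xy = rng.dirichlet(np.ones(3), size=10_000)[:, :2]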

The best result, though, came from sorting two unconstrained uniforms and calling that my variable.
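
A minimal forward-sampling sketch of that idea (plain NumPy, just to show the shape of the prior):

import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
# sort each pair of independent Uniform(0, 1) draws; plotting the smaller value
# against the larger one fills the triangle below the diagonal uniformly
draws = np.sort(rng.uniform(size=(10_000, 2)), axis=1)
plt.scatter(draws[:, 1], draws[:, 0], s=1)  # larger value on x, smaller on y
plt.show()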


Oh nice on the Dirichlet! How/why does that work? Does it mean I’m sampling 1 extra unnecessary variable, and will that mess up my convergence / increase runtime?

Another idea: here is a formula that takes two uniform random numbers and transforms them into a point uniformly distributed within a triangle defined by its three vertices: python - Generate random locations within a triangular domain - Stack Overflow

In our case of a lower triangle, A = (0,0), B = (1,0), C = (1,1)
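
For reference, the transformation in that answer (with r1, r2 ~ Uniform(0, 1)) is applied coordinate-wise to the vertices: P = (1 - sqrt(r1)) * A + sqrt(r1) * (1 - r2) * B + sqrt(r1) * r2 * C.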

I don’t think it actually works, but the nice looking result comes from the fact that the values can’t sum to more than one.

Ah, okay. In your case of sorting two uniforms, how do I actually access/use mu_1 and mu_2 inside the model? Would that work for me? For example, I need to do:

import pymc as pm
import pytensor.tensor as pt

with pm.Model() as m:
    x_latent = pm.Uniform("x_latent", 0, 1, shape=(2,))
    x = pm.Deterministic("x", x_latent.sort())

    ### set up and evaluate the MvNormal PDF within the rest of my model
    ### (mu_1 and mu_2 should come from x somehow; mu_3, std_*, rho defined as before)
    mu_vector = pt.stack([mu_1, mu_2, mu_3])

    ### now build the covariance matrix using the above stochastic RVs
    cov_matrix = pt.stack([[std_1**2, 0.0, rho*std_1*std_3],
                           [0.0, std_2**2, 0.0],
                           [rho*std_1*std_3, 0.0, std_3**2]])

    mvn = pm.MvNormal.dist(mu=mu_vector, cov=cov_matrix)

    # I need to evaluate the MvNormal PDF at a list of pre-defined points
    probs_params = pt.exp(pm.logp(mvn, points))


I updated my message; the Dirichlet is not what you wanted. I wonder if it can be converted to what you want by some trivial transformation.

You can use x just like you would use x_latent.

mu_1 = x[0] if I understand you correctly

ah yes, that makes sense, let me try

If you have only 3 variables you can also define them separately and then mu is something like pt.stack([pt.minimum(x0, x1), pt.maximum(x0, x1), x2]). It doesn't really matter how you combine the variables.
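
A minimal sketch of that variant with your variable names (mu_1 must be the larger of the two, so it takes the maximum):

import pymc as pm
import pytensor.tensor as pt

with pm.Model() as m:
    x0 = pm.Uniform("x0", 0, 1)
    x1 = pm.Uniform("x1", 0, 1)
    mu_3 = pm.Uniform("mu_3", -1, 1)
    # mu_1 >= mu_2 by construction: the larger uniform plays mu_1, the smaller mu_2
    mu_vector = pm.Deterministic(
        "mu_vector", pt.stack([pt.maximum(x0, x1), pt.minimum(x0, x1), mu_3])
    )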

that won’t lead to any weird convergence issues or problems with the gradient for NUTS?

That’s a good question and honestly I don’t know. I could imagine multi-modality arising easily.

The naive solution may also work just fine if the posterior is well informed.

Hmm yeah I’m specifically testing the impact of a uniform vs non-uniform prior over this triangular region in the limit of small sample sizes, so I need a uniform lower triangular sampling.

Why wouldn’t your Dirichlet idea work? Because of the extra unnecessary variable?

I’m running your sorted approach now. I will also try this formula based on re-scaling two uniform randoms: python - Generate random locations within a triangular domain - Stack Overflow

One last dumb question: in your sorted approach, the points end up in the upper triangular region, whereas I need the lower one. Is there a simple way to change x = pm.Deterministic("x", x_latent.sort()) to achieve this? Currently I am using your pt.stack([pt.min(x), pt.max(x), mu_3]) idea.

@ricardoV94 I think I may have got it based on python - Generate random locations within a triangular domain - Stack Overflow

This is just a simple algebraic transformation of two Uniform(0,1) randoms into a new point (x,y) uniformly within a triangle given its 3 vertices. For our lower triangle, the vertices are tri_A = (0,0); tri_B = (1,0); tri_C = (1,1).

    # vertices of the lower triangle (y <= x): tri_A = (0,0), tri_B = (1,0), tri_C = (1,1)
    tri_A, tri_B, tri_C = (0.0, 0.0), (1.0, 0.0), (1.0, 1.0)

    r1 = pm.Uniform('r1', lower=0, upper=1)
    r2 = pm.Uniform('r2', lower=0, upper=1)

    s1 = pt.sqrt(r1)

    # the point's x-coordinate becomes mu_1 and its y-coordinate becomes mu_2
    mu_1 = tri_A[0] * (1.0 - s1) + tri_B[0] * (1.0 - r2) * s1 + tri_C[0] * r2 * s1
    mu_2 = tri_A[1] * (1.0 - s1) + tri_B[1] * (1.0 - r2) * s1 + tri_C[1] * r2 * s1

Note that since the triangle sits below the diagonal (y <= x), assigning the x-coordinate to mu_1 and the y-coordinate to mu_2 ensures mu_2 <= mu_1 always.

NUTS was auto-assigned and is currently running.

I guess in this case r1 and r2 are latent variables?


Nevermind, the simple algebraic transformation version is taking forever to sample for some reason…

It's surprising this is so hard, since it's such a simple thing to sample two variables below the 1:1 diagonal line…

Edit: does anyone know if I can add some kind of if/else statement so that if mu_2 > mu_1, the model just returns logp = -inf for the current parameter set? Of course this wouldn't work with NUTS, but the Slice sampler is fine…

You can add a Potential, but I’m sure that will yield the same bias when sampling from the prior as your naive solution.

This need not be a problem. Your posterior draws, which are what you care about, may not be biased just because the prior ones are when using MCMC.

Sampling constrained variables with MCMC without a custom Gibbs sampler can be quite tricky. You are thinking about how easy forward/random draws are, but for MCMC sampling I wouldn't expect it to be easy.

Even forward draws can be tricky, such as when trying to draw uniformly from a disc: https://m.youtube.com/watch?v=4y_nmpv-9lI&t=276

Maybe this is helpful? Defining priors for dependent random variables in STAN - #2 by jsocolar - Modeling - The Stan Forums

The suggestion in model 1 is to model x2 as a fraction of x1. I think that doesn't cut it.

The suggestion in model 2 is to add a penalty term for lower values of x1, so that the joint probability is uniform (rather than the conditionals), which you could do with a Potential?

As mentioned, this means the marginal prior probability won't look uniform, which I think makes sense.
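
To spell out why the penalty works (as I read the Stan post): with x1 ~ Uniform(0, 1) and x2 | x1 ~ Uniform(0, x1), the joint density is 1 * (1 / x1) on the triangle, so mass piles up where x1 is small. Adding log(x1) to the model logp through a Potential multiplies the density by x1, cancelling the 1 / x1 factor and leaving the joint uniform over the triangle, at the cost of non-uniform marginals.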

The potential adjustment seems to do the trick!

import pymc as pm

with pm.Model() as m:
    x1 = pm.Uniform("x1", 0, 1)
    x2 = pm.Uniform("x2", 0, x1)
    # Penalize low x1, so that joint of x1,x2 is uniform
    pm.Potential("pot", pm.math.log(x1)) 
    x = pm.Deterministic("x", pm.math.stack([x1, x2]))
    idata = pm.sample()

pm.plots.plot_pair(idata, var_names=["x"], marginals=True);

THANK YOU @ricardoV94 ! I will start a run with this now.

My actual model assumes a simple Poisson likelihood at the end – do I need to do anything else to my model with-block for it to take into account the new Potential?

Also, can Potentials introduce convergence/divergence/slow-down issues with NUTS?

    mu_1 = pm.Uniform('mu_1', lower=0, upper=1)
    mu_2 = pm.Uniform('mu_2', lower=0, upper=mu_1)
    # Ricardo's Potential idea: penalize small mu_1 so the joint (mu_1, mu_2) is uniform
    pm.Potential("pot", pm.math.log(mu_1))

    mu_3 = pm.Uniform('mu_3', lower=-1,upper=1)
    std_1 = pm.Uniform('std_1',lower=0.01,upper=1.0) 
    std_2 = pm.Uniform('std_2',lower=0.01,upper=1.0)
    std_3 = pm.Uniform('std_3',lower=0.01,upper=1.0)
    rho = pm.Uniform('rho', lower=-1,upper=1) 

    # use current sample of mu and std's to evaluate MvNormal at some pre-defined points
    mu_vector = pt.stack([mu_1,mu_2,mu_3])

    cov_matrix = pt.stack([[std_1**2, 0.0, rho*std_1*std_3], 
                           [0.0, std_2**2, 0.0],
                           [rho*std_1*std_3, 0.0, std_3**2]])    
    
    mvn = pm.MvNormal.dist(mu=mu_vector,cov=cov_matrix)

    probs_params = pt.exp(pm.logp(mvn,points))

    ### Other model code here to compute expected value for Poisson ### 

    # finally: define Poisson likelihood
    z_obs = pm.Poisson('z_obs', mu=mu_expected, observed=data)

    ### does Potential "pot" have to be used anywhere else after being defined?

No, it should work fine and you don't need to use the Potential anywhere else.
