Fitting multiple cones to 2D data: how to impose an ordering constraint on parameters

I am trying to model data that comes in the form of “two cones”. You can imagine this as two cones with different negative slopes and heights positioned on an x,y grid, where at every (x, y) the observed value is the maximum over the two cones plus some noise (so something like the maximum of two mountains over a grid). Everything works well when it is just a single cone, but when I tried two cones it initially did not work very well. Looking at the posteriors revealed the problem: since there was no ordering constraint, in some chains the first apex-coordinate parameter would sample around the first real apex and in other chains around the second one (label switching).

So next I tried to impose an order on the x variables (so the x coordinate of the second cone is the first one plus a positive offset). Now some of the chains sample as expected (ac0 samples the apex coordinate with the smaller x, etc.) while other chains are stuck in what is possibly a local optimum where both apex x coordinates are very close to each other. Is there a more efficient way to re-parametrize this model?

In fact the most efficient approach seems to be to run the unordered model and then go through each chain and permute the parameters so that the chains agree with each other (a sketch of this is shown after the code below). However, that seems impractical for a large number of cones, so I am wondering if there is a way to do it more efficiently with better priors?

import pymc as pm
import arviz as az
import numpy as np
import pytensor.tensor as ptt
from scipy.spatial.distance import cdist


def fit_cones(ncones, observed_vals, measurement_coordinates):


  with pm.Model():

    sd = pm.InverseGamma("sd", 2, 6)

    x0 = pm.Uniform("x0", -7, 7, size=(1,))
    offsets = pm.Uniform("offsets", 0, 12, size=(ncones-1,))

    xcoordinates = ptt.concatenate([x0, x0+offsets])
    ycoordinates = pm.Normal("ycoordinates", np.zeros((ncones,)),
                             10, shape=(ncones, ))

    apex_coordinates = pm.Deterministic(
      "apex_coordinates",
      ptt.concatenate([xcoordinates[:, None],
                       ycoordinates[:, None]], axis=1))

    heights = pm.Uniform("heights", np.zeros((ncones,)),
                         10*np.ones((ncones,)), shape=(ncones,))

    #shift by 0.1 so the slopes are bounded away from zero
    slopes = 0.1 + pm.HalfNormal("slopes", np.ones((ncones,)), shape=(ncones,))

    #difference between apex_coordinates and all the measurement_coordinates
    #apex_coordinates is broadcast to ncones x 1 x ndims
    #measurement_coordinates is broadcast to 1 x npoints x ndims
    dif_c = apex_coordinates[:,None,:] - measurement_coordinates[None, :, :]

    #distance between apex coordinates and all the measurement_coordinates
    #norm_dif_c is ncones x npoints
    norm_dif_c = pm.math.sqrt(pm.math.sqr(dif_c[:,:,0])
                              + pm.math.sqr(dif_c[:,:,1]))


    fitted_y = heights[:, None] - slopes[:, None]*norm_dif_c

    #fitted_y is npoints
    fitted_y = fitted_y.max(axis=0)

    #observed_vals is nrepeats x npoints
    pm.Normal("likelihood", mu=fitted_y[None,:], sigma=sd, observed=observed_vals)

    trace = pm.sample(draws=2000, tune=1000, chains=6)


  return trace


def make_cone(s, h, acs, coordinates):

  #height of the tallest cone surface at each coordinate
  dists = cdist(acs, coordinates)
  cones = h[:, None] - dists*s[:, None]

  return np.max(cones, axis=0)


np.random.seed(139111)
apex = np.array([[0,0],[5,-2]]).T #unused; the true apexes are drawn from the measurement grid below

ncones = 2
npoints = 100
nrepeats = 10
real_sd = 1

measurement_coordinates = np.random.rand(npoints,2)*10 - 5
s = np.random.rand(ncones)*0.25 + 0.75 #slopes
h = np.random.uniform(3,8, (ncones,)) #heights
acs = measurement_coordinates[np.random.randint(0, npoints, ncones), :] #randint's high is exclusive

real_vals = make_cone(s, h, acs, measurement_coordinates)
observed_vals = real_vals[None,:] + np.random.normal(0, real_sd, size=(nrepeats,npoints))

trace = fit_cones(ncones, observed_vals, measurement_coordinates)
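
For reference, the per-chain relabeling workaround I mentioned would look roughly like this (an untested sketch; it assumes an unordered variant of the model where every cone-indexed variable has the cone dimension right after chain/draw):

def relabel_chains(idata, cone_vars=("apex_coordinates", "heights", "slopes")):

  #permute each chain's cone-indexed variables so that cones are sorted
  #by that chain's posterior-mean apex x coordinate
  post = idata.posterior
  for chain in post.chain.values:
    mean_x = post["apex_coordinates"].sel(chain=chain).mean("draw").values[:, 0]
    order = np.argsort(mean_x)
    for name in cone_vars:
      vals = post[name].sel(chain=chain).values
      post[name].loc[dict(chain=chain)] = vals[:, order, ...]

  return idata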

There is a transformation, pm.distributions.transforms.univariate_ordered, that I think is what you are looking for. You could use it as follows:

with pm.Model() as m:
    x0 = pm.Uniform("x0", -7, 7, size=(1,))
    offsets = pm.Uniform("offsets", 0, 12,
                         size=(ncones-1,),
                         transform=pm.distributions.transforms.univariate_ordered,
                         # strictly increasing start inside (0, 12)
                         initval=np.arange(1, ncones))
    xcoordinates = ptt.concatenate([x0, x0+offsets])

Some gotchas:

  1. You probably have to set the initial values of the sampler yourself, because the defaults won’t respect the ordering constraint and you’ll get -inf log probability (a minimal initvals sketch follows the example below).
  2. The transform does not apply to samples generated by means that aren’t MCMC, i.e. pm.sample_prior_predictive and pm.sample_posterior_predictive. This can be illustrated as follows:
with m:
    prior = pm.sample_prior_predictive()
    idata = pm.sample()

# Check monotonicity by taking differences along the offsets dimension and
# checking that everything is strictly positive
(prior.prior.offsets.diff(dim='offsets_dim_0') > 0).all()
>>> Out: False

(idata.posterior.offsets.diff(dim='offsets_dim_0') > 0).all()
>>> Out: True

Hmm, thanks, I was not aware of this transformation. If I am going to use it, though, shouldn’t I apply it to x directly? If ncones=2 then offsets is just a one-dimensional vector anyway, so wouldn’t it be better to get rid of offsets and set up a prior for the x coordinates directly using the univariate_ordered transformation?
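
i.e. something like this (an untested sketch; the initval is just a strictly increasing starting point inside the bounds):

xcoordinates = pm.Uniform("xcoordinates", -7, 7, size=(ncones,),
                          transform=pm.distributions.transforms.univariate_ordered,
                          initval=np.linspace(-1, 1, ncones))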

Sure. I thought you were doing some kind of partial pooling, so I put it on the offsets. Whatever makes sense for your model is the right way to go.

Works nicely, thanks!


One more related question: how would you achieve the same effect for discrete variables? It seems that this transformation only works for continuous variables (I get the somewhat vague error ValueError: Transformations for discrete distributions).

Transformations for discrete variables aren’t supported; that’s what the cryptic error is referring to. If the variable is strictly positive, you could just take its .cumsum()? Internally, the transform just does something like np.exp(x).cumsum().
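
For example, a sketch with hypothetical Poisson step sizes (any strictly positive steps give strictly increasing cumulative sums):

# steps >= 1, so the cumulative sums are strictly increasing
steps = pm.Poisson("steps", mu=2, size=(ncones,)) + 1
ordered_values = pm.Deterministic("ordered_values", ptt.cumsum(steps))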

1 Like

Unfortunately cumsum would not work, since I am using the discrete variable as indices; it is something of the form

apex_indices = pm.Categorical("apex_indices", p=np.ones((npoints,))/npoints,
                              size=(ncones,))

a generalization of the model above to the case where there are not many measurement points and I have a pretty good guess that each apex is one of them.

Are you saying the cumsum casts the variables into float and then you can’t index with them? If that’s the case, just cast the output of cumsum back to integer with cumsum(x).astype(int)

No, what I mean is that since I am using apex_indices as indices, say something of the form

measurement_coordinates[apex_indices,:]

cumsum can exceed the bounds of the measurement_coordinates array. As an example, let’s say measurement_coordinates is 6 x 2 and apex_indices samples two independent numbers from 0 to 5 and gets [2, 5]; then the cumsum would be [2, 7], with the last index exceeding the array. I guess I could try capping it with pm.math.minimum, but I wonder if there is a less hacky way to get this done. One thing that comes to mind is to make the array circular, but that defeats the purpose of the ordering constraint completely 🙂

You often need to clip when sampling indices anyway, see here.
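
Something along these lines (an untested sketch, reusing the cumsum-of-steps idea from above):

# keep the cumulative indices inside the valid range [0, npoints - 1]
raw_indices = ptt.cumsum(steps)
safe_indices = pm.math.clip(raw_indices, 0, npoints - 1)
apexes = ptt.as_tensor_variable(measurement_coordinates)[safe_indices, :]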

Yeah, a similar thing seems to have happened to me with DiscreteUniform, but Categorical was working fine. Interesting. Anyway, if there is no other way I guess I will go this route then, thanks.

An alternative is to model the “fractions” directly with a Dirichlet and then scale and discretize them so they can be used as indices. That avoids the ordering and bounds issues, and the parameters are a bit more interpretable as well.


Is this what you were suggesting (I used Uniform instead of Dirichlet for starters)?

index_proportions = pm.Uniform("index_proportions", 0, 1,
                               transform=pm.distributions.transforms.multivariate_ordered,
                               initval=[0.4, 0.6], size=2)

#N is the number of candidate measurement points (npoints above)
indices = pm.Deterministic("indices",
                           pm.math.floor(N*index_proportions))

This seems to achieve the effect I wanted (the first index has most of its weight at 0 and decreases as the index grows, and vice versa for the other), at least almost always. Since there is rounding involved, even if the index_proportions satisfy x < y it is possible that pm.math.floor(N*x) = pm.math.floor(N*y). I guess when there are only two indices one can floor the first and ceil the other, but that would not work beyond two indices. In any case, I can live with the off chance of the indices being equal once in a while.

I also tried Dirichlet as such:

index_proportions = pm.Dirichlet("index_proportions", a=[1/4,3/4],
                                   transform=pm.distributions.transforms.multivariate_ordered,
                                   initval=[1/4,3/4])

but this was sampling almost exclusively 0 for the first index and other values for the second.

For the Dirichlet you shouldn’t need the ordering transform; instead, sample one more entry and do a cumsum before scaling and discretizing. If you have ten entries to select from, the Dirichlet would give you something like [0.3, 0.2, 0.5]. Cumsum it and you get [0.3, 0.5, 1.0]. You always ignore the last entry (it is always one), and this gives you indices 3 and 5 after you multiply by 10 and round.

Careful with the a parameter: if the entries are smaller than one you get a funny sparse prior (which is why you may have gotten only zeros). To have a prior of 3/10, 2/10, 5/10, you would use a=np.array([3, 2, 5]) * concentration, with concentration > 1. Higher values of concentration correspond to more informative priors. A uniform prior would simply be [1, 1, 1].
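
Putting that together, an untested sketch for your model (ncones indices chosen from npoints candidate points; the variable names are just placeholders):

concentration = 10  #> 1; larger means a more informative prior
fractions = pm.Dirichlet("fractions", a=np.ones(ncones + 1) * concentration)
#cumsum gives increasing break points in (0, 1]; the last entry is always 1
breaks = ptt.cumsum(fractions)[:-1]
#scale to the number of candidate points and discretize into ordered indices
indices = pm.Deterministic("indices",
                           ptt.floor(breaks * npoints).astype("int64"))
apexes = ptt.as_tensor_variable(measurement_coordinates)[indices, :]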


Got it, thanks, this works very well too. And thanks for the tip on the Dirichlet!
