Order statistics in PyMC3

Has anyone here done work on modelling order statistics within PyMC3? Suppose my data comes in a set of rankings (or real numbers) for n items X_1 < X_2 < … < X_n over multiple observations and im interested in determining the distribution of the ranking itself such as:

  1. given item i, what is the distribution of its ranking?
  2. what is the marginal distribution of X_i where i is the ith smallest item?

Has anyone done any work on this and would like to point me in the right direction?

1 Like

I have not fit this kind of model before, but you can try to replicate the solution in Stan http://discourse.mc-stan.org/t/thurstonian-model/1735/5

You might need the ordered transformed from here.

1 Like

OK, I cooked up something that kind of work:

import numpy as np
import pymc3 as pm
import theano.tensor as tt

# data
K = 5 # number of items being ranked
J = 20 # number of raters
yreal = np.argsort(np.random.randn(K, 1), axis=0)
print(yreal)
y = np.argsort(yreal + np.random.randn(K, J), axis=0)
print(y)

class Ordered2D(pm.distributions.transforms.ElemwiseTransform):
    name = "ordered"

    def forward(self, x):
        out = tt.zeros(x.shape)
        out = tt.inc_subtensor(out[0,:], x[0,:])
        out = tt.inc_subtensor(out[1:,:], tt.log(x[1:,:] - x[:-1,:]))
        return out
    
    def forward_val(self, x, point=None):
        x, = pm.distributions.distribution.draw_values([x], point=point)
        return self.forward(x)

    def backward(self, y):
        out = tt.zeros(y.shape)
        out = tt.inc_subtensor(out[0,:], y[0,:])
        out = tt.inc_subtensor(out[1:,:], tt.exp(y[1:,:]))
        return tt.cumsum(out, axis=0)

    def jacobian_det(self, y):
        return tt.sum(y[1:,:], axis=0)

# transformed data{
y_argsort = np.argsort(y, axis=0)

with pm.Model():
    mu_hat = pm.Normal('mu_hat', 0, 1, shape=K-1)
    # set first value to 0 to avoid unidentified model
    mu = tt.concatenate([[0.], mu_hat])
    #sd = pm.HalfCauchy('sigma', 1.)
    latent = pm.Normal('latent',
                       mu=mu[y_argsort],
                       sd=1., # using sd does not work yet
                       transform=Ordered2D(), 
                       shape=(K,J),
                       testval=np.repeat(np.arange(K)[:,None], J, axis=1))
                        # There are some problems with Ordered 
                        # right now, you need to give a testval
    trace = pm.sample()
pm.traceplot(trace, varnames=['mu_hat']);

print('The true ranking is: ')
print(yreal.flatten())
print('The Latent mean is: ')
latentmu = np.hstack(([0], pm.summary(trace, varnames=['mu_hat'])['mean'].values))
print(np.round(latentmu,2))
print('The estimated ranking is: ')
print(np.argsort(latentmu))

The Jacobian of the Ordered transformation might not be correct (awaiting for @aseyboldt to confirm as he mentioned to me once).

3 Likes

Thank you, I will check out both, we should get a version of the thurstonian model into our example notebooks repo.

1 Like

@junpenglao
Hello Thanks for this example. I am actually trying to reproduce this.
I got this error. pymc v3.8 on Window machine…

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-37-5266fc4ced7f> in <module>
     29     mu = tt.concatenate([[0.], mu_hat])
     30     #sd = pm.HalfCauchy('sigma', 1.)
---> 31     latent = pm.Normal('latent',
     32                        mu=mu[y_argsort],
     33                        sd=1., # using sd does not work yet

~\anaconda3\envs\building_normalization\lib\site-packages\pymc3\distributions\distribution.py in __new__(cls, name, *args, **kwargs)
     45             total_size = kwargs.pop('total_size', None)
     46             dist = cls.dist(*args, **kwargs)
---> 47             return model.Var(name, dist, data, total_size)
     48         else:
     49             raise TypeError("Name needs to be a string but got: {}".format(name))

~\anaconda3\envs\building_normalization\lib\site-packages\pymc3\model.py in Var(self, name, dist, data, total_size)
    924             else:
    925                 with self:
--> 926                     var = TransformedRV(name=name, distribution=dist,
    927                                         transform=dist.transform,
    928                                         total_size=total_size,

~\anaconda3\envs\building_normalization\lib\site-packages\pymc3\model.py in __init__(self, type, owner, index, name, distribution, model, transform, total_size)
   1653             transformed_name = get_transformed_name(name, transform)
   1654 
-> 1655             self.transformed = model.Var(
   1656                 transformed_name, transform.apply(distribution), total_size=total_size)
   1657 

~\anaconda3\envs\building_normalization\lib\site-packages\pymc3\model.py in Var(self, name, dist, data, total_size)
    919             if getattr(dist, "transform", None) is None:
    920                 with self:
--> 921                     var = FreeRV(name=name, distribution=dist,
    922                                  total_size=total_size, model=self)
    923                 self.free_RVs.append(var)

~\anaconda3\envs\building_normalization\lib\site-packages\pymc3\model.py in __init__(self, type, owner, index, name, distribution, total_size, model)
   1368             self.tag.test_value = np.ones(
   1369                 distribution.shape, distribution.dtype) * distribution.default()
-> 1370             self.logp_elemwiset = distribution.logp(self)
   1371             # The logp might need scaling in minibatches.
   1372             # This is done in `Factor`.

~\anaconda3\envs\building_normalization\lib\site-packages\pymc3\distributions\transforms.py in logp(self, x)
    166         if logp_nojac.ndim > jacobian_det.ndim:
    167             logp_nojac = logp_nojac.sum(axis=-1)
--> 168         return logp_nojac + jacobian_det
    169 
    170     def logp_nojac(self, x):

~\anaconda3\envs\building_normalization\lib\site-packages\theano\tensor\var.py in __add__(self, other)
    126     def __add__(self, other):
    127         try:
--> 128             return theano.tensor.basic.add(self, other)
    129         # We should catch the minimum number of exception here.
    130         # Otherwise this will convert error when Theano flags

~\anaconda3\envs\building_normalization\lib\site-packages\theano\gof\op.py in __call__(self, *inputs, **kwargs)
    672                 thunk.outputs = [storage_map[v] for v in node.outputs]
    673 
--> 674                 required = thunk()
    675                 assert not required  # We provided all inputs
    676 

~\anaconda3\envs\building_normalization\lib\site-packages\theano\gof\op.py in rval()
    860 
    861         def rval():
--> 862             thunk()
    863             for o in node.outputs:
    864                 compute_map[o][0] = True

~\anaconda3\envs\building_normalization\lib\site-packages\theano\gof\cc.py in __call__(self)
   1737                 print(self.error_storage, file=sys.stderr)
   1738                 raise
-> 1739             reraise(exc_type, exc_value, exc_trace)
   1740 
   1741 

~\anaconda3\envs\building_normalization\lib\site-packages\six.py in reraise(tp, value, tb)
    701             if value.__traceback__ is not tb:
    702                 raise value.with_traceback(tb)
--> 703             raise value
    704         finally:
    705             value = None

ValueError: Input dimension mis-match. (input[0].shape[0] = 5, input[1].shape[0] = 20)

No worries. I got this in this way.

import numpy as np
import pymc3 as pm
import theano.tensor as tt

# data
K = 5 # number of items being ranked
J = 20 # number of raters
yreal = np.argsort(np.random.randn(K, 1), axis=0)
print(yreal)
y = np.argsort(yreal + np.random.randn(K, J), axis=0)
print(y)
# transformed data{
y_argsort = np.argsort(y, axis=0)

oorder=pm.distributions.transforms.Ordered()
with pm.Model():
    mu_hat = pm.Normal('mu_hat', 0, 1, shape=K-1)
    # set first value to 0 to avoid unidentified model
    mu = tt.concatenate([[0.], mu_hat])
    #sd = pm.HalfCauchy('sigma', 1.)
#     latent = pm.Normal('latent',
#                        mu=mu[y_argsort],
#                        sd=1., # using sd does not work yet
#                        transform=Ordered2D(), 
#                        shape=(K,J),
#                        testval=np.repeat(np.arange(K)[:,None], J, axis=1))
#                         # There are some problems with Ordered 
#                         # right now, you need to give a testval
    latent_probs=[]
    for j in np.arange(J):
        _latent=pm.Normal('latent%i' %j, mu[y_argsort[:,j]],1,shape=K,transform=oorder,testval=np.arange(K))
        latent_probs.append(_latent)
    #_latent=pm.Normal('latent%i' %j, mu[y_argsort[:,j]],1,shape=,transform=oorder,testval=np.arange(K))
    latent=pm.Deterministic("latent",tt.stack(latent_probs,axis=1))
    trace = pm.sample(init="adapt_diag")

You should avoid for loop as it grow the graph size in theano (and make it slow).
Here you can fix the example by changing the jacobian_det to

    def jacobian_det(self, y):
        return tt.sum(y[1:,:], axis=0, keepdims=True)  # <== keepdims to make sure broadcasting happens correctly
1 Like

I’m having trouble understanding how the model, which doesn’t have any observed RVs, is able to fit to the data

Can you help me understand this, @junpenglao ?

The model still conditioned on the information from the data with mu=mu[y_argsort], combining with ordered transformation (the latent mu need to satisfy the ordering condition after indexing) is how it fit to the data.

1 Like

Thank you so much for your response

Am I understanding correctly that the “ordered” constraint on the “latent” causes the “mu_hat” to prefer an ordering that the data finds most agreeable? i.e. if mu_hat isn’t ordered optimally, the likelihood of the samples from “latent” will be less than under the optimal ordering?

Yep - in another word the ordering is part of the likelihood that informs the posterior of latent

1 Like

Wow thanks again for the quick reply

I want to understand really well how this works, because it was puzzling me how to handle this and the lack of “observed” variable had me confused.

If I understand the “Ordered2D” transform intuitively, it basically takes the first latent dimension from “latent” as is, and then the [1:] dimensions from latent get transformed so that they represent the increase in magnitude from one dimension to the next one. Then, even though my mu_hat samples might be out of order, the “latent” samples will be ordered.

So my question is, why does the model care about the possible mis-ordering of mu_hat, if the Ordered2D transform by definition corrects this?

Sorry if my question isn’t making any sense! I’ve appreciated your help greatly so far

mu_hat is not ordered - they are priors. So you can say the model does not care about the mis-ordering of mu_hat.

Sorry. My message wasn’t very clear

I know mu_hat isn’t ordered, but the model prefers the correct ordering of mu_hat in that it prefers mu_hat to have the optimal estimate of “yreal”. i.e. the items ranked have values in mu_hat ordered in the same way as yreal once the model is fit.

When we sample mu_hat, and index it (mu[y_argsort]), we use the sampled mu_hat values as the means for the K x J normal prior “latent” from which we sample and apply the Ordered2D transform to obtain the per_rater latent values. If I understand correctly, we then score the ordered sample from “latent” in terms of the probability of such a sample under the prior. We also score the parent mu_hat sample based on the mu_hat prior.

I’m having trouble understanding how mu_hat comes to learn about “yreal” when by definition every sample from “latent” will be ordered. It’s clear to me that somehow the model penalizes the distance from the correct values for mu_hat, but I still don’t quite understand how.

Are there any other models that can be trained in such a way, or further explanation I can find for this type of model?

I had a lot of fun trying to figure this out so far and you have helped me a lot to understand how to handle ranked data in bayesian models