How to set the covariance matrix in hierarchical correlated t-test

I'm trying to implement the Bayesian hierarchical correlated t-test from Corani et al. (see: https://link.springer.com/content/pdf/10.1007%2Fs10994-017-5641-9.pdf), in which they build a generative model describing the generation of an array of performance measures for two classifiers across multiple datasets.
In short, the model looks as follows.
Given an array of observations of shape (n_datasets, n_obs), where each column holds the performance measurement for a given train/test split and each row holds the observations for a given dataset, the generative model is:
Assume 3 datasets and 5 observations of performance, i.e. arr.shape=(3, 5).
For each dataset (i.e. each row) we have:

x ~ MVN(mu, cov)
mu ~ t(mu_0, sigma_0, nu)
sigma ~ unif(0, 100),

where x is the 5-dimensional vector of observations for the given dataset. The observations are jointly multivariate-normally distributed with a common mean mu, a common variance sigma^{2}, and a common correlation p.
More specifically, the mean is a vector of size 5 whose elements are all equal to mu, and the covariance matrix has diagonal elements sigma^{2} and off-diagonal elements p * sigma^{2}, with the correlation p fixed a priori.
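
Written out for n_obs = 5, the mean vector and covariance matrix described above are:

$$
\boldsymbol{\mu} = (\mu, \mu, \mu, \mu, \mu)^{T}, \qquad
\Sigma = \sigma^{2} \begin{pmatrix}
1 & p & p & p & p \\
p & 1 & p & p & p \\
p & p & 1 & p & p \\
p & p & p & 1 & p \\
p & p & p & p & 1
\end{pmatrix}
$$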

My question is: how do I define such a covariance matrix, i.e. one built from a random variable sampled from a uniform distribution and the given fixed correlation? I would like to sample sigma and then build the covariance matrix, which is then fed to the multivariate-normal likelihood. Would a good way to do this be:

# diagonal elements sigma^2, off-diagonal elements p * sigma^2 (math here is pymc3.math)
cov = (math.sqr(sigma) * np.eye(n_obs)
       + p * math.sqr(sigma) * (np.ones((n_obs, n_obs)) - np.eye(n_obs)))
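
For concreteness, this is roughly how I imagine wiring it up for a single dataset (a sketch only; the prior on mu is simplified here, and arr, p and n_obs are placeholders):

import numpy as np
import pymc3 as pm

arr = np.random.rand(3, 5)      # stand-in for the real (n_datasets, n_obs) scores
n_obs = arr.shape[1]
p = 1.0 / 5.0                   # correlation fixed a priori (placeholder value)

with pm.Model() as single_dataset_model:
    mu = pm.Normal('mu', mu=0.0, sd=1.0)                # simplified; the paper uses a Student-t prior
    sigma = pm.Uniform('sigma', lower=0.0, upper=100.0)

    # diagonal sigma^2, off-diagonal p * sigma^2
    cov = pm.math.sqr(sigma) * (np.eye(n_obs) + p * (np.ones((n_obs, n_obs)) - np.eye(n_obs)))

    x = pm.MvNormal('x', mu=mu * np.ones(n_obs), cov=cov, observed=arr[0])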

This is my first question on this forum so apologies if I’m not formulating the question in the correct manner, happy to amend if required.
Thanks!

I think your code snippet is going in the right direction; however, I would recommend comparing your implementation with the Stan code: https://github.com/BayesianTestsML/tutorial/blob/master/hierarchical/stan/hierarchical-t-test.stan
(you will need to translate the Stan code into PyMC3 first; feel free to follow up)

Nice, thank you. I've ended up using Theano's Cholesky decomposition and feeding that in, which seems to work nicely.
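
Concretely, something like this (continuing the single-dataset sketch above, with the same cov, mu and arr):

from theano.tensor.slinalg import cholesky

chol = cholesky(cov)            # lower-triangular factor of the covariance
x = pm.MvNormal('x', mu=mu * np.ones(n_obs), chol=chol, observed=arr[0])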

Looking at the implementation in Stan, it seems like they are able to vectorise the operation across the different datasets (i.e. vectorise the computation of the likelihood over the n datasets), while I'm currently running a for loop over each dataset, which makes it pretty slow. Would you have any suggestions on how best to go about vectorising across the datasets?

Finally, the model contains a Gamma prior with associated uniform hyperpriors for the alpha and beta parameters. However, I'm getting a fairly large MC error on the inference (specifically for the Gamma-distributed parameter) because of this: the posterior over alpha and beta doesn't behave nicely. Is this, by any chance, a common issue with uniform distributions due to the truncation at the boundaries when running NUTS? (Interestingly, though, the model seems to give me good results, maybe indicating that such hyperpriors are not strictly necessary for the data I'm using.)

You can vectorize in a similar way in Theano, see e.g. https://docs.pymc.io/notebooks/PyMC3_tips_and_heuristic.html
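
Roughly, a scan-based version could look like this (a sketch only; mu and sigma are assumed to be (n_datasets,)-shaped random variables already defined inside the model, arr is the (n_datasets, n_obs) data array, and rho and n_obs are the fixed correlation and number of observations):

import theano
import theano.tensor as tt
import pymc3 as pm

def dataset_logp(x_row, mu_d, sigma_d):
    # equicorrelation covariance for one dataset: sigma_d^2 on the diagonal, rho * sigma_d^2 elsewhere
    cov = tt.sqr(sigma_d) * ((1.0 - rho) * tt.eye(n_obs) + rho * tt.ones((n_obs, n_obs)))
    return pm.MvNormal.dist(mu=mu_d * tt.ones(n_obs), cov=cov).logp(x_row)

logps, _ = theano.scan(dataset_logp,
                       sequences=[tt.as_tensor_variable(arr), mu, sigma])
pm.Potential('loglike', tt.sum(logps))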

Yes, especially if the posterior density is concentrated near the boundary: after the transformation you end up with a lot of extreme (large) values. A general strategy here is to assign a more informative prior (avoid Uniform). See for example the Prior Choice Recommendations page on the stan-dev/stan GitHub wiki.
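
For instance (a fragment inside the model context; theta here just stands for whatever parameter carries the Gamma prior, and the scales are arbitrary):

alpha = pm.HalfNormal('alpha', sd=2.0)               # instead of pm.Uniform('alpha', ...)
beta = pm.HalfNormal('beta', sd=1.0)                 # instead of pm.Uniform('beta', ...)
theta = pm.Gamma('theta', alpha=alpha, beta=beta)    # the Gamma-distributed parameter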


Thanks for the prompt reply!
About the vectorisation:
If I understand correctly, what you propose is to write a custom likelihood function which aggregates the independent likelihoods, running them with the scan function and finally summing the log-likelihoods. I'll try to do that. Should it not be possible, however, to use the MVN distribution with a given shape parameter and the corresponding multidimensional location and scale parameters?

Thanks for the tip on the priors, I’ll see whether that helps!

I've managed to get a 98% reduction in sampling time by defining my own custom likelihood function and doing some linear algebra. Thanks for the tips!
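
For anyone trying something similar, one possible form of such a custom likelihood exploits the closed-form inverse and determinant of an equicorrelation covariance matrix, so the per-dataset log-likelihoods can be computed without building or factorising any covariance matrices. A simplified sketch (placeholder priors, and not necessarily identical to what I actually used):

import numpy as np
import pymc3 as pm
import theano.tensor as tt

arr = np.random.rand(3, 5)          # stand-in for the real (n_datasets, n_obs) scores
n_datasets, n_obs = arr.shape
rho = 1.0 / 5.0                     # correlation fixed a priori (placeholder value)

with pm.Model() as model:
    # placeholder priors; the full model puts a Student-t prior on mu with its own hyperpriors
    mu = pm.Normal('mu', mu=0.0, sd=1.0, shape=n_datasets)
    sigma = pm.Uniform('sigma', lower=0.0, upper=100.0, shape=n_datasets)

    d = arr - mu[:, None]           # residuals, shape (n_datasets, n_obs)
    s2 = tt.sqr(sigma)              # per-dataset variance
    c = 1.0 + (n_obs - 1) * rho

    # closed-form log-determinant and quadratic form of sigma^2 * ((1 - rho) * I + rho * J)
    logdet = n_obs * tt.log(s2) + (n_obs - 1) * np.log(1.0 - rho) + np.log(c)
    quad = (tt.sum(tt.sqr(d), axis=1)
            - rho / c * tt.sqr(tt.sum(d, axis=1))) / (s2 * (1.0 - rho))

    # one multivariate-normal log-likelihood term per dataset, summed over datasets
    loglik = -0.5 * (n_obs * np.log(2.0 * np.pi) + logdet + quad)
    pm.Potential('loglik', tt.sum(loglik))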


Linear algebra never disappoints :wink: