I’m trying to implement the Bayesian hierarchical correlated t-test from Corani et al. (see: https://link.springer.com/content/pdf/10.1007%2Fs10994-017-5641-9.pdf), in which they build a generative model describing the generation of an array of performance measures for two classifiers across multiple datasets.
In short, the model looks as follows.
We have an array of observations of shape (n_datasets, n_obs), where each column holds the performance measurements for a given train/test split and each row holds the observations for a given dataset.
Assume 3 datasets and 5 observations of performance, i.e. arr.shape=(3, 5).
For each dataset (i.e. each row) we have:
x ~ MVN(mu, cov)
mu ~ t(mu_0, sigma_0, nu)
sigma ~ Unif(0, 100)
where x is the vector of observations (i.e. 5-dimensional) for the given dataset, which are jointly multivariate-normally distributed with the same mean, mu, the same variance, sigma^2, and the same correlation, p.
More specifically, the mean is a vector of size 5 whose elements are all equal to mu, and the covariance matrix has diagonal elements sigma^2 and off-diagonal elements p * sigma^2, with the correlation, p, fixed a priori.
My question is: how do I define such a covariance matrix, i.e. one based on a random variable sampled from a uniform distribution together with the given, fixed correlation? I would like to be able to sample sigma and then build the covariance matrix, which is then fed to the multivariate normal likelihood. Would something like the sketch below be a good way to do this?
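Here is roughly what I have in mind, as a single-dataset sketch only; the data, the fixed correlation rho and the hyperparameters mu_0, sigma_0, nu_0 are placeholders, not the values from the paper:

```python
import numpy as np
import pymc3 as pm
import theano.tensor as tt

# placeholder data and hyperparameters, just to make the sketch runnable
arr = np.random.randn(3, 5)        # (n_datasets, n_obs)
n_obs = arr.shape[1]
rho = 0.1                          # fixed correlation, chosen a priori
mu_0, sigma_0, nu_0 = 0.0, 1.0, 5.0

with pm.Model() as single_dataset_model:
    # priors as in the model description above
    mu = pm.StudentT("mu", nu=nu_0, mu=mu_0, sigma=sigma_0)
    sigma = pm.Uniform("sigma", lower=0, upper=100)

    # covariance: sigma^2 on the diagonal, rho * sigma^2 off the diagonal
    cov = sigma ** 2 * ((1.0 - rho) * tt.eye(n_obs) + rho * tt.ones((n_obs, n_obs)))

    # multivariate normal likelihood for one row of arr
    x = pm.MvNormal("x", mu=mu * tt.ones(n_obs), cov=cov, observed=arr[0])
```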
Nice, thank you. I’ve ended up using Theano’s Cholesky decomposition and feeding the resulting factor to the MvNormal, which seems to work nicely.
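For reference, the relevant bit looks roughly like this (continuing the sketch above, so cov, mu, n_obs and arr are the same placeholders):

```python
from theano.tensor.slinalg import cholesky

# inside the model context from the sketch above: pass the (lower-triangular)
# Cholesky factor of the covariance instead of the full matrix (replaces the cov= line)
chol = cholesky(cov)
x = pm.MvNormal("x", mu=mu * tt.ones(n_obs), chol=chol, observed=arr[0])
```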
Looking at the Stan implementation, it seems they are able to vectorise the computation of the likelihood over the n datasets, whereas I’m currently running a for loop over each dataset, which makes things pretty slow. Would you have any suggestions on how best to go about vectorising across the datasets?
Finally, the model contains a Gamma prior with uniform hyperpriors on its alpha and beta parameters. I’m, however, getting a fairly large MC error on the inference (specifically for the Gamma-distributed parameter) because the posterior over alpha and beta doesn’t behave nicely. Is this, by any chance, a common issue with uniform distributions, due to the truncation at the boundaries of the uniform, when running NUTS? (Interestingly, though, the model seems to give me good results, which may indicate that such hyperpriors are not strictly necessary for the data I’m using.)
Yes, especially if the posterior density is concentrated near the boundary: after the transformation you end up with a lot of extreme (large) values. A general strategy here is to assign a more informative prior (avoid Uniform). See for example Prior Choice Recommendations · stan-dev/stan Wiki · GitHub.
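For instance, something along these lines instead of the flat hyperpriors (just an illustrative swap with arbitrary scales, not a recommendation of specific values):

```python
import pymc3 as pm

with pm.Model():
    # instead of e.g. alpha = pm.Uniform("alpha", 0, 100), beta = pm.Uniform("beta", 0, 100)
    alpha = pm.HalfNormal("alpha", sigma=5.0)  # weakly informative, no hard upper bound
    beta = pm.HalfNormal("beta", sigma=5.0)
    theta = pm.Gamma("theta", alpha=alpha, beta=beta)  # the Gamma-distributed parameter
```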
Thanks for the prompt reply!
About the vectorisation:
If I understand correctly, what you propose is to write a custom likelihood function that aggregates the independent per-dataset likelihoods, running them with the scan function and finally summing the log-likelihoods - I’ll try to do that (something like the sketch below is what I’m picturing). Should it not be possible, however, to use the MVN distribution with a given shape parameter and the corresponding multidimensional location and scale parameters?
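Just to check my understanding of the scan approach, this is the kind of thing I have in mind (untested; the data, rho and the hyperparameters are placeholders as before, treated as fixed rather than given their own priors):

```python
import numpy as np
import pymc3 as pm
import theano
import theano.tensor as tt

arr = np.random.randn(3, 5)            # placeholder (n_datasets, n_obs) data
n_datasets, n_obs = arr.shape
rho = 0.1                              # fixed correlation, placeholder
mu_0, sigma_0, nu_0 = 0.0, 1.0, 5.0    # placeholder hyperparameters

data = theano.shared(arr)

with pm.Model() as hierarchical_model:
    # one mean and one standard deviation per dataset
    mu = pm.StudentT("mu", nu=nu_0, mu=mu_0, sigma=sigma_0, shape=n_datasets)
    sigma = pm.Uniform("sigma", lower=0, upper=100, shape=n_datasets)

    def row_logp(obs_row, mu_i, sigma_i):
        # per-dataset covariance: sigma_i^2 on the diagonal, rho * sigma_i^2 off it
        cov_i = sigma_i ** 2 * ((1.0 - rho) * tt.eye(n_obs) + rho * tt.ones((n_obs, n_obs)))
        return pm.MvNormal.dist(mu=mu_i * tt.ones(n_obs), cov=cov_i).logp(obs_row)

    # loop over datasets symbolically and sum the per-dataset log-likelihoods
    logps, _ = theano.scan(fn=row_logp, sequences=[data, mu, sigma])
    pm.Potential("loglike", logps.sum())
```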
Thanks for the tip on the priors, I’ll see whether that helps!