Hi everyone,
I am a Bayesian beginner, and I am having some difficulty with a high-dimensional problem.
My goal is to fit a multivariate distribution f (the r-largest order statistics, or GEVr, model) to an observational dataset drawn from this distribution plus some Gaussian random noise. The distribution depends on 3 free parameters (denoted \Theta) and takes as input a vector \mathbf{u} = (u_1, u_2, ..., u_r) of size r whose components are ordered, i.e., u_1 > u_2 > ... > u_r.
I have already managed to fit it to an idealized observational dataset (without random noise), but now I'd like to move on to a more realistic case by adding Gaussian noise. In this scenario, the likelihood is obtained by convolving the GEVr density f with a Gaussian distribution G, which is very costly and does not seem feasible in a reasonable amount of time. In that case the likelihood is given by:
\mathcal{L}(\mathbf{U}^{obs}|\Theta) = \prod_{i = 1}^{N}\int G(\mathbf{u}^{obs}_i|\mathbf{u}_i)\, f(\mathbf{u}_i|\Theta)\,\rm{d}\mathbf{u}_i
I have read that a Bayesian way to solve this problem is to avoid the integral altogether: treat each vector \mathbf{u}_i as a latent variable and sample the \mathbf{u}_i in addition to the GEVr free parameters. In that case the (unnormalized) joint posterior is just the product of the GEVr density, the Gaussian distribution, and the prior, i.e.
P(\Theta, \mathbf{u}_1, ..., \mathbf{u}_N|\mathbf{U}^{obs}) \propto \Pi(\Theta)\prod_{i = 1}^{N}G(\mathbf{u}^{obs}_i|\mathbf{u}_i)\, f(\mathbf{u}_i|\Theta)
Given that I have N = 60 measurements of size r = 5, the dimension of the problem is N \times r + 3 = 303. Can PyMC handle this dimensionality?
My other difficulty is the ordering constraint: when sampling \mathbf{u}, I must enforce u_1 > u_2 > ... > u_r. Is there an efficient way to do that with PyMC?
Thank you very much for your time.