Inference parallelization using GPU


I’m kinda new to PyMC, and so far I’ve been able to use it to do some Bayesian fitting.
In my case I have a simple model:

with basic_model:
    # Priors for unknown model parameters
    a = pm.Uniform("a", 0, 10)
    alpha = pm.Uniform("alpha", 0, 5)
    b = pm.Uniform("b", 0, 3000)
    # Model of the spectral density
    Sj = a * frequency**(-alpha) + b
    # Likelihood (exponential with scale Sj for the periodogram powers)
    likelihood = pm.Exponential("likelihood", scale=Sj, observed=periodogram[:, i, j])
    trace = pm.sample(4000, tune=400, cores=1, chains=4, progressbar=False, nuts_sampler="blackjax")

frequency = np.linspace(1.2e-5, 1/120, 2000)
periodogram is an array with shape (2000, 2048, 2048)

Basically, I’m fitting a simple power-law model (Sj) to each pixel of my 2048×2048 periodograms (in reality I only do it for about 2.5M points, not all of them).

With nutpie (CPU) or the GPU back-ends each individual fit samples quite fast, but with millions of pixels to fit I think I need to do some parallelization.
On CPU I tried the multiprocessing library and it works fine, but I don’t have access to enough CPU cores to really speed things up (I’d like to get these huge arrays done in under half a day if possible; right now, even with CPU multiprocessing, it takes about 3 days). On the other hand, I have access to two NVIDIA RTX A6000 GPUs. Is there a native way to parallelize these calculations across them with PyMC?
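For reference, my current CPU approach is essentially the following pattern. This is a stripped-down sketch: `fit_pixel` here is a dummy stand-in for building `basic_model` and calling `pm.sample` on one pixel's periodogram, so the example runs without PyMC.

```python
import numpy as np
from multiprocessing import Pool

def fit_pixel(args):
    """Dummy stand-in for the real per-pixel PyMC fit.

    In the real code this would build basic_model with the pixel's
    spectrum as `observed` and call pm.sample, returning the trace.
    """
    i, j, spectrum = args
    # Placeholder "fit": just return the pixel index and mean power.
    return (i, j, float(spectrum.mean()))

def fit_all(periodogram, pixels, processes=4):
    """Fit each requested (i, j) pixel in a separate worker process."""
    tasks = [(i, j, periodogram[:, i, j]) for (i, j) in pixels]
    with Pool(processes=processes) as pool:
        results = pool.map(fit_pixel, tasks)
    return results
```

This scales linearly with the number of cores, which is exactly why it tops out for me at a few days of wall time.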

The question comes from seeing many people use GPUs for huge datasets, whereas in my case I have many small, independent datasets that each need their own model fit. So I wonder whether that kind of GPU parallelization is doable.
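To make the question concrete, here is the kind of batched evaluation I imagine a GPU doing: evaluating the log-likelihood for every pixel at once via broadcasting instead of looping. This is a NumPy sketch only; `batched_loglik` and the `(n_pixels, 3)` parameter layout are made up for illustration, and it assumes the exponential likelihood with scale Sj from my model above.

```python
import numpy as np

def batched_loglik(params, frequency, power):
    """Exponential log-likelihood for all pixels at once.

    params:    (n_pixels, 3) array, columns a, alpha, b (my model's parameters)
    frequency: (n_freq,) array
    power:     (n_freq, n_pixels) array, one periodogram column per pixel
    Returns a (n_pixels,) array of log-likelihoods.
    """
    a, alpha, b = params[:, 0], params[:, 1], params[:, 2]
    # Sj per pixel: broadcasting gives shape (n_freq, n_pixels)
    Sj = a * frequency[:, None] ** (-alpha) + b
    # Exponential log-density with scale Sj, summed over frequencies
    return np.sum(-np.log(Sj) - power / Sj, axis=0)
```

Since every pixel's fit is independent, this is embarrassingly parallel, and I would hope the same broadcasting idea carries over to the JAX/BlackJAX back-end on GPU.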

I’m happy to provide any additional information and to follow any advice you have.