I am developing a model that uses measurements of a handful of observables (~10 atomic emission lines) to infer the radiation and chemical conditions inside a cloud of gas (~6 parameters). I want to run this model on a few thousand batches of observations at a time, each batch containing a few thousand sets of observations (so each batch is, say, 1000-by-10). My understanding is that this is feasible; if not, I may have bigger issues.
The first 3 parameters in the model (logZ, logU, Age) are very expensive to simulate, so I have pre-run a few hundred cases over a regularly spaced grid and then fit a set of 10 GP surrogate models (one per observable) to the results. I believe I am satisfied with this step.
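For concreteness, the surrogate-fitting step looks roughly like this (a simplified sketch: I'm showing scikit-learn as a stand-in for the actual GP library, and the file names, kernel choice, and choice of c_i as the grid mean are all placeholders):

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, ConstantKernel

# Pre-run grid: (n_grid, 3) parameter points (logZ, logU, Age) and
# (n_grid, 10) simulated line fluxes, one column per emission line.
grid_params = np.load("grid_params.npy")   # placeholder file names
grid_fluxes = np.load("grid_fluxes.npy")

surrogates, norms = [], []
for i in range(grid_fluxes.shape[1]):
    c_i = grid_fluxes[:, i].mean()           # per-line normalization constant
    target = grid_fluxes[:, i] / c_i - 1.0   # GP_i models F_i / c_i - 1
    gp = GaussianProcessRegressor(
        kernel=ConstantKernel() * RBF(length_scale=[0.3, 0.3, 1.0]),
        n_restarts_optimizer=5,
    )
    gp.fit(grid_params, target)
    surrogates.append(gp)
    norms.append(c_i)
```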
\mathcal{F}_i = ({\rm GP}_i (\log Z, \log U, {\rm Age}) + 1) ~ c_i
(c_i is a constant for each observable that lets us normalize the GP and make it computationally friendlier)
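Evaluating a surrogate at a trial point and undoing the normalization then looks like this (continuing the sketch above):

```python
import numpy as np

def line_flux(gp, c_i, logZ, logU, age):
    """F_i = (GP_i(logZ, logU, Age) + 1) * c_i, for one emission line."""
    theta = np.atleast_2d([logZ, logU, age])
    return float((gp.predict(theta) + 1.0) * c_i)
```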
The final three parameters dictate the absolute scaling of the observables and how much each observable is dimmed by dust along the line of sight (the dimming depends on the wavelength of the atomic transition, \lambda_i):
G_i = Q ~ \mathcal{F}_i \times \exp\left[-\tau_V (1 - \mu) \left(\frac{\lambda_i}{5000}\right)^{-1.3}\right] \times \exp\left[-\tau_V \mu \left(\frac{\lambda_i}{5000}\right)^{-0.7}\right]
\tau_V = 5 ~ \xi ~ Z_0 ~ 10^{\log Z} ~ \Gamma
where \xi and Z_0 are constants, and \Gamma measures how much gas and dust is blocking our view (generally between 10^{-1} and 10^2; I've log-transformed this parameter). \tau_V is positive and generally less than 2, \mu must lie in [0, 1], and Q is large and essentially unconstrained (in reality there is not enough signal if \log Q is below ~47, and values above ~50 are rare), so I've log-transformed Q pre-emptively and given it a prior that seems realistic.
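Written out as plain numpy (outside the sampler), the scaling and attenuation step is the following sketch; the numerical values of \xi and Z_0 here are placeholders, and the exponential attenuation terms follow the equations above:

```python
import numpy as np

XI, Z0 = 0.3, 0.02       # placeholder values for the constants xi and Z_0
LAM_REF = 5000.0         # reference wavelength [Angstrom]

def attenuated_flux(F_i, lam_i, logQ, logZ, mu, logGamma):
    """G_i = Q * F_i * exp[-tau_V (1-mu) (lam/5000)^-1.3] * exp[-tau_V mu (lam/5000)^-0.7]."""
    tau_V = 5.0 * XI * Z0 * 10.0**logZ * 10.0**logGamma
    atten_steep   = np.exp(-tau_V * (1.0 - mu) * (lam_i / LAM_REF) ** -1.3)
    atten_shallow = np.exp(-tau_V * mu * (lam_i / LAM_REF) ** -0.7)
    return 10.0**logQ * F_i * atten_steep * atten_shallow
```

(Inside the probabilistic model the same expression is written with the backend's tensor ops rather than numpy, but the structure is identical.)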
After running 10 chains (each with tune=1000, burn=1000, draws=5000) on just one set of observations (shape (1, 9)), I notice a couple of problems:
- The posterior tends to be multimodal in \rm Age, which is at this point unavoidable, I think.
- I’m more concerned that at high values of \log Q (>50), the model tries to go for high values of \Gamma.
Here's a pairplot (divergences in red; note that \Gamma is labelled logGMSD and Q is labelled logQH):
After upping target_accept to 0.95 and trying again, the whole thing runs about half as fast, but with about 100x fewer divergences and more effective samples. The accompanying pairplot:

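For reference, the two sampling calls were roughly as follows (PyMC; the surrogate evaluation, attenuation, and likelihood are omitted, and the prior ranges shown are only placeholders):

```python
import pymc as pm

# Skeleton of the model; only placeholder priors are shown here
with pm.Model() as model:
    logZ = pm.Uniform("logZ", -1.0, 0.5)      # placeholder prior ranges
    logU = pm.Uniform("logU", -4.0, -1.0)
    age = pm.Uniform("Age", 1.0, 10.0)
    logQ = pm.Normal("logQ", 48.5, 1.0)
    mu = pm.Uniform("mu", 0.0, 1.0)
    logGamma = pm.Uniform("logGamma", -1.0, 2.0)
    # ... surrogate evaluation, attenuation, and the likelihood go here ...

with model:
    # First run: PyMC's default target_accept
    idata = pm.sample(draws=5000, tune=1000, chains=10)
    # Second run: tighter step-size adaptation; ~2x slower, ~100x fewer divergences
    idata_tight = pm.sample(draws=5000, tune=1000, chains=10, target_accept=0.95)
```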
The energy plots are basically the same, and neither looks problematic (the widths of the marginal-energy and energy-transition distributions are very similar in both cases). Still, I think I'm seeing a funnel here, and I'm wondering how I might reparametrize. I'd greatly appreciate any thoughts or advice y'all could offer.
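For context, the kind of reparametrization I have in mind is the standard non-centered trick for funnel geometries (a generic Neal's-funnel sketch, not my actual model):

```python
import pymc as pm

# Centered form (funnel-prone): the scale of x depends directly on another sampled variable.
with pm.Model() as centered:
    log_scale = pm.Normal("log_scale", 0.0, 3.0)
    x = pm.Normal("x", 0.0, sigma=pm.math.exp(log_scale))

# Non-centered form: sample a unit-scale variable and rescale deterministically,
# which removes the scale-dependent curvature that NUTS struggles with.
with pm.Model() as non_centered:
    log_scale = pm.Normal("log_scale", 0.0, 3.0)
    x_raw = pm.Normal("x_raw", 0.0, 1.0)
    x = pm.Deterministic("x", x_raw * pm.math.exp(log_scale))
```

I'm not sure what the analogous transformation is for the \log Q–\Gamma degeneracy in my model, which is really what I'm asking about.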

