Funnel caused by scaling parameters?

I am developing a model that uses measurements of a handful of observables (~10 atomic emission lines) to infer the radiation and chemical conditions inside a cloud of gas (~6 parameters). I want to run this model on a few thousand batches of a few thousand sets of observations at a time; each batch is, say, 1000-by-10. My understanding is that this is feasible (if not, I may have bigger issues).

The first 3 parameters in the model (logZ, logU, Age) are very expensive to simulate, so I have pre-run a few hundred cases over a regularly spaced grid and then fit a set of 10 GP surrogate models (one per observable). I am reasonably satisfied with this step.

\mathcal{F}_i = ({\rm GP}_i (\log Z, \log U, {\rm Age}) + 1) ~ c_i

(c_i is a constant for each observable that lets us normalize the GP and make it computationally friendlier)
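
For concreteness, the surrogate evaluation inside the sampler looks something like the sketch below. This is not my exact code: the training grid, lengthscales, alpha = K^{-1} y, and c_i are placeholders, and I'm assuming an RBF kernel and current PyMC/pytensor (in PyMC3 the same thing would be written with theano.tensor).

```python
# Sketch only: evaluate a pre-fit GP surrogate's posterior mean inside the model
# graph so NUTS can differentiate through it. Placeholder grid/hyperparameters;
# alpha = K(X, X)^{-1} y_i is precomputed offline from the gridded simulations.
import numpy as np
import pytensor.tensor as pt

Xtrain = np.random.rand(200, 3)      # placeholder (logZ, logU, Age) training grid
alpha = np.random.randn(200)         # placeholder K^{-1} y for one observable
ls = np.array([0.3, 0.3, 0.3])       # placeholder RBF lengthscales
c_i = 1.7e-3                         # placeholder normalization constant

def surrogate_flux(theta):
    """Return F_i = (GP_i(theta) + 1) * c_i for theta = (logZ, logU, Age)."""
    d2 = pt.sum(((theta - Xtrain) / ls) ** 2, axis=-1)  # scaled squared distances
    k_star = pt.exp(-0.5 * d2)                          # RBF cross-covariance k(theta, X)
    return (pt.dot(k_star, alpha) + 1.0) * c_i          # back to flux units
```

Each of the 10 observables gets its own alpha and hyperparameters (the grid is shared); the point is just that the GP mean is a few dense ops, so it stays cheap and differentiable inside the model.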

The final three parameters dictate the absolute scaling of the observables and how much the observables are dimmed by dust along the line of sight (determined by the wavelength of the atomic transition, \lambda_i):

G_i = Q ~ \mathcal{F}_i \times \exp\left[-\tau_V (1 - \mu)\left(\frac{\lambda_i}{5000}\right)^{-1.3}\right] \times \exp\left[-\tau_V \mu \left(\frac{\lambda_i}{5000}\right)^{-0.7}\right]

\tau_V = 5 ~ \xi ~ Z_0 ~ 10^{\log Z} ~ \Gamma

where \xi and Z_0 are constants, and \Gamma describes how much gas and dust is blocking our view (generally between 10^{-1} and 10^{2}; I've log-transformed this parameter). \tau_V is positive and generally less than 2, \mu must lie in [0, 1], and Q is large and essentially unconstrained (in reality, there is not enough signal if \log Q is below ~47 and values higher than ~50 are rare, so I've log-transformed Q pre-emptively and set a prior that seems realistic).
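
Putting the pieces together, the model block looks roughly like the following sketch, written in current PyMC syntax. The wavelengths, constants, prior widths, and observed values are all placeholders, and the surrogate fluxes are stubbed out with a constant vector.

```python
# Rough sketch of the forward model above (not production code): placeholder
# wavelengths/constants, stubbed surrogate fluxes, and illustrative prior widths.
import numpy as np
import pymc as pm

lam = np.array([3727.0, 4861.0, 5007.0, 6563.0])  # placeholder line wavelengths (Angstroms)
xi, Z0 = 0.3, 0.02                                # placeholder constants
F = np.full(lam.shape, 1.7e-3)                    # stand-in for the GP surrogate fluxes F_i
y_obs = np.full(lam.shape, 1e45)                  # placeholder observed line fluxes
sigma_obs = 0.1 * y_obs                           # placeholder uncertainties

with pm.Model() as model:
    logZ = pm.Normal("logZ", 0.0, 0.5)
    logU = pm.Normal("logU", -3.0, 0.5)
    age = pm.Uniform("age", 0.0, 10.0)            # placeholder age range

    logQ = pm.Normal("logQ", 48.5, 1.0)           # log-transformed Q
    logGamma = pm.Normal("logGamma", 0.0, 1.0)    # Gamma spans roughly 1e-1 to 1e2
    mu = pm.Uniform("mu", 0.0, 1.0)

    tau_V = pm.Deterministic("tau_V", 5 * xi * Z0 * 10**logZ * 10**logGamma)
    atten = pm.math.exp(-tau_V * (1 - mu) * (lam / 5000) ** -1.3) * \
            pm.math.exp(-tau_V * mu * (lam / 5000) ** -0.7)
    G = pm.Deterministic("G", 10**logQ * F * atten)

    pm.Normal("obs", mu=G, sigma=sigma_obs, observed=y_obs)
```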

After running 10 chains (each with tune=1000, burn=1000, draws=5000) on just one set of observations (shape (1, 9)), I notice a couple of problems:

  • The posterior tends to be multimodal in {\rm Age}, which I think is unavoidable at this point.
  • More worryingly, at high values of \log Q (>50), the sampler also pushes toward high values of \Gamma.

Here's a pairplot (divergences in red; here, \Gamma is labelled logGMSD, and Q is labelled logQH):

After upping target_accept to 0.95 and trying again, the whole thing runs about half as fast, but with about 100x fewer divergences and more effective samples. The accompanying pairplot:

[second pairplot: trace2]

The energy plots are basically the same in both runs, and neither looks problematic (the widths of the marginal and energy-transition distributions are very similar in both cases). Still, I think I'm seeing a funnel here, and I'm wondering how I might reparametrize. I'd greatly appreciate any thoughts or advice y'all could offer.
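
For reference, the sampling and diagnostic calls I'm describing are essentially the following (PyMC + ArviZ; `model` refers to the sketch above, and the variable names here don't match my logGMSD/logQH labels exactly):

```python
# Sketch of the sampling/diagnostic calls discussed above; settings mirror the
# post (10 chains, tune=1000, draws=5000) with target_accept raised to 0.95.
import arviz as az
import pymc as pm

with model:  # the model sketched earlier
    idata = pm.sample(draws=5000, tune=1000, chains=10, target_accept=0.95)

az.plot_pair(idata, var_names=["logQ", "logGamma", "mu", "age"], divergences=True)
az.plot_energy(idata)                        # marginal vs. energy-transition widths
print(az.summary(idata, var_names=["logQ", "logGamma", "mu", "age"]))  # ESS, r_hat
```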

There is definitely a funnel there (between logQH_0 and mu_0), and also some multi-modality in age_0.
I would first try a more informative prior that cuts out the tail area; with a GP you can almost never have too informative a prior :wink:

Like I said above, I think the multimodality in age_0 is pretty much unavoidable at present: at certain ages, different sources of energy dominate, which can drive the atomic transitions in different proportions. At the moment I don't have any additional discriminating information that I could fold into a prior.

As for the funnel, I have tried a couple of approaches. First, I rearranged the third equation above and expressed \Gamma in terms of \tau_V \mu and \tau_V (1 - \mu):

\tau_V = \tau_V (1 - \mu) + \tau_V \mu

\Gamma = \frac{0.2 ~ \tau_V}{\xi ~ Z_0 ~ 10^{\log Z}}
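
In code, this reparametrization amounts to putting priors directly on the two attenuation terms and recovering \tau_V, \mu, and \Gamma as deterministics (again a sketch, with placeholder priors and constants):

```python
# Sketch of the reparametrization above (placeholder priors/constants): sample
# tau_V*(1-mu) and tau_V*mu directly, then recover tau_V, mu, and Gamma.
import pymc as pm

xi, Z0 = 0.3, 0.02                               # placeholder constants, as before

with pm.Model() as reparam_model:
    logZ = pm.Normal("logZ", 0.0, 0.5)
    tau_bc = pm.HalfNormal("tauV_1m_mu", 1.0)    # tau_V * (1 - mu) >= 0
    tau_ism = pm.HalfNormal("tauV_mu", 1.0)      # tau_V * mu       >= 0

    tau_V = pm.Deterministic("tau_V", tau_bc + tau_ism)
    mu = pm.Deterministic("mu", tau_ism / tau_V)
    Gamma = pm.Deterministic("Gamma", 0.2 * tau_V / (xi * Z0 * 10**logZ))
```

The hope was that sampling in (\tau_V \mu, \tau_V (1 - \mu)) space would decouple the geometry NUTS sees from the \Gamma scaling.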

In theory, I can also use some related data to get more informative priors on \tau_V \mu, \tau_V (1 - \mu), and \log Q_H (uncertainties at roughly the 0.5, 0.5, and 0.3 levels, respectively). But, for example, my estimates of \log Q_H are almost certainly overestimates, so correcting them would require yet another (only weakly observationally constrained) scaling parameter, which is what we were trying to avoid in the first place.

I've also tried exploiting some intrinsic structure in my data: we have positional information associated with each set of observations, and current theory suggests that \log Z should vary smoothly with position. So I have defined a GP that governs the positional variation of \log Z, along with the scatter about that mean trend. There is no similar theory for the other parameters, though; in fact, \log U has recently been shown to have very little spatial correlation at all.
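
Concretely, the spatial piece looks roughly like this (made-up positions and placeholder scales, using PyMC's gp module):

```python
# Rough sketch of the spatial-GP idea for logZ: a latent GP over position gives
# the smooth trend, with extra scatter about that trend. All values placeholders.
import numpy as np
import pymc as pm

coords_xy = np.random.rand(50, 2)        # placeholder (x, y) positions of the observations

with pm.Model() as spatial_model:
    ell = pm.HalfNormal("ell", 1.0)                       # spatial lengthscale
    eta = pm.HalfNormal("eta", 0.5)                       # GP amplitude
    cov = eta**2 * pm.gp.cov.ExpQuad(2, ls=ell)
    gp = pm.gp.Latent(cov_func=cov)
    logZ_smooth = gp.prior("logZ_smooth", X=coords_xy)    # smooth spatial trend

    sigma_Z = pm.HalfNormal("sigma_Z", 0.1)               # scatter about the trend
    logZ = pm.Normal("logZ", mu=logZ_smooth, sigma=sigma_Z, shape=len(coords_xy))
```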

It could be that I am missing something (I will run once again overnight), but none of the above changes seem to be doing me a whole lot of favors (and now it’s much more difficult to debug, since I have to run a large fraction of the data at once). I’ll see what tomorrow brings. Thanks to those of you following along at home :wink: