Predictions from Latent GP in hierarchical model appear not to be conditioned on anything

I’m using gp.Latent as a (top-level) prior in a hierarchical GLM to estimate parameter values in a mechanistic model of timeseries data (experimental instrument readings). The parameter estimates are well-defined and reasonable, but when I go to predict new parameter values with gp.conditional, the GP draws don’t appear to be conditioned on the estimates. Even when I supply the posterior estimates via the given argument, I just get flat predictions with broad, uniform variance. However, if I then use gp.Marginal to regress a GP on the posteriors using the same priors on the hyperparameters, the GP draws fit the data nicely. I’ve put a “toy” example in this colab notebook. (Sorry, it takes ~30-60 minutes to run on Colab; it was the simplest I could make it while retaining the structure.)

The “true” parameter values (as estimated without the GP) are very close to the fake ones I generate in the notebook. (Note: despite the appearance of dim2, I don’t want to use a periodic kernel; I don’t expect to observe much beyond the extents of this plot.)

Fake experimental measurements, ~50 timepoints each:

The model looks like this. I have 14 groups (reaction conditions), each of which has 2 replicates. Each group is assumed to have a typical parameter value from which individual reactions can vary slightly. The groups are defined by two components, called dim1 and dim2 below. The parameter τ (under an appropriate transform) varies very close to linearly along dim1, but has more complex behavior along dim2, hence the GP. My actual model has four parameters, each estimated with a similar hierarchical GLM structure, albeit without a GP component (for now).
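
For reference, the toy data has roughly the following shape (this is only a sketch: the dim1/dim2 levels, the “true” τ surface, and the scale/shift constants below are made-up stand-ins; the real generation code is in the notebook):

import numpy as np
import pandas as pd

# Illustrative only: 14 groups (made-up dim1/dim2 levels), 2 replicate reactions
# per group, ~50 timepoints each, following S = 1 / (1 + exp(t/τ)) plus noise.
rng = np.random.default_rng(0)
levels = [(d1, d2) for d1 in np.linspace(-1, 1, 7) for d2 in (0.0, 1.0)]  # 14 (dim1, dim2) pairs
records = []
for g, (d1, d2) in enumerate(levels):
    τ_true = 1 / (1 + np.exp(-(0.5*d1 + np.sin(d2))))   # made-up parameter surface
    for rep in range(2):                                 # 2 replicates per group
        t = np.linspace(0, 5, 50)                        # ~50 timepoints per reaction
        sig = 1 / (1 + np.exp(t/τ_true)) + rng.normal(0, 0.01, t.size)
        records.append(dict(group=g, dim1=d1, dim2=d2, time=t, sig=sig))
data = pd.DataFrame(records)                             # one row per reaction

# Group-level covariates (one value per group), used by the GLM/GP pieces below
dim1 = data.groupby('group').dim1.first().values
dim2 = data.groupby('group').dim2.first().values
scale, shift = 1.0, 0.0   # stand-ins for the fixed logit-scale transform constants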

import numpy as np
import pymc3 as pm
import theano

# Flatten the per-reaction timeseries into long vectors with group/reaction indices
S_obs = np.hstack(data.sig)
grp = data.group.values
n_grps = len(set(grp))
n_pts = data.sig.map(len)
rxn = np.hstack([int(i)*np.ones(n, dtype=int) for i, n in enumerate(n_pts)])
n_rxns = len(set(rxn))
t = np.hstack(data.time)

# Unique dim2 values (the GP inputs), each group's index into them, and a prediction grid
dim2_lst = np.sort(list(set(dim2)))[:, None]
dim2_idx = [list(dim2_lst).index(v) for v in dim2]
dim2_new = np.linspace(dim2_lst.min()-0.25, dim2_lst.max()+0.25, 100)[:, None]
coord1 = theano.shared(dim1)
coord2 = theano.shared(dim2)

with pm.Model() as latent_model:
    # Dim1 behaves nearly linearly
    τ_lgt_β_dim1 = pm.Normal('τ_lgt_β_dim1', mu=0, sigma=0.5)
    
    # Dim2 behavior is more complicated, requiring a GP
    ℓ = pm.InverseGamma('ℓ', mu=1.7, sigma=0.5)
    η = pm.Gamma('η', alpha=2, beta=1)
    cov = η**2 * pm.gp.cov.ExpQuad(1, ℓ)
    gp = pm.gp.Latent(cov_func=cov)
    τ_lgt_β_dim2 = gp.prior('τ_lgt_β_dim2', X=dim2_lst)
    
    ## Group-level hyperpriors
    # Expected value of τ based on dim1 and dim2
    τ_lgt_μ_grp = pm.Deterministic('τ_lgt_μ_grp', τ_lgt_β_dim1*coord1 + τ_lgt_β_dim2[dim2_idx])
    # Non-centered parameterization of actual value of τ for each group
    τ_lgt_μ_grp_σ = pm.Exponential('τ_lgt_μ_grp_σ', lam=1)
    τ_lgt_μ_grp_nc = pm.Normal('τ_lgt_μ_grp_nc', mu=0, sigma=1, shape=n_grps)
    τ_lgt_μ = pm.Deterministic('τ_lgt_μ', τ_lgt_μ_grp_nc*τ_lgt_μ_grp_σ+τ_lgt_μ_grp)

    # Non-centered parameterization of actual value of τ for each reaction
    τ_lgt_σ = pm.Exponential('τ_lgt_σ', lam=1)
    τ_lgt_z_nc = pm.Normal('τ_lgt_z_nc', mu=0, sigma=1, shape=n_rxns)
    τ_lgt_z = pm.Deterministic('τ_lgt_z', τ_lgt_z_nc*τ_lgt_σ+τ_lgt_μ[grp])
    # Un-transform τ back to the natural scale
    τ_lgt = pm.Deterministic('τ_lgt', τ_lgt_z*scale+shift)
    τ = pm.Deterministic('τ', pm.invlogit(τ_lgt))
    
    σ = pm.Exponential('σ', lam=1)
    
    # Simplified mechanistic model
    S = 1 / (1+pm.math.exp(t/τ[rxn]))
    S_lik = pm.StudentT('S_lik', nu=3, mu=S, sigma=σ*np.exp(-7), observed=S_obs)
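
Sampling is just a standard pm.sample call (the settings shown here are illustrative; the actual call is in the notebook):

with latent_model:
    # Illustrative settings; the notebook's actual call may differ
    trace = pm.sample(1000, tune=2000, target_accept=0.95)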

After running pm.sample(), I draw predictions from the Latent GP:

with latent_model:
    τ_new = gp.conditional('τ_new', Xnew=dim2_new, given={'X':dim2_lst,
                                                      'y':trace['τ_lgt_β_dim2'].mean(axis=0)[:,None],
                                                      'noise':trace['τ_lgt_β_dim2'].std(axis=0)[:,None],
                                                      'gp':gp})

    gp_draws = pm.sample_posterior_predictive([trace], vars=[τ_new], samples=1000)

The parameter estimates are very close to their true values. The estimates of τ_lgt_β_dim2 along this axis are shown as white circles below, with the respective τ_lgt_μ values as blue circles. However, gp.conditional is giving what appears to be a GP conditioned on nothing at all.

On the other hand, if I directly regress on the posterior estimates of τ_lgt_β_dim2, the result is as expected:
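
(Here x, y, and e are just the posterior summaries of the latent GP values, roughly as follows; the exact construction is in the notebook.)

# Posterior mean and spread of the group-level GP values along dim2,
# used as the regression targets and noise level for the Marginal GP
x = dim2_lst
y = trace['τ_lgt_β_dim2'].mean(axis=0)
e = trace['τ_lgt_β_dim2'].std(axis=0)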

with pm.Model() as marginal_model:
    ℓ = pm.InverseGamma('ℓ', mu=1.7, sigma=0.5)
    η = pm.Gamma('η', alpha=2, beta=1)
    
    cov = η**2 * pm.gp.cov.ExpQuad(1, ℓ)
    gp = pm.gp.Marginal(cov_func=cov)

    grp_τ = gp.marginal_likelihood('grp_τ', X=x, y=y, noise=e)

    mp = pm.find_MAP()
    trace_marginal = pm.sample()

with marginal_model:
    τ_new = gp.conditional('τ_new', dim2_new)

    map_samples = pm.sample_posterior_predictive([mp], vars=[τ_new], samples=1000)
    full_samples = pm.sample_posterior_predictive([trace_marginal], vars=[τ_new], samples=1000)
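
For plotting I do something along these lines with plot_gp_dist (a sketch; the white circles are the y values from above):

import matplotlib.pyplot as plt
from pymc3.gp.util import plot_gp_dist

fig, ax = plt.subplots()
# Each row of full_samples['τ_new'] is one draw of the GP over the dim2_new grid
plot_gp_dist(ax, full_samples['τ_new'], dim2_new)
ax.plot(dim2_lst.flatten(), y, 'o', color='white', markeredgecolor='k')
ax.set_xlabel('dim2')
ax.set_ylabel('τ_lgt_β_dim2')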

...

My main question is: how do I get better predictions from the latent GP? Is this difference between the Latent and Marginal implementations expected, or have I made a mistake?

As a side note, please let me know if you have suggestions for a better way to analyze this sort of timeseries data with a mechanistic model. The point-by-point likelihood evaluation seems inefficient (my real dataset has ~11,000 points, and I expect it to grow two- or three-fold) and naive, since it ignores the inherent correlation between points (and therefore likelihoods) that results from a given set of parameters.


Hi John,
Welcome, and thanks for this detailed question!

Using Latent and gp.prior with a StudentT seems relevant to me, as shown in this example.
However, I think @bwengals will be able to help you more quickly than I can on this :wink:

Thanks for responding quickly! Looking at that example more closely, it seems they first sample draws of the GP (from gp.prior) at the X coordinates in the dataset. That works for me, though the result is just linearly-interpolated retrodictions:
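
(Concretely, that just means plotting the posterior draws of τ_lgt_β_dim2 at the original dim2 values, something like this:)

import matplotlib.pyplot as plt
from pymc3.gp.util import plot_gp_dist

fig, ax = plt.subplots()
# Draws only exist at the dim2 values in dim2_lst, so the bands are just
# interpolated between those points
plot_gp_dist(ax, trace['τ_lgt_β_dim2'], dim2_lst)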

The conditional method is where things go haywire.

(As a side note, the second application in that example samples from conditional, but then just plots the retrodictions. Maybe that should be more consistent?)

Apologies for the bump, but does anyone have suggestions here?

I tried running your Colab notebook, made a few tweaks, and got:

Does this look right…? All I changed was:

with latent_model:
    # No need for `given` here: gp.prior already stored X and f with the gp object,
    # and the trace is passed directly rather than wrapped in a list
    τ_new = gp.conditional('τ_new', Xnew=dim2_new)
    gp_draws = pm.sample_posterior_predictive(trace, vars=[τ_new])

Yup, that worked!

All it took was a pair of square brackets… I guess I had looked at the examples for using MAP estimates first, in which you need to pass the MAP result as a single-element list. Anyways, thanks a lot for taking the time to dig through and find that!

Great! I’m not sure why exactly the brackets messed things up… I’ll double-check that. I also noticed that the docstring on the conditional method for Latent was a bit wrong (it looks like it was incorrectly copy/pasted from Marginal).

The conditioning would be on gp, or on X and f, not on y or noise, since those aren’t part of the Latent GP model. Sorry for the confusion!
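
So if you did want to pass given explicitly for a Latent GP, it would look roughly like this (normally you can omit it entirely, since gp.prior already stored X and f):

with latent_model:
    τ_new_explicit = gp.conditional('τ_new_explicit', Xnew=dim2_new,
                                    given={'X': dim2_lst,
                                           'f': τ_lgt_β_dim2,  # the latent GP variable created by gp.prior
                                           'gp': gp})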
