I’m using gp.Latent as a (top-level) prior in a hierarchical GLM to estimate parameter values in a mechanistic model of timeseries data (experimental instrument readings). The parameter estimates are well-defined and reasonable, but when I try to predict new parameter values with gp.conditional, the GP draws don’t appear to be conditioned on those estimates: even when I supply the posterior estimates via the given argument, I just get flat predictions with broad, uniform variance. However, if I instead use gp.Marginal to regress a GP on the posteriors, with the same priors on the hyperparameters, the GP draws fit the data nicely.

I’ve put a “toy” example in this colab notebook. (Sorry, it takes ~30-60 minutes to run on colab; it was the simplest I could make it while retaining the structure.) The “true” parameter values (as estimated without the GP) are very close to the fake ones I generate in the notebook. (Note: despite the appearance of dim2, I don’t want to use a periodic kernel; I don’t expect to observe much beyond the extents of this plot.)
Fake experimental measurements, ~50 timepoints each:
I have 14 groups (reaction conditions), each with 2 replicates. Each group is assumed to have a typical parameter value from which individual reactions can vary slightly. The groups are defined by two components, called dim1 and dim2 below. The parameter τ (under an appropriate transform) varies very nearly linearly along dim1, but has more complex behavior along dim2, hence the GP. My actual model has four parameters, each estimated with a similar hierarchical GLM structure, albeit without a GP component (for now). The data setup and the model are below, after a quick sketch of how the toy data is laid out.
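Roughly (a simplified sketch, not the exact notebook code; the coordinate grids, “true” τ values, time range, and noise level are all placeholders):

# Sketch of the toy data layout: 14 groups on a 2 x 7 (dim1 x dim2) grid,
# 2 replicate reactions per group, ~50 timepoints per reaction
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)

n_grps, n_reps = 14, 2
dim1 = np.repeat([0.0, 1.0], 7)                # per-group dim1 coordinate (placeholder)
dim2 = np.tile(np.linspace(0.5, 5.5, 7), 2)    # per-group dim2 coordinate (placeholder)

# Placeholder "true" τ per group: near-linear in dim1, wigglier in dim2
true_τ = 1 / (1 + np.exp(-(0.3*dim1 + 0.5*np.sin(dim2) - 0.5)))

rows = []
for g in range(n_grps):
    for _ in range(n_reps):
        t_rxn = np.sort(rng.uniform(0.0, 2.0, size=50))   # ~50 timepoints per reaction
        τ_rxn = true_τ[g] * rng.lognormal(0.0, 0.05)      # slight reaction-to-reaction variation
        sig = 1 / (1 + np.exp(t_rxn/τ_rxn)) + rng.normal(0.0, 1e-3, size=t_rxn.size)
        rows.append(dict(group=g, time=t_rxn, sig=sig))
data = pd.DataFrame(rows)

The actual data prep and the model: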
import numpy as np
import pymc3 as pm
import theano

# Flatten the per-reaction signals and times; build index arrays mapping each
# reaction to its group (grp) and each timepoint to its reaction (rxn)
S_obs = np.hstack(data.sig)
grp = data.group.values
n_grps = len(set(grp))
n_pts = data.sig.map(len)
rxn = np.hstack([int(i)*np.ones(n, dtype=int) for i, n in enumerate(n_pts)])
n_rxns = len(set(rxn))
t = np.hstack(data.time)

# Unique dim2 values (the GP inputs), each group's index into them, and new
# dim2 points to predict at
dim2_lst = np.sort(list(set(dim2)))[:, None]
dim2_idx = [list(dim2_lst).index(v) for v in dim2]
dim2_new = np.linspace(dim2_lst.min()-0.25, dim2_lst.max()+0.25, 100)[:, None]

coord1 = theano.shared(dim1)
coord2 = theano.shared(dim2)
with pm.Model() as latent_model:
    # Dim1 behaves nearly linearly
    τ_lgt_β_dim1 = pm.Normal('τ_lgt_β_dim1', mu=0, sigma=0.5)
    # Dim2 behavior is more complicated, requiring a GP
    ℓ = pm.InverseGamma('ℓ', mu=1.7, sigma=0.5)
    η = pm.Gamma('η', alpha=2, beta=1)
    cov = η**2 * pm.gp.cov.ExpQuad(1, ℓ)
    gp = pm.gp.Latent(cov_func=cov)
    τ_lgt_β_dim2 = gp.prior('τ_lgt_β_dim2', X=dim2_lst)

    ## Group-level hyperpriors
    # Expected value of τ based on dim1 and dim2
    τ_lgt_μ_grp = pm.Deterministic('τ_lgt_μ_grp', τ_lgt_β_dim1*coord1 + τ_lgt_β_dim2[dim2_idx])
    # Non-centered parameterization of the actual value of τ for each group
    τ_lgt_μ_grp_σ = pm.Exponential('τ_lgt_μ_grp_σ', lam=1)
    τ_lgt_μ_grp_nc = pm.Normal('τ_lgt_μ_grp_nc', mu=0, sigma=1, shape=n_grps)
    τ_lgt_μ = pm.Deterministic('τ_lgt_μ', τ_lgt_μ_grp_nc*τ_lgt_μ_grp_σ + τ_lgt_μ_grp)
    # Non-centered parameterization of the actual value of τ for each reaction
    τ_lgt_σ = pm.Exponential('τ_lgt_σ', lam=1)
    τ_lgt_z_nc = pm.Normal('τ_lgt_z_nc', mu=0, sigma=1, shape=n_rxns)
    τ_lgt_z = pm.Deterministic('τ_lgt_z', τ_lgt_z_nc*τ_lgt_σ + τ_lgt_μ[grp])

    # Rescale/shift back to the original logit scale
    # (scale and shift are defined elsewhere in the notebook)
    τ_lgt = pm.Deterministic('τ_lgt', τ_lgt_z*scale + shift)
    # Inverse-logit transform to put τ back on the natural scale
    τ = pm.Deterministic('τ', pm.invlogit(τ_lgt))

    σ = pm.Exponential('σ', lam=1)
    # Simplified mechanistic model
    S = 1 / (1 + pm.math.exp(t/τ[rxn]))
    S_lik = pm.StudentT('S_lik', nu=3, mu=S, sigma=σ*np.exp(-7), observed=S_obs)
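Sampling is just the usual call (the exact arguments in the notebook differ; these settings are placeholders):

with latent_model:
    trace = pm.sample(1000, tune=2000, target_accept=0.95)   # placeholder settings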
After sampling, I draw predictions from the Latent GP:
with latent_model:
    τ_new = gp.conditional('τ_new', Xnew=dim2_new,
                           given={'X': dim2_lst,
                                  'y': trace['τ_lgt_β_dim2'].mean(axis=0)[:, None],
                                  'noise': trace['τ_lgt_β_dim2'].std(axis=0)[:, None],
                                  'gp': gp})
    gp_draws = pm.sample_posterior_predictive([trace], vars=[τ_new], samples=1000)
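I’m summarizing the conditional draws roughly like this (a sketch, assuming matplotlib and the objects defined above; plot_gp_dist is PyMC3’s GP plotting helper):

import matplotlib.pyplot as plt
from pymc3.gp.util import plot_gp_dist

fig, ax = plt.subplots()
# Spread of the 1000 conditional draws over the new dim2 points
plot_gp_dist(ax, gp_draws['τ_new'], dim2_new)
# Posterior means of τ_lgt_β_dim2 (white circles) and τ_lgt_μ (blue circles)
ax.plot(dim2_lst.flatten(), trace['τ_lgt_β_dim2'].mean(axis=0), 'o', mfc='white', mec='k')
ax.plot(dim2, trace['τ_lgt_μ'].mean(axis=0), 'o', color='C0')
ax.set_xlabel('dim2')
ax.set_ylabel('τ_lgt_β_dim2')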
The parameter estimates are very close to their true values. The estimates of τ_lgt_β_dim2 along this axis are shown as white circles below, with the respective τ_lgt_μ values as blue circles. However, gp.conditional gives what appears to be a GP conditioned on nothing at all.

On the other hand, if I directly regress on the posterior estimates of τ_lgt_β_dim2, the result is as expected:
with pm.Model() as marginal_model:
    ℓ = pm.InverseGamma('ℓ', mu=1.7, sigma=0.5)
    η = pm.Gamma('η', alpha=2, beta=1)
    cov = η**2 * pm.gp.cov.ExpQuad(1, ℓ)
    gp = pm.gp.Marginal(cov_func=cov)
    # x: the dim2 locations; y, e: posterior mean and sd of τ_lgt_β_dim2
    # from the latent model above
    grp_τ = gp.marginal_likelihood('grp_τ', X=x, y=y, noise=e)
    mp = pm.find_MAP()
    trace_marginal = pm.sample()

with marginal_model:
    τ_new = gp.conditional('τ_new', dim2_new)
    map_samples = pm.sample_posterior_predictive([mp], vars=[τ_new], samples=1000)
    full_samples = pm.sample_posterior_predictive([trace_marginal], vars=[τ_new], samples=1000)
...
My main question is: how do I get better predictions from the latent GP? Is this difference between the Latent and Marginal implementations expected, or have I made a mistake?

As a side note, please let me know if you have suggestions for a better way to analyze this sort of timeseries data with a mechanistic model. The point-by-point likelihood evaluation seems inefficient (my real dataset has ~11,000 points, and I expect it to grow two- or three-fold) and naive, since it ignores the inherent correlation among points (and therefore likelihoods) arising from a given set of parameters.