Model using independent GP priors on columns of a matrix

Hi everyone!
I am trying to set up a model that would give me independent GP priors on the columns of a matrix. The model I have so far is the following:
m = number of columns
nan_mask = boolean mask of the missing data

    with self.model as pmf:
        lengthscale = list(2.0*np.ones((m,)))
        cov = pm.gp.cov.ExpQuad(ls=lengthscale, input_dim=m, active_dims=list(range(0, m)))

        # Add white noise to stabilise
        cov += pm.gp.cov.WhiteNoise(1e-6)

        # GP latent variable model, assume 0 mean
        gp = pm.gp.Latent(cov_func=cov)
        x = np.linspace(0, 1, m)[:, None]
        for j in np.arange(1, m):
            x2 = np.linspace(0, 1, m)[:, None]
            x = np.hstack((x, x2))
        F = gp.prior("F", X=x, shape=(m, m))

        # Marginal Likelihood
        R = pm.Normal("R", mu=F[~nan_mask], tau=self.alpha, observed=self.data[~nan_mask])

My understanding is that using individual lengthscales on each active dimension of the covariance function should give me an individual prior for each of the m input dimensions. These priors are then given by F, which I modelled so that it contains m priors for m inputs of length m.
(I set this up following this homepage.)

However, is there a way to show that these are indeed independent GP priors on each input x? (The reason I ask is that the results I get with this model are worse than expected, so I am wondering whether the model, despite converging, is not quite what I want to use…)
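
One way to at least inspect this numerically is to evaluate the kernel on the stacked input and plot it; for truly independent GP priors I would expect a block structure. A rough sketch (just matplotlib on top of the model above, with x being the stacked input):

    import matplotlib.pyplot as plt

    # x is the stacked input from the model above: m identical columns of linspace(0, 1, m)
    x = np.tile(np.linspace(0, 1, m)[:, None], (1, m))

    # evaluate the covariance matrix with the fixed lengthscales and plot it
    K = cov(x).eval()
    plt.imshow(K, cmap="inferno", interpolation="none")
    plt.colorbar()
    plt.savefig("test_kernel.png")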

Thanks in advance for any comments/suggestions :slight_smile:

What are the dimensions of self.data? And the dimensions of the input X? Would it be possible to change the code a little or include some simple input data so I could copy/paste and run it?

There are a couple of small things that might help, though. One is that it’s better to use the jitter argument to pm.gp.Latent instead of adding a WhiteNoise covariance. Next, your lengthscales are fixed at 2. That might be what you want, but if you put a prior on the lengthscale parameters you might get better results. Third, the overall scaling of the covariance is essentially fixed at one. You can write

        eta = ...
        cov = eta**2 * pm.gp.cov.ExpQuad(ls=lengthscale, input_dim=m, active_dims=list(range(0, m)))

to allow the GP to follow the overall scale of your data.
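
Putting those three suggestions together, a rough sketch of what that block could look like (assuming a recent PyMC where gp.prior accepts a jitter argument; the hyperparameter values are placeholders and the shape argument is carried over from your snippet above):

    with self.model as pmf:
        # priors on the per-dimension lengthscales instead of fixing them at 2
        lengthscale = pm.InverseGamma("lengthscale", alpha=2.0, beta=1.0, shape=m)
        # overall scale of the covariance
        eta = pm.Exponential("eta", lam=0.1)
        cov = eta**2 * pm.gp.cov.ExpQuad(ls=lengthscale, input_dim=m, active_dims=list(range(m)))

        gp = pm.gp.Latent(cov_func=cov)
        # jitter stabilises the Cholesky factorisation, replacing the WhiteNoise term
        F = gp.prior("F", X=x, jitter=1e-6, shape=(m, m))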

Hi, thank you for your response and suggestions!

A minimal working example of my data:
(data here is randomly generated)

    # only works with the prior argument reparameterize=False; shape of matrices: (3, 5)
    # -------------------------------------------------
    nan_mask = np.array([[False,  True,  True,  True,  True], [True, False, False, False, False], [False, False,  True,  True, False]])
    self.data = np.array([[0.00776596, -0.03106055, -0.03106055, -0.03106055, -0.03106055], [-0.03106055,  0.62549377,  0.10648936, -1.01249089,  1.12443077], [-1.15815381,  1.42086795, -0.03106055, -0.03106055, -0.55473531]])

    # working: shape of matrices: (3, 3)
    # -------------------------------------------------
    nan_mask = np.array([[False, True, True], [True, False, False], [False, False, True]])
    self.data = np.array([[0.00776596, -0.03106055, -0.03106055], [-0.03106055, 0.62549377, 0.10648936], [-1.15815381, 1.42086795, -0.03106055]])
    n, m = self.data.shape
    self.model = pm.Model()

    with self.model as pmf:
        lengthscale = []
        for j in np.arange(0, n):
            l = pm.InverseGamma(f"l{j}", alpha=2.0, beta=1.0)
            lengthscale.append(l)
        cov = pm.gp.cov.ExpQuad(ls=lengthscale, input_dim=n, active_dims=list(range(0, n)))

        # Add white noise to stabilise
        cov += pm.gp.cov.WhiteNoise(1e-6)

        # GP latent variable model, assume 0 mean
        gp = pm.gp.Latent(cov_func=cov)
        x = np.linspace(0, 1, m)[:, None]
        for j in np.arange(1, n):
            x2 = np.linspace(0, 1, m)[:, None]
            x = np.hstack((x, x2))
        # kernel_plot = plt.imshow(cov(x).eval(), cmap="inferno", interpolation="none")
        # plt.colorbar(kernel_plot)
        # plt.savefig("test_kernel2.png")
        F = gp.prior("F", X=x, shape=(n,m))
        
        # Marginal Likelihood
        R = pm.Normal("R", mu=F[~nan_mask], tau=self.alpha, observed=self.data[~nan_mask])

While creating the minimal example I am getting another error in F if the matrices I am using are not square:

ValueError: shapes (5,5) and (3,5) not aligned: 5 (dim 1) != 3 (dim 0)

The error is resolved, when I use

F = gp.prior("F", X=x, reparameterize=False, shape=(n, m))

Is there any other disadvantage of using reparameterize=False, apart from it being less efficient? I assume the reason it does not work is that my matrix is no longer square, and the Cholesky decomposition is only possible for square matrices? My goal is to create n independent priors, one for each row. I believe my cov matrix is modelled correctly, since according to the documentation this is also how it is used for ARD (putting independent lengthscales on the individual input dimensions), isn’t it? I am unsure about the prior, though. Thanks again, I appreciate your help! :slight_smile: (I am also about to incorporate your recommended changes regarding cov; I am just trying to fix the shape bug first.)

I am new to the discussion forum, so I might have posted my answer in a way that you did not see it.


Sorry! no you posted it right, I just lost track of it. Thanks for the reminder!

Using reparameterize=False means it uses pm.MvNormal under the hood. It’s pretty much always better to have reparameterize=True. Try the code below and let me know if it’s what you need. I changed a few things:

nan_mask = np.array([[False,  True,  True,  True,  True], [True, False, False, False, False], [False, False,  True,  True, False]])
data = np.array([[0.00776596, -0.03106055, -0.03106055, -0.03106055, -0.03106055], [-0.03106055,  0.62549377,  0.10648936, -1.01249089,  1.12443077], [-1.15815381,  1.42086795, -0.03106055, -0.03106055, -0.55473531]])

n, m = data.shape
x = np.linspace(0, 1, m)[:, None]

with pm.Model() as pmf:
    eta = pm.Exponential("eta", lam=0.1) # set this for your case
    lengthscale = pm.InverseGamma("lengthscale", alpha=2, beta=1) # shared lengthscale
    cov_func = eta**2 * pm.gp.cov.ExpQuad(input_dim=1, ls=lengthscale) # input_dim = number of columns of x

    # I think the model is that each column of data is a sample from the same GP
    # data[:, i] ~ GP(0, K)
    # if so, that means K is the same for each column of your data.  Is that right?  
   
    # unfortunately our current GP implementation assumes the data is *one* sample 
    # from a GP, but fortunately it's easy to code it from scratch
    Kxx = pm.gp.util.stabilize(cov_func(x), 1e-6)
    L = pm.gp.util.cholesky(Kxx)
    v = pm.Normal("v", mu=0.0, sigma=1.0, size=(m, n))
    F = pm.Deterministic("F", (L @ v).T) 
    
    # Likelihood
    sigma = pm.HalfNormal("sigma", sigma=10)
    R = pm.Normal("R", mu=F[~nan_mask], sigma=sigma, observed=data[~nan_mask])

Basically, it sounds like the model is

\begin{align}
F_j &\sim GP(0, k(x, x')) \\
Y_j &\sim N(F_j, \sigma^2)
\end{align}

where I’m using $j$ to mean column, so $Y_j$ is the $j$th column of $Y$. Notice there’s no subscript on the covariance function $k(x, x')$: the covariance matrix is the same for each column of $Y$, which also means there’s a single lengthscale parameter. Is that what you’re meaning to do? I gathered that from how you were stacking x, but that might just be because you’re making an artificial example to post, of course. Let us know if this is right, or if your inputs x have more than one column.

Does that help?

Thank you so much for your help! Just for the record, in case others face similar issues: using pymc3 gave me the following errors:
For v:

super().__init__(shape, dtype, defaults=defaults, *args, **kwargs)
TypeError: __init__() got an unexpected keyword argument 'size'

and for Kxx:

 Kxx = pm.gp.util.stabilize(cov_func(x), 1e-6)
 TypeError: stabilize() takes 1 positional argument but 2 were given

Using the latest version, pymc 5.3.0, it works without any issues.

My goal was to have individual lengthscales for each row of F. But you are right, the way I stacked F, those would be columns. I guess computing an individual cov_func for each row/column and then giving v the shape (1, n) or (m, 1) would result in an individual F for each row or each column? I will try that and let you know how it goes. Thanks again :slight_smile:

Right, I should have specified I was testing this out in v5.2.0!

My goal was to have individual lengthscales for each row of F

For that case, you’d think of them as different GPs, because the covariance function for each is different. How many columns do you have? In this case you could add the GPs in a loop – we don’t have a way of batching K unfortunately. Then it might look a bit more like:

import pytensor.tensor as pt  # needed for pt.concatenate below

nan_mask = np.array([[False,  True,  True,  True,  True], [True, False, False, False, False], [False, False,  True,  True, False]])
data = np.array([[0.00776596, -0.03106055, -0.03106055, -0.03106055, -0.03106055], [-0.03106055,  0.62549377,  0.10648936, -1.01249089,  1.12443077], [-1.15815381,  1.42086795, -0.03106055, -0.03106055, -0.55473531]])

n, m = data.shape
x = np.linspace(0, 1, m)[:, None]

coords = {
    "column": [f"col_{i}" for i in range(m)], 
}

with pm.Model(coords=coords) as model:
    eta = pm.Exponential("eta", lam=0.1, dims="column") # set this for your case
    lengthscale = pm.InverseGamma("lengthscale", alpha=2, beta=1, dims="column")

    fs = []
    for i in range(m):
        cov_func = eta[i]**2 * pm.gp.cov.ExpQuad(input_dim=1, ls=lengthscale[i])
        gp = pm.gp.Latent(cov_func=cov_func)
        f = gp.prior(f"f{i}", X=x)  # each GP needs its own variable name
        fs.append(f[:, None])
    F = pt.concatenate(fs, axis=1)  # make sure this is shaped right, same as data

Also, check out pm.gp.HSGP; for low-dimensional inputs (input_dim < 3) and pm.gp.cov.ExpQuad it can be much faster than pm.gp.Latent.
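
For reference, a minimal sketch of what swapping in HSGP for one of those GPs could look like (the number of basis functions m=[25] and the boundary factor c=1.5 are placeholder choices you’d tune for your data; x is the input from above):

with pm.Model() as hsgp_model:
    eta = pm.Exponential("eta", lam=0.1)
    lengthscale = pm.InverseGamma("lengthscale", alpha=2, beta=1)
    cov_func = eta**2 * pm.gp.cov.ExpQuad(input_dim=1, ls=lengthscale)

    # HSGP approximates the GP with a fixed set of basis functions,
    # avoiding the cubic-cost Cholesky factorisation used by pm.gp.Latent
    gp = pm.gp.HSGP(m=[25], c=1.5, cov_func=cov_func)
    f = gp.prior("f", X=x)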


Thank you again! I do have quite a lot of columns and rows, so my data matrices are quite big. But this is very helpful, also for running smaller experiments!

Hi :slight_smile:. A quick follow-up question, to understand the implementation better: in your implementation above, each of the m columns has its own GP prior.

My initial attempt, inspired by how ARD (automatic relevance determination) is implemented, was to create an ExpQuad kernel with a list of lengthscales, which I expected to give me independent priors on each row of my input:

        lengthscale = []
        for j in np.arange(0, n):
            l = pm.InverseGamma(f"l{j}", alpha=2.0, beta=1.0)
            lengthscale.append(l)
        cov = pm.gp.cov.ExpQuad(ls=lengthscale, input_dim=n, active_dims=list(range(0, n)))

And then stacking x, so that F gets a matrix as its input:

        # GP latent variable model, assume 0 mean
        gp = pm.gp.Latent(cov_func=cov)
        x = np.linspace(0, 1, m)[:, None]
        for j in np.arange(1, n):
            x2 = np.linspace(0, 1, m)[:, None]
            x = np.hstack((x, x2))
        F = gp.prior("F", X=x, shape=(n,m))

Now, you mentioned earlier that the current implementation of a GP assumes the data is one sample from a GP. In this case, given the way I designed cov.ExpQuad, shouldn’t it give me independent priors on each row? Conceptually, doesn’t cov.ExpQuad allow creating independent priors for data with a multidimensional feature input, which could be represented as multiple rows of a matrix? If that is true, wouldn’t F in my case be a matrix containing samples of independent GPs, each with its own prior? (In that implementation I was using pymc3.)

Oh, I think I see. Yes, you can do both: the GP can have a multidimensional input (think time series → spatial → spatio-temporal) and it can have multiple outputs. For a GP with a multidimensional input, you’ve specified it correctly and you can do ARD:

        lengthscale = []
        for j in np.arange(0, n):
            l = pm.InverseGamma(f"l{j}", alpha=2.0, beta=1.0)
            lengthscale.append(l)

though, as an aside, you can use coords and dims to write it a bit cleaner as:

coords = {
    "gp_input_dim": ["dim1", "dim2", ...],
}

with pm.Model() as model:
    # lengthscale is a vector now, one element per input dimension
    lengthscale = pm.InverseGamma("lengthscale", alpha=2.0, beta=1.0, dims="gp_input_dim")

So then, is your observed data multidimensional too? Like, would you think of it as a set of multiple time series or a stack of spatial data sets? If so, then you’d use the solution I gave you above for that part. There’s no problem having both multiple input dimensions and multiple GPs in the same problem.
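
To make that concrete, here’s a rough sketch of how the two ideas could be combined: a two-column input with ARD lengthscales, plus one independent GP per output column (the sizes, priors, and random input here are only placeholders):

import numpy as np
import pymc as pm
import pytensor.tensor as pt

n_points, n_outputs = 30, 4
X = np.random.default_rng(0).uniform(size=(n_points, 2))  # e.g. two spatial coordinates per observation

coords = {
    "gp_input_dim": ["dim1", "dim2"],
    "column": [f"col_{i}" for i in range(n_outputs)],
}

with pm.Model(coords=coords) as model:
    # one amplitude per output GP
    eta = pm.Exponential("eta", lam=0.1, dims="column")
    # ARD: a lengthscale per input dimension, with a separate set per output column
    lengthscale = pm.InverseGamma("lengthscale", alpha=2.0, beta=1.0, dims=("column", "gp_input_dim"))

    fs = []
    for i in range(n_outputs):
        cov_func = eta[i]**2 * pm.gp.cov.ExpQuad(input_dim=2, ls=lengthscale[i])
        gp = pm.gp.Latent(cov_func=cov_func)
        f = gp.prior(f"f{i}", X=X)
        fs.append(f[:, None])
    F = pm.Deterministic("F", pt.concatenate(fs, axis=1))  # shape (n_points, n_outputs)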