IndexError: too many indices for array - Non-linear matrix factorization with missing values using Gaussian Processes - using pymc3

Hi everyone,

I am currently implementing the following non-linear matrix factorization (MF) approach by N. D. Lawrence et al. I am running into errors setting up the model correctly, partly because the matrix I want to factorize and predict has missing values.

I’ve previously implemented the PMF approach by Salakhutdinov and Mnih, which worked well with the following model:

    # data and mask of missing values
    nan_mask_matrix = np.isnan(self.data_matrix)
    self.data_matrix[nan_mask_matrix] = self.data_matrix[~nan_mask_matrix].mean()
    self.model = pm.Model()

    with self.model as pmf:

        W = pm.MvNormal(
            "W",
            mu=0,
            tau=self.alpha_w * np.eye(dim),
            shape=(n, dim),
            testval=np.random.randn(n, dim) * std,
        )

        H = pm.MvNormal(
            "H",
            mu=0,
            tau=self.alpha_h * np.eye(dim),
            shape=(m, dim),
            testval=np.random.randn(m, dim) * std,
        )

        R = pm.Normal(
            "R",
            mu=(W @ H.T)[~nan_mask_matrix],
            tau=self.alpha,
            observed=self.data_matrix[~nan_mask_matrix],
        )
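For context, the boolean mask turns both the mean matrix and the data into matching 1-D vectors over the observed entries only, which is why `mu` and `observed` line up. A quick NumPy sketch of that masking step, with made-up sizes:

```python
import numpy as np

n, m, dim = 4, 5, 2
rng = np.random.default_rng(0)

# Toy data matrix with ~30% of entries missing
data = rng.normal(size=(n, m))
data[rng.random(size=(n, m)) < 0.3] = np.nan
nan_mask = np.isnan(data)

W = rng.normal(size=(n, dim))
H = rng.normal(size=(m, dim))

mean = (W @ H.T)[~nan_mask]  # 1-D vector of means at observed cells
obs = data[~nan_mask]        # 1-D vector of observed values

assert mean.shape == obs.shape  # one entry per observed cell in both
```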

Now, when adapting my approach to non-linear MF, my idea is to replace the mean of R (the conditional distribution over the observed values) with a matrix F of samples from a Gaussian process (GP) prior, instead of the matrix product W @ H.T.

The current mean has dimensions m × n. The goal is for F to eventually consist of n GP priors, each of length m. Starting with just one prior, I therefore change R to

    r = pm.Normal("r", mu=f[~nan_mask_one_column], tau=self.alpha, observed=self.data_one_column[~nan_mask_one_column])

The following implementation works well:

    self.data_one_column = self.data_matrix[1,:]
    nan_mask_one_column = nan_mask_matrix[1,:]

    with self.model as pmf:
        # Parameters of GP
        lengthscale = 0.1
        cov = pm.gp.cov.ExpQuad(ls=lengthscale, input_dim=1)

        # Add white noise to stabilise
        cov += pm.gp.cov.WhiteNoise(1e-6)

        # GP latent variable model, assume 0 mean
        gp = pm.gp.Latent(cov_func=cov)

        # Input x to create the GP prior of dimension m
        x = np.linspace(0, 1, m)[:, None]
        f = gp.prior("f", X=x)

        # Marginal Likelihood
        r = pm.Normal("r", mu=f[~nan_mask_one_column], tau=self.alpha, observed=self.data_one_column[~nan_mask_one_column])

Now, the problem occurs when I try to extend it to a matrix.
This is my current attempt:

    with self.model as pmf:
        # Parameters of GP
        # m: Number of GP priors
        lengthscale = list(0.1*np.ones((m,)))
        cov = pm.gp.cov.ExpQuad(ls=lengthscale, input_dim=m, active_dims=list(range(0, m)))

        # Add white noise to stabilise
        cov += pm.gp.cov.WhiteNoise(1e-6)

        # GP latent variable model, assume 0 mean
        gp = pm.gp.Latent(cov_func=cov)

        # Input x to create the GP priors of needed dimension
        x = np.linspace(0, 1, m)[:, None]
        for j in np.arange(1, m):
            x2 = np.linspace(0, 1, m)[:, None]
            x = np.hstack((x, x2))

        F = gp.prior("F", X=x)

        # Regression model
        R = pm.Normal("R", mu=F.T[~nan_mask], tau=self.alpha, observed=self.data[~nan_mask], init=pm.Normal.dist(mu=0.0, sd=1.0), shape=(m,m))

And the error happens in R:

    R = pm.Normal("R", mu=F.T[~nan_mask], tau=self.alpha, observed=self.data[~nan_mask], init=pm.Normal.dist(mu=0.0, sd=1.0), shape=(m,m))

being:

    raise IndexError("too many indices for array")
    IndexError: too many indices for array
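If I understand the shapes correctly, `gp.prior` returns a single vector with one value per row of `X`, so `F` (and hence `F.T`, which is a no-op on a 1-D tensor) is 1-D, while `~nan_mask` is 2-D, and indexing a 1-D array with a 2-D boolean mask is exactly what raises this error. A minimal NumPy reproduction, no GP needed (names are mine):

```python
import numpy as np

m = 4
F = np.zeros(m)                          # 1-D, like a single GP draw over m inputs
nan_mask = np.zeros((m, m), dtype=bool)  # 2-D, like nan_mask_matrix

try:
    F.T[~nan_mask]                       # .T on a 1-D array changes nothing
except IndexError as e:
    print(e)                             # message begins with "too many indices for array"
```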

Would anyone know how to adjust R? Or is there a bigger mistake I am making before computing R?
Thanks in advance, and happy Easter break :)

:) Just in case someone is interested or encounters a similar problem:
I solved it by giving the GP prior the right shape and removing the transpose.

  • I am not yet sure whether the logic is correct (I am currently verifying this), but the code runs and debugging is possible.
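For anyone finding this later: as far as I can tell from the pymc3 source, `gp.Latent.prior` accepts a `shape` keyword, and with `reparameterize=True` the prior is built (roughly) as `mu + cholesky(K) @ v`, where `v` gets that shape. So a 2-D shape yields several independent GP columns that a 2-D mask can index directly. A NumPy-only sketch of that shape logic (not the actual pymc3 fix; all names here are mine):

```python
import numpy as np

m, n = 5, 4
rng = np.random.default_rng(1)

# Squared-exponential kernel on m inputs, plus jitter (mirrors the model above)
x = np.linspace(0, 1, m)[:, None]
K = np.exp(-0.5 * ((x - x.T) / 0.1) ** 2) + 1e-6 * np.eye(m)
L = np.linalg.cholesky(K)

# With shape=(m, n), the reparameterized zero-mean prior is roughly L @ v
v = rng.normal(size=(m, n))
F = L @ v                      # (m, n): n independent GP draws of length m

nan_mask = rng.random(size=(m, n)) < 0.3
masked = F[~nan_mask]          # 1-D, no transpose needed, no IndexError
assert masked.ndim == 1
```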