# IndexError: too many indices for array - Non-linear matrix factorization with missing values using Gaussian Processes - using pymc3

Hi everyone,

I am currently implementing the following non-linear matrix factorization (MF) approach by N. D. Lawrence et. al.. And I am getting errors for how to set up the correct model also because the matrix I want to factorize and predict has missing values.

I’ve previously implemented the PMF approach by Salakhutdinov and Mnih. Which worked well using the following model:

``````    # data and mask of missing values
self.model = pm.Model()

with self.model as pmf:

W = pm.MvNormal(
"W",
mu=0,
tau=self.alpha_w * np.eye(dim),
shape=(n, dim),
testval=np.random.randn(n, dim) * std,
)

H = pm.MvNormal(
"H",
mu=0,
tau=self.alpha_h * np.eye(dim),
shape=(m, dim),
testval=np.random.randn(m, dim) * std,
)

``````

Now, when trying to adapt my approach to non-linear MF, my idea is to replace the mean of R, the conditional distribution over observed values, by a matrix F consisting of samples from a Gaussian process (GP) prior (instead of being a matrix multiplication between W and H.T).

The current mean has dimensions: mdim times ndim. Now the goal is that the function F will eventually have ndim priors of length mdim. Starting with just one prior, therefore changing R to

following Implementations works well, using this piece of code:

``````    self.data_one_column = self.data_matrix[1,:]

with self.model as pmf:
# Parameters of GP
lengthscale = 0.1

# Add white noise to stabilise
cov += pm.gp.cov.WhiteNoise(1e-6)

# GP latent variable model, assume 0 mean
gp = pm.gp.Latent(cov_func=cov)

# Input x to create the GP prior of dimension m
x = np.linspace(0, 1, m)[:, None]
f = gp.prior("f", X=x)

# Marginal Likelihood
``````

Now, the problem occurs when I try to extend it to a matrix.
This is my current attempt:

``````    with self.model as pmf:
# Parameters of GP
# m: Number of GP priors
lengthscale = list(0.1*np.ones((m,)))
cov = pm.gp.cov.ExpQuad(ls=lengthscale, input_dim=m, active_dims=list(range(0, m)))

# Add white noise to stabilise
cov += pm.gp.cov.WhiteNoise(1e-6)

# GP latent variable model, assume 0 mean
gp = pm.gp.Latent(cov_func=cov)

# Input x to create the GP priors of needed dimension
x = np.linspace(0, 1, m)[:, None]
for j in np.arange(1, m):
x2 = np.linspace(0, 1, m)[:, None]
x = np.hstack((x, x2))

F = gp.prior("F", X=x)

# Regression model
``````

And the error happens in R:

``````R = pm.Normal("R", mu=F.T[~nan_mask], tau=self.alpha, observed=self.data[~nan_mask], init=pm.Normal.dist(mu=0.0, sd=1.0), shape=(m,m))
``````

being:

``````raise IndexError("too many indices for array")
IndexError: too many indices for array
``````

Whould anyone know how to adjust R? Or is there a bigger mistake that I am making before trying to compute R?
Thanks in advance and happy Easter break:) Just in case someone is interested or encounters a similar problem:
Solved by giving the GP prior the right shape and removing the transpose.

• I am not sure yet, whether the logic is correct, which I am currently verifying, but the code runs and debugging is possible.