Hi All,
I’m new to pymc3, and I may be making a stupid mistake, but I am trying to build a Gaussian Process Regression model and feed in a 2D input. I have adapted my example from here:
I have used PCA to preprocess my input and built the following model:
import pymc3 as pm

number_of_pcs = 2
with pm.Model() as model:
    z = pm.Gamma('z', 1, 1, shape=number_of_pcs)
    nu = pm.Gamma('nu', 1, 1, shape=number_of_pcs)
    K = nu * pm.gp.cov.ExpQuad(number_of_pcs, z)
    mu = pm.gp.mean.Zero()
    sigma = pm.HalfCauchy('sigma', 2.5)
    x = features.iloc[:500, :number_of_pcs].values
    y = y_df.values[:500]
    y_obs = pm.gp.GP('y_obs', mean_func=mu, cov_func=K, sigma=sigma, observed={'X': x, 'Y': y})
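To make the shapes in this model concrete, here is a minimal NumPy sketch (not PyMC3 code) of the squared-exponential covariance that, as I understand it, nu * ExpQuad(number_of_pcs, z) is meant to build: one lengthscale per input column (the role of z) and a single amplitude multiplying the whole matrix (the role of nu). The function name exp_quad_ard and the random stand-in inputs are hypothetical. With 500 rows and 2 PCs the result is a 500 x 500 matrix.

```python
import numpy as np


def exp_quad_ard(X, lengthscales, amplitude=1.0):
    """Squared-exponential covariance with one lengthscale per input column.

    X: (n, d) inputs; lengthscales: (d,) array; amplitude: a single scalar.
    Returns the (n, n) matrix amplitude * exp(-0.5 * sum_d (x_d - x'_d)^2 / l_d^2).
    """
    X = np.asarray(X, dtype=float)
    ls = np.asarray(lengthscales, dtype=float)
    Xs = X / ls  # rescale each column by its own lengthscale
    sq = np.sum(Xs**2, axis=1)
    # Pairwise squared distances via |a|^2 + |b|^2 - 2 a.b, clipped at 0 for round-off.
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * Xs @ Xs.T, 0.0)
    return amplitude * np.exp(-0.5 * d2)


rng = np.random.default_rng(0)
x = rng.normal(size=(500, 2))  # hypothetical stand-in for features.iloc[:500, :2].values
K = exp_quad_ard(x, lengthscales=[1.0, 2.0], amplitude=1.5)
print(K.shape)  # (500, 500)
```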
However, I get the following error:
ValueError: Input dimension mis-match (input[0].shape[1] = 500, input[1].shape[1] = 2)
(using X=x, observed=y gives the same error)
This works when number_of_pcs = 1, but fails if > 1.
Any ideas on how I should proceed? Or any examples of a similar model that I can try to learn from?
Thank you in advance.
Hey there, this works for me in 1 or higher dimensions:
y_obs = pm.gp.GP("y_obs", mean_func=mu, cov_func=K, X=x, sigma=sigma, observed={'X': x, 'Y': y})
Notice the addition of X=x. Does this syntax work for you? I’m on master and I’m seeing the same behavior as you. I’ll see about a patch for this asap. Thank you for posting this!
Thanks @bwengals,
Unfortunately, that does not work for me; it still fails with the same error message (3.1 master). I wasn’t sure it was an issue previously (I thought maybe I had done something wrong). Do you want me to open a GitHub issue?
I tracked my issue back to line 122 of gp\cov.py.
There’s a list of two factors returned by merge_factors, but the product cannot be calculated. If I change:
K = nu * pm.gp.cov.ExpQuad(number_of_pcs, z)
to:
K = pm.gp.cov.ExpQuad(number_of_pcs, z)
then it works correctly. I tried to figure out the underlying issue, but my knowledge of Theano is a bit lacking.
Ah! I think I see it now.
nu = pm.Gamma('nu', 1, 1, shape=number_of_pcs)
K = nu * pm.gp.cov.ExpQuad(number_of_pcs, z)
nu should be a scalar, since its role is to scale the covariance matrix, but you have it defined as a vector of random variables. Try removing the shape=number_of_pcs part.
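The same failure can be reproduced with plain NumPy broadcasting. This is a minimal sketch, assuming Theano applies the same broadcasting rules as NumPy to this elementwise product: an identity matrix stands in for the 500 x 500 covariance, and multiplying it by a scalar or a length-1 vector works, while a length-2 vector does not. That appears to match both the shapes in the reported error (500 vs 2) and the observation that the model works with number_of_pcs = 1 but fails for larger values.

```python
import numpy as np

n = 500
K = np.eye(n)  # stand-in for the (500, 500) ExpQuad covariance matrix

# Scalar amplitude (what nu should be): broadcasts over the whole matrix.
scaled_scalar = 2.0 * K

# Shape-(1,) amplitude (shape=number_of_pcs with number_of_pcs == 1):
# a length-1 axis broadcasts, so this case also "works".
scaled_len1 = np.array([2.0]) * K

# Shape-(2,) amplitude (number_of_pcs == 2): the trailing axes are 2 vs 500,
# which cannot be broadcast together, so NumPy raises ValueError.
try:
    np.array([2.0, 3.0]) * K
except ValueError as err:
    failed = True
    print("ValueError:", err)
else:
    failed = False
```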
Oh, ok I see. Thanks for all your help, I understand now (noob mistake…).
No prob! Thanks for posting. I messed with it for quite a while before noticing.