I have a model, based on this answer, that I use to perform a linear regression between environmental observations. The model works fine, but it does not let me predict on test data of a different length, for both of the reasons suggested here.
Removing the deterministic part seems fine, but removing the explicit shape does not. How could I reformulate this model so that it still captures the errors-in-variables structure and allows for prediction? Any help would be much appreciated.
Training data:
import numpy as np

# True parameter values
alpha_true = 0
beta_true = 2
# Size of dataset
size = 100
# True (noise-free) data
x_true = np.linspace(-5, 5, size)
y_true = alpha_true + beta_true * x_true
# Add unit-variance Gaussian noise to both x and y
x = x_true + np.random.normal(loc=0, scale=1, size=size)
y = y_true + np.random.normal(loc=0, scale=1, size=size)
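To show why I want the errors-in-variables structure at all, here is a rough NumPy-only sketch of how noise in x biases an ordinary least-squares slope towards zero. The fixed seed and the larger sample size `n` are my own assumptions, chosen only to make the effect stable; they are not part of the data above.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed: an assumption, for repeatability only

beta_true = 2
n = 10_000  # larger than the 100 points above, only to stabilise the estimate

x_true = np.linspace(-5, 5, n)
x_noisy = x_true + rng.normal(0, 1, n)
y_noisy = beta_true * x_true + rng.normal(0, 1, n)

# Ordinary least-squares slope, treating the noisy x as if it were exact
slope_ols = np.polyfit(x_noisy, y_noisy, 1)[0]

# Expected shrinkage factor: var(x_true) / (var(x_true) + x-noise variance)
attenuation = x_true.var() / (x_true.var() + 1.0)

print(slope_ols, beta_true * attenuation)
```

The fitted slope should land close to `beta_true * attenuation` rather than `beta_true`, which is the bias the latent `x_lat` in the model is meant to remove.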
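Model:
import pymc3 as pm
from theano import shared

# Shared variables so the observed data can be swapped out later
x_in = shared(x)
y_in = shared(y)
with pm.Model() as model:
    # Priors on the observation noise and on the scale of the regression parameters
    err_odr = pm.HalfNormal('err_odr', 5.)
    err_param = pm.HalfNormal('err_param', 5.)
    a = pm.Normal('intercept', 0, err_param)
    b = pm.Normal('slope', 0, err_param)
    # One latent "true" x per training point; this is the explicit shape in question
    x_lat = pm.Normal('x_lat', 0, 5., shape=x.shape[0])
    x_obs = pm.Normal('x_obs', mu=x_lat, sd=err_odr, observed=x_in, shape=x.shape[0])
    # y depends on the latent x, not on the noisy observed x
    y_lat = a + b * x_lat
    y_obs = pm.Normal('y_obs', mu=y_lat, sd=err_odr, observed=y_in)
    trace = pm.sample(2000, tune=2000, cores=1)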
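As far as I can tell, the prediction failure comes down to a plain shape mismatch between the fixed-length latent vector and the new data. The following is only a NumPy analogy of that mismatch, not PyMC3 code; `n_test` and the zero-filled stand-in arrays are hypothetical.

```python
import numpy as np

n_train, n_test = 100, 40  # n_test is a hypothetical test-set length

x_lat = np.zeros(n_train)  # stand-in for the latent vector, sized when the model is built
x_new = np.zeros(n_test)   # stand-in for new observations swapped into the shared variable

try:
    # Elementwise comparison of the new observed x against its latent mean
    residual = x_new - x_lat
    mismatch = None
except ValueError as exc:
    mismatch = str(exc)

print(mismatch)
```

Any reformulation would therefore need the latent x to take its length from the data being conditioned on, rather than from the training set.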