Hello, I know there are a few questions on this topic, and I (believe!) have tried all the suggested solutions, but none work for me.
I’m running a simple Bayesian linear model and want to perform inference on a 2D input (a data frame) that has a different number of rows from my training data. This doesn’t work: if the two have the same shape it’s all fine, but as soon as I change the number of rows, well …
I think I have tried everything: coords, mutable coords, and adding the observed values as a Data argument and re-setting them during inference (even though I of course don’t know these values!), but nothing works. Any help or pointers would be massively appreciated. I feel this shouldn’t be so difficult?!
Minimal working example:
import numpy as np
import pandas as pd
import pymc as pm

# training covariates: 38 observations, 9 features
covariates = pd.DataFrame(np.random.randint(1,5,(38, 9)), columns=[f'feature_{i}' for i in range(9)])
coords = {'obs': covariates.index, 'features':covariates.columns}
coords_mutable = {'obs': np.arange(len(covariates))}

with pm.Model(coords=coords, coords_mutable=coords_mutable) as base_model:
    covars_data = pm.Data("covars_data", covariates, dims=['obs', 'features'])
    mu = pm.Normal("mu", 0, sigma=1)
    sigma = pm.HalfCauchy("sigma", beta=10)
    n = covariates.shape[0]
    n_cov = covariates.shape[1]
    intercept = pm.Normal("intercept", mu=mu, sigma=sigma, shape=n)
    beta = pm.Normal("beta", mu=0, sigma=1, shape=n_cov, dims="features")
    mean_val = intercept + covars_data @ beta

    data = pd.DataFrame(np.random.randint(1,5,(38, 5)), columns=['a','b','c','d','e'])
    endpt1_data = pm.Data("endpt1_data", data.a)
    se1_data = pm.Data("se1_data", data.b)
    sigma_data = pm.Data("sigma_data", data.c, dims='obs')
    obs2_data = pm.Data('obs2_data', data.e, dims='obs')

    # Meta-analytic likelihood of endpoint 2
    est2_dist = pm.Normal(
        "endpt2",
        mu=mean_val,
        sigma=sigma_data,
        observed=obs2_data,
        dims='obs',
    )
    idata = pm.sample(100)

### prediction
# new covariates: a single row instead of 38
covariates = pd.DataFrame(np.random.randint(1,5,(1, 9)), columns=[f'feature_{i}' for i in range(9)])
with base_model:
    pm.set_data({"covars_data": covariates}, coords={'obs': np.arange(len(covariates))})
    # pm.set_data({"obs2_data": [0]}, coords={'obs': np.arange(len(covariates))}) # this seems nonsense??
    # pm.set_data({"sigma_data": [1]}, coords={'obs': np.arange(len(covariates))})
    ppc = pm.sample_posterior_predictive(idata)

If I just delete est2_dist from the model before training, this runs, and even if I keep it and only delete observed= it still runs … so that seems to be the issue!? But then how do I solve this?
The error is:
ValueError: Shape mismatch: A.shape[0] != y.shape[0]
Apply node that caused the error: CGemv{no_inplace}(intercept, 1.0, covars_data, beta, 1.0)
Toposort index: 0
Inputs types: [TensorType(float64, shape=(38,)), TensorType(float64, shape=()), TensorType(float64, shape=(None, None)), TensorType(float64, shape=(9,)), TensorType(float64, shape=())]
Inputs shapes: [(38,), (), (1, 9), (9,), ()]
Inputs strides: [(8,), (), (72, 8), (8,), ()]
Inputs values: ['not shown', array(1.), 'not shown', 'not shown', array(1.)]
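
One possible way to read the traceback above: the CGemv node appears to compute something of the form y + A @ x, with y = intercept, A = covars_data and x = beta, so the number of rows in A has to match the length of y. The plain-Python sketch below only replays that shape check using the shapes reported on the "Inputs shapes" line. The helper gemv_shape_problem is hypothetical and written purely for illustration; it is not part of PyMC or PyTensor.

```python
def gemv_shape_problem(y_shape, a_shape, x_shape):
    """Describe the first shape conflict in y + A @ x, or return None if shapes agree.

    Hypothetical helper for illustration only; it mimics the kind of check
    a matrix-vector product performs and does not call any library code.
    """
    if len(y_shape) != 1 or len(a_shape) != 2 or len(x_shape) != 1:
        return "expected y: 1-D, A: 2-D, x: 1-D"
    if a_shape[1] != x_shape[0]:
        return f"A has {a_shape[1]} columns but x has {x_shape[0]} entries"
    if a_shape[0] != y_shape[0]:
        return f"A has {a_shape[0]} rows but y has {y_shape[0]} entries"
    return None


# Shapes copied from the traceback, in the order intercept, covars_data, beta.
prediction = gemv_shape_problem((38,), (1, 9), (9,))

# Same check with the training-time covariates (38 rows).
training = gemv_shape_problem((38,), (38, 9), (9,))

print("prediction:", prediction)
print("training:  ", training)
```

If this reading is right, the length-38 input is intercept, which is declared with shape=n and so stays tied to the number of training rows even after covars_data is replaced with a single row.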
Thank you!