Unable to make OOS predictions using a simple model

Hello, everyone. I am having trouble getting out-of-sample (OOS) predictions from my PyMC model. If I pass in new data, it throws a ValueError. If I set the data to a single observation, it runs but returns predictions with the same shape as the training data.

Here is my model:

import numpy as np
import pymc as pm
from sklearn.preprocessing import StandardScaler

# ag_d is assumed to be a pandas DataFrame with 'conversions' and 'admits' columns
conversions = ag_d['conversions'].to_numpy()
conversions_sq = np.square(conversions)

# Standardize the squared predictor
scaler = StandardScaler()
conversions_sq_scaled = scaler.fit_transform(conversions_sq.reshape(-1, 1)).flatten()

conversions = conversions[:, None]
admits = ag_d['admits'].to_numpy()

with pm.Model() as linear:
    X = pm.MutableData("X", conversions_sq_scaled)
    alpha = pm.Exponential("alpha", 1/10)
    b0 = pm.HalfNormal('intercept', sigma=4)
    b1 = pm.HalfNormal('beta', sigma=2)
    mu = pm.Deterministic('mu', var=pm.math.exp(b0 + b1*X))
    l = pm.NegativeBinomial("l", mu=mu, alpha=alpha, observed=admits)

with linear:
    idata_linear = pm.sample(tune=2000, draws=4000, target_accept=0.99)
with linear:
    ppc_linear = pm.sample_posterior_predictive(idata_linear)
with linear:
    # new_conversions_sq_scaled holds the new (out-of-sample) predictor values
    pm.set_data(new_data={'X': new_conversions_sq_scaled})
    ppc_oos = pm.sample_posterior_predictive(idata_linear)

And the error I get:

ValueError                                Traceback (most recent call last)
File c:\Users\j.dekermanjian\Anaconda3\envs\m_geo\lib\site-packages\pytensor\compile\function\types.py:970, in Function.__call__(self, *args, **kwargs)
    968 try:
    969     outputs = (
--> 970         self.vm()
    971         if output_subset is None
    972         else self.vm(output_subset=output_subset)
    973     )
    974 except Exception:

File c:\Users\j.dekermanjian\Anaconda3\envs\m_geo\lib\site-packages\pytensor\graph\op.py:543, in Op.make_py_thunk…rval(p, i, o, n, params)
    539 @is_thunk_type
    540 def rval(
    541     p=p, i=node_input_storage, o=node_output_storage, n=node, params=None
    542 ):
--> 543     r = p(n, [x[0] for x in i], o)
    544     for o in node.outputs:

File c:\Users\j.dekermanjian\Anaconda3\envs\m_geo\lib\site-packages\pytensor\tensor\random\op.py:378, in RandomVariable.perform(self, node, inputs, outputs)
    376 rng_var_out[0] = rng
--> 378 smpl_val = self.rng_fn(rng, *(args + [size]))
    380 if (
    381     not isinstance(smpl_val, np.ndarray)
    382     or str(smpl_val.dtype) != out_var.type.dtype

Inputs values: [Generator(PCG64) at 0x272252051C0, array([72], dtype=int64), array(4, dtype=int64), array(13.6946887), 'not shown']
Outputs clients: [['output'], ['output']]

HINT: Re-running with most PyTensor optimizations disabled could provide a back-trace showing when this node was created. This can be done by setting the PyTensor flag 'optimizer=fast_compile'. If that does not work, PyTensor optimizations can be disabled with 'optimizer=None'.
HINT: Use the PyTensor flag exception_verbosity=high for a debug print-out and storage map footprint of this Apply node.

I also tried setting the data to a single value:

with linear:
    pm.set_data(new_data={'X': np.array([1.0])})
    ppc_oos = pm.sample_posterior_predictive(idata_linear)

This runs but returns predictions with the same length as the training data. So I don’t think the model is using the new data.

Any advice would be appreciated. Also, how do you put tabs in the code blocks?

EDIT:
Actually, it looks like when I run it with only 1 sample it is broadcasting the value to the original training data size. This must be an issue with the shape/dims.
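
To illustrate what I think is happening, here is a minimal NumPy-only sketch. It is not PyMC's internals, just the same broadcasting behavior reproduced by hand. The helper `draw_counts` and the values for `mu` and `alpha` are made up for the example, and I am assuming the `array([72])` in the error's input values is the fixed size of the observed variable.

```python
import numpy as np

N_TRAIN = 72  # assumed training length, read off array([72]) in the error


def draw_counts(mu, alpha, size, seed=0):
    """Draw negative binomial counts with mean mu and dispersion alpha."""
    rng = np.random.default_rng(seed)
    mu = np.asarray(mu, dtype=float)
    p = alpha / (alpha + mu)  # convert (mu, alpha) to NumPy's (n, p) form
    return rng.negative_binomial(alpha, p, size=size)


# One new observation: mu has shape (1,), which broadcasts to the fixed size,
# so the result still has the training length.
one_point = draw_counts(mu=[3.0], alpha=4.0, size=(N_TRAIN,))
print(one_point.shape)

# Ten new observations: shape (10,) cannot broadcast to (72,), so NumPy raises.
try:
    draw_counts(mu=np.full(10, 3.0), alpha=4.0, size=(N_TRAIN,))
except ValueError as err:
    print("ValueError:", err)

# If the size follows the inputs instead of staying fixed, the clash goes away.
ten_points = draw_counts(mu=np.full(10, 3.0), alpha=4.0, size=(10,))
print(ten_points.shape)
```

If this reading is right, it would explain both symptoms: a single value is silently stretched to the training length, while a new array of a different length fails outright.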

EDIT2:
I changed my model to include the dimensions, but I am still getting the same error.
Here is what I changed:

with pm.Model() as linear:
    linear.add_coord(name='dims', values=np.arange(0, len(conversions_sq_scaled)), mutable=True)
    X = pm.MutableData("X", conversions_sq_scaled)
    alpha = pm.Exponential("alpha", 1/10)
    b0 = pm.HalfNormal('intercept', sigma=4)
    b1 = pm.HalfNormal('beta', sigma=2)
    mu = pm.Deterministic('mu', var=pm.math.exp(b0 + b1*X))
    l = pm.NegativeBinomial("l", mu=mu, alpha=alpha, observed=admits, dims='dims')

with linear:
    pm.set_data(new_data={'X': new_conversions_sq_scaled}, coords={'dims': np.arange(0, len(new_conversions_sq_scaled))})
    ppc_oos = pm.sample_posterior_predictive(idata_linear)

EDIT3:
Okay, I was able to get it to work by passing shape=X.shape to the observed variable.
Here is what I needed to change:

with pm.Model() as linear:
    X = pm.MutableData("X", conversions_sq_scaled)
    alpha = pm.Exponential("alpha", 1/10)
    b0 = pm.HalfNormal('intercept', sigma=4)
    b1 = pm.HalfNormal('beta', sigma=2)
    mu = pm.Deterministic('mu', var=pm.math.exp(b0 + b1*X))
    # Tie the likelihood's shape to X so it resizes when the data changes
    l = pm.NegativeBinomial("l", mu=mu, alpha=alpha, observed=admits, shape=X.shape)
with linear:
    pm.set_data(new_data={'X': new_conversions_sq_scaled})
    ppc_oos = pm.sample_posterior_predictive(idata_linear)

Can someone explain to me why the shape worked but the dims did not?