BART out-of-sample prediction by adding rows to a pandas DataFrame

I can easily put my data into the BART bikes example and run it as shown there, with a train/test split. But when I train the model on all 400 values and try to predict beyond them, I get an error. If anybody knows what causes the code to break, I would appreciate the help!

import arviz as az
import numpy as np
import pymc as pm
import pymc_bart as pmb

features = ["MFI", "RSI", "ATR"]

X_train = sales[features]
Y_train = sales["Close"]

X_test = sales[features][-1::450]
Y_test = sales["Close"][-1::450]

RANDOM_SEED = 5781
np.random.seed(RANDOM_SEED)
az.style.use("arviz-darkgrid")

with pm.Model() as model_oos_ts:
    X = pm.MutableData("X", X_train)
    Y = Y_train
    α = pm.Exponential("α", 1 / 10)
    μ = pmb.BART("μ", X, Y)
    y = pm.NegativeBinomial("y", mu=pm.math.abs(μ), alpha=α, observed=Y, shape=μ.shape)
    idata_oos_ts = pm.sample(random_seed=RANDOM_SEED)
    posterior_predictive_oos_ts_train = pm.sample_posterior_predictive(
        trace=idata_oos_ts, random_seed=RANDOM_SEED
    )

with model_oos_ts:
    X.set_value(X_test)
    posterior_predictive_oos_ts_test = pm.sample_posterior_predictive(
        trace=idata_oos_ts, random_seed=RANDOM_SEED
    )

Which leads to the following outcome:

Sampling: [y, μ]

0.00% [0/2000 00:00<00:00]
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
File ~\anaconda3\lib\site-packages\pytensor\compile\function\types.py:972, in Function.__call__(self, *args, **kwargs)
   970 try:
   971     outputs = (
--> 972         self.vm()
   973         if output_subset is None
   974         else self.vm(output_subset=output_subset)
   975     )
   976 except Exception:

ValueError: Not enough dimensions on input.

During handling of the above exception, another exception occurred:

ValueError                                Traceback (most recent call last)
Input In [32], in <cell line: 1>()
     1 with model_oos_ts:
     2     X.set_value(X_test)
----> 3     posterior_predictive_oos_ts_test = pm.sample_posterior_predictive(
     4         trace=idata_oos_ts, random_seed=RANDOM_SEED
     5     )

File ~\anaconda3\lib\site-packages\pymc\sampling\forward.py:644, in sample_posterior_predictive(trace, model, var_names, sample_dims, random_seed, progressbar, return_inferencedata, extend_inferencedata, predictions, idata_kwargs, compile_kwargs)
   639 # there's only a single chain, but the index might hit it multiple times if
   640 # the number of indices is greater than the length of the trace.
   641 else:
   642     param = _trace[idx % len_trace]
--> 644 values = sampler_fn(**param)
   646 for k, v in zip(vars_, values):
   647     ppc_trace_t.insert(k.name, v, idx)

File ~\anaconda3\lib\site-packages\pymc\util.py:390, in point_wrapper.<locals>.wrapped(**kwargs)
   388 def wrapped(**kwargs):
   389     input_point = {k: v for k, v in kwargs.items() if k in ins}
--> 390     return core_function(**input_point)

File ~\anaconda3\lib\site-packages\pytensor\compile\function\types.py:985, in Function.__call__(self, *args, **kwargs)
   983     if hasattr(self.vm, "thunks"):
   984         thunk = self.vm.thunks[self.vm.position_of_error]
--> 985     raise_with_op(
   986         self.maker.fgraph,
   987         node=self.vm.nodes[self.vm.position_of_error],
   988         thunk=thunk,
   989         storage_map=getattr(self.vm, "storage_map", None),
   990     )
   991 else:
   992     # old-style linkers raise their own exceptions
   993     raise

File ~\anaconda3\lib\site-packages\pytensor\link\utils.py:536, in raise_with_op(fgraph, node, thunk, exc_info, storage_map)
   531     warnings.warn(
   532         f"{exc_type} error does not allow us to add an extra error message"
   533     )
   534     # Some exception need extra parameter in inputs. So forget the
   535     # extra long error message in that case.
--> 536 raise exc_value.with_traceback(exc_trace)

File ~\anaconda3\lib\site-packages\pytensor\compile\function\types.py:972, in Function.__call__(self, *args, **kwargs)
   969 t0_fn = time.perf_counter()
   970 try:
   971     outputs = (
--> 972         self.vm()
   973         if output_subset is None
   974         else self.vm(output_subset=output_subset)
   975     )
   976 except Exception:
   977     restore_defaults()

ValueError: Not enough dimensions on input.
Apply node that caused the error: Elemwise{Composite{(i0 / (Abs(i1) + i0))}}[(0, 1)](InplaceDimShuffle{x}.0, μ)
Toposort index: 2
Inputs types: [TensorType(float64, (1,)), TensorType(float64, (?,))]
Inputs shapes: [(1,), ()]
Inputs strides: [(8,), ()]
Inputs values: [array([573.82329429]), array(13290.87321969)]
Outputs clients: [[nbinom_rv{0, (0, 0), int64, True}(RandomGeneratorSharedVariable(<Generator(PCG64) at 0x1F8CFBAD4A0>), MakeVector{dtype='int64'}.0, TensorConstant{4}, α, Elemwise{Composite{(i0 / (Abs(i1) + i0))}}[(0, 1)].0)]]

HINT: Re-running with most PyTensor optimizations disabled could provide a back-trace showing when this node was created. This can be done by setting the PyTensor flag 'optimizer=fast_compile'. If that does not work, PyTensor optimizations can be disabled with 'optimizer=None'.
HINT: Use the PyTensor flag `exception_verbosity=high` for a debug print-out and storage map footprint of this Apply node.



import arviz as az

with model_oos_ts:
   az.plot_trace(idata_oos_ts, var_names=["μ"]);

I also tried predict, as mentioned in another related question. I installed BART after that question was posted, so I do not think the version is the cause.

rng = np.random.RandomState(342)
a = pmb.predict(idata, rng, X.values, 100)
b = pmb.predict(idata, rng, X.values, 100)

This is how it is shown in the other question, but I get:

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
Input In [36], in <cell line: 2>()
     1 rng = np.random.RandomState(342)
----> 2 a = pmb.predict(idata, rng, X.values, 100)
     3 b = pmb.predict(idata, rng, X.values, 100)

AttributeError: module 'pymc_bart' has no attribute 'predict'

Otherwise, predict would be a good solution for me.

From reading more questions in the forum, I think the error has something to do with the shape.
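A quick way to check the shape idea is to print the shapes of the train and test inputs. The sketch below does not use the real data: sales_demo is a hypothetical stand-in with 400 random rows and the same column names, and the test slice is the same [-1::450] expression as in the model code. Whether this is the actual cause of the PyTensor error is not confirmed here.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the `sales` frame: 400 rows, same column names.
rng = np.random.default_rng(0)
sales_demo = pd.DataFrame(
    rng.random((400, 4)), columns=["MFI", "RSI", "ATR", "Close"]
)
features = ["MFI", "RSI", "ATR"]

X_train_demo = sales_demo[features]
# Same slice as in the model code: start at the last row, step 450.
X_test_demo = sales_demo[features][-1::450]

print(X_train_demo.shape)  # (400, 3)
print(X_test_demo.shape)   # (1, 3)
```

With 400 rows, the slice starts at the last row and the step of 450 immediately runs past the end, so the test set contains a single row rather than a block of new observations.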

Now I was wondering whether it is possible to add empty rows to the initial pandas DataFrame and split them off after the end of the regular data to get a prediction, as an intermediate or dummy solution?
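In general terms, the padding step could be sketched like this. It is only an illustration under assumptions: frame is a tiny made-up stand-in for the real data, n_extra is a hypothetical number of placeholder rows, and the frame is assumed to have the default 0..n-1 index so that reindex can extend it.

```python
import pandas as pd

# Hypothetical stand-in for the real data; only the column names come from the post.
frame = pd.DataFrame(
    {
        "MFI": [55.0, 60.0, 48.0],
        "RSI": [40.0, 45.0, 52.0],
        "ATR": [1.2, 1.1, 1.3],
        "Close": [100, 102, 101],
    }
)

n_obs = len(frame)
n_extra = 2  # hypothetical number of placeholder rows

# Extending the index adds rows filled with NaN after the observed data.
padded = frame.reindex(range(n_obs + n_extra))

# Split again at the end of the regular data.
observed_part = padded.iloc[:n_obs]
placeholder_part = padded.iloc[n_obs:]

print(padded.shape)
print(placeholder_part)
```

Note that the placeholder rows are NaN in the feature columns as well, so they would still need feature values before they could be passed to the model as predictors.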

This is my idea with pandas so far: