I can easily plug my data into the BART bikes example and run it as shown there, with a train/test split. But when I train the model on all 400 values and then try to predict beyond them, I get an error. If anybody knows what causes the code to break, I would appreciate the help!
import arviz as az
import numpy as np
import pymc as pm
import pymc_bart as pmb

# sales is a pandas DataFrame with the columns MFI, RSI, ATR and Close
features = ["MFI", "RSI", "ATR"]
X_train = sales[features]
Y_train = sales["Close"]
X_test = sales[features][-1::450]
Y_test = sales["Close"][-1::450]
RANDOM_SEED = 5781
np.random.seed(RANDOM_SEED)
az.style.use("arviz-darkgrid")
with pm.Model() as model_oos_ts:
    X = pm.MutableData("X", X_train)
    Y = Y_train
    α = pm.Exponential("α", 1 / 10)
    μ = pmb.BART("μ", X, Y)
    y = pm.NegativeBinomial("y", mu=pm.math.abs(μ), alpha=α, observed=Y, shape=μ.shape)
    idata_oos_ts = pm.sample(random_seed=RANDOM_SEED)
    posterior_predictive_oos_ts_train = pm.sample_posterior_predictive(
        trace=idata_oos_ts, random_seed=RANDOM_SEED
    )
with model_oos_ts:
    X.set_value(X_test)
    posterior_predictive_oos_ts_test = pm.sample_posterior_predictive(
        trace=idata_oos_ts, random_seed=RANDOM_SEED
    )
Which leads to the following outcome:
Sampling: [y, μ]
0.00% [0/2000 00:00<00:00]
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
File ~\anaconda3\lib\site-packages\pytensor\compile\function\types.py:972, in Function.__call__(self, *args, **kwargs)
970 try:
971 outputs = (
--> 972 self.vm()
973 if output_subset is None
974 else self.vm(output_subset=output_subset)
975 )
976 except Exception:
ValueError: Not enough dimensions on input.
During handling of the above exception, another exception occurred:
ValueError Traceback (most recent call last)
Input In [32], in <cell line: 1>()
1 with model_oos_ts:
2 X.set_value(X_test)
----> 3 posterior_predictive_oos_ts_test = pm.sample_posterior_predictive(
4 trace=idata_oos_ts, random_seed=RANDOM_SEED
5 )
File ~\anaconda3\lib\site-packages\pymc\sampling\forward.py:644, in sample_posterior_predictive(trace, model, var_names, sample_dims, random_seed, progressbar, return_inferencedata, extend_inferencedata, predictions, idata_kwargs, compile_kwargs)
639 # there's only a single chain, but the index might hit it multiple times if
640 # the number of indices is greater than the length of the trace.
641 else:
642 param = _trace[idx % len_trace]
--> 644 values = sampler_fn(**param)
646 for k, v in zip(vars_, values):
647 ppc_trace_t.insert(k.name, v, idx)
File ~\anaconda3\lib\site-packages\pymc\util.py:390, in point_wrapper.<locals>.wrapped(**kwargs)
388 def wrapped(**kwargs):
389 input_point = {k: v for k, v in kwargs.items() if k in ins}
--> 390 return core_function(**input_point)
File ~\anaconda3\lib\site-packages\pytensor\compile\function\types.py:985, in Function.__call__(self, *args, **kwargs)
983 if hasattr(self.vm, "thunks"):
984 thunk = self.vm.thunks[self.vm.position_of_error]
--> 985 raise_with_op(
986 self.maker.fgraph,
987 node=self.vm.nodes[self.vm.position_of_error],
988 thunk=thunk,
989 storage_map=getattr(self.vm, "storage_map", None),
990 )
991 else:
992 # old-style linkers raise their own exceptions
993 raise
File ~\anaconda3\lib\site-packages\pytensor\link\utils.py:536, in raise_with_op(fgraph, node, thunk, exc_info, storage_map)
531 warnings.warn(
532 f"{exc_type} error does not allow us to add an extra error message"
533 )
534 # Some exception need extra parameter in inputs. So forget the
535 # extra long error message in that case.
--> 536 raise exc_value.with_traceback(exc_trace)
File ~\anaconda3\lib\site-packages\pytensor\compile\function\types.py:972, in Function.__call__(self, *args, **kwargs)
969 t0_fn = time.perf_counter()
970 try:
971 outputs = (
--> 972 self.vm()
973 if output_subset is None
974 else self.vm(output_subset=output_subset)
975 )
976 except Exception:
977 restore_defaults()
ValueError: Not enough dimensions on input.
Apply node that caused the error: Elemwise{Composite{(i0 / (Abs(i1) + i0))}}[(0, 1)](InplaceDimShuffle{x}.0, μ)
Toposort index: 2
Inputs types: [TensorType(float64, (1,)), TensorType(float64, (?,))]
Inputs shapes: [(1,), ()]
Inputs strides: [(8,), ()]
Inputs values: [array([573.82329429]), array(13290.87321969)]
Outputs clients: [[nbinom_rv{0, (0, 0), int64, True}(RandomGeneratorSharedVariable(<Generator(PCG64) at 0x1F8CFBAD4A0>), MakeVector{dtype='int64'}.0, TensorConstant{4}, α, Elemwise{Composite{(i0 / (Abs(i1) + i0))}}[(0, 1)].0)]]
HINT: Re-running with most PyTensor optimizations disabled could provide a back-trace showing when this node was created. This can be done by setting the PyTensor flag 'optimizer=fast_compile'. If that does not work, PyTensor optimizations can be disabled with 'optimizer=None'.
HINT: Use the PyTensor flag `exception_verbosity=high` for a debug print-out and storage map footprint of this Apply node.
import arviz as az
with model_oos_ts:
    az.plot_trace(idata_oos_ts, var_names=["μ"]);
I also tried predict, as mentioned in another related question. I installed BART after that question was asked, so I do not think the problem is the version.
rng = np.random.RandomState(342)
a = pmb.predict(idata, rng, X.values, 100)
b = pmb.predict(idata, rng, X.values, 100)
Running it as shown in that other question, I get:
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
Input In [36], in <cell line: 2>()
1 rng = np.random.RandomState(342)
----> 2 a = pmb.predict(idata, rng, X.values, 100)
3 b = pmb.predict(idata, rng, X.values, 100)
AttributeError: module 'pymc_bart' has no attribute 'predict'
Which would otherwise be a good solution for me.
From reading more questions in the forum I think that it has something to do with the shape.
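A quick way to check the shape is to look at what the test slice actually selects. The sketch below uses a made-up 400-row frame (sales_demo is a hypothetical stand-in for sales, filled with random numbers) and relies only on standard pandas slicing: [-1::450] starts at the last row and steps forward by 450, so it returns a single row, whereas a slice such as .iloc[-50:] returns the last 50 rows. Whether a one-row X is what triggers the error above is an assumption, not something confirmed here.

```python
import numpy as np
import pandas as pd

features = ["MFI", "RSI", "ATR"]

# Hypothetical stand-in for the real `sales` frame: 400 rows of random numbers.
rng = np.random.default_rng(0)
sales_demo = pd.DataFrame(rng.normal(size=(400, 3)), columns=features)

# Same slice as in the question: start at the last row, then step by 450.
tail_step = sales_demo[features][-1::450]

# For comparison: the last 50 rows, selected by position.
last_50 = sales_demo[features].iloc[-50:]

print(tail_step.shape)
print(last_50.shape)
```

If the first shape printed has only one row, the test set handed to X.set_value holds a single observation rather than a range of rows.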
Now I was wondering whether it is possible to add empty rows to the initial pandas DataFrame and split it at the end of the regular data, to get a prediction as an intermediate or dummy solution?
This is my idea with pandas so far: