Is BART currently broken for out of sample predictions?

Hey there,

I recently updated to v6, and for a BART model I’ve noticed some shape errors that were not present until now. My script breaks when I try to draw posterior predictive samples for my held-out data:

Vectorized input 0 has an incompatible shape in axis 0.

But in my workflow, I am following the suggested pattern: pm.Data containers in the model, then pm.set_data to swap in my matrix of covariates from the test set and a ‘dummy’ vector of predictions:

with self.model:  # sample with new input data
    pm.set_data({"X": X_test,
                 "y": y_test},
                coords={"subj": X_test.index})

    # pm.set_data({}, coords={"subj": X_test.index})

    post_pred = pm.sample_posterior_predictive(trace=self.idata_tree,
                                               model=self.model,
                                               var_names=['Y', 'θ'],
                                               freeze_vars=['mu'],
                                               # tested with false as well: predictions = True,
                                               progressbar=False,
                                               extend_inferencedata=False,
                                               random_seed=rng
                                               )

My model is a very simple binomial regression with BART serving as a ‘link function’ to the covariates.

with pm.Model(  # coords=coords
              ) as self.model:
    # train discrete length intervals on covariates using BART
    X_ij = pm.Data('X',
                   X,
                   # dims=("subj", "X_vars")
                   )

    y_obs = pm.Data("y",
                    y.to_numpy(),
                    # dims=("subj")
                    )

    mu = pmb.BART("mu",
                  X_ij,
                  y_obs,
                  **self.model_config,
                  )

    self._BART = mu

    θ = pm.Deterministic("θ",
                         pm.math.sigmoid(mu)
                         )

    Y = pm.Bernoulli("Y",
                     p=θ,
                     observed=y_obs,
                     # dims='subj',
                     )

Is anyone else experiencing something similar?

Thanks!

It’s hard to tell from this. Check what the shapes of your fitting and test datasets look like.
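
For the shape check suggested above, something along these lines could be a starting point. It is only a sketch using NumPy: `check_oos_shapes` and its argument names are made up for illustration (they are not part of PyMC or pymc-bart), and it assumes the covariates are 2-D and the dummy outcome vector is 1-D.

```python
import numpy as np


def check_oos_shapes(X_train, X_test, y_dummy):
    """Return a list of shape problems found (an empty list means none)."""
    X_train = np.asarray(X_train)
    X_test = np.asarray(X_test)
    y_dummy = np.asarray(y_dummy)
    problems = []

    # Covariate matrices should both be 2-D with the same number of columns.
    if X_train.ndim != 2 or X_test.ndim != 2:
        problems.append(
            f"expected 2-D covariates, got {X_train.shape} and {X_test.shape}"
        )
    elif X_train.shape[1] != X_test.shape[1]:
        problems.append(
            f"column mismatch: train has {X_train.shape[1]}, "
            f"test has {X_test.shape[1]}"
        )

    # The dummy outcome vector should have one entry per test row.
    if y_dummy.ndim != 1:
        problems.append(f"expected 1-D dummy outcome, got {y_dummy.shape}")
    elif X_test.ndim == 2 and y_dummy.shape[0] != X_test.shape[0]:
        problems.append(
            f"row mismatch: X_test has {X_test.shape[0]} rows, "
            f"dummy outcome has {y_dummy.shape[0]}"
        )
    return problems


if __name__ == "__main__":
    # Hypothetical data: the dummy vector still has the training length.
    print(check_oos_shapes(np.zeros((10, 3)), np.zeros((4, 3)), np.zeros(10)))
```

Running a check like this right before pm.set_data would show whether the new X and the dummy y disagree on the number of rows.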

Can you give a minimal reproducible example?

Hey everyone,

Apologies for the lapse in reply, @ricardoV94 @fonnesbeck.

It seems that the main issue is that the model context wasn’t being persisted in my Databricks notebook.

After creating my model definition and performing some intermediate work, I was trying to use the context manager to run the same model on new data (as well as sample my posterior predictions). This was breaking because the underlying PyTensor computational graph was being dumped out somewhere.

I have not been able to isolate the exact error, but it stems from my attempt to create a custom class that handles artifact storage between the steps of sampling my model. I need to investigate which node is being handed what (and when) when it’s time to sample, extend the inference data, and then start storing things (so I don’t drive myself insane with a messy notebook; obligatory plug to do your actual stats work… probably not in a notebook).

If I just sample from my model, extend my inference data with the posterior predictive, and then use the recommended container approach to perform out-of-sample prediction within the same call via the context manager, everything works as expected, with the caveat that, upon inspection, my model digraph is now cyclic (my likelihood Y and data container y now have an edge between them). :smile: Sorry for potentially setting up a panic situation.

I’m trying to determine whether this is a real issue for me or just an inconvenient visual artifact lol. (Note: my dimensionality is different from what I presented; I just created a sparse dummy matrix to check some things out.)