Need help setting up a BART model for binary classification

import arviz as az
import pymc as pm # 0.4.3
import pymc_bart as pmb # 0.2.1
import pandas as pd

df_train_features = ...  # placeholder: select data here, about 850k samples
df_train_labels = ...  # placeholder: select labels here, highly unbalanced with about 98% zeros

with pm.Model() as model_bart:
    # Latent sum-of-trees function with 200 trees
    mu = pmb.BART("mu", df_train_features, df_train_labels, m=200)
    # Probit link: map the latent function to a probability
    theta = pm.Deterministic("theta", pm.math.invprobit(mu))
    y = pm.Bernoulli("y", p=theta, observed=df_train_labels)
    idata = pm.sample(random_seed=0, tune=200)

When I run the above code, the Jupyter notebook output shows progress like this:

Multiprocess sampling (4 chains in 4 jobs)
PGBART: [mu]
<progress bar> 100.00% [4800/4800 <time> Sampling 4 chains, 0 divergences]
Sampling 4 chains for 200 tune and 1_000 draw iterations (800 + 4_000 draws total) took <time> seconds.

So I have run into a few issues:

  1. If I use the entire training data as described above, the kernel dies at some point during sampling, without any warning or error. This has happened a few times, with the progress bar at 50%, 80%, and even 100%. When I tried with just 1% of the data (about 8,500 rows), it worked fine. How can I force verbose output to see the precise error message?
  2. Is the above model setup correct for BART classification? I am basing it on chapter 4 of the original BART paper.
  3. In an online setting, assuming the above code doesn’t run into any error, how can I feed further training data into the model without retraining it from scratch? Which variable above should I pickle? And how do I resume training?
  4. Relating to questions 1 and 3: if the crash is caused by memory constraints, can I split the data into chunks of 10% and do the training 10 times sequentially?
  5. Various Google searches led me to threads in this forum with lots of mentions of steps and potential for the NUTS sampler. Is this something I should concern myself with, and should I change the above code accordingly?
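
To make the memory question in items 1 and 4 concrete, here is a rough back-of-the-envelope sketch of how much space the stored posterior draws alone might need. It rests on assumptions that are not confirmed anywhere above: that both `mu` and `theta` are kept with one value per observation, per draw, per chain; that values are stored as 8-byte floats; and that tuning draws are discarded. The helper name `posterior_bytes` is made up for illustration, and the estimate ignores the trees and any temporary copies.

```python
def posterior_bytes(n_obs, chains=4, draws=1000, n_vars=2, bytes_per_value=8):
    """Bytes needed to hold per-observation draws for n_vars variables.

    Assumes (hypothetically) one float64 value per observation, per draw,
    per chain, for each stored variable (here: mu and theta).
    """
    return n_obs * chains * draws * n_vars * bytes_per_value


GIB = 1024 ** 3

full = posterior_bytes(850_000)   # entire training set
subset = posterior_bytes(8_500)   # the 1% subset that ran fine

print(f"full data: {full / GIB:.1f} GiB")
print(f"1% subset: {subset / GIB:.2f} GiB")
```

Under these assumptions the full data set would need on the order of tens of GiB just for the stored draws, while the 1% subset needs well under 1 GiB. That would be consistent with the kernel being killed for running out of memory, but it is only an estimate, not a diagnosis.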

My apologies if my questions are naive. I am completely new to both PyMC and Bayesian inference in general, so I am still learning.

Even if you have insight into just one of the questions, I’d really appreciate your input.