Is using the BART's predict function a correct way to predict new data?

Hello and thank you to all the developers for your continued support.

I want to predict new data with a model using BART.
So I used the method pm.MutableData() and pass the value to BART like this…

    data_X = pm.MutableData("df_X", df_X[use_cols])
    data_y = pm.MutableData("df_y", df_y[target_cols[0]])
    σ = pm.Normal("σ", model_params["Normal"]["σ"]) 
    μ = pmb.BART("μ", data_X, data_y, m=model_params["BART"]["m"], alpha=model_params["BART"]["alpha"]) 
    y = pm.Normal("y", μ, σ, observed=data_y)

I got an error.

TypeError                                 Traceback (most recent call last)
<ipython-input-106-135349d654bb> in <module>
     34     # μ = pmb.BART("μ", df_X[use_cols], df_y[target_cols[0]], m=model_params["BART"]["m"], alpha=model_params["BART"]["alpha"]) #パラメタの事前分布
     35     # y = pm.Normal("y", μ, σ, observed=df_y[target_cols[0]]) #尤度関数
---> 36     μ = pmb.BART("μ", data_X, data_y, m=model_params["BART"]["m"], alpha=model_params["BART"]["alpha"]) #パラメタの事前分布
     37     y = pm.Normal("y", μ, σ, observed=data_y) #尤度関数
     38 

1 frames
/usr/local/lib/python3.7/dist-packages/numpy/core/numeric.py in ones(shape, dtype, order, like)
    202         return _ones_with_like(shape, dtype=dtype, order=order, like=like)
    203 
--> 204     a = empty(shape, dtype, order)
    205     multiarray.copyto(a, 1, casting='unsafe')
    206     return a

TypeError: expected sequence object with len >= 0 or a single integer

It worked fine when I just passed a pandas DataFrame directly.

And I know there is a function for predicting new data on BART.

# it works.
predict = pmb.predict(trace, rng, df_X[use_cols].values, 100)

but I think I should use the “sample_posterior_predivtive” function for predicting new data by whole posterior distribution(take the average for prediction) like below.

with bart_model:
    pm.set_data({"df_X": test_X})
    posterior_predictive = pm.sample_posterior_predictive(trace)

But pm.MutableData function didn’t work.

So, how to predict new data with a model using BART?
Is using the BART’s predict function a correct way?

Thanks.

CC @aloctavodia

Hi, currently pmb.predict returns only the mean function. To compute the posterior predictive distribution you can do something like this.

μ_pred = pmb.predict(idata, rng, df_X[use_cols].values, 100)
stats.norm(μ_pred, idata.posterior["σ"].mean())

In the next release of pymc-bart, making predictions will be much smoother.

1 Like

Thank you for your reply! I got it.

Looking forward to the next release.
Thank you for your support.

Excuse me, but I’m confused.

I can’t find a way to get value from stats.norm function.
Does it mean you suggested like this?

I need predicted value for new data and its standard deviation as uncertainty.

from scipy.stats import norm #scipy's "stats"?
s = norm.rvs(μ_pred, idata.posterior["σ"].mean()).std(axis=0).flatten() # Do I need RVS?
m = norm.rvs(μ_pred, idata.posterior["σ"].mean()).mean(axis=0).flatten()

I’m sorry for the newbie to Bayes theorem.
Thanks.

If you run

μ_pred = pmb.predict(idata, rng, X.values, 100)
y_pred = norm.rvs(μ_pred, idata.posterior["σ"].mean())

y_pred will have shape (100, len(X)), that is 100 predictions, each one having the same size as the observed data.

Then you can work with that as you need, for example if you want the mean over the 100 predictions for each datapoint you can do y_pred.mean(0), or if you want the overall mean just y_pred.mean(). The same goes for the standard deviation the HDI, or whatever quantity you want to compute.

Alternatively, instead of using the mean of sigma, you can take samples from the posterior of sigma, something like this.

μ_pred = pmb.predict(idata, rng, X.values, 100).squeeze().T
σ_ = az.extract(idata, var_names="σ" ,num_samples=100)
y_pred = norm.rvs(μ_pred.T, σ_).T
1 Like

Now I know there are some functions that could be called like “stats.norm(loc, scale).***(size).”
I’m sorry for my ignorance.

I will try these code you specified.
Thank you for your kind answer.

Excume me, but I may found another issue.
I apologize if this is not an issue or this will be fixed in future releases.


μ_pred = pmb.predict(idata, rng, X.values, 100)

I passed a randomstate as “rng” to the function,
so I expected it returns fixed values but it doesn’t. (Randomly changed every time I call it.)

Is it a correct behavior on the current version?
If it is not, how can I get fixed values?

Environment:
pymc: 4.1.2
pymc-bart: 0.0.3
OS: GoogleColab(Ubuntu)
Python: 3.9

Thanks.

It works ok for me.

if you run

rng = np.random.RandomState(342)
a = pmb.predict(idata, rng, X.values, 100)
rng = np.random.RandomState(342)
b = pmb.predict(idata, rng, X.values, 100)

a-b

you should get a vector of zeros.

if instead you do

rng = np.random.RandomState(342)
a = pmb.predict(idata, rng, X.values, 100)
b = pmb.predict(idata, rng, X.values, 100)

a and b should have different values. Notice that this is the expected behavior with random number generation, and not something particular to BART

1 Like

I’m sorry for my ignorance again.
Now I can reproduct same results.

Thank you very much.

Don’t be, we all have been there!

1 Like