Predict the mu or the observed?

Number_Huang · November 26, 2022, 2:55am

Hi, I have a question about prediction. Please check the following code and its comments.
import numpy as np
import pymc as pm

# data

x_train, x_test = np.random.random((100, 5)), np.random.random((30, 5))
y_train, y_test = np.random.random((100, 1)), np.random.random((30, 1))

#train

with pm.Model() as model:
    a = pm.Normal("a", 0.0, 0.5)
    b = pm.Normal("b", 0.0, 1.0,shape=(5,1))
    mu = a + pm.math.dot(pm.MutableData("x",x_train),b)
    sigma = pm.Exponential("sigma", 1.0)

    # case 1: the prediction target is mu, which makes sense
    pm.Normal("obs", mu=pm.Deterministic("y", mu), sigma=sigma, observed=y_train)

    # case 2: but I have seen some examples written like this
    # pm.Normal("obs", mu=mu, sigma=sigma, observed=pm.MutableData("y", y_train))
    trace = pm.sample()

# predict
# case 1 works; case 2 raises a “shape mismatch” error

with model:
    pm.set_data({"x": x_test})
    # use the updated values and predict outcomes and probabilities:
    idata_2 = pm.sample_posterior_predictive(
        trace,
        var_names=["y"],
        return_inferencedata=True,
        predictions=True,
        #extend_inferencedata=True,
        random_seed=100,
    )
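    # posterior mean prediction per test row (renamed so it does not shadow the parameter `a`)
    y_pred_mean = idata_2.predictions["y"].mean(("chain", "draw"))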

================
So my question is: where should we set the “y” prediction target, on mu or on the observed variable?
I am really confused. Many examples put the target on the observed variable, but I have not seen working code for that.
API consistency is also a real problem, because so many examples on the website do not work!
Thanks!

cluhmann · November 26, 2022, 3:14pm

Your code seems to work for me. What version of pymc are you using?

AlexAndorra · November 26, 2022, 8:11pm

Same for me, your code works fine. With PyMC 4.3.0 and Aesara 2.8.7

Number_Huang · November 28, 2022, 2:19am

Thanks.
Yes, case 1 works, but case 2 (with the commented-out line enabled) fails at prediction time. My question is where to set the prediction target.

Number_Huang · November 28, 2022, 2:22am

Thanks, I am using 4.4.0, and case 1 works. I just wonder whether case 1 is the standard way to write a multivariable regression?

cluhmann · November 28, 2022, 2:47am

Both cases work for me. That’s why I asked.

AlexAndorra · November 28, 2022, 5:29pm

Same. Can you update your PyMC @Number_Huang ?

“I use the 4.4.0”

Wait. Do we have 4.4.0 already @cluhmann ?

cluhmann · November 28, 2022, 5:37pm

Appears so: Release v4.4.0 · pymc-devs/pymc · GitHub

AlexAndorra · November 28, 2022, 5:57pm

Niiiiiice! Going even faster than I can keep up
So did you test the code above with 4.4.0? I didn’t yet

cluhmann · November 30, 2022, 12:17am

Works with pymc 4.4.0 for me.

Number_Huang · November 30, 2022, 2:36pm

Thanks Alex, I reran it and both work. So my actual question is: which one is correct, pm.Deterministic("y", mu) or pm.MutableData("y", y_train)? pm.MutableData("y", y_train) seems to make no sense, but I have seen that kind of code.

cluhmann · November 30, 2022, 2:46pm

Whether or not you wrap your y in a MutableData object comes down to whether you might want to swap out the original y with new data at a later stage (much like you currently are doing for x). Whether you wrap your y in a Deterministic comes down to whether you want to see sampled values of y in your trace/InferenceData object. No right or wrong.

Number_Huang · November 30, 2022, 10:49pm

Thanks cluhmann. We really need detailed docs about PyMC's under-the-hood mechanisms, along with updated example code. Thanks all anyway.
