"Out of sample" predictions with the GLM sub-module


#1

I don’t know if I’m using the right vocabulary here, but I want to use a model I’m fitting with GLM to get the posterior predictive distribution of a variable for data that was not observed during fitting, equivalent to a train/test split. Is there a way to do this and keep using the compact syntax of the GLM sub-module?


#2

Yes. You can wrap X and y in theano.shared variables for fitting/sampling, and then replace their values with the held-out data (via set_value()) before sampling the posterior predictive. For more information see:
http://docs.pymc.io/notebooks/posterior_predictive.html#Prediction
http://docs.pymc.io/notebooks/api_quickstart.html#4.1-Predicting-on-hold-out-data
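
A minimal sketch of the pattern those notebooks describe, assuming PyMC3 3.x with the Theano backend. The function name fit_and_predict, the priors, and the variable names are made up for illustration, and the linear model is written out by hand rather than with the GLM sub-module. In older PyMC3 releases the last call is named pm.sample_ppc instead of pm.sample_posterior_predictive.

```python
def fit_and_predict(x_train, y_train, x_test, draws=500, tune=500):
    """Fit a simple linear regression, then draw predictions for x_test."""
    # Imported here so that defining the function does not require PyMC3.
    import numpy as np
    import pymc3 as pm
    import theano

    x_train = np.asarray(x_train, dtype="float64")
    y_train = np.asarray(y_train, dtype="float64")
    x_test = np.asarray(x_test, dtype="float64")

    # Wrap plain NumPy arrays (not DataFrames) in shared variables.
    x_shared = theano.shared(x_train)
    y_shared = theano.shared(y_train)

    with pm.Model() as model:
        # Illustrative priors; pick ones that suit your data.
        intercept = pm.Normal("intercept", mu=0, sd=10)
        slope = pm.Normal("slope", mu=0, sd=10)
        sigma = pm.HalfNormal("sigma", sd=1)
        pm.Normal("y", mu=intercept + slope * x_shared, sd=sigma,
                  observed=y_shared)
        trace = pm.sample(draws, tune=tune)

    # Swap in the held-out predictors. y only needs placeholder values
    # of matching length, since it is what we want to predict.
    x_shared.set_value(x_test)
    y_shared.set_value(np.zeros_like(x_test))

    with model:
        ppc = pm.sample_posterior_predictive(trace)

    # Posterior predictive draws for the observed variable "y".
    return ppc["y"]
```

With arrays x and y of length 100, something like fit_and_predict(x[:80], y[:80], x[80:]) would fit on the first 80 points and return predictive draws for the remaining 20.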


#3

Hi @junpenglao, is there an example of this using the GLM module?

I tried creating a shared variable for a model using GLM, and received this error:

PatsyError: Error evaluating factor: TypeError: The generic 'SharedVariable' object is not subscriptable. This shared variable contains an object of type: <class 'pandas.core.frame.DataFrame'>. Did you forget to cast it into a Numpy array before calling theano.shared()?

The variable is a pandas DataFrame (and the model runs fine without the shared variable).

Thanks


#4

I don't think that is possible with a pandas DataFrame. You can try passing the shared arrays to x and y specifically, but I am not sure it would work as:


#5

Hello,

Just to be sure about the outcome of this discussion, as I am currently facing the same issue: can we use shared Theano variables to make out-of-sample (OOS) predictions with GLM?
I have tried several configurations without success…
Thank you in advance.