Testing hierarchical model on unseen data

EtienneT · January 3, 2018, 11:01pm

Theano shared variables with ppc seem to be the preferred way to test a model on unseen data, but there are only very basic examples of how to use them.

For example, the last section of this notebook shows how to make a prediction by specifying the index of the county of a new home.

St Louis county prediction

yhat_stl = Normal('yhat_stl', mu=a[69] + b, tau=tau_y)

But this seems very inefficient for testing on a lot of unseen data.

Taking the radon-by-county data as an example, how could we fit a hierarchical model with shared variables and then sample a ppc for a dataset of unseen data, where each data point (a home) is associated with a particular county?

Thanks,

junpenglao · January 4, 2018, 6:52am

I think the best approach is to first work out how to express the model as linear functions: y_hat = X*beta + Z*b. Prediction is then done by setting new values on the tt.shared variables and running sample_ppc.

So instead of doing:
y_hat = a[county]
transform county into a design matrix Xcounty (you can use patsy: from patsy import dmatrices; see some examples here) and do y_hat = tt.dot(Xcounty, a)
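To see why the design-matrix form gives the same result as indexing, here is a minimal NumPy sketch. The number of counties, the intercept values, and the county indices are made up for illustration, and plain NumPy stands in for the Theano tensors in the real model.

```python
import numpy as np

n_counties = 4
# Made-up per-county intercepts (in the model these would be the random variable `a`).
a = np.array([1.2, 0.8, 1.5, 1.0])

# County index of each training home.
county = np.array([0, 2, 2, 3, 1])

# One-hot design matrix: row i has a 1 in the column of home i's county.
Xcounty = np.eye(n_counties)[county]      # shape (5, 4)

# The dot product selects the same intercepts as fancy indexing.
assert np.allclose(Xcounty @ a, a[county])

# Unseen homes only need a new design matrix with the same columns.
county_new = np.array([3, 3, 0])
Xcounty_new = np.eye(n_counties)[county_new]
y_hat_new = Xcounty_new @ a
print(y_hat_new)
```

Because the matrix, not the index, carries the county information, swapping in a new matrix with the same number of columns is all that changes between training and prediction.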

EtienneT · January 4, 2018, 3:58pm

I will investigate design matrices; this is something I was not aware of. Thank you!

In your code example, this would be the relevant line, right?

_, L = dmatrices('tempresp ~ -1+subj', data=tbltest, return_type='matrix')

Where you don't care about the _ variable, and L is the design matrix for 'subj', which is your categorical variable?

junpenglao · January 4, 2018, 4:19pm

Yep, exactly.
You can also use one-hot encoding from scikit-learn for a similar purpose.
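One thing to watch with any one-hot approach is that the unseen data must be encoded against the same category list as the training data, so the columns line up with the fitted intercepts. The sketch below uses a hypothetical NumPy helper, one_hot, as a stand-in for an encoder such as scikit-learn's OneHotEncoder; the county names are just example labels.

```python
import numpy as np

def one_hot(labels, categories):
    """Hypothetical helper: encode labels against a fixed category list."""
    labels = np.asarray(labels)
    categories = np.asarray(categories)
    # Compare every label with every category; True where they match.
    return (labels[:, None] == categories[None, :]).astype(float)

# Categories are fixed once, from the training data.
train_counties = ["AITKIN", "ANOKA", "ST LOUIS", "ANOKA"]
categories = sorted(set(train_counties))

X_train = one_hot(train_counties, categories)   # shape (4, 3)

# Unseen homes are encoded against the SAME category list,
# even if some counties do not appear in the new data.
new_counties = ["ST LOUIS", "AITKIN"]
X_new = one_hot(new_counties, categories)       # shape (2, 3)
print(X_new)
```

With this helper, a county that never appeared in training would produce an all-zero row, so it is worth checking for such rows before predicting.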
