How to predict new values on hold-out data

rpgoldman · July 24, 2019, 3:07pm

I apologize: this was a false positive. My colleague just told me that the problem was an error in post-processing, not in the sampling.
I will verify that and let you know.

Isaac_S · June 28, 2020, 7:33pm

This is a nice workaround. How can I adapt this solution when working with pm.GLM.from_formula?

Here is my model-building block, which I'd prefer not to change:

import pymc3 as pm

# start_mu, start_sig (prior means/sds per term), train_df and ncores are defined elsewhere
with pm.Model() as model:
    priors = {
        "Intercept": pm.Normal.dist(mu=start_mu['Intercept'], sigma=start_sig['Intercept']),
        "a": pm.Normal.dist(mu=start_mu['a'], sigma=start_sig['a']),
        "b": pm.Normal.dist(mu=start_mu['b'], sigma=start_sig['b']),
        "c": pm.Normal.dist(mu=start_mu['c'], sigma=start_sig['c']),
        "d": pm.Normal.dist(mu=start_mu['d'], sigma=start_sig['d']),
    }
    family = pm.glm.families.StudentT()
    pm.GLM.from_formula(formula='y ~ a + b + c + d',
                        data=train_df,
                        priors=priors,
                        family=family)
    trace = pm.sample(draws=11000,
                      tune=700,
                      init='advi',
                      start=None,
                      cores=ncores,
                      chains=ncores,
                      random_seed=[123 for _ in range(ncores)],
                      discard_tuned_samples=True,
                      compute_convergence_checks=True)

Thanks!

lucianopaz · June 28, 2020, 8:48pm

You just have to move your model’s definition into a function that receives either the training or the test data:

import pymc3 as pm


def model_factory(data):
    # Build the same model, with the same variable names, on whichever data set is passed in
    with pm.Model() as model:
        priors = {
            "Intercept": pm.Normal.dist(mu=start_mu['Intercept'], sigma=start_sig['Intercept']),
            "a": pm.Normal.dist(mu=start_mu['a'], sigma=start_sig['a']),
            "b": pm.Normal.dist(mu=start_mu['b'], sigma=start_sig['b']),
            "c": pm.Normal.dist(mu=start_mu['c'], sigma=start_sig['c']),
            "d": pm.Normal.dist(mu=start_mu['d'], sigma=start_sig['d']),
        }
        family = pm.glm.families.StudentT()
        pm.GLM.from_formula(formula='y ~ a + b + c + d',
                            data=data,
                            priors=priors,
                            family=family)
    return model


# Fit the model on the training data
with model_factory(train_data):
    trace = pm.sample(draws=11000,
                      tune=700,
                      init='advi',
                      start=None,
                      cores=ncores,
                      chains=ncores,
                      random_seed=[123 for _ in range(ncores)],
                      discard_tuned_samples=True,
                      compute_convergence_checks=True)

# Rebuild the model on the test data and reuse the posterior samples.
# test_data still needs a 'y' column, since the formula refers to it.
with model_factory(test_data):
    ppc = pm.sample_posterior_predictive(trace)  # or any other posterior-based computation
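To make the idea concrete outside of PyMC3, here is a minimal NumPy sketch of what posterior predictive sampling on hold-out data amounts to for a Student-T linear model like the one above: for every posterior draw, recompute the linear predictor from the test predictors and draw a new `y`. The posterior draws are synthetic placeholders standing in for `trace`, and the function name and parameter names (`intercept`, `beta`, `sigma`, `nu`) are hypothetical; PyMC3's GLM family uses its own parameterization and priors, so this is not its actual implementation.

```python
import numpy as np


def posterior_predictive(draws, X_test, rng):
    """Return one simulated y vector per posterior draw, shape (n_samples, n_obs).

    draws: dict of arrays whose first axis indexes posterior samples
           (a hypothetical stand-in for a PyMC3 trace).
    X_test: (n_obs, n_predictors) array of hold-out predictors.
    """
    # Linear predictor for every posterior draw and every hold-out row
    mu = draws["intercept"][:, None] + draws["beta"] @ X_test.T
    # Student-T noise with per-draw degrees of freedom nu and scale sigma
    t_noise = rng.standard_t(draws["nu"][:, None], size=mu.shape)
    return mu + draws["sigma"][:, None] * t_noise


rng = np.random.default_rng(123)
n_samples, n_predictors = 1000, 4  # four predictors, like a, b, c, d
# Synthetic "posterior" draws, made up purely for illustration
draws = {
    "intercept": rng.normal(1.0, 0.1, n_samples),
    "beta": rng.normal(0.5, 0.05, (n_samples, n_predictors)),
    "sigma": np.abs(rng.normal(1.0, 0.1, n_samples)),
    "nu": np.full(n_samples, 5.0),
}
X_test = rng.normal(size=(20, n_predictors))  # 20 hold-out rows
ppc = posterior_predictive(draws, X_test, rng)
print(ppc.shape)  # (1000, 20)
```

Inside PyMC3, calling `pm.sample_posterior_predictive(trace)` under `model_factory(test_data)` does this bookkeeping for you; the factory pattern works because the rebuilt model has variables with the same names as those stored in `trace`.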

Sam_Anand · July 21, 2020, 7:35pm

Thank you for this! It was super helpful.

Would it be possible to show an example of a hierarchical linear model as well? I’m finding it hard to plot the predictions against the training and test data.
Also, could you point me to any article on building hierarchical models (not necessarily linear ones) and predicting with them?
Thank you for your time!

AlexAndorra · July 22, 2020, 9:01am

Hi @Sam_Anand!
I think the PyMC3 port of Rethinking_2 chapters 13 and 14 (hierarchical models) could help you with that.
PyMCheers
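For the hierarchical case, the main new question when predicting is what to do with the group-level parameters. A minimal NumPy sketch, assuming a hypothetical varying-intercept model `y ~ Normal(alpha[group] + beta * x, sigma)` with `alpha[group] ~ Normal(mu_alpha, sigma_alpha)`: for a group seen in training, reuse each posterior draw of that group's `alpha`; for a new group, draw a fresh `alpha` from the population distribution once per posterior draw. The function name, parameter names and synthetic draws are illustrative assumptions, not taken from the Rethinking port.

```python
import numpy as np


def predict_group(draws, x_new, group, rng):
    """Posterior predictive draws for a varying-intercept model, shape (n_samples, n_obs).

    group: index of a group seen in training, or None for a new, unseen group.
    """
    if group is None:
        # Unseen group: draw its intercept from the population distribution,
        # once per posterior draw, shared by all of that group's observations
        alpha = rng.normal(draws["mu_alpha"], draws["sigma_alpha"])
    else:
        # Seen group: reuse the posterior draws of that group's own intercept
        alpha = draws["alpha"][:, group]
    mu = alpha[:, None] + draws["beta"][:, None] * x_new[None, :]
    # Add observation noise with per-draw scale sigma
    return rng.normal(mu, draws["sigma"][:, None])


rng = np.random.default_rng(0)
n_samples = 2000
# Synthetic "posterior" draws for three training groups, made up for illustration
draws = {
    "mu_alpha": rng.normal(0.0, 0.1, n_samples),
    "sigma_alpha": np.abs(rng.normal(2.0, 0.1, n_samples)),
    "alpha": np.column_stack([rng.normal(m, 0.1, n_samples) for m in (2.0, 0.0, -2.0)]),
    "beta": rng.normal(1.0, 0.05, n_samples),
    "sigma": np.abs(rng.normal(0.5, 0.05, n_samples)),
}
x_new = np.linspace(0.0, 1.0, 5)
seen = predict_group(draws, x_new, group=0, rng=rng)
unseen = predict_group(draws, x_new, group=None, rng=rng)
print(seen.shape, unseen.shape)  # (2000, 5) (2000, 5)
```

The unseen-group predictions come out much wider because they carry the between-group spread `sigma_alpha` on top of the observation noise.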
