I have 2 questions and would be very grateful for some help.
In the model below, how do I change the p parameter of the Categorical for out-of-sample prediction? I have attempted this with a shared variable, but it doesn't seem to do anything.
Say my out-of-sample data had 4 data points instead of the 20 in training. How can I get the Categorical to swallow the dimension change?
from theano import shared
import pymc3 as pm
import numpy as np

x = np.random.rand(20, 5, 3)
x_data = shared(x)
y_data = shared(2 * np.argmax(x, axis=2) + (np.random.rand(20, 5) - 0.5))

with pm.Model() as cmodel:
    c = pm.Categorical('cat', p=x_data, shape=(20, 5))
    beta = pm.Exponential('beta', 1)
    y = pm.Normal('y', mu=beta * c, sd=1, observed=y_data)
    t1 = pm.sample()
    res1 = pm.sample_posterior_predictive(t1, 100)

# favour category 0
c0 = np.zeros((20, 5, 3)) + 0.01
c0[:, :, 0] = 1
x_data.set_value(c0)  # swap in the new probabilities
with cmodel:
    res2 = pm.sample_posterior_predictive(t1, 100)

print(res1['y'].mean(0).mean(), res2['y'].mean(0).mean())
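As an aside, the weights coming out of np.random.rand do not sum to 1 along the last axis. Depending on the PyMC3 version, Categorical may normalize p internally, but it is safer to do it explicitly. A minimal numpy sketch of the normalization:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.random((20, 5, 3))

# normalize along the last axis so each (i, j) entry is a valid
# probability vector over the 3 categories
p = x / x.sum(axis=-1, keepdims=True)

print(np.allclose(p.sum(axis=-1), 1.0))  # True
```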
Hi! This is not yet a direct answer to your question, but the API for out-of-sample predictions was simplified in a recent version. Maybe this will solve your problem?
Thanks for pointing me to this, it does seem a nice way to manage the data. Unfortunately, I gave it a go but got the same results.
Was worth a try
Can you give more details about what you're trying to model? I don't really see how your likelihood relates to the data-generating process.
Thanks for replying again, this is just a toy example. The answer to my original question is:
- Delete the categorical variable
- Use a model factory, as demonstrated in How do we predict on new unseen groups in a hierarchical model in PyMC3? (this takes care of the shape problem)
But then it's still slow, which is a known problem, so the right thing to do is to ditch the Categorical altogether and marginalize it out, as in the Marginalized Gaussian Mixture Model example.
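For context, marginalizing means summing the discrete component out of the likelihood instead of sampling it: log p(y) = logsumexp_k(log w_k + log N(y | mu_k, sigma)). A numpy/scipy sketch with made-up weights and means:

```python
import numpy as np
from scipy.special import logsumexp

# illustrative 3-component Gaussian mixture (weights/means are made up)
w = np.array([0.2, 0.3, 0.5])      # mixture weights, sum to 1
mu = np.array([0.0, 2.0, 4.0])     # component means
sigma = 1.0
y = np.array([0.1, 1.9, 4.2])      # observations

# log N(y | mu_k, sigma) for every (observation, component) pair
log_norm = (-0.5 * np.log(2 * np.pi * sigma**2)
            - 0.5 * ((y[:, None] - mu) / sigma) ** 2)

# marginal log-likelihood: the component index is summed out via logsumexp,
# so no discrete variable ever needs to be sampled
loglik = logsumexp(np.log(w) + log_norm, axis=1)
```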
Yeah, sampling from a discrete distribution is usually impractical…
Glad you found a solution then – sorry I wasn’t more helpful