Getting multinomial class probabilities during posterior prediction on test data

Hello PyMC3 community! I’ve been working on a multinomial class prediction model and have been pulling my hair out trying to find the correct way to get the posterior class probabilities when predicting on new (test) data. I suspect there is a simple way, such as the one described in this Stack Overflow post. It is also similar to the post on an observed Deterministic variable (I can’t post another link).

The difference is that I want the sampled values of a pm.Deterministic not from the previously sampled trace, but from the posterior evaluated on held-out test data.

Here’s example pseudo code in the same form as my model:

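import numpy as np
import theano
import theano.tensor as tt
import pymc3 as pm

# example data: one-hot class labels and a single feature
y_train = np.array([[0, 0, 1], [0, 0, 1], [0, 1, 0]])
x_train = np.array([10., 11., 20.])
x_test = np.array([12., 15., 25.])  # hypothetical hold-out inputs (same length as x_train)

# set training data as theano shared variables so they can be swapped later
xt = theano.shared(x_train)
yt = theano.shared(y_train)

def function(x, theta):
    # hypothetical stand-in for the real transformation:
    # returns one logit per class as an (n_points, 3) matrix
    return tt.outer(x, theta * np.array([-0.1, 0.0, 0.1]))

with pm.Model() as my_model:
    theta_1 = pm.Normal('theta_1', mu=1.25, sd=0.1)
    # deterministic transformation, then row-wise softmax -> class probabilities
    class_param = function(xt, theta_1)
    p = pm.Deterministic('p', tt.nnet.softmax(class_param))
    observed = pm.Multinomial('obs', n=1, p=p, observed=yt)
    step = pm.Metropolis()
    trace = pm.sample(draws=3000, step=step)

trace_burnt = trace[2000:]  # discard the first 2000 draws as burn-in
xt.set_value(x_test)  # swap in the test inputs; if the length differs, yt may need resetting too
ppc = pm.sample_ppc(trace_burnt, samples=500, model=my_model)
# ppc['obs'] holds one-hot draws such as [1, 0, 0], not the probabilities stored in 'p'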

So in this case I want 500 samples of the vector [P(C=0), P(C=1), P(C=2)] for each data point in x_test, instead of 500 one-hot multinomial prediction vectors such as [1, 0, 0] per data point. These are the values of my ‘p’ variable, which is a pm.Deterministic.

It seems like I’m just missing something here. Please let me know if there is a simple way to do it; any help at all is really appreciated. Currently I sample these binary predictions N times, sum them, and divide each class count by N. Alternatively, I compute confidence intervals with statsmodels.stats.proportion.multinomial_proportions_confint, which takes the class counts summed from these binary prediction vectors.
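The averaging workaround described above takes only a few lines of NumPy. This is a sketch: `draws` is simulated here as a hypothetical stand-in for the posterior predictive one-hot samples (in the model above they would come from `ppc['obs']`, assumed to have shape `(samples, points, classes)`), and `empirical_class_frequencies` is a hypothetical helper name.

```python
import numpy as np

def empirical_class_frequencies(onehot_draws):
    # onehot_draws: (n_samples, n_points, n_classes) array of 0/1 draws;
    # summing over samples and dividing by N gives the share of draws per class
    return onehot_draws.sum(axis=0) / onehot_draws.shape[0]

# hypothetical stand-in for ppc['obs']: 500 one-hot draws for each of 3 test points
rng = np.random.default_rng(1)
true_p = np.array([[0.5, 0.4, 0.1],
                   [0.2, 0.3, 0.5],
                   [0.1, 0.1, 0.8]])
draws = np.stack([rng.multinomial(1, p, size=500) for p in true_p], axis=1)

freqs = empirical_class_frequencies(draws)  # shape (3, 3); each row sums to 1
```

This averages over both the posterior uncertainty in theta_1 and the multinomial sampling noise, so it yields one estimate of the mean class probability per point rather than a distribution of p values.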

Not sure I understand what you mean here. What would be the expected output in this case?

I’m trying to get the corresponding values of the unobserved deterministic ‘p’ variable, i.e. the class probabilities it passes to the pm.Multinomial variable, for example [0.5, 0.4, 0.1] instead of [1, 0, 0]. The multinomial variable behaves as it should, but I also want to analyze the class probabilities for different inputs.

Oh, I see. Unfortunately, there is no easy way to do this for a Deterministic node. You need to select points from the trace (point = trace._straces[chain_idx].point(point_idx)) and pass the posterior sample of theta_1 (point['theta_1']) through the deterministic transformation (function followed by tt.nnet.softmax). You might want to replace tt.nnet.softmax with a NumPy version to ease the computation.
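A minimal NumPy sketch of that recipe, vectorized over all posterior draws at once instead of looping over individual points. Assumptions: `function_np` is a hypothetical NumPy counterpart of the model's `function` (here one logit per class, linear in x), and the theta_1 draws are simulated stand-ins. With PyMC3 the real draws could be taken as `trace_burnt['theta_1']`, since indexing a MultiTrace by variable name returns the stacked samples as an array.

```python
import numpy as np

def softmax(z):
    # numerically stable softmax over the last axis (the class axis)
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def function_np(x, theta):
    # hypothetical NumPy stand-in for the model's deterministic transformation:
    # returns logits of shape (n_draws, n_points, 3) for a vector of theta draws
    weights = np.array([-0.1, 0.0, 0.1])
    return theta[:, None, None] * x[None, :, None] * weights[None, None, :]

def class_probabilities(theta_draws, x_new):
    # push every posterior draw of theta_1 through the deterministic part of the model
    return softmax(function_np(x_new, theta_draws))

# stand-in posterior draws; in PyMC3 these would be trace_burnt['theta_1']
rng = np.random.default_rng(0)
theta_draws = rng.normal(1.25, 0.1, size=500)
x_test = np.array([12., 15., 25.])

p_test = class_probabilities(theta_draws, x_test)         # shape (500, 3, 3)
p_mean = p_test.mean(axis=0)                               # posterior mean per test point
p_lo, p_hi = np.percentile(p_test, [2.5, 97.5], axis=0)   # 95% intervals per class
```

Here `p_test[:, i, :]` holds 500 probability vectors [P(C=0), P(C=1), P(C=2)] for the i-th test point, which is the kind of output asked for in the question.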

Tip: if you are sampling another stochastic node that is not observed (for example, if in this example you instead had p = pm.Dirichlet('p', tt.nnet.softmax(class_param))), you can draw its posterior predictive samples via ppc = pm.sample_ppc(trace_burnt, samples=500, vars=[p], model=my_model).

Thanks for the insight. I guess it’s not as simple as I hoped. I’ll try your tips and post anything else I come across.