Hello,
I’ve started reading the Statistical Rethinking (2022) book and it has been great so far. I want to make sure I understand how to compute predictions for a specific model, in this case the multinomial/categorical model from Chapter 11.
Here is my simulated data:
import numpy as np
from scipy.special import softmax

N = 500
# simulate family incomes for each individual
family_income = np.random.rand(N)
# assign a unique coefficient to each type of event (career)
b = np.array([-2.0, 0.0, 2.0])
# per-individual probabilities: intercepts plus income effects, normalized over the 3 careers
p = softmax(np.array([0.5, 1.0, 1.5])[:, None] + np.outer(b, family_income), axis=0).T
# draw one career per individual and convert the one-hot draws to category indices
career = np.asarray([np.random.multinomial(1, pp) for pp in p])
career = np.where(career == 1)[1]
career[:11]
I’m fitting a model that assigns an intercept and a family_income coefficient to each career category, with the last category pivoted to a score of zero:
import pymc as pm
import aesara.tensor as at
import arviz as az

with pm.Model() as m11_14:
    a = pm.Normal("a", 0.0, 1.5, shape=2)  # intercepts for the two non-pivot careers
    b = pm.Normal("b", 0.0, 1.0, shape=2)  # coefficients on family income
    s0 = a[0] + b[0] * family_income
    s1 = a[1] + b[1] * family_income
    s2 = np.zeros(N)  # pivot category, score fixed at zero
    s = pm.math.stack([s0, s1, s2]).T
    p_ = at.nnet.softmax(s)  # row-wise softmax over the three careers
    career_obs = pm.Categorical("career", p=p_, observed=career)
    trace_11_14 = pm.sample(1000, tune=2000, target_accept=0.9, random_seed=RANDOM_SEED)

az.summary(trace_11_14, round_to=2)
I want to compute the implied predictions based on the posterior distributions of the coefficients a and b. I’ve tried this:
post_11_14 = az.extract_dataset(trace_11_14["posterior"])
s0 = post_11_14["a"][0, :] + post_11_14["b"][0, :] * family_income
The issue is that the posterior for each coefficient has 4,000 samples (4 chains of 1,000 draws each), while family_income has 500 observations, so the two don’t broadcast. What is the best practice for computing predictions for each observation? Do we do some sort of aggregation at the (chain, draw) level?
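To make the question concrete, here is a rough sketch of the kind of computation I think I’m after (assuming az.extract, or the older az.extract_dataset, returns each coefficient with shape (coefficient, sample)): broadcast every posterior draw against all 500 observations, softmax the scores over the three careers, and only then aggregate over the samples. Is something along these lines the intended approach, or is there a more idiomatic way (e.g. pm.sample_posterior_predictive)?

# Rough sketch of what I have in mind (not sure this is the recommended way)
post = az.extract(trace_11_14)                  # "a" and "b" with dims (coefficient, sample)
a_post = post["a"].values                       # shape (2, 4000)
b_post = post["b"].values                       # shape (2, 4000)

# scores per posterior draw and per observation: shape (4000, 500)
s0 = a_post[0][:, None] + b_post[0][:, None] * family_income
s1 = a_post[1][:, None] + b_post[1][:, None] * family_income
s2 = np.zeros_like(s0)                          # pivot category

# probabilities per draw and per observation: shape (4000, 500, 3)
p_post = softmax(np.stack([s0, s1, s2], axis=-1), axis=-1)

# e.g. posterior mean probability of each career for every observation: shape (500, 3)
p_mean = p_post.mean(axis=0)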
Here is the link to the entire chapter code.