I am trying to extend the ideas of item response theory to multiple response categories. Consider a marketing survey that asks customers, "What's the deciding factor in whether or not you purchase product X?", where the answers are {0: price, 1: durability, 2: ease-of-use}.
Here is some synthetic data (rows are customers, columns are products, and each cell is the class of the response):
import numpy as np

responses = np.array([
    [0, 1, 2, 1, 0],
    [1, 1, 1, 1, 1],
    [0, 0, 2, 2, 1],
    [1, 1, 2, 2, 1],
    [1, 1, 0, 0, 0],
])
students = 5
questions = 5
categories = 3
import pymc3 as pm
import theano.tensor as tt
import arviz as az

with pm.Model() as model:
    z_student = pm.Normal("z_student", mu=0, sigma=1, shape=(students, categories))
    z_question = pm.Normal("z_question", mu=0, sigma=1, shape=(categories, questions))
    # Transformed parameter
    theta = pm.Deterministic("theta", tt.nnet.softmax(z_student - z_question))
    # Likelihood
    kij = pm.Categorical("kij", p=theta, observed=responses)
    trace = pm.sample(chains=4)

az.plot_trace(trace, var_names=["z_student", "z_question"], compact=False);
This code produces the following error: ValueError: Input dimension mis-match. (input[0].shape[0] = 5, input[1].shape[0] = 3)
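The mismatch is an ordinary broadcasting problem. A minimal NumPy reproduction (outside of PyMC/Theano, just to show the shapes involved) is:

```python
import numpy as np

students, questions, categories = 5, 5, 3
z_student = np.zeros((students, categories))    # shape (5, 3)
z_question = np.zeros((categories, questions))  # shape (3, 5)

# Broadcasting aligns trailing axes: (5, 3) vs (3, 5) cannot be reconciled,
# so the elementwise subtraction fails just as it does inside the model.
try:
    z_student - z_question
except ValueError as e:
    print("broadcast error:", e)
```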
However, when I change the theta line to:
theta = pm.Deterministic("theta", tt.nnet.softmax(z_student - z_question.transpose()))
the sampler no longer fails immediately; instead, it samples from the wrong model.
az.summary(trace)
mean sd hdi_3% hdi_97% mcse_mean mcse_sd ess_mean ess_sd ess_bulk ess_tail r_hat
z_student[0,0] 0.150 0.893 -1.620 1.752 0.012 0.013 5789.0 2327.0 5771.0 2991.0 1.0
z_student[0,1] 0.393 0.879 -1.319 1.980 0.012 0.012 5150.0 2610.0 5153.0 3195.0 1.0
z_student[0,2] -0.591 0.915 -2.254 1.108 0.011 0.012 6408.0 2737.0 6415.0 2830.0 1.0
z_student[1,0] -0.064 0.860 -1.676 1.538 0.011 0.014 5748.0 1942.0 5747.0 2850.0 1.0
z_student[1,1] 0.602 0.864 -0.982 2.185 0.012 0.011 4921.0 3028.0 4920.0 3269.0 1.0
z_student[1,2] -0.548 0.906 -2.218 1.137 0.012 0.012 6076.0 2870.0 6083.0 3410.0 1.0
z_student[2,0] -0.166 0.907 -1.974 1.450 0.013 0.014 4681.0 2121.0 4692.0 3108.0 1.0
z_student[2,1] -0.188 0.875 -1.776 1.472 0.011 0.014 5923.0 2073.0 5945.0 3333.0 1.0
z_student[2,2] 0.344 0.865 -1.288 1.951 0.012 0.012 4828.0 2750.0 4822.0 3039.0 1.0
z_student[3,0] -0.212 0.892 -1.980 1.395 0.011 0.013 6019.0 2504.0 5996.0 3391.0 1.0
z_student[3,1] 0.097 0.876 -1.573 1.713 0.012 0.013 5304.0 2252.0 5332.0 2971.0 1.0
z_student[3,2] 0.096 0.851 -1.583 1.645 0.011 0.012 5554.0 2678.0 5543.0 3288.0 1.0
z_student[4,0] 0.160 0.881 -1.367 1.947 0.012 0.013 5421.0 2189.0 5413.0 2927.0 1.0
z_student[4,1] 0.414 0.863 -1.255 2.026 0.012 0.012 4900.0 2548.0 4897.0 3248.0 1.0
z_student[4,2] -0.558 0.901 -2.266 1.130 0.011 0.012 6551.0 2728.0 6582.0 3142.0 1.0
z_question[0,0] -0.179 0.883 -1.795 1.488 0.011 0.015 6317.0 1769.0 6315.0 3389.0 1.0
z_question[0,1] 0.107 0.886 -1.511 1.807 0.012 0.013 5236.0 2431.0 5209.0 3503.0 1.0
z_question[0,2] 0.164 0.878 -1.450 1.834 0.012 0.013 5131.0 2248.0 5106.0 3102.0 1.0
z_question[0,3] 0.186 0.904 -1.450 1.882 0.011 0.014 6228.0 2175.0 6219.0 3335.0 1.0
z_question[0,4] -0.187 0.877 -1.790 1.508 0.011 0.014 5819.0 2089.0 5834.0 3198.0 1.0
z_question[1,0] -0.389 0.849 -1.948 1.219 0.012 0.012 4726.0 2494.0 4713.0 3146.0 1.0
z_question[1,1] -0.600 0.858 -2.249 0.946 0.012 0.011 5093.0 3247.0 5116.0 3312.0 1.0
z_question[1,2] 0.179 0.868 -1.520 1.763 0.012 0.012 5204.0 2514.0 5201.0 3418.0 1.0
z_question[1,3] -0.103 0.862 -1.683 1.561 0.013 0.013 4608.0 2212.0 4615.0 3163.0 1.0
z_question[1,4] -0.381 0.866 -2.047 1.147 0.011 0.012 6181.0 2735.0 6188.0 3038.0 1.0
z_question[2,0] 0.565 0.908 -1.125 2.337 0.012 0.012 6022.0 2879.0 6045.0 3173.0 1.0
z_question[2,1] 0.536 0.923 -1.192 2.241 0.012 0.013 6041.0 2476.0 6046.0 3059.0 1.0
z_question[2,2] -0.325 0.856 -1.918 1.289 0.012 0.012 5429.0 2741.0 5418.0 3004.0 1.0
z_question[2,3] -0.107 0.881 -1.953 1.363 0.012 0.012 5834.0 2545.0 5841.0 3332.0 1.0
z_question[2,4] 0.576 0.910 -1.202 2.253 0.011 0.013 6385.0 2606.0 6371.0 2905.0 1.0
theta[0,0] 0.360 0.173 0.072 0.685 0.003 0.002 4309.0 3774.0 4256.0 2846.0 1.0
theta[0,1] 0.528 0.182 0.208 0.857 0.003 0.002 4949.0 4563.0 4908.0 3050.0 1.0
theta[0,2] 0.113 0.104 0.001 0.304 0.001 0.001 6095.0 4045.0 7146.0 2780.0 1.0
theta[1,0] 0.216 0.144 0.007 0.477 0.002 0.002 6149.0 4576.0 6493.0 3116.0 1.0
theta[1,1] 0.678 0.168 0.381 0.962 0.002 0.002 5954.0 5954.0 6180.0 3320.0 1.0
theta[1,2] 0.107 0.100 0.000 0.294 0.001 0.001 6321.0 3863.0 7623.0 3252.0 1.0
theta[2,0] 0.234 0.150 0.010 0.509 0.002 0.002 6154.0 4352.0 6684.0 3252.0 1.0
theta[2,1] 0.230 0.152 0.005 0.506 0.002 0.001 6885.0 5424.0 6459.0 2923.0 1.0
theta[2,2] 0.536 0.186 0.194 0.858 0.002 0.002 5595.0 5250.0 5622.0 2805.0 1.0
theta[3,0] 0.239 0.157 0.007 0.526 0.002 0.002 5843.0 4627.0 5789.0 2853.0 1.0
theta[3,1] 0.381 0.178 0.065 0.703 0.003 0.002 4927.0 4377.0 5009.0 3315.0 1.0
theta[3,2] 0.380 0.174 0.069 0.692 0.003 0.002 4653.0 4176.0 4624.0 2562.0 1.0
theta[4,0] 0.361 0.175 0.057 0.668 0.002 0.002 5185.0 4637.0 5269.0 2985.0 1.0
theta[4,1] 0.527 0.184 0.186 0.852 0.003 0.002 4614.0 4445.0 4668.0 2497.0 1.0
theta[4,2] 0.111 0.100 0.002 0.303 0.001 0.001 6159.0 3978.0 7520.0 3473.0 1.0
Of note, please look at the learned theta values. Their names run from theta[0,0] through theta[4,2], i.e. theta has shape (5, 3). Instead, theta should be a three-dimensional tensor, where theta[0,0,0] refers to the learned value for {student_0, question_0, response_class_0}. My code is currently not accomplishing this.
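To be explicit about the target, here is a plain-NumPy sketch of the shape I am after (assuming z_question is stored as (questions, categories), i.e. the transpose of my current parameterization):

```python
import numpy as np

students, questions, categories = 5, 5, 3
z_student = np.random.randn(students, categories)    # per-student category preferences
z_question = np.random.randn(questions, categories)  # per-question category offsets

# Insert axes so the difference broadcasts to (students, questions, categories)
logits = z_student[:, None, :] - z_question[None, :, :]

# Softmax over the last (category) axis, so each (student, question) pair
# gets a probability vector over the response classes
theta = np.exp(logits) / np.exp(logits).sum(axis=-1, keepdims=True)
print(theta.shape)  # (5, 5, 3)
```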
Any ideas?
Note, suggestions taken from @PedroSebe