I’m trying to better understand dimensions in PyMC (I’m using v5), and I tried the following code:
import pymc as pm

with pm.Model(coords={
    'quizzes': range(5),
    'students': range(3)
}) as m2:
    a = pm.Normal('alpha', 0, 1, shape=(1, 3), dims='students')
    b = pm.Normal('beta', 0, 1, shape=(5, 1), dims='quizzes')
    logit_p = a + b
    y = pm.Bernoulli("y", logit_p=logit_p, dims=('quizzes', 'students'))

assert tuple(m2['y'].shape.eval()) == (5, 3)

with m2:
    pm.sample_prior_predictive()
However, sampling throws an error: "conflicting sizes for dimension 'students': length 1 on the data but length 3 on coordinate 'students'", and I cannot make sense of it. Could anyone explain it or point me to the relevant documentation?
You are asking for an alpha parameter for each student, and there are 3 such students. But you have also specified that alpha be of shape=(1, 3), a two-dimensional shape that conflicts with the single 'students' dim you requested. You can remove the shape argument:
with pm.Model(coords={
    'quizzes': range(5),
    'students': range(3)
}) as m2:
    a = pm.Normal('alpha', 0, 1, dims='students')
    b = pm.Normal('beta', 0, 1, dims='quizzes')
    logit_p = a + b
    y = pm.Bernoulli("y", logit_p=logit_p, dims=('quizzes', 'students'))
but you’ll still get a size conflict. Why? Because now you have 3 alphas and 5 betas, and logit_p = a + b tries to broadcast two one-dimensional vectors of lengths 3 and 5 against each other, which fails. And even if the lengths matched, the sum would still be one-dimensional, while you also specify that y must have dims=('quizzes', 'students'), i.e., the two-dimensional shape (5, 3).
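To see the clash outside of PyMC, here is a minimal NumPy sketch (the broadcasting rules are the same; the variable names are just for illustration):

import numpy as np

a = np.zeros(3)  # one alpha per student
b = np.zeros(5)  # one beta per quiz
a + b            # ValueError: operands could not be broadcast together with shapes (3,) (5,)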
@cluhmann is correct that the conflict comes from the mismatch between shapes/dims. Specifically, the problem is that shape implies two dimensions, but you specified only one dimension in dims.
You can drop the dummy dimensions as in @cluhmann's example and do the right broadcasting logic afterwards, with something like logit_p = a[None, :] + b[:, None].
PyMC uses dims only as labels; broadcasting still works like in NumPy, i.e., positionally, not based on dims.
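Since alignment is purely positional, a quick NumPy sketch (independent of PyMC) shows how the None indexing lines the axes up:

import numpy as np

a = np.zeros(3)  # students
b = np.zeros(5)  # quizzes
print((a[None, :] + b[:, None]).shape)  # (5, 3): positions decide alignment, not labels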
import numpy as np
import pymc as pm

with pm.Model(coords={
    'quizzes': range(5),
    'students': range(3)
}) as m2:
    a = pm.Normal('alpha', 0, 1, dims='students')
    b = pm.Normal('beta', 0, 1, dims='quizzes')
    logit_p = a[None, :] + b[:, None]
    y = pm.Bernoulli("y", logit_p=logit_p, dims=('quizzes', 'students'), observed=np.ones(shape=(5, 3)))

assert tuple(m2['y'].shape.eval()) == (5, 3)

with m2:
    pm.sample_prior_predictive()
I didn’t know the trick with None… the suggested line is equivalent to:
logit_p = a.reshape((1, 3)) + b.reshape((5, 1))
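For anyone else reading, a quick NumPy check of that equivalence (indexing with None is the same as np.newaxis):

import numpy as np

a = np.arange(3)
b = np.arange(5)
assert a[None, :].shape == (1, 3)  # None inserts a new length-1 axis
lhs = a[None, :] + b[:, None]
rhs = a.reshape((1, 3)) + b.reshape((5, 1))
assert (lhs == rhs).all()          # both produce the same (5, 3) array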
I was confused by the docs (“When used alone, dims must be combined with explicit shape information.”): I assumed the shape parameter forced a reshape. Instead, the suggested solution is to take the implicit shape and make the desired reshape explicit.
I agree that line in the docs is confusing. Actually, you’re not in the “used alone” case. If you look at the docs example closely, there is no coords=coords in the pm.Model() call in that first example. That is what is meant by “used alone”.
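For comparison, a minimal sketch of the “used alone” case that doc line describes: with no coords on the model, the explicit shape supplies the length and the dim name is only a label.

import pymc as pm

with pm.Model() as m:  # note: no coords here
    a = pm.Normal('alpha', 0, 1, shape=(3,), dims='students')  # shape sets the length; 'students' is just a label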
So in your case, students is inheriting its length from range(3), and this was the source of your original error. PyMC was traversing the shape of the random variable one dimension at a time, checking whether the lengths implied by the dims matched the provided shape. It found 1 (your batch dimension) and checked it against 3 (the length of the students coordinate). Since they aren’t the same, it raised the length 1 on the data but length 3 on coordinate 'students' error.
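For reference, that error text matches what xarray raises when the sampled values are packed into a Dataset whose coordinate lengths conflict with the data; a minimal sketch that reproduces the same message (assuming only numpy and xarray):

import numpy as np
import xarray as xr

# the data says 'students' has length 1, the coordinate says 3:
xr.Dataset(
    {"alpha": (("students",), np.zeros(1))},
    coords={"students": range(3)},
)
# ValueError: conflicting sizes for dimension 'students':
# length 1 on the data but length 3 on coordinate 'students'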