The first thing I see right away is that you're indexing `beta` on the first axis with `player_idx`, but the dims of `beta` are `(predictors, player)`. I think it should be `beta[:, player_idx]`.
Thinking about shapes a bit more: if you have k predictors, p players, and n rows of data, the output of `pm.math.dot(beta[:, player_idx], X)` will be `(k, n) @ (n, k) -> (k, k)`, which isn't right. I think, since you are expanding `beta` to shape `(k, n)` with `player_idx`, you can do the dot product "by hand" like this:

```python
y_hat = (beta[:, player_idx].T * X).sum(axis=-1)
```

This will multiply each cell of each row by the correct beta for that variable, then sum across the columns, producing the usual linear projection. The output shape should now be `(n,)`.
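Just to make the shapes concrete, here's a quick numpy sketch (the sizes are made up, the names mirror yours):

```python
import numpy as np

k, p, n = 3, 10, 100                     # predictors, players, rows (made-up sizes)
beta = np.random.normal(size=(k, p))     # one coefficient per (predictor, player)
X = np.random.normal(size=(n, k))        # design matrix
player_idx = np.random.randint(0, p, n)  # which player produced each row

wrong = np.dot(beta[:, player_idx], X)            # (k, n) @ (n, k) -> (k, k)
y_hat = (beta[:, player_idx].T * X).sum(axis=-1)  # row-wise product, then sum -> (n,)

print(wrong.shape, y_hat.shape)  # (3, 3) (100,)
```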
One other little "optimization" I see is that you could use the `.assign` method of DataFrames instead of `sm.add_constant` to save a few keystrokes, as in `data = sd[predictors].assign(intercept=1.0)`, which should give the name and dtype you want on the constant.
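For comparison, assuming `sd` is your raw DataFrame and `predictors` your column list, the two routes look like this (toy data):

```python
import pandas as pd
import statsmodels.api as sm

sd = pd.DataFrame({"release_speed": [93.1, 88.4], "pfx_z": [1.2, 0.8]})
predictors = ["release_speed", "pfx_z"]

data_sm = sm.add_constant(sd[predictors])    # adds a column named "const"
data = sd[predictors].assign(intercept=1.0)  # you pick the name; 1.0 gives a float column
```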
For your priors, you currently have a "fully pooled" model, where all player-pitch tuples are assumed to come from the same underlying distribution. (edit: this is wrong, see below) You are also pooling all your parameters, so you assume that the value of the coefficient on `release_speed` comes from the same distribution as the coefficient on `pfx_z`, for example. You could split up the parameters by adding a `predictors` dimension to `mu` and `sigma`. In this case, you would still have full pooling of player-pitch tuples, but each feature would be treated independently.
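Something like this is roughly what I mean (the coord values and the priors are placeholders for whatever you're actually using):

```python
import pymc as pm

predictors = ["release_speed", "pfx_z"]         # stand-ins for your feature names
players = ["player_a", "player_b", "player_c"]  # stand-ins for your player ids
coords = {"predictors": predictors, "player": players}

with pm.Model(coords=coords) as model:
    # one hyperprior per feature, so features are no longer pooled together
    mu = pm.Normal("mu", 0.0, 1.0, dims="predictors")
    sigma = pm.HalfNormal("sigma", 1.0, dims="predictors")

    # player-pitch tuples stay fully pooled within each feature;
    # [:, None] broadcasts the (k,) hyperpriors across the player axis
    beta = pm.Normal("beta", mu=mu[:, None], sigma=sigma[:, None],
                     dims=("predictors", "player"))
```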
As far as best practices go, I think what you have looks really clean. If you end up using non-centered parameterizations for `beta`, that can get messy, so I sometimes write helper functions to handle it. You could also consider writing a function that handles model creation and returns the `model` object; that would make building and comparing a bunch of specifications cleaner. A sketch of both ideas is below.
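Roughly what I have in mind (the helper names and the likelihood are just illustrative guesses at your setup):

```python
import pymc as pm

def noncentered_normal(name, mu, sigma, dims):
    """Non-centered Normal: sample a standard-normal offset, then shift and scale."""
    offset = pm.Normal(f"{name}_offset", 0.0, 1.0, dims=dims)
    return pm.Deterministic(name, mu + sigma * offset, dims=dims)

def make_model(X, y, player_idx, coords):
    """Build and return the model, so alternative specs are easy to compare."""
    with pm.Model(coords=coords) as model:
        mu = pm.Normal("mu", 0.0, 1.0, dims="predictors")
        sigma = pm.HalfNormal("sigma", 1.0, dims="predictors")
        beta = noncentered_normal("beta", mu[:, None], sigma[:, None],
                                  dims=("predictors", "player"))
        y_hat = (beta[:, player_idx].T * X).sum(axis=-1)
        noise = pm.HalfNormal("noise", 1.0)
        pm.Normal("y", mu=y_hat, sigma=noise, observed=y)
    return model
```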