The first thing I see right away is that you're indexing `beta` on the first axis with `player_idx`, but the dims of `beta` are `(predictors, player)`. I think it should be `beta[:, player_idx]`.
Thinking about shapes a bit more: if you have k predictors, p players, and n rows of data, the output of `pm.math.dot(beta[:, player_idx], X)` will be `(k, n) @ (n, k) -> (k, k)`, which isn't right. I think, since you are expanding `beta` to shape `(k, n)` with `player_idx`, you can do the dot product "by hand" like this:

```python
y_hat = (beta[:, player_idx].T * X).sum(axis=-1)
```

This will multiply each cell of each row by the correct beta for that variable, then sum across the columns, producing the usual linear projection. The output shape should now be `(n,)`.
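Just to make the shapes concrete, here's a quick numpy sketch (the sizes are made up, the names mirror yours):

```python
import numpy as np

k, p, n = 3, 10, 100                     # predictors, players, rows (made-up sizes)
beta = np.random.normal(size=(k, p))     # one coefficient per (predictor, player)
X = np.random.normal(size=(n, k))        # design matrix
player_idx = np.random.randint(0, p, n)  # which player produced each row

wrong = np.dot(beta[:, player_idx], X)            # (k, n) @ (n, k) -> (k, k)
y_hat = (beta[:, player_idx].T * X).sum(axis=-1)  # row-wise product, then sum -> (n,)

print(wrong.shape, y_hat.shape)  # (3, 3) (100,)
```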
One other little "optimization" I see is that you could use the `.assign` method of DataFrames instead of `sm.add_constant` to save a few keystrokes, as in `data = sd[predictors].assign(intercept=1.0)`, which should give the name and dtype you want on the constant.
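For comparison, assuming `sd` is your raw DataFrame and `predictors` your column list, the two routes look like this (toy data):

```python
import pandas as pd
import statsmodels.api as sm

sd = pd.DataFrame({"release_speed": [93.1, 88.4], "pfx_z": [1.2, 0.8]})
predictors = ["release_speed", "pfx_z"]

data_sm = sm.add_constant(sd[predictors])    # adds a column named "const"
data = sd[predictors].assign(intercept=1.0)  # you pick the name; 1.0 gives a float column
```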
For your priors, you currently have a "fully pooled" model, where all player-pitch tuples are assumed to come from the same underlying distribution. (edit: this is wrong, see below) You are also pooling all your parameters, so you assume that the value of the coefficient on `release_speed` comes from the same distribution as the coefficient on `pfx_z`, for example. You could split up the parameters by adding a `predictors` dimension to `mu` and `sigma`. In this case, you would still have full pooling of player-pitch tuples, but each feature would be treated independently.
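Something like this is roughly what I mean (the coord values and the priors are placeholders for whatever you're actually using):

```python
import pymc as pm

predictors = ["release_speed", "pfx_z"]         # stand-ins for your feature names
players = ["player_a", "player_b", "player_c"]  # stand-ins for your player ids
coords = {"predictors": predictors, "player": players}

with pm.Model(coords=coords) as model:
    # one hyperprior per feature, so features are no longer pooled together
    mu = pm.Normal("mu", 0.0, 1.0, dims="predictors")
    sigma = pm.HalfNormal("sigma", 1.0, dims="predictors")

    # player-pitch tuples stay fully pooled within each feature;
    # [:, None] broadcasts the (k,) hyperpriors across the player axis
    beta = pm.Normal("beta", mu=mu[:, None], sigma=sigma[:, None],
                     dims=("predictors", "player"))
```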
As far as best practices go, I think what you have looks really clean. If you end up using non-centered parameterizations for `beta`, that can get messy, so I sometimes write helper functions to handle it. You could also consider writing a function that handles model creation and returns the `model` object; that would make building and comparing a bunch of specifications cleaner. A sketch of both ideas is below.
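Roughly what I have in mind (the helper names and the likelihood are just illustrative guesses at your setup):

```python
import pymc as pm

def noncentered_normal(name, mu, sigma, dims):
    """Non-centered Normal: sample a standard-normal offset, then shift and scale."""
    offset = pm.Normal(f"{name}_offset", 0.0, 1.0, dims=dims)
    return pm.Deterministic(name, mu + sigma * offset, dims=dims)

def make_model(X, y, player_idx, coords):
    """Build and return the model, so alternative specs are easy to compare."""
    with pm.Model(coords=coords) as model:
        mu = pm.Normal("mu", 0.0, 1.0, dims="predictors")
        sigma = pm.HalfNormal("sigma", 1.0, dims="predictors")
        beta = noncentered_normal("beta", mu[:, None], sigma[:, None],
                                  dims=("predictors", "player"))
        y_hat = (beta[:, player_idx].T * X).sum(axis=-1)
        noise = pm.HalfNormal("noise", 1.0)
        pm.Normal("y", mu=y_hat, sigma=noise, observed=y)
    return model
```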