Thanks for explaining a bit more. Your wording is great; it’s just that working with models that include time is a huge pain in the ass, so you have to be careful. Let me set up the problem as I now understand it to see if we’re on the same page.
Let’s index games by i = 0, \dots, N-1 and time buckets by t = 0, \dots, T-1. We chop each game into T = 100 even time “buckets” and record the number of points scored within bucket t of game i as s_{i,t}. This means we can recover the total score of the game by summing over the buckets: S_i = \sum_{t=0}^{T-1} s_{i,t}.
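Just to make the bucketing step concrete, here’s a rough NumPy sketch. It assumes each game’s raw data comes as (time, points) scoring events with the clock normalized to [0, 1); `bucket_game` and the toy numbers are made up for illustration.

```python
import numpy as np

T = 100  # number of even time buckets

def bucket_game(event_times, event_points, T=T):
    """Aggregate one game's scoring events into T even time buckets.

    event_times  : scoring times, normalized to [0, 1)
    event_points : points scored at each event
    Returns s_i, a length-T array with the points scored in each bucket.
    """
    idx = np.minimum((np.asarray(event_times) * T).astype(int), T - 1)
    return np.bincount(idx, weights=event_points, minlength=T)

# toy game with three scoring events
s_i = bucket_game([0.05, 0.50, 0.97], [3, 2, 7])
S_i = s_i.sum()  # total score of the game: S_i = sum_t s_{i,t}
```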
More importantly, we can take a reversed cumulative sum over the s_{i,t} to get the “points remaining to be scored”. Let’s call that R_{i,t} = \sum_{k=t}^{T-1} s_{i,k}. The goal of the exercise is to predict \hat{R}_{i,t} as a function of some features, i.e.:
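The points-remaining array is then just a reversed cumulative sum along the time axis. A minimal sketch, assuming the per-bucket scores are stacked into a (T, N) array `s` (toy data below):

```python
import numpy as np

rng = np.random.default_rng(0)
T, N = 100, 3000
s = rng.poisson(1.0, size=(T, N))  # stand-in for the real per-bucket scores

# R[t, i] = sum_{k=t}^{T-1} s[k, i]: flip time, cumsum, flip back
R = np.cumsum(s[::-1, :], axis=0)[::-1, :]

assert np.array_equal(R[0], s.sum(axis=0))  # at t=0 the whole game is still to be played
```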
\hat{R}_{i,t} = f(X_{i,t} \beta_t + \epsilon_{i,t}) (where f is some link function).
The shape of R_{i,t} should be T \times N, since it has a value for each of the N games and each of the T time buckets. Further, I assume that the feature matrix X has shape T \cdot N \times k, where k is the number of features. theta, therefore, is exactly what you are looking for: it also has shape T \times N, and the entry in the t-th row and i-th column contains the estimated score-to-go (in logit space, since it hasn’t been passed through f yet) from time t onward for the i-th game in your sample.
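Here’s how I picture those shapes fitting together, as a sketch only: the sizes, the random X and beta, and the choice of f = exp at the end are all placeholders for whatever you actually have.

```python
import numpy as np

T, N, k = 100, 3000, 5  # buckets, games, features -- illustrative sizes only
rng = np.random.default_rng(0)

# X: (T*N, k), rows grouped by game (game 0's 100 buckets, then game 1's, ...)
X = rng.normal(size=(T * N, k))

# beta: one coefficient vector per time bucket, shape (T, k), however you estimate it
beta = rng.normal(size=(T, k))

# theta[t, i] = X_{i,t} @ beta_t, the linear predictor, shape (T, N)
X_by_game = X.reshape(N, T, k)                    # (game, bucket, feature)
theta = np.einsum('itk,tk->ti', X_by_game, beta)  # (T, N)

R_hat = np.exp(theta)  # placeholder link: swap in your actual f
```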
One final note: usually labels are column vectors, so you might prefer to have R_{i,t} be a T \cdot N \times 1 column vector, organized as ((i=0, t=0), (i=0, t=1), …, (i=0, t=99), (i=1, t=0), …, (i=2999, t=99)). If you put your labels in this format, you can flatten theta to match — just note that, with theta stored as T \times N and the labels grouped by game like this, the flattening has to be theta.T.ravel() (equivalently theta.ravel(order='F')) rather than a plain row-major .ravel(), or the orderings won’t line up.
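And a sketch of that reshaping, with toy R and theta standing in for yours. Since theta is (T, N) but the label vector is grouped by game, the transpose (or a Fortran-order ravel) is what makes the two orderings line up:

```python
import numpy as np

T, N = 100, 3000
rng = np.random.default_rng(0)
R = rng.poisson(50.0, size=(T, N)).astype(float)  # stand-in for the (T, N) points-remaining array

# labels as a (T*N, 1) column vector, grouped by game: (i=0,t=0), (i=0,t=1), ..., (i=1,t=0), ...
y = R.T.reshape(-1, 1)

# flatten a (T, N) theta the same way before comparing against y
theta = rng.normal(size=(T, N))
theta_flat = theta.T.ravel()  # equivalently theta.ravel(order='F')
assert theta_flat.shape == (T * N,)
```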
Let me know if I’m still not getting something and we can try to work it out together!