Well, in that case you still have three dimensions (isotope, species, year); you just have to think through how to connect the (noisy, heterogeneous) samples of individuals to the species-level features you are trying to infer from the measurements.
At the very least, I would consider estimating the annual mean (plus a standard deviation or whatever else, depending on how the variable is distributed) for each species-year inside your model, then using those estimated means in the downstream isotope model. That would let you propagate uncertainty about the true values, and would also let you do some partial pooling between species if appropriate. If you're measuring the size of closely related species, for example, you could assume they all cluster around some grand mean with some deviation, which lets you infer something about the species with fewer samples. You could also include some time dynamics here if you think it's appropriate, by putting a Gaussian random walk prior over the means.
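Here's a minimal sketch of the pooling idea in PyMC, with made-up data (all the names and priors are placeholders, not your actual setup):

```python
import numpy as np
import pymc as pm

rng = np.random.default_rng(0)

# fake data standing in for your measurements: 200 individuals,
# 4 species, 6 years; s_idx/t_idx map each individual to species and year
n_obs, n_species, n_years = 200, 4, 6
s_idx = rng.integers(n_species, size=n_obs)
t_idx = rng.integers(n_years, size=n_obs)
size = rng.normal(50 + 2 * s_idx, 3)

coords = {"species": [f"sp{k}" for k in range(n_species)],
          "year": 2000 + np.arange(n_years)}

with pm.Model(coords=coords) as model:
    # partial pooling: species-year means cluster around a grand mean
    grand_mu = pm.Normal("grand_mu", 50, 20)
    species_sigma = pm.HalfNormal("species_sigma", 10)
    mu = pm.Normal("mu", grand_mu, species_sigma, dims=("species", "year"))

    # individuals are noisy draws from their species-year mean
    obs_sigma = pm.HalfNormal("obs_sigma", 10)
    pm.Normal("size_obs", mu=mu[s_idx, t_idx], sigma=obs_sigma, observed=size)

    # swapping the Normal prior on `mu` for pm.GaussianRandomWalk over the
    # year axis would add the time dynamics mentioned above
```

The latent `mu` can then feed straight into the downstream isotope model in the same graph, which is what propagates the uncertainty.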
All this is just spitballing, because I don’t have any domain knowledge about your problem. Sorry!
As for the environmental images, I know next to nothing about those. I'd guess there are feature extraction techniques used in your domain that could reduce them to a more practical form? You can't really feed thousands of raw pixel values into a linear model.
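If nothing domain-specific exists, even something generic like PCA would get you a few image-level features. A minimal sketch with purely hypothetical data:

```python
import numpy as np
from sklearn.decomposition import PCA

# hypothetical: `images` is an (n_images, height, width) array
images = np.random.default_rng(0).normal(size=(100, 64, 64))
flat = images.reshape(len(images), -1)  # one row of pixels per image

# keep a handful of components as per-image covariates for the linear model
features = PCA(n_components=5).fit_transform(flat)
print(features.shape)  # (100, 5)
```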
As for using the gist, I would make sure to incorporate some hierarchy into the parameters, and think about whether modeling the isotopes independently (like in the gist) makes sense or not. Also, use pd.factorize to painlessly get the index maps (the things I called i_idx, j_idx, t_idx) from your data instead of what I did. Factorize also gives back labels you can use in your coords, which are nicer than plain integers.
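For example (the dataframe and its columns here are made up):

```python
import pandas as pd

# hypothetical long-format data: one row per measurement
df = pd.DataFrame({
    "isotope": ["d13C", "d15N", "d13C", "d15N"],
    "species": ["A", "A", "B", "B"],
    "year": [2001, 2001, 2002, 2002],
})

# factorize returns (integer codes, unique labels) for each column
i_idx, isotopes = pd.factorize(df["isotope"])
j_idx, species = pd.factorize(df["species"])
t_idx, years = pd.factorize(df["year"])

# the labels slot straight into the model's coords
coords = {"isotope": isotopes, "species": species, "year": years}
```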