Hi everyone! I hope this question is not out of place.

I have a problem with one variable (Y), which consists of N biological measurements per year over four years, and three other variables (X1, X2, X3), each consisting of M satellite environmental data points (M >> N) over the same four years. I’d like to see whether the yearly changes in X1, X2, and X3 could explain the corresponding changes in Y. I’ve been playing with Gaussian processes, but I can’t wrap my head around how to set one up given the different sample sizes of Y, X1, X2, and X3.

I’ve successfully applied a model similar to the “islands model” presented in Bayesian Analysis with Python by @aloctavodia, where the distance matrix of the annual means in X1 models the annual means in Y. Still, I feel like I’m leaving a lot of information on the table by just using mean values.

Any suggestions or pointers are welcome.

Am I correct that this is a missing data problem? That is, are X1, X2, and X3 collected at a higher frequency than Y (and perhaps Y is not collected at regular intervals at all)?

If so, you should be able to reduce everything to the lowest time frequency (paying attention to adjust for stock and flow variables), then use your model to interpolate the missing data.
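As a rough illustration of the aggregation step, here is a minimal pandas sketch. The data are synthetic and the column names `stock_var` and `flow_var` are made up for the example. It assumes a stock variable (a level, such as a temperature) is summarized by its mean over the period, while a flow variable (an accumulated quantity) is summed.

```python
import numpy as np
import pandas as pd

# Synthetic monthly series covering two years (illustrative values only)
idx = pd.date_range("2020-01-01", periods=24, freq="MS")
df = pd.DataFrame(
    {
        "stock_var": np.arange(24, dtype=float),  # hypothetical level-type variable
        "flow_var": np.ones(24),                  # hypothetical accumulated variable
    },
    index=idx,
)

# Reduce to the lowest (annual) frequency:
# average the stock variable, sum the flow variable
annual = df.groupby(df.index.year).agg({"stock_var": "mean", "flow_var": "sum"})
print(annual)
```

Whether the mean, the last value, or something else is the right summary for a stock variable depends on what it measures, so treat the choice above as a placeholder.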

Hi @jessegrabowski! Thank you very much for your response.

I left out quite a bit of context, and, unfortunately (for me), it is not a missing data problem. Y is stable isotope data from biological tissue. In short, stable isotopes give information about what an animal ate (and where) during the three months (in my case) before sampling. For example, let’s say that I sampled 20 animals in May of a given year. I’d get information about the animals’ behavior from March through May.

Moreover, my animals are pretty mobile, so getting environmental information from where they were sampled would not give enough information in space or time. For this reason, Xs are data from regional oceanographic variables extracted from three-month composite satellite images. That is to say, the only dimension that X and Y share is time.
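To make the layout concrete, here is a minimal NumPy sketch with made-up numbers. The sizes, names, and values are all placeholders rather than my actual data. It only shows how a per-year summary of X can be lined up with the individual Y measurements through a shared year index.

```python
import numpy as np

rng = np.random.default_rng(0)
n_years, N, M = 4, 20, 500  # placeholder sizes, with M >> N

# X: M synthetic satellite values per year, shape (n_years, M)
x = rng.normal(loc=np.arange(n_years)[:, None], scale=1.0, size=(n_years, M))

# Y: N synthetic isotope values per year, stored in long format
year_idx = np.repeat(np.arange(n_years), N)  # year of each Y observation
y = rng.normal(size=n_years * N)

# Summarize X per year, keeping the spread as well as the mean
x_mean = x.mean(axis=1)
x_sd = x.std(axis=1, ddof=1)

# Map the year-level summary onto each individual Y observation
x_for_y = x_mean[year_idx]
print(x_for_y.shape, y.shape)
```

Keeping `x_sd` alongside `x_mean` is one way to carry some of the within-year variability in X forward instead of discarding it.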

Interesting! So is it a mix of (time, animal) longitudinal observations for the isotope data (the same animals over time), and time-only environmental factors shared by all the animals (the Xs)?
