Yes, random variables are aesara tensors, so the broadcasting rules explained in https://aesara.readthedocs.io/en/latest/tutorial/broadcasting.html apply. TL;DR: they are similar to NumPy's, but tensors also have a per-dimension true/false flag to "turn off" broadcasting on some dimensions if desired.
Yes, I'd recommend factorizing into 1d vectors whose length is the number of observations, in a style similar to long-format dataframes. There are two reasons for that. The first is that it also works with ragged data; keeping every hierarchy level as its own dimension only works if there is the same number of observations per group, which is generally not the case. The second is that, while I haven't had time to do proper benchmarks, I am quite sure it is faster to implement models this way, unless working with multidimensional arrays of parameters would require no indexing at all.
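For example (a toy illustration with made-up numbers), long format means one row per observation, which handles ragged groups naturally:

```python
import pandas as pd

# long format: one row per observation; groups can have unequal sizes
df = pd.DataFrame({
    "city": ["Berlin"] * 5 + ["Paris"] * 7,
    "time": [0, 1, 0, 1, 0, 0, 1, 0, 1, 0, 1, 1],
    "temperature": [3.1, 12.0, 2.5, 11.4, 2.9,
                    5.0, 14.2, 4.8, 13.9, 5.3, 14.5, 13.7],
})
```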
Using your example, if you have M observations per combination of city and time, then you can structure your observations as a 3d array with dimensions time: 2, city: 3, observation: 5 → n_obs = 30. You can then use broadcasting alone to combine the 1d (length 2) time parameter vector with the 1d (length 3) city parameter vector; no indexing would be needed here and I think it would be quite intuitive. But that is generally not the case: you might have, say, 5 observations for Berlin and 7 for Paris.
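A minimal NumPy sketch of that broadcasting (the values are made up):

```python
import numpy as np

time_effect = np.array([0.0, 10.0])        # length 2, one entry per time
city_effect = np.array([3.0, 5.0, 4.0])    # length 3, one entry per city

# (2, 1) + (1, 3) broadcasts to a (2, 3) grid of means, no indexing needed
mu = time_effect[:, None] + city_effect[None, :]
print(mu.shape)  # (2, 3)
```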
I think it is important to note that the number of parameters and the number of observations are independent, and you always need to evaluate the model on all your observations. In a simple linear regression we have 3 parameters a, b, \sigma, but the log likelihood is the sum of the pointwise log likelihood of the linear model evaluated at every observation; PyMC does that automatically (and aesara can sometimes optimize the operations) when given the `observed` kwarg with all the observations.
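As a minimal sketch (with `x` and `y` standing in for arrays of all the observations):

```python
import numpy as np
import pymc as pm

x = np.random.normal(size=100)                               # stand-in predictor
y = 1.0 + 2.0 * x + np.random.normal(scale=0.3, size=100)    # stand-in data

with pm.Model():
    a = pm.Normal("a", 0, 10)
    b = pm.Normal("b", 0, 10)
    sigma = pm.HalfNormal("sigma", 1)
    # 3 parameters, but the log likelihood sums over all 100 observations
    pm.Normal("y_obs", mu=a + b * x, sigma=sigma, observed=y)
```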
Second note: the `pm.Deterministic` calls and the `city_temperature[cities_idx]` indexing do not add more parameters to the model; they only create transformations and arrays with the original parameters restructured to compute the right quantity when working with arrays.
In my opinion, going back to what I said above, you need 1d vectors and all the factors as indexes (you can use `pandas.factorize` for this like you have done) or, if there is the same number of observations per category, a multidimensional array (which is not compatible with pandas anymore unless you have only one level). I can't really speak about having the levels as "raw" columns or as a multiindex though.
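A sketch of how that could look (a hypothetical model reusing the long-format `df` from the sketch above; the priors are placeholders):

```python
import pandas as pd
import pymc as pm

# integer codes of length n_obs plus the unique labels of each factor
cities_idx, cities = pd.factorize(df["city"])
time_idx, times = pd.factorize(df["time"])

with pm.Model(coords={"city": cities, "time": times}):
    city_temperature = pm.Normal("city_temperature", 0, 10, dims="city")
    time_offset = pm.Normal("time_offset", 0, 5, dims="time")
    sigma = pm.HalfNormal("sigma", 1)
    # indexing restructures the short parameter vectors to length n_obs
    mu = city_temperature[cities_idx] + time_offset[time_idx]
    pm.Normal("temperature", mu=mu, sigma=sigma, observed=df["temperature"])
```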
Here is where the parameters vs. observations difference comes into play. The likelihood can't be of size 3 because the likelihood combines both data and parameters. There are only 3 possible values for the mean, yes, but we need to evaluate the normal (with whichever of the 3 means applies) at each of the observations. In the exponent of the pdf, (x - \mu), you need to take the x into account too, not only the \mu.
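Spelled out for this case (with $c_i$ denoting which of the 3 means observation $i$ uses), the normal log likelihood sums over all $n$ observations:

$$\log p(y \mid \mu, \sigma) = \sum_{i=1}^{n} \left( -\frac{(y_i - \mu_{c_i})^2}{2\sigma^2} - \log\left(\sigma\sqrt{2\pi}\right) \right)$$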
Moreover, to do posterior predictive checks you need to preserve that structure.
However, this can be quite inconvenient if you want to analyze the posterior predictive samples/predictions at each city (e.g. to get the probability the temperature will be higher than t in each city).
To get around this I think you should consider two options:
- You can update the `cities_idx_pm` with `pm.set_data` and replace it with an arange of length the number of cities (see the sketch after this list). This is easy and convenient if you only have one category, but it quickly becomes cumbersome. To add the `time` component you'd need to use `0, 0, 1, 1, 2, 2` as city_idx and `0, 1, 0, 1, 0, 1` as time_idx…
- Repeat the `mu` computation with xarray and use xarray-einstats to sample from the posterior predictive. Broadcasting and alignment will be automatic and you'll get a multidimensional array with the hierarchies as different dimensions.
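A sketch of the first option (assuming `cities_idx_pm` was created with `pm.MutableData` inside a model named `model`, and `n_cities` holds the number of cities):

```python
import numpy as np
import pymc as pm

with model:
    # one "observation" slot per city; note the observed variable's shape
    # must follow cities_idx_pm for the new length to propagate
    pm.set_data({"cities_idx_pm": np.arange(n_cities)})
    post_pred = pm.sample_posterior_predictive(idata)
```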
The likelihood with size 3 looks very wrong to me. I still don't grok the new `size` argument, but you might be generating a `(3, 174)` array to evaluate the log likelihood on. What is the shape of the data in the `log_likelihood` group of the result? Do you get the same results?
Yes, the example does have the same number of observations per category, so you can store the observations in a `time, city, date` 3d array. Then this line can become `city_time_temperature = city_temperature[None, :] + time_offset[:, None]`, which might work as input directly if the observed is the 3d array, or you might need an extra length-1 dim for broadcasting to work, so use `mu=city_time_temperature[:, :, None]`.
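Putting that together as a sketch (dims, names and the fixed sigma are assumptions; `temps_3d` stands for the `(time: 2, city: 3, date: 5)` observation array):

```python
import pymc as pm

coords = {"time": [0, 1], "city": ["city_a", "city_b", "city_c"]}
with pm.Model(coords=coords):
    city_temperature = pm.Normal("city_temperature", 0, 10, dims="city")
    time_offset = pm.Normal("time_offset", 0, 5, dims="time")
    # (1, 3) + (2, 1) broadcasts to a (2, 3) grid of means
    city_time_temperature = city_temperature[None, :] + time_offset[:, None]
    # extra length-1 dim so the means broadcast against the (2, 3, 5) data
    pm.Normal("obs", mu=city_time_temperature[:, :, None], sigma=0.5,
              observed=temps_3d)
```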
However, big however. That is the mean of the temperatures; it is not what you observed. I'd recommend working with the posterior predictive samples to get values you can interpret as what you'd expect to see. Already with the current model, you can get posterior predictive samples as:
```python
from scipy.stats import norm
from xarray_einstats.stats import XrContinuousRV

post = idata.posterior
# the city and time dims broadcast automatically; 0.5 is the sigma from the model
temperature = XrContinuousRV(norm, post["city_temperature"] + post["time_offset"], 0.5).rvs()
```
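The result is an xarray object with chain, draw, city and time dims, so for instance `(temperature > t).mean(("chain", "draw"))` should give the per-city, per-time probability of exceeding a threshold `t` (names as in the snippet above).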
I recently wrote a Discourse topic to introduce xarray-einstats.
Magic dimension/label-based broadcasting is (at least for now) outside the scope of PyMC and aesara. IIRC there is an open issue on aesara about this, though; you might want to search for it. All of this and more, however, already works after sampling, once you get the results as InferenceData.