Loop fusion failed

Hi. In order to implement a hierarchical model for marathon times i need one different variable for each runner, since they don’t all have the same number of race years. For example if the dataset has 100 runners i cant do something like

meanRunner = pm.StudentT(“meanRunner” nu=3, mu=meanYear,sigma=sigma_muRunner,shape=(100,4))
since not all 100 runners have run in 4 race years. There are some with 2 races years, some with 3 etc.
So if runner 1 has run in 4 races and runner 2 in 2 i have to go in a for loop and create each variable like
meanRunner1 = pm.StudentT(“meanRunner1” nu=3, mu=meanYear,sigma=sigma_muRunner,shape=4)

meanRunner2 = pm.StudentT(“meanRunner2” nu=3, mu=meanYear,sigma=sigma_muRunner,shape=2)

and then stack them like stack([meanRunner1,meanRunnner2] etc. For a small number up to 10-15 the loop is working and i am getting what i need but for a larger number i get
UserWarning: Loop fusion failed because the resulting node would exceed the kernel argument limit.
and the program breaks.
Is there a way to handle this situation?
Thanks in advance

Your situation is called an “unbalanced panel”. In PyMC, you can handle this using indicator variables. Here’s a similar situation. In your case, you will have two indicators, one for time and one for runner. You’ll make a single runner_mean with shape 100, and a single year_mean with shape 4, then you’ll use the indices to broadcast them together.

Yes, the dataset is unbalanced. There are seven race years (2013-2019) and 108 runners. Each runner might have run a number of races 0 to 10 to each race year. Lets say a runner run 1 race in 2013, 3 races in 2017 and 2 races in 2019. So i wanted to create only the needed “variables” ( tensors/subtensors).
If i do something like the following…
There are 7 means of racetime , one for each race year
meanYear = pm.StudentT(“meanYear”,nu=3, mu=meanRacetime,sigma=240,shape=7)

and if for every runner i create his means of race time for every year
meanRunner = pm.StudentT(“meanRunner”, nu=3, mu=meanYear,sigma=240,shape=(108,7))

and finally i use the runner index and year index
Y_obs = pm.StudentT(“Runner”,nu=3,
mu=meanRunner[runner_idx,year_idx],sigma=240,observed=run_data.RaceSeconds)

then if i do trace_plot for runner 0 who has run in 2013,2014,2015 (0,1,2) he will have 7 means instead of 3. I wanted to avoid these extra 4 means for him and have just 3.
That’s why i was creating each variable for each runner with the needed shape inside a for loop. A runner should have only 1 mean ( if he had run various races only in one year) another runner should have 3 means (if he had run various races in 3 years).
Is there a way to avoid the for loop? With my previous installation the for loop was running (it was very slow but produced properly), but since i installed bambi it must have updated some libraries and it runs only for a small number (<20). That’s why i asked if there is a more proper way.
Thanks again.

Is there a way to avoid the for loop? With my previous installation the for loop was running (it was very slow but produced properly), but since i installed bambi it must have updated some libraries and it runs only for a small number (<20). That’s why i asked if there is a more proper way.
Thanks again.

Yes, with some slighly complex advanced indexing.

Are the average running times for the same runner across multiple years related in the prior? If not, you can treat each runner and year combination as it’s own identity: have one parameter for each, ravel your observations and use a single vector advanced indexing to select the right means for each observation.

If they are related, it’s more tricky because you will have two levels of prior. Before we go there, just confirm if that’s the case.

1 Like

Since meanRunner is centered on meanYear, the effects are just additive. You can do meanYear[year_idx] + meanRunner[runner_idx]. You data should be in long form, as in the issue I linked.

1 Like

The meanRacetime is prior for meanYear, meanYear is prior for meanRunner.
with pm.Model() as runners:
meanRacetime = pm.StudentT(“meanRacetime”,nu=3,mu=8300,sigma=240)

meanYear = pm.StudentT("meanYear",nu=3, mu=meanRacetime,sigma=240,shape=7)

meanRunner = pm.StudentT(“meanRunner”, nu=3, mu=meanYear,sigma=240,shape=(108,7))

Y_obs = pm.StudentT("Runner",nu=3, mu=meanRunner[runner_idx,year_idx],sigma=240,
                        observed=run_data.RaceSeconds)

Sure, the effect is additive (it is an initial approach in order to figure out how to handle shaping and indexing). But what is the shape of meanRunner? I have 108 runner so the first dimension will be 108, but not all of them have a race mean for every year because they haven’t run in every year. The years are 7.
If i write something like
meanRunner = pm.StudentT(“meanRunner”, nu=3, mu=meanYear,sigma=240,shape=(108,7))
I don’t know how the ‘empty’ means interfere with the inference.
So i was looping over the runners and i was creating 108 different variables with proper size for each runner.
e.g.
meanRunner1 = pm.StudentT(“meanRunner1”, nu=3, mu=meanYear,sigma=240,shape=3)
if runner1 had races in 3 years
meanRunner2 = pm.StudentT(“meanRunner2”, nu=3, mu=meanYear,sigma=240,shape=5)
if runner2 had races in 5 years.
The sampling was very slow but the results were good. So i am asking how i can handle such a situation in an effective way. I don’t have experience in working with tensors.

The way you want to handle this is to first reshape your data into “long form”. Here’s a dummy dataset with 5 runners and 3 times:

                       times
runner  year                
A      2018-01-01   1.492973
       2019-01-01   1.259906
       2020-01-01   5.536423
B      2019-01-01   7.734591
       2020-01-01  11.157608
C      2018-01-01   2.014740
       2020-01-01   7.937118
D      2018-01-01   2.950629
E      2019-01-01   3.261694
       2020-01-01   6.551939

So you can see that it’s off-balance, we only have 1 year for runner D, but all 3 for runner A.

Next, we need to get indexes for every row, telling us which year and which runner it is. We will use these to slice into random variables. To get these, use pd.factorize:

runner_idx, runners = pd.factorize(index.get_level_values(0))
year_idx, years = pd.factorize(index.get_level_values(1))

These are just arrays of integers:

runner_idx
>>> Out: array([0, 0, 1, 1, 1, 2, 3, 3, 4, 4], dtype=int64)

time_idx
>>> Out: array([0, 1, 2, 0, 1, 1, 2, 1, 2, 1], dtype=int64)

So you can see that the first two rows are runner 0, on years 0, 1 then runner 1 on years 2, 0, 1, and so on. You also get back the list of labels to use in your PyMC coords. Here’s the relevant portion of a PyMC model:

coords = {'runner':runners, 'year':[x.year for x in years]}
with pm.Model(coords=coords) as mod:
    y = pm.MutableData('y', df.values)
    runner_mean = pm.Normal('runner_mean', dims=['runner'])
    year_mean = pm.Normal('year_mean', dims=['year'])
    mu = runner_mean[runner_idx] + year_mean[year_idx]

So you make two effects – one for runners, one for years. Then you can use the indexes to add them together. The indices will ensure that the right effects end up with the right rows. This is equivalent to broadcasting them together, like runner_mean[:, None] + year_mean[None], which would give you the full (n_runners, n_years) matrix, except that the indexes mean only the data you have is computed.

1 Like

First of all thanks a lot for the detailed answer. Although I think I am either missing something or I haven’t explained it clearly.
The following line is fine. One mean of racetimes for each year. 7 years, 7 mean race times for each year.
year_mean = pm.Normal(‘year_mean’, dims=[‘year’])
I have a question concerning the following line
runner_mean = pm.Normal(‘runner_mean’, dims=[‘runner’])
This means one mean race time for each runner. But i don’t want this. If a runner1 has run 3 races in year1 , 2 races in year2, 5 races in year3 i want at the end 3 mean race times for each one of the years. If runner2 has run in two years i need two means for him, one for each year.
So each runner should have one mean for every of the years he has run but runners in general have different number of means as they have raced in different number of years.

Just so we’re clear, this:

x \sim T(\mu, \sigma, \nu)

Is equivalent to this:

x \sim \mu + T(0, \sigma, \nu)

Including if \mu \sim T(\mu_0, \sigma_0, \nu_0)

My point is that when you add together year_mean and runner_mean, you end up with a unique mean for each runner-year combination. It is exactly equivalent to what you originally wrote, meanRunner = pm.StudenT('meanRunner', mu=meanYear, ...). The number of combinations of runner + year is governed by the indices – each unique addition of two means leads to a unique mean.

If there are multiple races within each year, this is no problem. You just need to introduce another index, race_id. The data becomes something like this:

                              times
runner year       race_id          
A      2018-01-01 0       -0.050662
                  1        0.921232
                  2       -0.665283
       2019-01-01 0       -1.239528
                  1       -0.884322
                  2        0.423450
                  3       -1.070264
       2020-01-01 0        2.510278
                  1        1.525603
                  2        1.790918
B      2018-01-01 0        1.141352
                  1        1.347654
                  2        4.210434
                  3        1.485402
       2019-01-01 0        1.636708

You don’t need to factorize race_id though, just factorize runner and year. All the races of racer A in 2018 will share the same mean, all the races of racer A in 2019 will share a mean, and so on.

Thanks again for the very detailed explanation.
mu = runner_mean[runner_idx] + year_mean[year_idx]
I didn’t know how to pass the mu to the observations. If i just pass mu or mu[index]. Is it something like the following?
Y_obs = pm.Normal(“Runner”, mu=mu,sigma=sig,observed=data.times)
And if i want to plot a specific mean for the first combination of runner+year)
use something like:
m = pm.Deterministic(‘m’,mu)
az.plot_trace(trace.posterior.m.sel(m_dim_0=0))
for the second
az.plot_trace(trace.posterior.m.sel(m_dim_0=1))

Yes, this line is correct:

 Y_obs = pm.Normal(“Runner”, mu=mu,sigma=sig,observed=data.times)

To recover specific combinations of runner + times, I think the easiest thing will be to make the full matrix of runner-year pairs and save it as a deterministic, and give it named dims, like this:

all_mu = pm.Deterministic('all_mu', runner_mean[:, None] + year_mean[None], dims=['runner', 'year'])

I am broadcasting the two effects together by making runner_mean into a column vector, so the runners are on the rows, and year_mean into a row-vector, so years are on the columns. Thus, all_mu[i, j] will be the mean for the i-th runner in the j-th year. To plot the effects, you can then use the named dimensions. This plots the KDE for the posterior mean of runner A in year 2018:

az.plot_posterior(trace, var_names=['all_mu'], coords={'runner':['A'], 'year':[2018]})

This also gives you counter-factual means for the runner-year pairs that you didn’t observe in the data, which might or might not be interesting.

Thanks a lot. That was exactly what I needed in order to start, but I couldn’t figure it out on my own. Thanks again.

1 Like