Understanding a Model Configuration

Hi all,

I’m working through a notebook that implements a number of different model types in both PyMC3 and Stan, link here:

I am hoping someone can help me understand a particular section.

In the Hierarchical Model of Parent and Manufacturer section, the PyMC3 model defines this variable:

b0 = pm.Normal('b0_mfr', mu=b0_parent[mfr_parent_map], sd=b0_mfr_sd, shape=n_mfr)

What I’m particularly confused about is mu=b0_parent[mfr_parent_map], specifically the reference in square brackets.

b0_parent is a normal distribution defined earlier in the model, which all makes sense to me:

b0_parent = pm.Normal('b0_parent', mu=b0_parent_mn, sd=b0_parent_sd, shape=n_parent)

But I don’t understand how the [mfr_parent_map] reference comes into play. This map is an array of label-encoded values that maps each car manufacturer to its parent company.

I peeled the code out and remade the array without encoding to post here:

['fiat', 'fiat', 'aston martin lagonda', 'volkswagen', 'volkswagen',
       'bmw', 'fiat', 'peugeot-citroen', 'peugeot-citroen', 'fiat',
       'fiat', 'ford', 'honda', 'hyundai', 'tata', 'hyundai',
       'volkswagen', 'toyota', 'mazda', 'mclaren', 'daimler-ag', 'bmw',
       'mitsubishi', 'renault-nissan', 'peugeot-citroen', 'volkswagen',
       'renault-nissan', 'bmw', 'volkswagen', 'volkswagen', 'daimler-ag',
       'shanghai-auto', 'subaru', 'suzuki', 'toyota', 'gm', 'volkswagen',

The array is created from the table below.

idx | mfr_enc              | parent_enc           | cnt
0   | abarth               | fiat                 | 12
1   | alfa romeo           | fiat                 | 10
2   | aston martin lagonda | aston martin lagonda | 6
3   | audi                 | volkswagen           | 339
4   | bentley motors       | volkswagen           | 18
5   | bmw                  | bmw                  | 539
6   | chrysler jeep        | fiat                 | 29
7   | citroen              | peugeot-citroen      | 93
8   | ds                   | peugeot-citroen      | 28
9   | ferrari              | fiat                 | 2
10  | fiat                 | fiat                 | 27
11  | ford                 | ford                 | 218
12  | honda                | honda                | 13
13  | hyundai              | hyundai              | 39
14  | jaguar               | tata                 | 27
15  | kia                  | hyundai              | 28
16  | lamborghini          | volkswagen           | 2
17  | lexus                | toyota               | 5
18  | mazda                | mazda                | 31
19  | mclaren              | mclaren              | 11
20  | mercedes-benz        | daimler-ag           | 315
21  | mini                 | bmw                  | 147
22  | mitsubishi           | mitsubishi           | 4
23  | nissan               | renault-nissan       | 4
24  | peugeot              | peugeot-citroen      | 94
25  | porsche              | volkswagen           | 67
26  | renault              | renault-nissan       | 12
27  | rolls royce          | bmw                  | 7
28  | seat                 | volkswagen           | 63
29  | skoda                | volkswagen           | 46
30  | smart                | daimler-ag           | 5
31  | ssangyong            | shanghai-auto        | 5
32  | subaru               | subaru               | 21
33  | suzuki               | suzuki               | 21
34  | toyota               | toyota               | 27
35  | vauxhall             | gm                   | 73
36  | volkswagen           | volkswagen           | 107
37  | volvo                | geely                | 95

I just can’t understand how it’s being used in the context of this model. Is it basically creating a b0 prior for each parent company, as per the shape parameter, with the square-bracket reference then picking out the b0_parent value of the relevant parent to use as mu for each manufacturer?

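Yes. Indexing b0_parent with mfr_parent_map expands the parent-level prior to the same shape as the index, so each manufacturer's mu is the b0_parent entry for its parent company. You can do the same in NumPy, where it is called fancy indexing: Fancy Indexing | Python Data Science Handbook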
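
To make that concrete, here is a minimal NumPy sketch using only the first six rows of the table. The values in b0_parent_draw are made up purely for illustration, and np.unique is just one way to label-encode; the notebook may build its map differently and order the parents differently.

```python
import numpy as np

# Parent company of each manufacturer (first six rows of the table only).
mfr_parent = np.array(
    ["fiat", "fiat", "aston martin lagonda", "volkswagen", "volkswagen", "bmw"]
)

# Label-encode the parents (an assumed approach, not necessarily the notebook's).
# `parents` holds the sorted unique names; `mfr_parent_map` holds, for each
# manufacturer, the integer position of its parent within `parents`.
parents, mfr_parent_map = np.unique(mfr_parent, return_inverse=True)
print(parents)          # ['aston martin lagonda' 'bmw' 'fiat' 'volkswagen']
print(mfr_parent_map)   # [2 2 0 3 3 1]

# Stand-in for b0_parent: one value per parent company (made-up numbers).
b0_parent_draw = np.array([10.0, 20.0, 30.0, 40.0])

# Fancy indexing: the result has one entry per manufacturer, each copied
# from the slot belonging to that manufacturer's parent.
mu_per_mfr = b0_parent_draw[mfr_parent_map]
print(mu_per_mfr)       # [30. 30. 10. 40. 40. 20.]
```

The result has length n_mfr even though b0_parent_draw has length n_parent, which is why it lines up with shape=n_mfr in the b0_mfr definition. Manufacturers that share a parent (the two fiat rows here) get the same mu.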