Hi all,
I’m looking through a notebook that implements a number of different model types in both PyMC3 and Stan, link here:
I am hoping someone can help me understand a particular section.
In the Hierarchical Model of Parent and Manufacturer section, the PyMC3 model defines a variable like this:
b0 = pm.Normal('b0_mfr', mu=b0_parent[mfr_parent_map], sd=b0_mfr_sd, shape=n_mfr)
What I’m particularly confused about is mu=b0_parent[mfr_parent_map], specifically the reference in square brackets.
b0_parent is a Normal distribution defined earlier in the model, which all makes sense to me:
b0_parent = pm.Normal('b0_parent', mu=b0_parent_mn, sd=b0_parent_sd, shape=n_parent)
But I don’t understand how the [mfr_parent_map] reference comes into play. This map is an array/list of label-encoded values that maps each car manufacturer to its parent company.
I peeled the code out and remade the array without encoding to post here:
['fiat', 'fiat', 'aston martin lagonda', 'volkswagen', 'volkswagen',
'bmw', 'fiat', 'peugeot-citroen', 'peugeot-citroen', 'fiat',
'fiat', 'ford', 'honda', 'hyundai', 'tata', 'hyundai',
'volkswagen', 'toyota', 'mazda', 'mclaren', 'daimler-ag', 'bmw',
'mitsubishi', 'renault-nissan', 'peugeot-citroen', 'volkswagen',
'renault-nissan', 'bmw', 'volkswagen', 'volkswagen', 'daimler-ag',
'shanghai-auto', 'subaru', 'suzuki', 'toyota', 'gm', 'volkswagen',
'geely']
The array is created from the table below (one row per manufacturer, in idx order).
| idx | mfr_enc | parent_enc | cnt |
|---|---|---|---|
| 0 | abarth | fiat | 12 |
| 1 | alfa romeo | fiat | 10 |
| 2 | aston martin lagonda | aston martin lagonda | 6 |
| 3 | audi | volkswagen | 339 |
| 4 | bentley motors | volkswagen | 18 |
| 5 | bmw | bmw | 539 |
| 6 | chrysler jeep | fiat | 29 |
| 7 | citroen | peugeot-citroen | 93 |
| 8 | ds | peugeot-citroen | 28 |
| 9 | ferrari | fiat | 2 |
| 10 | fiat | fiat | 27 |
| 11 | ford | ford | 218 |
| 12 | honda | honda | 13 |
| 13 | hyundai | hyundai | 39 |
| 14 | jaguar | tata | 27 |
| 15 | kia | hyundai | 28 |
| 16 | lamborghini | volkswagen | 2 |
| 17 | lexus | toyota | 5 |
| 18 | mazda | mazda | 31 |
| 19 | mclaren | mclaren | 11 |
| 20 | mercedes-benz | daimler-ag | 315 |
| 21 | mini | bmw | 147 |
| 22 | mitsubishi | mitsubishi | 4 |
| 23 | nissan | renault-nissan | 4 |
| 24 | peugeot | peugeot-citroen | 94 |
| 25 | porsche | volkswagen | 67 |
| 26 | renault | renault-nissan | 12 |
| 27 | rolls royce | bmw | 7 |
| 28 | seat | volkswagen | 63 |
| 29 | skoda | volkswagen | 46 |
| 30 | smart | daimler-ag | 5 |
| 31 | ssangyong | shanghai-auto | 5 |
| 32 | subaru | subaru | 21 |
| 33 | suzuki | suzuki | 21 |
| 34 | toyota | toyota | 27 |
| 35 | vauxhall | gm | 73 |
| 36 | volkswagen | volkswagen | 107 |
| 37 | volvo | geely | 95 |
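For reference, here is a rough sketch of how I'd expect the integer-encoded version of the map to look. It uses only the first six rows of the table, and `np.unique` is my own stand-in for whatever label encoder the notebook actually uses, so the specific integer codes are an assumption.

```python
import numpy as np

# Parent names for the first six manufacturers in the table (idx 0-5).
parents = np.array(['fiat', 'fiat', 'aston martin lagonda',
                    'volkswagen', 'volkswagen', 'bmw'])

# np.unique returns the sorted unique names plus, for each input entry,
# the position of its name in that sorted list (an integer code).
parent_names, mfr_parent_map = np.unique(parents, return_inverse=True)

n_parent = len(parent_names)   # number of distinct parents in this subset
n_mfr = len(mfr_parent_map)    # number of manufacturers in this subset

print(parent_names)    # sorted unique parent names
print(mfr_parent_map)  # one integer parent code per manufacturer
```

So the encoded map has one entry per manufacturer (length n_mfr), and each entry is an integer between 0 and n_parent - 1.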
I just can’t understand how it’s being used in the context of this model. Is it basically creating a b0 hyperprior for each parent company, as per the shape parameter, and then the square-bracket reference picks out the relevant parent’s b0_parent value as the mu for each manufacturer?
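To make my reading concrete, here is a small NumPy sketch of what I think the square brackets are doing. The numbers are made up (3 parents, 5 manufacturers), and `parent_means` is just a plain array standing in for b0_parent; my assumption is that indexing the PyMC3 variable with an integer array behaves the same way as it does in NumPy.

```python
import numpy as np

# Hypothetical toy values: one intercept per parent company.
parent_means = np.array([10.0, 20.0, 30.0])   # stands in for b0_parent, shape (n_parent,)

# Integer code of the parent for each manufacturer.
mfr_parent_map = np.array([0, 0, 1, 2, 2])    # shape (n_mfr,)

# Indexing with an integer array selects one parent value per manufacturer,
# so the result has one entry per manufacturer, not per parent.
mu_per_mfr = parent_means[mfr_parent_map]

print(mu_per_mfr.shape)  # (5,)
print(mu_per_mfr)        # [10. 10. 20. 30. 30.]
```

If that is right, then mu in b0_mfr ends up with length n_mfr, which matches shape=n_mfr, and manufacturers that share a parent share the same prior mean.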