Out of Sample Predictions on heirarchical regression model (PyMC - v5.0.1)

I’d add a value to coords for the feature, then make a single beta with everything bundled together. Example code:

import pymc as pm
import numpy as np
import pandas as pd
from string import ascii_letters

n_obs = 100
n_features = 20
n_groups = 5

X = np.random.normal(size=(n_obs, n_features))
groups = np.random.randint(0, n_groups, size=n_obs)
df = pd.DataFrame(X, columns = list(ascii_letters[:n_features]))

coords = {
   'features':df.columns,
   'groups': np.arange(n_groups)
}

with pm.Model(coords=coords) as mod:
   X = pm.MutableData('X', df)
   alpha = pm.Normal('alpha', dims=['groups'])
   betas = pm.Normal('beta', dims=['groups', 'features'])
   
   mu = alpha[groups] + (betas[groups] * X).sum(axis=-1)

betas is shape (groups, features) and gets blown up to (obs, features) when you index it with groups, so (betas[groups] * X).sum(axis=-1) is the same as you have, just compact.

For indexing X you have two choices:

  1. (my recommended option) Make a function that creates and returns your model, and takes the dataframe and a list of columns as inputs. Use this list of columns to build the coords and slice your data inside the pm.MutableData constructor, so you don’t have to slice X anywhere else down-stream in the model.

  2. If you really want to pass the entire dataframe (although you only use a subset of columns), you can slice by index number rather than index name. Make a list of features you want included in the regression, convert them to numerical indices for your dataframe, then slice betas and X accordingly. Example:

    feature_list_1 = list('afj')
    feature_list_2 = list('bcdehijkl')
    
    feature_idx_1 = [i for i, x in enumerate(df) if x in feature_list_1]
    feature_idx_2 = [i for i, x in enumerate(df) if x in feature_list_2]

    mu1 = alpha[groups] + (betas[groups, :][:, feature_idx_1] * X[:, feature_idx_1]).sum(axis=-1)
    mu2 = alpha[groups] + (betas[groups, :][:, feature_idx_2] * X[:, feature_idx_2]).sum(axis=-1)

If anyone knows how to avoid the awkward double-indexing on betas let me know. If you try to put them both in the same slice it will trigger fancy indexing, which isn’t what you want in this case.

1 Like