PYMC3: using GLM with many features


#1

Hi everyone,
I have an issue when applying the standard GLM formula with many features.
Is there any way to write Y ~ X1 + X2 + … + Xn (with n very large) without typing out every + term?
I would love to use a more compact syntax, such as X1:Xn.

This is a small example of what I am basically trying to do.

import pymc3 as pm

niter = 10000
with pm.Model() as model0:
    # df holds the 'target' column and the feature columns
    pm.glm.GLM.from_formula('target ~ feat1 + feat2 + feat3', df, family=pm.glm.families.Binomial())
    trace0 = pm.sample(niter, step=pm.Metropolis(), random_seed=123, progressbar=True)

Versions and main components

  • Python Version: 3.5
  • Operating system: Windows 10
  • How did you install PyMC3: (conda/pip) pip

#2

Hi Andrea,

It would probably be easiest to write the regression out manually instead:

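inputs = df.drop('target', axis='columns').values  # shape (n_samples, n_features)
outputs = df['target'].values
with pm.Model():
    intercept = pm.Normal('intercept', mu=0, sd=1)
    # one coefficient per feature column (the target column is excluded)
    coeffs = pm.Normal('coeffs', mu=0, sd=1, shape=inputs.shape[1])
    # the inverse logit maps the linear predictor to a probability in (0, 1)
    p = pm.math.invlogit(intercept + pm.math.dot(inputs, coeffs))
    pm.Binomial('obs', n=1, p=p, observed=outputs)  # n=1 assumes a 0/1 target
    trace = pm.sample()  # default sampler (NUTS); should not use the Metropolis sampler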
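
A note on the link function, as a minimal sketch using only the standard library: the value passed as p has to lie in (0, 1), which is what the inverse logit (sigmoid) of the linear predictor gives, whereas the logit goes the other way, from a probability to the real line. The helper names and the numbers below are made up for illustration and are not PyMC3 functions.

```python
import math

def invlogit(x):
    """Inverse logit (sigmoid): maps any real number to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def logit(p):
    """Logit: maps a probability in (0, 1) to the real line."""
    return math.log(p / (1.0 - p))

# Hypothetical linear predictor values (intercept + dot(inputs, coeffs)).
linear_predictor = [-3.0, 0.0, 2.5]

# Every value ends up strictly between 0 and 1, so it is a valid probability.
probs = [invlogit(x) for x in linear_predictor]
```

Applying logit to a linear predictor directly would fail for values outside (0, 1), which is why the inverse is the one needed for p.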

#3

Alternatively, you could create the string yourself:

'target ~ ' + ''.join([feat + ' + ' for feat in df.drop('target', axis=1)])[:-3]


#4

Dear Thomas,
thanks for the prompt and very useful reply. Just a question: wouldn’t the proposed solution leave an extra + operator at the end of the feature list in the formula passed to pm.glm.GLM.from_formula('target ~ feat1+feat2 + feat3', df, family=pm.glm.families.Binomial())? Moreover, you put the [:-3] at the end, but wouldn’t it create an empty list?
Thanks for the support.


#5

I had the bracketing wrong (updated now). The [:-3] is applied to the joined string, so it slices off the trailing ' + ' (three characters) rather than producing an empty list.
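
For what it's worth, the same string can probably be built without any slicing by letting str.join put the ' + ' only between the names. A small sketch, assuming the column names are available as a plain list (with a DataFrame that would be something like list(df.columns)); build_formula is a hypothetical helper, not part of PyMC3 or patsy, and the column names are made up.

```python
def build_formula(target, columns):
    """Return a formula string 'target ~ a + b + ...' from a list of column names."""
    features = [col for col in columns if col != target]
    return target + ' ~ ' + ' + '.join(features)

columns = ['target', 'feat1', 'feat2', 'feat3']  # made-up column names
formula = build_formula('target', columns)

# The slicing variant from this thread: append ' + ' after every feature,
# then cut the last three characters (the trailing ' + ') off the joined string.
sliced = 'target ~ ' + ''.join([feat + ' + ' for feat in columns[1:]])[:-3]
```

Both variants give the same string here, which can then be passed to pm.glm.GLM.from_formula in place of the hand-written formula. Column names that are not valid Python identifiers would likely need quoting in the formula.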


#6

Thanks again! Appreciated this a lot :wink: