Building the linear equation of a logistic regression model

It was mentioned in the post below that I am building the linear equation part of the model inefficiently: Access probability p parameter from pm.BetaBinomial distribution during sample_ppc

The post above contains the code used to construct the linear equation part, but I’ll also provide it below:

with pm.Model() as model:        
    intercept = pm.Normal('intercept', mu=0.0, sd=5, shape=1)

    link_argument = intercept

    for covariate in covariates:
        if covariates[covariate]['type'] == 'categorical':
            shape = covariates[covariate]['encoder'].classes_.size
        elif covariates[covariate]['type'] in ['metric', 'binary']:
            shape = 1

        sigma = pm.HalfCauchy(f'{covariate}_coeff_sigma', beta=5)
        offset = pm.Normal(f'{covariate}_coeff_offset', mu=0, sd=1, shape=shape)
        coeff = pm.Deterministic(f'{covariate}_coeff', 0.0 + offset * sigma)

        if shape > 1:
            link_argument += coeff[model_variables[covariate]]
        else:
            link_argument += coeff * model_variables[covariate]

    omega = pm.Deterministic('omega', pm.invlogit(link_argument))
    kappa = pm.Exponential('kappa', lam=1e-4)

    alpha = pm.Deterministic('alpha', omega * kappa + 1)     
    beta = pm.Deterministic('beta', (1 - omega) * kappa + 1)     
    likelihood = pm.BetaBinomial(
        'likelihood', alpha=alpha, beta=beta, n=model_variables['n'], observed=model_variables['y_obs']
    )

I have two questions as I start to productionize this model:

1.) What is a more efficient way to build this model than the for loop, one that still takes the categorical variables into account?

2.) Does the inefficient construction of the model affect the Theano calculations that take place when sampling the posterior distribution?

If the inefficiency only affects building the model itself, it is less of a concern to me.

Thanks for the help!

As a pymc3 n00b, I can’t answer all your questions, but for question 1:
You could create offset (and sigma) without a loop as:
offset = pm.Normal('offset', mu=0, sd=1, shape=n_covariates)
sigma = pm.HalfCauchy('sigma', beta=5, shape=n_covariates)
and then take the dot product with your data:
pm.math.dot(offset * sigma, model_variables)

You would have to create a column of all ones in your data for the intercept.
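A minimal sketch of that idea, assuming the covariates have already been stacked into a NumPy matrix X with a leading column of ones for the intercept (the dummy data and names here are only illustrative):

import numpy as np
import pymc3 as pm

# Dummy data: 100 observations, an intercept column plus 5 covariate columns
X = np.column_stack([np.ones(100), np.random.randn(100, 5)])
n_covariates = X.shape[1]

with pm.Model() as vectorized_model:
    sigma = pm.HalfCauchy('coeff_sigma', beta=5, shape=n_covariates)
    offset = pm.Normal('coeff_offset', mu=0, sd=1, shape=n_covariates)
    coeff = pm.Deterministic('coeff', offset * sigma)  # non-centered coefficients

    # One matrix-vector product replaces the per-covariate for loop
    link_argument = pm.math.dot(X, coeff)
    omega = pm.Deterministic('omega', pm.math.invlogit(link_argument))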

Not sure on #2, but why don’t you test it? :wink:

This actually does not handle the categorical variables properly: it would assign one coefficient to the whole categorical covariate instead of a separate coefficient for each category, if I’m reading it correctly.

Ah yeah, that’s fair. I guess you can dummy code your categorical variables then.
The coding can be done outside the model block as a data preprocessing step to keep the model code a bit cleaner.
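For example, the dummy coding could look something like this as a preprocessing step (the data and column names here are made up):

import pandas as pd

# Hypothetical raw data: one categorical and one metric covariate
df = pd.DataFrame({
    'machine': ['a', 'b', 'c', 'a', 'b'],  # categorical
    'load': [0.1, 0.5, 0.3, 0.9, 0.2],     # metric
})

# One indicator column per category, plus the metric column and an intercept column
design = pd.get_dummies(df, columns=['machine'])
design.insert(0, 'intercept', 1.0)
X = design.values.astype('float64')  # shape (n_obs, n_columns), ready for pm.math.dot(X, coeff)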

TBH, I don’t think you are likely to get any speedup from a dot product for a model as small as this one, compared to the way you have it now.

For larger models (and generally for pymc3 model specification), matrix multiplication > for loops.

For more complicated models where you think you might need a for loop, especially when porting from other MCMC packages, this notebook in the docs has been helpful: https://docs.pymc.io/notebooks/PyMC3_tips_and_heuristic.html

@junpenglao, do you have any suggestion on how to proceed?

I tried vectorizing the above operations into a dot product of the design matrix and coefficient vectors, but that actually resulted in longer computation time, I think because it blew up the graph size.

Here is how I vectorized:

def construct_model(model_parameters):
    with pm.Model() as model:
        # The bounds of this distribution correspond to machine 0 and machine 1 on logit scale.
        # This means that the baseline number is bounded on [0, 1].
        intercept = pm.Normal('intercept', mu=0, sd=5.0, shape=1)

        link_argument = intercept

        design_matrix = []
        coefficients = []

        for parameter in model_parameters:
            model_label = model_parameters[parameter].get('model_label', parameter)

            tensor = model_parameters[parameter]['shared']

            if model_parameters[parameter]['parameter'] != 'covariate':
                continue

            if model_parameters[parameter]['covariate_type'] == 'categorical':
                shape = model_parameters[parameter]['encoder'].classes_.size

                matrix = []
                for pos in range(shape):
                    # Create a vector that is 1 if equal to the position and 0 otherwise, i.e. (one-hot-encode)
                    vector = tt.switch(tt.eq(tensor, np.array(pos)), np.array(1.0), np.array(0.0))
                    matrix.append(vector)
                matrix = tt.stack(matrix, axis=1)
                print(matrix.eval())
                print(tensor.eval())

            elif model_parameters[parameter]['covariate_type'] in ['metric', 'binary']:
                shape = 1

                # This turns an array into a vector of dimension Nx1.
                matrix = tensor.dimshuffle(0, 'x')

            logger.debug(f'Using parameter {parameter} to create coefficient {model_label}')

            sigma = pm.HalfNormal(f'{model_label}_coeff_sigma', sd=1.0)
            offset = pm.Normal(f'{model_label}_coeff_offset', mu=0, sd=1.0, shape=shape)
            coeff = pm.Deterministic(f'{model_label}_coeff', 0.0 + offset * sigma)

            design_matrix.append(matrix)
            coefficients.append(coeff.dimshuffle(0, 'x'))

        design_matrix = tt.concatenate(design_matrix, axis=1)
        coefficients = tt.concatenate(coefficients, axis=0)

        # omega parameter is "mode" of probability p and kappa parameter is "certainty"
        omega = pm.Deterministic('omega', pm.invlogit(tt.dot(design_matrix, coefficients)))
        offset = pm.Exponential('kappa_offset', lam=1, shape=1)
        kappa = pm.Deterministic('kappa', (1 / 1e-3) * offset)

        alpha = omega * kappa + 1
        beta = (1 - omega) * kappa + 1

        observed = model_parameters['impressions_target']['shared']
        n = model_parameters['impressions_anchor']['shared']

        likelihood = pm.BetaBinomial('likelihood', alpha=alpha, beta=beta, n=n, observed=observed)

    return model

For context, the original way of building the model resulted in gradient calculations of 3.97 ms ± 116 µs per loop. With the version just posted, it took ~6 s per loop.

Is 3.97 ms per loop too slow, or is that a decently fast calculation speed?
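(As a sketch of how timings like these can be gathered: PyMC3’s Model.profile helper can profile the log-posterior and its gradient, which is what NUTS evaluates at every leapfrog step. Here model refers to the model object built above.)

from pymc3.theanof import gradient

# Profile the log-posterior and its gradient through Theano's profiler
model.profile(model.logpt).summary()
model.profile(gradient(model.logpt, model.vars)).summary()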

Hmm, you are not really vectorizing it; you still have a for loop, which is likely quite inefficient.
For a vectorized parameterization, you have something like a predictor matrix X and you do X.dot(beta).
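Spelled out, that looks something like the following sketch (X is a made-up predictor matrix with the intercept column already included; the priors just mirror the earlier code):

import numpy as np
import pymc3 as pm

# Made-up predictor matrix: a leading column of ones for the intercept plus 5 covariate columns
X = np.column_stack([np.ones(100), np.random.randn(100, 5)])

with pm.Model() as flat_model:
    beta = pm.Normal('beta', mu=0., sd=1., shape=X.shape[1])
    eta = pm.math.dot(X, beta)  # the whole linear predictor in one matrix-vector product
    omega = pm.Deterministic('omega', pm.math.invlogit(eta))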

Got it. I’m on it! Thank you.

OK, so building the design matrix with numpy before the model context and then wrapping it in a shared tensor is giving me a similar ~6 s per loop. I am still building the coefficients the same way, since it’s not clear how else to do it.

Also, is it possible to get below that 3.97 ms per loop with vectorization, or should I pursue other routes?

The long and short of it is that training takes roughly 15 minutes, and I would like to reduce that time as much as possible.

You should do something like beta = pm.Normal('beta', mu=0., sd=1., shape=(X.shape[1], 1))

But wouldn’t that fail to capture the hierarchical nature of the model?

You still can; the “trick” is all in the construction of the fixed- and random-effects design matrices.
You might find e.g. https://github.com/junpenglao/GLMM-in-Python/blob/master/Playground.py helpful.
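One way to keep the per-covariate hierarchy while still using a single dot product is to index a small vector of group-level sigmas by a column-to-covariate mapping. A sketch with made-up shapes and names (X, col_to_cov, and the dummy data are illustrative only):

import numpy as np
import pymc3 as pm

X = np.random.randn(100, 7)                    # dummy-coded design matrix (no intercept column)
col_to_cov = np.array([0, 0, 0, 1, 1, 2, 2])   # which original covariate each column came from
n_covariates = col_to_cov.max() + 1

with pm.Model() as hierarchical_model:
    intercept = pm.Normal('intercept', mu=0., sd=5.)

    # one sigma per original covariate, shared by all of its dummy columns
    sigma = pm.HalfCauchy('coeff_sigma', beta=5, shape=n_covariates)
    offset = pm.Normal('coeff_offset', mu=0., sd=1., shape=X.shape[1])
    coeff = pm.Deterministic('coeff', offset * sigma[col_to_cov])

    link_argument = intercept + pm.math.dot(X, coeff)
    omega = pm.Deterministic('omega', pm.math.invlogit(link_argument))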

So, even without the hierarchical effect, building a vector of coefficients and multiplying by the design matrix is still not as fast as the initial version.

def construct_model(model_parameters):
    covariates = [parameter for parameter in model_parameters if model_parameters[parameter]['parameter'] == 'covariate']
    design_matrix = np.column_stack([model_parameters[covariate]['array'] for covariate in covariates])

    shape = design_matrix.shape[1]
    design_matrix = shared(design_matrix)

    with pm.Model() as model:
        # The bounds of this distribution correspond to machine 0 and machine 1 on logit scale.
        # This means that the baseline number is bounded on [0, 1].
        intercept = pm.Normal('intercept', mu=0, sd=5.0, shape=1)

        coefficients = pm.Normal('beta', mu=0., sd=1., shape=(shape, 1))

        # omega parameter is "mode" of probability p and kappa parameter is "certainty"
        omega = pm.Deterministic('omega', pm.invlogit(intercept + design_matrix.dot(coefficients)))
        offset = pm.Exponential('kappa_offset', lam=1, shape=1)
        kappa = pm.Deterministic('kappa', (1 / 1e-3) * offset)

        alpha = omega * kappa + 1
        beta = (1 - omega) * kappa + 1

        observed = model_parameters['impressions_target']['shared']
        n = model_parameters['impressions_anchor']['shared']

        likelihood = pm.BetaBinomial('likelihood', alpha=alpha, beta=beta, n=n, observed=observed)

    return model

FYI, there are only 6 features, but they expand to 351 covariate columns, i.e. the shape of the design matrix is (5473, 351).