Adding constraints with pm.Potential

Hi,
I’m quite new to PyMC and want to add strict constraints to a linear regression model.

import pymc as pm
import pandas as pd
import numpy as np
import pytensor.tensor as at

if __name__ == "__main__":

    df = pd.read_csv("BLR_lib_GU48-1234.csv", header=None, index_col=False)

    X = df.iloc[:, :-1]
    # Standardize the features
    stds = X.std().values
    X -= X.mean()
    X /= X.std()

    y_obs = df.iloc[:, -1]

    with pm.Model(coords={"predictors": X.columns.values}) as model:
        # Prior on error SD
        sigma = pm.HalfNormal("sigma", sigma=0.1)

        # coefficients
        betas = pm.Normal("beta", sigma=1, dims="predictors")
        expected_value = at.dot(X.values, betas) 

        # constraints
        con_1 = pm.math.ge(betas[0]/stds[0] + betas[1]/stds[1], 0.0)
        con_2 = pm.math.ge(betas[2]/stds[2], 0.0)
        con_3 = pm.math.le(betas[4]/stds[4], 0.0)
        con_4 = pm.math.ge(betas[2]/stds[2] + betas[3]/stds[3] + betas[4]/stds[4], 0.0)
        con_5 = pm.math.ge(betas[5]/stds[5], 0.0)
        con_6 = pm.math.le(betas[8]/stds[8], 0.0)
        con_7 = pm.math.ge(betas[5]/stds[5] + betas[6]/stds[6] + betas[7]/stds[7] + betas[8]/stds[8], 0.0)

        potential_1 = pm.Potential("con_1", pm.math.log(pm.math.switch(con_1, 1, 0.0)))
        potential_2 = pm.Potential("con_2", pm.math.log(pm.math.switch(con_2, 1, 0.0)))
        potential_3 = pm.Potential("con_3", pm.math.log(pm.math.switch(con_3, 1, 0.0)))
        potential_4 = pm.Potential("con_4", pm.math.log(pm.math.switch(con_4, 1, 0.0)))
        potential_5 = pm.Potential("con_5", pm.math.log(pm.math.switch(con_5, 1, 0.0)))
        potential_6 = pm.Potential("con_6", pm.math.log(pm.math.switch(con_6, 1, 0.0)))
        potential_7 = pm.Potential("con_7", pm.math.log(pm.math.switch(con_7, 1, 0.0)))

        y = pm.Normal('y', mu=expected_value, sigma=sigma, observed=y_obs.values)

        start = {"sigma": 1.0, "beta": np.zeros(len(X.columns))}
        idata = pm.sample(draws=2000, tune=1000, chains=4, step=[pm.NUTS(target_accept=0.99, max_treedepth=30)])

The regression is performed on standardized input data, but the constraints refer to the unstandardized coefficients, which is why they are rescaled by the feature standard deviations. When I apply strict constraints via pm.math.log(pm.math.switch(constraint, 1, 0.0)), I get the following error during sampling:

pymc.exceptions.SamplingError: Initial evaluation of model at starting point failed!
Starting values:
{'sigma_log__': array(-2.67328751), 'beta': array([ 0.95591867,  0.33871937,  0.00713485,  0.73745527,  0.74412467,
       -0.82814385,  0.69393661,  0.62493708,  0.95804281, -0.80145671,
       -0.51079383,  0.11153434, -0.66721516])}

The error disappears when I use a small nonzero value instead of 0.0 for the false branch of pm.math.switch. Does anyone know why this happens, and whether the constraints are actually being incorporated into the model?

Welcome!

I cannot run your code because it relies on an external CSV file, but I assume the starting point(s) are running afoul of your constraints. If the sampler cannot find valid points to begin its initialization, it may well give up with the error you’re seeing.
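One quick way to check this (a sketch, assuming the model object built in the code above) is to evaluate the individual logp terms at the starting point; a Potential built from pm.math.log(pm.math.switch(cond, 1, 0.0)) evaluates to -inf as soon as its constraint is violated:

with model:
    # Print the logp contribution of each variable/potential at the initial point;
    # any violated hard constraint shows up here as -inf
    print(model.point_logps(model.initial_point()))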

1 Like

Hard constraints like that are unlikely to play well with NUTS, even if it finds a valid initial point. It’s better if you can softly penalize them instead.

Some of those constraints also seem avoidable altogether by using positive priors.
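For example, here is a rough sketch of encoding sign constraints directly in the priors (the names and values are placeholders, not your variables):

import pymc as pm

with pm.Model() as sketch:
    # A coefficient that must be >= 0: give it a half-normal prior
    beta_pos = pm.HalfNormal("beta_pos", sigma=1)

    # A coefficient that must be <= 0: truncated normal with an upper bound at 0
    beta_neg = pm.TruncatedNormal("beta_neg", mu=0, sigma=1, upper=0)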

2 Likes

A ZeroSumNormal prior might be useful, combined with something like pm.Potential('positive_penalty', -(betas[4] + betas[8])) to penalize draws with positive values in the positions of the beta vector that you want to be negative.

I don’t think dividing by the standard deviation is relevant here, because it is always strictly positive and therefore doesn’t change the sign of any constraint.
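Roughly what I have in mind, as a sketch only (this reuses X from your model, and the linear penalty is an assumption whose form and strength you would need to tune):

with pm.Model(coords={"predictors": X.columns.values}) as model:
    # Coefficients constrained to sum to zero across the predictors
    betas = pm.ZeroSumNormal("beta", sigma=1, dims="predictors")

    # Soft penalty added to the model logp: it rewards draws where
    # betas[4] + betas[8] is negative and penalizes draws where it is positive
    pm.Potential("positive_penalty", -(betas[4] + betas[8]))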

2 Likes

Thank you for all of your advice!

I have tried using the initvals argument of pm.sample and the initval argument of pm.Normal. Neither seems to work. When I check the initial point with model.initial_point() before sampling, it shows the initial values I set, but once pm.sample() starts, other values are used. What could be the problem here?
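For reference, this is roughly what I tried (the values are placeholders):

# Passing initial values to the sampler ...
idata = pm.sample(draws=2000, tune=1000, chains=4,
                  initvals={"sigma": 1.0, "beta": np.zeros(len(X.columns))})

# ... or setting them on the prior itself
betas = pm.Normal("beta", sigma=1, dims="predictors", initval=np.zeros(len(X.columns)))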

I changed some of the priors, which leaves only 3 constraints. Is there a threshold where you would say that using hard constraints is still acceptable with NUTS, or should I just soften the remaining 3? When softening those constraints, how small a value would be sufficient to emulate a hard constraint?

I don’t understand how ZeroSumNormal can be used here; the constraints are all less-than-or-equal / greater-than-or-equal. Can you please explain the basic idea a bit further?
The constraints are given in the unstandardized formulation to ensure numerical stability in the resulting equation.

NUTS will never behave well with hard constraints if the posterior is close to violating them. By soft constraints I mean a continuous (differentiable) function that penalizes the posterior increasingly as it violates the constraints more and more. The shape and strength of this penalty is something you have to investigate, as it depends on the model/data.

The typical example from ML is to penalize very large coefficients with something like sum(beta ** 2) * penalty_multiplier.

In a Bayesian setting, the ideal is to define constraints as part of the prior. But that is sometimes not possible or too cumbersome, in which case a Potential is a nice tool to just throw at the problem.
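As a small sketch of what such a penalty looks like in PyMC (the multiplier is an assumed value you would tune, and this is the generic ridge-style example from above, not your specific constraints):

import pymc as pm

penalty_multiplier = 10.0  # assumed strength, to be tuned against your model/data

with pm.Model() as sketch:
    beta = pm.Normal("beta", sigma=1, shape=5)

    # Continuous, differentiable penalty added to the model logp:
    # large coefficients lower the logp smoothly instead of hitting a hard wall
    pm.Potential("ridge_penalty", -pm.math.sum(beta**2) * penalty_multiplier)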

1 Like

So for example con_1 = pm.math.ge(betas[0]/stds[0] + betas[1]/stds[1], 0.0) could be penalized with pm.Potential("con_1", pm.math.switch(con_1, 0, -((betas[0]/stds[0] + betas[1]/stds[1])**2) * penalty_multiplier)) instead of a fixed penalty as done beforehand.

Something like that, but probably the negative of the expression, as the Potential is added to the model logp. You want the logp to be lower when the constraint is violated more.

1 Like

Isn’t the negative sign in the switch of pm.Potential("con_1", pm.math.switch(con_1, 0, -((betas[0]/stds[0] + betas[1]/stds[1])**2) * penalty_multiplier)) enough, or am I missing something?

My logic with ZeroSumNormal was that you want some of the coefficients to be negative and some to be positive. If they all sum to zero, satisfying one set of constraints (say the negative ones) would automatically push the other parameter values towards satisfying the remaining constraints.

1 Like

Sorry, I didn’t see the minus sign before. That should be enough.

1 Like