Square decision region model specification - initial evaluation failed

I’m trying to specify a model with a roughly square decision region: inside the square the outcome is true, outside it is false. My approach was to glue two invlogit functions together in each dimension, one as a decision “turn on” and one as a “turn off.” I managed to build this with a Theano switch:

import numpy as np
import pymc3 as pm
import theano.tensor as tt

x = np.linspace(-5, 5, 2000)
change1 = -2        # "turn on" change point
change2 = 2         # "turn off" change point
turnon_coeff = 5
turnoff_coeff = 5

# Rising sigmoid left of the midpoint, falling sigmoid right of it
dist = tt.switch(x < change1 + (change2 - change1)/2, 
                 pm.invlogit(turnon_coeff*(x - change1)), 
                 1 - pm.invlogit(turnoff_coeff*(x - change2))
                ).eval()

[image: plot of dist, near 0 outside (-2, 2) and near 1 inside]
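For reference, the same plateau can be reproduced in plain NumPy (a sketch with my own `expit` helper standing in for `pm.invlogit`; the function name is mine):

```python
import numpy as np

def expit(v):
    """Logistic function, equivalent to pm.invlogit."""
    return 1.0 / (1.0 + np.exp(-v))

def plateau(x, change1=-2.0, change2=2.0, k_on=5.0, k_off=5.0):
    """Rising sigmoid left of the midpoint, falling sigmoid right of it."""
    mid = change1 + (change2 - change1) / 2
    return np.where(x < mid,
                    expit(k_on * (x - change1)),
                    1 - expit(k_off * (x - change2)))

x = np.linspace(-5, 5, 2000)
y = plateau(x)  # close to 1 inside (-2, 2), close to 0 at the edges
```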

Now I’d like to model this and infer the change points and the slopes of the invlogits, but I keep getting a failure at the initial evaluation of the model, and I can’t locate what’s misspecified.

The model looks like the following:

sigmoid_model = pm.Model()

with sigmoid_model:
    
    # Data
    loc_x = pm.Data("loc_x", df["loc_x"])
    loc_z = pm.Data("loc_z", df["loc_z"])
    outcome = pm.Data("outcome", df["isTrue"])
    
    # priors
    left_side = pm.Normal("left_boundary", -1, 0.75)
    right_side = pm.Normal("right_boundary", 1, 0.75)
    top_side = pm.Normal("top_boundary", 3.5, 1.5 )
    bottom_side = pm.Lognormal("bottom_boundary", 1.5, 1.5 ) # can't be negative

    # My only prior here is that I think it's *fairly* steep and positive, but not anything strong
    turnon_coeff_horiz = pm.Lognormal("turnon_coeff_x", 5, 5)
    turnoff_coeff_horiz = pm.Lognormal("turnoff_coeff_x", 5, 5)

    turnon_coeff_vert = pm.Lognormal("turnon_coeff_z", 5, 4)
    turnoff_coeff_vert = pm.Lognormal("turnoff_coeff_z", 5, 4)

    horiz_p = pm.Deterministic("horiz_p",
                               tt.switch(
                                   loc_x < left_side + (right_side - left_side)/2, 
                                   pm.invlogit(turnon_coeff_horiz*(loc_x - left_side)), 
                                   1-pm.invlogit(turnoff_coeff_horiz*(loc_x - right_side))
                               )
                              ) 

    vert_p = pm.Deterministic("vert_p",
                               tt.switch(
                                   loc_z < bottom_side + (top_side - bottom_side)/2, 
                                   pm.invlogit(turnon_coeff_vert*(loc_z - bottom_side)), 
                                   1-pm.invlogit(turnoff_coeff_vert*(loc_z - top_side))
                               )
                             )

    
    # Making the probability the sum of the probabilities, however clipping at 0.99
    # to cover random judgement mistakes inside/outside the decision region
    p_add = pm.Deterministic("p_add" , pm.math.clip(horiz_p + vert_p, 0.01, 0.99))
    
    
    yl = pm.Bernoulli('isTrue', p=p_add, observed=outcome)
    trace = pm.sample(1000, init="advi", n_init=10000)

The error it gives me is the following:

SamplingError: Initial evaluation of model at starting point failed!
Starting values:
{'left_boundary': array(-1.), 'right_boundary': array(1.), 'top_boundary': array(3.5), 'bottom_boundary_log__': array(1.5), 'turnon_coeff_x_log__': array(5.), 'turnoff_coeff_x_log__': array(5.), 'turnon_coeff_z_log__': array(5.), 'turnoff_coeff_z_log__': array(5.)}

Initial evaluation results:
left_boundary           -0.63
right_boundary          -0.63
top_boundary            -1.32
bottom_boundary_log__   -1.32
turnon_coeff_x_log__    -2.53
turnoff_coeff_x_log__   -2.53
turnon_coeff_z_log__    -2.31
turnoff_coeff_z_log__   -2.31
isTrue                   -inf
Name: Log-probability of test_point, dtype: float64

Any suggestions here? Is something blatantly wrong with the model? (Or is there a less silly way to go about this problem?)

Just wanted to bump this to see if anyone has thoughts or ideas; I haven’t made any headway solving it.

Have you tried a hard square decision region first, with just the two change points per dimension and a p of 0.99 inside and 0.01 outside? Two switches should do it. From there you could add one sigmoid at a time and hopefully see what breaks in the full model.
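In NumPy terms the hard version is just an indicator over the rectangle (a sketch; the function name and default cutoffs are made up to match the priors above):

```python
import numpy as np

def square_p(x, z, x_lo=-1.0, x_hi=1.0, z_lo=1.5, z_hi=3.5,
             p_in=0.99, p_out=0.01):
    """Hard rectangular decision region: p_in inside, p_out outside."""
    inside = (x > x_lo) & (x < x_hi) & (z > z_lo) & (z < z_hi)
    return np.where(inside, p_in, p_out)

print(square_p(0.0, 2.5))  # a point inside the box
print(square_p(3.0, 2.5))  # a point outside
```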


Thanks, this at least gets me started. A straightforward switch-statement decision boundary at least doesn’t crash. I’m getting divergences, but that’s something to work from.

About the divergences, are you making sure the left cutoff is always below the right one? You can use an ordered transform to enforce that.
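For intuition, the ordered transform samples the first cutoff freely and the gaps between cutoffs on the log scale, so ordering holds by construction. A minimal sketch of the backward map (my own version, not PyMC’s exact code):

```python
import numpy as np

def ordered_backward(free):
    """Map unconstrained values to a strictly increasing vector:
    out[0] = free[0], out[i] = out[i-1] + exp(free[i])."""
    free = np.asarray(free, dtype=float)
    out = np.empty_like(free)
    out[0] = free[0]
    out[1:] = free[0] + np.cumsum(np.exp(free[1:]))
    return out

print(ordered_backward([1.0, -2.0]))  # second element sits just above the first
```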

Good call. I tried a couple of things to no avail, following examples in the Martin text.

The first used the ordered transform, as you suggested; I was a bit less familiar with this:

breakpoint_model = pm.Model()

with breakpoint_model:
    # Data
    loc_x = pm.Data("plate_x", right_full["plate_x"])
    loc_z = pm.Data("plate_z", right_full["plate_z"])
    outcome = pm.Data("outcome", right_full["isStrike"])
    
    horiz_bounds = pm.Normal(
        "horiz_bounds",
        mu=[-1.0, 1.0],
        sigma=1.0,
        shape=2,
        transform=pm.distributions.transforms.ordered        
    )
    vert_bounds = pm.Lognormal(
        "vert_bounds",
        mu=[1.5, 3.5],
        sigma=1.5,
        shape=2,
        transform=pm.distributions.transforms.ordered        
    )
    
    db = pm.Deterministic(
        "decision",
        tt.switch( 
            (loc_x > horiz_bounds[0]) & (loc_x < horiz_bounds[1])
            & (loc_z > vert_bounds[0]) & (loc_z < vert_bounds[1]),
            0.97,
            0.03
        )
    )
    
    yl = pm.Bernoulli('strike', p=db, observed=outcome)
    trace = pm.sample(500, init="advi", n_init=10000)

Which gives the NaN error in initialization.

FloatingPointError: NaN occurred in optimization. 
The current approximation of RV `vert_bounds_ordered__`.ravel()[0] is NaN.

The other try was to use a pm.Potential following one of the examples in the text:

breakpoint_model = pm.Model()

with breakpoint_model:
    # Data
    loc_x = pm.Data("plate_x", right_full["plate_x"])
    loc_z = pm.Data("plate_z", right_full["plate_z"])
    outcome = pm.Data("outcome", right_full["isStrike"])
    

    left_side = pm.Normal("left_boundary", -1.0, 1)
    right_side = pm.Normal("right_boundary", 0.8, 1)
    top_side = pm.Lognormal("top_boundary", 3.5, 1.5 )
    bottom_side = pm.Lognormal("bottom_boundary", 1.5, 2 ) # can't be negative

    db = pm.Deterministic(
        "decision",
        tt.switch( 
            (loc_x > left_side)  & (loc_x < right_side) & (loc_z > bottom_side)  & (loc_z < top_side),
            0.97,
            0.03
        )
    )

    # Force boundaries to be ordered
    order_vert = pm.Potential('order_vert',
                                  tt.switch(top_side - bottom_side < 0,
                                  -np.inf, 0)
                                 )
    order_horiz = pm.Potential('order_horiz',
                                  tt.switch(right_side - left_side < 0,
                                  -np.inf, 0)
                                 )
    
    
    yl = pm.Bernoulli('strike', p=db, observed=outcome)
    trace = pm.sample(500, init="advi", n_init=10000) 

Which samples, but still has divergences. I’ve only had a little time to play with it, so I’ll do some more digging and see what I can figure out (no need to burn brainspace before I spin my wheels on it for a while, unless you’re super bored).
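One thing worth noting about the hard-switch version: the likelihood is piecewise constant in the boundary parameters, so NUTS sees a zero gradient almost everywhere, which plausibly contributes to the divergences. A finite-difference sketch (data points and boundary value are made up):

```python
import numpy as np

def hard_loglik(left, x=np.array([0.5, -0.3]), y=np.array([1, 1])):
    """Bernoulli log-likelihood with a hard left boundary at `left`."""
    p = np.where(x > left, 0.97, 0.03)
    return float(np.sum(y * np.log(p) + (1 - y) * np.log(1 - p)))

eps = 1e-4
grad = (hard_loglik(-1.0 + eps) - hard_loglik(-1.0 - eps)) / (2 * eps)
print(grad)  # 0.0: nudging the boundary doesn't change the likelihood
```

Raising `target_accept` in `pm.sample` may reduce divergences somewhat, but the flat gradient is the structural issue; the smooth sigmoid version restores gradient information at the boundaries.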

Also, just for reference, an image of the data:

[image: scatter plot of the data]