How to model/deal with weighted binary outcomes


I have a binary target variable that I try to predict with using a Bernoulli likelihood and a Sigmoid function for the p of the likelihood like logistic regression. I am only interested in the probability of success.
I have to say I am also quite new to Bayesian modelling and PyMC.
I would like to somehow weight the observed data points. I googled a lot and found the following not really satisfying results

  1. If I had a Normal likelihood incorporating weights into the model is rather easy by modifying the sd of the likelihood. I have a Bernoulli likelihood. Is it maybe somehow possible to “convert” the problem into one that uses a likelihood with a variance component that I could modify with the weights?
  2. Following this idea I could weight the logp values. How could I do this with PyMC3? I’m using Minibatches if this is important. I thought about wrapping the Bernoulli likelihood and modifying the logp value. Could this work? How would I get the correct index for the weights then?
  3. Upsampling the data by weights. I would prefer to not do this.
  4. Are there any other ways to weight binary outcomes, maybe some papers?


1 Like

I would go with option 2 - you can add a penalty to the model logp using pm.Potential. It should be a vector the same size as the observed. It is important to get it right if you are using minibatch - you can try setting a theano.shared or minibatch penalty to make sure it is updated each batch the same way as the observed changed.

1 Like

Hi, pm.Potential seems really handy to do the weighting. It works fine with using minibatches, as long as I use only one likelihood in my model. To be more specific the likelihoods all need to have observed data with the same row indexes.

Out of curiosity, what are the options to make it work with a model with more than one likelihood where each likelihood has a different set of observed data? Is it possible to just add the potential to one specific likelihood (although by looking into the model code I don’t think so)?
Is it possible in another way?

At the end the logp is summed together, so for multiple observed variable each with its own penalty, you can add multiple pm.Potential.

Ok, please imagine this (pseudocode) setup

data1 = [1,2,3,4,5]

data2_idx = [0,2,4]
data2 = data1[data2_idx]

weights = [-1,-1, 0, 1, 1]
weights2 = weights[data2_idx]

weights_mb = pm.Minibatch(weights, 2)
weights2_idx = pm.Minibatches(weights2, 2)

data1_mb = pm.Minibatch(data1, 2)
data2_mb = pm.Minibatch(data2, 2)

lk1 = pm.Normal('lk1', mu=..., sd=..., observed=data1_mb)
lk2 = pm.Normal('lk2', mu=..., sd=..., observed=data2_mb)

p1 = pm.Potential('weights-for-lk1', weights_mb)
p2 = pm.Potential('weights-for-lk2', weights2_mb)

I think this will go wrong, although I’m not really sure.

The problem I see is that because the minibatches for data1 and data2 are shuffled differently, the can/will end up with different original data rows. For example lk1 might see a value of [1,2] and lk2 might see a value of [1,5]. This leads to potentials of [-1, -1] and [-1, 1], which I think is not what I want. Mixing two different observations with different weights seems to me to lead to problems.

Is it that this special case of mixed observed data and weights is not model able with a Potential?

Do I need to use two different models for this case or is it doable in one?

In Minibatch you can pass a seed to make sure at each iteration it use the same index to get subset of data (that’s how you can make sure X_minibatch and Y_minibatch is the same row in a regression).
But if you are having doubt about the indexing, you can also following the docstring in minibatch to make it explicit:

To be more concrete about how we get minibatch, here is a demo

  1. create shared variable >>> shared = theano.shared(data)

  2. create random slice of size 10 >>> ridx = pm.tt_rng().uniform(size=(10,), low=0, high=data.shape[0]-1e-10).astype(‘int64’)

  3. take that slice >>> minibatch = shared[ridx]

That’s done. Next you can use this minibatch somewhere else. You can see that implementation does not require fixed shape for shared variable. Feel free to use that if needed.

I tried to do what you said here

but cannot get it to work.
Here is a simple example of what I did

def test():
    N = 1000
    N2 = int(N//2)
    # create test data sets
    data = np.ones(N)
    data[N2:] = 3.0

    with pm.Model() as m1:
        mu1 = pm.Normal('mu1', mu=0.0, sd=100.0)
        err = pm.HalfCauchy('err1', beta=5.0)
        w1 = np.zeros(N)
        w1[N2:] = -1.0e10
        p = pm.Potential('p1', shared(w1))
        l1 = pm.Normal('lk1', mu=mu1, sd=err, observed=data)
        advi = pm.ADVI()
        approx =
    return approx

approx = test()
s = approx.sample(1000)

Average Loss = 5e+12: 100%|██████████| 100000/100000 [00:43<00:00, 2300.78it/s]
Finished [100%]: Average Loss = 5e+12
mean	sd	mc_error	hpd_2.5	hpd_97.5
mu1	2.003387	0.034653	0.001194	1.935714	2.069872
err1	0.999572	0.023656	0.000643	0.956357	1.047354

mu1 seems to be not affected by the potential at all.
What am I doing wrong?

The potential is a constant - it will not have any effect on the parameters. You need to express the potential as condition on mu1 to make it has an effect on mu1

1 Like


I just wanted to add the solution I’m using now to give weights to my observed data with a Bernoulli likelihood:

import numpy as np
import pymc3 as pm
import theano.tensor as T

N = 1000
P = 0.2

data = np.zeros(N)
data[:int(N * P)] = 1.0
weights  = np.zeros(N)
weights[:int(N * P)] = 4.2

with pm.Model() as model:
    p_ = pm.Normal('p_',
    p = pm.Deterministic('p', pm.math.invlogit(p_))
    lk = pm.Bernoulli('lk', p=p, observed=data)
                 weights * T.switch(data,
                                    T.log(1.0 - p)))

    trace = pm.sample(10000)
trace['p'].mean()  # 0.5652725372506145
1 Like

Updated solution after this topic How to perform weighted inference

import numpy as np
import pymc3 as pm
import theano.tensor as T

N = 1000
P = 0.2

data = np.zeros(N)
data[:int(N * P)] = 1.0
weights  = np.ones(N)
weights[:int(N * P)] = 4.0

with pm.Model() as model:
    p_ = pm.Normal('p_',
    p = pm.Deterministic('p', pm.math.invlogit(p_))
             weights * pm.Bernoulli.dist(p=p).logp(data))

    trace = pm.sample(10000, cores=1, random_seed=1)
trace['p'].mean()  # 0.50019413

problem with this method is that you cannot sample from posterior predictive distribution…

i.e. if you wanted to perform logistic regression similar to this example:

you cannot do class weighting in order to predict new points from test set…

1 Like

Hey @tomkov , have you found any solution to this shortcoming yet? I am struggling to think of another approach to sample from posterior predictive distribution.