Model not reproducing observed data?

Bayesian newbie here. I have a 4x4 data matrix of observed success proportions, and I would like to use MCMC to fit parameters alpha (a 4-vector for rows), beta (a 4-vector for columns), and an offset c, such that the expected value of data[i, j] is sigmoid(alpha[i] + beta[j] + c) = 1/(1 + exp(-(alpha[i] + beta[j] + c))). So for the (4 x 4) data I have 9 parameters. (I also have the raw counts of successes and failures for each cell, but here I'm using data = successes / (successes + failures).)

Sampling seems to converge okay, but when I run a posterior predictive check the posterior gives probabilities close to 0, in contrast to the actual data, and I'm not sure why…

import numpy as np
import pymc3 as pm

def sigmoid(x):
    # Logistic function: exp(x) / (1 + exp(x)) = 1 / (1 + exp(-x))
    z = np.exp(x)
    return z / (1 + z)

data = np.array([[0.2, 0.3, 0.5, 0.6],
                 [0.6, 0.2, 0.6, 0.5],
                 [0.5, 0.6, 0.2, 0.3],
                 [0.3, 0.5, 0.6, 0.2]])

basic_model = pm.Model()
with basic_model:

    # Priors for unknown model parameters (9 parameters in total)
    a = pm.Normal('a', mu=0, sd=10, shape=(4, 1))
    b = pm.Normal('b', mu=0, sd=10, shape=(1, 4))
    offset = pm.Normal('offset', mu=0, sd=10)

    # Expected values of outcome (4 x 4)
    p = pm.Deterministic('p', sigmoid(a + b + offset))

    # Likelihood (sampling distribution) of observations (4 x 4)
    obs = pm.Bernoulli('obs', p=p, observed=data)

with basic_model:
    # Draw posterior samples
    trace = pm.sample(1000)

ppc = pm.sample_ppc(trace, samples=1000, model=basic_model)
posterior_samples = np.mean(ppc['obs'], axis=0).round(3)

The posterior predictive mean (posterior_samples) gives:
[[ 0.01 0.01 0.012 0.011]
[ 0.003 0.006 0.007 0.008]
[ 0.013 0.011 0.013 0.01 ]
[ 0.007 0.007 0.008 0.015]]

which is nothing like the observed data.

You cannot feed numbers other than 0 or 1 to the Bernoulli distribution; it probably casts the data to the nearest integer internally.
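That casting would explain the near-zero posterior predictive means. Exactly how the cast happens internally is an assumption, but if the observed proportions are truncated to integers, every cell becomes 0, so the likelihood sees 16 failures and pulls every p towards 0. A quick sketch with plain NumPy:

```python
import numpy as np

data = np.array([[0.2, 0.3, 0.5, 0.6],
                 [0.6, 0.2, 0.6, 0.5],
                 [0.5, 0.6, 0.2, 0.3],
                 [0.3, 0.5, 0.6, 0.2]])

# Truncating the proportions to integers turns every cell into 0,
# i.e. the model would see nothing but failures.
as_int = data.astype(int)
print(as_int)
print(as_int.sum())  # 0
```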

Thanks @junpenglao! I would never have caught that. Does that mean pm.Bernoulli cannot see the data as observed proportions, but only as single datapoints? I find that a bit strange, because I thought pm.Bernoulli(p, observed) would try to match p to the "observed p"…

Is it even possible to pass the whole data to Bernoulli? I thought about perhaps using the Beta(a, b) distribution with a = successes and b = failures, but then I wouldn't be able to use the sigmoid anymore, so that would not work.
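For reference, the Beta idea corresponds to the conjugate posterior of a Binomial likelihood under a flat prior: with s successes and f failures, the posterior is Beta(s + 1, f + 1), whose mean sits close to the observed proportion. A quick sketch with SciPy, using hypothetical counts for a single cell:

```python
from scipy import stats

successes, failures = 4, 16  # hypothetical counts for one cell (rate 0.2)

# Conjugate Beta posterior under a flat Beta(1, 1) prior.
posterior = stats.beta(successes + 1, failures + 1)

# Posterior mean (s + 1) / (s + f + 2) is close to the observed 0.2.
print(posterior.mean())
```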

Try passing your data as a (4 x 4 x N) array.

Thanks for the quick reply as always @junpenglao.
Unfortunately it did not work…
I have used L = pm.Bernoulli('L', p=p, observed=bernoulli_data_parameter)
where bernoulli_data_parameter.shape = (4, 4, 1000).

The error:

ValueError: Input dimension mis-match. (input[0].shape[2] = 1000, input[1].shape[2] = 4)

Try reshaping it into (1000, 4, 4)
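The error is ordinary broadcasting: shapes are aligned from the trailing axis, so a (4, 4) p cannot broadcast against (4, 4, 1000), while (1000, 4, 4) lines up cleanly. A sketch with NumPy arrays standing in for the model's tensors:

```python
import numpy as np

p = np.full((4, 4), 0.5)          # per-cell probabilities
obs_bad = np.zeros((4, 4, 1000))  # trailing axes 4 vs 1000: mismatch
obs_good = np.zeros((1000, 4, 4)) # trailing axes (4, 4) vs (4, 4): fine

try:
    p * obs_bad
    broadcast_ok = True
except ValueError:
    broadcast_ok = False

print(broadcast_ok)           # False, same mismatch as the error above
print((p * obs_good).shape)   # (1000, 4, 4)
```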


That works - thanks! I'm really sorry, though: the problem is still there… I had a (4 x 4) matrix of success rates, and my hacky solution was to convert it into a (4 x 4) boolean matrix "stacked up" 1000 times, so that the mean over the 1000 copies of each cell equals the success rate I started with. Each cell actually has a different number of trials (not 1000 for all of them), so I'm not sure this is a good way. It's still not converging to reproduce the observed data, although the order of magnitude is better than before. I get "The chain reached maximum tree depth" and "The number of effective samples is smaller than 200/25%" warnings every time, with target_accept = 0.95. If there is a way to troubleshoot this problem I'd love to know… thank you so much.
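For what it's worth, the stacking described above can be sketched like this, assuming (hypothetically) 1000 pseudo-trials for every cell:

```python
import numpy as np

data = np.array([[0.2, 0.3, 0.5, 0.6],
                 [0.6, 0.2, 0.6, 0.5],
                 [0.5, 0.6, 0.2, 0.3],
                 [0.3, 0.5, 0.6, 0.2]])

N = 1000
# For each cell, the first round(rate * N) pseudo-trials are successes (1),
# the rest failures (0); the per-cell mean then recovers the original rate.
trial_idx = np.arange(N)[:, None, None]                 # shape (N, 1, 1)
stacked = (trial_idx < np.round(data * N)).astype(int)  # shape (N, 4, 4)

print(stacked.shape)         # (1000, 4, 4)
print(stacked.mean(axis=0))  # reproduces `data`
```

Note that fixing N = 1000 misstates the evidence for cells whose real trial counts differ, which is one reason to prefer modelling the raw counts directly.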

In that case, it is probably better to model it with a Binomial distribution, maybe try this:

for i in range(4):
    for j in range(4):
        pm.Binomial('obs_{}_{}'.format(i,j), n=Ntrial[i,j], p=p[i,j], observed=data_raw[i,j])

Thanks! Is there a difference between doing that and passing it all in one go?:
pm.Binomial('obs', n=Ntrial, p=p, observed=data_raw)
I tried your way, which was slower, but it looked like it worked slightly better, so I was wondering if there is a difference.

I also get the same warnings as before in both cases (max tree depth reached / number of effective samples is smaller than… for some parameters). The posterior values are also quite different from the observed success rates, but I'm not sure whether that's because 9 parameters are too few to express the (4 x 4) grid.
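On the loop-vs-vectorized question: both specify the same joint Binomial likelihood, one as a single 16-cell term and the other as 16 separate terms, so the total log-likelihood is identical. This can be checked with SciPy, using hypothetical counts:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
Ntrial = rng.integers(10, 50, size=(4, 4))   # hypothetical trial counts
p = rng.uniform(0.1, 0.9, size=(4, 4))
data_raw = rng.binomial(Ntrial, p)           # hypothetical success counts

# Vectorized log-likelihood: one call over the whole (4, 4) grid.
ll_vec = stats.binom.logpmf(data_raw, Ntrial, p).sum()

# Looped log-likelihood: one term per cell, as in the loop version.
ll_loop = sum(stats.binom.logpmf(data_raw[i, j], Ntrial[i, j], p[i, j])
              for i in range(4) for j in range(4))

print(np.isclose(ll_vec, ll_loop))  # True
```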

LOL you are absolutely right… this is the recommended way and you should definitely do that.

This is usually not a huge problem. You can increase the max tree depth by passing nuts_kwarg={...} to pm.sample. Increasing the number of tuning samples should help as well.
