Dawid-Skene-like model implementation: chain contains only divergences, model misspecified

Hi everybody. I read silently the discourse for a few months, and now I’ve decided to write, because I need help with a project.
I am trying to setup a hierarchical model to infer bias and variance for a community of reviewers.
In order to be sure to build the model correctly, I’ve started creating simulated data, asking the model to infer the (known) parameters that generated such data.

  • I start with a matrix 𝐸 of size 𝑅×𝐽 with 𝑅 the number of reviewers and 𝐽 the number of reviewed jobs.
  • The element 𝐸𝑖𝑗 of matrix contains is a continous number 𝑛 (𝑛≥0) representing the score given by the reviewer 𝑖 to the job 𝑗
  • So far every job is judged by just 1 of the 𝑅 reviewers, but for the future I need the model to accept more reviews for a single job
  • I’ve created data with 𝑅=5 and 𝐽=1000

Following this implementation of Dawid-Skene model, I created 3 vectors, with 𝐽 elements each:

  • train_jobs: contains the idxs of the reviewed jobs
  • train_revs: contains the idxs of the reviewers that judged the respective job
  • train_evals: contains the score given by the reviewers to the jobs

So that, for each row element k, I have train_revs[k] = 𝑖, train_jobs[k] = 𝑗, train_evals[k] = 𝐸𝑖𝑗.

This is my model:

    with pm.Model() as rev_model:
        alpha_job     = pm.Gamma('alpha_job', alpha=1., beta=1./3., shape=J)
        beta_rev      = pm.Uniform('beta_rev', lower=-10, upper=10, shape=R)
        sigma_rev     = pm.HalfNormal('sigma_rev', sigma=1, shape=R)
        mu_obs        = alpha_job[train_jobs] + beta_rev[train_revs]
        sigma_obs     = sigma_rev[train_revs]
        obs           = pm.Normal('epts', mu=mu_obs,
        mcmc_trace = pm.sample(chains=4, draws=1000)

I’ve built the dataset so that I have about 200 jobs per reviewer, to make sure that the model could see data for each reviewer.
When I run the sampling I get the errors:

There were 839 divergences after tuning. Increase target_accept or reparameterize.
The acceptance probability does not match the target. It is 0.2170325472246399, but should be close to 0.8. Try to increase the number of tuning steps.
There were 8 divergences after tuning. Increase target_accept or reparameterize.
The acceptance probability does not match the target. It is 0.9237895744308975, but should be close to 0.8. Try to increase the number of tuning steps.
There were 15 divergences after tuning. Increase target_accept or reparameterize.
The acceptance probability does not match the target. It is 0.9145047463020813, but should be close to 0.8. Try to increase the number of tuning steps.
There were 237 divergences after tuning. Increase target_accept or reparameterize.
The acceptance probability does not match the target. It is 0.5561411316516693, but should be close to 0.8. Try to increase the number of tuning steps.
The rhat statistic is larger than 1.4 for some parameters. The sampler did not converge.
The estimated number of effective samples is smaller than 200 for some parameters.

and I get these parameters

(here I’ve added the first column to the summary df, in which you can read the real value of the “beta_rev” parameters)
I cannot understand what is wrong and why the model seems to be able to infer some parameters and not the others.

To complete my explanation I’ll paste also the code that I use to create the simulated data:

def simulate_data(N_reviewers=10, N_jobs=100, rev_per_job=3, seed=14, sigma_noise=1.):
    if seed != None:
    # sistematic error of reviewers
    R_biases = np.array([np.random.uniform(-5, 5, N_reviewers)]).T
    # random error of 
    J_real   = np.random.gamma(shape=1, scale=3., size=N_jobs)
    #J_real   = np.random.lognormal(sigma=1.0, size=N_jobs)
    # biased judgement of jobs EPTs
    EPTS     = R_biases + J_real
    # keep only rev_per_job reviews for each job
    for j in range(N_jobs):
        reviews = np.random.choice(range(N_reviewers), replace=False, size=rev_per_job)
        remove = set(range(N_reviewers)) - set(reviews)
        EPTS[list(remove), j] = np.nan
    # biased and noised values of jobs EPTs
    EPTS += np.random.normal(loc=0., scale=sigma_noise, size=EPTS.shape)
    # remove negative EPTs
    EPTS[np.where(EPTS < 0)] = 0
    return R_biases, J_real, EPTS

# create E matrix (EPTS) and beta_rev values (biases)

R_biases, J_real, EPTS = simulate_data(rev_per_job=rev_per_job, N_reviewers=N_reviewers,
                                       N_jobs=N_jobs, seed=14, sigma_noise=1.)

# create mask to make the model see just one review
mask = np.array([False]*len(all_reviewers))
for i in range(0, len(all_reviewers), 2):
    rnd = np.random.choice([0, 1])
    mask[i + rnd] = True

# create the 3 vectors
train_revs  = all_reviewers[mask]
train_jobs  = all_jobs[mask]
train_evals = all_evaluations[mask]
test_revs  = all_reviewers[~mask]
test_jobs  = all_jobs[~mask]
test_evals = all_evaluations[~mask]

Here I’ve created the matrix 𝐸 with 2 reviews per job, but then I’ve masked one of the 2 reviews.

Thanks for your help!

Hi Marco,
And congrats on your first question then :slight_smile:
I think there are several things you can try to modify, that could help refine the model and hence help the sampler:

  1. Choose a better prior than Uniform for beta_rev. Uniforms are pretty much always discouraged, unless you have really strong reasons for using them. Here are recommendations for prior choice. Here, it seems that a Normal would be a better choice.
  2. I don’t know what prior info you have about alpha_job parameters, but note that they are constrained to be positive here, because of the Gamma prior.
  3. Can the scores given by the reviewers to the jobs be negative? Because here they can, since your likelihood is a Normal. If they can’t be negative, you should probably change your likelihood.
  4. I think having a hierarchical structure on sigma_obs is very ambitious. It’s very likeliy the sigma_rev are unidentifiable. I would try something easier: sigma_obs = pm.HalfNormal('sigma_obs', sigma=1), or an Exponential.
  5. Note that your model is not hierarchical yet: for that, you’d need hyperpriors on alpha_job and beta_rev. But this will be harder to fit, so I’d advise sticking to a simple version like yours first. Once that samples, add the hierarchical structure.

Hope this helps :vulcan_salute:

Hello Alex,
thanks for your reply, very appreciated! I will try to answer to your suggestion, following your bullet points:

  1. I’ve chosen the Uniform prior for beta_rev because I’ve generated the biases of reviewers using exactly a sampling by Uniform distribution (as you can see in the simulate_data function). I’ve used also a Normal prior, anyway (despite the data generation method), but still doesn’t converge

  2. For my model I am supposing that there exist a “real value” (alpha_job) for the job that you can think about as "the score given to that job by an ideal, unbiased reviewer. The score is constrained to be positive, or zero, but not negative, so yes, you noticed right

  3. For the same reason of point 2., the answer is no, the scores cannot be negative. You are right, the model samples from a Normal and there are no constraints, so this is an error that I have to fix, thanks! Indeed, when I generate the data, I need to make set to 0 all the negative score with EPTS[np.where(EPTS < 0)] = 0, any suggestion about how to make obs bounded? Because I could use an HalfNormal instead of the Normal, but the problem would be that mu_obs could be negative, for the model

  4. Here I have a sigma_rev for every reviewer, because it is supposed to be the “random error” of the reviewer. If you are arguing that maybe it would be better to use a single “sigma” that affects every score, maybe it could be an option, but it is a different situation modelling, I think. I’ll try if it fixes the problem, anyway

  5. Yes, the model is built to be more complicated in the future. I used to have an hyperprior on alpha_job, but I removed it in order to make the model work

Maybe it cannot reach convergence because of the high number of jobs for which the model must infer the alpha_job parameter?

I’ll try to fix the boundary of obs and see what will happen using just one sigma_obs, for all the reviewers, then I’ll update you.

1 Like

Hi everyone again,
I am writing because I am trying to figure out how to handle point 3 of my previous reply to @AlexAndorra suggestion:
I would like to write down a proper Likelihood taking into account that my data is truncated (to be honest, I think that “truncation” is not the correct word here because all the obs < 0 are set zero).

I am modelling the observations with a Normal Likelihood, but, as Alex made me notice, my observations can’t be negative.
I’ve checked the solution for censored data and for truncated data but I think that my situation is different, given that all my negative observations are not truncated, but they are set to zero so that the histogram of my observation shows a high obs=0 bin.

How can I make the model guess which obs=0 is really 0 and which one is more likely to be negative, imputed to 0? Obviously I’m referring to the model that is in my first post here.
Thank you!

Hi Marco,
I think you can look into zero-inflated processes for this: that’ll allow you to model the probability for an observation to be a true zero or to be a zero coming from the inflation process (here, the imputation of negative values to 0). Chapter 12 of Statistical Rethinking 2 has examples about that.

But as your positive results are continuous, this may be more difficult (a zero-inflated Gamma isn’t really conveivable, to the best of my knowledge) so you may have to go with a mixture model here, to give you more flexibility :thinking:

1 Like