Mixture model applied to measurements of different quantities

Hello there,
I am trying to build my pymc3 model correctly, but I need your help.
Here I have a set of jobs j, each with a real quality-score t_j. Then, I have several reviewers r that give me an estimate e_{jr} of the quality-score t_j. I could have both multiple estimates for a job j from different reviewers, and multiple estimates from a reviewer r for different jobs. So, my observed values e_{jr} can be represented in a matrix E \in \mathbf{R}^{J \times R}, where J and R are the total numbers of jobs and reviewers respectively. Of course, some elements of the matrix E will be None (because not every job is judged by every reviewer).
I followed this implementation so that I end up with 3 vectors, each with dimension 1 \times JR^j (with R^j the number of actual reviews per job), where each vector component holds a single evaluation:

  • revs - vector of reviewer indices (integers): the reviewer who made the evaluation
  • jobs - vector of job indices (integers): the job to which the evaluation relates
  • evals - vector of evaluations (floats): the quality estimate

So the i^{th} component evals[i] is the estimate made by reviewer revs[i] of the quality-score of job jobs[i].
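For concreteness, here is a minimal sketch of how the matrix E flattens into those three aligned vectors (the 3×2 matrix below is made-up data; only the names E, revs, jobs, evals come from the post):

```python
# Flatten the sparse matrix E into three aligned vectors,
# skipping the None entries (missing (job, reviewer) pairs).
E = [
    [4.2, None],   # job 0: only reviewer 0 gave an estimate
    [None, 3.1],   # job 1: only reviewer 1
    [5.0, 4.8],    # job 2: both reviewers
]

revs, jobs, evals = [], [], []
for j, row in enumerate(E):
    for r, e in enumerate(row):
        if e is not None:
            jobs.append(j)
            revs.append(r)
            evals.append(e)

print(jobs)   # [0, 1, 2, 2]
print(revs)   # [0, 1, 0, 1]
print(evals)  # [4.2, 3.1, 5.0, 4.8]
```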

My assumptions are that every reviewer has a systematic additive error \beta_r and a random error \sigma_r, and I am modelling the evaluations like this:

\begin{aligned} t_j \sim & \Gamma\left(1; \frac{1}{3}\right) \\ \beta_r \sim & \mathcal{N}(0; 10) \\ \sigma_r \sim & |\mathcal{N}(0; 1)| \\ e_{jr} \sim & \mathcal{N}(t_j + \beta_r; \sigma_r) \end{aligned}

Technically, the model in pymc3 is the following:

with pm.Model() as rev_model:
    # Per-job true quality-scores
    t               = pm.Gamma('t', alpha=1., beta=1./3., shape=N_jobs)
    # Per-reviewer systematic (additive) error
    beta_rev        = pm.Normal('beta_rev', mu=0., sigma=10., shape=N_reviewers)
    # Per-reviewer random error (noise scale)
    sigma_rev       = pm.HalfNormal('sigma_rev', sigma=1., shape=N_reviewers)
    # Fancy indexing expands the per-job / per-reviewer vectors
    # to one entry per observed evaluation
    mu_evals        = pm.Deterministic('mu_evals', t[jobs] + beta_rev[revs])
    sigma_evals     = sigma_rev[revs]
    evals           = pm.Normal('evals', mu=mu_evals, sigma=sigma_evals,
                                observed=obs_evals)

As you can see, here t is a vector of length J (N_jobs), and beta_rev and sigma_rev are vectors of length R (N_reviewers), while both mu_evals and sigma_evals are vectors of length JR^j (len(obs_evals)).
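The fancy-indexing line t[jobs] + beta_rev[revs] simply pairs each observed evaluation with its job's score and its reviewer's bias. A pure-Python equivalent (with made-up toy values) shows what it computes:

```python
# Toy values, just to illustrate what t[jobs] + beta_rev[revs] expands to.
t        = [2.0, 3.0, 4.0]    # per-job scores, length J = 3
beta_rev = [0.5, -0.5]        # per-reviewer biases, length R = 2
jobs     = [0, 1, 2, 2]       # one entry per observed evaluation
revs     = [0, 1, 0, 1]

# Same result as the NumPy/Theano fancy indexing in the model
mu_evals = [t[j] + beta_rev[r] for j, r in zip(jobs, revs)]
print(mu_evals)  # [2.5, 2.5, 4.5, 3.5]
```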

Here my problems begin.
I would like to introduce a different likelihood for evals that are exactly 0. That is, if an evaluation is exactly 0, I'd like my model to be able to draw that value from a distribution that puts all its probability mass on e_{jr}=0, regardless of the reviewer. I thought I could implement this with a mixture model, but I cannot figure out how to do it.
To clarify what I want to achieve, this is what I am trying to model

\begin{aligned} t_j \sim & \Gamma\left(1; \frac{1}{3}\right) \\ \beta_r \sim & \mathcal{N}(0; 10) \\ \sigma_r \sim & |\mathcal{N}(0; 1)| \\ w \sim & \mathcal{B}(\alpha, \beta)\\ p(e_{jr}) = & w \, p(e_{jr}\neq 0 \mid t_j, \beta_r, \sigma_r) + (1-w) \, p(e_{jr} = 0) \end{aligned}
  1. Any suggestion about how to implement it? I find it hard to figure out how to modify my likelihood, which right now is a normal with a vector-valued mu
  2. Any suggestion about what to use for p(e_{jr} = 0)?

Thanks to everyone!

Nobody can help? :sweat_smile:

It doesn’t seem like you need a mixture, since there is no ambiguity in your model as to which data comes from which component. You could model the non-zeroes with the normal as before, and model the proportion of zeroes separately with a binomial likelihood.
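Following that suggestion, a minimal sketch of the data split (the values below are made up; only the names obs_evals, revs, jobs come from the thread, and the pymc3 lines in the comments are one possible way to attach the two likelihoods, not the definitive implementation):

```python
# Split the observations into the non-zero part (Normal likelihood,
# exactly as in the original model but on the subset) and a zero count
# (Binomial likelihood). In pymc3 this could look like:
#   evals_nz = pm.Normal('evals_nz', mu=t[jobs_nz] + beta_rev[revs_nz],
#                        sigma=sigma_rev[revs_nz], observed=evals_nz_obs)
#   n_zero   = pm.Binomial('n_zero', n=len(obs_evals), p=w,
#                          observed=n_zero_obs)
# with w ~ Beta(alpha, beta) the probability of an exact zero.
obs_evals = [4.2, 0.0, 3.1, 0.0, 5.0]
revs      = [0, 1, 1, 0, 1]
jobs      = [0, 0, 1, 2, 2]

nonzero = [i for i, e in enumerate(obs_evals) if e != 0.0]

evals_nz_obs = [obs_evals[i] for i in nonzero]   # fed to the Normal
revs_nz      = [revs[i] for i in nonzero]
jobs_nz      = [jobs[i] for i in nonzero]
n_zero_obs   = len(obs_evals) - len(nonzero)     # fed to the Binomial

print(evals_nz_obs)  # [4.2, 3.1, 5.0]
print(n_zero_obs)    # 2
```

Since each observation is unambiguously either zero or non-zero, the two likelihoods factorize and no latent component assignment (and hence no pm.Mixture) is needed.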