Observed variable using custom function

I’m trying to implement the “non-pedagogical learner” in section 3.1.1 of this paper. The observation itself is a list {(x1, y1), ... (xn, yn)}, we’re trying to infer a rule r, and the likelihood function is a complicated function of the two: exp(-b*Q_r(c)).

I can’t figure out how to make an observed variable in pymc using a custom function that I define (Q_r). The problem is that Q_r is a function of a random variable, so it isn’t instantiated until I sample proposed_regex. But I can’t make it a fully deterministic variable either, because I need to index into a dictionary with the value of proposed_regex, which I can’t do because it’s an RV.

Here is how I did it in pymc2, where I just had to return the log-likelihood from a function with the observed decorator (this also felt kind of wrong to me; I’m unsure whether there was a more proper way to do it).

@pm.observed(name="examples")
def examples(value=obs_corpus, r=proposed_regex):
    Q_r = sum([q_r(r, corpora_data[ex]) for ex in obs_corpus])
    return -beta*Q_r

Here’s my attempt (not runnable, but perhaps inspecting my model definition would help debug the problem):

# Observed data
obs_corpus = [0, 1, 2, 0, 1, 2, 0, 1, 2, 0, 1, 2]
beta = 1

learner_model = pm.Model()
with learner_model:
    # Regex priors
    priors = np.array([math.exp(-len(r)) for r in all_hypotheses])
    priors = priors / np.sum(priors)
    proposed_regex = pm.Categorical("proposed_regex", p=priors) 
    
    def q_r(regex, ex):
        """
        Returns 1 if example is labeled incorrectly, 0 o/w.
            ex: a tuple (<example>, <teacher label>) = ("aaa", 1)
        """
        return xor(ex[1], match(all_hypotheses[regex], ex[0]))

    ## The next two lines are broken
    Q_r = pm.Deterministic("Q_r", sum([q_r(proposed_regex, corpora_data[ex]) for ex in obs_corpus])) # total number of incorrect examples
    examples = pm.Exponential("examples", 1, observed=math.exp(-beta*Q_r)) 

I would suggest turning the function q_r into a matrix and indexing into it for the boolean computation. Specifically, you don’t need to compare the actual strings inside q_r for your purpose; having the index is sufficient. Working with a matrix also lets you avoid the for loop and makes it easier to use Theano operations.
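
A minimal sketch of that idea in plain NumPy, so it runs without a model. The hypotheses, corpus entries, and the use of re.fullmatch are my assumptions standing in for your all_hypotheses, corpora_data, and match; only the shape of the computation is the point.

```python
import re
import numpy as np

# Hypothetical stand-ins for the names in the question (not the real data).
all_hypotheses = ["a+", "b+", "(ab)+"]
corpora_data = [("aaa", 1), ("bbb", 0), ("abab", 0)]  # (<example>, <teacher label>)
obs_corpus = [0, 1, 2, 0, 1, 2]                        # indices into corpora_data
beta = 1.0

# mismatch[h, e] = 1 if hypothesis h labels example e differently from the teacher.
# Assumption: "match" in the question means a full regex match.
mismatch = np.array([
    [int(bool(re.fullmatch(h, s)) != bool(label)) for (s, label) in corpora_data]
    for h in all_hypotheses
])

# Q[h] = number of observed examples that hypothesis h gets wrong.
# Fancy indexing on the columns replaces the Python loop over obs_corpus.
Q = mismatch[:, obs_corpus].sum(axis=1)

# Unnormalised log-likelihood of the observations under each hypothesis.
log_like = -beta * Q

print(mismatch)
print(Q, log_like)
```

Since Q is now just a vector with one entry per hypothesis, the string matching happens once, outside the model. Inside the model I would expect that you can turn Q into a Theano tensor, index it with proposed_regex, and add -beta * Q[proposed_regex] to the log-probability (for example through pm.Potential), but I have not run that against your setup.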