Porting Stan oxen example to PyMC3

Hello! I’m learning PyMC3 and probabilistic programming (many thanks for the excellent docs and videos, especially Junpeng Lao’s work).

I came across this Stan code example and wanted to reimplement it in PyMC3:
McElreath oxen blog example

Short version of the problem description:

• Kids like to drink tea, and are allowed an evening tea only if they’ve stabled the family ox at the end of the day.
• Each evening, find which children have properly stabled their oxen. For many houses you can see whether or not the ox is stabled, but other houses have enclosed stables, so you cannot observe the ox.
• Some kids drink tea even if the ox is not stabled.

I want to find Pr(ox-not-stabled | drinking tea).
The priors are Beta(2,2) for:
p_drink   # Pr(drink-tea | ox stabled)
p_cheat   # Pr(drink-tea | ox not stabled)
p_stabled # Pr(ox stabled)

tea_i ~ Bernoulli(pi_i)
pi_i = s_i * p_drink + (1 - s_i) * p_cheat
s_i ~ Bernoulli(p_stabled) # observe whether ox is stabled or not, some missing data
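For reference, the target quantity follows from Bayes’ rule over the marginal probability of tea. A minimal numeric sketch (the point values below are made up, purely to illustrate the arithmetic; they are not posterior estimates):

```python
# Hypothetical point values, just to show the computation
p_drink, p_cheat, p_stabled = 0.9, 0.5, 0.75

# Pr(ox not stabled | drinking tea) via Bayes' rule
numerator = (1 - p_stabled) * p_cheat
denominator = p_stabled * p_drink + (1 - p_stabled) * p_cheat
pr_not_stabled_given_tea = numerator / denominator
print(pr_not_stabled_given_tea)  # → 0.15625
```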

I’m having issues handling the partially observed discrete variable `s` (the array `s_obs` encodes whether the ox status was observed).
My results don’t match the blog’s; I suspect the issue is in the (*****) section below.

What’s the appropriate way to perform the marginalization in pymc3?

This is my implementation:

```python
# %%
import numpy as np
import pymc3 as pm

N_kids = 51

# same fake data as the blog: whether the ox is stabled (1), not stabled (0),
# or not observed (2)
s = np.array([1,1,1,0,1,0,0,1,1,1,1,1,1,1,0,1,1,0,1,0,0,1,1,1,1,1,1,1,0,1,1,1,1,1,0,1,0,1,1,1,0,1,0,1,1,0,1,1,1,1,1])
s_obs = np.array([1,1,2,2,2,0,0,1,1,2,2,2,2,1,2,1,1,2,2,0,0,2,1,2,2,1,2,1,2,2,1,1,1,1,2,1,0,1,2,1,2,1,2,2,1,0,1,1,1,1,1])

s_not_obs_idx = np.argwhere(s_obs == 2)
s_obs_idx = np.argwhere(s_obs != 2)

tea_obs = np.array([1,1,1,0,1,0,0,1,1,1,1,1,1,1,1,1,1,1,1,0,1,1,1,1,1,1,1,1,1,1,1,1,1,1,0,1,1,1,1,1,0,1,0,1,1,1,1,1,1,1,1])

with pm.Model() as model_oxen:
    # priors
    p_cheat = pm.Beta('p_cheat', 2, 2)      # drink tea if ox not stabled
    p_drink = pm.Beta('p_drink', 2, 2)      # drink tea if ox stabled
    p_stabled = pm.Beta('p_stabled', 2, 2)  # kid stabled ox

    for idx in np.arange(N_kids):
        if s_obs[idx] == 2:
            # ox status not observed (*****)
            pr_tea_ox = pm.Bernoulli('pr_tea_ox_{0:03d}'.format(idx), p_drink, observed=tea_obs[idx])
            pr_tea_noox = pm.Bernoulli('pr_tea_noox_{0:03d}'.format(idx), p_cheat, observed=tea_obs[idx])

            _likelihood_s_not_obs = p_stabled * pr_tea_ox + (1 - p_stabled) * pr_tea_noox
            likelihood_s_not_obs = pm.Potential('likelihood_s_not_obs_{0:03d}'.format(idx), _likelihood_s_not_obs)
            # (this also does not work)
            # likelihood_s_not_obs = pm.Deterministic('likelihood_s_not_obs_{0:03d}'.format(idx), _likelihood_s_not_obs)
        else:
            # ox status observed
            _pi = pm.Deterministic('pi_{0:03d}'.format(idx), s_obs[idx] * p_drink + (1 - s_obs[idx]) * p_cheat)
            likelihood_tea = pm.Bernoulli('likelihood_tea_{0:03d}'.format(idx), _pi, observed=tea_obs[idx])
            likelihood_s_obs = pm.Bernoulli('likelihood_s_obs_{0:03d}'.format(idx), p_stabled, observed=s_obs[idx])

# %%
with model_oxen:
    trace = pm.sample(1000, init=None, tune=500)
pm.summary(trace, var_names=['p_cheat', 'p_drink', 'p_stabled'])
```

This is the Stan model from the blog:

```stan
model{
  // priors
  p_cheat ~ beta(2,2);
  p_drink ~ beta(2,2);
  sigma ~ beta(2,2);

  // probability of tea
  for ( i in 1:N_children ) {
    if ( s[i] == -1 ) {
      // ox unobserved
      // log_mix(): log( sigma*bernoulli(tea[i]|p_drink) + (1-sigma)*bernoulli(tea[i]|p_cheat) )
      target += log_mix(
        sigma ,
        bernoulli_lpmf( tea[i] | p_drink ) ,
        bernoulli_lpmf( tea[i] | p_cheat ) );
    } else {
      // ox observed
      tea[i] ~ bernoulli( s[i]*p_drink + (1-s[i])*p_cheat );
      s[i] ~ bernoulli( sigma );
    }
  }//i
}
```

[update]
Using logsumexp() seems to work OK.

Q: is there a better way to do the same thing?

```python
with pm.Model() as model_oxen:
    # priors
    p_cheat = pm.Beta('p_cheat', 2, 2)      # prob drink tea if ox not stabled
    p_drink = pm.Beta('p_drink', 2, 2)      # prob drink tea if ox stabled
    p_stabled = pm.Beta('p_stabled', 2, 2)  # prob kid stabled ox

    for idx in np.arange(N_kids):
        if s_obs[idx] == 2:
            # ox status not observed
            pr_tea_ox = pm.Bernoulli.dist(p_drink).logp(tea_obs[idx])
            pr_tea_noox = pm.Bernoulli.dist(p_cheat).logp(tea_obs[idx])

            a = pm.math.log(p_stabled) + pr_tea_ox
            b = pm.math.log(1 - p_stabled) + pr_tea_noox
            _likelihood_s_not_obs = pm.math.logsumexp([a, b])
            likelihood_s_not_obs = pm.Potential('likelihood_s_not_obs_{0:03d}'.format(idx), _likelihood_s_not_obs)
        else:
            # ox status observed
            _pi = pm.Deterministic('pi_{0:03d}'.format(idx), s_obs[idx] * p_drink + (1 - s_obs[idx]) * p_cheat)
            likelihood_tea = pm.Bernoulli('likelihood_tea_{0:03d}'.format(idx), _pi, observed=tea_obs[idx])
            likelihood_s_obs = pm.Bernoulli('likelihood_s_obs_{0:03d}'.format(idx), p_stabled, observed=s_obs[idx])
```
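As a sanity check outside of PyMC3: Stan’s `log_mix(w, a, b)` is just `logsumexp([log(w) + a, log(1 - w) + b])`. A small NumPy/SciPy sketch with arbitrary illustrative values (not posterior estimates):

```python
import numpy as np
from scipy.special import logsumexp
from scipy.stats import bernoulli

# arbitrary illustrative values
sigma, p_drink, p_cheat, tea = 0.7, 0.9, 0.5, 1

# log_mix form: logsumexp of the weighted component log-probabilities
a = np.log(sigma) + bernoulli.logpmf(tea, p_drink)
b = np.log(1 - sigma) + bernoulli.logpmf(tea, p_cheat)
log_mix = logsumexp([a, b])

# direct mixture probability, for comparison
direct = np.log(sigma * p_drink + (1 - sigma) * p_cheat)
assert np.isclose(log_mix, direct)
```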

You should use `pm.Mixture` for a direct translation of `log_mix` in Stan.

Thanks Junpeng,
I had some trouble with casting to theano tensors, but the new model below works OK.

Q: are the two models identical? I’m getting more accurate results with `model_oxen2` (tighter HPDI, mean is closer to ideal value)

```python
import theano.tensor as T

with pm.Model() as model_oxen2:
    # priors
    p_cheat = pm.Beta('p_cheat', 2, 2)      # prob drink tea if ox not stabled
    p_drink = pm.Beta('p_drink', 2, 2)      # prob drink tea if ox stabled
    p_stabled = pm.Beta('p_stabled', 2, 2)  # prob kid stabled ox

    # ox status not observed: marginalize out ox / no-ox with a mixture
    pr_tea_ox = pm.Bernoulli.dist(p=p_drink)
    pr_tea_noox = pm.Bernoulli.dist(p=p_cheat)

    _w = T.stack([p_stabled, 1 - p_stabled])
    _comp_dists = [pr_tea_ox, pr_tea_noox]

    likelihood_s_not_obs = pm.Mixture('likelihood_s_not_obs', _w, comp_dists=_comp_dists, observed=tea_obs[s_not_obs_idx])

    # ox status observed
    _pi = pm.Deterministic('pi', s_obs[s_obs_idx] * p_drink + (1 - s_obs[s_obs_idx]) * p_cheat)
    likelihood_tea = pm.Bernoulli('likelihood_tea', _pi, observed=tea_obs[s_obs_idx])
    likelihood_s_obs = pm.Bernoulli('likelihood_s_obs', p_stabled, observed=s_obs[s_obs_idx])
```
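Once either model is sampled, the quantity of interest, Pr(ox-not-stabled | drinking tea), can be computed per posterior draw via Bayes’ rule. A sketch with stand-in draws (with a real trace you would substitute `trace['p_drink']`, `trace['p_cheat']`, and `trace['p_stabled']` for the random arrays below):

```python
import numpy as np

# stand-in "posterior draws", purely for illustration
rng = np.random.default_rng(0)
p_drink = rng.beta(2, 2, size=1000)
p_cheat = rng.beta(2, 2, size=1000)
p_stabled = rng.beta(2, 2, size=1000)

# Pr(ox not stabled | drinking tea), one value per draw
pr = (1 - p_stabled) * p_cheat / (
    p_stabled * p_drink + (1 - p_stabled) * p_cheat)
print(pr.mean())  # posterior mean of the target quantity
```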