Different values between pymc3 and stan

Hi,

I have below model

import pandas as pd
import pymc3 as pm
import arviz as arviz

myData = pd.DataFrame.from_dict({'Unnamed: 0': {0: 1, 1: 2, 2: 3, 3: 4, 4: 5, 5: 6, 6: 7, 7: 8, 8: 9, 9: 10, 10: 11, 11: 12, 12: 13, 13: 14, 14: 15, 15: 16, 16: 17, 17: 18, 18: 19, 19: 20, 20: 21, 21: 22, 22: 23, 23: 24, 24: 25, 25: 26, 26: 27, 27: 28, 28: 29, 29: 30, 30: 31, 31: 32, 32: 33, 33: 34, 34: 35, 35: 36, 36: 37, 37: 38}, 'y': {0: 0.0079235409492941, 1: 0.0086530073429249, 2: 0.0297400780486734, 3: 0.0196358416326437, 4: 0.0023902064076204, 5: 0.0258055591736283, 6: 0.17394835142698, 7: 0.156463554455613, 8: 0.329388185725557, 9: 0.0076443508881763, 10: 0.0162081480398152, 11: 0.0, 12: 0.0015759139941696, 13: 0.420025972703085, 14: 0.0001226236519444, 15: 0.133061480234834, 16: 0.565454216154227, 17: 0.0002819734812997, 18: 0.000559715156383, 19: 0.0270686389659072, 20: 0.918300537689865, 21: 7.8262468302e-06, 22: 0.0073241434191945, 23: 0.0, 24: 0.0, 25: 0.0, 26: 0.0, 27: 0.0, 28: 0.0, 29: 0.0, 30: 0.174071274611405, 31: 0.0432109713717948, 32: 0.0544400838264943, 33: 0.0, 34: 0.0907049925221286, 35: 0.616680102647887, 36: 0.0, 37: 0.0}, 'x': {0: 23.8187587698947, 1: 15.9991138359515, 2: 33.6495930512881, 3: 28.555818797764, 4: -52.2967967248258, 5: -91.3835208788233, 6: -73.9830692708321, 7: -5.16901145289629, 8: 29.8363012310241, 9: 10.6820057903939, 10: 19.4868517164395, 11: 15.4499668436458, 12: -17.0441644773509, 13: 10.7025053739577, 14: -8.6382953428539, 15: -32.8892974839165, 16: -15.8671863161348, 17: -11.237248036145, 18: -7.37978020066205, 19: -3.33500586334862, 20: -4.02629933182873, 21: -20.2413384726948, 22: -54.9094885578775, 23: -48.041459120976, 24: -52.3125732905322, 25: -35.6269065970458, 26: -62.0296155423529, 27: -49.0825017152659, 28: -73.0574478287598, 29: -50.9409090127938, 30: -63.4650928035253, 31: -55.1263264283842, 32: -52.2841103768755, 33: -61.2275334149805, 34: -74.2175990067417, 35: -68.2961107804698, 36: -76.6834643609286, 37: -70.16769103228}})

with pm.Model() as myModel :
    beta0 = pm.Normal('intercept', 0, 1)
    beta1 = pm.Normal('x', 0, 1)
    mu = beta0 + beta1 * myData['x'].values
    pm.Bernoulli('obs', p = pm.invlogit(mu), observed = myData['y'].values)

with myModel :
    trace = pm.sample(50000, tune = 10000, step = pm.Metropolis(), random_seed = 1000)

arviz.summary(trace, round_to = 10)

This gives result as below

               mean        sd    hdi_3%   hdi_97%  mcse_mean   mcse_sd      ess_bulk      ess_tail     r_hat
intercept -2.537501  0.599667 -3.707061 -1.450243   0.004375  0.003118  18893.344191  22631.772985  1.000070
x          0.033750  0.024314 -0.007871  0.081619   0.000181  0.000133  18550.620475  20113.739639  1.000194

Now I implement this same model using pystan

import stan
import numpy as np

modelStan =  """
data {
  int<lower=0> N;
  vector[N] y;
  vector[N] x;
}
parameters {
  
  real intercept;
  real slope;
}
transformed parameters {
  vector[N] mu = inv_logit(intercept + slope * x);
}
model {
  intercept ~ std_normal();
  slope ~ std_normal();
  for (n in 1:N)
    target += log_mix(y[n], 
              bernoulli_lpmf(1 | mu[n]),
              bernoulli_lpmf(0 | mu[n]));
}
"""

dataStan = {"N" : myData.shape[0], "y" : myData["y"].values, "x": myData["x"].values}

posterior = stan.build(modelStan, data=dataStan)
fit = posterior.sample(num_chains=4, num_samples=10000)
np.mean(fit["intercept"])
np.mean(fit["slope"])

With this I am getting somewhat different result, particularly for the parameter intercept. It does not seem to be attributed just to sampling fluctuation. Is there any explanation why result between stan and pymc3 are different? I am particularly looking for some insight on how exactly pymc3 handles the response variable which is fractional in nature and modelled via Binomial+logit link.

Any pointer will be very helpful.

Those two models don’t seem equivalent. Is the STAN one a Bernoulli mixture likelihood? In that case you should use a pm.Mixture with Bernoulli components

Thanks. Can you please help me to understand how exactly pymc3 creates Linkelihood function for such type of model? How exactly can I transform this pymc3 model to stan?