Thanks for your reply - you’ve actually gone into a lot of depth and this is really helpful.
I’ve noticed the sum-to-zero constraint in other models and I’m not fully sure why it’s used. I don’t think it would help in my particular case, so I omitted it. In my case, I’m trying to figure out the probability that an order will develop a fault given a partially observable process.
Basically I’m making the assumption that if an order passes through 3 machines (stage 1, stage 2 and stage 3), and the stages fail independently, then the overall probability of “success” for that order (i.e. no fault) is p_{succ}(s_{1}) \times p_{succ}(s_{2}) \times p_{succ}(s_{3}).
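To make that concrete with made-up numbers (nothing from the real data), the order-level probability is just the product of the stage probabilities:

```python
import numpy as np

# hypothetical per-stage success probabilities, purely illustrative
p_succ = np.array([0.95, 0.90, 0.98])

# independence assumption: order-level success is the product of the stages
p_order = np.prod(p_succ)
print(p_order)  # 0.8379
```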
I’ve tweaked the model and it has given some promising results initially, but I’m struggling with parameterising my priors (the Gelman-Rubin statistic looks good, but the BFMI typically hovers between 0.2 and 0.4, which is fairly low).
I’m wondering if I can bring the BFMI up by tightening the priors and sampling for longer…
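For what it’s worth, this is roughly how I’m checking the BFMI (just a sketch: `model` is the model below, and the 2000 draws / 0.9 target_accept are placeholder numbers, not what I actually run):

```python
import arviz as az

with model:
    # raising target_accept forces smaller steps, which sometimes helps
    # when the energy diagnostics look poor
    idata = pm.sample(draws=2000, tune=2000, target_accept=0.9,
                      return_inferencedata=True)

# one BFMI value per chain, plus the energy plot it is derived from
print(az.bfmi(idata))
az.plot_energy(idata)
```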
Given that this model has shown some promising results (can’t share as commercially sensitive), do you think I’d still need the sum-to-zero constraint?
```python
import pymc3 as pm
import theano.tensor as T

with pm.Model() as model:
    # fairly wide hyperpriors, half-Cauchy for a long tail
    mu = pm.Normal('mu', mu=df[target].mean(), sd=5)
    sigma = pm.HalfCauchy('sig', beta=3)

    # one vector of logit-scale effects per categorical feature,
    # partially pooled through the shared hyperpriors above
    dists = {}
    probs = {}
    thetas = []
    for key in encoded_categorical:
        if key != target:
            dists[key] = pm.Normal(
                '{}_logit'.format(key),
                mu=mu, sd=sigma,
                shape=df[key].nunique()
            )
            # per-level probabilities, kept as a Deterministic for inspection
            probs[key] = pm.Deterministic(key, pm.invlogit(dists[key]))
            # probability for the level observed in each row
            thetas.append(pm.invlogit(dists[key][df[key].values]))

    # product of individual success probabilities for each row
    theta = T.prod(T.stack(thetas, axis=0), axis=0)
    fault = pm.Bernoulli(target, p=theta, observed=df[target].values)
```
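And in case the answer to the sum-to-zero question is yes, this is roughly how I’d picture retrofitting it: centre a logit-scale vector so the per-level deviations sum to zero around the shared mean (just a sketch; `raw`, `centred` and the `_c` suffix are placeholder names that aren’t in the current model, and `key` stands for any one of the categorical features):

```python
with model:
    # sketch of a sum-to-zero parameterisation for one categorical feature:
    # free deviations are centred so they sum to exactly zero across levels
    raw = pm.Normal('{}_raw'.format(key), mu=0., sd=sigma,
                    shape=df[key].nunique())
    centred = pm.Deterministic('{}_logit_c'.format(key),
                               mu + raw - T.mean(raw))
```

(The idea, as I understand it, being that the shared mean and the per-level offsets stop competing for the same degrees of freedom.)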