649 students in a Portuguese class took a survey. I am interested in estimating the probability of students failing past classes, based on final grade of the current Portuguese class. 0=no past failures, 1=student had past failures
I built my model:
with pm.Model() as model:
beta = pm.Normal("beta", mu=0, tau=0.001, testval=0)
alpha = pm.Normal("alpha", mu=0, tau=0.001, testval=0)
p = pm.Deterministic("p", 1.0/(1. + tt.exp((beta*final_grade) + alpha)))
observed = pm.Bernoulli("bernoulli_obs", p, observed=failed_past_classes)
start = pm.find_MAP()
step = pm.Metropolis()
trace = pm.sample(120000, step=step, start=start)
burned_trace = trace[100000::2]
I got good Rhat results but I’m not confident about the traceplot results.
- Should I be concerned that the two chains in beta do not overlap?
- In iteration 7000 there is a big drop and divergence between the two chains. Is that a sign of the typical set region with high curvature?
- Can you please share an article or a link that has examples of NOT ideal traceplots with explanations of the visualizations?
The trace definitely show high autocorrelations.
Why use Metropolis? I suggest you sample with the default
trace = pm.sample(1000, tune=1000). You get better result and better diagnositics with much less samples.
Also, few suggestions of your model:
- the prior for beta and alpha is likely too wide - a sd = 10 is probably more realistic.
- using the sigmoid function from theano/pymc3.math is probably more numerically stable
Thank you @junpenglao.
I actually used Metropolis because I’m trying to learn more about it. But yes, using the automatic trace you suggested with tighter standard deviations not only made it faster but also increased the number of effective samples. I wasn’t too sure where to implement the pymc3.math.sigmoid. Would that be in the logistic function instead of tt.exp?
That said, I was hoping you could help me understand the following:
- around iteration 7000, the chains jump. What does that indicate?
- What can I do to lower the autocorrelations of alpha and beta to acceptable levels (I get similar autocorrelations with pm.sample(1000, tune=1000))?
- The reading material on pymc3 documentation doesn’t have enough on traceplot visualizations. Do you know where I can find more material?
What I meant is you can do
pm.math.sigmoid(beta*final_grade + alpha) instead of
1.0/(1. + tt.exp((beta*final_grade) + alpha))
- Hard to read into one trace, but if you are seeing these large jumps comes up regularly it usually means the proposal step size are generally too small.
- You can try to do some thinning - it helps if you do Metropolis, but not that much for NUTS
- We are working on breaking out the plotting into a separate package - https://github.com/arviz-devs/arviz. Will add more documentation and usage example there.
Thank you, Thank you @junpenglao
I appreciate the time you take to teach all of us how to become better PYMCeers or PYMCeures. What do you think sounds better?