Traceplot for linear regression

In pm.traceplot(samples), what are the blue and green lines? And what is the last plot, the one for mu?
The code is:

import pymc3 as pm

# df is assumed to be a pandas DataFrame with 'weight' and 'height' columns
N = 10  # use only the first 10 points
with pm.Model() as m_N:
    alpha = pm.Normal('alpha', mu=178, sd=100)
    beta = pm.Normal('beta', mu=0, sd=10)
    sigma = pm.Uniform('sigma', lower=0, upper=50)
    mu = pm.Deterministic('mu', alpha + beta * df.weight[:N])
    height_hat = pm.Normal('height_hat', mu=mu, sd=sigma, observed=df.height[:N])
    trace_N = pm.sample(1000, tune=1000)

chain_N = trace_N[:]
pm.traceplot(chain_N);

To better answer this, you need to remember what is stored in the trace. The trace holds separate chains of points. These points hold all of the model’s free RV and deterministic values. Some RVs are defined to be arrays with many entries, such as mu, while others are simple scalars, like alpha, beta and sigma.

The old version of traceplot plots one trace per chain. That is why you see two lines instead of one in the plots for alpha, beta and sigma. When the trace had non-scalar RVs (like mu), the old behavior of traceplot was to draw one line for each element of the RV (mu[0], mu[1], etc.) and also to draw separate traces for each chain.

The new version of traceplot splits the elements of array-like RVs into different axes in the figure (mu[0], mu[1], etc will be plotted in separate axes). Furthermore, you can pass combined=True to combine the results from every chain, which would then show a single line for the plots of alpha, beta and sigma.
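To make the difference between per-chain and combined plotting concrete, here is a minimal sketch that uses plain Python lists as a stand-in for a trace. It does not call PyMC3; the helper fake_chain and the random values are purely illustrative assumptions, and only the counts (2 chains, 1000 draws, 10 data points) come from this thread.

```python
import random

N_CHAINS, N_DRAWS, N_POINTS = 2, 1000, 10  # counts taken from this thread

def fake_chain(seed):
    """Hypothetical stand-in for one sampled chain: a list of points, one per draw."""
    rng = random.Random(seed)
    points = []
    for _ in range(N_DRAWS):
        alpha = rng.gauss(178.0, 1.0)
        beta = rng.gauss(0.0, 1.0)
        points.append({
            "alpha": alpha,                                     # scalar RV
            "beta": beta,                                       # scalar RV
            "mu": [alpha + beta * w for w in range(N_POINTS)],  # array RV, 10 entries
        })
    return points

chains = [fake_chain(seed) for seed in range(N_CHAINS)]

# Separate chains: one series (one plotted line) per chain for a scalar like alpha.
alpha_per_chain = [[p["alpha"] for p in chain] for chain in chains]

# Combined: the draws from all chains concatenated into a single series,
# which is the idea behind passing combined=True.
alpha_combined = [a for series in alpha_per_chain for a in series]

print(len(alpha_per_chain), len(alpha_combined))
```

Each chain contributes one line per scalar variable, so two chains give two lines; after combining there is a single series that holds all 2000 draws.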


Thanks, I got some clarity. Each chain is the value of mu for each data point. When I do np.shape(pm.trace_to_dataframe(trace_N)) I get (2000, 13) so there are 2000 samples. So there are 2000 chains?

The old version of traceplot plots a trace for each chain. That is why you see two lines instead of one in the plots that correspond to alpha, beta and sigma.

I don't quite get this. Shouldn't there be 2000 lines then?

When the trace had non-scalar RVs (like mu), the old behavior of traceplot was to draw one line for each element in the RV (mu[0], mu[1], etc.) and also draw separate traces for each chain.

Again, shouldn't there be 2000 then? The last plot above has around 20.

Also, np.shape(trace_N) is (1000, 1) but np.shape(pm.trace_to_dataframe(trace_N)) is (2000, 13). If I ran trace_N = pm.sample(1000, tune=1000), i.e. sampled 1000 draws, why is the first dimension 2000?

Depending on your setup (number of CPU cores, etc.), pm.sample will sample multiple chains for you. In this case you likely have two chains, which gives 1000 * 2 = 2000 total samples; that is the first number in np.shape(pm.trace_to_dataframe(trace_N)).

And since the old plotting code draws the chains on top of each other, you see 2 lines when the random variable is a scalar or a 1-element tensor.
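The numbers in this thread can be checked with a little arithmetic. The sketch below assumes two chains (inferred from the two lines in the scalar plots, not confirmed by the output), and the column labels are illustrative rather than the exact names trace_to_dataframe produces.

```python
n_chains = 2    # assumed: pm.sample chose two chains on this machine
n_draws = 1000  # from pm.sample(1000, tune=1000); tuning draws are discarded
n_points = 10   # N = 10, so mu has 10 elements

scalar_vars = ["alpha", "beta", "sigma"]
# Illustrative labels only: one column per scalar RV plus one per element of mu.
columns = scalar_vars + [f"mu[{i}]" for i in range(n_points)]

rows = n_chains * n_draws        # rows of the dataframe: all chains stacked
mu_lines = n_chains * n_points   # lines in the old mu panel: one per element per chain

print((rows, len(columns)), mu_lines)
```

This gives (2000, 13) for the dataframe shape and 20 lines for mu, which matches the "around 20" lines seen in the last plot: the number of lines depends on chains and array elements, not on the number of samples.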
