Traceplot for linear regression

Rahul_Deora · May 18, 2019, 4:50pm

In the output of pm.traceplot(samples), what are the blue and green lines? And what does the last plot, for mu, show?
The code is:

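import pymc3 as pm

# df is a DataFrame with weight and height columns (loaded elsewhere)
N = 10  # only the first 10 points
with pm.Model() as m_N:
    # priors for the intercept, slope and noise scale
    alpha = pm.Normal('alpha', mu=178, sd=100)
    beta = pm.Normal('beta', mu=0, sd=10)
    sigma = pm.Uniform('sigma', lower=0, upper=50)
    # one mu value per data point, stored in the trace as a length-N array
    mu = pm.Deterministic('mu', alpha + beta * df.weight[:N])
    height_hat = pm.Normal('height_hat', mu=mu, sd=sigma, observed=df.height[:N])
    trace_N = pm.sample(1000, tune=1000)

chain_N = trace_N[:]
pm.traceplot(chain_N)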

lucianopaz · May 19, 2019, 6:25am

To answer this, it helps to remember what is stored in the trace. The trace holds separate chains of points. Each point holds the values of all of the model's free RVs and deterministics. Some RVs are defined as arrays with many entries, such as mu, while others are simple scalars, like alpha, beta and sigma.

The old version of traceplot plots a trace for each chain. That is why you see two lines instead of one in the plots that correspond to alpha, beta and sigma. When the trace had non-scalar RVs (like mu), the old behavior of traceplot was to draw one line for each element in the RV (mu[0], mu[1], etc.) and also to draw separate traces for each chain.

The new version of traceplot splits the elements of array-like RVs into different axes in the figure (mu[0], mu[1], etc. are plotted in separate axes). Furthermore, you can pass combined=True to combine the results from every chain, which then shows a single line in each of the plots of alpha, beta and sigma.
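The line counts under the old behavior can be worked out with a few lines of plain Python. This is only a counting sketch, not PyMC3 code: the value n_chains = 2 is an assumption inferred from the two lines seen for the scalar variables, and var_sizes is a hypothetical helper dictionary.

```python
N = 10        # length of mu in the model above
n_chains = 2  # assumption: inferred from the two lines seen for alpha, beta, sigma

# Number of elements in each variable (scalars have one element).
var_sizes = {"alpha": 1, "beta": 1, "sigma": 1, "mu": N}

# Old behavior: one panel per variable, one line per (element, chain) pair.
lines_per_panel = {name: size * n_chains for name, size in var_sizes.items()}

print(lines_per_panel)
# {'alpha': 2, 'beta': 2, 'sigma': 2, 'mu': 20}
```

Under these assumptions the mu panel contains 10 elements times 2 chains, i.e. 20 lines.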

Rahul_Deora · May 19, 2019, 10:06am

Thanks, I got some clarity. Each chain is the value of mu for each data point. When I do np.shape(pm.trace_to_dataframe(trace_N)) I get (2000, 13) so there are 2000 samples. So there are 2000 chains?

The old version of traceplot plots a trace for each chain. That is why you see two lines instead of one in the plots that correspond to alpha, beta and sigma.

I don't quite get this. Shouldn't there be 2000 lines then?

When the trace had non-scalar RVs (like mu), the old behavior of traceplot was to draw one line for each element in the RV (mu[0], mu[1], etc.) and also to draw separate traces for each chain.

Again, shouldn't there be 2000 then? The last plot above has around 20.

Also, np.shape(trace_N) is (1000, 1) but np.shape(pm.trace_to_dataframe(trace_N)) is (2000, 13). If I ran trace_N = pm.sample(1000, tune=1000), i.e. sampled 1000 draws, why is the first dimension 2000?

junpenglao · May 19, 2019, 2:34pm

Depending on your settings (number of CPUs, etc.), a pm.sample call will run multiple chains for you. In this case you likely have two chains, which means you have 1000 * 2 = 2000 samples in total. That is the first number you see in np.shape(pm.trace_to_dataframe(trace_N)).

And since the old plotting code draws the chains on top of one another, you will see 2 lines when the random variable is a scalar or a 1-element tensor.
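The (2000, 13) shape can be reproduced with a small stand-in that uses no PyMC3 at all. This is a minimal sketch under stated assumptions: two chains, random numbers in place of real posterior draws, and illustrative column names (the mu_0, mu_1, ... labels are hypothetical, not necessarily what trace_to_dataframe emits).

```python
import random

N = 10          # data points used in the model
n_draws = 1000  # draws per chain, as in pm.sample(1000, ...)
n_chains = 2    # assumption: the number of chains that pm.sample ran

# Three scalar variables plus one column per element of mu.
# The mu_i names are illustrative only.
columns = ["alpha", "beta", "sigma"] + [f"mu_{i}" for i in range(N)]

rng = random.Random(0)
# chains[c][d] is draw d of chain c: a dict with one value per column.
chains = [
    [{name: rng.gauss(0.0, 1.0) for name in columns} for _ in range(n_draws)]
    for _ in range(n_chains)
]

# Stack the chains one after another: one table row per draw.
rows = [draw for chain in chains for draw in chain]
table_shape = (len(rows), len(columns))

print(table_shape)  # (2000, 13)
```

So 2000 is the number of chains times the draws per chain, not the number of chains, and 13 is the 3 scalar variables plus the 10 entries of mu.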
