Trace scaled by number of data samples

alexisperrier · February 22, 2020, 7:01am

I noticed that the trace is divided by the number of samples of the data and can’t figure out why.
Also it is confusing when plotting or interpreting the summary.

For instance, in the following coin flip code, the mean(trace[‘p’]) is scaled with respect to the number of data samples:

number_trials = 100
p       = 0.4
data    = st.bernoulli.rvs(p, size=number_trials)

with pm.Model() as model:
    p       = pm.Uniform('p', 0,1)
    y       = pm.Binomial('y', n = number_trials, p=p, observed=data)
    trace   = pm.sample(2000)
pm.summary(trace)

gives np.mean(trace[‘p’]) = 0.0043 ~ 0.4 / 100

and if I change the number of data samples, for instance to number_trials = 150, I get

np.mean(trace[‘p’]) = 0.0031 ~ 0.4 / 150

Same scaling by number of samples shows up in trace_plot, and other plots.

Feels like I’m missing something. I would expect the results based on the trace not to be scaled by the number of data samples.

nkaimcaudle · February 23, 2020, 1:35am

In your current setup you are running 100 independant Binomial(100, p) trials. The observed data is either 0 or 1 out of 100, so p is going to be 0.4/100.

I guess you want one of these formulations:

with pm.Model() as model:
    p       = pm.Uniform('p', 0, 1)
    y       = pm.Bernoulli('y', p=p, observed=data)
    trace   = pm.sample(2000)
pm.summary(trace)


with pm.Model() as model:
    p       = pm.Uniform('p', 0, 1)
    y       = pm.Binomial('y', n = number_trials, p=p, observed=data.sum())
    trace   = pm.sample(2000)
pm.summary(trace)


with pm.Model() as model:
    p       = pm.Uniform('p', 0, 1)
    y       = pm.Binomial('y', n = np.ones_like(data, dtype=np.int), p=p, observed=data)
    trace   = pm.sample(2000)
pm.summary(trace)

the first and third are equivalent, they run 100 Binomial(1, p) trials
the second runs a single Binomal(100, p) trial

alexisperrier · February 24, 2020, 7:19am

Thank you very much for your answer. That makes total sense.

Topic		Replies	Views
Shape of Trace: number of columns depends on observations Questions	1	514	October 30, 2019
PyMC3 - Differences in ways observations are passed to model -> difference in results? Questions	2	2172	October 26, 2017
Traceplot for linear regression Questions	3	3082	May 19, 2019
The 95% interval of PyMC3 trace doesn't cove the real values Questions	8	1857	July 21, 2017
To retrieve posterior samples returned by pm.sample Questions	1	408	May 6, 2020

Trace scaled by number of data samples

Related topics