# Trace scaled by number of data samples

I noticed that the trace is divided by the number of data samples and I can't figure out why. It also makes plotting and interpreting the summary confusing.

For instance, in the following coin flip code, `np.mean(trace['p'])` is scaled by the number of data samples:

```python
import numpy as np
import scipy.stats as st
import pymc3 as pm

number_trials = 100
p = 0.4
data = st.bernoulli.rvs(p, size=number_trials)

with pm.Model() as model:
    p = pm.Uniform('p', 0, 1)
    y = pm.Binomial('y', n=number_trials, p=p, observed=data)
    trace = pm.sample(2000)
pm.summary(trace)
```

gives `np.mean(trace['p'])` = 0.0043 ≈ 0.4 / 100,

and if I change the number of data samples, for instance to `number_trials = 150`, I get

`np.mean(trace['p'])` = 0.0031 ≈ 0.4 / 150.

The same scaling by the number of samples shows up in `pm.traceplot` and other plots.

I feel like I'm missing something. I would expect the results based on the trace not to be scaled by the number of data samples.

In your current setup you are running 100 independent Binomial(n=100, p) trials, but each observed value is either 0 or 1 successes out of 100, so the posterior for p is going to concentrate near 0.4/100.
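You can see this without MCMC via the conjugate analysis: a Uniform(0, 1) prior is Beta(1, 1), and treating each 0/1 observation as Binomial(n=100, p) gives a Beta posterior whose mean sits near mean(data)/100. A minimal sketch of that arithmetic (assuming only numpy/scipy, not PyMC):

```python
import numpy as np
import scipy.stats as st

np.random.seed(0)
number_trials = 100
data = st.bernoulli.rvs(0.4, size=number_trials)  # 0/1 observations

# Mis-specified model: each 0/1 value is read as "successes out of 100".
# With a Beta(1, 1) (uniform) prior the posterior is conjugate:
k = data.sum()                   # total "successes" the model sees
n_total = 100 * number_trials    # total "trials" the model believes it saw
posterior_mean = (1 + k) / (2 + n_total)

print(posterior_mean)  # near 0.4 / 100 = 0.004, not 0.4
```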

I guess you want one of these formulations:

```python
with pm.Model() as model:
    p = pm.Uniform('p', 0, 1)
    y = pm.Bernoulli('y', p=p, observed=data)
    trace = pm.sample(2000)
pm.summary(trace)

with pm.Model() as model:
    p = pm.Uniform('p', 0, 1)
    y = pm.Binomial('y', n=number_trials, p=p, observed=data.sum())
    trace = pm.sample(2000)
pm.summary(trace)

with pm.Model() as model:
    p = pm.Uniform('p', 0, 1)
    y = pm.Binomial('y', n=np.ones_like(data, dtype=int), p=p, observed=data)
    trace = pm.sample(2000)
pm.summary(trace)
```

The first and third are equivalent: they run 100 Binomial(1, p) trials.
The second runs a single Binomial(100, p) trial.
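As a sanity check, again via the conjugate posterior rather than sampling: for 0/1 data the Bernoulli likelihood equals the Binomial(n=1, p) likelihood, and the single Binomial(n=100, p) on `data.sum()` shares the same sufficient statistic (k successes out of N trials), so under the uniform prior all three formulations give the same Beta(1 + k, 1 + N − k) posterior, with mean near 0.4. A sketch assuming only numpy/scipy:

```python
import numpy as np
import scipy.stats as st

np.random.seed(0)
N = 100
data = st.bernoulli.rvs(0.4, size=N)
k = data.sum()

# Under a Uniform (Beta(1, 1)) prior, all three corrected models reduce to:
#   Bernoulli(p) per observation == Binomial(n=1, p) per observation,
#   Binomial(n=N, p) on data.sum() has the same sufficient statistic (k, N).
posterior = st.beta(1 + k, 1 + N - k)
print(posterior.mean())  # near the true p = 0.4
```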
