I’m new to Bayesian modelling, so forgive me if this question is obvious. I initialize a model,
m1, and an identical model, m2:
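import scipy.stats
import pymc3 as pm
samples = scipy.stats.norm.rvs(loc=1, scale=1, size=10000)  # scipy takes scale, not sd
with pm.Model() as m1:
    mu = pm.Normal("mu", mu=0, sd=1)  # assumed: the prior on mu is missing from the post
    std = pm.Gamma("std", mu=0.5, sd=3)
    output = pm.Normal("output", mu=mu, sd=std, observed=samples)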
However, when I evaluate the log-probability of every point in the trace sampled from m1:
logp = m1.logp
logps = [logp(trace[i]) for i in range(len(trace))]
and compare against the same computation with m2:
logp_2 = m2.logp
logps_2 = [logp_2(trace[i]) for i in range(len(trace))]
I get the same answers, even though m1 has been trained but m2 has not. Can someone please explain why this is?
This is an interesting question, and the kind of question I like: what exactly is model fitting/calibrating, and how does it relate to sampling?
The first thing to remember is that the model logp is a function: it takes an input and spits out an output. Once you have defined your model, the logp is fixed. It takes the free parameters as input and outputs a scalar. In this case you have two inputs, mu and std.
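To make this concrete, here is a minimal sketch in plain Python rather than PyMC3. The helper `make_logp` is hypothetical and only computes the Normal log-likelihood of the data (the priors are left out for brevity), so it is a stand-in for what `m1.logp` does, not the PyMC3 implementation. It builds the function twice from the same data, as with m1 and m2, and evaluates both at the same points.

```python
import math
import random


def make_logp(data):
    """Hypothetical stand-in for model.logp: Normal log-likelihood of fixed data."""
    n = len(data)

    def logp(mu, std):
        # Sum of log N(x | mu, std) over all observations.
        sq = sum((x - mu) ** 2 for x in data)
        return -n * math.log(std) - 0.5 * n * math.log(2 * math.pi) - sq / (2 * std ** 2)

    return logp


rng = random.Random(0)
samples = [rng.gauss(1.0, 1.0) for _ in range(1000)]

# Two "identical models": the same function built from the same data.
logp_m1 = make_logp(samples)
logp_m2 = make_logp(samples)

# Evaluating logp_m1 many times (which is all a sampler does) changes nothing
# inside it, so both functions agree at every point.
points = [(0.5, 1.0), (1.0, 1.0), (1.0, 2.0)]
values_m1 = [logp_m1(mu, std) for mu, std in points]
values_m2 = [logp_m2(mu, std) for mu, std in points]
print(values_m1 == values_m2)
```

The output depends only on the data and on the point you pass in, which is why it does not matter whether the model has been sampled from.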
Now, think of model fitting in the traditional sense, which gives you a single “best” value for each free parameter. Even when you fit the model this way, you don't change the model logp. In that sense, I like to think of modelling as constructing a space, and our goal is to extract information from that space. Sometimes taking one point from this multi-dimensional space is enough for your application, so we do MLE to get a vector of best values. But most of the time we need more, and that is where sampling comes in: it maps out the geometry of that space (approximately).
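The difference between taking one point and mapping out the space can be sketched as follows. This is an illustration in plain Python under stated assumptions: the log-likelihood is written by hand, the prior is assumed flat, and the sampler is a basic random-walk Metropolis rather than the NUTS sampler PyMC3 would normally use. All names here are made up for the example.

```python
import math
import random

rng = random.Random(42)
data = [rng.gauss(1.0, 1.0) for _ in range(200)]
n = len(data)


def logp(mu, std):
    """Normal log-likelihood of the fixed data; -inf outside the support."""
    if std <= 0:
        return -math.inf
    sq = sum((x - mu) ** 2 for x in data)
    return -n * math.log(std) - 0.5 * n * math.log(2 * math.pi) - sq / (2 * std ** 2)


# One point from the space: the closed-form maximum-likelihood estimate.
mu_hat = sum(data) / n
std_hat = math.sqrt(sum((x - mu_hat) ** 2 for x in data) / n)
logp_at_mle_before = logp(mu_hat, std_hat)

# Many points from the space: random-walk Metropolis (flat prior assumed).
draws = []
mu, std = mu_hat, std_hat
current = logp(mu, std)
for _ in range(5000):
    mu_new = mu + rng.gauss(0.0, 0.1)
    std_new = std + rng.gauss(0.0, 0.1)
    proposed = logp(mu_new, std_new)
    # Accept with probability min(1, exp(proposed - current)).
    if rng.random() < math.exp(min(0.0, proposed - current)):
        mu, std, current = mu_new, std_new, proposed
    draws.append((mu, std))

# The function itself is untouched by either procedure.
logp_at_mle_after = logp(mu_hat, std_hat)
print(logp_at_mle_before == logp_at_mle_after)
```

Both procedures only ever call `logp`. The MLE keeps the single highest point, while the sampler keeps a cloud of points whose density follows the shape of the function.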
What helps is a more intuitive understanding of the (log-)likelihood function. You might find my recent talk @pydataberlin useful: https://github.com/junpenglao/All-that-likelihood-with-PyMC3
That makes a lot more sense! Thank you @junpenglao!