My understanding from Bayes’ formula is that, to get a posterior probability, we need the prior probability and the likelihood of the observed values.

Having said that, if we define a model without passing any “observed” data and then call pm.sample(), the documentation says that it samples from the posterior. So how do we get a posterior distribution without “observed” data?

In that case it samples from the prior. The documentation just describes the most common use, which is to call it on a model with observed nodes (or a Potential that corresponds to the likelihood).

One follow up question:
When we call pm.sample without observed data, the returned idata instance contains “posterior” and “sample_stats”. Given that we are sampling from the prior, would it make more sense to return an idata instance with “prior” populated, similar to what is obtained when calling sample_prior_predictive?

In other words, should we expect the same behavior when pm.sample is queried with no observed data, as we would when calling pm.sample_prior_predictive?

I think this might just be about semantics, but when there is no observed data in the likelihood, pm.sample still is returning the posterior distribution. It’s just that because there was no observed data the posterior = prior, because that’s how Bayesian updating works. If you have a prior, and then you observe no data to update the prior, the “new” posterior must be the same as the prior. Anything else wouldn’t make sense!
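To make the "posterior = prior" point concrete, here is a minimal sketch using a conjugate Beta-Binomial update instead of PyMC. The function name `beta_binomial_update` and the prior values are made up for illustration. With an empty list of observations the update leaves the Beta parameters untouched, so the posterior is exactly the prior.

```python
def beta_binomial_update(alpha, beta, observations):
    """Conjugate update of a Beta(alpha, beta) prior with 0/1 observations."""
    successes = sum(observations)
    failures = len(observations) - successes
    # Each success adds 1 to alpha, each failure adds 1 to beta.
    return alpha + successes, beta + failures


prior = (2.0, 3.0)  # hypothetical Beta(2, 3) prior

# No observed data: nothing is added, so the posterior equals the prior.
posterior_no_data = beta_binomial_update(*prior, observations=[])

# Some observed data: the parameters move away from the prior.
posterior_with_data = beta_binomial_update(*prior, observations=[1, 1, 0, 1])

print(posterior_no_data)
print(posterior_with_data)
```

The same logic is what pm.sample relies on: with no likelihood term in the model, the only density left to sample from is the prior.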

In other words, should we expect the same behavior when pm.sample is queried with no observed data, as we would when calling pm.sample_prior_predictive?

Yes, but it’ll be called (I think correctly) posterior in the trace.

My take there was that we can’t really distinguish a prior from a posterior, so we just go with the most common use. pm.sample can perfectly well be used for prior, posterior and posterior predictive sampling. We just don’t want to bother the user with specifying which one it is. They can change the InferenceData group easily.

Sometimes, but not always. If there are transforms that distort the prior like ordered or sumto1 in a variable, pm.sample will provide different (and correct) draws from the prior whereas prior predictive won’t. Similarly Potentials are only taken into account in pm.sample. Otherwise, yes they are equivalent.

Speaking of Potentials, they are also ambiguous as to whether they correspond to prior terms, likelihood terms, or both. Therefore we can’t know whether a model without observations but with Potentials corresponds to a prior or a posterior.
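To illustrate why the two approaches can disagree, here is a rough sketch in plain Python (not PyMC code, and not how PyMC implements it). It assumes two standard normal variables with the constraint x0 < x1, written as a Potential-style term in the log density; an ordered transform enforces the same ordering through reparameterization instead. Forward sampling draws each variable independently and ignores the constraint, while a simple random-walk Metropolis sampler targets the full log density and respects it. All function names and the step size are made up for the example.

```python
import math
import random


def log_density(x0, x1):
    """Two independent standard normals plus a hard ordering constraint."""
    if x0 >= x1:
        return -math.inf  # constraint violated: zero probability
    return -0.5 * (x0 * x0 + x1 * x1)


def forward_draws(rng, n):
    """Forward sampling: draw each variable on its own, constraint ignored."""
    return [(rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)) for _ in range(n)]


def metropolis_draws(rng, n, step=0.8):
    """Random-walk Metropolis on the full log density, constraint included."""
    x0, x1 = -1.0, 1.0  # start from a point that satisfies the constraint
    current = log_density(x0, x1)
    draws = []
    for _ in range(n):
        p0 = x0 + rng.gauss(0.0, step)
        p1 = x1 + rng.gauss(0.0, step)
        proposed = log_density(p0, p1)
        # Accept with probability min(1, exp(proposed - current)).
        # 1 - random() lies in (0, 1], so the log is always defined.
        if math.log(1.0 - rng.random()) < proposed - current:
            x0, x1, current = p0, p1, proposed
        draws.append((x0, x1))
    return draws


def fraction_ordered(draws):
    return sum(1 for a, b in draws if a < b) / len(draws)


rng = random.Random(1)
forward_fraction = fraction_ordered(forward_draws(rng, 5000))
mcmc_fraction = fraction_ordered(metropolis_draws(rng, 5000))
print(forward_fraction)  # roughly half of the forward draws are ordered
print(mcmc_fraction)     # every MCMC draw satisfies x0 < x1
```

The forward draws are valid draws from the unconstrained normals, but only the MCMC draws come from the prior that the constrained model actually defines.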

The function names, although useful for beginners, are a bit misleading. The real distinction is that the predictive ones do forward/ancestral sampling, while pm.sample does MCMC sampling.

@ricardoV94 just to confirm my understanding, when you say “forward / ancestral sampling”, do you mean that “forward sampling” and “ancestral sampling” are the same thing? I can see from the discussion on this issue that there has been some debate on this previously, so I am just seeking clarification on this point.

Yes, both refer to the same idea of taking random draws from the ancestral nodes and propagating those downstream/forward to other nodes that depend on them.
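As a small illustration of that idea, here is a sketch in plain Python for a hypothetical two-node model, mu ~ Normal(0, 1) and y ~ Normal(mu, 0.5). The model and the function name `forward_sample` are assumptions for the example, not anything taken from PyMC internals. The root node is drawn first, and its value is then passed forward to the node that depends on it.

```python
import random


def forward_sample(rng):
    # Ancestral (root) node: it has no parents, so draw it directly.
    mu = rng.gauss(0.0, 1.0)
    # Downstream node: it depends on mu, so draw it after its parent.
    y = rng.gauss(mu, 0.5)
    return mu, y


rng = random.Random(0)
draws = [forward_sample(rng) for _ in range(20000)]

# Marginally, y should have mean 0 and variance 1**2 + 0.5**2 = 1.25.
y_mean = sum(y for _, y in draws) / len(draws)
y_var = sum((y - y_mean) ** 2 for _, y in draws) / len(draws)
print(y_mean, y_var)
```

No density is ever evaluated here; each draw only needs random number generation in parent-to-child order, which is why this kind of sampling cannot account for extra log-density terms.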

Thanks @ricardoV94 for the explanation. I have a follow-up question on the same.

Sometimes, but not always. If there are transforms that distort the prior like ordered or sumto1 in a variable, pm.sample will provide different (and correct) draws from the prior whereas prior predictive won’t.

Could you help me understand how the ordered and sumto1 transforms would distort the prior when using pm.sample() but not when using pm.sample_prior_predictive()? An example would help.