Requesting help to understand the basics of Bayesian estimation

Please review the following code:

import pymc3 as pm  # PyMC3-style API (sd=..., trace.get_values)

# σ_μ (the prior sd for mu) and y (the observed data) are assumed defined earlier
with pm.Model() as model:
    # Priors — search space of μ, σ
    dist_μ = pm.Normal('mu', mu=0, sd=σ_μ)
    dist_σ = pm.Exponential('sigma', lam=1/5)
    # Likelihood, conditioned on the observed data y
    likelihood = pm.Normal('y', mu=dist_μ, sd=dist_σ, observed=y)
    trace = pm.sample(1000)
    print('[mu]: ', trace.get_values('mu').mean())
    print('[sigma]: ', trace.get_values('sigma').mean())
  1. Does pm.sample(1000) draw 1000 random samples of μ and σ and produce 1000 outputs using the likelihood above?
  2. Would the target μ and σ be the parameters of the most likely output among the 1000 outputs?
  3. My understanding of Bayes is: p(μ|x1,…,xN) ∝ p(x1|μ) * p(x2|μ) * … * p(xN|μ) * p(μ). How does the code logic above relate to this equation?

What confuses me is the observed parameter in the likelihood function. What is it for?
There seems to be a disconnect between the likelihood equation and pm.sample(1000). I don't know how these two statements work together…

I am not sure if I can explain this one.

Your PyMC model defines the posterior probability model P(mu, sigma | y) ∝ P(y | mu, sigma) * P(mu, sigma), or, because mu and sigma are independent a priori, ∝ P(y | mu, sigma) * P(mu) * P(sigma). The pm.sample statement then draws samples from this posterior via the NUTS algorithm.
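To connect this to your question 3: the density that pm.sample explores is, up to a normalizing constant, the product of the likelihood terms p(y_i | mu, sigma) and the priors p(mu) and p(sigma). A minimal numpy/scipy sketch of that unnormalized log posterior, using hypothetical values for y and σ_μ (your actual values aren't shown):

```python
import numpy as np
from scipy import stats

# Hypothetical stand-ins for the y and σ_μ in your model
y = np.array([4.2, 5.1, 3.8, 6.0, 4.9])
sigma_mu = 10.0

def log_posterior(mu, sigma):
    """Unnormalized log posterior: log P(y|mu,sigma) + log P(mu) + log P(sigma)."""
    if sigma <= 0:
        return -np.inf  # Exponential prior puts zero mass on sigma <= 0
    log_prior = (stats.norm.logpdf(mu, loc=0, scale=sigma_mu)
                 + stats.expon.logpdf(sigma, scale=5))  # lam=1/5 -> scale=5
    log_lik = stats.norm.logpdf(y, loc=mu, scale=sigma).sum()
    return log_prior + log_lik

# Parameter values that fit the data score far higher than ones that don't
print(log_posterior(4.8, 1.0) > log_posterior(-20.0, 1.0))
```

NUTS never needs the normalizing constant; a function like this (and its gradient) is all the sampler uses to decide where to move.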

The observed argument essentially transforms the prior joint model P(mu, sigma, y) into the posterior P(mu, sigma | y); in other words, it conditions the model on the observed values of y. If you remove the observed argument and call pm.sample, you will instead obtain samples from that prior.
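To illustrate that last point: without observed, sampling the model is just ancestral sampling from the joint prior, which a few lines of numpy can mimic (σ_μ = 10 is a hypothetical stand-in, since its value isn't shown in your snippet):

```python
import numpy as np

rng = np.random.default_rng(0)
sigma_mu = 10.0  # hypothetical prior scale, standing in for σ_μ above

# Ancestral sampling from the joint prior P(mu, sigma, y):
mu = rng.normal(loc=0.0, scale=sigma_mu, size=1000)   # mu ~ Normal(0, σ_μ)
sigma = rng.exponential(scale=5.0, size=1000)         # sigma ~ Exponential(lam=1/5)
y_sim = rng.normal(loc=mu, scale=sigma)               # y | mu, sigma ~ Normal(mu, sigma)

# Without `observed`, the mu draws spread over the whole prior: mean ≈ 0, std ≈ σ_μ
print(mu.mean(), mu.std())
```

Once observed=y is attached, the data pull those draws away from the prior and toward values of mu and sigma under which y is plausible.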


Am I correct that the 1000 samples from pm.sample(1000) aren't just any random mu and sigma, but 1000 mu & sigma pairs which, when fed into the likelihood function, produce outputs that somewhat fit the observed y data points?

No, they aren't just random samples; they are samples that represent a kind of weighted average between your prior and your data (via the likelihood function).
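This "weighted average" has a closed form in a simplified version of your model: if sigma were known, the Normal prior on mu would be conjugate, and the posterior mean of mu is a precision-weighted average of the prior mean and the sample mean of the data. A sketch with hypothetical numbers (not your actual y or σ_μ):

```python
import numpy as np

# Hypothetical data and prior; sigma is assumed known so the closed form applies
y = np.array([4.2, 5.1, 3.8, 6.0, 4.9])
sigma = 1.0                        # known likelihood sd
prior_mean, prior_sd = 0.0, 10.0   # stand-ins for the Normal(0, σ_μ) prior

# Conjugate normal-normal update: precisions (1/variance) act as the weights
prior_prec = 1 / prior_sd**2
data_prec = len(y) / sigma**2
post_mean = (prior_prec * prior_mean + data_prec * y.mean()) / (prior_prec + data_prec)

# The posterior mean sits between the prior mean (0) and the data mean,
# pulled almost entirely toward the data because the prior is vague
print(y.mean(), post_mean)
```

With mu and sigma both unknown there is no such closed form, which is exactly why pm.sample approximates the posterior with MCMC draws instead.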

More precisely, they are samples from the posterior distribution (you'll have to understand what that is to fully grasp what PyMC is giving you; this may help: Posterior Probability & the Posterior Distribution - Statistics How To).