Requesting help to understand the basics of Bayesian estimation

Please review the following code:

import pymc3 as pm  # PyMC3-style API (sd=..., trace.get_values)

# σ_μ (the prior sd for mu) and y (the observed data) are assumed defined earlier
with pm.Model() as model:
    # Priors — search space of μ, σ
    dist_μ = pm.Normal('mu', mu=0, sd=σ_μ)
    dist_σ = pm.Exponential('sigma', lam=1/5)
    # Likelihood, conditioned on the observed data y
    likelihood = pm.Normal('y', mu=dist_μ, sd=dist_σ, observed=y)
    trace = pm.sample(1000)
    print('[mu]: ', trace.get_values('mu').mean())
    print('[sigma]: ', trace.get_values('sigma').mean())
  1. Does pm.sample(1000) draw 1000 random samples of μ and σ and produce 1000 outputs using the likelihood above?
  2. Would the target μ and σ be the parameters of the most likely output among the 1000 outputs?
  3. My understanding of Bayes is: p(μ|x1,…,xN) ∝ p(x1|μ) * p(x2|μ) * … * p(xN|μ) * p(μ). How does the code logic above relate to this equation?

What confuses me is the observed parameter in the likelihood function. What is it for?
There seems to be a disconnect between the likelihood equation and pm.sample(1000). I don't know how these two statements work together…

I am not sure if I can explain this one.

Your PyMC model defines the posterior probability model P(mu, sigma | y) ∝ P(y | mu, sigma) * P(mu, sigma), or, because mu and sigma are independent a priori, ∝ P(y | mu, sigma) * P(mu) * P(sigma). The pm.sample statement then draws samples from this posterior via the NUTS algorithm.
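To connect this to your question 3: the density that pm.sample explores is, up to a normalizing constant, the product of the likelihood terms p(y_i | mu, sigma) and the priors p(mu) and p(sigma). A minimal numpy/scipy sketch of that unnormalized log posterior, using hypothetical values for y and σ_μ (your actual values aren't shown):

```python
import numpy as np
from scipy import stats

# Hypothetical stand-ins for the y and σ_μ in your model
y = np.array([4.2, 5.1, 3.8, 6.0, 4.9])
sigma_mu = 10.0

def log_posterior(mu, sigma):
    """Unnormalized log posterior: log P(y|mu,sigma) + log P(mu) + log P(sigma)."""
    if sigma <= 0:
        return -np.inf  # Exponential prior puts zero mass on sigma <= 0
    log_prior = (stats.norm.logpdf(mu, loc=0, scale=sigma_mu)
                 + stats.expon.logpdf(sigma, scale=5))  # lam=1/5 -> scale=5
    log_lik = stats.norm.logpdf(y, loc=mu, scale=sigma).sum()
    return log_prior + log_lik

# Parameter values that fit the data score far higher than ones that don't
print(log_posterior(4.8, 1.0) > log_posterior(-20.0, 1.0))
```

NUTS never needs the normalizing constant; a function like this (and its gradient) is all the sampler uses to decide where to move.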

The observed argument essentially transforms the prior joint model P(mu, sigma, y) into the posterior P(mu, sigma | y); in other words, it conditions the model on the observed values of y. If you remove the observed argument and call pm.sample, you will instead obtain samples from that prior.
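To illustrate that last point: without observed, sampling the model is just ancestral sampling from the joint prior, which a few lines of numpy can mimic (σ_μ = 10 is a hypothetical stand-in, since its value isn't shown in your snippet):

```python
import numpy as np

rng = np.random.default_rng(0)
sigma_mu = 10.0  # hypothetical prior scale, standing in for σ_μ above

# Ancestral sampling from the joint prior P(mu, sigma, y):
mu = rng.normal(loc=0.0, scale=sigma_mu, size=1000)   # mu ~ Normal(0, σ_μ)
sigma = rng.exponential(scale=5.0, size=1000)         # sigma ~ Exponential(lam=1/5)
y_sim = rng.normal(loc=mu, scale=sigma)               # y | mu, sigma ~ Normal(mu, sigma)

# Without `observed`, the mu draws spread over the whole prior: mean ≈ 0, std ≈ σ_μ
print(mu.mean(), mu.std())
```

Once observed=y is attached, the data pull those draws away from the prior and toward values of mu and sigma under which y is plausible.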


Am I correct that the 1000 samples from pm.sample(1000) aren't just any random mu and sigma, but 1000 mu & sigma pairs which, when fed into the likelihood function, produce outputs that somewhat fit the observed y data points?

No, they aren't just random samples; they are samples that represent a kind of weighted average between your prior and your data (via the likelihood function).
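This "weighted average" has a closed form in a simplified version of your model: if sigma were known, the Normal prior on mu would be conjugate, and the posterior mean of mu is a precision-weighted average of the prior mean and the sample mean of the data. A sketch with hypothetical numbers (not your actual y or σ_μ):

```python
import numpy as np

# Hypothetical data and prior; sigma is assumed known so the closed form applies
y = np.array([4.2, 5.1, 3.8, 6.0, 4.9])
sigma = 1.0                        # known likelihood sd
prior_mean, prior_sd = 0.0, 10.0   # stand-ins for the Normal(0, σ_μ) prior

# Conjugate normal-normal update: precisions (1/variance) act as the weights
prior_prec = 1 / prior_sd**2
data_prec = len(y) / sigma**2
post_mean = (prior_prec * prior_mean + data_prec * y.mean()) / (prior_prec + data_prec)

# The posterior mean sits between the prior mean (0) and the data mean,
# pulled almost entirely toward the data because the prior is vague
print(y.mean(), post_mean)
```

With mu and sigma both unknown there is no such closed form, which is exactly why pm.sample approximates the posterior with MCMC draws instead.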

More precisely, they are samples from the posterior distribution (you'll have to understand what that is to fully grasp what PyMC is giving you; this may help: Posterior Probability & the Posterior Distribution - Statistics How To).