Basic Bayesian inference formulation

Hi, a very simple inference question for you:
I have a prior with a mean and std, and some evidence data.
Is it correct to include both mean and std in one distribution, like so:
prior = pm.Normal("prior", mu=mean, sd=std)
pm.Normal("likelihood", mu=prior, observed=data)

or should one use a more hierarchical model, like
sd = pm.Normal("sd", mu=std)
mu = pm.Normal("mu", mu=mean, sd=sd)
pm.Normal("likelihood", mu=mu, observed=data)
I would very much appreciate some help!

In the first case, you’re providing a prior distribution with fixed parameters, whereas in the second you’re also providing distributions over those parameters, which means they can be learned from the data. I believe the latter case should be written as

with pm.Model() as model:
    sd = pm.Normal("sd", mu=std)
    mu = pm.Normal("mu", mu=mean)
    prior = pm.Normal("prior", mu=mu, sd=sd)
    pm.Normal("likelihood", mu=prior, observed=data)

so that you’re providing both a prior distribution for the data and priors for both parameters (whereas the way you have it you’re providing a prior for the standard deviation, and a prior for the data, but not for the mean). Hope that helps

Hi, @Elenchus! Thank you for getting back to me so quickly.
I like the second way you set up the parameters, as this allows increased flexibility! However, the parameters of the first example may also change. Let's say you use only a simple normal distribution as your prior:
mu = pm.Normal("mu", mu=mean, sd=std)
In this case the posterior of mu may no longer have mu=mean and sd=std, so even here the parameters of the prior are learned and updated. How is this different from the latter case?
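To make this point concrete: in the conjugate normal–normal case (known observation noise), the posterior of mu can be computed in closed form, and its parameters indeed differ from the prior's mu=mean and sd=std. A quick numpy sketch, with made-up numbers and an assumed known noise sigma (none of these values are from the thread):

```python
import numpy as np

# Hypothetical values for illustration only.
m0, s0 = 0.0, 1.0                       # prior: mu ~ Normal(m0, s0)
sigma = 2.0                             # assumed known observation noise
data = np.array([1.2, 0.8, 1.5, 1.1])  # made-up evidence data

# Conjugate normal-normal update for mu with sigma known:
n, xbar = len(data), data.mean()
post_prec = 1 / s0**2 + n / sigma**2
post_mean = (m0 / s0**2 + n * xbar / sigma**2) / post_prec
post_sd = post_prec**-0.5

# The posterior parameters have moved away from (m0, s0).
print(post_mean, post_sd)  # 0.575 0.707...
```

So even with fixed hyperparameters, the *posterior* of mu is updated by the data, which is exactly the behaviour described above.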


Ah yes, I phrased that very poorly, sorry. So from the hyperprior page on wikipedia, “use of a hyperprior allows one to express uncertainty in a hyperparameter: taking a fixed prior is an assumption, varying a hyperparameter of the prior allows one to do sensitivity analysis on this assumption, and taking a distribution on this hyperparameter allows one to express uncertainty in this assumption”

So the thing that is being learned in the latter case that’s not in the former is the distribution of likely parameter values, or the uncertainty around those values in the population. The parameter values can change because they can be drawn from a distribution of values, rather than being fixed to the most likely value
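One way to see the difference is to compare the two priors on mu themselves, before any data: with a hyperprior, uncertainty about sd feeds into mu, so the implied prior on mu is wider than the fixed-parameter one. A small Monte Carlo sketch with numpy (hypothetical values; abs() is used as a crude stand-in for a positive prior such as a HalfNormal):

```python
import numpy as np

rng = np.random.default_rng(0)
mean, std = 0.0, 1.0   # hypothetical hyperparameter values
n = 100_000

# Fixed version: hyperparameters pinned to point values.
mu_fixed = rng.normal(mean, std, n)

# Hyperprior version: sd is itself uncertain, so draw it first.
sd_draws = np.abs(rng.normal(std, 1.0, n))   # keep the scale positive
mu_hier = rng.normal(mean, sd_draws)

# The hierarchical prior on mu is wider, reflecting uncertainty in sd.
print(mu_fixed.std(), mu_hier.std())
```

The extra spread in `mu_hier` is precisely the "uncertainty around those values" that the fixed-parameter model cannot express.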

I was reading through Rethinking 2 today, and came across this paragraph, which is an interesting analogy for the difference between the first approach and the second:

“In an apocryphal telling of Hindu cosmology, it is said that the
Earth rests on the back of a great elephant, who in turn stands on the back of a massive turtle.
When asked upon what the turtle stands, a guru is said to reply, “it’s turtles all the way down.”
Statistical models don’t contain turtles, but they do contain parameters. And parameters
support inference. Upon what do parameters themselves stand? Sometimes, in some of
the most powerful models, it’s parameters all the way down. What this means is that any
particular parameter can be usefully regarded as a placeholder for a missing model. Given
some model of how the parameter gets its value, it is simple enough to embed the new model
inside the old one. This results in a model with multiple levels of uncertainty, each feeding
into the next—a multilevel model.”

And then later: “We will be interested in multilevel models primarily because they help us deal with over-fitting”


Is one of the two preferred over the other, or would you say both are correct, but the latter gives more flexibility in the posterior?

The former would be less computationally intensive - that’s unlikely to be a problem with a small model like this, but if you have a model where the number of parameters grows quickly with the data (e.g. for some types of time series models) then it becomes a problem. The latter is a more powerful model, because it can express the uncertainty in the parameters, and avoiding overfitting is good for applying your model to new data. I’d say both models are correct (insofar as models can ever be correct…); which you choose depends on the use case, but personally I’d lean towards the latter unless there’s a reason not to. Others with more experience than me may have more insight though