Introduction
This post is a follow-up question to "What does pm.Flat really do?".
The following example illustrates part of ricardo's response to my previous post:
"The divergences come from NUTS not handling this hard/abrupt constraint, not from math."
Here I have chosen a likelihood whose parameter (the mean of a Normal) has support over the whole real line. The divergences that occurred in the previous example, where p \in (0,1), do not occur for \mu \in \mathbb{R}.
import pymc as pm
import matplotlib.pyplot as plt
import numpy as np
import arviz as az

# Generate some observed data from a normal distribution
np.random.seed(42)  # for reproducibility
observed_data = np.random.normal(loc=5.0, scale=2.0, size=100)

# Define the PyMC probabilistic model
with pm.Model() as normal_model:
    # Define a flat prior for the mean
    flat_prior = pm.Flat('flat_prior')

    # Define the likelihood with the flat prior as the mean
    likelihood = pm.Normal('likelihood', mu=flat_prior, sigma=2.0, observed=observed_data)

    # Perform inference
    trace = pm.sample(1000, tune=1000, return_inferencedata=True)

# Create a trace plot using ArviZ
az.plot_trace(trace)
plt.tight_layout()
plt.show()
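For contrast, here is roughly the kind of bounded-support setup I mean from the previous post. This is my reconstruction, not the exact earlier code (the names coin_data, bounded_model, and p are my own): a Flat prior used directly as a Bernoulli probability, which is where the divergences ricardo described show up.

# My reconstruction of the earlier bounded-support setup (an assumption, not
# the exact code from the previous post): a Flat prior used directly as a
# Bernoulli probability p.
coin_data = np.random.binomial(n=1, p=0.3, size=100)

with pm.Model() as bounded_model:
    # initval keeps the starting point inside (0, 1) so sampling can begin
    p = pm.Flat('p', initval=0.5)
    pm.Bernoulli('obs', p=p, observed=coin_data)
    # Expect divergence warnings here: whenever NUTS proposes p outside (0, 1)
    # the log-likelihood is -inf, i.e. the hard/abrupt constraint.
    bounded_trace = pm.sample(1000, tune=1000, return_inferencedata=True)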
I guess that the trace plot for the Normal model is showing me the posterior distribution for the mean parameter.
Since the mean was the only parameter I put a prior on, and it was the flat prior, does that have some explicit connection to maximum likelihood? My guess is that the expectation of the posterior (in this example) would be similar to the maximum likelihood estimate. Is that correct?
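For concreteness, this is the kind of comparison I have in mind (reusing observed_data and trace from the Normal model above; for a Normal likelihood with known sigma, the MLE of the mean is just the sample mean):

# Closed-form MLE of mu when sigma is known: the sample mean.
mle_mu = observed_data.mean()

# Posterior mean of mu sampled under the Flat prior.
posterior_mu = trace.posterior['flat_prior'].mean().item()

print(f"MLE of mu (sample mean): {mle_mu:.4f}")
print(f"Posterior mean of mu:    {posterior_mu:.4f}")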
Question
More broadly, if I have a model with only Flat or uniform priors, will the mode (if there is a single mode) of the posterior I sample be a maximum likelihood estimate (MLE)?
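The numerical check I would reach for here is pm.find_MAP on the same model: since a Flat prior adds nothing to the joint log-density, my understanding is that the optimizer is effectively maximizing the likelihood alone. A minimal sketch, reusing normal_model from above:

# With only a Flat prior, the joint log-density equals the log-likelihood,
# so the MAP point should coincide with the MLE (up to optimizer tolerance).
with normal_model:
    map_estimate = pm.find_MAP()

print(map_estimate['flat_prior'])  # posterior mode / MAP under the Flat prior
print(observed_data.mean())        # closed-form MLE of mu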
Why?
The main use case for me would be to show if/how a prior changed the inference relative to what we would have gotten from MLE (without switching away from PyMC!). For example, I tried to implement gamma mixtures without PyMC and found it difficult to get good behaviour, although I am happy to admit that I am not intimately familiar with building mixtures from lower-level optimizers…