Is the mode of a parameter in a model with only pm.Flat priors a maximum likelihood estimate?

Introduction

This post is a follow-up question to What does pm.Flat really do?.

The following example illustrates part of ricardo’s response to my previous post:

The divergences come from NUTS not handling this hard/abrupt constraint, not from math.

where this time I have chosen a distribution whose parameter has support over the whole real line. The divergences that occurred in the previous example, where p \in (0,1), do not occur for \mu \in \mathbb{R}.

import pymc as pm
import matplotlib.pyplot as plt
import numpy as np
import arviz as az

# Generate some observed data from a normal distribution
np.random.seed(42)  # for reproducibility
observed_data = np.random.normal(loc=5.0, scale=2.0, size=100)

# Define the PyMC probabilistic model
with pm.Model() as normal_model:
    # Define a flat prior for the mean
    flat_prior = pm.Flat('flat_prior')
    
    # Define the likelihood with the flat prior as the mean
    likelihood = pm.Normal('likelihood', mu=flat_prior, sigma=2.0, observed=observed_data)
    
    # Perform inference
    trace = pm.sample(1000, tune=1000, return_inferencedata=True)

# Create a trace plot using ArviZ
az.plot_trace(trace)
plt.tight_layout()
plt.show()

I guess I am looking at the posterior distribution for the mean parameter.

Since the mean was the only parameter I put a prior on, and that prior was flat, does that have some explicit connection to maximum likelihood? My guess is that the expectation of the posterior (in this example) would be close to the maximum likelihood estimate. Is that correct?

Question

More broadly, if I have a model with only Flat or uniform priors, will the mode (if there is a single mode) of the posterior I sample be a maximum likelihood estimate (MLE)?

Why?

The main use case for me would be to show if/how a prior changed the inference relative to what we would have gotten with MLE (without switching away from PyMC!). For example, I tried to implement gamma mixtures outside of PyMC and found it difficult to get good behaviour, although I am happy to admit that I am not intimately familiar with building mixtures on top of lower-level optimizers…

I think it should be the same as the MLE, although to find the mode you have to rely on some kernel density interpolation, since pm.sample gives you back draws rather than a closed-form posterior.
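
For concreteness, here is a minimal sketch of that idea (my addition, assuming the trace and the variable name flat_prior from the model above), using scipy’s gaussian_kde to approximate the posterior mode from the draws:

import numpy as np
from scipy.stats import gaussian_kde

# Flatten the (chain, draw) samples for the mean parameter
# (assumes `trace` from the pm.sample call above)
draws = trace.posterior["flat_prior"].values.ravel()

# Kernel density estimate over the draws
kde = gaussian_kde(draws)

# Evaluate the density on a grid and take the argmax as the approximate mode
grid = np.linspace(draws.min(), draws.max(), 1000)
mode_estimate = grid[np.argmax(kde(grid))]
print("Approximate posterior mode:", mode_estimate)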

pm.find_MAP should certainly give you the same as MLE since it is a maximization routine.
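
To spell out the reasoning (a standard argument, summarised here): with a flat prior the unnormalised posterior is proportional to the likelihood, so the two share the same maximiser,

p(\mu \mid y) \propto p(y \mid \mu)\, p(\mu) \propto p(y \mid \mu) \quad \text{when } p(\mu) \propto \text{const},

so \arg\max_\mu p(\mu \mid y) = \arg\max_\mu p(y \mid \mu), i.e. the MAP coincides with the MLE (provided the flat/uniform prior does not exclude the region where the likelihood is maximised).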

1 Like

The expectation is a different thing: that would be the mean of the posterior, and it will not match the mode/MLE unless you have a symmetric/non-skewed (whatever properties make mean == mode) posterior distribution.
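
In the normal example above, for instance, the posterior for \mu under the flat prior (with \sigma known) is itself normal, so mean and mode do coincide there:

\mu \mid y \sim \mathcal{N}\!\left(\bar{y},\ \sigma^2 / n\right), \qquad \mathbb{E}[\mu \mid y] = \operatorname{mode}(\mu \mid y) = \bar{y} = \hat{\mu}_{\text{MLE}}.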

1 Like

100% agree. I’m assuming that in this example the expectation and mode align because I am using a normal likelihood where the underlying sample is also normal. Is that not correct for this example?

Thanks for this.

I changed the example over to pm.find_MAP:

import pymc as pm
import matplotlib.pyplot as plt
import numpy as np
import arviz as az

# Generate some observed data from a normal distribution
np.random.seed(42)  # for reproducibility
observed_data = np.random.normal(loc=5.0, scale=2.0, size=100)

# Define the PyMC probabilistic model
with pm.Model() as normal_model:
    # Define a flat prior for the mean
    flat_prior = pm.Flat('flat_prior')
    
    # Define the likelihood with the flat prior as the mean
    likelihood = pm.Normal('likelihood', mu=flat_prior, sigma=2.0, observed=observed_data)
    
    # Find the MAP estimate
    map_estimate = pm.find_MAP()

# Print the MAP estimate
print("MAP estimate:", map_estimate)

The result was:

WARNING (pytensor.tensor.blas): Using NumPy C-API based implementation for BLAS functions.

MAP ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━   0% 0:00:13 logp = -489.11, ||grad|| = 119.81
MAP estimate: {'flat_prior': array(4.79230697)}

That seems like a pretty reasonable estimate given that the true parameter is 5.

It is a little silly that I named the parameter itself “flat prior”, but hopefully you get my drift.
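
As a quick sanity check (my addition, reusing observed_data and map_estimate from the code above): for a normal likelihood with known \sigma, the MLE of the mean is just the sample mean, so the MAP under the flat prior should agree with it up to optimizer tolerance.

# The closed-form MLE of mu (normal likelihood, known sigma) is the sample mean,
# so the flat-prior MAP should match it up to optimizer tolerance.
sample_mean = observed_data.mean()
print("Sample mean (closed-form MLE):", sample_mean)
print("MAP estimate:", map_estimate["flat_prior"])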

1 Like