I was curious about pm.Flat, so I tried it in a simple model:
import pymc as pm
import matplotlib.pyplot as plt
import numpy as np
import arviz as az
# Observed data (coin flips: 1 for heads, 0 for tails)
coin_flips_data = np.array([1, 0, 1, 1, 0, 1, 1, 1, 0, 0])
# Define the PyMC probabilistic model
with pm.Model() as coin_model:
    # Define prior for the bias of the coin
    flat = pm.Flat('flat')  # Flat prior
    # Define likelihood of the observed data
    likelihood = pm.Bernoulli('likelihood', p=flat, observed=coin_flips_data)
    # Perform inference
    trace = pm.sample(1000, tune=1000)
# Create a trace plot using ArviZ
az.plot_trace(trace)
plt.tight_layout()
plt.show()
So it samples (with many divergences), and it even looks like it kinda learned a distribution for a latent probability, but I'm still confused about what this prior means. I get from the documentation that zero is added in log-probability space, which in terms of probability (density) would mean multiplying by one. But if it were truly a uniform improper prior over, say, the reals, I don't see how you could get stable samples, since arbitrarily large samples would be allowed.
Could someone give me the run-down of how it works and what its intended use cases look like?
This is true, and it’s why you’re not able to do forward sampling from pm.Flat. If you do pm.sample_prior_predictive on your model, you’ll get an error.
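For instance, with the coin_model from the question, something like this should fail (a NotImplementedError in the PyMC versions I've tried), because pm.Flat defines no forward sampler:

with coin_model:
    # pm.Flat has no random/forward sampler, so prior predictive sampling errors out
    prior_draws = pm.sample_prior_predictive()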
If you're using NUTS, I'd say it has no use case. Maybe if you're fitting a model via MAP, it would allow you to recover the MLE estimates exactly. I use it for predictive modeling in cases where I want to make sure PyMC is correctly sampling from the posterior and not the prior (taking advantage of the error it raises). But in general it should just be avoided.
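As a rough sketch of that MAP point (my own example, not from the thread): with a Flat prior the posterior is proportional to the likelihood, so pm.find_MAP lands on the MLE. I'm using a Normal-mean model here because the coin model's hard [0, 1] bounds can trip up the optimizer; the model name is just illustrative:

y = np.random.normal(loc=2.0, scale=1.0, size=100)

with pm.Model() as mle_model:
    mu = pm.Flat('mu')  # flat prior, so the posterior is proportional to the likelihood
    pm.Normal('obs', mu=mu, sigma=1.0, observed=y)
    map_estimate = pm.find_MAP()

# map_estimate['mu'] should be essentially y.mean(), i.e. the MLE of mu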
pm.sample is not sampling from the prior. The posterior distribution can be well defined even with an improper prior (that doesn’t integrate to one).
In your example, the likelihood immediately imposes the constraint that the parameter can't be smaller than zero or larger than one. That's enough to make the posterior proper. The divergences come from NUTS not handling this hard/abrupt constraint well, not from the math.
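To illustrate (my own sketch, not part of the reply): the same flat belief on [0, 1] can be expressed with a prior whose support already matches the constraint, e.g. pm.Uniform(0, 1) (equivalently Beta(1, 1)). PyMC maps that to an unconstrained space, so NUTS no longer runs into a hard boundary:

with pm.Model() as proper_model:
    p = pm.Uniform('p', lower=0, upper=1)  # flat, but only over [0, 1]
    pm.Bernoulli('likelihood', p=p, observed=coin_flips_data)
    trace = pm.sample(1000, tune=1000)  # should sample without the divergence warnings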
There's nothing very special about Flat: it's just an (improper) uniform density over the reals.
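You can check this directly; assuming a recent PyMC version, Flat's log-density is zero at any value:

# Flat contributes 0 to the log-probability regardless of the value
pm.logp(pm.Flat.dist(), 3.14).eval()  # -> 0.0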
Besides the use cases @jessegrabowski mentioned, it can be useful to define probability models in an incremental way, a bit like Stan does, where the Flat is basically an input variable and then you pile up densities on top of it: Multiple priors on the same parameter?
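A minimal sketch of that incremental pattern (the names and numbers are mine, not from the linked topic): treat the Flat variable as a free input and add prior terms on top of it with pm.Potential:

with pm.Model() as incremental_model:
    theta = pm.Flat('theta')  # unconstrained "input" variable
    # pile two densities on the same parameter by adding their log-probabilities
    pm.Potential('prior_1', pm.logp(pm.Normal.dist(0.0, 1.0), theta))
    pm.Potential('prior_2', pm.logp(pm.Normal.dist(2.0, 0.5), theta))
    pm.Normal('obs', mu=theta, sigma=1.0, observed=np.array([1.2, 0.8, 1.5]))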
I am not sure if this is the best way of achieving that purpose, but I've always thought of the flat prior as quite useful for pedagogical/explanatory reasons.
If you want to demonstrate how the posterior interpolates between a (non-flat) prior and the likelihood, then the former can be sampled with pm.sample_prior_predictive, but to get the latter I think you'd need to change the (non-flat) priors to pm.Flat and then use pm.sample.
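Something along these lines is what I mean (a sketch using the coin data from the question, with an illustrative Beta(2, 2) prior; expect divergence warnings from the Flat version):

# Prior alone: forward sampling works because Beta is a proper prior
with pm.Model() as prior_model:
    p = pm.Beta('p', alpha=2, beta=2)
    pm.Bernoulli('obs', p=p, observed=coin_flips_data)
    prior_draws = pm.sample_prior_predictive()

# "Likelihood alone": swap the prior for pm.Flat and sample the posterior
with pm.Model() as likelihood_model:
    p = pm.Flat('p')
    pm.Bernoulli('obs', p=p, observed=coin_flips_data)
    likelihood_draws = pm.sample(1000, tune=1000)

# Full posterior: proper prior plus likelihood
with pm.Model() as posterior_model:
    p = pm.Beta('p', alpha=2, beta=2)
    pm.Bernoulli('obs', p=p, observed=coin_flips_data)
    posterior_draws = pm.sample(1000, tune=1000)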