`pm.sample` is not sampling from the prior. The posterior distribution can be well defined even with an improper prior (one that doesn't integrate to one).
In your example the likelihood immediately imposes the constraint that the parameter can't be smaller than zero or larger than one. That's enough to make the posterior proper. The divergences come from NUTS struggling with this hard/abrupt constraint, not from the math.
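You can check that claim numerically without PyMC. As a hedged sketch (the concrete numbers here, 3 successes out of 10 trials, are my own illustration, not from your model): a `Flat` prior times a Binomial likelihood gives the unnormalized posterior `p**3 * (1 - p)**7`, which is zero outside `[0, 1]` and has a finite integral there, so the posterior is a proper Beta(4, 8) distribution even though the prior is improper:

```python
# Numeric check (pure Python, not PyMC): Flat prior times a Binomial
# likelihood with y = 3 successes out of n = 10 trials. Outside [0, 1]
# the likelihood is zero, which kills the improper tails of the Flat prior.

def unnorm_posterior(p):
    if p < 0.0 or p > 1.0:
        return 0.0  # likelihood is zero outside the unit interval
    return p**3 * (1 - p) ** 7

# Trapezoidal integration of the unnormalized posterior over [0, 1].
n = 100_000
h = 1.0 / n
integral = sum(unnorm_posterior(i * h) for i in range(n + 1)) * h
integral -= 0.5 * h * (unnorm_posterior(0.0) + unnorm_posterior(1.0))

# A finite normalizing constant means the posterior is proper.
# Here it equals the Beta function B(4, 8) = 3! * 7! / 11! = 1/1320.
print(integral)  # ≈ 0.000757575...
```

The hard zero at the boundaries is exactly the abrupt constraint that trips up NUTS: the log-density jumps to minus infinity rather than tapering off smoothly.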
There's nothing very special about `Flat`: it's just an (improper) uniform density over the reals.
Besides the use cases @jessegrabowski mentioned, it can be useful for defining probability models incrementally, a bit like Stan does, where the `Flat` is basically an input variable and you then pile densities on top of it: Multiple priors on the same parameter?
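Here's a pure-Python sketch of the math behind that pattern (the PyMC version would use `pm.Flat` plus `pm.Potential` terms; the function names below are my own, for illustration). Starting from a flat (constant) log-density and adding two standard-normal log-density terms on the same variable is equivalent, up to a normalizing constant, to a single `Normal(0, 1/sqrt(2))` prior:

```python
import math

def normal_logp(x, mu=0.0, sigma=1.0):
    """Log-density of Normal(mu, sigma) at x."""
    return -0.5 * ((x - mu) / sigma) ** 2 - math.log(sigma * math.sqrt(2 * math.pi))

def stacked_logp(x):
    """Flat base (constant log-density) plus two N(0, 1) 'priors' piled on top."""
    flat = 0.0  # Flat contributes a constant; its value doesn't matter for sampling
    return flat + normal_logp(x) + normal_logp(x)

# The product of two N(0, 1) densities is proportional to N(0, 1/sqrt(2)),
# so the ratio of the two (unnormalized vs normalized) densities is constant in x:
target_sigma = 1.0 / math.sqrt(2.0)
ratios = [
    math.exp(stacked_logp(x) - normal_logp(x, 0.0, target_sigma))
    for x in (-2.0, 0.0, 1.5)
]
print(ratios)  # same constant at every x
```

That constant ratio is why you can stack densities freely: MCMC only needs the log-density up to an additive constant, so each `Potential` term just multiplies another factor into the posterior.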