Error with sample_posterior_predictive: "Cannot sample from half_flat variable"

Hi all,

I am using pymc v5 installed via conda. I have a model with some improper priors (both HalfFlat and Flat), which I would like to use sample_posterior_predictive with. However, when I call sample_posterior_predictive, I get the error below, saying that these improper priors cannot be sampled from. I found this previously fixed error in pymc3 (ValueError when sampling ppc from HalfFlat prior with Poisson distribution. · Issue #3294 · pymc-devs/pymc · GitHub) that seems similar at face value.

Is this a similar error to the bug in pymc3, or is there some other fix that I should try?

Thank you!

---------------------------------------------------------------------------
NotImplementedError                       Traceback (most recent call last)
File ~/.conda/envs/pymc_env/lib/python3.12/site-packages/pytensor/compile/function/types.py:1037, in Function.__call__(self, output_subset, *args, **kwargs)
   1036 try:
-> 1037     outputs = vm() if output_subset is None else vm(output_subset=output_subset)
   1038 except Exception:

File ~/.conda/envs/pymc_env/lib/python3.12/site-packages/pytensor/graph/op.py:531, in Op.make_py_thunk.<locals>.rval(p, i, o, n, cm)
    523 @is_thunk_type
    524 def rval(
    525     p=p,
   (...)
    529     cm=node_compute_map,
    530 ):
--> 531     r = p(n, [x[0] for x in i], o)
    532     for entry in cm:

File ~/.conda/envs/pymc_env/lib/python3.12/site-packages/pytensor/tensor/random/op.py:402, in RandomVariable.perform(self, node, inputs, outputs)
    400 outputs[0][0] = rng
    401 outputs[1][0] = np.asarray(
--> 402     self.rng_fn(rng, *args, None if size is None else tuple(size)),
    403     dtype=self.dtype,
    404 )

File ~/.conda/envs/pymc_env/lib/python3.12/site-packages/pymc/distributions/continuous.py:392, in HalfFlatRV.rng_fn(cls, rng, size)
    390 @classmethod
    391 def rng_fn(cls, rng, size):
--> 392     raise NotImplementedError("Cannot sample from half_flat variable")

NotImplementedError: Cannot sample from half_flat variable

During handling of the above exception, another exception occurred:

NotImplementedError                       Traceback (most recent call last)
Cell In[44], line 2
      1 with GFmodel_counts:
----> 2     pm.sample_posterior_predictive(llh, random_seed=rng)

File ~/.conda/envs/pymc_env/lib/python3.12/site-packages/pymc/sampling/forward.py:948, in sample_posterior_predictive(trace, model, var_names, sample_dims, random_seed, progressbar, progressbar_theme, return_inferencedata, extend_inferencedata, predictions, idata_kwargs, compile_kwargs)
    943 # there's only a single chain, but the index might hit it multiple times if
    944 # the number of indices is greater than the length of the trace.
    945 else:
    946     param = _trace[idx % len_trace]
--> 948 values = sampler_fn(**param)
    950 for k, v in zip(vars_, values):
    951     ppc_trace_t.insert(k.name, v, idx)

File ~/.conda/envs/pymc_env/lib/python3.12/site-packages/pymc/util.py:441, in point_wrapper.<locals>.wrapped(**kwargs)
    439 def wrapped(**kwargs):
    440     input_point = {k: v for k, v in kwargs.items() if k in ins}
--> 441     return core_function(**input_point)

File ~/.conda/envs/pymc_env/lib/python3.12/site-packages/pytensor/compile/function/types.py:1047, in Function.__call__(self, output_subset, *args, **kwargs)
   1045     if hasattr(self.vm, "thunks"):
   1046         thunk = self.vm.thunks[self.vm.position_of_error]
-> 1047     raise_with_op(
   1048         self.maker.fgraph,
   1049         node=self.vm.nodes[self.vm.position_of_error],
   1050         thunk=thunk,
   1051         storage_map=getattr(self.vm, "storage_map", None),
   1052     )
   1053 else:
   1054     # old-style linkers raise their own exceptions
   1055     raise

File ~/.conda/envs/pymc_env/lib/python3.12/site-packages/pytensor/link/utils.py:526, in raise_with_op(fgraph, node, thunk, exc_info, storage_map)
    521     warnings.warn(
    522         f"{exc_type} error does not allow us to add an extra error message"
    523     )
    524     # Some exception need extra parameter in inputs. So forget the
    525     # extra long error message in that case.
--> 526 raise exc_value.with_traceback(exc_trace)

File ~/.conda/envs/pymc_env/lib/python3.12/site-packages/pytensor/compile/function/types.py:1037, in Function.__call__(self, output_subset, *args, **kwargs)
   1035     t0_fn = time.perf_counter()
   1036 try:
-> 1037     outputs = vm() if output_subset is None else vm(output_subset=output_subset)
   1038 except Exception:
   1039     self._restore_defaults()

File ~/.conda/envs/pymc_env/lib/python3.12/site-packages/pytensor/graph/op.py:531, in Op.make_py_thunk.<locals>.rval(p, i, o, n, cm)
    523 @is_thunk_type
    524 def rval(
    525     p=p,
   (...)
    529     cm=node_compute_map,
    530 ):
--> 531     r = p(n, [x[0] for x in i], o)
    532     for entry in cm:
    533         entry[0] = True

File ~/.conda/envs/pymc_env/lib/python3.12/site-packages/pytensor/tensor/random/op.py:402, in RandomVariable.perform(self, node, inputs, outputs)
    398     rng = deepcopy(rng)
    400 outputs[0][0] = rng
    401 outputs[1][0] = np.asarray(
--> 402     self.rng_fn(rng, *args, None if size is None else tuple(size)),
    403     dtype=self.dtype,
    404 )

File ~/.conda/envs/pymc_env/lib/python3.12/site-packages/pymc/distributions/continuous.py:392, in HalfFlatRV.rng_fn(cls, rng, size)
    390 @classmethod
    391 def rng_fn(cls, rng, size):
--> 392     raise NotImplementedError("Cannot sample from half_flat variable")

NotImplementedError: Cannot sample from half_flat variable
Apply node that caused the error: half_flat_rv{"->()"}(RNG(<Generator(PCG64) at 0x153998396EA0>), NoneConst{None})
Toposort index: 8
Inputs types: [RandomGeneratorType, <pytensor.tensor.type_other.NoneTypeT object at 0x1539a9df5d00>]
Inputs shapes: ['No shapes', 'No shapes']
Inputs strides: ['No strides', 'No strides']
Inputs values: [Generator(PCG64) at 0x153998396EA0, None]
Outputs clients: [[output[2](half_flat_rv{"->()"}.0)], [ExpandDims{axes=[0, 1]}(sig_2v)]]

Backtrace when the node is created (use PyTensor flag traceback__limit=N to make it longer):
  File "/home/rl796/.conda/envs/pymc_env/lib/python3.12/site-packages/IPython/core/async_helpers.py", line 128, in _pseudo_sync_runner
    coro.send(None)
  File "/home/rl796/.conda/envs/pymc_env/lib/python3.12/site-packages/IPython/core/interactiveshell.py", line 3334, in run_cell_async
    has_raised = await self.run_ast_nodes(code_ast.body, cell_name,
  File "/home/rl796/.conda/envs/pymc_env/lib/python3.12/site-packages/IPython/core/interactiveshell.py", line 3517, in run_ast_nodes
    if await self.run_code(code, result, async_=asy):
  File "/home/rl796/.conda/envs/pymc_env/lib/python3.12/site-packages/IPython/core/interactiveshell.py", line 3577, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "/tmp/ipykernel_2145468/1720895838.py", line 6, in <module>
    sig_2v = pm.HalfFlat("sig_2v")
  File "/home/rl796/.conda/envs/pymc_env/lib/python3.12/site-packages/pymc/distributions/distribution.py", line 505, in __new__
    rv_out = cls.dist(*args, **kwargs)
  File "/home/rl796/.conda/envs/pymc_env/lib/python3.12/site-packages/pymc/distributions/continuous.py", line 405, in dist
    res = super().dist([], **kwargs)
  File "/home/rl796/.conda/envs/pymc_env/lib/python3.12/site-packages/pymc/distributions/distribution.py", line 574, in dist
    rv_out = cls.rv_op(*dist_params, size=create_size, **kwargs)

HINT: Use the PyTensor flag `exception_verbosity=high` for a debug print-out and storage map footprint of this Apply node.

This comment by @ckrapu from the linked issue is correct, as is his advice on how to proceed:

The Flat and HalfFlat distributions don’t correspond to valid probability distributions because a constant density over the entire real line does not integrate to 1. As a consequence, we don’t have a valid mechanism to sample random values from them. Computationally, they’re just used to make sure that any terms with those priors don’t contribute anything to the likelihood. To achieve something close to what you desire, you might want to consider using a normal distribution with large variance.

Thanks for the response!

Is there some way to sample directly from the posterior? It seems like there was some fix for this in pymc3 (Fix for #3294 by lucianopaz · Pull Request #3297 · pymc-devs/pymc · GitHub) since it looks like the samples from the prior weren’t actually used in any computation of the posterior; only the shape and dtype. For my use case, I do really have to use these improper priors.

There exists no way to draw samples from a Flat prior. As suggested, you can replace these with super high variance normals, or super wide uniforms. But literal -infinity to infinity is not well defined.