I discovered de facto restrictions on variable names, and I’m wondering 1) whether they are intended and 2) whether a topic about them might help someone else, because I couldn’t find any mention of them in the docs, here, or on GitHub. A minimal example:
import numpy as np
import pymc3 as pm

debug_y = np.random.normal(loc=1, scale=1, size=200)

with pm.Model() as DebugModel:
    # '_a__' is one of the problematic names (sampling reports "NUTS: []")
    _a__ = pm.Normal('_a__', mu=0, sigma=1)
    foo = pm.Normal('foo', mu=_a__, sigma=1, observed=debug_y)
    debug_trace = pm.sample()
I found several names that fail. In each case the “NUTS: [...]” message printed when sampling starts shows that the name was replaced internally:
- a___ → “NUTS: [a]”
- a_a__ → “NUTS: [a]”
- a____ → “NUTS: [a_]”
- __a__ → “NUTS: [_]”
- _a__ → “NUTS: []”
- _aa__ → “NUTS: []”
The following, somewhat similar, names don’t fail:
- a__
- a_a_
- __a
- aa__
- __a_
- _a__a
These results are consistent with re.sub(r"_[^_]*?__$", "", original_var_name) being applied to the names at some point.
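To check that hypothesis, the script below applies the substitution to every name listed above. It also compares it with an equivalent, simpler rule (`split_rule`, a hypothetical helper, not a PyMC3 function): a name that ends in "__" and contains at least three underscores loses its last "_"-separated token. That is the shape of the names PyMC3 gives to automatically transformed variables (for example sigma_log__ for a positive variable named sigma), so one possible explanation, which I have not confirmed in the PyMC3 source, is that these names are being mistaken for transformed ones.

```python
import re

# Names from the post that fail, mapped to what the "NUTS: [...]" message shows
FAILING = {
    "a___": "a",
    "a_a__": "a",
    "a____": "a_",
    "__a__": "_",
    "_a__": "",
    "_aa__": "",
}
# Names from the post that are left unchanged
PASSING = ["a__", "a_a_", "__a", "aa__", "__a_", "_a__a"]


def regex_rule(name: str) -> str:
    """The substitution suggested by the observations."""
    return re.sub(r"_[^_]*?__$", "", name)


def split_rule(name: str) -> str:
    """Hypothetical equivalent: strip a trailing '_<token>__' suffix,
    the way 'sigma_log__' would map back to 'sigma'."""
    if name.endswith("__") and name.count("_") >= 3:
        return "_".join(name.split("_")[:-3])
    return name


for name in list(FAILING) + PASSING:
    print(f"{name!r:10} regex -> {regex_rule(name)!r:6} split -> {split_rule(name)!r}")
```

Both rules reproduce every observed case, including leaving all six passing names untouched.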
Although the single-variable example above shows that something already goes wrong at the sampling stage (“No posterior samples. Unable to run convergence checks”), I first noticed the problem in a model that mixed valid and invalid variable names, and only after sampling, when ArviZ raised a KeyError about missing variable names.
Why did I have such odd variable names?
They were originally patsy term names containing function calls, which I converted by replacing brackets and commas with underscores.
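If the goal is simply to get usable names from patsy terms, one workaround is to make sure a converted name can never end in a double underscore. The sketch below assumes names only need letters, digits and underscores: it replaces every run of other characters (not only brackets and commas) with a single underscore, strips leading and trailing underscores, and raises if two terms collapse to the same name. `to_var_name` and `to_var_names` are hypothetical helpers, not part of patsy or PyMC3.

```python
import re
from typing import Iterable, List


def to_var_name(term: str) -> str:
    """Hypothetical helper: turn a patsy-style term such as 'C(group)[T.b]'
    into a name made of letters, digits and single underscores."""
    # Collapse every run of non-alphanumeric characters into one underscore
    name = re.sub(r"[^0-9A-Za-z]+", "_", term)
    # Drop leading/trailing underscores, so the name can never end in '__'
    return name.strip("_")


def to_var_names(terms: Iterable[str]) -> List[str]:
    """Convert several terms, refusing to silently merge two of them."""
    names = [to_var_name(t) for t in terms]
    if len(set(names)) != len(names):
        raise ValueError(f"name collision after conversion: {names}")
    return names


print(to_var_names(["np.log(x)", "C(group)[T.b]", "f(g(x))", "I(x ** 2)"]))
```

For example, f(g(x)) becomes f_g_x rather than f_g_x__, which the substitution described above would shorten to f_g.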