While writing a pull request that adds dimension names as an alternative to numerical shapes, I wondered what restrictions we want to enforce for those names.
The relevant PR is here.
I also noticed that currently we do not have any restrictions for the names of variables. This is perfectly legal code:
import pymc3 as pm

with pm.Model():
    pm.Normal('')
    pm.Normal('\n/ ')
    pm.Normal(' ')
    # let's hope nobody has `text.usetex` enabled in matplotlib :-)
    pm.Normal(r'\immediate\write18{/usr/bin/rm -rf /}')
    trace = pm.sample()
I think this raises three questions:
1. Can we add minimal restrictions for variable names without breaking user code?
2. Should we have similar or stronger restrictions for dimension names when we merge that PR? Here we have no backward compatibility issues.
3. What restrictions should we have in pymc4?
About 1:
We could limit variable names in a similar way to Python identifiers. But since quite a few people seem to use spaces in variable names, we should at least allow spaces within the name. A regular expression like this might work well: r"^[^\d\W][\w ]*(?<=\w)\Z". You can test what it does with var = re.compile(r"^[^\d\W][\w ]*(?<=\w)\Z", re.UNICODE) and var.match("some_var_name") is not None. That would mean: no spaces or digits as the first character, no special characters like \ / # $ \n \t, and no whitespace at the end of the variable name. We could start printing warnings for variable names that do not conform to that format, and throw an exception in a later release.
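To make the effect of the proposed pattern concrete, here is a minimal sketch of how such a check could look. The helper names is_valid_var_name and warn_if_invalid are made up for illustration and are not part of pymc3; the choice of FutureWarning is also just an assumption.

```python
import re
import warnings

# The pattern proposed above. For str patterns, Python 3 uses Unicode
# matching by default, so re.UNICODE is not strictly required.
VAR_NAME = re.compile(r"^[^\d\W][\w ]*(?<=\w)\Z")


def is_valid_var_name(name):
    """Return True if `name` matches the proposed variable name format."""
    return VAR_NAME.match(name) is not None


def warn_if_invalid(name):
    """Hypothetical transition helper: warn now, raise in a later release."""
    if not is_valid_var_name(name):
        warnings.warn(
            f"Variable name {name!r} does not conform to the proposed format.",
            FutureWarning,
        )


for candidate in ["some_var_name", "my var", "", " ", "my var ", "1abc", "\n/ "]:
    print(repr(candidate), is_valid_var_name(candidate))
```

Only the first two candidates are accepted. The empty name, the names made of whitespace or special characters, the name with a trailing space, and the name starting with a digit are all rejected.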
About 2:
We could just use the same restrictions as for variable names, but since this is new we could also disallow spaces entirely. Dimension names would then follow exactly the same rules as Python identifiers. @rpgoldman pointed out that in some cases the column names of a dataset will be obvious choices for dimension names (or will even be imported semi-automatically), and they often contain spaces. So maybe that is a good reason to allow spaces within dim names as well?
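A small sketch of the two options for dimension names, applied to a few column names. The column names and both function names are invented for illustration. The strict variant relies on str.isidentifier, which implements Python's identifier rules (note that it also accepts keywords such as "class"); the relaxed variant reuses the variable name pattern from above.

```python
import re

# Relaxed rule: same pattern as proposed for variable names (spaces allowed inside).
RELAXED = re.compile(r"^[^\d\W][\w ]*(?<=\w)\Z")


def is_strict_dim_name(name):
    """Strict option: the name must be a Python identifier (no spaces)."""
    return name.isidentifier()


def is_relaxed_dim_name(name):
    """Relaxed option: like the strict one, but spaces within the name are fine."""
    return RELAXED.match(name) is not None


# Hypothetical column names as they might come out of a dataset.
columns = ["county", "floor level", "log_radon", "2nd reading"]

for col in columns:
    print(f"{col!r}: strict={is_strict_dim_name(col)}, relaxed={is_relaxed_dim_name(col)}")
```

With the strict option a column such as "floor level" would have to be renamed before it can serve as a dimension name, while the relaxed option accepts it as is. A name starting with a digit is rejected either way.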
About 3:
For pymc4 we have the opportunity to choose what we want without any regard for backward compatibility. I'd definitely disallow any special chars. That would make it possible to indicate hierarchy of parameters with # or /. But again: Do we allow spaces within names?
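To illustrate why disallowing special characters helps with hierarchy: if / can never appear inside a name, a joined path can always be split back into its parts without ambiguity. This is only a sketch under that assumption; join_name and split_name are hypothetical helpers, not an existing or planned pymc4 API, and the part pattern is the one discussed for variable names.

```python
import re

# Assumed rule for each part of a hierarchical name (spaces allowed inside).
NAME_PART = re.compile(r"^[^\d\W][\w ]*(?<=\w)\Z")
SEPARATOR = "/"


def join_name(*parts):
    """Join name parts into one hierarchical name, rejecting invalid parts."""
    for part in parts:
        if NAME_PART.match(part) is None:
            raise ValueError(f"Invalid name part: {part!r}")
    return SEPARATOR.join(parts)


def split_name(full_name):
    """Split a hierarchical name back into its parts."""
    return full_name.split(SEPARATOR)


full = join_name("schools", "school effect", "mu")
print(full)
print(split_name(full))
```

Because a part containing the separator is rejected in join_name, splitting always recovers exactly the parts that were joined.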