Model metaclass for more natural model definition (without string var names on RHS)

I’ve always wondered whether there might be a way to specify the model without the string name on the right-hand side, which is what we do now:

with pm.Model() as linear_model:
    weights = pm.Normal('weights', mu=0, sigma=1)   # name 'weights' specified on RHS

Recently I’ve been playing with param (mostly in the Panel context, but it’s independent) and I realized that it knows the label/name of a parameter without the name being passed as a string to the constructor, because it uses a smart metaclass that sets the names according to the attribute names.

In theory this should make it possible to specify the model like so

class LinearModel(pm.Model):    # model definition as a class becomes mandatory though for this API
    weights = pm.Normal(mu=0, sigma=1)   # the name weights is encoded as the class-level attr name

provided there is a metaclass implementation like the following (rough sketch) and names in variable constructors become optional:

class ModelMetaclass(...):

    def __init__(cls, name, bases, dict_):   # first argument is the class being created
        ....
        variables = [(n, v) for (n, v) in dict_.items()
                     if isinstance(v, (Distribution, Deterministic))]   # TODO some better baseclass?
        for var_name, var in variables:
            var._set_name(var_name)   # probably has to do more things
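
To make the rough sketch above concrete, here is a minimal runnable version. The `Distribution` and `Model` classes are hypothetical stand-ins written for this example, not the real PyMC classes, and the naming is done by plain attribute assignment rather than a `_set_name` method.

```python
class Distribution:
    """Hypothetical stand-in for a PyMC distribution (not the real class)."""

    def __init__(self, name=None, **params):
        self.name = name
        self.params = params


class ModelMetaclass(type):
    """Names every unnamed Distribution in the class body after its attribute."""

    def __init__(cls, name, bases, namespace):
        super().__init__(name, bases, namespace)
        for attr_name, value in namespace.items():
            if isinstance(value, Distribution) and value.name is None:
                value.name = attr_name


class Model(metaclass=ModelMetaclass):
    """Hypothetical base class; pm.Model does not work this way today."""


class LinearModel(Model):
    weights = Distribution(mu=0, sigma=1)             # named from the attribute
    bias = Distribution("intercept", mu=0, sigma=10)  # explicit name is kept


print(LinearModel.weights.name)
print(LinearModel.bias.name)
```

Keeping an explicitly passed name is what would let the existing string-based calls continue to work alongside the new style.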

In theory this might even be backwards compatible; it could be a next-gen API if people are interested. Personally, I encode models as classes quite often, as it is convenient for stacking models, etc., so for me creating classes for models does not seem like such a hassle.

One might argue that this is just syntax sugar and perhaps not that necessary. But I think that for quite a few people syntax sugar is an important decision factor (even though they may not admit it even to themselves).

I’m putting this idea here to gather feedback and see if there is any interest.

We have explored something similar, @twiecki and @_eigenfoo have a bit more context on this.

We’ve considered doing something similar to this in PyMC4, but not PyMC3. Even in PyMC4, the discussion was mainly around removing the yield keyword from the model specification API by parsing the Python AST, as I outlined in [1]. We thought that if we were doing this, it would make sense to also add an “autonaming” feature. However, the AST proposal was subsequently dropped [2], and we never really picked up any discussion on autonaming for PyMC4…

That doesn’t address the proposal here, though: it seems sensible enough to me, but the main difficulty is how we would get the dict_ dictionary in @smartass101’s code snippet. I don’t think that object currently exists anywhere in the PyMC3 codebase, so we would have to construct it ourselves… by parsing the AST :slight_smile:

Which is not to say that I’m opposed to an autonaming feature! Just want to point out that it might be nontrivial to implement.

[1] https://eigenfoo.xyz/manipulating-python-asts/
[2] https://twitter.com/avibryant/status/1150827954319982592

I looked a bit deeper into the metaclass docs and discovered that since Python 3.6 implementing this is actually trivial using the __set_name__ descriptor method. If you wanted to support older Python versions, it would be quite easy to backport this functionality into a simple helper metaclass (a quick search on PyPI didn’t reveal such a package, but perhaps I missed it).
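
As a rough idea of what such a backport could look like, here is a sketch of a helper metaclass. The names `SetNameMeta`, `Named` and `Holder` are made up for this example, and I have not tried it on an interpreter older than 3.6, so treat it as an illustration only. On Python 3.6 and later it deliberately does nothing, because the interpreter already calls `__set_name__` itself.

```python
import sys


class SetNameMeta(type):
    """Hypothetical helper that emulates the __set_name__ hook on Python < 3.6."""

    def __init__(cls, name, bases, namespace):
        super().__init__(name, bases, namespace)
        if sys.version_info >= (3, 6):
            return  # the interpreter has already called __set_name__ for us
        for attr_name, value in namespace.items():
            # Look the hook up on the type, as the interpreter does
            hook = getattr(type(value), "__set_name__", None)
            if hook is not None:
                hook(value, cls, attr_name)


class Named:
    """Toy attribute that remembers the name it was assigned to."""

    name = None

    def __set_name__(self, owner, name):
        self.name = name


class Holder(metaclass=SetNameMeta):
    x = Named()


print(Holder.x.name)
```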

So on Py>=3.6 all you need to do is the following in the distribution base class:

class Distribution:
    """Base class for distributions"""
    def __init__(self, name=None, **params):
        self.name = name
        self.params = params
        
    def __set_name__(self, owner, name):
        """Descriptor method since Py3.6
        
        Could activate other functionality depending on a known name,
        e.g. allocating nodes in the computational graph"""
        if self.name is None:  
            self.name = name
        # otherwise keep the specified one, perhaps to give nicer labels

    def __repr__(self):
        """Show a nice representation of the distribution

        If bound to a variable, prints it as
        var = Distrib(params)
        """
        params = ', '.join(f'{n}={v}' for (n, v) in self.params.items())
        base = f'{type(self).__name__}({params})'
        # self.name always exists (it defaults to None), so test it explicitly
        if self.name is None:
            return base
        return f'{self.name} = ' + base

Let’s create some simple mock-up distributions for visualization purposes.

class Normal(Distribution):
    
    def __init__(self, name=None, mu=0, sd=1):
        super().__init__(name, mu=mu, sd=sd)
        
class Uniform(Distribution):
    
    def __init__(self, name=None, lower=-1, upper=1):
        super().__init__(name, lower=lower, upper=upper)

So the name argument becomes optional. This is what you would get with these commands:

>>> Normal()
Normal(mu=0, sd=1)
>>> Normal('x')
x = Normal(mu=0, sd=1)

And now comes the magic. This will actually work on any class that uses (possibly a subclass of) the type metaclass, but I suppose one will likely want to inherit from some PyMC Model class.

#probably useful to inherit, but not actually necessary for this to work
class MyModel(pm.Model):
    a = Normal(mu=2)

And now check this out:

>>> model = MyModel()     # not actually necessary to create an instance, but this is the likely workflow 
>>> model.a
a = Normal(mu=2, sd=1)
>>> MyModel.a     # still the same object, not sure if this is a pro or con ... but can be solved
a = Normal(mu=2, sd=1)
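
One possible way to solve the shared-object issue, purely as a sketch: give the distribution a `__get__` method that hands each model instance its own copy. The `_attr_name` attribute and the shallow-copy strategy are assumptions made for this example; a real implementation would probably need to do more (e.g. register the variable in the graph).

```python
import copy


class Distribution:
    """Mock-up distribution whose instances are copied per model instance."""

    def __init__(self, name=None, **params):
        self.name = name
        self.params = params

    def __set_name__(self, owner, name):
        self._attr_name = name  # hypothetical: remember where we are bound
        if self.name is None:
            self.name = name

    def __get__(self, instance, owner=None):
        if instance is None:
            return self  # class-level access returns the shared template
        own = copy.copy(self)
        own.params = dict(self.params)
        # Cache on the instance: there is no __set__, so the instance
        # __dict__ entry shadows this descriptor on later lookups.
        instance.__dict__[self._attr_name] = own
        return own


class MyModel:
    a = Distribution(mu=2)


m1, m2 = MyModel(), MyModel()
print(m1.a is m1.a)        # same per-instance copy on repeated access
print(m1.a is m2.a)        # different model instances get different copies
print(m1.a is MyModel.a)   # the class-level template stays separate
```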

discussion was mainly around removing the yield keyword

Yeah, I definitely support that.

but the main difficulty is how we would get the dict_ dictionary in the @smartass101’s code snippet. I don’t think that object currently exists anywhere in the PyMC3 codebase, so we would have to construct it ourselves… by parsing the AST :slight_smile:

Fortunately no, the dict_ namespace is actually populated when the class body is executed and only after that is the metaclass called. So no need to parse the AST, since it’s Python itself doing that.
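
A small self-contained demonstration of that ordering (the names `ShowNamespace` and `Example` are made up for this example): the metaclass simply records which names it finds in the namespace it is handed.

```python
class ShowNamespace(type):
    """Records the names found in the namespace passed to the metaclass."""

    def __new__(mcs, name, bases, namespace):
        # The class body has already been executed when we get here,
        # so the namespace already contains its assignments.
        mcs.seen = [key for key in namespace if not key.startswith("__")]
        return super().__new__(mcs, name, bases, namespace)


class Example(metaclass=ShowNamespace):
    a = 1
    b = a + 1  # the body is ordinary code, evaluated top to bottom


print(ShowNamespace.seen)
print(Example.b)
```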

I’d like to update the topic title to reflect this easier descriptor-based alternative, but it seems that is no longer possible.

Should I perhaps create a separate thread for the simpler descriptor protocol? It seems I cannot change the name of this thread.

I’m curious what you think about this descriptor approach. In particular, I wonder whether my comments above have addressed your reservations @_eigenfoo.