Problem with type FreeRV while adding new distribution

Hi all,

(I originally posted this question on stackoverflow [https://stackoverflow.com/questions/56512831/problem-with-type-freerv-while-adding-new-distribution] but it was suggested that here is a better place.)

I’m trying to add a new discrete distribution to PyMC3 (a Wallenius non-central hypergeometric) by wrapping Agner Fog’s C++ implementation of it (https://www.agner.org/random/).

I have successfully put the relevant functions in a C++ extension and added broadcasting so that it behaves like scipy’s distributions. (For now the broadcasting is done in Python; I will later try the xtensor-python bindings for more performant vectorization in C++.)

I’m running into the following problem: when I instantiate an RV of the new distribution in a model context, I’m getting a “TypeError: an integer is required (got type FreeRV)” from where “value” is passed to the logp() function of the new distribution.

I understand that PyMC3 might need to connect RVs to the functions, but I find no way to cast them into something my new functions can work with.

Any hints on how to resolve this or general info on adding new distributions to PyMC3 or the internal workings of distributions would be extremely helpful.

Thanks in advance! Jan

EDIT: I noticed that FreeRV inherits from Theano’s TensorVariable, so I tried calling .eval(). This leads to another error, along the lines that no input is connected (I don’t have the exact error message right now). One thing that puzzles me is why logp is called at instantiation of the variable when setting up the model …


A distribution class must have the following structure:

class Foo(Discrete):
    def __init__(self, ...):
        # Set your distribution's parameters
        ...
        super().__init__(...)

    def random(self, point=None, size=None):
        ...

    def logp(self, value):
        ...

random isn’t strictly necessary, whereas logp is mandatory. random generates an array of samples from the distribution; you can look at the implementation of the Geometric or any other univariate distribution to see the common structure:

def random(self, point=None, size=None):
    parameters = draw_values(self.parameters_list, point=point, size=size)
    return generate_samples(your_rvs, parameters, ...)

random is only used when sampling from the prior or the posterior predictive (the latter only happens if the observations are distributed following your custom distribution).
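For concreteness, here is roughly what the Geometric’s random looks like (a lightly adapted sketch; the details may differ between PyMC3 versions):

import numpy as np
from pymc3.distributions.distribution import draw_values, generate_samples

def random(self, point=None, size=None):
    # Resolve the (possibly symbolic) parameter into concrete numbers,
    # optionally conditioned on a point from the trace
    p = draw_values([self.p], point=point, size=size)[0]
    # generate_samples handles the shape/size bookkeeping and calls a
    # plain numpy sampler to draw the actual values
    return generate_samples(np.random.geometric, p,
                            dist_shape=self.shape, size=size)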

The logp method returns a theano tensor, or in other words, a symbolic expression that specifies how to compute the log likelihood that an observation comes from your distribution. This means that you don’t return a numerical value (no eval needed). The distribution’s logp is then wrapped by some other methods in the Model and in FreeRV, which effectively compile the tensor into a callable. These are then used under the hood to do SMC, HMC, Metropolis or any other step method in the main sample function.
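As a small illustration (a sketch along the lines of the Geometric’s logp, not your Wallenius case), a logp built purely from theano operations looks like this:

import theano.tensor as tt
from pymc3.distributions.dist_math import bound

def logp(self, value):
    p = self.p
    # Nothing is evaluated here: this only builds a symbolic expression
    # that PyMC3 later compiles together with the rest of the model graph
    return bound(tt.log(p) + (value - 1) * tt.log1p(-p),
                 0 <= p, p <= 1, value >= 1)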

Hi lucianopaz,

thanks for the reply.

My problem is that to get the logp I have to call the wrapped C++ function. So I somehow need to cast the FreeRV into a normal int, and I’m not sure how to do that, especially if it should not be evaluated right away.

Thanks
Jan

It’s hard to say exactly what’s causing the problem without seeing your code.

This is the main problem. You need to make the logp return a theano tensor. I don’t know what you’re doing now, but I suppose you wrote a small extension module that communicates with the compiled C++ library. In order to get this to work with theano, you need to either write a custom Op or wrap your present function with the as_op decorator. The former is the harder way but gives you finer-grained control over the implementation, while the latter is easier but sacrifices control and potential optimizations.
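To give a rough idea of the as_op route, here is a minimal sketch; the module name my_extension and its log_pmf function are placeholders for whatever your wrapper actually exposes:

import numpy as np
import theano.tensor as tt
from theano.compile.ops import as_op

import my_extension  # placeholder for your compiled wrapper module

@as_op(itypes=[tt.lscalar, tt.lscalar, tt.dscalar],
       otypes=[tt.dscalar])
def log_pmf_op(value, n, odds):
    # Inside the op we receive plain numbers, not symbolic tensors, so
    # this is the place to call into the compiled extension
    result = my_extension.log_pmf(int(value), int(n), float(odds))
    return np.asarray(result, dtype=np.float64)

Note that ops created with as_op don’t define a gradient, so gradient-based samplers like NUTS can’t be used on parameters that flow through them; Metropolis and the like still work.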

Yes, exactly.
Thanks for the hints! I will try with the decorator first.

Thanks again, Jan

EDIT: I have solved the problem now; at least I’m no longer getting an error at instantiation of the RV.

There were two problems in my code:

  • I was using tt.log inside the custom op (which apparently leads to problems)
  • The otype of course needs to be a float type and not an int type (see the corrected sketch below).
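With those two changes the op ends up looking roughly like this (a sketch of the corrected version, not the exact final code):

import numpy as np
import theano.tensor as tt
from theano.compile.ops import as_op

import wallenius  # the wrapped C++ extension module

@as_op(itypes=[tt.lscalar, tt.iscalar, tt.iscalar, tt.iscalar, tt.dscalar],
       otypes=[tt.dscalar])
def wallenius_logp(value, n, m, N, odds):
    vwalprob = np.vectorize(wallenius.wallenius_prob)
    p = vwalprob(value, n, m, N, odds)
    # np.log instead of tt.log: inside the op everything is numeric,
    # and the output dtype must match the float otype declared above
    return np.asarray(np.log(p), dtype=np.float64)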

Original message below for reference.

Thanks again for the help!
Jan


Dear @lucianopaz, dear all,

I think this helped a bit, but I’m now running into the next problems. I have not worked with Theano much so far, so any help is really appreciated.

This is how the op looks now:

@as_op(itypes=[tt.lscalar, tt.iscalar, tt.iscalar, tt.iscalar, tt.dscalar],
       otypes=[tt.lscalar])
def wallenius_logp(value, n, m, N, odds):
    vwalprob = np.vectorize(wallenius.wallenius_prob)
    p = vwalprob(value, n, m, N, odds)
    return tt.log(p)

Here, wallenius.wallenius_prob is the wrapped C++ function.

I’m unsure about the itypes and otypes of the op. For testing, I thought I would go with scalars (and extend to multidimensional types later), but maybe that’s already a problem?

This is the distribution’s logp function:

def logp(self, value):
    n = self.n
    m = self.m
    N = self.N
    odds = self.odds
    return bound(
        wallenius_logp(value, n, m, N, odds),
        0 <= value, value <= n,
        0 <= m, m <= N)

And here is a trace of the error I get now:

with pymc3.Model():
    w = pymc3.Wallenius('wal', 20, 20, 40, 0.5)

leads to the error message below. The last part is maybe the most informative …

Any further hints are highly welcome.

Many thanks
Jan

Error message:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input> in <module>
      1 import pymc3; m = pymc3.Model();
      2 with m:
----> 3     w = pymc3.Wallenius('wal', 20, 20, 40, 0.5)
      4

~/D/test/pymc3/pymc3/distributions/distribution.py in __new__(cls, name, *args, **kwargs)
     44             total_size = kwargs.pop('total_size', None)
     45             dist = cls.dist(*args, **kwargs)
---> 46             return model.Var(name, dist, data, total_size)
     47         else:
     48             raise TypeError("Name needs to be a string but got: {}".format(name))

~/D/test/pymc3/pymc3/model.py in Var(self, name, dist, data, total_size)
    824             with self:
    825                 var = FreeRV(name=name, distribution=dist,
--> 826                              total_size=total_size, model=self)
    827             self.free_RVs.append(var)
    828         else:

~/D/test/pymc3/pymc3/model.py in __init__(self, type, owner, index, name, distribution, total_size, model)
   1272             self.tag.test_value = np.ones(
   1273                 distribution.shape, distribution.dtype) * distribution.default()
-> 1274             self.logp_elemwiset = distribution.logp(self)
   1275             # The logp might need scaling in minibatches.
   1276             # This is done in Factor.

~/D/test/pymc3/pymc3/distributions/discrete.py in logp(self, value)
    102             wallenius_logp(value, n, m, N, odds),
    103             0 <= value, value <= n,
--> 104             0 <= m, m <= N)
    105
    106     def _repr_latex_(self, name=None, dist=None):

~/anaconda3/envs/dycose/lib/python3.7/site-packages/theano/gof/op.py in __call__(self, *inputs, **kwargs)
    623             for i, ins in enumerate(node.inputs):
    624                 try:
--> 625                     storage_map[ins] = [self._get_test_value(ins)]
    626                     compute_map[ins] = [True]
    627                 except AttributeError:

~/anaconda3/envs/dycose/lib/python3.7/site-packages/theano/gof/op.py in _get_test_value(cls, v)
    560             # ensure that the test value is correct
    561             try:
--> 562                 ret = v.type.filter(v.tag.test_value)
    563             except Exception as e:
    564                 # Better error message.

~/anaconda3/envs/dycose/lib/python3.7/site-packages/theano/tensor/type.py in filter(self, data, strict, allow_downcast)
     85         if isinstance(data, Variable):
     86             raise TypeError(
---> 87                 'Expected an array-like object, but found a Variable: '
     88                 'maybe you are trying to call a function on a (possibly '
     89                 'shared) variable instead of a numeric array?')

TypeError: For compute_test_value, one input test value does not have the requested type.

Backtrace when that variable is created:

  File "/home/jeti/anaconda3/envs/dycose/lib/python3.7/site-packages/IPython/core/interactiveshell.py", line 3020, in run_cell_async
    interactivity=interactivity, compiler=compiler, result=result)
  File "/home/jeti/anaconda3/envs/dycose/lib/python3.7/site-packages/IPython/core/interactiveshell.py", line 3185, in run_ast_nodes
    if (yield from self.run_code(code, result)):
  File "/home/jeti/anaconda3/envs/dycose/lib/python3.7/site-packages/IPython/core/interactiveshell.py", line 3267, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "<ipython-input>", line 3, in <module>
    w = pymc3.Wallenius('wal', 20, 20, 40, 0.5)
  File "/home/jeti/D/test/pymc3/pymc3/distributions/distribution.py", line 46, in __new__
    return model.Var(name, dist, data, total_size)
  File "/home/jeti/D/test/pymc3/pymc3/model.py", line 826, in Var
    total_size=total_size, model=self)
  File "/home/jeti/D/test/pymc3/pymc3/model.py", line 1274, in __init__
    self.logp_elemwiset = distribution.logp(self)
  File "/home/jeti/D/test/pymc3/pymc3/distributions/discrete.py", line 102, in logp
    wallenius_logp(value, n, m, N, odds),

The error when converting the test value to that variable type:
Expected an array-like object, but found a Variable: maybe you are trying to call a function on a (possibly shared) variable instead of a numeric array?


Did you ever solve this? Do you have a gist of your code?