Integrating a new distribution (wrapped C++ functions) - problems with broadcasting and shapes

Dear all,
I’m still working on making “Biased Urn” distributions (like the hypergeometric, but with biases towards certain colors) available for parameter estimation in PyMC3. Thanks to @lucianopaz I made some progress and can run PyMC3’s samplers to estimate parameters, but only for single observations and not with any parameter shapes other than a single scalar.

Right now I’m stuck with shapes and broadcasting not working. I’m new to working with theano on that level and any help would be really appreciated. The situation is now as follows:

  • I have wrapped the relevant functions from a C++ library with xtensor-python bindings. Hence, I have logp, random, and mode available as functions in Python. On the C++ side, xtensor takes care of vectorization, so they all exhibit broadcasting as expected.

  • To use them for a PyMC3 distribution (a Discrete), I wrapped the functions mentioned above as Theano ops using as_op. If everything is of unitary shape, everything works fine and I can sample.
    (NUTS reports that initialization fails, probably because the ops lack gradients? But sampling goes fine anyway.)

  • What is not working: When I set up the distribution with different shapes, or connect RVs of different shapes to it in a model, I get type errors from Theano. That makes sense, because I specified the itypes and otypes of the op as scalars. But if I specify them as rows, vectors, or matrices instead, the scalar-shaped cases raise type errors, too.

Here are some links to the problematic code:

This is the new pymc3 distribution and the ops for the wrapped functions:

This (maybe of less interest) is the xtensor-python-based wrapper:
(The C++ implementation of the distribution is from

I’m not really sure how to go on; but here are a few questions I have:

  • Could an explicitly implemented theano op (not using as_op) help with the shape problem? Are there any examples?

  • I understand that C++ code can be associated directly with Theano ops. Could this also be linked against an external library? Then I could maybe skip the xtensor-python wrapping …

  • While I find it very intuitive to implement PyMC3 models with broadcasting and automatic propagation of shapes, it is still somewhat nebulous to me what is expected of a distribution’s functions in that regard. When I look at existing distributions, the magic appears to happen mostly automatically through Theano or NumPy behavior. Any hints / documentation in that regard would also be very helpful.
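For example, as far as I can tell the “magic” in plain NumPy is just the standard broadcasting rule: shapes are aligned from the right, and size-1 dimensions are stretched. A toy illustration (the math here is made up, not the biased-urn density):

```python
import numpy as np

# NumPy aligns shapes from the right and stretches size-1 dimensions;
# Theano follows the same rule, so a logp written with plain
# elementwise operations broadcasts automatically.
n = np.array([[20], [21]])   # parameter, shape (2, 1)
value = np.array([1, 2, 3])  # values, shape (3,)

# Toy elementwise "logp" (made-up math, not the biased-urn density):
logp = value * np.log(n)
print(logp.shape)  # (2, 3)
```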

Many thanks in advance


Yes and yes. You should check out the “Extending Theano” sections of the documentation, and in particular the scalar ops.

That is a very good question and it deserves a longer answer. When I get some time, I’ll try to write up an example distribution, if you haven’t figured out how to write the custom op yourself by then.


Thanks a lot. That’s very kind.

I already made some progress (by modifying one of the C op examples). After adding all my external includes to c_headers (and setting the CPLUS_PATH environment variable), I was able to use the desired functions in c_code.

Now the shape handling gives me headaches again. I have a rough idea of how to work on the arrays inside the C code, but I am still confused about what is expected of the op so that it will work with a PyMC3 distribution.

Here are a few (vague) questions (numbered for convenience):

  1. Should the op deal with broadcasting and shape handling at all? Or should the op just work on scalars and let some wrapper vectorize?

  2. If I implement shape handling in the C code, do I also have to take care of broadcasting? Or are size-1 dimensions expanded at some other stage?

  3. For my logp op, can the distribution parameters have arbitrary shape? And can the “value” (whose logp is determined), and consequently the output, have the same shape, or one more dimension if several values are evaluated at once?
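To make question 1 concrete, here is the kind of Python-level wrapper I have in mind; logp_scalar is a made-up stand-in for the wrapped C++ kernel, and the math inside it is a dummy:

```python
import numpy as np

def logp_scalar(value, n, m, N, odds):
    # Hypothetical stand-in for the scalar C++ logp kernel (dummy math).
    return float(value) * np.log(odds) - np.log(1 + n + m + N)

def logp_vectorized(value, n, m, N, odds):
    # Broadcast all inputs to a common shape, then call the scalar
    # kernel elementwise. Size-1 dimensions are expanded here, so the
    # kernel itself never needs to know about broadcasting.
    value, n, m, N, odds = np.broadcast_arrays(value, n, m, N, odds)
    out = np.empty(value.shape)
    for idx in np.ndindex(value.shape):
        out[idx] = logp_scalar(value[idx], n[idx], m[idx], N[idx], odds[idx])
    return out

# Parameters of shape (2, 1) broadcast against two values of shape (2,):
res = logp_vectorized([10, 15], 20, 30, [[60], [32]], 1.0)
print(res.shape)  # (2, 2)
```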

Many thanks

Dear @lucianopaz, dear all,

Today I found some time to work on this project and made some progress:
I implemented a C++ Theano op that calls the logp function of the external C++ library:

I’m still unsure about dimensions and shapes … As implemented now, it can only handle vectors as input. If my variables are scalars, it segfaults (due to PyArray_EMPTY being called with 0 dimensions, I think). There is probably something wrong with my type casting in make_node. I could implement some conditional behavior based on the dimensionality of the inputs, but there is probably a better way? As I want to use the op for the logp in a PyMC3-compatible distribution, I want it (or ultimately the distribution) to behave like the existing distributions.
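In pure NumPy the scalar case is legal (0-d arrays exist), so as a stopgap I could normalize the inputs outside the op: promote scalars to shape (1,) going in, and squeeze back on the way out. A sketch (call_vector_only_kernel and vec_kernel are made-up names, not part of my op):

```python
import numpy as np

def call_vector_only_kernel(kernel, *inputs):
    # Hypothetical wrapper around a kernel that only handles 1-d
    # arrays: promote scalars to shape (1,), then squeeze back.
    arrays = [np.atleast_1d(np.asarray(x)) for x in inputs]
    result = kernel(*arrays)
    # If every original input was scalar, return a 0-d result.
    if all(np.ndim(x) == 0 for x in inputs):
        return result.reshape(())
    return result

# Dummy "vector-only" kernel for demonstration:
vec_kernel = lambda a, b: a + b
print(call_vector_only_kernel(vec_kernel, 2, 3))            # 5
print(call_vector_only_kernel(vec_kernel, [1, 2], [3, 4]))  # [4 6]
```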

With the current version, this works (estimating odds for two distributions):

w = W("w", n = [20, 21], m = [30, 34], N = [60, 32], odds = [OTHER VAR WITH SHAPE 2], observations = [10, 15])

where W is a discrete PyMC3 distribution that calculates the logp by calling the new op.

This currently does not work (estimating a single odds parameter from several observations):

w = W("w", n = 20, m = 30, N = 60, odds = [OTHER VAR WITH SHAPE 1], observations = [10, 15, 23, 45, 23])

(I know how to make this work, but then the vector version would not work any more)

And probably something like this should work (but does not; it’s a combination of the two above):

w = W("w", n = [20, 21], m = [30, 34], N = [60, 32], odds = [OTHER VAR WITH SHAPE 2], observations = [[10, 15], [11, 15], [10, 14], [13, 15], [10, 15]])
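For reference, all three shape combinations above broadcast fine under plain NumPy rules, so naively I would expect the elementwise logp shapes to be:

```python
import numpy as np

# Shapes from the three examples above:
case1 = np.broadcast(np.empty(2), np.empty(2)).shape       # params (2,), obs (2,)
case2 = np.broadcast(np.empty(()), np.empty(5)).shape      # scalar params, obs (5,)
case3 = np.broadcast(np.empty(2), np.empty((5, 2))).shape  # params (2,), obs (5, 2)
print(case1, case2, case3)  # (2,) (5,) (5, 2)
```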

So I’m a bit lost about the following things:

  • What behavior does PyMC3 expect in terms of the shapes and dimensions of the distribution? I see that parameters are broadcast … but what about multiple observations? Are they treated element-wise, and if so, do the elements have the shapes of the parameters?

  • Who broadcasts what, and when? Should I handle broadcasting in the op or in the distribution’s logp? Or does PyMC3 take care of that? I realize that this is maybe a design decision … but I’d be interested in how you suggest doing it. (If I can do it in the C++ part of the op, it’s probably fast. But it might also result in awkward conditions to check the input shapes … note that my distribution has four parameters …)

  • Do you have any further suggestions from looking at the code of the op? Just now I realize that making the output type equal to one of the input types (which I copied from the example) does not make much sense in my case. There are probably more strange parts like that … Any hints are welcome.

Many thanks in advance