Trouble using DensityDist with a subclassed Op

Understood, many thanks again. And apologies, just to be clear: I misspoke somewhat with the word “recommendation.” Coming from the black-box likelihood notebook left me a little confused, since Potential does not require a logp function that follows the signature required by Distributions, and the notebook gives the example of a more permissive logp function that embeds the data within it. As I’m still a student myself, the importance of the “model parameters” vs. “observed data” distinction wasn’t immediately clear while working through the notebook.

It’s quite clear to me now what the API is for a Distribution’s logp function, though.
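For anyone who lands on this thread later, here’s a minimal sketch of the distinction as I now understand it (the unnormalized Gaussian density and all names here are toy stand-ins, not anything from my actual problem, and I’m assuming the v4-era API):

```python
import numpy as np
import pymc as pm
import aesara.tensor as at

data = np.random.normal(size=100)

# Potential accepts any scalar tensor expression, so the data can
# simply be closed over -- there is no required call signature.
with pm.Model():
    mu = pm.Normal("mu")
    pm.Potential("loglike", at.sum(-0.5 * (data - mu) ** 2))

# DensityDist treats the observed data as a graph input: logp must
# take the observed value as its first argument, followed by the
# distribution parameters.
def logp(value, mu):
    return at.sum(-0.5 * (value - mu) ** 2)

with pm.Model():
    mu = pm.Normal("mu")
    pm.DensityDist("loglike", mu, logp=logp, observed=data)
```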

> why does having the observed data be the input in observed kwarg make a difference on this?

I don’t mean to take this thread too far out of scope. But to answer this: the main reason I can imagine someone wanting to avoid passing data in via the `observed` kwarg is that their data isn’t easily converted to a TensorVariable. As best I can tell, using `observed` requires that the data be numeric. And, given some Op which calculates the logp, the data is then passed to that Op as an ndarray.
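To illustrate that last point (again just a sketch; `external_loglike` is a hypothetical stand-in for whatever actually computes the log-likelihood): by the time `perform` runs, the observed value has been lowered to a plain ndarray, so anything non-numeric has to be dealt with before the data enters the graph.

```python
import numpy as np
import aesara.tensor as at
from aesara.graph.op import Op

class LogLike(Op):
    # Both the observed value and the parameter vector are graph
    # inputs, so both must be numeric tensor types.
    itypes = [at.dvector, at.dvector]
    otypes = [at.dscalar]

    def perform(self, node, inputs, outputs):
        value, theta = inputs  # both arrive here as plain ndarrays
        outputs[0][0] = np.asarray(external_loglike(theta, value))
```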

What I meant by the “conversion” thing is that I’m using an external library which uses a custom format for storing data, essentially performing some transformations on a dataframe or ndarray. It even accepts non-numeric types in the dataframe/array, representing factor levels in the data. Hence, so long as I’m understanding the pymc and aesara APIs correctly (which I definitely could be misunderstanding!), I believe that in order to use the correct logp call signature, I would need to:

  1. Every time logp is called, have my Op call the external library and perform those transformations to recreate the library-friendly data object
  2. Either create an Op for every relevant combination of factor levels, or convert those factors to integer indices and have my Op resolve those indices (see the sketch after this list)

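Building on the earlier sketch, here’s roughly how I picture the integer-index route. Everything here is hypothetical: `df`, the column names, and `external_loglike` are just stand-ins for my actual data and library call.

```python
import numpy as np
import pandas as pd
import aesara.tensor as at
from aesara.graph.op import Op

# Encode the non-numeric factor column as integer codes up front,
# then pack everything into one numeric matrix suitable for `observed`.
codes, levels = pd.factorize(df["treatment"])
numeric_data = np.column_stack([df["y"].to_numpy(), codes.astype(float)])

class LogLike(Op):
    itypes = [at.dmatrix, at.dvector]  # encoded data, parameter vector
    otypes = [at.dscalar]

    def __init__(self, levels):
        self.levels = np.asarray(levels)  # integer code -> original label

    def perform(self, node, inputs, outputs):
        data, theta = inputs
        # Rebuild the library-friendly object on every call (point 1),
        # resolving integer codes back to factor labels (point 2).
        frame = pd.DataFrame({
            "y": data[:, 0],
            "treatment": self.levels[data[:, 1].astype(int)],
        })
        outputs[0][0] = np.asarray(external_loglike(theta, frame))
```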
I imagine this is a very small edge case, and I have a lot of work to do before I’ll be able to assess how the first point could impact sampling performance, so for now I’m happy to leave this as a challenge for future me to figure out. :smiley: But that’s where I’m coming from. (If it seems like I’m still misunderstanding something significant, I’d love to hear!) Thanks again for the discussion and help.