Understood, many thanks again. And apologies, just to be clear, I misspoke somewhat with the word “recommendation.” Coming from the black-box likelihood notebook left me a little bit confused, as Potential does not require a logp function that follows the signature required by Distributions, and the notebook gives the example of a more permissive logp function that does embed the data within it. As I’m still a student myself, the importance of the “model parameters”-“observed data” distinction wasn’t immediately clear in working through the notebook.
It’s quite clear to me now what the API is for a Distribution’s logp function, though.
> why does having the observed data be the input in observed kwarg make a difference on this?
I don’t mean to take this thread too far out of scope. But to answer this: the primary reason I can imagine that someone would wish to avoid passing in data via the observed kwarg is if they have data which isn’t easily converted to a TensorVariable. As best I can tell, using observed requires that the data be numeric, and, given some Op which calculates the logp, the data is passed to the op as an ndarray.
What I meant by the “conversion” thing is that I’m using an external library which stores data in a custom format, essentially the result of applying some transformations to a dataframe or ndarray. It even accepts non-numeric types in the dataframe/array, representing factor levels in the data. Hence, so long as I’m understanding the pymc and aesara API correctly (which I definitely could be misunderstanding!), I believe that in order to use the correct logp call signature, I would need to:
- Every time that logp is called, make my op call the external library and perform those transformations to create the library-friendly data object
- Either create an op for every relevant combination of factor levels, or convert those factors to integer indices and make my op resolve those indices
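To make the second option concrete, here is a minimal pure-Python sketch of the integer-index idea, with a small cache to soften the cost of the first point. Everything in it is an assumption for illustration: `encode_factors`, `decode_factors`, and `build_library_data` are hypothetical names, and `build_library_data` merely stands in for whatever constructor the external library really provides. It does not use any pymc or aesara API.

```python
from functools import lru_cache


def encode_factors(values):
    """Map arbitrary factor levels to integer codes plus a lookup table."""
    # Sort by string form so the code assignment is deterministic.
    levels = tuple(sorted(set(values), key=str))
    index = {level: code for code, level in enumerate(levels)}
    codes = tuple(index[v] for v in values)
    return codes, levels


def decode_factors(codes, levels):
    """Resolve integer codes back to the original factor levels."""
    return [levels[int(c)] for c in codes]


@lru_cache(maxsize=8)
def build_library_data(codes, levels):
    """Hypothetical stand-in for the external library's data constructor.

    Arguments are tuples so they are hashable; repeated calls with the
    same data reuse the cached object instead of rebuilding it.
    """
    return tuple(decode_factors(codes, levels))


raw = ["control", "treated", "control", "placebo"]
codes, levels = encode_factors(raw)   # numeric codes could go through observed
restored = decode_factors(codes, levels)
data_obj = build_library_data(codes, levels)
```

The numeric codes are what would be handed over as data, while the op keeps the `levels` table to resolve them. Inside an op, the incoming ndarray would first need converting to a tuple of ints before it could serve as a cache key.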
I imagine that this is a very very small edge case, and I have a lot of work to do before I’ll be able to assess how the first point could impact sampling performance, so I’m happy for now leaving this as a challenge for future me to figure out.
But that’s where I’m coming from. (If it seems like I’m still misunderstanding something significant, I’d love to hear!) Thanks again for the discussion and help.