Potential with array of logps

I have a model where each parameter needs to contain independent samples for each observation. So I create the parameters with shape=N and hope to produce samples of shape num_chains x num_samples x num_parameters x N.

Can the potential function return N logps instead of the traditional scalar when sampling with NUTS?

The potential function internally calls an Aesara Op to get the logps, which in turn calls a black-box model to provide the logps and gradients, but I'm not sure that's really relevant. It all works when N=1 (i.e. the Op returns a scalar), but otherwise I get weird array dimension mismatch errors in
func = model.logp_dlogp_function(vars, dtype=dtype, **aesara_kwargs)

I'm not sure if it's related to something incorrect in my Op (although I could get it working for a scalar logp) or whether the potential function simply cannot return an array of logps, so a confirmation would help me identify the issue.

TIA

pm.Potential requires a scalar as input, which can be the .sum() of your vector.
If you want to track the vector elements in the trace/InferenceData, you can wrap the vector in a pm.Deterministic.

Does that help?


Thanks for the suggestion of pm.Deterministic. Unfortunately I cannot do a .sum(), since it would be incompatible with my model objective, so that rules out pm.Potential.

But I am still struggling to make the Aesara Op/Deterministic combination work for vectors, i.e. an Aesara Op that returns a vector that can be used inside Deterministic. It all works for a scalar!

It’s difficult to post the full code, but here is pseudo-code with the simple scalar-to-vector change that breaks. The code is adapted from this v3 code.

class MyOp(Op):
    def make_node(self, *inputs):
        # 1st entry is the return type; it WORKS if it is at.dscalar()
        outputs = [at.dvector()] + [...]
        return Apply(self, list(inputs), outputs)

    def perform(self, node, inputs, output_storage):
        output_storage[0][0] = blackbox_output  # a vector from the blackbox
        output_storage[1][0] = blackbox_grads   # gradients from the blackbox

# this is expected to use the first return value of the custom Op
pm.Deterministic("obs", my_op(*model_vars)[0])

If I use at.dscalar as the return type and blackbox_output is a scalar, everything works correctly with both pm.Potential and pm.Deterministic.

However, if the return type is an at.dvector (with blackbox_output as a vector), I get a compile_pymc error:

AssertionError: Something inferred a shape with 0 dimensions for a variable with 1 dimensions for the variable.
MyOp.0 [id A] <TensorType(float64, (None,))>

My input parameters to the Aesara Op are all vectors, so I am confident that vector inputs work correctly. It's only returning a vector that I am struggling with. I'm not sure if I have to pass anything extra (e.g. a shape parameter somewhere). Is there an example of an Aesara/Theano Op that returns a vector and is wrapped from PyMC with Potential/Deterministic?

What are you trying to do? If you want to add the logp term directly to the graph, you should use Potential. Whether you sum it or not is irrelevant; Potential will call sum on your output regardless.

But Potential does not return anything (useful); it simply adds a logp term to the model. Why would you need it to return something? What are you planning to do downstream of it?

Edit: It looks like there is something wrong with your Op. Can you share the whole code, including the part that leads to the error message?

Maybe these v4 guides can help?

Thanks for taking a look. Here is my usecase:
I have a blackbox model (in PyTorch) that accepts as input D-dimensional parameter values and D-dimensional observations, and returns a scalar logp value. I can also get the gradients of the parameters.

I use PyMC to set the priors for the D-dimensional parameter variables, and want to sample posteriors based on the logp returned by the blackbox. I have an Aesara Op that wraps this torch code and implements perform/grad as normal (i.e. sets the scalar logp, grad, etc. in the outputs). This Op is called from a pm.Potential, followed by pm.sample. All works as expected, similar to the sample here.

The blackbox model is vectorizable, i.e. it can also accept NxD parameters and NxD observations and return N logp values. The logp values are calculated independently for each observation/parameter pair (let's just say that is the intended model design).

In PyMC, I now have priors for NxD parameter variables and have set the Aesara inputs appropriately. If I sum the logps and return a scalar logp for pm.Potential, all works correctly (so I know the inputs are not the issue). But the sum of logps is NOT what I want. I want N x D x num_samples posteriors generated independently for each likelihood, i.e. assuming N different logps and corresponding gradients.

Of course, I could call this in a loop N times instead, but that is pretty slow.

I see now that wrapping the Aesara Op's vector output in pm.Deterministic is the correct way (since pm.Potential implies a sum).

But I am stuck on making the Aesara Op's output variable an N-dimensional vector (instead of a scalar) and passing it to Deterministic. In the JAX sample you provided, the outputs list is also [at.dscalar()]; an example with [at.dvector()] that is used by Deterministic would be useful.

I will wait for you to comment on the use case before posting the code (it is difficult to extract a self-contained working snippet).

I don’t understand why you don’t want to sum the logps. In any PyMC model the logps of every variable (regardless of whether they are scalars, vectors, or n-dimensional variables) are always summed (and then summed together) for the purpose of sampling. For instance, you can only take gradients wrt scalar quantities, so if we didn’t sum, NUTS couldn’t be used.

The only cases where you need elementwise probabilities (in PyMC) are for model comparison. If you really need them, you should create a new distribution or use DensityDist. Deterministic is not what you need from your description.

In the JAX guide there are examples of non-scalar outputs in the Grad Ops. There’s nothing special about non-scalar outputs; you just have to make sure you define them correctly.

Regardless, check my comment above. I think you are confusing what we mean by Potential summing the output. The independent logps (regardless of how you bring them into the model) will always be summed by PyMC, and that’s not a problem. It isn’t “confusing” or “mixing” them: since the graph is symbolic, gradients will be properly partitioned wrt each input even after summing.