Dynamic shaping, "round" function, JAX, and a "few" more questions

Thank you for all the hard work you have put into providing us with a powerful and flexible Bayesian inference framework in Python! Looking forward to seeing what future developments PyMC has in store for us!

I am trying to model as accurately as possible the process involved behind the signal measured with a light detector that I need to characterize. The number “Ne-” of photon-electrons generated each time the process happens follows a Poisson law, and each photo-electron then experiences a different multiplication gain given by a Gamma law. In other words, the shape of the Gamma distribution is not fixed in advance, but depends on Ne-. Pending the implementation of dynamic shaping, the trick presented here (Hierarchical changepoint detection - #5 by ckrapu), based on dynamic indexing, should address this first challenge.

Q1- Sorry, it is a redundant question … could you please confirm that at the moment, there are no other method to get around this dynamic shaping?

Then the analogue signal is digitized (~binned) by an ADC when the detector is being read. I was thinking of modelling this effect as follows:
1- Either directly using the “pymc.math.floor” function, provided that PyMC (via aePPL) is able to derive the logp for this operator, which seems to be the case according to (Transforms),
2- Or in the same vein as (Estimating parameters of a distribution from awkwardly binned data — PyMC example gallery), i.e. by segmenting the logpdf into several bins by differentiating the logcdf, then by matching the data to a multinomial distribution.

Q2- Would the first option, which I prefer for its simplicity, work?

Q3- I saw the use of the “rv_register” function in a video from Ricardo. Can this function be used in the same way as the “CustomDist” function to sample and constrain a model with observed data?

Q4- Is it possible to translate the PyTensor graph into JAX even if the sampler used is not in JAX? I don’t think I’ve seen an example with this scenario.

Q5- Is it possible to sample on more than one CPU core when there is a JAX jitted and vectorised function wrapped/encapsulated in a PyTensor Op in the model graph? This example (How to wrap a JAX function for use in PyMC — PyMC example gallery) suggests that it is not possible.

Q6- Out of curiosity, is PyMC still connected to the aeMCMC project, or to a similar project, promising to take PyMC to the next level including graph optimization of the combined/full {logp model+sampler} graph, automatic sampler selection, model reparameterization/marginalization, …?

(If I may say so, there is currently a great deal of developments (PyTensor, JAX, DaCe, Jul**, etc.) - initially stimulated in part by the rise of neural networks - around graph representation, high and low level optimizations, algebraic simplification, auto-differentiation, parallelization and JIT, and so on. Of course, it’s not PyMC’s role to provide an in-depth tutorial on these concepts. That said, it would probably be very useful, especially for someone unfamiliar with these concepts, to explain where PyTensor and PyMC stand in order to help users make an informed choice between these different tools, and also to make the most of PyMC. For example, unless I’m mistaken, the automatic derivation of the logp thanks to aePPL is a truly unique feature of PyMC that should perhaps be emphasized more strongly.)

Thank you for your help and your time.

A lot of great questions! Forgive me if I skip/bundle up some of them:

Yes. The easiest way is to exploit a CustomDist and let PyMC infer the logprob of the rounding operation. Something like this (untested code):

import pytensor.tensor as pt
import pymc as pm

def round_dist(mu, sigma, size):
  raw_dist = pm.Normal.dist(mu, sigma, size=size)
  return pt.round(raw_dist)

data = [1, 2, 3]  # dummy_data
with pm.Model() as m:
  mu = pm.Normal("mu")
  sigma = pm.HalfNormal("sigma")
  llike = pm.CustomDist("llike", mu, sigma, dist=round_dist, observed=data)

 # just to confirm it can figure out the logp 
m.point_logps()  # {'mu': -0.92, 'sigma': -0.73, 'llike': -9.34}

Yes (or numba). You can do this by setting the PyTensor mode (if in PyMC V5, Aesara in V4).
For example, after defining your model if you want to try using JAX in the PyMC Nuts sampler (for example)

with pm.Model() as m:
  ...
  with pytensor.config.change_flags(mode="JAX"):
    pm.sample()  # or model.compile_logp() or whatever you want to do

Now you might find some issues with gradient-based samplers, because the way PyTensor (and PyMC specifically) defines gradient graphs tends to not be very liked by Jitted JAX (shape unravelling and things like that)

I don’t remember the example that well, does it state somewhere that this is not possible? I have a vague idea of seeing custom JAXed Ops fail in multiprocessing but I am not sure. It could very well work out of the box… or maybe JAX doesn’t like how we pickle and unpickle things for multiprocessing. If you test it out and find problems feel free to open an issue in the github repository. Feedback on that area is incredibly valuable!

We are not connected to aeMCMC/aePPL in terms of development, but share many of the goals with those libraries. Most progress in this area is being done in PyMC’s logprob submodule (which is a direct fork of Aeppl, and contains the whole lopgrob inference machinery) and also in pymc-experimental. Some examples of the latter:

Definitely, there has been a focus on establishing the core logic and documentation is lagging, but functionality is slowly coming to the surface. The CustomDist as I answered above is one way we try to offer these capabilities to users (you can find some simple examples about using dist and allowing inference already): pymc.CustomDist — PyMC v5.0.1 documentation

We have some collaborators expanding on the logprob submodule this summer and I have been nudging them to start adding more documentation, so maybe that will improve soon :smiley:

We also use it in the codebase without users knowing about, for instance for pm.RandomWalk and pm.Censored. There has also been a big push to infer probability of timeseries defined with Scan, which now lives in a gist, but one day @jessegrabowski might make a pymc-example out of it :smiley:

Thanks for your input. Let me know if you have more questions / suggestions.

1 Like

This one is trickier, because there are no standard “samplers” for dynamic-shaped variables. There was a guy thinking about RJMCMC but it has bee idle for some time: rjmcmc stepper and example test case by TeaWolf · Pull Request #20 · pymc-devs/pymc-experimental · GitHub

Until some sampler is implemented that can in theory handle such kind of space-switches, the best alternatives are marginalization or clever padding/masking.

You are allowed to pass observed in to a random variable created via rv_register. You can see an example of this here, in conjunction with a scan to automatically infer the logp of a time series distribution.

Edit: This was already demonstrated in the gist Ricardo linked. But that’s the explicit answer to Q3.

1 Like

While that’s true, I would suggest using CustomDist as it will play nicer with dims/shapes (it will feel like using a vanilla distribution). model.register_rv is more internal and was used mostly as a work-around while CustomDist was being revamped.

We don’t promise to maintain the register_rv API stable, while we will try to do that for CustomDist

1 Like

@ricardoV94, @jessegrabowski, Thank you so much for all the information and clarifications provided in response to my questions, along with the code samples. It is very, very helpful. PyMC’s ability to automatically infer the logp, even for complex models like time series, is an extremely powerful and exciting feature! Looking forward to learning more about the ongoing developments such as automatic discrete marginalization and derivation of arbitrary censoring logp!

Now that I have a better understanding of the inner workings of PyTensor/PyMC, I would like to put it into practice to implement my first model, what I’ve been trying to do for the last few days, sorry for the late reply!

You are right, it is not stated in this example that it is not possible. I suppose I concluded too quickly that it was a fundamental limitation of PyMC, given that I didn’t see any reason to set the maximum number of CPU cores to only 1, and also I must have misread the issue reported on this github post (New `mp_ctx` defaults in 4.3.0+ (#6218) for M1 Macs could break parallel sampling for M1 Mac users using JAX · Issue #6362 · pymc-devs/pymc · GitHub)!

Below is a draft of my PyMC model (here using “rv_register”). It doesn’t work as it is due to the issues I have outlined below along with the solutions I have in mind to address them. Any comments and help would be greatly appreciated!

import pytensor.tensor as pt
import pymc as pm

data = np.array([1., 2., 3.])
with pm.Model() as mymodel:
    
    #**SOME CONFIG PARAMS
    avrg_rn_prior = 8.
    chrggn = 3.6
    mu_rn = 0.
    max_avlchgn_drws = 30
    sigma_avlchgn_scl = 10.
    
    #**PHOTON-ELECTRONS GENERATION
    phe_rate = pm.Gamma(name="phe_rate", mu=3., sigma=3.**0.5)
    n_phe = pm.Poisson.dist(name="n_phe", mu=phe_rate)
    
    #**AVALANCHE GAIN GENERATION
    mu_avlch_gain = pm.Uniform(name="mu_avlch_gain", lower=50., upper=100.)
    sigma_avlch_gain = (pm.Exponential(name="sigma_avlch_gain", lam=1.) * 
                            sigma_avlchgn_scl)
    
    # 1st option - using "shape" parameter.
    # avlch_gains = pm.Gamma.dist(name="avlch_gains",
    #                         mu=mu_avlch_gain, sigma=sigma_avlch_gain,
    #                         shape=(max_avlchgn_drws,))
    # Case "n_phe = 0" not handled explicitly, but "pm.draw" still works and
    # returns 0 as required.
    # Raises "ShapeError: Dimensionality of data and RV don't match.
    #         (actual 1 != expected 0)" if "keepdims=False" (default).
    # n_chrgs = pt.sum(avlch_gains[:n_phe], keepdims=True) * chrggn
    
    # 2nd option - using properties (summation + scaling) of the Gamma law.
    # Case "n_phe = 0" handled explicitly using a "switch" statement, but
    # still get "ValueError: Domain error in arguments. The `scale` parameter
    #           "must be positive [..]".
    avlch_gains = pt.switch(n_phe>0.,
        pm.Gamma.dist(name="avlch_gains",
                      mu=mu_avlch_gain*n_phe*chrggn,
                      sigma=sigma_avlch_gain*n_phe**0.5*chrggn,
                      shape=(1,)), 0.)
    n_chrgs = avlch_gains
    
    #**READOUT NOISE GENERATION
    rn = pm.HalfNormal(name="rn", sigma=1) * avrg_rn_prior
    rd_noise = pm.Normal.dist(name="rd_noise", mu=mu_rn, sigma=rn)
    
    #**OUTPUT SIGNAL GENERATION
    gen_sgnl = pt.round(n_chrgs + rd_noise)
    
    #**CUSTOM DIST REGISTRATION
    mymodel.register_rv(gen_sgnl, "gen_sgnl", observed=data)
    
    # llike = pm.CustomDist("llike", phe_rate, mu_avlch_gain, sigma_avlch_gain,
    #             rn, dist=custom_dist, observed=data)

1- The parameters of the Gamma distribution depend on the RV “n_phe” following a Poisson distribution. In other words, the RV “avlch_gains” is defined by a compound distribution. PyMC can’t marginalise on the discrete RV “n_phe” (at least not yet!), so can’t automatically derive the logp of “avlch_gains”. I am thinking of addressing this challenge using a mixture distribution (“pymc.Mixture”). The weights will be given by the PMF of the Poisson distribution (truncated for “n_phe” high enough, here >~10). In addition, this will have the advantage of fixing this error happening in the “switch” function for some unknown reason when “n_phe=0”.

Q1- Could PyMC compute the logp of the following term “n_chrgs = pt.sum(avlch_gains[:n_phe])” if “n_phe” was drawn from a prior dist (=a parameter of the model to be determined)? (can’t wait for this video to be posted to learn more about that ([PyMCon Web Series 07] Automatic Probability with PyMC (June 28, 2023) (Ricardo Vieira)).

2- The RV “gen_sgnl” results from the sum of a Gaussian + Gamma RVs - omitting for the moment the “round” operation. PyMC doesn’t currently know how to derive the logp of this sum, but maybe one day … (ENH: Add an op corresponding to `scipy.integrate.quad` · Issue #290 · pymc-devs/pytensor · GitHub). I think I have two options:

a- Compute by convolving the 2 PDFs, either symbolically (using Sympy, or Mathematica, …), or directly numerically (using Scipy) the logp of this sum, as well as its derivatives, then create a custom RV “Op” to encapsulate the derived formulas coded in Scipy/Numpy. This sounds a bit tedious …

b- In fact, the logp of this sum can be calculated symbolically (for example, see What is the convolution of a normal distribution with a gamma distribution? - Cross Validated). So I could implement the logp directly expressed in PyTensor operators, and provide it to ‘CustomDist’, along with the required “random” function. Nothing new so far! Two questions though:

Q2- Will PyTensor be able to derive the logp function as a function of its parameters, despite its complexity?

Q3- All the operators involved in the logp should already be available in PyTensor, except for the “hyp1f1” function (=“confluent hypergeometric function”). But, the “hyp2f1” function is already implemented in PyTensor, and is very close to “hyp1f1”, also available in Scipy. Would it work if I copied and modified the “hyp2f1” function in my code? (this is a detail, I think the following comment should rather be “d/dc[A(k)/B(k)] = - A(k)/B(k) / (c + k)”, line #1630 of the file “[…]/scalar/math.py”. the code line #1702 looks correct though).

Q4- Out of curiosity, the “hyp2f1” function is only executable in pure Python (as it is provided by Scipy), but can its gradient be converted into another backend (like JAX) via this new “ScalarLoop” operator? So assuming that the “hyp2f1_grad” function can be fully recognized as a PyTensor graph.

Thank you again for your help and your time!

Hi, I would like please to ask you a few more questions about the creation of a custom distribution:

Q1- If I go back to your example given above of the “round_dist” function, this new dist function is in fact already usable/functional as it is, even outside a “model” context, and without the need to wrap it within a “CustomDist” function, is it true? In addition, PyMC has already automatically derived the logp considering the “round” operator, so the code below should work as expected? However, I will have to manually specify the “size” argument of the dist, and it will have to be consistent/compatible with the shapes of the “mu” and “sigma” parameters. Do you agree with this?

mydist = round_dist(mu=[0., 0.], sigma=1., size=2)
vv = pt.vector(name="vv")
dist_logp = pm.logp(rv=mydist, value=vv)
rr = dist_logp.eval({vv: [0., 0.5+0.0001]})
# -> array([-0.95991632, -1.41993248]) => OK!

Q2- Is it possible to use a “CustomDist” outside a model “context” in order to create a new, completely “self-sufficient” distribution and benefit from the automatic handling of the “size/shape” parameters?

Q3- This is related to the previous question Q2. In the case where my “CustomDist” is based on a custom logp function (+ a custom “random” function) (which is in addition implemented in JAX following this great tutorial “How to wrap a JAX function for use in PyMC — PyMC example gallery”), the “size/shape” parameters (whether explicit or implicit) can be >1 only if my custom logp function is already vectorized along the parameters (for example “mu”, “sigma”, …) of the distribution, right? PyMC can’t vectorize the logp itself? As a corollary, if I want to use this custom distribution as components of a “Mixture” dist, I wouldn’t be able to benefit from vectorization using the “shape” argument, but I would have to manually create the stack of Mixture components/distributions I need, for example by generating the stack using a list comprehension?

Q4- Could you confirm that in order to use the “round” operator with my custom dist, and benefit from the automatic logp derivation, I must also provide the logcdf function?

Thank you again for your help and your time.

I’m trying to get the derivative of dist_logp above. I’ve tried different ways with the pymc.gradient and pytensor.tensor.grad functions, but it doesn’t work. I get various errors, for example “TypeError: Cost must be a scalar”. What is the correct way to do this? I couldn’t find an example. Thank you for your help.

You probably want the derivative of the joint log probability over all observations: p(v_1, v_2, \dots, v_n | \theta) = \Pi_{i=1}^n p(v_i | \theta) = \sum_{i=1}^n \log p(v_I | \theta). So try pt.grad(dist_logp.sum(), theta) (where theta are the parameters of your distribution).

This all assumes the components of vv are independent draws from mydist.

2 Likes

Thanks for all the great questions. Let me try to address your first batch of questions :slight_smile:

There’s two things going on there. Slicing a vector RV by another variable, could be done if n_phe is a model variable (which in your case I think it is). PyMC could give you the logp for that as it’s simply the logp of the first n_entries…, It’s not implemented yet, but it’s actually something I have a draft on from other work done on supporting mixture probability…

However, all of the the current samplers would still not be able to handle the change in shape.

Sidenote: A simpler case of this happens if you try to sample unobserved Multinomial counts with changing n. Because n and the multinomial counts are sampled separately, it’s impossible to propose any new n. The probability with the old counts is always 0 for any different n. You would need a custom sampler that proposes both n and counts together.

Back to the question, you also have a sum operation, whose probability requires convolving the multiple RVs in the vector (after the slice), which is harder and currently not supported. We don’t even support the sum of Normals yet, even though those have a closed from solution. For finite-discrete-convolutions without closed-form solution we could do nested loops, but this will quickly hit dimensionality issues, even for short vectors of just 4-5 RVs.

I notice you linked to the dynamic-switch model above. I don’t know if I linked to my version already or not (apologies if I did), where I use MarginalModel to marginalize the discrete uniform RV: multiple_changepoint_marginalization.ipynb · GitHub.

This should also work for Truncated discrete variables like Poisson in the future, but that functionality is still missing: Extend marginalization support to Truncated discrete RVs · Issue #95 · pymc-devs/pymc-experimental · GitHub

However, note we still need all the slicing magic so that the sampler can keep proposing a fixed number of “innovations” across change-points.

More broadly, I am still unsure about this kind of model because I expect there should be strong dependencies between number of change points and values of “innovations”, that I think aren’t properly addressed by marginalizing over the latent discrete variable. The sampler will have to propose innovations that work on average across all possible number of change points (weighted by their probability of course), which might actually be terrible for any specific number of change points. A conditional-sampler would probably be needed for this kind of models (I think RJMCMC would achieve this, by keeping separate sampling spaces for the different values of the discrete variable). This lightly touches on the issue of pseudo-priors as well if you have ever encountered them.

If the logp function is provided by you, and already parametrized, then there is nothing to infer. By the way, it would be nice to implement it in PyMC, if you are interested!

Perhaps it would be easier to start with the sum of Normals, which was implemented by @larryshamalama in aeppl sometime ago but not ported to PyMC. Our approach works a bit differently nowadays, but shouldn’t be too hard: Add rewrite for sum of normal RVs by larryshamalama · Pull Request #239 · aesara-devs/aeppl · GitHub

Adding the hyp1f1 (and gradient) to PyTensor would be a great addition as well (so other’s benefit from the work you had to do once anyway). Addressing the comments is also fine (if nothing else, just opening an issue in our PyTensor repo would be much appreciated)

The main function hyp2f1 must be provided by JAX so we can support it via dispatching. I don’t know if JAX offers this one currently (or Tensorflow Probability which can also be used). If it does, it would be a pretty simple PR in PyTensor to make them compatible. Instructions here: Adding JAX and Numba support for Ops — PyTensor dev documentation

The ScalarLoop hasn’t been ported to JAX yet, but should be! Once it is, any graphs using it should work out of the box. I think we can implement it with a WhileLoop or Fori in JAX. It should be simpler than our implementation of the more general Scan which you can read here at your own peril :smiley: :https://github.com/pymc-devs/pytensor/blob/main/pytensor/link/jax/dispatch/scan.py

However, if you are using a JAX sampler like Numpyro or BlackJAX, the gradients will be obtained via JAX auto-diff machinery, and not PyTensor. So the ScalarLoop is likely to not matter. It will depend on whether JAX can take the gradient of the Hyp1F1 (if they provide it).

1 Like

Now for the second batch of questions!

Yes. The reason I suggest wrapping in CustomDist is that users then benefit of the extra distribution functionality like being able to specify shape/dims/observed and not having to know about the lower level model.register_rv. If you only want to take draws or obtain the logp graph you don’t need a CustomDist at all.

Another place where CustomDist is useful is if you want to use it as an input to another “Distribution factory” like Mixture/Truncated/RandomWalk which can’t accept arbitrary graphs, because they need to be able to resize them to respect the user provided shape/dims/observed. If you pass a CustomDist they will be happy to take it.

Yes. CustomDist.dist(...) will do it :wink:

Right. The logp function is expected to work with whatever shape of arguments/ value the RV is originally defined. You could however call jax.vmap inside your helper function to do the vectorization for you automatically. We are working on a PyTensor equivalent (for now at the level of a single Op: https://github.com/pymc-devs/pytensor/pull/306, but it’s straightforward to vectorize a whole graph once we have this functionality in place), but if you are already working with JAX, you already have that functionality from them.

round requires being able to take the logcdf of the underlying RV. If that already has a logcdf, there’s nothing you need to do. The logcdf of the “rounded variable” isn’t needed unless you are using it inside a Censored or Truncated. We could easily implement the logcdf of rounded variables as well. Note that pm.logcdf(..., value) can also infer the logcdf of some RVs (less often than the logp works, but that’s more because of lack of demand so far). Similarly, the CustomDist will make use of this to automatically derive the logcdf if it can do it, from the dist function.

1 Like

@ricardoV94, @jessegrabowski, Thank you a lot for all the information provided in your replies above. It is extremely useful to get so many insights into how PyTensor and PyMC work. I will need some time to digest everything. I am still trying to get my model working, I think I am very close. I have only been using PyTensor/PyMC/JAX for a few weeks, so it is a lot to assimilate in a short time! I am still facing a few challenges. Any comments or help would be greatly appreciated:

1- I ran into exactly the same issue as the one reported in this thread (How can I output a gradient in vector format in Op.grad instance?) with my custom PyTensor Op ((also wrapping a function in JAX)). If we go back to this “simple” example given by @HajimeKawahara, the issue comes from the fact that the parameter phase is the same scalar used for all the elements of the vector x. In other words, PyTensor doesn’t know that the parameter phase needs to be vectorized. That is why it expects a zero dimensional gradient for the gradient of the Op function with respect to this variable. I solved this in my case by adding inputs = list(pt.broadcast_arrays(*inputs)) at the beginning of the function make_node … I don’t know if there is a recommended strategy to deal with this case. I guess it is only an implementation issue in this specific example, since this problem doesn’t occur for the rest of PyTensor/PyMC. But perhaps it would be useful to complete the PyMC examples by considering this edge case, particularly for (How to use JAX ODEs and Neural Networks in PyMC - PyMC Labs). This is not relevant for (https://www.pymc.io/projects/examples/en/latest/case_studies/wrapping_jax_function.html) since the cost function to minimize is already provided reduced via the sum to a function Potential.

2- Thanks @jessegrabowski. I get the expected result by taking the sum of the function, i.e. the “vectorized” gradient of the function with respect to its input parameters as indicated on this page (Derivatives in PyTensor — PyTensor dev documentation). I don’t find it very intuitive to have to take the sum, I still don’t really understand the logic … but now it works!

3- Is it possible to take a CustomDist of a CustomDist. Let’s take the simplified example below:

def _custom_dist1(mu, sigma, size,):
    _dist = pm.CustomDist.dist(mu, sigma,
        random=_myrndfunc, logp=_mylogp, logcdf=_mylogcdf,
        size=size,)
    return _dist

def _round_dist(mu, sigma, size,):
    _dist = pt.round(_custom_dist1(mu, sigma, size=size))
    return _dist

def _custom_dist2(mu, sigma, size,):
    _dist = pm.CustomDist.dist(mu, sigma,
        dist=_round_dist,
        size=size)
    return _dist

I get the following error when I try to “instantiate” _custom_dist2: “Model variables cannot be created in the dist function. Use the .dist API”.

4- This may be a recurring question, sorry about that: is it possible to get the logp from the PyMC model (via model.logp or model.compile_logp) which accepts the un-transformed parameters of the model as inputs? Or, how can we recover from PyMC the functions associated with these transformations? (It would be great to be able to overlap the model logp curve over the histogram of the observed data (again, I suppose that’s a fairly classic question!)).

5-My Mixture model is made up of a finite/limited number of components weighted by a truncated Poisson Law (~convolution of a Normal Law with a Poisson dist.). However, the random process generating the observed data is not “truncated”. In other words, there may be data to fit which is not “covered” by any component of the Mixture and which is therefore seen as outliers by the model. Even a few points can have a dramatic effect and “pull the whole distribution to the right” and induce a significant bias in the model parameters to be inferred. I was wondering if there was a common strategy for dealing with this scenario. We can of course increase the number of components, but at the expense of a heavier/slower model, and not necessarily a more accurate one. Another option, provided PyMC allows it, would be to trim “in real time” at each iteration of the MCMC procedure the data located too “far” from the last component of the Mixture.

Thank you again for your time and help.

You should be able to initialize a CustomDist inside another in the last version of PyMC (5.6.0) but you could just do it in a single CustomDist as well. Why do you want 2 of them?

I’ll try to answer your other questions sometime next week :wink:

Thank you for the quick reply! No rush, it can wait, please take your time.

((That’s good to know, I haven’t yet updated my PyMC version 5.5 to the latest 5.6 released this week. Just to answer your question, maybe I’m missing something obvious: I don’t see how in a single call to the function CustomDist, I can both provide my custom logp Op, and also apply the function round, so that in the end I have the CustomDist I need that I can pass as components to the Mixture of my model. We’ll see about that next week, thanks again!))

Yes if you have a custom dist that PyMC doesn’t understand natively being rounded, then your approach is correct. I forgot that was your case :slight_smile:

Let me know if it works when you update pymc. I fixed it so you could use CustomDist inside a Mixture but I didn’t test a CustomDist inside another directly.

1 Like

Got it, thanks for the clarification. I will try it out with the new version of PyMC in the next few days, and will let you know if it works as expected.

Hi @ricardoV94, I updated my PyMC version to the latest version 5.6, and tested whether it was now possible to take a CustomDist of another CustomDist. Unfortunately, I get the same error as before:

BlockModelAccessError: Model variables cannot be created in the dist function. Use the .dist API

In the case of my model, the first CustomDist is created by providing the custom functions random, logp, and logcdf. Below is an example of a simplified and reproducible code where this first CustomDist “_cstm_dist1” is replaced by a “dummy” CustomDist used to simulate this distribution by wrapping a common Normal dist. Although not exactly equivalent to my model, the same error is obtained, please see below. The error occurs at the line _cstm_dist = _cstm_dist2(mu, sigma).

import numpy as np
import pymc as pm
import pytensor.tensor as pt

#**"_cstm_dist3" definition
def _rnded_norm_dist(mu, sigma, size=None):
    _dist = pt.round(pm.Normal.dist(mu=mu, sigma=sigma, shape=size))
    return _dist

def _cstm_dist3(mu, sigma, size=None):
    _dist = pm.CustomDist.dist(mu, sigma,
                dist=_rnded_norm_dist,
                shape=size)
    return _dist

#**"_cstm_dist2" definition
def _norm_dist(mu, sigma, size=None):
    _dist = pm.Normal.dist(mu=mu, sigma=sigma, shape=size)
    return _dist

def _cstm_dist1(mu, sigma, size=None):
    _dist = pm.CustomDist.dist(mu, sigma, dist=_norm_dist, shape=size)
    return _dist

def _rnded_cstm_dist2(mu, sigma, size=None):
    _dist = pt.round(_cstm_dist1(mu, sigma, size=size))
    return _dist

def _cstm_dist2(mu, sigma, size=None):
    _dist = pm.CustomDist.dist(mu, sigma, dist=_rnded_cstm_dist2, shape=size)
    return _dist

mu = 0.
sigma = 1.
data = np.array([0.5, 0.5+0.0001])

#**test#1 - "_cstm_dist3"
# WORKS!
_cstm_dist = _cstm_dist3(mu, sigma)
vv = pt.vector(name="v")
_cstm_dist_logp = pm.logp(rv=_cstm_dist, value=vv)
rr3 = np.exp(_cstm_dist_logp.eval({vv: data}))
print(rr3)

#**test#2 - "_cstm_dist2"
# ERROR RAISED: "BlockModelAccessError: Model variables cannot be created
#               "in the dist function. Use the `.dist` API"
_cstm_dist = _cstm_dist2(mu, sigma)
vv = pt.vector(name="v")
_cstm_dist_logp = pm.logp(rv=_cstm_dist, value=vv)
rr2 = np.exp(_cstm_dist_logp.eval({vv: data}))
print(rr2)

I am sorry it’s not the outcome we expected … there may also be a mistake in the logic of my implementation?

Thank you for your time and help.

Thanks, I’ll have a look

Thanks for the investigation @caclav, the nested CustomDist error should be fixed: Allow creating CustomDist inside another CustomDist by ricardoV94 · Pull Request #6822 · pymc-devs/pymc · GitHub

Will try to do a patch release soon :wink:

1 Like

Great, thank you for fixing this so quickly! Glad it works! (I’ll try it with my code later once the patch release is available, it’s not a big deal for my project at the moment.)