Dynamic shaping, "round" function, JAX, and a "few" more questions

@ricardoV94, @jessegrabowski, Thank you so much for all the information and clarifications provided in response to my questions, along with the code samples. It is very, very helpful. PyMC’s ability to automatically infer the logp, even for complex models like time series, is an extremely powerful and exciting feature! Looking forward to learning more about the ongoing developments such as automatic discrete marginalization and derivation of arbitrary censoring logp!

Now that I have a better understanding of the inner workings of PyTensor/PyMC, I would like to put it into practice and implement my first model, which is what I have been trying to do for the last few days (hence the late reply, sorry!).

You are right, that example does not state that it is impossible. I suppose I concluded too quickly that it was a fundamental limitation of PyMC, since I saw no reason to limit the number of CPU cores to only 1, and I must also have misread the issue reported in this GitHub post (New `mp_ctx` defaults in 4.3.0+ (#6218) for M1 Macs could break parallel sampling for M1 Mac users using JAX · Issue #6362 · pymc-devs/pymc · GitHub)!

Below is a draft of my PyMC model (here using “register_rv”). It doesn’t work as is, for the reasons outlined below, together with the solutions I have in mind to address them. Any comments and help would be greatly appreciated!

import numpy as np
import pytensor.tensor as pt
import pymc as pm

data = np.array([1., 2., 3.])
with pm.Model() as mymodel:
    
    #**SOME CONFIG PARAMS
    avrg_rn_prior = 8.
    chrggn = 3.6
    mu_rn = 0.
    max_avlchgn_drws = 30
    sigma_avlchgn_scl = 10.
    
    #**PHOTON-ELECTRONS GENERATION
    phe_rate = pm.Gamma(name="phe_rate", mu=3., sigma=3.**0.5)
    n_phe = pm.Poisson.dist(name="n_phe", mu=phe_rate)
    
    #**AVALANCHE GAIN GENERATION
    mu_avlch_gain = pm.Uniform(name="mu_avlch_gain", lower=50., upper=100.)
    sigma_avlch_gain = (pm.Exponential(name="sigma_avlch_gain", lam=1.) * 
                            sigma_avlchgn_scl)
    
    # 1st option - using "shape" parameter.
    # avlch_gains = pm.Gamma.dist(name="avlch_gains",
    #                         mu=mu_avlch_gain, sigma=sigma_avlch_gain,
    #                         shape=(max_avlchgn_drws,))
    # Case "n_phe = 0" not handled explicitly, but "pm.draw" still works and
    # returns 0 as required.
    # Raises "ShapeError: Dimensionality of data and RV don't match.
    #         (actual 1 != expected 0)" if "keepdims=False" (default).
    # n_chrgs = pt.sum(avlch_gains[:n_phe], keepdims=True) * chrggn
    
    # 2nd option - using properties (summation + scaling) of the Gamma law.
    # Case "n_phe = 0" handled explicitly using a "switch" statement, but
    # still get "ValueError: Domain error in arguments. The `scale` parameter
    #           "must be positive [..]".
    avlch_gains = pt.switch(n_phe>0.,
        pm.Gamma.dist(name="avlch_gains",
                      mu=mu_avlch_gain*n_phe*chrggn,
                      sigma=sigma_avlch_gain*n_phe**0.5*chrggn,
                      shape=(1,)), 0.)
    n_chrgs = avlch_gains
    
    #**READOUT NOISE GENERATION
    rn = pm.HalfNormal(name="rn", sigma=1) * avrg_rn_prior
    rd_noise = pm.Normal.dist(name="rd_noise", mu=mu_rn, sigma=rn)
    
    #**OUTPUT SIGNAL GENERATION
    gen_sgnl = pt.round(n_chrgs + rd_noise)
    
    #**CUSTOM DIST REGISTRATION
    mymodel.register_rv(gen_sgnl, "gen_sgnl", observed=data)
    
    # llike = pm.CustomDist("llike", phe_rate, mu_avlch_gain, sigma_avlch_gain,
    #             rn, dist=custom_dist, observed=data)

1- The parameters of the Gamma distribution depend on the RV “n_phe”, which follows a Poisson distribution. In other words, the RV “avlch_gains” is defined by a compound distribution. PyMC cannot marginalise over the discrete RV “n_phe” (at least not yet!), so it cannot automatically derive the logp of “avlch_gains”. I am thinking of addressing this with a mixture distribution (“pymc.Mixture”), whose weights would be given by the PMF of the Poisson distribution (truncated for “n_phe” large enough, here >~10). This would also have the advantage of avoiding the error raised inside the “switch” statement, for some unknown reason, when “n_phe = 0”.
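To make this concrete, here is a rough sketch of the mixture idea (a sketch only: the truncation point is arbitrary, the variable names are placeholders, and the “n_phe = 0” case is dropped here and would still need separate handling, e.g. as a point mass at zero):

import pymc as pm
import pytensor.tensor as pt

chrggn = 3.6
n_max = 10  # truncation of the Poisson support (arbitrary here)

with pm.Model() as mixture_sketch:
    phe_rate = pm.Gamma("phe_rate", mu=3., sigma=3.**0.5)
    mu_avlch_gain = pm.Uniform("mu_avlch_gain", lower=50., upper=100.)
    sigma_avlch_gain = pm.Exponential("sigma_avlch_gain", lam=1.) * 10.

    # Poisson PMF evaluated at n = 1..n_max, renormalised so the weights sum to 1.
    ns = pt.arange(1, n_max + 1)
    w_un = pt.exp(pm.logp(pm.Poisson.dist(mu=phe_rate), ns))
    w = w_un / w_un.sum()

    # One Gamma component per possible value of n, using the summation and
    # scaling properties of the Gamma law (2nd option in the draft above).
    comps = pm.Gamma.dist(
        mu=mu_avlch_gain * ns * chrggn,
        sigma=sigma_avlch_gain * pt.sqrt(ns) * chrggn,
    )

    n_chrgs = pm.Mixture("n_chrgs", w=w, comp_dists=comps)
    # ... readout noise and observed signal would follow as in the draft.

This marginalises “n_phe” by hand, at the cost of fixing the truncation point in advance.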

Q1- Could PyMC compute the logp of the term “n_chrgs = pt.sum(avlch_gains[:n_phe])” if “n_phe” were drawn from a prior distribution (i.e. a parameter of the model to be determined)? (I can’t wait for this video to be posted to learn more about that: [PyMCon Web Series 07] Automatic Probability with PyMC (June 28, 2023) (Ricardo Vieira).)

2- The RV “gen_sgnl” results from the sum of a Gaussian RV and a Gamma RV, leaving aside the “round” operation for the moment. PyMC doesn’t currently know how to derive the logp of this sum, but maybe one day … (ENH: Add an op corresponding to `scipy.integrate.quad` · Issue #290 · pymc-devs/pytensor · GitHub). I think I have two options:

a- Compute the logp of this sum, as well as its derivatives, by convolving the two PDFs, either symbolically (using SymPy, Mathematica, …) or numerically (using SciPy), then create a custom RV “Op” encapsulating the derived formulas coded in SciPy/NumPy. This sounds a bit tedious …

b- In fact, the logp of this sum can be calculated symbolically (for example, see What is the convolution of a normal distribution with a gamma distribution? - Cross Validated). So I could implement the logp directly in PyTensor operators and provide it to “CustomDist”, along with the required “random” function (a rough skeleton of the plumbing I have in mind is sketched after Q3 below). Nothing new so far! Two questions though:

Q2- Will PyTensor be able to differentiate the logp with respect to its parameters, despite its complexity?

Q3- All the operators involved in the logp should already be available in PyTensor, except for the “hyp1f1” function (the confluent hypergeometric function). However, the “hyp2f1” function is already implemented in PyTensor, is very close to “hyp1f1”, and is also available in SciPy. Would it work if I copied and modified the “hyp2f1” implementation in my code? (As a detail, I think the comment at line #1630 of “[…]/scalar/math.py” should rather read “d/dc[A(k)/B(k)] = - A(k)/B(k) / (c + k)”; line #1702 looks correct though.)
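To make option b- more concrete, here is roughly the “CustomDist” plumbing I have in mind (again only a sketch: the names are placeholders, and the logp body below is just a moment-matched Normal stand-in, to be replaced by the closed-form Normal+Gamma convolution written with “pt” operators):

import numpy as np
import pymc as pm
import pytensor.tensor as pt

def signal_random(mu_g, sigma_g, rn, rng=None, size=None):
    # Forward draw: Gamma "charge" term plus zero-mean Gaussian readout noise.
    k = (mu_g / sigma_g) ** 2      # Gamma shape parameter
    theta = sigma_g ** 2 / mu_g    # Gamma scale parameter
    return rng.gamma(k, theta, size=size) + rng.normal(0., rn, size=size)

def signal_logp(value, mu_g, sigma_g, rn):
    # Placeholder only: a moment-matched Normal approximation. The real
    # closed-form convolution logp, expressed with pt operators so that
    # PyTensor can differentiate it, would replace this line.
    return pm.logp(pm.Normal.dist(mu=mu_g, sigma=pt.sqrt(sigma_g**2 + rn**2)), value)

data = np.array([1., 2., 3.])
with pm.Model():
    mu_g = pm.Uniform("mu_g", lower=50., upper=100.)
    sigma_g = pm.Exponential("sigma_g", lam=1.) * 10.
    rn = pm.HalfNormal("rn", sigma=1) * 8.
    llike = pm.CustomDist("llike", mu_g, sigma_g, rn,
                          random=signal_random, logp=signal_logp,
                          observed=data)

The point of writing the logp with “pt” operators rather than SciPy is that PyTensor can then compute the gradients needed by the samplers automatically, which is what Q2 is about.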

Q4- Out of curiosity: the “hyp2f1” function can currently only be executed in pure Python (as it is provided by SciPy), but could its gradient be converted to another backend (like JAX) via the new “ScalarLoop” operator, assuming the “hyp2f1_grad” function can be fully expressed as a PyTensor graph?

Thank you again for your help and your time!