Sampling fails after upgrading to v4


I was previously using PyMC3 v3.11.2 and decided to upgrade to v4. After upgrading and making only the necessary changes, the NUTS sampler now fails where it previously didn't, with the error message:

ValueError: Mass matrix contains zeros on the diagonal.
The derivative of RV Uptime_interval__.ravel()[[0]] is zero.
The derivative of RV Downtime_interval__.ravel()[[0]] is zero.
The derivative of RV L.ravel()[] is zero.

I know this error is notoriously difficult to debug, and I'm not sure what is causing it. I had encountered it before with PyMC3, but the current version of my model runs fine there.
The only changes I made were renaming theano to aesara and changing how I pass variables to pm.DensityDist.

Does anyone know what change in pymc could cause this?

It would be quite a bit of extra work to post a working version of my model here, so I would prefer not to do that.
Some details that I think may be relevant:
Using the NUTS sampler, both init="jitter+adapt_diag" and init="adapt_diag" run into this problem; I had seen on other posts that changing the init sometimes solves the issue, but not here.
The model is essentially a product of pm.Normal distributions.

I’m thankful for any insights on how to solve this problem.

You’ll have to share your model for us to be able to help.

I don’t think I can post a fully working version here, because it requires quite a bit of data stored in other files.
But here are the essentials:

import numpy as np
import pymc as pm

s = {
    'draws': 10000,
    'tune': 5000,
    'chains': 2,
    'cores': 1,
}
ev_B = {
    'A': 0.16128047381554936,
    'B': 0.005666983263328754,
    'C': 8.388210965590476,
    'D': 0.004900584215030823,
    'E': 0.903214838018954,
}
ids = ['A', 'B', 'C', 'D', 'E']
with pm.Model():
    ## load_models returns a dictionary with `ids` as keys
    ## and objects as items. Each object loads data from a file
    ## and has a `predict` method.
    model_dict = load_models(ids, filepaths)
    priors = [
        pm.Uniform('Uptime', lower=100, upper=2000),
        pm.Uniform('Downtime', lower=0, upper=10000),
    ]
    evidence = [ev_B[i] for i in ids]
    sigma = [0.1 * ev_B[i] for i in ids]
    models = [model_dict[i].predict(priors) for i in ids]
    distrib = [pm.Normal(i, mu=m, sigma=s, observed=o)
               for i, m, s, o in zip(ids, models, sigma, evidence)]
    def joint(*args):
        # np.product is a deprecated alias; np.prod is the supported spelling
        return np.prod([*args])
    l = pm.DensityDist('L', *distrib, logp=joint)

The sampling actually works if I use tune=50 and draws=100. Is there some way to gain more insight into which specific sample is causing it to crash?
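One thing worth checking, independent of the PyMC version: the joint logp here multiplies raw values, and a product of many small likelihood values underflows to exactly 0.0 in float64, whose derivative is also exactly zero, which matches the "derivative of RV ... is zero" message. A minimal numpy sketch (the values are hypothetical, not from the model above) showing why summing log-probabilities is numerically safer than multiplying probabilities:

```python
import numpy as np

# Product of many small likelihood-like values: underflows to exactly 0.0
# in float64 (1e-3 ** 500 = 1e-1500, far below the smallest float64).
probs = np.full(500, 1e-3)
product = np.prod(probs)

# Equivalent computation in log space stays finite: 500 * log(1e-3).
log_sum = np.sum(np.log(probs))

print(product)   # 0.0 (underflow)
print(log_sum)   # about -3453.88
```

If the logp passed to pm.DensityDist is meant to be a joint log-probability, summing terms (rather than taking np.prod of them) avoids this failure mode entirely.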