Posterior predictive with a CustomDist function fails with error "random() missing 1 required..."

Hello all,

I am writing code to do posterior predictive sampling with a CustomDist function. My code is listed below. The error returned is "random() missing 1 required positional argument: 'beta'". If my code just does sampling (pm.sample), it is fine; the error only comes up when I run posterior predictive. I do not know where to add the positional argument. Thank you very much for your kind reply.

with pm.Model(coords=coords) as model:
    x = pm.MutableData("x", x_train)
    rloss = pm.MutableData("rloss", rloss_train)

    # Priors
    a = pm.Normal("a", mu=0, sigma=1, dims="coeffs")
    b = pm.Normal("b", mu=0, sigma=1, dims="coeffs")

    # Linear model
    a1 = pm.math.dot(x, a)
    b1 = pm.math.dot(x, b)

    # Link function
    mu = pm.Deterministic("mu", pm.math.invlogit(a1))
    phi = pm.Deterministic("phi", pm.math.exp(b1))
    alpha = pm.Deterministic("alpha", mu * phi)
    beta = pm.Deterministic("beta", (1 - mu) * phi)

    def logp_beta(obsLoss, alpha, beta):
        return pm.logp(pm.Beta.dist(alpha=alpha, beta=beta), obsLoss)

    def random(obsLoss, alpha, beta, rng=None, size=None):
        return pm.Beta.dist(alpha, beta).random(size=size)

    # Likelihood using the custom distribution
    rloss_custom = pm.CustomDist("rloss_custom", alpha, beta, logp=logp_beta, random=random, observed=rloss)

with model:
    step = pm.Metropolis()
    idata = pm.sample(1200, step=step, chains=4)

with model:
    pm.set_data({"x": x_test}, coords={"coeffs": labels})
    pp = pm.sample_posterior_predictive(idata, predictions=True).predictions

If you check the docstring for pm.CustomDist, it says the signature for the random function should be random(*dist_params, rng=None, size=None). So basically you don’t need obsLoss as an input there.
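For illustration, here is a minimal sketch of a `random` callable matching that signature. The parameter names mirror the code above; drawing directly from numpy's Generator is my own choice for the example:

```python
import numpy as np

# pm.CustomDist expects: random(*dist_params, rng=None, size=None).
# Only the distribution parameters are passed in -- no observed values.
def random(alpha, beta, rng=None, size=None):
    rng = np.random.default_rng() if rng is None else rng
    return rng.beta(alpha, beta, size=size)
```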

Thank you so much. I removed obsLoss in the def random. Now another error comes up: "The rv.random() method was removed. Instead use pm.draw(rv)."

Any quick thought?

Thank you

Basically do what it says? Since you can't use pm.Beta.dist(...).random, use pm.draw(pm.Beta.dist(...)) instead. But pm.draw will be slow, so you shouldn't do that either. The random function isn't a PyTensor graph, so you can just use scipy.stats.beta(...).rvs(size) directly.
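For instance, a self-contained sketch of that scipy-based random function (forwarding rng via random_state is an addition here, for reproducible draws):

```python
import scipy.stats

def random(alpha, beta, rng=None, size=None):
    # PyMC passes a numpy Generator as rng; forwarding it via
    # random_state makes posterior predictive draws reproducible.
    return scipy.stats.beta(alpha, beta).rvs(size=size, random_state=rng)
```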


Or you can use the new dist kwarg, and then neither logp nor random is needed; the docs have some code examples.


I would really appreciate it if you could show me the link to the documentation.
Much appreciated

Google should always find the page you need: pymc.CustomDist — PyMC 5.14.0 documentation


Thank you so much. I have read it before and I will read it again.

Thank you for your help. I have changed my code as follows. I use scipy.stats.beta(alpha, beta).rvs(size) and the old problem is solved; the code now runs posterior_predictive. However, it returns "ValueError: size does not match the broadcast shape of the parameters. (499,), (499,), (215,)". x_train has 499 rows and x_test has 215. Since I use set_data, the size should be 215, and I am not sure why 499 appears. Should I run posterior_predictive with a new model? Thank you very much.

import arviz as az
import matplotlib.pyplot as plt
import pymc as pm
import scipy.stats

with pm.Model(coords=coords) as model:
    # Data containers
    x = pm.MutableData("x", x_train)
    rloss = pm.MutableData("rloss", rloss_train)
    
    # Priors
    a = pm.Normal("a", mu=0, sigma=1, dims="coeffs")
    b = pm.Normal("b", mu=0, sigma=1, dims="coeffs")

    # Linear model
    a1 = pm.math.dot(x, a)
    b1 = pm.math.dot(x, b)

    # Link function
    mu = pm.Deterministic("mu", pm.math.invlogit(a1))
    phi = pm.Deterministic("phi", pm.math.exp(b1))
    alpha = pm.Deterministic("alpha", mu * phi)
    beta = pm.Deterministic("beta", (1 - mu) * phi)
    
    
    def logp_beta(obsLoss, alpha, beta): 
        return pm.logp(pm.Beta.dist(alpha=alpha, beta=beta), obsLoss)
    
    def random(alpha, beta, rng=None, size=None):
        return scipy.stats.beta(alpha, beta).rvs(size=size, random_state=rng)

    # Likelihood using the custom distribution
    rloss_custom = pm.CustomDist('rloss_custom', alpha, beta, logp=logp_beta, random=random, observed=rloss)
    
with model:
    step = pm.Metropolis()
    idata = pm.sample(1200, step=step, chains=4)

az.plot_posterior(idata, var_names=["a", "b"])
plt.show()


# Generate posterior predictive samples

with model:
    pm.set_data({"x": x_test}, coords={"coeffs": labels})
    pp = pm.sample_posterior_predictive(idata, predictions=True).predictions

This is a common “gotcha” when using pm.set_data. Although the rloss data is not used in posterior predictive sampling, it’s still being used to infer the shape of the outputs, which is then causing an error. You can prevent this by explicitly linking the shape of rloss_custom to the shape of x like this:

    rloss_custom = pm.CustomDist('rloss_custom', alpha, beta, logp=logp_beta, random=random, observed=rloss, shape=x.shape[0])

PyMC will then know to automatically update that shape when x changes. You can see the data container example notebook for more discussion on this point.