Transforming a general distribution to that of a power

Hi,

I’m quite new to pymc (I’m using v5.6.1), so my question may not be a good one…

I want to use LKJCholeskyCov with the sd_dist parameter, which requires a distribution of the standard deviation. I want to use InverseGamma for the variance, so I need to get the “square-root of a distribution”.

So I tried to write a general transformation of a (continuous, non-negative) distribution in_dist into an out_dist, such that for some c != 0, if X ~ in_dist, then Y = X**c ~ out_dist.
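
For reference, this is the standard change-of-variables formula behind the logp in the code below: if X has density f_X and Y = X**c with c != 0, then

f_Y(y) = f_X(y**(1/c)) * |(1/c) * y**(1/c - 1)|

so

log f_Y(y) = log f_X(y**(1/c)) - log|c| + (1/c - 1) * log(y)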

I found CustomDist, and tried the following:

import pymc

class PowerTransformed:
    """
    Power-transformed distribution: Y = X**c, X ~ in_dist
    in_dist: a PyMC distribution instance
    c: power (any nonzero real)
    """
    def __init__(self, in_dist, c: float):
        assert c != 0.0
        self.in_dist = in_dist
        self.c = c

    def logp(self, y):
        # y is the (symbolic) value at which the density is evaluated; assumed non-negative
        c = self.c
        x = y ** (1.0 / c)
        log_jac = -pymc.math.log(abs(c)) + (1.0 / c - 1.0) * pymc.math.log(y)
        return pymc.logp(self.in_dist, x) + log_jac

    def random(self, size):
        x = pymc.draw(self.in_dist, draws=size)
        return x ** self.c

    def dist(self, **kwargs):
        return pymc.CustomDist(
            name=f"X**{self.c})",
            logp=self.logp,
            #random=lambda rng, size=None: self.random(size=size),
            shape=kwargs.get("shape", ()),
            dtype=kwargs.get("dtype", "float64"),
        )

The “random=…” argument to CustomDist is commented out because it raises an error:

BlockModelAccessError: Model variables cannot be created in the random function. Use the .dist API to create such variables.

But I haven’t managed to use the “dist=” parameter instead.

I hope that’s not too much of a mess.
How can this (or something else) be made to work?

You want to call CustomDist.dist from your dist method, rather than pymc.CustomDist (which creates a named model variable), and not use the dist kwarg…

However, you should actually use the dist kwarg. You get everything for free by implementing a CustomDist with a dist kwarg instead of with logp and random kwargs. That probably sounds confusing, so let me show it with one less layer of abstraction.

(untested code)

import pymc as pm

def square_invgamma_dist(mu, sigma, size):
  return pm.InverseGamma.dist(mu=mu, sigma=sigma, size=size) ** 2

with pm.Model() as m:

  mu = 2
  sigma = 1
  sd_dist = pm.CustomDist.dist(mu, sigma, dist=square_invgamma_dist)
  pm.LKJCholeskyCov("lkj", n=3, eta=1, sd_dist=sd_dist)

The random method is trivially replaced because dist is already a symbolic random object. PyMC can then use this to infer the logp of the squared InverseGamma automatically, via the change of variables.

That relies on PyMC’s logp inference, which you can test directly by calling:

pm.logp(
  pm.InverseGamma.dist(mu=1, sigma=2) ** 2,
  3.5,
).eval()
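
If you want to double-check that against the manual change of variables, something like this should work (untested sketch, same InverseGamma parameters as above):

import numpy as np
import pymc as pm

y, c = 3.5, 2.0
dist = pm.InverseGamma.dist(mu=1, sigma=2)

# logp inferred automatically by PyMC for the transformed variable
auto = pm.logp(dist ** c, y).eval()

# manual change of variables: logp_X(y**(1/c)) - log|c| + (1/c - 1) * log(y)
manual = pm.logp(dist, y ** (1 / c)).eval() - np.log(abs(c)) + (1 / c - 1) * np.log(y)

print(auto, manual)  # these should agree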

Hi Ricardo,

thanks - your code essentially works for me.
I’m including my code checking your solution below, for future reference.

Still, I didn’t manage to do it while retaining the “PowerTransformed” layer of abstraction…

import numpy as np
import pymc

def sqrt_invgamma_dist(alpha, beta, size=None):
    return pymc.InverseGamma.dist(alpha=alpha, beta=beta, size=size) ** 0.5

# Check we get the covariance matrix with the desired variances
with pymc.Model():
    n = 4
    ig_alpha = 3.5
    ig_beta = 50

    # distribution of standard deviations as square roots of inverse-gamma variances
    sd_dist = pymc.CustomDist.dist(ig_alpha, ig_beta, dist=sqrt_invgamma_dist, shape=n) 

    chol, corr, sigmas = pymc.LKJCholeskyCov(
      "chol_cov",
      n=n, 
      eta=1, 
      sd_dist=sd_dist,
      compute_corr=True,
      store_in_trace=True,
    )
    Sigma = pymc.Deterministic("Sigma", chol @ chol.T)    # distribution of covariance matrix

    res = pymc.draw(Sigma, draws=100_000, random_seed=42) # sample of covariance matrices
    vars = res[:, np.arange(n), np.arange(n)].flatten()   # sample of variances, should ~ InverseGamma
    print(f"InverseGamma(alpha={ig_alpha}, beta={ig_beta}):")
    print(f"expected variances: mean = {ig_beta / (ig_alpha - 1):.3f}, std = {np.sqrt(ig_beta**2 / ((ig_alpha - 1)**2 * (ig_alpha - 2))):.3f}")
    print(f"empirical variances : len = {len(vars)} , mean = {vars.mean():.3f}, std = {vars.std():.3f}")

# InverseGamma(alpha=3.5, beta=50):
# expected variances: mean = 20.000, std = 16.330
# empirical variances : len = 400000 , mean = 19.982, std = 16.227

If you like the PowerTransformed layer, you can make a function that takes a PyMC distribution and returns a CustomDist:

import pymc as pm

def PowerTransformed(name, in_dist, c, **kwargs):
    # Mini pytensor lesson:
    # All symbolic variables are the outputs of a computational graph (unless they are root nodes!).
    # Each output (that is, each thing you compute) is the result of an Operation, called an "Op".
    # Ops, along with the inputs and outputs that go into and come out of them, are owned by "Apply" nodes.
    # Apply nodes know what the inputs and outputs of the Op are.
    # in_dist is itself an output. We can ask for its owner, which is an Apply. The Apply, in turn, knows
    # which Op created in_dist in the first place, as well as the inputs that were used to compute it
    # (the parameters you passed when you made it, plus some internal metadata).
    
    def in_dist_power(*args, **kwargs):
        # Get the Op and use it to build a new distribution of the same type (with whatever new args/kwargs),
        # but also raise it to c
        return in_dist.owner.op(*args, **kwargs) ** c
    
    # Get the inputs to the Op that computed `in_dist`. In pytensor random variables, the first two inputs are
    # always a random generator and a size. We discard these -- CustomDist will make new ones. 
    rng, size, *inputs = in_dist.owner.inputs

    # This is totally unnecessary, but illustrates what the owner (the Apply) does: it holds 
    # *all* information about a specific piece of computation.
    # An output is itself a part of that computation, so we get loops of ownership like this:
    assert in_dist in in_dist.owner.outputs  # True

    
    # Make a new PyMC variable
    return pm.CustomDist(name, 
                         *inputs,
                         dist=in_dist_power,
                         **kwargs)

Usage:

with pm.Model() as m:
    alpha = pm.Exponential('alpha', 1)
    beta = pm.Exponential('beta', 1)
    
    ig = pm.InverseGamma.dist(alpha, beta)
    
    x = PowerTransformed('sqrt_inv_gamma', ig, c=0.5)
    
    # Since we forwarded the arguments inside PowerTransformed, the resulting RV will still have 
    # whatever we want as inputs. So `x` still depends on alpha and beta!
    assert all(rv in x.owner.inputs for rv in [alpha, beta])
    
    # Sample the prior using NUTS to show that we can evaluate the logp.
    # Obviously it would be easier to just call sample_prior_predictive
    idata = pm.sample()

For more info about how all this works under the hood, you can read our docs about graph structures.
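
If you want to poke at these pieces interactively, here is a small illustrative snippet (standalone, outside any model) showing the owner / Op / inputs structure described above:

import pymc as pm

x = pm.Normal.dist(mu=0, sigma=1)

apply_node = x.owner              # the Apply node that created x
print(apply_node.op)              # the Op that computes x (a normal RandomVariable)
print(apply_node.inputs)          # rng and size come first, then the distribution parameters
print(x in apply_node.outputs)    # True -- x is one of the outputs of its own owner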