Custom Distribution - Multivariate SkewNormal

I am trying to implement a multivariate skew-normal distribution as a custom distribution in PyMC. I am working from this implementation: A Python Implementation of the Multivariate Skew Normal

My question is: what is the best way to implement this? I have tried using it as a custom distribution but have run into numerous errors. Would it be better to extend the MvNormal class or to use CustomDist?

CustomDist is the best route

Does this implementation make sense?


import numpy as np
from scipy.stats import norm, multivariate_normal as mvn
from scipy.stats._multivariate import multi_rv_generic, _squeeze_output


class multivariate_skewnorm(multi_rv_generic):

    def __init__(self, mu, shape, cov=None):
        super().__init__()
        self.dim   = len(shape)
        self.shape = np.asarray(shape)  # skewness (slant) parameter, not an array shape
        self.mean  = np.asarray(mu)
        self.cov   = np.eye(self.dim) if cov is None else np.asarray(cov)

    def pdf(self, x):
        return np.exp(self.logpdf(x))

    def logpdf(self, x):
        x    = mvn._process_quantiles(x, self.dim)
        pdf  = mvn(self.mean, self.cov).logpdf(x)
        cdf  = norm(0, 1).logcdf(np.dot(x, self.shape))
        return _squeeze_output(np.log(2) + pdf + cdf)

    def rvs(self, size=1, random_state=None):
        random_state = self._get_random_state(random_state)
        # Sample via the conditioning representation: draw from a
        # (dim+1)-dimensional normal and reflect the draws whose
        # auxiliary first coordinate is negative.
        aCa      = self.shape @ self.cov @ self.shape
        delta    = (1 / np.sqrt(1 + aCa)) * self.cov @ self.shape
        cov_star = np.block([[np.ones(1),     delta],
                             [delta[:, None], self.cov]])
        if size is None:
            size = 1
        elif isinstance(size, tuple):
            size = size[0]
        x        = random_state.multivariate_normal(np.zeros(self.dim + 1), cov_star, size=size)
        x0, x1   = x[:, 0], x[:, 1:]
        inds     = x0 <= 0
        x1[inds] = -1 * x1[inds]
        return x1


import pytensor.tensor as pt
from pytensor.compile.ops import as_op


@as_op(itypes=[pt.dmatrix, pt.dmatrix, pt.dmatrix, pt.dvector], otypes=[pt.dscalar])
def logp(value, mu, cov, alpha):
    return multivariate_skewnorm(mu, alpha, cov).logpdf(value)


def random(mu, cov, alpha, rng=None, size=None):
    return multivariate_skewnorm(mu, alpha, cov).rvs(size=size, random_state=rng)
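
For reference, exercising the scipy class directly seems to behave (toy 2-d numbers, just a sanity check):

# Hypothetical values, only to check shapes and that logpdf is finite
dist  = multivariate_skewnorm(mu=[0.0, 0.0], shape=[4.0, -1.0])
draws = dist.rvs(size=(500,))
print(draws.shape)            # (500, 2)
print(dist.logpdf(draws[0]))  # a finite scalar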

~That looks like the old PyMC3 API; I suggest you use PyMC. CustomDist should require much less code: https://www.pymc.io/projects/docs/en/latest/api/distributions/generated/pymc.CustomDist.html~

Nevermind, I got confused. You’re just showing the scipy implementation.

One thing to note is that you won’t be able to use NUTS, because a custom Op like this has no gradient. It would be better if you could define the logp using PyTensor operations, in which case you would get gradients for free, but those quantiles may be tricky.

Gotcha.

I am probably in over my head if I don’t fully understand the math behind the MV SkewNormal then?

Yeah, it could be tricky, especially because it looks like it may not have a closed-form solution, so you would need to do some approximation.

Any resources to look at if I wanted to take a stab at it?

Actually, the function _process_quantiles is not doing anything fancy, so you should be able to implement a logp function for CustomDist without too much trouble.

You can get the MvNormal logp and Univariate Normal logcdf from pymc via pm.logp(pm.MvNormal.dist(mu, cov), x) and pm.logcdf(pm.Normal.dist(0, 1), x)

I believe this is the correct logp:

def logp(value, mu, cov, alpha):
    return (
        pm.math.log(2)
        + pm.logp(pm.MvNormal.dist(mu, cov), value)
        + pm.logcdf(pm.Normal.dist(0, 1), pm.math.dot(value, alpha))
    )
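
Hooked into a model, that would look something like this (just a sketch; the constants and shapes here are placeholders, and since the support is a vector you’ll want to pass ndim_supp=1):

import numpy as np
import pymc as pm

data = ...  # your (n, dim) observations

with pm.Model():
    mu = pm.Normal("mu", 0, 1, shape=2)
    cov = np.eye(2)                # fixed here just for illustration
    alpha = np.array([4.0, -1.0])  # skewness, fixed here just for illustration
    pm.CustomDist("obs", mu, cov, alpha, logp=logp, ndim_supp=1, observed=data)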

What would the random function be? Would I just convert the rvs function above to use pm.MvNormal.dist?

edit: these are my attempts at logp and random. I am running into a shape mismatch error somewhere:

def logp(value, mu, cov, alpha):
    mv_normal = pm.MvNormal.dist(mu=mu, cov=cov)

    # Log probability of the value under the multivariate normal distribution
    # (mu is already baked into mv_normal, so pass value directly)
    log_prob_mv_normal = pm.logp(mv_normal, value)

    # Calculate omega inverse
    omega_inv = pt.nlinalg.matrix_inverse(cov)

    # Argument for the standard normal CDF
    arg_cdf = pm.math.dot(pm.math.dot(alpha.T, omega_inv), (value - mu))

    # Standard normal distribution for the CDF
    std_normal = pm.Normal.dist(mu=0, sigma=1)

    # Log CDF of the argument under the standard normal distribution
    log_cdf_std_normal = pm.logcdf(std_normal, arg_cdf)

    # Log of the skew-normal density
    return pm.math.log(2) + log_prob_mv_normal + log_cdf_std_normal


def random(mu, cov, alpha, rng=None, size=None):
    aCa = alpha @ cov @ alpha
    delta = (1 / pm.math.sqrt(1 + aCa)) * cov @ alpha
    a = pt.concatenate((pt.ones(1), delta)).reshape((1, -1))
    b = pt.concatenate((delta[:, None], cov), axis=1)
    cov_star = pt.concatenate((a, b))
    x = pm.MvNormal.dist(pt.zeros(alpha.shape[0] + 1), cov=cov_star, shape=(alpha.shape[0] + 1, -1))
    x0 = x[:, 0]
    x1 = x[:, 1:]
    inds = x0 <= 0
    new_x1 = pt.set_subtensor(x1[inds], -1 * x1[inds])
    new_x1 = new_x1 + mu
    return new_x1.eval()

Should the logp function return the log-prob of each row or of the entire dataset? When used in CustomDist, the entire dataset is passed in as the value argument, but I am unsure whether it handles the summation on the backend or whether I should be summing all the log-probs myself.

I believe the summation is handled by PyMC.

The random method of CustomDist uses Python code, so there is no need to convert it to PyTensor. You can just use what the blog post does.

There are some examples at the end of the CustomDist documentation.
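
Something along these lines should work (a sketch adapting the blog post’s rvs to CustomDist’s random signature; I’m assuming size is either None or a tuple whose first entry is the number of draws):

import numpy as np

def random(mu, cov, alpha, rng=None, size=None):
    if rng is None:
        rng = np.random.default_rng()
    mu, cov, alpha = np.asarray(mu), np.asarray(cov), np.asarray(alpha)
    dim = alpha.shape[0]
    n = 1 if size is None else size[0]
    # Conditioning representation: draw from a (dim+1)-dimensional normal
    # and reflect the draws whose auxiliary first coordinate is negative
    aCa = alpha @ cov @ alpha
    delta = (1 / np.sqrt(1 + aCa)) * cov @ alpha
    cov_star = np.block([[np.ones(1), delta],
                         [delta[:, None], cov]])
    x = rng.multivariate_normal(np.zeros(dim + 1), cov_star, size=n)
    x0, x1 = x[:, 0], x[:, 1:]
    x1[x0 <= 0] *= -1
    return x1 + mu  # shift by the location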

For the shape error, can you provide a small reproducible snippet that triggers it? The smaller the better

I figured out the shape error. However I am now getting this warning and the sampling is very slow:

/usr/local/lib/python3.10/dist-packages/pytensor/tensor/slinalg.py:391: LinAlgWarning: Ill-conditioned matrix (rcond=1.5831e-19): result may not be accurate.

I tried running with numpyro on a GPU in Colab but it just ran through the sampling without accepting any steps. I am guessing there is an issue with the compatibility of custom distributions and numpyro?

The warning is saying that cov_star is not full rank, so the matrix inversion is breaking. You could try adding a small jitter to the diagonal to stabilize it?
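
For example (the 1e-6 here is arbitrary; scale it to your data):

cov_star = cov_star + 1e-6 * np.eye(cov_star.shape[0])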

How do I go about choosing what to add? I tested it with 0.1 but it is still showing the same warning.

To debug: have you checked whether a unit covariance (which avoids the inversion) already works correctly?


The warning does not show up there. It still samples slowly, but I should note that it is running with a 24×24 covariance matrix, so maybe that is to be expected? The dataset is only ~4000 rows, however.

I tested it on mock data generated from my random function and it still gets the same warning and runs at the same speed.

Here is the mock model and data, as well as my updated logp and random functions, if anyone wants to test it.
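
Something along these lines (a simplified 3-dimensional sketch of the setup, reusing the logp and random functions sketched above; my real model uses the 24×24 covariance mentioned earlier):

import numpy as np
import pymc as pm

dim = 3
true_mu = np.zeros(dim)
true_cov = np.eye(dim)
true_alpha = np.array([4.0, -2.0, 1.0])

# Mock data drawn with the numpy random function above
data = random(true_mu, true_cov, true_alpha,
              rng=np.random.default_rng(0), size=(4000,))

with pm.Model() as model:
    mu = pm.Normal("mu", 0, 1, shape=dim)
    alpha = pm.Normal("alpha", 0, 5, shape=dim)
    pm.CustomDist("obs", mu, true_cov, alpha,
                  logp=logp, random=random,
                  ndim_supp=1,  # the support is a vector
                  observed=data)
    idata = pm.sample()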