The sinh-arcsinh normal distribution

Hi,

I was wondering if PyMC is planning on adding new distributions. I’m particularly interested in a continuous distribution which is a generalized normal distribution that has parameters for both skewness and kurtosis.

I am particularly thinking of the SHASH family (check this). So my first question is whether there are plans to add such distributions.

And my second question is a follow-up of the first one. I have heard that one could use any arbitrary distributions by creating new classes. Could anyone explain what functions/methods are required to define a new distribution (e.g. I can think of dist() and logp(), but am not sure about the complete set).

I should also note that I would like the distribution to be compatible with PyMC’s implementation of ADVI, so if ADVI requires any particular additional methods please also let me know.

Thanks,
Sina

P.S. to provide more context, I’m thinking of implementing a type of non-gaussian hierarchical bayesian regression similar to what’s described here

We are not planning on adding more non-basic distributions. Instead we are focusing on allowing PyMC to infer the logprob for different transformations of simple distributions.

The goal is to make use of CustomDist together with PyMC’s ability to infer logprob. Right now we support sinh operations but not yet arcsinh (you can track progress in this issue: Implement more types of measurable elemwise transformations · Issue #6631 · pymc-devs/pymc · GitHub)

If arcsinh was already supported all you would need would be the dist function:

import pymc
import pytensor.tensor as pt

def sinh_arcsinh_dist(mu, sigma, nu, tau, size):
  Z = pm.Normal.dist(shape=size)
  X = mu + sigma * pt.sinh((pt.arcsinh(Z) + nu) / tau)
  return X

with pm.Model() as m:
  # Parameter priors
  mu = pm.Normal("mu")
  sigma = pm.HalfNormal("sigma")
  nu = pm.Normal("nu")
  tau = pm.HalfNormal("tau")

  pm.CustomDist("sinh_arcsinh", mu, sigma, nu, tau, dist=sinh_arcsinh_dist, shape=(5, 3))

m.point_logps()  
# RuntimeError: The logprob terms of the following value variables could not be derived: {sinh_arcsinh}

Logp and Logcdf would be automatically inferred by PyMC without the user having to specify these functions. For now they would need to provide them to CustomDist.

To illustrate this works you could replace the arcsinh by a tanh.

import pymc
import pytensor.tensor as pt

def sinh_tanh_dist(mu, sigma, nu, tau, size):
  Z = pm.Normal.dist(shape=size)
  X = mu + sigma * pt.sinh((pt.tanh(Z) + nu) / tau)
  return X

with pm.Model() as m:

  # Parameter priors
  mu = pm.Normal("mu")
  sigma = pm.HalfNormal("sigma")
  nu = pm.Normal("nu")
  tau = pm.HalfNormal("tau")

  pm.CustomDist("sinh_tanh", mu, sigma, nu, tau, dist=sinh_tanh_dist, shape=(5, 3))

m.point_logps()
 # {'mu': -0.92, 'sigma': -0.73, 'nu': -0.92, 'tau': -0.73, 'sinh_tanh': -13.78}
1 Like

If you don’t feel like waiting, the fastest way to get going is to implement the CustomDist and provide the logp function. The documentation includes examples of this: pymc.CustomDist — PyMC dev documentation

They should work just fine with ADVI

1 Like

Thanks for this very helpful suggestion, I wish there were more tutorials/documentations on this topic. It would be really neat to be able to work with any arbitrary distribution right away.

So if I understand correctly I need to provide a closed form solution for the logp. Are there any other functions that I should provide other than the logp? looking at the link you shared, do I also need to provide the other optional arguments e.g. random, or moment? Which arguments are particularly needed for the distribution to work with ADVI?

You only need the logp for ADVI.

If you want to take prior or posterior predictive draws you also need either the random (numpy function) or dist (pytensor graph with PyMC variables). PyMC can use either of those two to take random draws.

We will definitely add some documentation on these arbitrary distributions. We have been first focusing on increasing the type of distributions you can define.