Drawing values from a custom distribution

Hi everyone,

I am trying to code a model in PyMC3 in which I would like to draw N points from a custom distribution. I would like to do something like this:

with pm.Model() as model:

      a = pm.Normal('a', mu=1.0, sd=1.0)

      b = pm.Normal('b', mu=1.0, sd=1.0)

      my_dist = MyDist('MyDist', a, b, shape=(nval,))

      pred = f(my_dist)

      obs = pm.Poisson('obs', mu=pred, observed=my_values)

      trace = pm.sample()

In my case, “MyDist” is a custom distribution, basically a power law with an exponential cut-off multiplied by an error function, and f(my_dist) is some arithmetic operation on the values drawn from it. The line calling MyDist would be equivalent to calling some built-in distribution here and drawing from it. I looked into pm.DensityDist and pm.Potential, but I couldn’t find a way to use them other than as a custom likelihood (i.e. with observed data). Can someone help me out?

DensityDist should work fine for unobserved variables (and requires nothing different). I do recommend installing the new version of PyMC (v4), where we improved several things related to DensityDist.

The API changed just slightly, in that logp is a keyword-only argument that must follow the distribution inputs.

import pymc as pm

def logp(value, mu, sigma):
    return pm.logp(pm.Normal.dist(mu, sigma), value)

with pm.Model() as m:
    # x = pm.Normal("x", 0, 1, shape=(2,))  # Completely equivalent
    x = pm.DensityDist("x", 0, 1, logp=logp, shape=(2,))
    y = pm.math.exp(x)  # f(x)
    z = pm.Poisson("z", mu=y, observed=[3, 4])

# Probability of variables at initial point
print(m.point_logps())  # {'x': -1.84, 'z': -6.97}

Thanks very much for the answer @ricardoV94. I installed PyMC v4 and tried to follow your example. This is my code (leaving out the last two lines of your example for the time being):

import aesara.tensor as T

def schechter_theano(logL, alpha, n42, a, b):

    # L0 and Lstar are constants defined with the data-generation code below
    L = 10. ** logL

    schechter = 10 ** n42 * (L / L0) ** (-alpha) * T.exp(- L / Lstar) * (L / Lstar)

    selfunc = 0.5 + 0.5 * T.erf(a * (logL - b))

    totmod = selfunc * schechter

    return totmod

def logp(value, alpha, n42, a, b):

    return pm.logp(schechter_theano, value)


with pm.Model() as model:
    
    alpha = pm.Uniform('alpha', lower=0.5, upper=4.0)
    
    n42 = pm.Uniform('n42', lower=0, upper=6)
    
    a = pm.Uniform('a', lower=1, upper=8)
    
    b = pm.Uniform('b', lower=40, upper=43)
    
    obs = pm.DensityDist('obs', alpha, n42, a, b, logp=logp, observed=realiz)
    
    trace = pm.sample(return_inferencedata=True)

But this gives me the following error:

File ~/opt/anaconda3/envs/pymc4/lib/python3.10/site-packages/pymc/distributions/logprob.py:285, in logp(rv, value)
    282 def logp(rv: TensorVariable, value) -> TensorVariable:
    283     """Return the log-probability graph of a Random Variable"""
--> 285     value = at.as_tensor_variable(value, dtype=rv.dtype)
    286     try:
    287         return logp_aeppl(rv, value)

AttributeError: 'function' object has no attribute 'dtype'

What am I doing wrong?

You don’t need pm.logp; that was just for the example. If you have an Aesara function that returns the logp, you can pass it directly as the logp argument.
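For context, pm.logp expects a random variable as its first argument (that is the logp(rv, value) signature shown in the traceback), which is why passing a plain Python function raises the AttributeError. A minimal illustration:

pm.logp(pm.Normal.dist(0, 1), 0.5)   # OK: first argument is a random variable
# pm.logp(schechter_theano, 0.5)     # fails: a Python function has no dtype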

Many thanks again @ricardoV94. The code now runs properly, but it does not converge to the correct values. First I generate some random values from my distribution:

import numpy as np
from scipy.special import erf

L0 = 1e42
Lstar = 2.59e44

def schechter(logL, alpha, n42, a, b):
    
    L = 10. ** logL
    
    schechter = 10 ** n42 * (L / L0) ** (-alpha) * np.exp(- L / Lstar) * (L / Lstar)
    
    selfunc = 0.5 + 0.5 * erf(a * (logL - b))
    
    totmod = selfunc * schechter
    
    return totmod

logLt = np.linspace(40., 45., 1000)

alpha = 1.8

n42 = np.log10(6e-3)

a = 5.

b = 41.

mod = schechter(logLt, alpha, n42, a, b)

prob_schechter = mod / np.sum(mod)

nsim = 1000

realiz = np.random.choice(logLt, nsim, p=prob_schechter)

Now here is how I coded it:

import aesara.tensor as T

def schechter_theano(logL, alpha, n42, a, b):
        
    L = 10. ** logL
    
    schechter = 10 ** n42 * (L / L0) ** (-alpha) * T.exp(- L / Lstar) * (L / Lstar)
    
    selfunc = 0.5 + 0.5 * T.erf(a * (logL - b))
    
    totmod = selfunc * schechter
    
    return T.log(totmod)

with pm.Model() as model:
    
    alpha = pm.Uniform('alpha', lower=0.5, upper=4.0)
    
    n42 = pm.Uniform('n42', lower=0, upper=10)
    
    a = pm.Uniform('a', lower=1, upper=8)
    
    b = pm.Uniform('b', lower=40, upper=43)
    
    obs = pm.DensityDist('obs', alpha, n42, a, b, logp=schechter_theano, observed=realiz)
    
    trace = pm.sample()

This code runs fine, but the fitted parameter values are way off. In the way I am calling my Aesara function, where is it specified that the variable is logL and that the other arguments are parameters?

The value of the distribution (your observations, in this case) is always the first argument to the logp function, followed by the parameters in the order they were passed to the DensityDist.
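Concretely, for your model the call

obs = pm.DensityDist('obs', alpha, n42, a, b, logp=schechter_theano, observed=realiz)

means the logp is evaluated as

schechter_theano(value, alpha, n42, a, b)

where value is the observed data realiz (i.e. your logL), and alpha, n42, a, b follow in the order given to the DensityDist.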

You seem to have some very large values in your logp expression; perhaps you are facing numerical issues? Can you try the same approach with a simpler logp and see if it converges then?
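For example, a minimal sketch along these lines (a hand-written Gaussian logp with synthetic data; the names are just for illustration) should recover its parameters without trouble:

import numpy as np
import aesara.tensor as T
import pymc as pm

data = np.random.normal(2.0, 0.5, size=1000)  # synthetic data: true mu=2.0, sigma=0.5

def gauss_logp(value, mu, sigma):
    # same DensityDist mechanics, but with a simple, well-behaved log-density
    return -0.5 * ((value - mu) / sigma) ** 2 - T.log(sigma) - 0.5 * T.log(2 * np.pi)

with pm.Model() as m_check:
    mu = pm.Uniform("mu", lower=-10, upper=10)
    sigma = pm.HalfNormal("sigma", 5)
    pm.DensityDist("obs", mu, sigma, logp=gauss_logp, observed=data)
    idata = pm.sample()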

Hi @ricardoV94

I have updated the distribution to work directly in log space, which makes sure there is no overflow. The results are still way off; the distribution is basically unconstrained. I tested with a manual Gaussian distribution, and it works fine.

I prepared a clean notebook with my code that you can in principle run directly (I had to change the extension to .txt). It is attached. Please let me know if you see where I went wrong.

Custom_Dist.txt (544.0 KB)
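In short, instead of computing the density and taking the log at the end, the logp now builds the log of the model term by term, along these lines (a sketch of the idea rather than the exact notebook code):

ln10 = np.log(10.0)

def schechter_logp(logL, alpha, n42, a, b):
    # log of the Schechter model, assembled term by term to avoid overflow
    L = 10.0 ** logL
    log_schechter = (n42 * ln10                               # log(10**n42)
                     - alpha * (logL - np.log10(L0)) * ln10   # log((L/L0)**(-alpha))
                     - L / Lstar                              # log(exp(-L/Lstar))
                     + (logL - np.log10(Lstar)) * ln10)       # log(L/Lstar)
    log_selfunc = T.log(0.5 + 0.5 * T.erf(a * (logL - b)))
    return log_schechter + log_selfunc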

When you want to share code in the future, the easiest way is to use a gist (https://gist.github.com/) or just share a Google Colab.

I will try to have a look when I have some time, but it seems your distribution is still very challenging for NUTS. Do you get divergences or convergence warnings?
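A quick way to check is to count the divergent transitions on the returned trace, e.g.:

print(trace.sample_stats["diverging"].sum().item())  # number of divergent transitions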

Indeed Google Colab looks nice, thanks for the tip. Here is my notebook:

Indeed there are tons of divergences during sampling.

Is there a quick way to draw some values from the distribution to check that they match my input?
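For example, I suppose I could reuse the grid trick from my data generation with the fitted (posterior mean) parameters and compare histograms, something like:

# draw from the fitted distribution via the same grid trick used for the data
post = trace.posterior
alpha_f = post["alpha"].mean().item()
n42_f = post["n42"].mean().item()
a_f = post["a"].mean().item()
b_f = post["b"].mean().item()

mod_f = schechter(logLt, alpha_f, n42_f, a_f, b_f)  # the numpy version
prob_f = mod_f / np.sum(mod_f)
draws = np.random.choice(logLt, nsim, p=prob_f)

import matplotlib.pyplot as plt
plt.hist(realiz, bins=30, alpha=0.5, density=True, label="data")
plt.hist(draws, bins=30, alpha=0.5, density=True, label="fitted model")
plt.legend()
plt.show()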