M-H algorithm with custom distribution with scipy

I’m trying to launch a Metropolis-Hastings likelihood with a custom distribution function as :

        dist = getattr(scipy.stats, dist_func)
        y_obs = dist.rvs(loc=loc, scale=scale, *arg, size = 10)  

        def custom(sigma):
            y = dist.rvs(loc=loc, scale=scale, *arg, size = 10)  + (np.random.uniform(-1, 1, 10) * sigma)   
            return y

        with pm.Model() as model:
            sigma = pm.Uniform("sigma", 0, 90, testval=45)
            custom = pm.DensityDist('custom', sigma=sigma, observed=y_obs)  # likelihood

            trace = pm.sample(
                draws=2000,
                tune=5000,
                chains=2,
                cores=1,
                target_accept=0.9,
                return_inferencedata=True,
                init="adapt_full",
            )

However, it tells me that I do not have logp function. I’m very new and I guess that I’m missing something huge. May I have some help ?

Note that :

dist.rvs

is not defined, as the theorical law changes through each loop (not mentionned above)

Hello! The error means that you don’t have a function that returns the likelihood of the data for a given sigma value. Looking at custom, it seems like custom generates a bunch of samples from some probability distribution. But instead of samples, we need a density or likelihood. One analogy that might be helpful - if we are working with the scipy.stats module, it has a normal distribution with two methods: this one generates samples.

stats.norm.rvs()

This one returns probabilities.

stats.norm.pdf()

It’s only the latter type of function that would work here.

So my recommendation is to figure out what density formula or likelihood formula goes with the custom distribution you are trying to build. If that doesn’t exist or is unknown, you might look into using simulator distribution in pymc which measures the distance between your data and the rvs samples.

Oh one other things - if you want to use metropolis-hastings, you’ll need to tell the pm.sample function that explicitly with step = pm.Metropolis

3 Likes

Hi ! Thanks a lot for your time.
Ok I get it, I think. However, is there a way to automatically get the function that returns the likelihood of the data for a given sigma value with scipy.stats ?
For example, one of my dist_func is gausshyper, is there a way to have density from dist_func variable, and where to include it, as a logp(x) in custom ? Because I guess replacing rvs by pdf is not as simple ?

More specifically, where does x parameter come from ?

rvs(a, b, c, z, loc=0, scale=1, size=1, random_state=None) Random variates.
pdf(x, a, b, c, z, loc=0, scale=1) Probability density function.

Thanks also for the pm.Metropolis : >

  step = pm.Metropolis()
  trace = pm.sample(step=step,
      draws=2000,
      tune=5000,
      chains=2,
      cores=1,
      target_accept=0.9,
      return_inferencedata=True,
      init="adapt_full",
  )

PyMC can give you the logp of single variables and also some more complicated expressions.

CustomDist can even figure it out if you provide a dist function which returns PyMC variables that represent the random generation process.

import pymc as pm

def dist(mu, sigma, size):
  return pm.math.exp(pm.Normal.dist(mu, size=size) * sigma)

with pm.Model() as m:
  mu = pm.Normal("mu")
  sigma = pm.HalfNormal("sigma")
  x = pm.CustomDist("x", mu, sigma, dist=dist, observed=2.5)

print(m.point_logps())  # {'mu': -0.92, 'sigma': -0.73, 'x': -2.26}

However your CustomDist involves a convolution of two variables: what you called dist and a uniform.
PyMC cannot automatically infer that logp (is there a general solution)?

import pymc as pm

def dist(mu, sigma, size):
  return pm.Normal.dist(mu, size=size) + pm.Uniform.dist(-1, 1, size=size) * sigma

with pm.Model() as m:
  mu = pm.Normal("mu")
  sigma = pm.HalfNormal("sigma")
  x = pm.CustomDist("x", mu, sigma, dist=dist, observed=2.5)

print(m.point_logps())  # RuntimeError: The logprob terms of the following value variables could not be derived: {x{2.5}}

I guess you are trying to do something like: probability - Finding convolution of exponential and uniform distribution- how to set integral limits? - Mathematics Stack Exchange

If you know what your logp should be you can implement a custom logp function.
PyMC can give you logp and logcdf expressions for any vanilla distributions

import pymc as pm

def dist(mu, sigma, size):
  return pm.Normal.dist(mu, size=size) + pm.Uniform.dist(-1, 1, size=size) * sigma

def wrong_logp(value, mu, sigma):
  # Just a random example that does not mean anything mathematically!
  norm_logp = pm.logp(pm.Normal.dist(mu, sigma), value)
  uniform_logcdf = pm.logcdf(pm.Uniform.dist(-1, 1) * sigma, value / 2)
  return norm_logp + uniform_logcdf
  
with pm.Model() as m:
  mu = pm.Normal("mu")
  sigma = pm.HalfNormal("sigma")
  x = pm.CustomDist("x", mu, sigma, dist=dist, logp=wrong_logp, observed=2.5)

print(m.point_logps())  # {'mu': -0.92, 'sigma': -0.73, 'x': -4.04}

There is a PyMC bug preventing the last example from running (I’ll fix it ASAP), but you get the idea.