Custom sampling step

Hello pymc3 community!

I research analytical methods for protein structure modeling and have recently used pymc3 to estimate the uncertainty in modeling parameters obtained from experimental data. Now I want to make the model we are using more general, and I have a question (actually two) about a custom sampling step; I was hoping some of you could provide input.

Okay, here is the problem: the whole model is fairly complex and has quite a lot of parameters, so I don’t have a minimal example to post right now. But basically, I want a custom sampling step for the parameter (vector) P. P is very similar to a multivariate Gaussian but must be constrained to nonnegative values. We implemented this step (but not the full model) in MATLAB in previous work (for those curious, here is the link) and want to port it to python and pymc3 for obvious reasons.

Here are my two questions:

  • I am struggling a little with telling pymc3 to use a custom step for one of the parameters while using NUTS for all the others. I found a few examples online, but they are either too old to work with the current pymc3 version or not what I want to do.

  • What is the best approach for the custom draw of P? It is quite a bit of code involving matrix inverses and a fast nonnegative least squares solver (fnnls), including a while loop. Converting the sampler to python code was straightforward, but I am having trouble making it compatible with theano, in particular the fnnls part with the while loop. Is there an easier way of doing this? The draw also depends on a hyperparameter δ that is sampled via NUTS.

On a side note: What I am doing currently is to sample P with

P = pm.MvNormal("P", mu=P0, chol=CL, shape=nr)

which runs but doesn’t give acceptable results. I also played around with constraining it:

P = pm.Bound(pm.MvNormal, lower=0.0, upper=5.0)("P", mu=P0, chol=CL, shape=nr)

but this leads to issues with the gradient.

I’d appreciate any help, and I’ll try to come up with an MWE (or minimal non-working example :smiley: ) in the meantime.

Thanks,

Stephan


The custom step method for P does not need to be in Theano - it just needs to return a dict object with the right keys.
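
To make that concrete, here is a rough sketch with placeholder names (nothing below is tied to your actual model or to the notebook): the draw itself can stay ordinary NumPy/SciPy code, and the step only has to copy the incoming point dict and overwrite the entries it is responsible for.

import numpy as np

# Rough sketch: every name here ("P", "delta", the dimension 3) is a placeholder.
def draw_P(point):
    delta = point["delta"]                      # hyperparameter value taken from the current point
    return np.abs(delta * np.random.randn(3))   # stand-in for the real fnnls-based draw

def step(point: dict):
    new = point.copy()                          # keep every other variable's current value
    new["P"] = draw_P(new)                      # overwrite only the entries this step owns
    return new

new_point = step({"P": np.zeros(3), "delta": 1.0})   # toy point, just to show the dict-in/dict-out shape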

Are there any steps in this notebook which are nonfunctional or confusing?

Hi @ckrapu, thanks for your response.

Yes, that notebook is one of the examples I came across. But I must admit that there is too much going on in it (the stick breaking, etc.), and I can’t figure out how to apply it to my problem. Is there a list somewhere of what keys the dict object must have? Or maybe an easier example?

As an MWE I can think of a linear fit:

import pymc3 as pm
import numpy as np
import matplotlib.pyplot as plt
import arviz as az

b = [2, 1.5]
sigma = 2

n = 200
x = np.linspace(start=-20, stop=20, num=n)
y = b[0] * x + b[1]
y_obs = y + sigma * np.random.randn(n)

(plot of the simulated noisy data)

And then I can use pymc3 like this:

with pm.Model() as linear_model:
    bm = pm.Normal("bm", mu=0, sigma=2, shape=2)
    noise = pm.Gamma("noise", alpha=2, beta=1)

    y_observed = pm.Normal("y_observed", mu=bm[0] * x + bm[1], sigma=noise, observed=y_obs)

posterior = pm.sample(model=linear_model, chains=2, draws=3000, tune=1000, return_inferencedata=True)

Now, if I want to sample bm manually, I could define my own sampler:

def my_sampler(mu=0, sigma=2):
    b = np.random.normal(mu, sigma, 2)
    return b

and then construct a new model, sort of like this (I pieced this together from the above link as well as this older example that I found):

class MySamplingStep(object):
    def __init__(self, var, mu, sigma):
        self.vars = var
        self.mu = mu
        self.sigma = sigma

    def step(self, point: dict):
        new = point.copy()
        new[self.var] = my_sampler(self.mu, self.sigma)

        return new

mu = 0
sigma = 2

with pm.Model() as my_linear_model:
    bm = pm.Normal("bm", mu=0, sigma=2, shape=2)
    step_bm = MySamplingStep(bm, mu, sigma)
    noise = pm.Gamma("noise", alpha=2, beta=1)

    y_observed = pm.Normal("y_observed", mu=bm[0] * x + bm[1], sigma=noise, observed=y_obs)

posterior2 = pm.sample(model=my_linear_model, step=[step_bm], chains=2, draws=3000, tune=1000, return_inferencedata=True)

Something like this, and then include the bm step as a step in pm.sample(...). But I keep getting error messages like: Length of bm ~ Normal cannot be determined.

If you or someone else could point me towards how to do that, that would be highly appreciated :slight_smile: Once this example runs, I should be able to apply it to my actual problem.

I went ahead and fixed your attempt at this gist: fixed-sampling-step · GitHub

Here are some things I had to do:

  1. I inherited your sampling step object from BlockedStep since this will automatically handle PyMC3’s request for sampler statistics. Otherwise, this will throw an error since your object won’t have that attribute set up.

  2. Each sampler step object needs a vars attribute which is a list of PyMC3 variables which this sampler is responsible for.

  3. I made sure your var argument got passed through to the step method by retaining it as an instance variable. I also made it so that the new point object is indexed into using var.name which is a string, rather than var which would be a PyMC3 random variable.

  4. I made sure there was only a single chain running for this. There are some rough edges here when it comes to using multiprocessing for custom methods that haven’t been dealt with yet. A rough combination of all four changes is sketched just below this list.
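
Putting those changes together looks roughly like this (a sketch along the lines of the gist rather than a verbatim copy, reusing bm, mu, sigma, my_sampler, and my_linear_model from your code above):

from pymc3.step_methods.arraystep import BlockedStep

class MySamplingStep(BlockedStep):                 # 1. inherit from BlockedStep
    def __init__(self, var, mu=0, sigma=2):
        self.vars = [var]                          # 2. list of variables this step handles
        self.var = var                             # 3. keep the variable itself around
        self.mu = mu
        self.sigma = sigma

    def step(self, point: dict):
        new = point.copy()
        new[self.var.name] = my_sampler(self.mu, self.sigma)   # 3. index by the variable's name
        return new

with my_linear_model:
    step_bm = MySamplingStep(bm, mu=mu, sigma=sigma)
    # 4. single chain on a single core to avoid the multiprocessing rough edges
    posterior2 = pm.sample(step=[step_bm], chains=1, cores=1, draws=3000,
                           tune=1000, return_inferencedata=True)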

Is there a list somewhere of what keys the dict object must have?

Yes, in general you need to make sure that the dict has a key for each of the variables listed in model.free_RVs.
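
For the linear-fit MWE above, that is easy to check (a small sketch; note that transformed variables appear in the point under their transformed names, e.g. the Gamma-distributed noise shows up on the log scale):

print(my_linear_model.free_RVs)       # every free variable gets an entry in the point dict
point = my_linear_model.test_point    # an example point with exactly those keys
print(list(point.keys()))             # includes 'bm' and the log-transformed noise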

Also, here’s an example I just wrote up on multiple regression: Conjugate step for linear regression in PyMC3 · GitHub.
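
The core idea there, sketched very roughly (this is an illustration of the conjugate update, not a copy of the gist): with a Gaussian prior on the coefficients and the noise level treated as fixed at its current value, the coefficient posterior is Gaussian, so it can be drawn exactly with plain NumPy inside such a step.

import numpy as np

def conjugate_beta_draw(X, y, sigma, mu0, Sigma0):
    # One exact posterior draw of the regression coefficients, assuming a
    # N(mu0, Sigma0) prior and observation noise sigma (held fixed for this draw).
    prec0 = np.linalg.inv(Sigma0)
    prec_n = prec0 + X.T @ X / sigma**2                  # posterior precision
    cov_n = np.linalg.inv(prec_n)                        # posterior covariance
    mean_n = cov_n @ (prec0 @ mu0 + X.T @ y / sigma**2)  # posterior mean
    return np.random.multivariate_normal(mean_n, cov_n)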


Thank you! This totally helped me implement the custom sampling step I needed for my bigger model. Once my work there has progressed a little further, I’ll post some of it here for anyone who might be interested!