Custom sampling step

stephan · April 14, 2021, 8:56pm

Hello pymc3 community!

I research analytical methods for protein structure modeling and have recently used pymc3 to estimate uncertainty in experimentally data obtained modeling parameters. Now, I want to make the model that we are using more general and have a question (actually two) concerning a custom sampling step and was hoping maybe some of you could provide some input.

Okay, here is the problem: The whole model is pretty complex and has quite a lot of parameters, so I don’t really have a minimal example to post right now. But basically what I want to do is to have a custom sampling step for parameter (vector) P. P is very similar to a multivariate Gaussian but must be constrained to nonnegative values. We implemented this step (but not the full model) in MATLAB in previous works (for those curious, here is the link) and want to port this to python and pymc3 for obvious reasons.

Here are my two questions

I am struggling a little with telling pymc3 to do a custom step for one of the parameters while using NUTS for all the others. I found a few examples online, but those are either old and don’t work with the current pymc3 version or are not what I want to do.
What is the best approach for the custom draw for P. It is quite a bit of code that involves matrix inverses and a fast nonnegative least squares solution (fnnls), and also includes a while condition. Converting the sampler into python code was pretty straight forward, but I am having issues with making it compatible with theano, in particular the fnnls part with the while loop. Is there an easier way of doing this? This draw also depends on a hyperparameter δ that is sampled via NUTS.

On a side note: What I am doing currently is to sample P with

P = pm.MvNormal("P", mu=P0, chol = CL, shape = nr)

which runs but doesn’t give acceptable results. I also played around with restraining this

P = pm.Bound(pm.MvNormal, lower=0.0, upper=5.0)("P", mu=P0, chol = CL, shape = nr)

but this leads to issues with the gradient.

I’d appreciate any help and I’ll try to come up with a MWE (or minimal non working example ) in the meanwhile.

Thanks,

Stephan

ckrapu · April 16, 2021, 2:55am

The custom step method for P does not need to be in Theano - it just needs to return a dict object with the right keys.

Are there any steps in this notebook which are nonfunctional or confusing?

stephan · April 23, 2021, 11:06pm

Hi @ckrapu, thanks for your response.

Yes, that notebook is one of the examples I came across. But I must admit that there is too much going on in it (the stick breaking, etc), and that I can’t figure out how to apply it to my problem. Is there somewhere a list of what keys the dict object must have? Or maybe an easier example?

As a MWE I can think of a linear fit:

import pymc3 as pm
import numpy as np
import matplotlib.pyplot as plt
import arviz as az

b = [2,1.5]
sigma = 2

n = 200
x = np.linspace(start=-20,stop = 20, num = n)
y = b[0]*x+b[1]
y_obs = y + sigma*np.random.randn(n)

And then I can use pymc3 like:

with pm.Model() as linear_model:
    bm = pm.Normal("bm", mu=0, sigma=2,shape=2)
    noise = pm.Gamma("noise", alpha=2, beta=1)
    
    y_observed = pm.Normal("y_observed",mu=bm[0]*x+bm[1],sigma=noise,observed=y_obs)

posterior = pm.sample(model = linear_model,chains=2, draws=3000, tune=1000,return_inferencedata=True)

Now, if I want to sample b_m, manually, I could define my own sampler

def my_sampler(mu=0,sigma=2):
    b = np.random.normal(mu,sigma,2)
    return b

and then construct a new model sort of like this (I kind of pieced this together from above link as well as this older example that I found:

class MySamplingStep(object):
    def __init__(self, var, mu, sigma):
            self.vars = var
            self.mu = mu
            self.sigma = sigma

    def step(self, point: dict):
        new = point.copy()
        new[self.var] = my_sampler(self.mu,self.sigma)

        return new

mu = 0
sigma = 2

with pm.Model() as my_linear_model:
    bm = pm.Normal("bm", mu=0, sigma=2,shape=2)
    step_bm = MySamplingStep(bm, mu, sigma)
    noise = pm.Gamma("noise", alpha=2, beta=1)

    y_observed = pm.Normal("y_observed",mu=bm[0]*x+bm[1],sigma=noise,observed=y_obs)

posterior2 = pm.sample(model = my_linear_model,step = [step_bm], chains=2, draws=3000, tune=1000,return_inferencedata=True)

Something like this, and then include the bm step as step in pm.sample(...). But I still keep getting error messages… Length of bm ~ Normal cannot be determined.

If you or someone could point me towards how to do that, that would be highly appreciated Once this example runs, I should be able to apply it to my actual problem.

ckrapu · April 24, 2021, 4:38pm

I went ahead and fixed your attempt at this gist: fixed-sampling-step · GitHub

Here’s some things I had to do:

I inherited your sampling step object from BlockedStep since this will automatically handle PyMC3’s request for sampler statistics. Otherwise, this will throw an error since your object won’t have that attribute set up.
Each sampler step object needs a vars attribute which is a list of PyMC3 variables which this sampler is responsible for.
I made sure your var argument got passed through to the step method by retaining it as an instance variable. I also made it so that the new point object is indexed into using var.name which is a string, rather than var which would be a PyMC3 random variable.
I made sure there was only a single chain running for this. There are some rough edges here when it comes to using multiprocessing for custom methods that haven’t been dealt with yet.

Is there somewhere a list of what keys the dict object must have?

Yes, in general, you need to make sure that the dict has a key for each of the variables listed in model.free_RVs

Also, here’s an example I just wrote up on multiple regression: Conjugate step for linear regression in PyMC3 · GitHub.

stephan · May 5, 2021, 4:35am

Thank you. This totally helped me implement the custom sampling step I needed for my bigger model. Once my work there has progressed a little further I’ll post some of it here for anyone that might be interested!

perrette · September 23, 2022, 12:38pm

Hi @ckrapu,

thanks much for your reply, that was extremely useful. It seems though, as it does not apply for transformed variables (in my case, a uniform distribution). Would you extend your reply to account for these? (your second notebook looks like it addresses that case, but I was not able to adapt it to the simple example that follows):

import numpy as np
import pymc3 as pm
from pymc3.step_methods.arraystep import BlockedStep

class FixedDistStep(BlockedStep):
    
    def __init__(self, vars, low, high, seed=None):
        self.vars = vars
        self.rng = np.random.default_rng(seed=seed)
        self.low = low
        self.high = high
        
    def step(self, point: dict):
        new = point  # why a copy?
        for v in self.vars:  # there is only one variable in our example
            new[v.name] = self.rng.uniform(self.low, self.high)
                
        return new

temp = np.array([0., 1.])
slr_obs = np.array([0.3, 3.5])

with pm.Model() as model:
    
    a = pm.Normal('a', 0, 3)
    aT0 = pm.Uniform('aT0', -1, 0)  # should not be used
    slr = pm.Deterministic('slr', a*temp-aT0)
    pm.Normal('slr_obs', slr, sigma=0.3, observed=slr_obs)

    fixed_step = FixedDistStep([aT0], -0.5, -0.3)    
    trace = pm.sample(return_inferencedata=True, step=[fixed_step], chains=1)

The result is not as expected with the NUTS step still acting on aT0:

Sequential sampling (1 chains in 1 job)
CompoundStep
FixedDistStep: [aT0]
NUTS: [aT0, a]

(and without chains=1, there is a KeyError: aT0 is not found)

I figured that’s because T0 is transformed and the free RV internally is aT0_interval__. So I tried the following:

import numpy as np
import pymc3 as pm
from pymc3.step_methods.arraystep import BlockedStep

class FixedDistStep(BlockedStep):
    
    def __init__(self, vars, low, high, seed=None):
        self.vars = vars
        self.rng = np.random.default_rng(seed=seed)
        self.low = low
        self.high = high
        
    def step(self, point: dict):
        new = point  # why a copy?
        for v in self.vars:  # there is only one variable in our example
            val = self.rng.uniform(self.low, self.high)
            if v.name.endswith('_interval__'):
                new[v.name] = v.distribution.transform_used.forward_val(val)                
            else:
                new[v.name] = val
                
        return new

temp = np.array([0., 1.])
slr_obs = np.array([0.3, 3.5])

with pm.Model() as model:
    
    a = pm.Normal('a', 0, 3)
    aT0 = pm.Uniform('aT0', -1, 0)  # won't be used
    slr = pm.Deterministic('slr', a*temp-aT0)
    pm.Normal('slr_obs', slr, sigma=0.3, observed=slr_obs)

    fixed_step = FixedDistStep([model.free_RVs[1]], -0.5, -0.3)    
    trace = pm.sample(return_inferencedata=True, step=[fixed_step], chains=1)

where model.free_RVs[1] is aT0_interval__.

Sequential sampling (1 chains in 1 job)
CompoundStep
FixedDistStep: [aT0]
NUTS: [a]

The hack above seems to work, but it takes a loong while to run (28 seconds instead of just 1 second), suggesting that something is not quite right.

For now, I think the simplest work around is just to replace the to-be-imposed prior distribution in the base model with a normal distribution. That seems to work just fine. But I had hoped I can avoid having various versions of the same model (for cleaner code).

Another workaround that I previously used, was to use a random stream:

import numpy as np
import pymc3 as pm
from theano.tensor.random.utils import RandomStream

temp = np.array([0., 1.])
slr_obs = np.array([0.3, 3.5])

srng = RandomStream()

with pm.Model() as model:
    
    a = pm.Normal('a', 0, 3)
    aT0 = pm.Deterministic('aT0', srng.uniform(-0.5, -0.3)) # won't be updated
    slr = pm.Deterministic('slr', a*temp-aT0)
    pm.Normal('slr_obs', slr, sigma=0.3, observed=slr_obs)

    trace = pm.sample(return_inferencedata=True, chains=1)

Seems to work fine, though it takes 4 seconds, instead of just one when using a custom step (in this example), for some reason. The main reason why I don’t use this technique anymore though, is that it causes an error in pymc v4.

ckrapu · September 24, 2022, 4:17am

I would definitely agree with you that the most practical strategy for doing this sort of thing is to use normal / unconstrained priors and then do deterministic transforms later on. Unfortunately I’m not sure what else to recommend! I do hope to give some v4 examples later.

perrette · September 24, 2022, 6:16pm

Thanks. Yes I ended up doing just that (it occurred to me as I was writing), and that’s a reasonable solution.
As far as pymc v4, I encountered this example:

But it is certainly more involved than just subclassing the BlockedStep class with a vars attribute and a step method. It involves ArrayStepShared and RaveledVars classes, and half a dozen pymc functions and methods. I’d be curious to understand what it does, and what the differences are with the pymc3 example here. (Maybe it does solve the constrained variable issue?)

perrette · September 26, 2022, 12:31pm

For pymc v4, in the standard case of simple normal distribution, that seems to work, and in the cases I tried, is faster than the ArrayStepShared approach from https://discourse.pymc.io/t/two-stage-bayesian-regression-enforcing-a-fixed-distribution-not-just-hierarchical-regression/10036/15:

import numpy as np
import pymc as pm
from pymc.step_methods.arraystep import BlockedStep

class FixedDistStep(BlockedStep):
    
    def __init__(self, vars, low, high, seed=None, model=None):
        model = pm.modelcontext(model)
        vars = [model.rvs_to_values.get(var, var) for var in vars]
        self.vars = pm.inputvars(vars)        
        self.rng = np.random.default_rng(seed=seed)
        self.low = low
        self.high = high
        
    def step(self, point: dict):
        for v in self.vars:  # there is only one variable in our example
            point[v.name] = self.rng.uniform(self.low, self.high)
                
        return point

temp = np.array([0., 1.])
slr_obs = np.array([0.3, 3.5])

with pm.Model() as model:
    
    a = pm.Normal('a', 0, 3)
    aT0 = pm.Normal('aT0', 0, 1)  # won't be used, but MUST be Normal
    slr = pm.Deterministic('slr', a*temp-aT0)
    pm.Normal('slr_obs', slr, sigma=0.3, observed=slr_obs)

    fixed_step = FixedDistStep([aT0], -0.5, -0.3)    
    trace = pm.sample(return_inferencedata=True, step=[fixed_step], chains=1)

It is essentially the same approach as illustrated in that thread for pymc3, with the notable (and necessary) addition of vars = [model.rvs_to_values.get(var, var) for var in vars].

ricardoV94 · September 26, 2022, 2:10pm

Here is some code that works with the model distribution directly, and also in the case there are RVs above the fixed distribution that one still wants to sample with NUTS:

import numpy as np
import pymc as pm
from pymc.aesaraf import rvs_to_value_vars
from pymc.step_methods.arraystep import BlockedStep

class FixedDistStep(BlockedStep):
    
    def __init__(self, vars, seed=None, model=None):
        model = pm.modelcontext(model)
        self.vars = [model.rvs_to_values.get(var, var) for var in vars]
        
        # Replace upstream RVs by their values. We clone the vars of the
        # sampler to remove the value variable in the tag
        vars_wo_values = []
        for var in (model.values_to_rvs[v] for v in self.vars):
       
            # TODO: Raise if var as a value with a transform!
            var_wo_value = var.owner.clone().default_output()
            var_wo_value.tag = {}
            vars_wo_values.append(var_wo_value)
        
        vars_out = rvs_to_value_vars(vars_wo_values)
        self.fn = model.compile_fn(inputs=model.value_vars, outs=vars_out, random_seed=seed, on_unused_input="ignore")      

        
    def step(self, point: dict):
        
        for v, value in zip(self.vars, self.fn(point)):  # there is only one variable in our example
            point[v.name] = value
                
        return point

temp = np.array([0., 1.])
slr_obs = np.array([0.3, 3.5])

with pm.Model() as model:
    
    a = pm.Normal('a', 0, 3)
    aT0 = pm.Uniform('aT0', -0.5, -0.3, transform=None)
    slr = pm.Deterministic('slr', a*temp-aT0)
    pm.Normal('slr_obs', slr, sigma=0.3, observed=slr_obs)

    fixed_step = FixedDistStep([aT0])    
    trace = pm.sample(return_inferencedata=True, step=[fixed_step], chains=1)

perrette · September 26, 2022, 3:05pm

Hi @ricardoV94,

thanks much for being so responsive.

May I ask, what would be an alternative version of the code with a really “custom” sampling, something that cannot easily be expressed in the model?

In practice, like others in the various related threads, I’d like to sample results from an existing calculation that occurred outside of the model, and are contained in a numpy.ndarray with shape draws x variables (several variables correlated with one another, not necessarily following a multivariable normal distribution).

In the form you are suggesting now, I guess I’d define an random integer index (Categorical distribution), and just index the the ndarray (or maybe it has to be aesara.shared(…) version of it). I had successfully tried that approach with a RandomStream integer in pymc3. But I kind of liked the aesthetics of having a model defined as stand-alone, and doing the “coupling” at the sampling step level. But of course aesthetics does not really matter, as long as it works.

ricardoV94 · September 26, 2022, 3:09pm

@perrette I would need some more details to understand exactly what you’re trying to achieve. You have a grid of pre-computed values that somehow get fed into your model but don’t depend on any variables you are inferring? What are those values?

perrette · September 27, 2022, 9:13am

Hi @ricardoV94,

Right, let me explain what I am up to. I have a model representing some global quantity (global in the literal sense of worldwide aggregate – global mean sea-level rise). Now I have a second, local model that extends the global model by adding location specific processes (here it is mostly vertical land movements at tide-gauges). What I am trying to achieve, is to feed the posterior samples from the global model into the local model. I need to do this at about 400 stations, but here I do not want the global model parameters to adjust to accomodate to each station in different ways. I already have a dataset for my global model (say worldwide glaciers melt) and in sampling instances of the local model I only want the additional, station-specific parameters to adjust.

Here I made a conceptual example of what I am trying to achieve.

Global model:

import pymc as pm
import numpy as np

temp = np.array([0., 1.])
slr_obs = np.array([0.3, 3.5])

with pm.Model() as global_model:
    
    a = pm.Uniform('a', -3, 5)
    aT0 = pm.Uniform('aT0', -3, 0)
    slr = pm.Deterministic('slr', a*temp-aT0)
    pm.Normal('slr_obs', slr, sigma=0.3, observed=slr_obs)
    trace = pm.sample(chains=1)

The class to ensure fixed sampling from global model:

from pymc.step_methods.arraystep import BlockedStep

class FixedDistStep(BlockedStep):
    
    def __init__(self, vars, trace, seed=None, model=None):
        model = pm.modelcontext(model)
        vars = [model.rvs_to_values.get(var, var) for var in vars]
        self.vars = pm.inputvars(vars)        
        self.rng = np.random.default_rng(seed=seed)
        self.trace = trace
        
    def step(self, point: dict):
        
        i = self.rng.integers(self.trace.posterior.draw.size)
        
        for v in self.vars:
            point[v.name] = self.trace.posterior[v.name].sel(draw=i, chain=0).values
                
        return point

The local model

n_stations = 1  # in real case that would be close to 500
local_trends = 3.5 + np.random.normal(0, 0.5, size=n_stations)

for k in range(n_stations):
    with pm.Model() as local_model:
        
        a = pm.Normal('a', 0, 3)  # won't be used
        aT0 = pm.Normal('aT0', -0.5, 2)  # won't be used
        
        fixed_global_step = FixedDistStep([a, aT0], trace)
        
        global_slr = pm.Deterministic('global_slr', a*temp[1]-aT0)
        land_motion = pm.Normal('land_motion', 0, 2)
        local_slr = pm.Deterministic('local_slr', global_slr + land_motion)
        
        pm.Normal('local_slr_obs', local_slr, sigma=0.2, observed=local_trends[k])
        
        local_trace = pm.sample(step=[fixed_global_step], chains=1)

That works well actually. I just have to be careful to redefine the global model parameters to a normal distribution for the step to work as expected. I was just curious to use your suggestion that deals with transformed distributions.

Another issue I have, but I do not expect it to be addressed here, is that for some reason, in my real case it takes 2 to 3 times longer with pymc v4 as it used to with pymc3. So I will probably rollback the upgrade.

ricardoV94 · September 27, 2022, 9:32am

Okay so you are trying to use posterior parameters from a separate global model as a fixed prior to your local models?

Dumb question: why must they be fixed? Don’t your local models give you more information about your global parameters?

Suggestion: It seems unnecessary to take a random draw every step. I would cache these draws in advance when you initialize the model, or at least cache a batch of draws and only resample every 500 steps or what have you. That should make your step pretty much “cost-free”.

Also careful with the step seed if you have multiple chains. Our seeding for multiple chains is still a bit old-school: Refactor step methods to use their own random stream · Issue #5797 · pymc-devs/pymc · GitHub

ricardoV94 · September 27, 2022, 9:39am

One problem I see with using a fake distribution is that the joint model logp is incorrect. NUTS will be considering the written priors, not the actual ones “enforced” by your step sampler.

Edit: Perhaps you want something more like a discrete version of pm.Interpolated? pymc.Interpolated — PyMC dev documentation

perrette · September 27, 2022, 9:44am

Okay so you are trying to use posterior parameters from a separate global model as a fixed prior to your local models?

Exactly. I believe that is similar to Two-stage Bayesian regression enforcing a fixed distribution (not Just Hierarchical regression) - #18 by Ray_Kruger. Including the maths.

Dumb question: why must they be fixed? Don’t your local models give you more information about your global parameters?

That’s certainly not a dumb question. A bad reason, but a practical one, is that the sampling fails with more than 20 or so stations. That likely has to do with the model formulation. Possibly because the stations are correlated in some ways, and accounting for the correlation is tricky. Dealing with the stations separately bypass that difficulty.

I am working towards having one global model, but even then, I find it insightful to have a decoupled model as point of comparison.

ricardoV94 · September 27, 2022, 9:45am

Perhaps don’t use a distribution altogether? Use a sharedvariable for those hyperparameters, and have a step method update it between draws (can a step method have zero variables? If not, give it an isolated Flat). That ought to be possible.

That’s directly what you’re trying to achieve

ricardoV94 · September 27, 2022, 9:55am

Something like this seems to work:

import aesara
import aesara.tensor as at
import pymc as pm
import numpy as np

from pymc.step_methods.arraystep import BlockedStep

class UpdateSharedVariablesStep(BlockedStep):

    def __init__(self, shared_vars, trace, seed=None, model=None):
        self.vars = shared_vars
        self.rng = np.random.default_rng(seed=seed)
        self.trace = trace

    def step(self, point: dict):
        i = self.rng.integers(100)
        #i = self.rng.integers(self.trace.posterior.draw.size)

        for v in self.vars:
            #v.set_value(self.trace.posterior[v.name].sel(draw=i, chain=0).values)
            v.set_value(i)

        return point


temp = np.array([0, 1])
obs = np.array([0.3, 3.5])

with pm.Model() as local_model:
   a = aesara.shared(0.0, name='a')
   aT0 = aesara.shared(0.0, name='aT0')
   update_shared_step = UpdateSharedVariablesStep([a, aT0], trace=None)

   global_slr = pm.Deterministic('global_slr', a*temp[1]-aT0)
   land_motion = pm.Normal('land_motion', 0, 2)
   local_slr = pm.Deterministic('local_slr', global_slr + land_motion)

   pm.Normal('local_slr_obs', local_slr, sigma=0.2, observed=obs)

   local_trace = pm.sample(step=[update_shared_step], chains=1)

perrette · September 27, 2022, 9:55am

I’ll look into it. Thanks for the suggestion and the warnings.

In pymc3 I used a RandomStream integer like this:

from theano.tensor.random.utils import RandomStream
from theano import shared

srng = RandomStream(seed=234)
a_shared = shared(trace.posterior['a'].sel(chain=0).values)
aT0_shared = shared(trace.posterior['aT0'].sel(chain=0).values)

for k in range(n_stations):
    with pm.Model() as local_model:
        
        i = pm.Deterministic('global_ensemble_i', srng.categorical(np.ones(trace.draw.size)))
        a = a_shared[i]
        aT0 = aT0_shared[i]        
        global_slr = pm.Deterministic('global_slr', a*temp[1]-aT0)
        ...

but that approach fails in pymc v4

perrette · September 27, 2022, 9:56am

Oh. Great. I’ll go and try this out. Many thanks, really.

Topic		Replies	Views
Using a custom step method for latent mixing variables Questions	4	1703	April 16, 2018
Initial mass matrix Questions	1	758	June 17, 2019
Isolate NUTS into a new library Development	7	1546	October 12, 2020
Drawing values from custom distribution Questions	8	973	September 14, 2022
Reuse tuning for next sampling call Questions	9	2213	February 8, 2019

Custom sampling step

Related topics