Switch point Metropolis tuning

I have data with potentially 3 switch points:
[plot of the data]
The model will automatically choose Metropolis for sampling tau_1 and tau_2, and NUTS for the lambdas.

N = np.arange(0, all_points)  # index of each observation

with pm.Model() as model:
    alpha = 1.0 / data.mean()
    lambda_1 = pm.Exponential("lambda_1", alpha)
    lambda_2 = pm.Exponential("lambda_2", alpha)
    lambda_3 = pm.Exponential("lambda_3", alpha)

    tau1 = pm.DiscreteUniform("tau1", lower=N.min(), upper=N.max())
    tau2 = pm.DiscreteUniform("tau2", lower=tau1, upper=N.max())

    # rate is lambda_1 up to tau1, lambda_2 between tau1 and tau2, lambda_3 after tau2
    _mu = T.switch(tau1 >= N, lambda_1, lambda_2)
    mu = T.switch(tau2 >= N, _mu, lambda_3)

    observation = pm.Poisson("obs", mu, observed=data)
    trace = pm.sample(10000, tune=1000)
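
For reference, the automatic assignment is equivalent to spelling out the step methods by hand, roughly like this (a sketch for illustration, not part of the original code):

with model:
    step1 = pm.Metropolis([tau1, tau2])               # discrete switch points
    step2 = pm.NUTS([lambda_1, lambda_2, lambda_3])   # continuous rates
    trace = pm.sample(10000, tune=1000, step=[step1, step2])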

When tune=500 it doesn't converge, but with tune=1000 both taus converge successfully.

Can you please share with me and the PyMC3 community how tuning works, or what should be considered when tuning a Metropolis sampler? Do you know of articles or websites where I could read more about Metropolis tuning?

I am surprised that the discrete switch point works well here. Still, I suggest using a continuous step function instead of switch; for more details see: https://stackoverflow.com/questions/49144144/convert-numpy-function-to-theano/49152694#49152694

Back to your original question: in pm.sample, each sampler has some target that it tries to reach during tuning. For example, Metropolis adjusts the scale of the transition kernel so that the acceptance probability is around 50%.
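
For instance, the rule Metropolis uses looks roughly like this (paraphrased, not the exact pymc3 source; thresholds may differ between versions). Every tune_interval iterations, the proposal scale is adjusted based on the acceptance rate observed in that window:

def tune(scale, acc_rate):
    # shrink the proposal when too few jumps are accepted,
    # widen it when almost everything is accepted
    if acc_rate < 0.05:
        return scale * 0.5
    elif acc_rate < 0.2:
        return scale * 0.9
    elif acc_rate > 0.95:
        return scale * 10.0
    elif acc_rate > 0.75:
        return scale * 2.0
    elif acc_rate > 0.5:
        return scale * 1.1
    return scale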

In NUTS, tuning is much more complex, as it optimizes both the step size of the leapfrog integrator and the mass matrix of the Hamiltonian transition kernel. The best reference for this is the Stan User Manual (http://mc-stan.org/users/documentation/, chapter 34.2), although we don't use exactly the same tuning algorithm as Stan.
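
In practice, the main NUTS tuning knob you can set yourself is the acceptance target. Assuming the pm.sample signature of pymc3 3.3, where NUTS options are passed through nuts_kwargs, that would look something like:

with model:
    # a longer tuning phase plus a higher target_accept makes NUTS adapt
    # a smaller, more careful step size
    trace = pm.sample(10000, tune=2000, nuts_kwargs=dict(target_accept=0.9))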

When I try to run what you created:

def logistic(L, x0, k=500, t=np.linspace(0., 1., 1000)):
    return L/(1+tt.exp(-k*(t-x0)))

with pm.Model() as m2:
    lambda0 = pm.Normal('lambda0', mu, sd=sd)
    lambdad = pm.Normal('lambdad', 0, sd=sd, shape=nbreak-1)
    trafo = Composed(pm.distributions.transforms.LogOdds(), Ordered())
    b = pm.Beta('b', 1., 1., shape=nbreak-1, transform=trafo,
                    testval=[0.3, 0.5])
    theta_ = pm.Deterministic('theta', tt.exp(lambda0 +
                                              logistic(lambdad[0], b[0]) +
                                              logistic(lambdad[1], b[1])))
    obs = pm.Poisson('obs', theta_, observed=y)
    
    trace = pm.sample(1000, tune=1000)

I get the following error:

AttributeError: Can't get attribute 'Composed' on <module '__main__' (built-in)>

Is there any way around that?

Yes, more details are in the notebook; link copied below:

I’m sorry for the confusion. I already ran the whole notebook the first time, but it gives the error as soon as you try to run the model.

AttributeError: Can't get attribute 'Composed' on <module '__main__' (built-in)>

I pretty much copy-pasted everything, but the error keeps repeating. I tried changing some parameters, but I feel the problem is within the class Composed(Transform) definition.

Not sure…

That’s pretty strange… what is your pymc3 version?
Also, could you please paste the full error trace?

PyMC3 version 3.3.
I'm running it in a Jupyter notebook (version 5.3.1).
When I run the model, it gets up to this part:

Auto-assigning NUTS sampler…
Initializing NUTS using jitter+adapt_diag…
c:\anaconda\lib\site-packages\pymc3\model.py:384: FutureWarning: Conversion of the second argument of issubdtype from float to np.floating is deprecated. In future, it will be treated as np.float64 == np.dtype(float).type.
if not np.issubdtype(var.dtype, float):
Multiprocess sampling (2 chains in 2 jobs)
NUTS: [b_logodds_ordered__, lambdas]

Then it just gets stuck, but if you look at the Jupyter command prompt you'd see:

Process SpawnPoolWorker-2:
Traceback (most recent call last):
  File "c:\anaconda\lib\multiprocessing\process.py", line 249, in _bootstrap
    self.run()
  File "c:\anaconda\lib\multiprocessing\process.py", line 93, in run
    self._target(*self._args, **self._kwargs)
  File "c:\anaconda\lib\multiprocessing\pool.py", line 108, in worker
    task = get()
  File "c:\anaconda\lib\site-packages\joblib\pool.py", line 362, in get
    return recv()
  File "c:\anaconda\lib\multiprocessing\connection.py", line 251, in recv
    return _ForkingPickler.loads(buf.getbuffer())
AttributeError: Can't get attribute 'Composed' on <module '__main__' (built-in)>

Do you recommend I update Jupyter or Pymc3 versions?

Looks like you have a problem with multiprocessing - try upgrading joblib.
For now, setting njobs=1 in pm.sample() should temporarily suppress the error.
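
That is, with the model from the first post, something like:

with model:
    trace = pm.sample(10000, tune=1000, njobs=1)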


Thank you, junpenglao, and thank you for your patience. Upgrading joblib didn't do it, but it worked when I changed to njobs=1.
When I apply your model to other datasets that I am analyzing, is it correct that the testval for 'b' would most likely change?

In the current model, the testval for ‘b’ is [0.3,0.5]:

with pm.Model() as m2:
    lambda0 = pm.Normal('lambda0', mu, sd=sd)
    lambdad = pm.Normal('lambdad', 0, sd=sd, shape=nbreak-1)
    trafo = Composed(pm.distributions.transforms.LogOdds(), Ordered())
    b = pm.Beta('b', 1., 1., shape=nbreak-1, transform=trafo,
                    testval=[0.3, 0.5])

Can you please share why you chose testval=[0.3, 0.5]? I know these are starting values for 'b', but would they change if my observed data were different?

The testval here is to avoid a bug in the Ordered transform: the default testval of Beta(1, 1) is .5, and with shape=2 it would be [.5, .5]; however, that generates an error for Ordered because the two values are the same. So as long as you specify values that are sorted and not identical, it will work.
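
For example (my own illustration, not from the notebook), any sorted, distinct starting values will do, e.g. generated with np.linspace so the same line works for a different nbreak:

b = pm.Beta('b', 1., 1., shape=nbreak-1, transform=trafo,
            testval=np.linspace(0.2, 0.8, nbreak-1))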


thank you. :slight_smile:

Jumping in here, as I am working with a very similar model. I was previously using the DiscreteUniform distribution to model the switchpoints in my data, but am now trying to shift to the logistic approach explained here.

I am trying to model my data with a single regression discontinuity - so two straight lines fit my data, and at some point in time, the model has to switch from one straight line to the other. Here’s an example:
[figure: example data with a single discontinuity]

I tried the same logistic idea as in @junpenglao's gist; here's the code:

    def logistic(L, x0, k=500, t=np.linspace(0., 1., 2500)):
        return L/(1+tt.exp(-k*(t-x0)))

And the model:

    In [34]: with pm.Model() as model_before_700:                    
        ...:     alpha = pm.Normal("alpha", mu = 0, sd = 3.0, shape = 2)           
        ...:     beta = pm.Normal("beta", mu = 0, sd = 1.0, shape = 2)
        ...:     switchpoints = pm.Beta("switchpoints", 1, 1)                       
        ...:     sd = pm.HalfCauchy("sd", 0.5, shape = 2)
        ...:     intercept = alpha[0] + logistic(alpha[1], switchpoints)
        ...:     slope = beta[0] + logistic(beta[1], switchpoints)
        ...:     dev = sd[0] + logistic(sd[1], switchpoints)
        ...:     regression = intercept + slope*np.arange(2500)                    
        ...:     observed = pm.Normal("observed", mu = regression, sd = dev, observed = data_before_700)                
    In [34]: with model_before_700:                                  
        ...:     trace_before_700 = pm.sample(tune = 2000, draws = 1000, njobs = 3)
        ...: 

I consistently get:
ValueError: Mass matrix contains zeros on the diagonal. Some derivatives might always be zero.

I have seen previous discussions saying that this might be due to overflow errors, so I think I might be missing something simple in my model. Any help is much appreciated :slight_smile:

Here are the relevant versions etc.:

    In [38]: pm.__version__
    Out[38]: '3.3'

    In [39]: theano.__version__
    Out[39]: '1.0.1'

    In [40]: joblib.__version__
    Out[40]: '0.11'

The error is likely caused by the exponential of large values; check the logistic function:

x0=.5
k=500
t=np.linspace(0., 1., 2500)
np.exp(-k*(t-x0))
Out[38]: 
array([3.74645461e+108, 3.06709213e+108, 2.51092169e+108, ...,
       3.98260131e-109, 3.26041722e-109, 2.66919022e-109])

As you can see, the output spans a wildly large range. You should find a way to scale the parameter k.
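
As a side note (my own suggestion, not discussed further in the thread), the same logistic can also be written with Theano's built-in sigmoid, which avoids forming the huge exponential explicitly, since tt.nnet.sigmoid(z) equals 1/(1 + exp(-z)):

import numpy as np
import theano.tensor as tt

def logistic(L, x0, k=500, t=np.linspace(0., 1., 2500)):
    # L / (1 + exp(-k*(t - x0))), written via the built-in sigmoid
    return L * tt.nnet.sigmoid(k * (t - x0))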

Thanks for the suggestion. I didn't really need k to be as large as 500; for my purposes k = 10 would be enough.

In [62]: x0=.5
    ...: k=10
    ...: t=np.linspace(0., 1., 2500)
    ...: np.exp(-k*(t-x0))
    ...: 
Out[62]: 
array([  1.48413159e+02,   1.47820456e+02,   1.47230119e+02, ...,
         6.79208851e-03,   6.76496359e-03,   6.73794700e-03])

The range of the exponential is much smaller now, but the mass matrix issue persists :frowning:

The model again:

In [69]: def logistic(L, x0, k=10, t=np.linspace(0., 1., 2500)):
    ...:     return L/(1+tt.exp(-k*(t-x0)))
    ...: 

In [70]: with pm.Model() as model_before_700:
    ...:     alpha = pm.Normal("alpha", mu = 0, sd = 3.0, shape = 2)
    ...:     beta = pm.Normal("beta", mu = 0, sd = 1.0, shape = 2)
    ...:     switchpoints = pm.Beta("switchpoints", alpha =1, beta=1)
    ...:     sd = pm.HalfCauchy("sd", 0.5, shape = 2)
    ...:     intercept = alpha[0] + logistic(alpha[1], switchpoints)
    ...:     slope = beta[0] + logistic(beta[1], switchpoints)
    ...:     dev = sd[0] + logistic(sd[1], switchpoints)
    ...:     

In [71]: with model_before_700:
    ...:     regression = intercept + slope*np.arange(2500)
    ...:     observed = pm.Normal("observed", mu = regression, sd = dev, observed = data_before_700)
    ...:     
In [73]: with model_before_700:
    ...:     trace_before_700 = pm.sample(tune = 2000, draws = 1000, njobs = 3)
    ...:     

I don't know if this is helpful, but: the sampler does sample quite a few iterations before halting with that mass matrix exception. Sampling is pretty slow while it happens (slower than NUTS + Metropolis with the DiscreteUniform switchpoint model).

Also, the exception is raised as a joblib value error (though that might just be due to each sampling process running into the exception on its own, and might not be an issue with joblib).

If it won't take too long, I suggest changing njobs=3 to njobs=1.

Tried njobs=1; the error trace is smaller here, so posting the whole thing:

Auto-assigning NUTS sampler...
Initializing NUTS using jitter+adapt_diag...
Sequential sampling (2 chains in 1 job)
NUTS: [sd_log__, switchpoints_logodds__, beta, alpha]
 13%|██▎       | 393/3000 [00:01<00:11, 233.55it/s]
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-82-b868e0738426> in <module>()
      1 with model_before_700:
----> 2     trace_before_700 = pm.sample(tune = 2000, draws = 1000, njobs = 1)
      3 

/home/narendra/anaconda3/lib/python3.6/site-packages/pymc3/sampling.py in sample(draws, step, init, n_init, start, trace, chain_idx, chains, cores, tune, nuts_kwargs, step_kwargs, progressbar, model, random_seed, live_plot, discard_tuned_samples, live_plot_kwargs, compute_convergence_checks, use_mmap, **kwargs)
    460             _log.info('Sequential sampling ({} chains in 1 job)'.format(chains))
    461             _print_step_hierarchy(step)
--> 462             trace = _sample_many(**sample_args)
    463 
    464     discard = tune if discard_tuned_samples else 0

/home/narendra/anaconda3/lib/python3.6/site-packages/pymc3/sampling.py in _sample_many(draws, chain, chains, start, random_seed, step, **kwargs)
    508     for i in range(chains):
    509         trace = _sample(draws=draws, chain=chain + i, start=start[i],
--> 510                         step=step, random_seed=random_seed[i], **kwargs)
    511         if trace is None:
    512             if len(traces) == 0:

/home/narendra/anaconda3/lib/python3.6/site-packages/pymc3/sampling.py in _sample(chain, progressbar, random_seed, start, draws, step, trace, tune, model, live_plot, live_plot_kwargs, **kwargs)
    552     try:
    553         strace = None
--> 554         for it, strace in enumerate(sampling):
    555             if live_plot:
    556                 if live_plot_kwargs is None:

/home/narendra/anaconda3/lib/python3.6/site-packages/tqdm/_tqdm.py in __iter__(self)
    951 """, fp_write=getattr(self.fp, 'write', sys.stderr.write))
    952 
--> 953             for obj in iterable:
    954                 yield obj
    955                 # Update and possibly print the progressbar.

/home/narendra/anaconda3/lib/python3.6/site-packages/pymc3/sampling.py in _iter_sample(draws, step, start, trace, chain, tune, model, random_seed)
    650                 step = stop_tuning(step)
    651             if step.generates_stats:
--> 652                 point, states = step.step(point)
    653                 if strace.supports_sampler_stats:
    654                     strace.record(point, states)

/home/narendra/anaconda3/lib/python3.6/site-packages/pymc3/step_methods/arraystep.py in step(self, point)
    220 
    221         if self.generates_stats:
--> 222             apoint, stats = self.astep(array)
    223             point = self._logp_dlogp_func.array_to_full_dict(apoint)
    224             return point, stats

/home/narendra/anaconda3/lib/python3.6/site-packages/pymc3/step_methods/hmc/base_hmc.py in astep(self, q0)
    113 
    114         if not np.isfinite(start.energy):
--> 115             self.potential.raise_ok()
    116             raise ValueError('Bad initial energy: %s. The model '
    117                              'might be misspecified.' % start.energy)

/home/narendra/anaconda3/lib/python3.6/site-packages/pymc3/step_methods/hmc/quadpotential.py in raise_ok(self)
    188     def raise_ok(self):
    189         if np.any(self._stds == 0):
--> 190             raise ValueError('Mass matrix contains zeros on the diagonal. '
    191                              'Some derivatives might always be zero.')
    192         if np.any(self._stds < 0):

ValueError: Mass matrix contains zeros on the diagonal. Some derivatives might always be zero.

Again, a few samples were drawn before the ValueError came up. Sampling was faster in this case, but since only a few samples were drawn, those might just be the initial tuning steps.

Sorry, not sure what to tell you.

Actually, I am not sure whether changing k is the right answer - it changes the shape of the sigmoid, which controls the sharpness of the switch point.

I will try to simulate some data and get back to you.

Right, so the problem is:

regression = intercept + slope*np.arange(2500)

where np.arange(2500) puts the predictor on a large range that is difficult to sample from.
Standardizing the predictor seems to do the trick:

regression = intercept + slope*np.linspace(0., 1., 2500)
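
One follow-up note (mine, not from the thread): with the predictor rescaled to [0, 1], the inferred switchpoint also lives on that scale, so it has to be mapped back to a sample index afterwards, e.g.:

# hypothetical post-processing: map the [0, 1] switchpoint back to an index
n_obs = 2500
switch_idx = trace_before_700['switchpoints'] * (n_obs - 1)
print(switch_idx.mean())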