Switch point Metropolis tuning

I have data with potentially 3 switch points:
[plot of the data]
The model will automatically choose Metropolis for sampling tau_1 and tau_2, and NUTS for the lambdas.

N = np.arange(0, all_points)  # index of each observation

with pm.Model() as model:
    alpha = 1.0 / data.mean()
    lambda_1 = pm.Exponential("lambda_1", alpha)
    lambda_2 = pm.Exponential("lambda_2", alpha)
    lambda_3 = pm.Exponential("lambda_3", alpha)

    tau1 = pm.DiscreteUniform("tau1", lower=N.min(), upper=N.max())
    tau2 = pm.DiscreteUniform("tau2", lower=tau1, upper=N.max())

    # rate is lambda_1 up to tau1, lambda_2 between tau1 and tau2, lambda_3 after tau2
    _mu = T.switch(tau1 >= N, lambda_1, lambda_2)
    mu = T.switch(tau2 >= N, _mu, lambda_3)

    observation = pm.Poisson("obs", mu, observed=data)
    trace = pm.sample(10000, tune=1000)
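
For reference, the automatic assignment is equivalent to spelling out the step methods by hand, roughly like this (a sketch for illustration, not part of the original code):

with model:
    step1 = pm.Metropolis([tau1, tau2])               # discrete switch points
    step2 = pm.NUTS([lambda_1, lambda_2, lambda_3])   # continuous rates
    trace = pm.sample(10000, tune=1000, step=[step1, step2])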

When tune=500 it doesn't converge, but with tune=1000 both taus converge successfully.

Can you please share with me and the PyMC3 community how tuning works, or what should be considered when tuning a Metropolis sampler? Do you know of articles or websites where I could read more about Metropolis tuning?

I am surprised that the discrete switch point works well here. Still, I suggest using a continuous step function instead of switch; for more details see: https://stackoverflow.com/questions/49144144/convert-numpy-function-to-theano/49152694#49152694

Back to your original question: in pm.sample, each sampler has some target that it tries to reach during tuning. For example, Metropolis adjusts the scale of the transition kernel so that the acceptance probability is around 50%.
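
For instance, the rule Metropolis uses looks roughly like this (paraphrased, not the exact pymc3 source; thresholds may differ between versions). Every tune_interval iterations, the proposal scale is adjusted based on the acceptance rate observed in that window:

def tune(scale, acc_rate):
    # shrink the proposal when too few jumps are accepted,
    # widen it when almost everything is accepted
    if acc_rate < 0.05:
        return scale * 0.5
    elif acc_rate < 0.2:
        return scale * 0.9
    elif acc_rate > 0.95:
        return scale * 10.0
    elif acc_rate > 0.75:
        return scale * 2.0
    elif acc_rate > 0.5:
        return scale * 1.1
    return scale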

In NUTS, tuning is much more complex, as it optimizes both the step size of the leapfrog integrator and the mass matrix of the Hamiltonian transition kernel. The best reference for this is the Stan User Manual (http://mc-stan.org/users/documentation/, chapter 34.2), although we don't use exactly the same tuning algorithm as Stan.
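
In practice, the main NUTS tuning knob you can set yourself is the acceptance target. Assuming the pm.sample signature of pymc3 3.3, where NUTS options are passed through nuts_kwargs, that would look something like:

with model:
    # a longer tuning phase plus a higher target_accept makes NUTS adapt
    # a smaller, more careful step size
    trace = pm.sample(10000, tune=2000, nuts_kwargs=dict(target_accept=0.9))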

When I try to run what you created:

def logistic(L, x0, k=500, t=np.linspace(0., 1., 1000)):
    return L/(1+tt.exp(-k*(t-x0)))

with pm.Model() as m2:
    lambda0 = pm.Normal('lambda0', mu, sd=sd)
    lambdad = pm.Normal('lambdad', 0, sd=sd, shape=nbreak-1)
    trafo = Composed(pm.distributions.transforms.LogOdds(), Ordered())
    b = pm.Beta('b', 1., 1., shape=nbreak-1, transform=trafo,
                    testval=[0.3, 0.5])
    theta_ = pm.Deterministic('theta', tt.exp(lambda0 +
                                              logistic(lambdad[0], b[0]) +
                                              logistic(lambdad[1], b[1])))
    obs = pm.Poisson('obs', theta_, observed=y)
    
    trace = pm.sample(1000, tune=1000)

I get the following error:

AttributeError: Can't get attribute 'Composed' on <module '__main__' (built-in)>

Is there any way around that?

Yes, more details are in the notebook; link copied below:

I’m sorry for the confusion. I already ran the whole notebook the first time, but it gives the error as soon as you try to run the model.

AttributeError: Can't get attribute 'Composed' on <module '__main__' (built-in)>

I pretty much copy-pasted everything, but the error keeps repeating. I tried changing some parameters, but I feel the problem is within the class Composed(Transform) definition.

Not sure…

That’s pretty strange… what is your pymc3 version?
Also, could you please paste the full error trace?

PyMC3 version 3.3.
I'm running it in a Jupyter notebook (version 5.3.1).
When I run the model, it gets up to this part:

Auto-assigning NUTS sampler…
Initializing NUTS using jitter+adapt_diag…
c:\anaconda\lib\site-packages\pymc3\model.py:384: FutureWarning: Conversion of the second argument of issubdtype from float to np.floating is deprecated. In future, it will be treated as np.float64 == np.dtype(float).type.
if not np.issubdtype(var.dtype, float):
Multiprocess sampling (2 chains in 2 jobs)
NUTS: [b_logodds_ordered__, lambdas]

Then it just gets stuck, but if you look at the Jupyter command prompt you'd see:

Process SpawnPoolWorker-2:
Traceback (most recent call last):
  File "c:\anaconda\lib\multiprocessing\process.py", line 249, in _bootstrap
    self.run()
  File "c:\anaconda\lib\multiprocessing\process.py", line 93, in run
    self._target(*self._args, **self._kwargs)
  File "c:\anaconda\lib\multiprocessing\pool.py", line 108, in worker
    task = get()
  File "c:\anaconda\lib\site-packages\joblib\pool.py", line 362, in get
    return recv()
  File "c:\anaconda\lib\multiprocessing\connection.py", line 251, in recv
    return _ForkingPickler.loads(buf.getbuffer())
AttributeError: Can't get attribute 'Composed' on <module '__main__' (built-in)>

Do you recommend I update Jupyter or Pymc3 versions?

Looks like you have a problem with multiprocessing - try upgrading joblib.
For now, setting njobs=1 in pm.sample() should temporarily suppress the error.
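
That is, with the model from the first post, something like:

with model:
    trace = pm.sample(10000, tune=1000, njobs=1)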


Thank you, junpenglao, and thank you for your patience. Upgrading joblib didn't do it, but it worked when I changed to njobs=1.
When I apply your model to other datasets that I am analyzing, is it correct that the testval for 'b' would most likely change?

In the current model, the testval for ‘b’ is [0.3,0.5]:

with pm.Model() as m2:
    lambda0 = pm.Normal('lambda0', mu, sd=sd)
    lambdad = pm.Normal('lambdad', 0, sd=sd, shape=nbreak-1)
    trafo = Composed(pm.distributions.transforms.LogOdds(), Ordered())
    b = pm.Beta('b', 1., 1., shape=nbreak-1, transform=trafo,
                    testval=[0.3, 0.5])

Can you please share why you chose testval=[0.3, 0.5]? I know these are starting values for 'b', but would they change if my observed data were different?

The testval here is to avoid a bug in the Ordered transform: the default testval of Beta(1, 1) is .5, and with shape=2 it would be [.5, .5]; however, that generates an error for Ordered because the two values are the same. So as long as you specify values that are sorted and not identical, it will work.
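
For example (my own illustration, not from the notebook), any sorted, distinct starting values will do, e.g. generated with np.linspace so the same line works for a different nbreak:

b = pm.Beta('b', 1., 1., shape=nbreak-1, transform=trafo,
            testval=np.linspace(0.2, 0.8, nbreak-1))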


thank you. :slight_smile:

Jumping in here, as I am working with a very similar model. I was previously using the DiscreteUniform distribution to model the switchpoints in my data, but am now trying to shift to the logistic approach explained here.

I am trying to model my data with a single regression discontinuity - so two straight lines fit my data, and at some point in time, the model has to switch from one straight line to the other. Here’s an example:
[figure: example data with a single discontinuity]

I tried the same logistic idea as in @junpenglao's gist; here's the code:

    def logistic(L, x0, k=500, t=np.linspace(0., 1., 2500)):
        return L/(1+tt.exp(-k*(t-x0)))

And the model:

    In [34]: with pm.Model() as model_before_700:                    
        ...:     alpha = pm.Normal("alpha", mu = 0, sd = 3.0, shape = 2)           
        ...:     beta = pm.Normal("beta", mu = 0, sd = 1.0, shape = 2)
        ...:     switchpoints = pm.Beta("switchpoints", 1, 1)                       
        ...:     sd = pm.HalfCauchy("sd", 0.5, shape = 2)
        ...:     intercept = alpha[0] + logistic(alpha[1], switchpoints)
        ...:     slope = beta[0] + logistic(beta[1], switchpoints)
        ...:     dev = sd[0] + logistic(sd[1], switchpoints)
        ...:     regression = intercept + slope*np.arange(2500)                    
        ...:     observed = pm.Normal("observed", mu = regression, sd = dev, observed = data_before_700)                
    In [34]: with model_before_700:                                  
        ...:     trace_before_700 = pm.sample(tune = 2000, draws = 1000, njobs = 3)
        ...: 

I consistently get:
ValueError: Mass matrix contains zeros on the diagonal. Some derivatives might always be zero.

I have seen previous discussions saying that this might be due to overflow errors, so I think I might be missing something simple in my model. Any help is much appreciated :slight_smile:

Here are the relevant versions etc.:

    In [38]: pm.__version__
    Out[38]: '3.3'

    In [39]: theano.__version__
    Out[39]: '1.0.1'

    In [40]: joblib.__version__
    Out[40]: '0.11'

The error is likely caused by the exponential of large values; check the logistic function:

x0=.5
k=500
t=np.linspace(0., 1., 2500)
np.exp(-k*(t-x0))
Out[38]: 
array([3.74645461e+108, 3.06709213e+108, 2.51092169e+108, ...,
       3.98260131e-109, 3.26041722e-109, 2.66919022e-109])

As you can see, the output spans a wildly large range. You should find a way to scale the parameter k.
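
As a side note (my own suggestion, not discussed further in the thread), the same logistic can also be written with Theano's built-in sigmoid, which avoids forming the huge exponential explicitly, since tt.nnet.sigmoid(z) equals 1/(1 + exp(-z)):

import numpy as np
import theano.tensor as tt

def logistic(L, x0, k=500, t=np.linspace(0., 1., 2500)):
    # L / (1 + exp(-k*(t - x0))), written via the built-in sigmoid
    return L * tt.nnet.sigmoid(k * (t - x0))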

Thanks for the suggestion. I didn't really need k to be as large as 500; for my purposes k = 10 would be enough.

In [62]: x0=.5
    ...: k=10
    ...: t=np.linspace(0., 1., 2500)
    ...: np.exp(-k*(t-x0))
    ...: 
Out[62]: 
array([  1.48413159e+02,   1.47820456e+02,   1.47230119e+02, ...,
         6.79208851e-03,   6.76496359e-03,   6.73794700e-03])

The range of the exponential is much smaller now, but the mass matrix issue persists :frowning:

The model again:

In [69]: def logistic(L, x0, k=10, t=np.linspace(0., 1., 2500)):
    ...:     return L/(1+tt.exp(-k*(t-x0)))
    ...: 

In [70]: with pm.Model() as model_before_700:
    ...:     alpha = pm.Normal("alpha", mu = 0, sd = 3.0, shape = 2)
    ...:     beta = pm.Normal("beta", mu = 0, sd = 1.0, shape = 2)
    ...:     switchpoints = pm.Beta("switchpoints", alpha =1, beta=1)
    ...:     sd = pm.HalfCauchy("sd", 0.5, shape = 2)
    ...:     intercept = alpha[0] + logistic(alpha[1], switchpoints)
    ...:     slope = beta[0] + logistic(beta[1], switchpoints)
    ...:     dev = sd[0] + logistic(sd[1], switchpoints)
    ...:     

In [71]: with model_before_700:
    ...:     regression = intercept + slope*np.arange(2500)
    ...:     observed = pm.Normal("observed", mu = regression, sd = dev, observed = data_before_700)
    ...:     
In [73]: with model_before_700:
    ...:     trace_before_700 = pm.sample(tune = 2000, draws = 1000, njobs = 3)
    ...:     

I don't know if this is helpful, but: the sampler does sample quite a few iterations before halting with that mass matrix exception. Sampling is pretty slow while it happens (slower than NUTS + Metropolis with the DiscreteUniform switchpoint model).

Also, the exception is raised as a joblib value error (though that might just be due to each sampling process running into the exception on its own, and might not be an issue with joblib).

If it won't take too long, I suggest changing njobs=3 to njobs=1.

Tried njobs=1; the error trace is smaller here, so posting the whole thing:

Auto-assigning NUTS sampler...
Initializing NUTS using jitter+adapt_diag...
Sequential sampling (2 chains in 1 job)
NUTS: [sd_log__, switchpoints_logodds__, beta, alpha]
 13%|██▎       | 393/3000 [00:01<00:11, 233.55it/s]
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-82-b868e0738426> in <module>()
      1 with model_before_700:
----> 2     trace_before_700 = pm.sample(tune = 2000, draws = 1000, njobs = 1)
      3 

/home/narendra/anaconda3/lib/python3.6/site-packages/pymc3/sampling.py in sample(draws, step, init, n_init, start, trace, chain_idx, chains, cores, tune, nuts_kwargs, step_kwargs, progressbar, model, random_seed, live_plot, discard_tuned_samples, live_plot_kwargs, compute_convergence_checks, use_mmap, **kwargs)
    460             _log.info('Sequential sampling ({} chains in 1 job)'.format(chains))
    461             _print_step_hierarchy(step)
--> 462             trace = _sample_many(**sample_args)
    463 
    464     discard = tune if discard_tuned_samples else 0

/home/narendra/anaconda3/lib/python3.6/site-packages/pymc3/sampling.py in _sample_many(draws, chain, chains, start, random_seed, step, **kwargs)
    508     for i in range(chains):
    509         trace = _sample(draws=draws, chain=chain + i, start=start[i],
--> 510                         step=step, random_seed=random_seed[i], **kwargs)
    511         if trace is None:
    512             if len(traces) == 0:

/home/narendra/anaconda3/lib/python3.6/site-packages/pymc3/sampling.py in _sample(chain, progressbar, random_seed, start, draws, step, trace, tune, model, live_plot, live_plot_kwargs, **kwargs)
    552     try:
    553         strace = None
--> 554         for it, strace in enumerate(sampling):
    555             if live_plot:
    556                 if live_plot_kwargs is None:

/home/narendra/anaconda3/lib/python3.6/site-packages/tqdm/_tqdm.py in __iter__(self)
    951 """, fp_write=getattr(self.fp, 'write', sys.stderr.write))
    952 
--> 953             for obj in iterable:
    954                 yield obj
    955                 # Update and possibly print the progressbar.

/home/narendra/anaconda3/lib/python3.6/site-packages/pymc3/sampling.py in _iter_sample(draws, step, start, trace, chain, tune, model, random_seed)
    650                 step = stop_tuning(step)
    651             if step.generates_stats:
--> 652                 point, states = step.step(point)
    653                 if strace.supports_sampler_stats:
    654                     strace.record(point, states)

/home/narendra/anaconda3/lib/python3.6/site-packages/pymc3/step_methods/arraystep.py in step(self, point)
    220 
    221         if self.generates_stats:
--> 222             apoint, stats = self.astep(array)
    223             point = self._logp_dlogp_func.array_to_full_dict(apoint)
    224             return point, stats

/home/narendra/anaconda3/lib/python3.6/site-packages/pymc3/step_methods/hmc/base_hmc.py in astep(self, q0)
    113 
    114         if not np.isfinite(start.energy):
--> 115             self.potential.raise_ok()
    116             raise ValueError('Bad initial energy: %s. The model '
    117                              'might be misspecified.' % start.energy)

/home/narendra/anaconda3/lib/python3.6/site-packages/pymc3/step_methods/hmc/quadpotential.py in raise_ok(self)
    188     def raise_ok(self):
    189         if np.any(self._stds == 0):
--> 190             raise ValueError('Mass matrix contains zeros on the diagonal. '
    191                              'Some derivatives might always be zero.')
    192         if np.any(self._stds < 0):

ValueError: Mass matrix contains zeros on the diagonal. Some derivatives might always be zero.

Again, a few samples were drawn before the ValueError came up. Sampling was faster in this case, but since only a few samples were drawn, those might just be the initial tuning steps.

Sorry, not sure what to tell you.

Actually, I am not sure whether changing k is the right answer - it changes the shape of the sigmoid, which controls the sharpness of the switch point.

I will try to simulate some data and get back to you.

Right, so the problem is:

regression = intercept + slope*np.arange(2500)

where np.arange(2500) puts the predictor on a large range that is difficult to sample from.
Standardizing the predictor seems to do the trick:

regression = intercept + slope*np.linspace(0., 1., 2500)
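
One follow-up note (mine, not from the thread): with the predictor rescaled to [0, 1], the inferred switchpoint also lives on that scale, so it has to be mapped back to a sample index afterwards, e.g.:

# hypothetical post-processing: map the [0, 1] switchpoint back to an index
n_obs = 2500
switch_idx = trace_before_700['switchpoints'] * (n_obs - 1)
print(switch_idx.mean())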