Change distribution of jitter

Hey!

I was wondering if there is a way in the current PyMC version to customize the jitter used in the “jitter+[xyz]” initialization. By default it is a uniform sample from [-1,1], but could I make it e.g. choose from [-0.1,0.1]?

Thanks for your insights!
Cheers!

I don’t believe so. If you are having issues with the default initialization (jitter+adapt_diag), you can try one of the others that are available (see the pm.sample docstring for the full list).

Thanks for your quick answer. I am currently using a jitterless initialization, but I was wondering about ways to show the model’s struggles when the initial values are “off”. Do you think this would be worth a PR? And how big of a change would that be? :wink:

I suspect that it wouldn’t be a large change code-wise. The two questions that immediately jump to my mind (neither of which I have answers to) would be: a) how much effect adjusting the jitter would have in most cases (given that the initialization is designed to immediately move away from the initial values/mass matrix), and b) how much users can be expected to know about the “scale” of the jitter needed in their particular case. I would say it’s worth opening an issue to get some feedback on these and other relevant matters (that I haven’t thought of).

Thank you for your thoughts! I will wrap it up in an issue to gather some feedback.

@TimOliverMaier you can pass your manually jittered initial point to sample. I think that would be the easiest?

The kwarg is initvals

Ah cool! I think that is what I was looking for. I know initvals takes a dictionary. So to init e.g. 2 chains, do I need a list of two dictionaries (one for chain 1, the other for chain 2), or a dictionary with a list (of length 2) for the parameter of interest? That’s what I don’t quite get from the docs. :slight_smile:

From memory I think a list of dictionaries. You can test with a simple model with a single variable, set the initvals to -100 and 100, and sample with tune=1, draws=1 :smiley:

If you can make the docstrings more clear that’s also appreciated!

Alright. Thanks, I will have a look.

If I remember correctly, NUTS can ignore/overwrite this argument, correct? Though I have never been quite sure when it does/does not do so.

No, I don’t think it ignores it? Anyway, the test above should confirm it.

From the pm.sample docstring:

    initvals : optional, dict, array of dict
        Dict or list of dicts with initial value strategies to use instead of the defaults from Model.initial_values. The keys should be names of transformed random variables. Initialization methods for NUTS (see init keyword) can overwrite the default.

Ah yes @cluhmann you’re right. One must disable jitter so that sampling starts exactly at the provided initvals; otherwise the jitter is applied on top of them (which may not be what @TimOliverMaier needs).

I am aware of this. I use “adapt_diag” as init method. :slight_smile:

A little example here:

import pymc as pm
import numpy as np
import arviz as az

X = np.random.normal(1, 2)
with pm.Model() as model:
    mu = pm.Normal("mu", mu=0, sigma=1)
    sigma = pm.Uniform("sigma", lower=0.1, upper=3)
    obs = pm.Normal("obs", mu=mu, sigma=sigma, observed=X)

# One dict per chain
init_vals = [{"mu": -1, "sigma": 1}, {"mu": 1, "sigma": 1}]
with model:
    trace = pm.sample(
        init="adapt_diag",
        initvals=init_vals,
        discard_tuned_samples=False,
        chains=2,
    )

az.plot_trace(trace.warmup_posterior, coords={"draw": range(10)})

gives me this plot:

The first few tuning samples do not equal the init values, but overall it seems to work as expected.

You might only be getting sample 1 and not 0 (which isn’t actually a sample)

Yes, that’s what I thought, too. Initializing with 10 and -10 makes it clearer, however, that passing a list of dictionaries works as expected. This will help me. Thank you @ricardoV94 and @cluhmann.

import pymc as pm
import numpy as np
import arviz as az

X = np.random.normal(1, 2)
with pm.Model() as model:
    mu = pm.Normal("mu", mu=0, sigma=1)
    sigma = pm.Uniform("sigma", lower=0.1, upper=3)
    obs = pm.Normal("obs", mu=mu, sigma=sigma, observed=X)

# One dict per chain
init_vals = [{"mu": -10, "sigma": 1}, {"mu": 10, "sigma": 1}]
with model:
    trace = pm.sample(
        init="adapt_diag",
        initvals=init_vals,
        discard_tuned_samples=False,
        chains=2,
    )

az.plot_trace(trace.warmup_posterior, coords={"draw": range(10)}, legend=True)