How to set a seed for pm.sample?

guin0x · February 28, 2023, 1:19pm

Hi,

I’m writing a notebook that explains the logistic regression model I built, but I’m struggling with reproducibility.

I tried including np.random.seed(666) in the cell where the model is being trained, and also tried the following:
pm.sample(step=step, random_seed=666)
pm.sample(step=step, random_seed = np.random.seed(666)) and
pm.sample(step=step, random_seed = [666,666,666,666]).

The last attempt yielded the closest results across two runs. Is there a way to set the seed properly so that I can reproduce the predicted probabilities and write up my explanation and exploration of the results?

Thanks in advance. Below is the full code of my model:

import numpy as np
import pymc as pm

np.random.seed(666)

with pm.Model(coords={"predictors":X_original.columns.values}) as model_1:
    X = pm.MutableData('X', X_train)
    y = pm.MutableData('y', y_train)

    constant = pm.Normal('constant', mu=-0.5, sigma=0.1)
    beta = pm.Normal('beta', mu=0, sigma=1, dims="predictors")
    score = pm.Deterministic('score', X@beta)
    noisy_score = pm.Normal('noisy_score', mu=score, sigma=5)
    p = pm.Deterministic('p', pm.math.sigmoid(constant + noisy_score))

    # define likelihood
    observed = pm.Bernoulli('obs', p, observed=y)

    step = pm.NUTS()
    idata = pm.sample(step=step, random_seed=[666,666,666,666])
    idata_prior = pm.sample_prior_predictive(samples=50)
    
    # above we have trained the model with `X_train`
    # below we predict `X_test`
    with model_1:
        pm.set_data({'X':X_test, 'y':np.zeros_like(y_test)})
        y_pred = pm.sample_posterior_predictive(idata)

idata1 = idata.copy()
idata_prior1 = idata_prior.copy()

ricardoV94 · February 28, 2023, 2:19pm

You should use pm.sample(random_seed=x). The best option is to pass a NumPy Generator, like this:

rng = np.random.default_rng(666)
...
  pm.sample(random_seed=rng)
  pm.sample_prior_predictive(random_seed=rng)
  pm.sample_posterior_predictive(..., random_seed=rng)

The global seed set with np.random.seed is ignored. Passing a list such as [666, 666, 666, 666] assigns one seed to each chain. It doesn’t make much sense to pass the same value for every chain, because you don’t want your chains to be identical. And since you want multiple values anyway, you can just pass rng and PyMC will draw 4 values for you.
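
To make the seeding behaviour concrete, here is a NumPy-only sketch of the idea. It is not PyMC's internal code: draw_chain_seeds is a hypothetical helper, and the range 2**30 is an arbitrary choice for illustration. It shows that a Generator built from the same seed yields the same per-chain seeds whatever the global seed is, and that two streams started from the same seed produce identical draws.

```python
import numpy as np


def draw_chain_seeds(rng, chains=4):
    """Hypothetical helper: draw one integer seed per chain from a Generator."""
    return [int(s) for s in rng.integers(2**30, size=chains)]


# Two Generators built from the same seed give the same per-chain seeds.
seeds_a = draw_chain_seeds(np.random.default_rng(666))
seeds_b = draw_chain_seeds(np.random.default_rng(666))

# Changing the legacy global seed has no effect on a Generator.
np.random.seed(0)
seeds_c = draw_chain_seeds(np.random.default_rng(666))

# Two streams started from the same seed produce the same draws,
# which is why repeating one seed for every chain is not useful.
chain_1 = np.random.default_rng(666).normal(size=3)
chain_2 = np.random.default_rng(666).normal(size=3)

print(seeds_a == seeds_b == seeds_c)      # True
print(np.array_equal(chain_1, chain_2))   # True
```

Note that a Generator advances every time it is used, so for a fully reproducible notebook the rng should be created once at the top and the cells run in the same order each time.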
