How to set a seed for pm.sample?

Hi,

I’m writing a notebook that explains the logistic regression model I built, but I’m struggling with reproducibility.

I tried calling np.random.seed(666) in the cell where the model is trained, and I also tried the following:
pm.sample(step=step, random_seed=666)
pm.sample(step=step, random_seed=np.random.seed(666)) and
pm.sample(step=step, random_seed=[666,666,666,666]).

The last attempt gave the closest results when I ran it twice. Is there a way to set the seed properly, so that I can reproduce the predicted probabilities and write up my explanation and exploration of the results?

Thanks in advance. Below is the full code of my model:

import numpy as np
import pymc as pm

np.random.seed(666)

with pm.Model(coords={"predictors":X_original.columns.values}) as model_1:
    X = pm.MutableData('X', X_train)
    y = pm.MutableData('y', y_train)

    constant = pm.Normal('constant', mu=-0.5, sigma=0.1)
    beta = pm.Normal('beta', mu=0, sigma=1, dims="predictors")
    score = pm.Deterministic('score', X@beta)
    noisy_score = pm.Normal('noisy_score', mu=score, sigma=5)
    p = pm.Deterministic('p', pm.math.sigmoid(constant + noisy_score))

    # define likelihood
    observed = pm.Bernoulli('obs', p, observed=y)

    step = pm.NUTS()
    idata = pm.sample(step=step, random_seed=[666,666,666,666])
    idata_prior = pm.sample_prior_predictive(samples=50)
    
    # above we have trained the model with `X_train`
    # below we predict `X_test`
    with model_1:
        pm.set_data({'X':X_test, 'y':np.zeros_like(y_test)})
        y_pred = pm.sample_posterior_predictive(idata)

idata1 = idata.copy()
idata_prior1 = idata_prior.copy()

You should use pm.sample(random_seed=x). The best option is to pass a NumPy Generator, like:

rng = np.random.default_rng(666)
...
  pm.sample(random_seed=rng)
  pm.sample_prior_predictive(random_seed=rng)
  pm.sample_posterior_predictive(..., random_seed=rng)

The global seed set with np.random.seed is ignored. Passing a list such as [666, 666, 666, 666] assigns one seed to each chain, in order. It doesn’t make much sense to pass the same value for every chain, because you don’t want your chains to be identical. And since you want a different value per chain, you can just pass rng and PyMC will draw 4 values for you.
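To make the point about per-chain seeds concrete, here is a small NumPy-only sketch. It is an illustration of the idea, not PyMC's actual internals: draw_chain_seeds is a hypothetical helper, and the range 2**30 is an arbitrary choice made for this example.

```python
import numpy as np


def draw_chain_seeds(seed, n_chains=4):
    # Hypothetical helper (not a PyMC function): derive one integer seed
    # per chain from a single top-level seed.
    rng = np.random.default_rng(seed)
    return [int(s) for s in rng.integers(2**30, size=n_chains)]


# The same top-level seed yields the same list of per-chain seeds on every run.
seeds_first_run = draw_chain_seeds(666)
seeds_second_run = draw_chain_seeds(666)
print(seeds_first_run == seeds_second_run)

# Identical seeds give identical random streams, which is why passing
# [666, 666, 666, 666] would make the chains copies of each other.
draws_a = np.random.default_rng(666).normal(size=5)
draws_b = np.random.default_rng(666).normal(size=5)
print(np.array_equal(draws_a, draws_b))

# A Generator is stateful: each use advances it, so a second draw from the
# same object differs from the first.
rng = np.random.default_rng(666)
first = rng.integers(2**30, size=4)
second = rng.integers(2**30, size=4)
print(np.array_equal(first, second))
```

Because the Generator advances as it is used, it is probably safest to create rng in the same cell that calls pm.sample, so that re-running the cell starts from the same state each time.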
