Hi,
I’m writing a notebook that explains the logistic regression model I built, but I’m struggling with reproducibility.
I tried including np.random.seed(666) in the cell where the model is being trained, and also tried the following:
pm.sample(step=step, random_seed=666)
pm.sample(step=step, random_seed = np.random.seed(666))
and
pm.sample(step=step, random_seed=[666, 666, 666, 666])
The last attempt came closest to matching when I ran it twice. Is there a proper way to set the seed so that I can reproduce the predicted probabilities and write up my explanation and exploration of the results?
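A quick NumPy-only check on the second attempt: `np.random.seed` reseeds NumPy's legacy global generator in place and returns `None`, so `random_seed=np.random.seed(666)` hands `pm.sample` a seed of `None`, i.e. no fixed seed at all. By contrast, building a generator from the same integer reproduces its draws exactly. Per the PyMC docs, `random_seed` also accepts a single int, while a list such as `[666, 666, 666, 666]` seeds each chain with its own entry (here the same value for all four chains). This sketch only exercises NumPy and does not call PyMC.

```python
import numpy as np

# np.random.seed reseeds NumPy's global (legacy) RandomState and returns None,
# so `random_seed=np.random.seed(666)` is equivalent to `random_seed=None`.
seed_result = np.random.seed(666)
print(seed_result)  # None

# Two generators built from the same integer seed produce identical draws.
first = np.random.default_rng(666).normal(size=3)
second = np.random.default_rng(666).normal(size=3)
print(np.array_equal(first, second))  # True
```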
Thanks in advance. Below is the full code of my model:
import numpy as np
import pymc as pm

np.random.seed(666)
with pm.Model(coords={"predictors": X_original.columns.values}) as model_1:
    X = pm.MutableData('X', X_train)
    y = pm.MutableData('y', y_train)
    constant = pm.Normal('constant', mu=-0.5, sigma=0.1)
    beta = pm.Normal('beta', mu=0, sigma=1, dims="predictors")
    score = pm.Deterministic('score', X @ beta)
    noisy_score = pm.Normal('noisy_score', mu=score, sigma=5)
    p = pm.Deterministic('p', pm.math.sigmoid(constant + noisy_score))
    # define likelihood
    observed = pm.Bernoulli('obs', p, observed=y)
    step = pm.NUTS()
    idata = pm.sample(step=step, random_seed=[666, 666, 666, 666])
    idata_prior = pm.sample_prior_predictive(samples=50)
# above we have trained the model with `X_train`
# below we predict `X_test`
with model_1:
    pm.set_data({'X': X_test, 'y': np.zeros_like(y_test)})
    y_pred = pm.sample_posterior_predictive(idata)
idata1 = idata.copy()
idata_prior1 = idata_prior.copy()
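The predictions in the code above come from `pm.sample_posterior_predictive(idata)`, which is a second random step on top of MCMC and is called there without a seed; the same goes for `pm.sample_prior_predictive(samples=50)`. In PyMC both functions take their own `random_seed` argument (e.g. `pm.sample_posterior_predictive(idata, random_seed=666)`), so seeding only `pm.sample` may not be enough to make `y_pred` repeatable. The NumPy-only sketch below illustrates the effect; the posterior draws of `p` are made up (not taken from the model above) and `posterior_predictive` is a hypothetical helper, not a PyMC function.

```python
import numpy as np

# Made-up stand-in for posterior draws of p on a test set:
# 200 posterior draws x 5 test rows (NOT produced by the model above).
posterior_p = np.random.default_rng(0).uniform(0.1, 0.9, size=(200, 5))


def posterior_predictive(p_draws, seed=None):
    """Hypothetical helper: draw one Bernoulli outcome per posterior draw and row."""
    rng = np.random.default_rng(seed)
    return rng.binomial(1, p_draws)


# Without a seed, the outcome draws almost surely differ between calls,
# even though the posterior draws of p are fixed.
unseeded_a = posterior_predictive(posterior_p)
unseeded_b = posterior_predictive(posterior_p)

# With a fixed seed, the outcome draws are reproduced exactly.
seeded_a = posterior_predictive(posterior_p, seed=666)
seeded_b = posterior_predictive(posterior_p, seed=666)
print(np.array_equal(seeded_a, seeded_b))  # True

# Share of predicted 1s per test row, averaged over posterior draws.
row_probability = seeded_a.mean(axis=0)
print(row_probability)
```

Whether adding seeds to the two predictive calls alone makes whole notebook runs match exactly may also depend on the PyMC version in use.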