# Bayesian learning on logistic regression
import pymc3 as pm

# train_data, train_labels: my dataset (2100 examples, 600 features each)

# Lower and upper bounds for a uniform prior (currently unused; the prior below is Normal)
lower_b = -1000
upper_b = 1000

num_features = train_data.shape[1]  # Number of features in a single training sample

model = pm.Model()
with model:
    model_input = pm.Data('model_input', train_data)    # X in Y = activation(theta * X)
    model_output = pm.Data('model_output', train_labels)
    # Prior for the unknown parameters
    theta = pm.Normal('theta', mu=0, sd=100, shape=num_features)  # Logistic regression weights
    # Probability of the positive class via the sigmoid activation function
    p = pm.Deterministic('p', pm.math.sigmoid(pm.math.dot(model_input, theta)))
    # Bernoulli likelihood with probability p
    Y_obs = pm.Bernoulli('Y_obs', p=p, observed=model_output)
    # Sample from the posterior using NUTS
    idata = pm.sample(200, tune=1000, return_inferencedata=True)
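Once sampling works, I would like to inspect the posterior over the weights along these lines (a minimal sketch using ArviZ on the `idata` object above; I am assuming `az.summary` is the right tool for this):

import arviz as az

# Posterior mean, sd, and HDI for each of the 600 weights
summary = az.summary(idata, var_names=['theta'])
print(summary)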
I am a complete noob in Bayesian ML, and this is literally my first piece of PyMC3 code. The dataset I have works fine with standard ML libraries in Python such as SVM, Random Forest, and Logistic Regression.
I am trying to implement a Bayesian version of Logistic Regression to gain further insight into the model parameters, but I get the following error while performing MCMC sampling:
ValueError: Mass matrix contains zeros on the diagonal.
The derivative of RV `theta`.ravel()[1] is zero.
The derivative of RV `theta`.ravel()[2] is zero.
...
The derivative of RV `theta`.ravel()[596] is zero.
The derivative of RV `theta`.ravel()[597] is zero.
The derivative of RV `theta`.ravel()[598] is zero.
The derivative of RV `theta`.ravel()[599] is zero.
I have tried searching for a solution online but couldn't find anything that works for me.
My training data has shape (2100, 600) [2100 training examples with 600 features each]. I get decent test accuracy from the point-estimate Logistic Regression model provided in Python's sklearn.
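For reference, the point-estimate baseline I am comparing against looks roughly like this (a minimal sketch; `test_data` and `test_labels` are placeholder names for my held-out set):

from sklearn.linear_model import LogisticRegression

# Point-estimate logistic regression on the same (2100, 600) training data
clf = LogisticRegression(max_iter=1000)
clf.fit(train_data, train_labels)
print(clf.score(test_data, test_labels))  # decent test accuracy here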
Any help here would be greatly appreciated.