# ValueError: Mass matrix contains zeros on the diagonal (Bayesian ML noob problems)

``````python
# Bayesian logistic regression in PyMC3
import pymc3 as pm

# Lower and upper bounds for the uniform distribution
lower_b = -1000
upper_b = 1000

num_features = train_data.shape[1]  # Number of features in a single training sample

model = pm.Model()

with model:
    model_input = pm.Data('model_input', train_data)      # Y = activation(theta*X)
    model_output = pm.Data('model_output', train_labels)

    # Priors for the unknown parameters
    theta = pm.Normal('theta', mu=0, sd=100, shape=num_features)  # Logistic regression parameters

    # Probability via the sigmoid activation function
    p = pm.Deterministic('p', pm.math.sigmoid(pm.math.dot(model_input, theta)))

    # Bernoulli likelihood with probability p
    Y_obs = pm.Bernoulli('Y_obs', p=p, observed=model_output)

    # Sample from the posterior using NUTS
    idata = pm.sample(200, tune=1000, return_inferencedata=True)
``````

I am a complete noob in Bayesian ML and this is literally my first piece of PyMC3 code. My dataset works fine with standard ML libraries in Python such as SVM, Random Forest, and Logistic Regression.
I am trying to implement a Bayesian version of logistic regression to gain further insight into the model parameters, but I get the following error while performing MCMC sampling:

``````
ValueError: Mass matrix contains zeros on the diagonal.
The derivative of RV `theta`.ravel()[1] is zero.
The derivative of RV `theta`.ravel()[2] is zero.
...
The derivative of RV `theta`.ravel()[596] is zero.
The derivative of RV `theta`.ravel()[597] is zero.
The derivative of RV `theta`.ravel()[598] is zero.
The derivative of RV `theta`.ravel()[599] is zero.
``````

I have tried searching for a solution online but couldn't find anything that works for me.

My training data has dimension (2100, 600) [2100 training examples with 600 features each]. I get decent test accuracy with the point-estimate logistic regression model provided by Python's sklearn.

Any help here would be greatly appreciated.

A more informative prior usually helps, for example `theta = pm.Normal('theta', mu=0, sd=1, shape=num_features)`.
Going a bit more in depth, your data contains a lot of columns (2100 training examples with 600 features each), so a sparse regression with a horseshoe prior would be more appropriate.