Although this isn't a new topic, I couldn't understand much on this subject from the previous posts because I am a beginner. I have tried changing the SD of the priors to 10, and even 100.
Code: https://github.com/Harpreetsingh31/MachineLearning/blob/master/Classification/Breastcancer/Logistic-PyMC3.ipynb
Data: https://github.com/Harpreetsingh31/MachineLearning/tree/master/Classification/Breastcancer
The non-probabilistic (scikit-learn) models are in the notebook "aio.ipynb".
The implementation I followed is from Nicole Carlson.
I want to know what is causing this error and what the possible solutions are.
Many thanks in advance.
Hello Harpreetsingh31,
In your code you use
df[df == '?'] = np.nan
which I can imagine will mess things up if those nan values are present while training. Can you make sure all those values are dropped?
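For example, a minimal sketch of one way to clean that up (assuming df is the pandas DataFrame and that 'bare_nuclei' is the affected column; dropping the rows is just one option):

import numpy as np

df[df == '?'] = np.nan
df = df.dropna()  # drop the rows that contained '?'
df['bare_nuclei'] = df['bare_nuclei'].astype(int)  # the column was read in as strings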
Yeah, I thought the same, so I dropped that entire column ('bare_nuclei'):

## IMP NOTE: dropped 'bare_nuclei' because of its 16 "?" values
X = scale(np.array(df.drop(['class', 'bare_nuclei'], axis=1)))
When you do

for RV in logistic_model.basic_RVs:
    print(RV.name, RV.logp(logistic_model.test_point))

the output shows that the logp of the observed y is -inf. This means that the input p is out of support. Try printing p.tag.test_value and see where p returns values outside of [0, 1] (the support of p).
I added p.tag.test_value inside the for loop; it prints:

alpha -0.9189385332046727
[0.5]
betas -7.351508265637381
[0.5]
y -inf
[0.5]

So, the Bernoulli logp is -inf even for p = 0.5.
I would say there are a couple of problems:

1. The observed y contains the values 2 and 4 instead of 0 and 1, which makes the Bernoulli logp go to -inf.
2. You get a single value for p, but you should get a vector the same size as y.
Try the code below:

# Imports assumed: train_test_split from sklearn.model_selection,
# shared from theano, theano.tensor as T, pymc3 as pm
# Split data; y/2 - 1 maps the labels {2, 4} to {0, 1}
X_tr, X_te, y_tr, y_te = train_test_split(X, y/2 - 1, test_size=0.2, random_state=42)
# Shared variables, so the data can be swapped out later for prediction
model_input = shared(X_tr)
model_output = shared(y_tr)
with pm.Model() as logistic_model:
    # Priors for unknown model parameters
    alpha = pm.Normal("alpha", mu=0, sd=1)
    betas = pm.Normal("betas", mu=0, sd=1, shape=X.shape[1])
    # Expected value of outcome; use the shared variable, not X_tr,
    # so that set_value works for prediction later
    p = pm.invlogit(alpha + T.dot(model_input, betas))
    # Likelihood (sampling distribution of observations)
    y = pm.Bernoulli('y', p, observed=model_output)
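To fit it, a minimal sketch continuing from the model above (the draw and tune counts are just placeholders):

with logistic_model:
    trace = pm.sample(1000, tune=1000)
pm.traceplot(trace)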
Thank you so much Junpenglao, and sorry for that silly mistake. I had actually made changes to the dataframe, but for some reason they didn't get reflected in y. Things are making sense now.
However, when I calculate accuracy, I see that the pred vector has the dimensions of y_tr instead of y_te.
Unrelated questions:
- Do we have a probabilistic logistic regression model for multi-class problems?
- Is there an easier or more general way to plot the uncertainty in predicted values? In the linked article, Thomas Wiecki does it by defining a grid and then plotting a contour on top of it, but I couldn't replicate that for my neural network application (git) because of the reshape function.
https://blog.quantopian.com/bayesian-deep-learning/
You need to do model_input.set_value(X_te)
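For context, a rough sketch of the full prediction flow, assuming the shared variables and trace from the model above (pm.sample_ppc is the PyMC3 posterior predictive sampler; samples=500 is an arbitrary choice):

# Swap the test data into the shared variables
model_input.set_value(X_te)
model_output.set_value(y_te)  # keeps shapes consistent with X_te
with logistic_model:
    ppc = pm.sample_ppc(trace, samples=500)
# Average the draws per test point, threshold at 0.5 to get class labels
pred = ppc['y'].mean(axis=0) > 0.5
accuracy = (pred == y_te).mean()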
Yes, you can model the observed classes as a Categorical random variable. Usually people apply a softmax to a matrix of linear predictors and use the result as p for the Categorical.
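A minimal sketch of that idea (a softmax regression; the value of K and the assumption that y_tr holds integer labels in {0, ..., K-1} are illustrative):

import pymc3 as pm
import theano.tensor as T

K = 3  # number of classes (assumed for illustration)
with pm.Model() as softmax_model:
    alpha = pm.Normal('alpha', mu=0, sd=1, shape=K)
    betas = pm.Normal('betas', mu=0, sd=1, shape=(X_tr.shape[1], K))
    # softmax maps each row of linear predictors to class probabilities
    p = T.nnet.softmax(alpha + T.dot(X_tr, betas))
    y_obs = pm.Categorical('y_obs', p=p, observed=y_tr)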
There are many ways to do it. One way is to plot each posterior predictive sample and then plot the observed data on top; for example, see the visualization in Motif of the Mind | Junpeng Lao, PhD.
You can also get some inspiration from http://docs.pymc.io/notebooks/posterior_predictive.html
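As a rough illustration of that approach (assuming ppc['y'] from pm.sample_ppc above and matplotlib):

import numpy as np
import matplotlib.pyplot as plt

x_axis = np.arange(len(y_te))
# Plot a subset of posterior predictive draws faintly, observed on top
for draw in ppc['y'][:100]:
    plt.plot(x_axis, draw, 'o', color='C0', alpha=0.02)
plt.plot(x_axis, y_te, 'o', color='C1', label='observed')
plt.xlabel('test point')
plt.ylabel('class')
plt.legend()
plt.show()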