Although this is not a new topic, I couldn't understand much from the previous posts on it, partly because I am a beginner. I have tried changing the SD to 10, and even to 100.
The non-probabilistic models (scikit-learn) are in the Jupyter notebook “aio.ipynb”.
The code implementation I followed is from Nicole Carlson.
I want to know what is causing this error and possible solutions.
Many thanks in advance.
In your code you use
df[df == '?'] = np.nan
which I can imagine will mess things up if those nan values are present while training. Can you make sure all those values are dropped?
Yeah, I thought the same, so I dropped that entire column ('bare_nuclei').
# IMPORTANT NOTE: 'bare_nuclei' is dropped because of its 16 '?' values
X = scale(np.array(df.drop(['class', 'bare_nuclei'], axis=1)))
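For reference, here is a minimal sketch of the two cleaning options discussed above: dropping the whole column, or dropping only the rows that contain '?'. The toy DataFrame and its values are made up for illustration (only the column names 'bare_nuclei' and 'class' come from the thread), so treat it as a sketch rather than the actual preprocessing used.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the data set; the values are invented for illustration.
df = pd.DataFrame({
    "clump_thickness": [5, 3, 8, 1],
    "bare_nuclei": ["1", "?", "10", "2"],
    "class": [2, 2, 4, 2],
})

# Mark the '?' placeholders as missing values.
df = df.replace("?", np.nan)

# Option 1: drop the whole column that contains missing entries.
dropped_column = df.drop(columns=["bare_nuclei"])

# Option 2: keep the column, drop only the affected rows, then make it numeric.
dropped_rows = df.dropna().astype({"bare_nuclei": int})

# Either way, no NaN should reach the model.
assert not dropped_column.isna().any().any()
assert not dropped_rows.isna().any().any()
print(dropped_column.shape, dropped_rows.shape)
```

Option 2 keeps the feature at the cost of a few rows, which may be preferable when only 16 entries are missing.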
When you do
for RV in logistic_model.basic_RVs:
    print(RV.name, RV.logp(logistic_model.test_point))
the output shows that the logp of the observed y is inf. This means that the input p is out of support.
Check p.tag.test_value and see where p is returning a value outside of [0, 1] (the support of p for the Bernoulli).
I added the line “p.tag.test_value” in the for loop, and it prints 0.5.
So Bernoulli returns inf for p = 0.5.
I would say there are a couple of problems:
1. The observed y contains the values 2 and 4 instead of 0 and 1, which causes the Bernoulli logp to go to inf.
2. You get a single value for p, but you should get a vector the same size as y.
Try the code below:
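import pymc3 as pm
import theano.tensor as T
from theano import shared
from sklearn.model_selection import train_test_split

# Recode the labels from {2, 4} to {0, 1} so they lie in the Bernoulli support
X_tr, X_te, y_tr, y_te = train_test_split(X, y/2-1, test_size=0.2, random_state=42)
# Shared variables let the data be swapped later (e.g. for test-set predictions)
model_input = shared(X_tr)
model_output = shared(y_tr)
with pm.Model() as logistic_model:
    # Priors for unknown model parameters
    alpha = pm.Normal("alpha", mu=0, sd=1)
    betas = pm.Normal("betas", mu=0, sd=1, shape=X.shape[1])  # one coefficient per feature
    # Expected value of outcome: one probability per row of the input
    p = pm.invlogit(alpha + T.dot(model_input, betas))
    # Likelihood (sampling distribution of observations)
    y = pm.Bernoulli('y', p, observed=model_output)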
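To make the two problems concrete, here is a small NumPy-only sketch. The function `bernoulli_logp` is a hypothetical stand-in written for this illustration, not PyMC3's implementation, and the design matrix and coefficients are random placeholders. It shows that labels coded as 2 and 4 give a log-probability of -inf even for a perfectly valid p = 0.5, that `y/2 - 1` fixes this, and that the linear predictor yields one probability per observation.

```python
import numpy as np

def bernoulli_logp(y, p):
    """Elementwise Bernoulli log-probability; -inf where y is not 0 or 1."""
    y = np.asarray(y, dtype=float)
    p = np.broadcast_to(np.asarray(p, dtype=float), y.shape)
    in_support = (y == 0) | (y == 1)
    # Replace invalid labels by 0 so the formula stays finite, then mask them.
    y_safe = np.where(in_support, y, 0.0)
    logp = y_safe * np.log(p) + (1.0 - y_safe) * np.log1p(-p)
    return np.where(in_support, logp, -np.inf)

raw_labels = np.array([2, 4, 4, 2])    # labels as coded in the data set
fixed_labels = raw_labels / 2 - 1      # 2 -> 0, 4 -> 1

logp_raw = bernoulli_logp(raw_labels, 0.5).sum()      # labels outside the support
logp_fixed = bernoulli_logp(fixed_labels, 0.5).sum()  # labels inside the support
print(logp_raw, logp_fixed)

# Problem 2: p should be a vector with one entry per observation.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(4, 3))       # placeholder design matrix (4 rows, 3 features)
alpha_demo = 0.0
betas_demo = rng.normal(size=3)        # one coefficient per feature
p_vec = 1.0 / (1.0 + np.exp(-(alpha_demo + X_demo @ betas_demo)))
print(p_vec.shape)
```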
Thank you so much, Junpenglao. I am sorry for that silly mistake; I had actually made the changes to the dataframe, but for some reason they were not reflected in 'y'. Anyway, I think things are making sense now.
However, when I calculate the accuracy, I see that the pred vector has the dimensions of y_tr instead of y_te.
Do we have a probabilistic logistic regression model for multi-class problems?
And is there an easier or more general way to plot the “uncertainty in predicted value”?
In the linked article, Thomas Wiecki does it by defining a grid and then plotting a contour on top, but I couldn't replicate that for my neural network application (git) because of the reshape function.
You need to do
Yes, you can model the observed classes as a Categorical random variable. Usually people apply a softmax to a matrix and use the resulting matrix as p for the Categorical.
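As a rough illustration of the softmax step, here is a NumPy-only sketch. The sizes, the random inputs, and the names `alpha`, `betas` and `softmax` are assumptions made for the example. In a PyMC3 model the same idea would presumably use priors for the coefficient matrix and a Theano softmax, with the resulting rows passed as p to pm.Categorical.

```python
import numpy as np

def softmax(logits):
    """Row-wise softmax, subtracting the row maximum for numerical stability."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)

rng = np.random.default_rng(0)
n_obs, n_features, n_classes = 5, 3, 4            # hypothetical sizes
X = rng.normal(size=(n_obs, n_features))
betas = rng.normal(size=(n_features, n_classes))  # one column of coefficients per class
alpha = rng.normal(size=n_classes)                # one intercept per class

# One row of class probabilities per observation.
p = softmax(alpha + X @ betas)
print(p.shape, p.sum(axis=1))
```

Each row of p is a valid probability vector over the classes, which is the shape a Categorical likelihood expects.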
There are many ways to do it. One way is to plot each ppc_sample and then plot the observed data on top. For example, see the visualization in http://junpenglao.xyz/Blogs/posts/2017-10-23-OOS_missing.html
You can also get some inspiration from http://docs.pymc.io/notebooks/posterior_predictive.html
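Another simple option, sketched here under assumptions, is to summarise the posterior predictive samples per test point: the mean over draws gives a predicted probability, and the spread over draws gives a measure of uncertainty. The array `ppc_samples` below is synthetic (drawn with NumPy, not from a fitted model), and `true_p` and `y_te` are made-up values. With real output, the samples would need the shape (number of draws, number of test points), so that pred has the length of y_te.

```python
import numpy as np

rng = np.random.default_rng(42)

# Synthetic stand-in for posterior predictive draws: (n_draws, n_test_points).
true_p = np.array([0.05, 0.3, 0.95, 0.8])   # invented per-point probabilities
ppc_samples = rng.binomial(1, true_p, size=(2000, true_p.size))
y_te = np.array([0, 0, 1, 1])                # hypothetical held-out labels

# Average over draws (axis 0) to get one predicted probability per test point.
pred_prob = ppc_samples.mean(axis=0)
# Spread over draws as a simple per-point uncertainty measure.
pred_std = ppc_samples.std(axis=0)

# Threshold the probabilities to get class predictions, then score them.
pred = (pred_prob > 0.5).astype(int)
accuracy = (pred == y_te).mean()
print(pred_prob.round(2), pred_std.round(2), accuracy)
```

The per-point values in pred_prob and pred_std can then be plotted against the observed labels, for example as points with error bars.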