Hi there,
I am fairly new to PyMC3 and have been using it for some basic regressions to get started. So far I like it a lot. My problem now is getting a logistic regression with weighted samples to run. In my data, one of the two classes is heavily undersampled, and moreover some data points are more important to get right than others.
Without using weights my model looks like this:
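import numpy as np
import pymc3 as pm

# df is a pandas DataFrame; cols lists the feature column names
model = pm.Model()
with model:
    m = pm.Normal("m", 0, sd=10, shape=len(cols))  # one coefficient per feature
    x = df[cols].to_numpy()
    w = np.ones(len(x))  # per-sample weights, not used in this version
    labels = df["label"]
    b = pm.Normal("b", 0, sd=10)  # intercept
    score = pm.math.dot(x, m) + b
    mu = pm.math.sigmoid(score)  # predicted probability of class 1
    likelihood = pm.Bernoulli("y", mu, observed=df["label"])
    nsample = 500
    ntune = int(0.5 * nsample)
    trace = pm.sample(nsample, tune=ntune, cores=4)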
This model runs fine and I get values in the expected range. The problem now is moving to the weighted regression. I read various posts about it, like this one for example, and looked at the documentation for pm.Potential. My last attempt at the problem was:
model = pm.Model()
with model:
    m = pm.Normal("m", 0, sd=10, shape=len(cols))
    x = df[cols].to_numpy()
    w = np.ones(len(x))  # placeholder weights (all ones)
    labels = df["label"]
    b = pm.Normal("b", 0, sd=10)
    score = pm.math.dot(x, m) + b
    mu = pm.math.sigmoid(score)
    # per-sample Bernoulli log-likelihood, added to the model via a Potential
    logp = pm.Bernoulli.dist(p=mu).logp(labels)
    error = pm.Potential("error", logp)
    nsample = 500
    ntune = int(0.5 * nsample)
    trace = pm.sample(nsample, tune=ntune, cores=4)
But this does not work either. I would be grateful for any help or recommendations.
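To make clear what I mean by "weighted": as far as I understand it, each sample's Bernoulli log-likelihood should be multiplied by its weight before summing, which in the model above would presumably mean passing something like w * logp to pm.Potential. Here is a minimal NumPy-only sketch of that quantity. The function names, the toy data and the weight values are made up for illustration and are not part of my real model:

```python
import numpy as np

def sigmoid(z):
    """Logistic function, mapping scores to probabilities."""
    return 1.0 / (1.0 + np.exp(-z))

def weighted_bernoulli_logp(x, y, m, b, w):
    """Sum of per-sample Bernoulli log-likelihoods, each scaled by its weight."""
    p = sigmoid(x @ m + b)
    per_sample = y * np.log(p) + (1 - y) * np.log(1 - p)
    return np.sum(w * per_sample)

# Toy data (hypothetical): 6 samples, 2 features, class 1 is the rare class
rng = np.random.default_rng(0)
x = rng.normal(size=(6, 2))
y = np.array([0, 0, 0, 0, 1, 1])
m = np.array([0.5, -0.3])  # illustrative coefficients
b = 0.1                    # illustrative intercept

w_uniform = np.ones(len(y))                # equivalent to the unweighted model
w_upweighted = np.where(y == 1, 2.0, 1.0)  # rare class counts twice

logp_uniform = weighted_bernoulli_logp(x, y, m, b, w_uniform)
logp_weighted = weighted_bernoulli_logp(x, y, m, b, w_upweighted)
```

With integer weights this is the same as repeating the corresponding rows in the data, so a weight of 2 on the rare class should behave like seeing each of those points twice.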
Best regards,
Tim