PyMC3 beginner here. I am trying to implement the Bayesian network described in this paper.
The network contains the sensitive attributes S, the latent fair labels D_f, the observed labels D and the features X. I want to run inference on P(D_f | X, S) so that a trained model can predict the fair label from observed data.
I am using the Adult dataset, where S = gender, X = workclass and D = income. I am building the model step by step, but I run into problems when I try to model P(D | D_f, S):
    import pymc3 as pm

    # sensitive, X and y are loaded from the Adult dataset earlier (not shown)
    with pm.Model() as model:
        # Data
        S_shared = pm.Data("S_obsered", sensitive['gender'][:100])
        D_shared = pm.Data("D_observed", y[:100])
        W_shared = pm.Data("W_observed", X['workclass'][:100])

        # Sensitive Attribute
        beta_s = pm.Beta("beta_s", alpha=1, beta=1, shape=2)
        S = pm.Categorical("S", p=beta_s, observed=S_shared)

        # Fair Labels
        beta_f = pm.Beta("beta_f", alpha=1, beta=1, shape=2)
        Df = pm.Categorical("Df", p=beta_f)

        # Data Labels P(D | Df, S)
        intercept_D = pm.Normal("intercept_D", mu=0, sd=1)
        beta1_D = pm.Normal("beta1_D", mu=0, sd=1)
        beta2_D = pm.Normal("beta2_D", mu=0, sd=1)
        mu_D = pm.math.invlogit(intercept_D + beta1_D*S + beta2_D*Df)
        D = pm.Bernoulli("D", p=mu_D, observed=D_shared)

        # Sample (Only 50 for testing purposes)
        trace = pm.sample(50)
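When I change the line

    mu_D = pm.math.invlogit(intercept_D + beta1_D*S + beta2_D*Df)

to

    mu_D = pm.math.invlogit(intercept_D + beta2_D*Df)

it works, so the problem seems to come from using the observed variable S in the expression. With the original line I get this error: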
    ---------------------------------------------------------------------------
    TypeError                                 Traceback (most recent call last)
    C:\tools\Anaconda3\envs\forseti\lib\site-packages\theano\compile\function\types.py in __call__(self, *args, **kwargs)
        973             outputs = (
    --> 974                 self.fn()
        975                 if output_subset is None

    TypeError: expected type_num 1 (NPY_INT8) got 9

    During handling of the above exception, another exception occurred:

    TypeError                                 Traceback (most recent call last)
    ~\AppData\Local\Temp/ipykernel_18340/784928753.py in <module>
         26
         27 # Sample
    ---> 28 trace = pm.sample(50)
So I apparently do not understand how to treat observed variables, or how to model Bayesian networks in PyMC3 in general. I model the dependence of a binary variable on two categorical parent variables as a logistic regression, and I plan to model the categorical attributes in X with a logistic regression as well.
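To make the intended structure of P(D | D_f, S) concrete, here is a minimal plain-Python sketch (deliberately not PyMC3) of the logistic regression described above, together with the posterior over the latent fair label obtained by enumerating its two values. It assumes that D_f and S are both binary and coded as 0/1. The function names and all parameter values are hypothetical and chosen only for illustration; they come from neither the paper nor a fitted model.

```python
import math


def invlogit(x):
    """Logistic sigmoid, mapping a real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def p_d_given(d, df, s, intercept, beta_s, beta_df):
    """P(D = d | D_f = df, S = s) under a logistic regression."""
    p1 = invlogit(intercept + beta_s * s + beta_df * df)
    return p1 if d == 1 else 1.0 - p1


def posterior_df(d, s, prior_df1, intercept, beta_s, beta_df):
    """P(D_f = 1 | D = d, S = s) via Bayes' rule, summing over D_f in {0, 1}."""
    w1 = prior_df1 * p_d_given(d, 1, s, intercept, beta_s, beta_df)
    w0 = (1.0 - prior_df1) * p_d_given(d, 0, s, intercept, beta_s, beta_df)
    return w1 / (w0 + w1)


# Hypothetical parameter values, for illustration only.
params = dict(prior_df1=0.5, intercept=-1.0, beta_s=0.5, beta_df=2.0)

for s in (0, 1):
    for d in (0, 1):
        print(f"S={s} D={d} P(D_f=1 | D, S)={posterior_df(d, s, **params):.3f}")
```

Since D_f takes only two values, it can be summed out exactly like this, which is the same calculation a sampler would have to approximate for the discrete latent variable in the model.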
Could someone more experienced with PyMC3 give me some tips on how to implement the Bayesian network described above and how to treat observed variables?