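import pymc3 as pm
import theano

# Only the department-level total is observed, not who left
total_churn = [101]
# Regress θ on every survey column in x_train
formula = 'θ ~ ' + ' + '.join(str(variable) for variable in x_train.columns)
with pm.Model() as model:
    family = pm.glm.families.Normal()
    fitted_θ = pm.glm.GLM.from_formula(formula, data=x_train, family=family)
    # One Bernoulli trial per employee: does this person leave within a year?
    churn_trials = pm.Bernoulli('churn_trials', p=θ, shape=len(x))
    # Number of successful trials, used as the Poisson mean
    λ = theano.tensor.sum(churn_trials)
    observations = pm.Poisson('observations', λ, observed=total_churn)
    trace = pm.sample(10000, tune=10000, cores=4)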
I'm trying to predict how likely a person is to leave the company based on their answers to a particular survey. However, I don't have data on which specific individuals left; I only have the net change in the number of people in a particular department. So I decided to treat whether or not an employee leaves within a year as a Bernoulli trial and to infer the probability θ of each trial as the measure of that likelihood. The idea is to use the number of successful trials as the mean of a Poisson distribution that represents the evidence (the number of employees who left the company).

However, I also need to represent the conditional probability P(θ | survey answers) as a Bayesian linear model, and the target values θ are unknown. How do I go about doing something like this? Can anyone show me an example where this is done? Thanks!
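
To make the setup concrete, here is a minimal sketch of the generative process I have in mind, written in plain Python rather than PyMC3. Everything in it is hypothetical and only for illustration: the survey data, the coefficients `coefs` and `intercept`, the helper `simulate_department`, and the logistic link (used here only to keep each θ between 0 and 1, which is not what the Normal family in my code above does).

```python
import math
import random


def sigmoid(z):
    """Map a real-valued linear predictor to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def simulate_department(answers, coefs, intercept, rng):
    """Simulate one department and return (thetas, leavers, total_churn)."""
    # θ_i: probability that employee i leaves, as a function of survey answers
    thetas = [
        sigmoid(intercept + sum(c * a for c, a in zip(coefs, row)))
        for row in answers
    ]
    # One Bernoulli trial per employee (1 = left within the year)
    leavers = [1 if rng.random() < theta else 0 for theta in thetas]
    # Only this department-level total would be observed in practice
    return thetas, leavers, sum(leavers)


rng = random.Random(0)

# Hypothetical survey: 200 employees, 3 questions scored from 1 to 5
answers = [[rng.randint(1, 5) for _ in range(3)] for _ in range(200)]
coefs = [-0.4, 0.3, -0.2]  # made-up coefficients
intercept = -0.5           # made-up intercept

thetas, leavers, total_churn = simulate_department(answers, coefs, intercept, rng)
expected_churn = sum(thetas)  # mean of the sum of the Bernoulli trials

print("simulated total churn:", total_churn)
print("expected total churn: ", round(expected_churn, 1))
```

In this sketch the individual entries of `leavers` and `thetas` are exactly what I cannot see in my real data; the only observed quantity is `total_churn`, whose expected value is `sum(thetas)`. What I want is to go in the opposite direction: from the total back to the coefficients and the per-employee θ.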