Using a GLM as a prior to infer a hidden variable

import pymc3 as pm
import theano.tensor as tt

total_churn = [101]  # only observation: the net change in head count

with pm.Model() as model:
    # Bayesian linear model on the survey answers; a sigmoid link keeps
    # each employee's churn probability θ in (0, 1)
    intercept = pm.Normal('intercept', mu=0, sigma=10)
    β = pm.Normal('β', mu=0, sigma=10, shape=x_train.shape[1])
    θ = pm.Deterministic('θ', pm.math.sigmoid(intercept + tt.dot(x_train.values, β)))
    # One Bernoulli trial per employee: do they leave within the year?
    churn_trials = pm.Bernoulli('churn_trials', p=θ, shape=len(x_train))
    λ = tt.sum(churn_trials)  # number of successful trials is the Poisson mean
    observations = pm.Poisson('observations', mu=λ, observed=total_churn)
    trace = pm.sample(10000, tune=10000, cores=4)

I’m trying to solve a problem where I predict the likelihood of a person leaving the company based on their answers to a particular survey. However, I don’t have data on which specific individuals left the company; I only have the net change in the number of people in a particular department. So I decided to treat whether or not an employee leaves within a year as a Bernoulli trial, and to infer the probability θ of each trial as a measure of that likelihood. The idea is to use the number of successful trials as the mean of a Poisson distribution representing the evidence (the number of employees who left the company). However, I also need to represent the conditional probability P(θ | survey answers) as a Bayesian linear model, and the target values θ are unknown. How do I go about doing something like this? Can anyone show me an example where this is done? Thanks!

Well, basically you want to infer n = x_train.shape[1] parameters from one observation; you are not going to have enough information in the data, and the posterior you get will most likely be dominated by the prior.
Moreover, since you don’t have data on which specific individuals left the company, all the information will just “flow” back to the intercept (as that’s the common factor shared across all rows), so it is impossible to estimate the coefficients accurately.
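
To make that concrete, here is a minimal numerical sketch (made-up dimensions and coefficients, and an identity link for simplicity): the aggregate count only pins down the total rate n*a + sum(X @ b), so an entire family of (intercept, coefficient) pairs yields exactly the same likelihood.

import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))        # 200 employees, 3 hypothetical survey features

# Setting 1: some intercept a1 and coefficients b1
a1, b1 = 0.10, np.array([0.05, -0.02, 0.03])
lam1 = np.sum(a1 + X @ b1)           # implied Poisson mean for the total count

# Setting 2: zero out the coefficients and absorb everything into the intercept
b2 = np.zeros(3)
a2 = lam1 / X.shape[0]
lam2 = np.sum(a2 + X @ b2)

print(lam1, lam2)                    # identical means, hence identical likelihoods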

Thanks for the reply. I figured the solution would be meaningless from a single sample, but it was an idea I wanted to test before throwing away the dataset. But what did you mean by the information flowing back to the intercept? Wouldn’t it flow to the node with the Bernoulli trials? Isn’t that what I want? It’s a bit of a noob question, but I’m still learning how to work with PGMs.

Suppose I had enough evidence, with more samples available; would this method be valid then?

So what you want to know is the posterior of the coefficients (beta) of the linear function y = X*beta, but if you don’t have per-row data, just the aggregated count, you can only infer information related to the sufficient statistics (e.g., the mean) of y.
I think the easiest way to see this is to simulate some data and try modeling it.
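
For instance, here is a rough sketch along those lines (made-up ground-truth coefficients and simulated survey answers, with the latent Bernoulli layer marginalized out by using the sum of the individual probabilities as the Poisson mean): simulate per-employee churn from known parameters, keep only the total, fit the aggregate-count model, and compare the posterior of beta with its prior. You should see the posterior stay largely prior-dominated.

import numpy as np
import pymc3 as pm
import theano.tensor as tt

rng = np.random.default_rng(42)
n, k = 200, 3
X = rng.normal(size=(n, k))                 # simulated survey answers
true_intercept = -1.0
true_beta = np.array([1.0, -0.5, 0.25])     # ground truth we try to recover
theta = 1 / (1 + np.exp(-(true_intercept + X @ true_beta)))
left = rng.binomial(1, theta)               # per-employee churn, then discarded
total_churn = left.sum()                    # the only quantity we observe

with pm.Model() as model:
    intercept = pm.Normal('intercept', mu=0, sigma=10)
    beta = pm.Normal('beta', mu=0, sigma=10, shape=k)
    θ = pm.math.sigmoid(intercept + tt.dot(X, beta))
    # Expected total = sum of the individual churn probabilities
    pm.Poisson('total', mu=tt.sum(θ), observed=total_churn)
    trace = pm.sample(2000, tune=2000, cores=2)

print(pm.summary(trace))  # compare the posterior sd of beta with its prior sd of 10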