I also feel that PyMC3 sometimes lacks documentation for certain features, but there are many great examples and articles written by the developers that explain most of them.
If you want to understand how GLM works, please go through these articles.
It is mentioned in article 1 that:
PyMC3’s glm() function allows you to pass in a family object that contains information about the likelihood.
Therefore, by choosing the likelihood you decide the type of regression you want to perform (e.g. linear regression - Normal, logistic regression - Binomial, etc.).
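As a rough sketch (using the PyMC3-era pm.glm.GLM.from_formula API and made-up toy data), only the family changes between a linear and a logistic regression:

```python
import numpy as np
import pandas as pd
import pymc3 as pm

# toy data, made up for illustration
rng = np.random.RandomState(0)
df = pd.DataFrame({"x": rng.randn(100)})
df["y"] = 2.0 * df["x"] + rng.randn(100)     # continuous outcome
df["y_bin"] = (df["y"] > 0).astype(int)      # 0/1 outcome

# Normal family -> linear regression
with pm.Model() as linear_model:
    pm.glm.GLM.from_formula("y ~ x", df, family=pm.glm.families.Normal())
    trace_linear = pm.sample(1000, tune=1000)

# Binomial family -> logistic regression
with pm.Model() as logistic_model:
    pm.glm.GLM.from_formula("y_bin ~ x", df, family=pm.glm.families.Binomial())
    trace_logistic = pm.sample(1000, tune=1000)
```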
The data argument is not simply the observed values in this case. As the articles explain, data is a structure like a pandas DataFrame containing all the data required to train the model. The column headers are important here because the formula ("vars") defines the relationship between the observed variable and the predictor variables (the x values in linear regression), as shown below (from article 2):
pm.glm.GLM.from_formula('income ~ age + age2 + educ + hours', data, family=pm.glm.families.Binomial())
Here income is the observed variable and the others are predictor variables. The terms in this relationship are identified using the column headers. Since the family is Binomial(), this corresponds to logistic regression.
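To make the column-header point concrete, here is a small sketch with a hypothetical DataFrame whose column names match the terms in the formula (the values are made up):

```python
import pandas as pd
import pymc3 as pm

# hypothetical data; the column names must match the terms in the formula
data = pd.DataFrame({
    "income": [0, 1, 1, 0],          # observed (0/1 because the family is Binomial)
    "age":    [25, 40, 52, 31],
    "age2":   [625, 1600, 2704, 961],
    "educ":   [12, 16, 18, 14],
    "hours":  [35, 45, 50, 40],
})

with pm.Model() as model:
    pm.glm.GLM.from_formula(
        "income ~ age + age2 + educ + hours",
        data,
        family=pm.glm.families.Binomial(),
    )
```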
The priors argument lets you define the distributions for the priors. You can find most of this information in the docstrings of the glm.py script in the git repo.
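If I remember the glm.py docstring correctly, priors is a dict mapping term names (plus 'Intercept') to distributions created with .dist(); a rough sketch, reusing the hypothetical DataFrame from above:

```python
import pymc3 as pm

# sketch of overriding the default priors for the intercept and one coefficient
with pm.Model() as model:
    pm.glm.GLM.from_formula(
        "income ~ age + age2 + educ + hours",
        data,                                 # same hypothetical DataFrame as above
        family=pm.glm.families.Binomial(),
        priors={
            "Intercept": pm.Normal.dist(mu=0, sd=10),
            "age": pm.Normal.dist(mu=0, sd=1),
        },
    )
```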