I just got access to GitHub Copilot and thought it would be fun to see whether or not their language model can capture the subtleties of Bayesian modeling. I loaded it up in VSCode and gave it a few prompts. Here’s what I found:
My first prompt was this:
    import pymc3 as pm

    def pymc_regression_model(X,y):
And here’s what it suggested:
    def pymc_regression_model(X,y):
        with pm.Model() as model:
            alpha = pm.Normal('alpha', mu=0, sd=10)
            beta = pm.Normal('beta', mu=0, sd=10, shape=X.shape)
            mu = alpha + pm.math.dot(X, beta)
            pm.Normal('y', mu=mu, sd=1, observed=y)
        return model
which looks like a perfectly reasonable linear regression. The only thing I would pick on is that the error variance is fixed to 1.
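To make the complaint about the fixed error variance concrete, here is a minimal standard-library sketch (not PyMC3, and not part of Copilot's output). It compares the Gaussian log-likelihood of some residuals when the noise scale is pinned to 1 with the log-likelihood at the maximum-likelihood scale. The residual values are hypothetical, chosen so that the true scale is around 5.

```python
import math

def gaussian_loglik(residuals, sd):
    """Sum of Normal(0, sd) log densities over the residuals."""
    n = len(residuals)
    sum_sq = sum(r * r for r in residuals)
    return -n * math.log(sd) - 0.5 * n * math.log(2 * math.pi) - sum_sq / (2 * sd * sd)

# Hypothetical residuals (assumption): the noise scale is roughly 5, not 1.
residuals = [4.0, -6.0, 5.0, -3.0, 7.0, -5.0]

# Maximum-likelihood noise scale for zero-mean residuals: the root mean square.
sd_hat = math.sqrt(sum(r * r for r in residuals) / len(residuals))

loglik_fixed = gaussian_loglik(residuals, 1.0)      # sd hard-coded to 1
loglik_fitted = gaussian_loglik(residuals, sd_hat)  # sd estimated from the data

print(f"sd_hat={sd_hat:.3f}  fixed={loglik_fixed:.2f}  fitted={loglik_fitted:.2f}")
```

With the scale pinned to 1, data like these look wildly improbable to the model. In PyMC3 the usual remedy would be to give the noise scale its own prior (for example a HalfNormal) and pass that variable as sd, rather than hard-coding sd=1.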
Let’s try a logistic regression version by renaming the function in the prompt to pymc_logistic_regression_model. This is what we get:
    with pm.Model() as model:
        # Define the priors
        alpha = pm.Normal('alpha', mu=0, sd=10)
        beta = pm.Normal('beta', mu=0, sd=10, shape=X.shape)
        # Define the likelihood
        likelihood = pm.math.dot(X, beta) + alpha
        pm.glm.GLM.from_formula('y ~ x', data=dict(x=X, y=y), family=pm.glm.families.Binomial())
        # Inference!
        step = pm.Metropolis()
        trace = pm.sample(100000, step=step)
    return trace
It looks like it’s struggling to get the right format. It has elements of the right answer, i.e. using the GLM module with a non-Gaussian likelihood, but it selects the binomial family instead. It also mixes two approaches: it defines priors and a linear predictor by hand and then calls the GLM module anyway, so the hand-written pieces never feed into the likelihood, which isn’t a good idea. Finally, it tries to use the Metropolis-Hastings sampler >:( rather than relying on NUTS, and Metropolis wouldn’t make sense for a logistic regression model.
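For reference, the model that prompt is asking for can be written down without any library. The sketch below is plain Python rather than PyMC3, and it is only an illustration of the target: Normal(0, 10) priors on the intercept and coefficients, and a Bernoulli likelihood with a logit link. The function names and the toy data are my own assumptions. Note that it uses one coefficient per column of X, whereas shape=X.shape in the suggestion would create a coefficient array the same shape as X itself.

```python
import math

def log_sigmoid(z):
    """Numerically stable log(1 / (1 + exp(-z)))."""
    if z >= 0:
        return -math.log1p(math.exp(-z))
    return z - math.log1p(math.exp(z))

def normal_logpdf(x, mu, sd):
    """Log density of Normal(mu, sd) at x."""
    return -math.log(sd) - 0.5 * math.log(2 * math.pi) - (x - mu) ** 2 / (2 * sd ** 2)

def logistic_log_posterior(alpha, beta, X, y, prior_sd=10.0):
    """Unnormalized log posterior: Normal(0, prior_sd) priors, Bernoulli-logit likelihood."""
    logp = normal_logpdf(alpha, 0.0, prior_sd)
    logp += sum(normal_logpdf(b, 0.0, prior_sd) for b in beta)
    for row, label in zip(X, y):
        # Linear predictor: intercept plus one coefficient per column of X.
        eta = alpha + sum(b * x for b, x in zip(beta, row))
        # log p(y=1) = log sigmoid(eta); log p(y=0) = log sigmoid(-eta)
        logp += log_sigmoid(eta) if label == 1 else log_sigmoid(-eta)
    return logp

# Toy data (assumption): one feature, labels that increase with it.
X = [[-2.0], [-1.0], [1.0], [2.0]]
y = [0, 0, 1, 1]

matching_slope = logistic_log_posterior(0.0, [2.0], X, y)
opposite_slope = logistic_log_posterior(0.0, [-2.0], X, y)
print(matching_slope, opposite_slope)
```

This log posterior is smooth in alpha and beta, which is the setting where a gradient-based sampler such as NUTS is generally preferred over random-walk Metropolis.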
I tried many other prompts, including gaussian_process_model, but these more elaborate prompts all returned essentially a basic linear regression (though with docstrings that mentioned spatial / GP parts). It did a little better with pymc_ab_test, which at least used a Bernoulli likelihood, a sensible choice for click-through and other kinds of binary data.
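To show why a Bernoulli likelihood is the natural fit for an A/B test on clicks, here is a small standard-library sketch. It is not what Copilot produced and it does not use PyMC3: it swaps in the conjugate Beta-Bernoulli update, so the posterior is available in closed form and can be sampled with random.betavariate. The click counts, the uniform Beta(1, 1) prior, and the function names are all assumptions for illustration.

```python
import random

def beta_posterior(clicks, views, prior_a=1.0, prior_b=1.0):
    """Beta posterior parameters for a Bernoulli click rate under a Beta prior."""
    return prior_a + clicks, prior_b + (views - clicks)

def prob_b_beats_a(post_a, post_b, draws=20000, seed=0):
    """Monte Carlo estimate of P(rate_B > rate_A) from the two Beta posteriors."""
    rng = random.Random(seed)
    wins = sum(
        rng.betavariate(*post_b) > rng.betavariate(*post_a)
        for _ in range(draws)
    )
    return wins / draws

# Hypothetical click-through counts (assumption): 1000 views per variant.
post_a = beta_posterior(clicks=40, views=1000)
post_b = beta_posterior(clicks=60, views=1000)

p = prob_b_beats_a(post_a, post_b)
print(f"P(B beats A) is approximately {p:.3f}")
```

A PyMC3 version would state the same model with Beta priors and Bernoulli (or Binomial) observed variables and leave the sampling to the library; the conjugate form is used here only because it needs no dependencies.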