I am still learning about the concept of overfitting in Bayesian inference techniques.

From what I understand from works such as Bishop (2006) and MacKay (1992), in the Bayesian treatment overfitting is avoided by marginalising over the model parameters rather than optimising them. There is also mention of an Occam factor, which penalises model complexity when there are large numbers of model parameters.

I’m wondering, how does pymc3 specifically avoid over-fitting in the regression sense?

Say my model set-up is something like:

```
with pm.Model() as model:
    # hyperpriors
    ℓ = pm.Uniform('ℓ', lower=0, upper=1, shape=X.shape[1])
    α = pm.Uniform('α', lower=0, upper=1e-2)
    σ = pm.Uniform('σ', lower=0, upper=2)
    # priors
    μprior = pm.gp.mean.Zero()
    Σprior = α * pm.gp.cov.ExpQuad(input_dim=X.shape[1], active_dims=np.arange(X.shape[1]), ls=ℓ)
    # GP function
    gp = pm.gp.Marginal(mean_func=μprior, cov_func=Σprior)
    y_ = gp.marginal_likelihood('y_', X=X, y=y, noise=σ)
    trace = pm.sample(draws=2000, chains=4, cores=2, random_seed=1)

μpred, Σpred = gp.predict(X_new, point=trace[-1], pred_noise=True)
```

where `X` and `y` form my training data, with shapes `(10,100)` and `(10,)` respectively (so the chances of overfitting in the non-Bayesian treatment are high). `X_new` is a vector containing test cases for prediction, of shape `(1,100)`. Considering this is a linear regression problem of the form `y = Xw`, where `w` is the vector of shape `(100,1)` containing my model parameters: does pymc3 use the Occam factor to penalise these many parameter values? Or is marginalising over them enough to prevent over-fitting? Or both? Finally, does pymc3 offer a way to quantitatively check whether a model is over-fitting?
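For context on where I think the Occam factor enters: my understanding is that the GP marginal likelihood used above already contains a built-in complexity penalty, since the log evidence splits into a data-fit term and a `-0.5 log|K|` term that punishes overly flexible kernels. Here is a minimal numpy sketch of that decomposition (the toy data, lengthscales, and noise level are my own illustration, not the model above):

```python
import numpy as np

rng = np.random.default_rng(1)

def rbf_kernel(X, lengthscale, amplitude=1.0):
    # Squared-exponential (ExpQuad) kernel matrix
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    return amplitude * np.exp(-0.5 * d2 / lengthscale**2)

def log_marginal_likelihood(X, y, lengthscale, noise=0.1):
    n = len(y)
    K = rbf_kernel(X, lengthscale) + noise**2 * np.eye(n)
    L = np.linalg.cholesky(K)
    a = np.linalg.solve(L.T, np.linalg.solve(L, y))  # K^{-1} y
    data_fit = -0.5 * y @ a                  # rewards explaining the data
    complexity = -np.log(np.diag(L)).sum()   # Occam term: -0.5 log|K|
    const = -0.5 * n * np.log(2 * np.pi)
    return data_fit + complexity + const, data_fit, complexity

X = rng.normal(size=(10, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=10)

for ls in (0.01, 0.5, 5.0):
    lml, fit, pen = log_marginal_likelihood(X, y, ls)
    print(f"ls={ls}: log-evidence={lml:.2f} (fit={fit:.2f}, complexity={pen:.2f})")
```

On toy data like this, the very short lengthscale (a highly flexible kernel) gets the smallest complexity term, which is what I understand the Occam penalty to mean in practice; I'd like to confirm whether this is the mechanism pymc3 relies on.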

References from above:

MacKay, D.J., 1992. Bayesian interpolation. *Neural Computation*, 4(3), pp.415-447.

Bishop, C.M., 2006. *Pattern Recognition and Machine Learning*. Springer.

Thanks.