I see, I understand your question now (I hope). I don’t know of any way to add variables automatically. However, it would be easy to do it in a loop.
import pymc as pm
import numpy as np
import pandas as pd
import pytensor.tensor as pt
y = np.random.normal(0,1,12)
x = np.random.normal(0,1,12)
z = np.random.normal(0,1,12)
w = list(np.repeat("high",6)) + list(np.repeat("low",6))
df = pd.DataFrame({"obs":y, "var":x, "cov":z, "cat":w})
cat_idx = pd.Categorical(df['cat']).codes
coords = {'loc':df.index.values, 'cat':df['cat'].unique()}
varias = []
for i in df.columns[1:3]:
    coords[i] = df[i].unique()
    varias.append(df[i])
with pm.Model(coords=coords) as model:
    c_idx = pm.ConstantData("cat_idx", cat_idx, dims="loc")
    d = {}
    # one category-varying slope per covariate
    for i in range(len(df.columns[1:3])):
        name = "b_" + str(i)
        d[name] = pm.Normal(name, 0, 1, dims="cat")
    a = pm.Normal("a", 0, 1)
    # per-observation contribution of each covariate
    d2 = [list(d.values())[i][c_idx] * varias[i] for i in range(len(varias))]
    m = a + pt.sum(d2, axis=0)  # d2 is already indexed per observation
    s = pm.HalfNormal("s", 1)
    y = pm.Normal("y", m, s, observed=df['obs'].values)
with model:
    idata = pm.sample(1000)
Auto-assigning NUTS sampler...
Initializing NUTS using jitter+adapt_diag...
Multiprocess sampling (4 chains in 4 jobs)
NUTS: [b_0, b_1, a, s]
|████████████| 100.00% [8000/8000 00:34<00:00 Sampling 4 chains, 2 divergences]
Sampling 4 chains for 1_000 tune and 1_000 draw iterations (4_000 + 4_000 draws total) took 56 seconds.
There were 2 divergences after tuning. Increase `target_accept` or reparameterize.
Personally, however, I think this is a bad idea in general, unless you are quite certain that each prior should have the same distribution and that the covariates make sense explanatorily or causally. As far as my limited knowledge goes, I don’t know of any Bayesian stats package that includes functions for adding variables automatically. Many of these variables may require different prior distributions, or distributions parametrised in different ways. Also, adding a large number of variables (i.e. covariates) without knowing how they relate to each other in the model (causal dependencies, back-door criterion, etc.; McElreath summarises this topic neatly) can have very bad consequences for inference. I hope this helps.