Sampling semantics of multiple observed variables

Your model looks like a hierarchical linear model. The general way of doing this relies on a covariate data matrix that might look like

intercept  media_A  media_B  input_XOR
        1        1        0       0    
        1        1        0       1   
        1        0        1       0    
...

Here the XOR of the input bits is pre-computed. Then:

X = theano.shared(data_matrix)
y_obs = theano.shared(data_obs)
with pm.Model() as mod:
    beta_vals = pm.Normal('beta', 0., 2., shape=4)
    err_sd = pm.HalfNormal('err_sd', 1.)
    y_pred = pm.Deterministic('y_pred', tt.dot(X, beta_vals))
    lik = pm.Normal('y_obs', mu=y_pred, sd=err_sd, observed=y_obs)

In this way the (independent) observations map to the (dependent) observations through the unknown parameters; and X[i,:] corresponds to y_obs[i].

see

for more examples