Questions about effects in hierarchical binomial model

I have data that I am modeling using a hierarchical binomial model that looks something like this:

import pymc as pm
import pytensor.tensor as pt

def hierarchical_effect(name, idx):
    mu    = pm.Normal(f"{name}_mu", mu=0, sigma=10, dims=name)
    sigma = pm.HalfNormal(f"{name}_sigma", sigma=1, dims=name)
    # Standard-normal offset: the non-centered scale comes from sigma[idx]
    z     = pm.Normal(f"{name}_z", mu=0, sigma=1, dims="idx")
    eff   = pm.Deterministic(f"{name}_effect",
                             mu[idx] + z * sigma[idx],
                             dims="idx")
    return eff

# Define the model
coords = {"a": [0, 1, 2, ...],
          "b": [0, 1, 2, ...],
          "c": [0, 1, 2, ...],
          "idx": range(1000)}

with pm.Model(coords=coords) as model:
    
    # Define effects
    a = hierarchical_effect("a", df["a"].values)
    b = hierarchical_effect("b", df["b"].values)
    c = hierarchical_effect("c", df["c"].values)
    
    # Modeled sigma for theta
    theta_sigma = pm.HalfNormal("theta_sigma", sigma=1)

    # Theta
    theta = pm.Normal(
        "theta",
        mu=a + b + c,
        sigma=theta_sigma,
        dims="idx"
    )

    # Probability transformation
    p = pm.Deterministic("p", pm.math.invlogit(theta), dims="idx")

    # Likelihood: n and observed hold the per-row trial and success counts
    obs = pm.Binomial(
        "obs",
        n=n,
        p=p,
        observed=observed,
        dims="idx",
    )

To be clear, there is only one row for each unique combination of (a, b, c), but each group level can appear multiple times in the data frame.

This is a multi-part question:

Question 1:
How do I study the individual effects in probability space? I know that invlogit(a) + invlogit(b) + invlogit(c) != invlogit(a + b + c). Presumably for two effects I could compute invlogit(a + b) - invlogit(b) to get the marginal effect of a in probability space, but what about an arbitrary number of effects? Specifically, I want to be able to say what the value of every member of "a" is in probability terms. Would it be as simple as averaging the posterior of p over the observations for each member of group a, i.e., averaging over all members of (b, c)? Something like posterior.sel(a=0).p.mean(("chain", "draw")) (just typing here, so the syntax may be slightly off; see the sketch below). Also, should I be using pm.ZeroSumNormal to ensure the effects sum to zero?
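For the averaging idea, I picture something like this (a sketch, assuming idata is the InferenceData returned by pm.sample() and that df["a"] is row-aligned with the "idx" dimension):

# Attach each observation's "a" level as a coordinate on the idx dimension,
# then average p over chains, draws, and all observations within each level
p_post = idata.posterior["p"].assign_coords(a=("idx", df["a"].values))
p_by_a = p_post.groupby("a").mean(("chain", "draw", "idx"))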

Question 2:
I also have the case where I have continuous data that I want to add as a linear model. So if I have df[['x', 'y', 'z']], I could add:

X = pm.Data('X', df[['x', 'y', 'z']].values)
beta = pm.Normal("beta", mu=0, sigma=3, shape=3)  # one coefficient per column

# Linear model
lm = (X * beta).sum(axis=-1)

# Linear model effect
lm_effect = pm.Deterministic("lm_effect", lm, dims="idx")
...
# Theta
theta = pm.Normal(
    "theta",
    mu=a + b + c + lm_effect,
    sigma=theta_sigma,
    dims="idx",
)

Similar to the above, how can I study the effect of df.x in isolation? This feels like a place for pm.do or something a little beyond the information already in the posterior trace. Would something like the sketch below be the right idea?
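Here is what I have in mind (a sketch, assuming model and idata from above; the +1.0 shift is an arbitrary illustrative intervention, and pm.set_data stands in for whatever pm.do-style intervention is appropriate):

# Shift x only, holding y and z at their observed values
X_shift = df[["x", "y", "z"]].values.copy()
X_shift[:, 0] += 1.0

with model:
    pm.set_data({"X": X_shift})
    pp_shift = pm.sample_posterior_predictive(idata, var_names=["p"], predictions=True)

# Per-observation effect of the x shift in probability space
delta_p = (pp_shift.predictions["p"] - idata.posterior["p"]).mean(("chain", "draw"))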

Question 3:
I have the data in non-aggregated form, with 0/1 successes and failures instead of total success/failure counts for each group member. As such, I have the values of (x, y, z) for every observation of success/failure. However, the number of rows is enormous and very slow to train on, and ultimately what I care about most is at the level of p. I have already fit a hierarchical model to the values of (x, y, z) independently to perform sample-size regression for the lower-sample-size observations, which means I already have a full PyMC model for each predictor in the linear model. These numbers are made up, but would it be valid to do something like this?

x = pm.Normal.dist(mu=.2, sigma=2.7)
y = pm.Normal.dist(mu=1, sigma=4.2) 
z = pm.Normal.dist(mu=-1, sigma=1) 

# Linear model
X = pt.stack([pm.draw(x, draws=1000), pm.draw(y, draws=1000), pm.draw(z, draws=1000)]).T

or

x = pm.Normal.dist(mu=x_mus, sigma=x_sigmas)
y = pm.Normal.dist(mu=y_mus, sigma=y_sigmas) 
z = pm.Normal.dist(mu=z_mus, sigma=z_sigmas) 

# Linear model
X = pt.stack([pm.draw(x, draws=1000), pm.draw(y, draws=1000), pm.draw(z, draws=1000)]).T

in the case where x, y, z have different means/sigmas for each group? e.g., x_mus = [1, 1.2, 1.1, 1, 1.1, ...]

I assume that if (x, y, z) co-vary, I could incorporate that information using MvNormal and the covariance? (Something like the sketch below.)
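For instance, keeping the predictors as latent variables inside the model rather than pre-drawing them (a sketch, assuming x_mus/x_sigmas etc. are posterior summaries from the earlier per-predictor fits, plus mus/cov and a "predictor" coord for the correlated case):

with pm.Model(coords=coords) as model:
    # Independent case: one uncertain predictor value per observation
    x_lat = pm.Normal("x_lat", mu=x_mus, sigma=x_sigmas, dims="idx")
    y_lat = pm.Normal("y_lat", mu=y_mus, sigma=y_sigmas, dims="idx")
    z_lat = pm.Normal("z_lat", mu=z_mus, sigma=z_sigmas, dims="idx")
    X = pt.stack([x_lat, y_lat, z_lat], axis=-1)  # shape (idx, 3)

    # Correlated case: a single MvNormal with a shared 3x3 covariance
    # xyz = pm.MvNormal("xyz", mu=mus, cov=cov, dims=("idx", "predictor"))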

Question 4:

Let's say I actually have multiple events; the values above are just for a single event. I want the probability of some event, e, occurring for a combination of (a, b, c) and some data X.

However, I would ultimately like the probability of each event under the constraint that all event probabilities add up to 1.

Is it possible to have a single model that contains multiple binomial models, where each probability p is passed into a multinomial or categorical? Something like:

p_e1 = invlogit(theta_e1)
binom_e1 = pm.Binomial(..., n=n_e1, p=p_e1, observed=observed_e1, dims="idx")

p_e2 = invlogit(theta_e2)
binom_e2 = pm.Binomial(..., n=n_e2, p=p_e2, observed=observed_e2, dims="idx")

p_e3 = invlogit(theta_e3)
binom_e3 = pm.Binomial(..., n=n_e3, p=p_e3, observed=observed_e3, dims="idx")

m = pm.Multinomial(..., n=[n_e1, n_e2, n_e3], p=[p_e1, p_e2, p_e3])

It feels like the binomials shouldn't be there, with the multinomial already being the generalized binomial…

I have been able to do multinomial regression with continuous variables, and I used a Dirichlet-multinomial when only observing counts; however, I want both the hierarchical effects and the linear regression model as part of my estimation of each individual p. (Is it something like the sketch below?)
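In other words, something like this (a sketch, assuming counts is an (n_idx, n_events) array of per-event success counts, an added "event" coord, and theta_stack built by stacking one linear predictor per event from the hierarchical effects and linear model above):

coords["event"] = ["e1", "e2", "e3"]

with pm.Model(coords=coords) as model:
    # theta_stack: dims ("idx", "event"), e.g. pt.stack([theta_e1, theta_e2, theta_e3], axis=-1)
    p = pm.Deterministic("p", pm.math.softmax(theta_stack, axis=-1), dims=("idx", "event"))
    obs = pm.Multinomial("obs", n=counts.sum(axis=-1), p=p,
                         observed=counts, dims=("idx", "event"))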

Sorry, I know this is a beefy question, but since the parts are all related to the same model I just threw them in one. Thanks!