Learning Row-Wise Probabilities of a 2D Binary Matrix

Hi all,

Let me start by saying, many thanks for this great tool and community!

I am working on a project that involves sampling network structure, and I wanted to try using PyMC with binary representations of network edges (i.e. a ‘1’ represents the presence of an edge, ‘0’ a non-edge). My idea is to let network observations “trickle down” through a Bayesian network: each node in the Bayes net represents the presence of an edge between related nodes in the observed network. For now, I focus only on probabilities of connections conditioned on the presence or absence of other connections in the observed network (i.e. the fact that a node of type A has “n” connections to nodes of type B informs the prior on the number of connections that node has to nodes of type C).

Unfortunately, I’ve had trouble getting my simple proof-of-concept script to do what I want. More details below:

import pymc as pm
import pytensor.tensor as pt

# This represents the connections of two nodes of type A to nodes of type B
#                        NodeA_1   NodeA_2
connections_to_node_b = [[0,0,0], [1,1,1]]
# This represents the connections of two nodes of type A to nodes of type C
#                        NodeA_1   NodeA_2
connections_to_node_c = [[1,1,1], [0,0,0]]
# We see that NodeA_1 connects only to nodes of type C, and NodeA_2 connects
# only to nodes of type B

# Now I want to let connections_to_node_b define the prior for 
# connections_to_node_c  as a linear relationship
with pm.Model() as model:
    
    # uninformative Jeffreys prior for the first variable
    var1_prior = pm.Beta("var1_prior", 0.5, 0.5)
    var1_posterior = pm.Bernoulli("var1_posterior", p=var1_prior, observed=connections_to_node_b)
    
    # build a linear model to be learned
    intercept = pm.Normal("intercept", mu=0, sigma=1)
    slope = pm.Normal("slope", mu=0, sigma=1)
    
    # take the row-wise mean of the var1 posterior;
    # for now I want to learn just one probability per row
    row_wise_means = pm.Deterministic("means", pt.mean(var1_posterior, axis=1))
    linear = pm.Deterministic("linear", (slope * row_wise_means) + intercept)
    # invlogit to map the linear output to probabilities
    sigmoid = pm.Deterministic("sigmoid", pm.math.invlogit(linear))
    
    # parameterize var2 with probabilities from var1 posterior
    alpha = pm.Deterministic("alpha", sigmoid + 0.01)
    beta = pm.Deterministic("beta", (1 - sigmoid) + 0.01)
    var2_prior = pm.Beta("var2_prior", alpha, beta)
    # expand dims to get a (2, 1) prior representing
    # the probability of a "1" for each row
    var2_prior_expanded = pt.expand_dims(var2_prior, axis=1)
    var2_posterior = pm.Bernoulli("var2_posterior", p=var2_prior_expanded, observed=connections_to_node_c)

with model:
    trace = pm.sample(1000, tune=1000, target_accept=0.95, max_treedepth=100)
    posterior = pm.sample_posterior_predictive(trace, extend_inferencedata=True)

In spite of a few divergences, the trace plot shows that two distinct probabilities are learned, one for each row of connections_to_node_c.
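
For reference, I check the learned probabilities with something like the following (ArviZ summary and trace plot; var2_prior holds the two per-row probabilities, nothing exotic):

import arviz as az

# summary statistics and trace plot for the two per-row probabilities
print(az.summary(trace, var_names=["var2_prior"]))
az.plot_trace(trace, var_names=["var2_prior"])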

Unfortunately, the posterior predictive samples do not reflect these probabilities: the proportion of sampled 1s vs. 0s is nearly identical between the two “nodes” (rows of connections_to_node_c).
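
The proportions I am looking at are computed roughly like this (assuming the posterior predictive array has dimensions (chain, draw, row, column)):

# posterior predictive samples of var2_posterior: shape (chains, draws, 2, 3)
pp = trace.posterior_predictive["var2_posterior"].values
# average over chains, draws, and columns -> proportion of 1s per row
print(pp.mean(axis=(0, 1, 3)))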

I am very curious why the sampled data might not reflect the learned probabilities in this case. Interestingly, when the code is changed to use an uninformative prior that is not inherited from another sampled distribution, the desired result is achieved (as shown below).

connections_to_node_c = [[1,1,1], [0,0,0]]
with pm.Model() as model:
    var2_prior = pm.Beta("var2_prior", 0.5, 0.5, size=(2,))
    var2_posterior = pm.Bernoulli("var2_posterior", p=pt.expand_dims(var2_prior, axis=1), observed=connections_to_node_c)

with model:
    trace = pm.sample(1000, tune=1000)
    posterior = pm.sample_posterior_predictive(trace, extend_inferencedata=True)

The learned probabilities are similar to those in the first example. However, now the bar plot shows that the samples do a good job of reflecting these probabilities.
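
In case it matters, the bar plot comes from something along these lines (plain matplotlib; the learned probabilities are posterior means of var2_prior, and the sampled proportions come from the posterior predictive as above):

import numpy as np
import matplotlib.pyplot as plt

learned = trace.posterior["var2_prior"].mean(dim=("chain", "draw")).values
sampled = trace.posterior_predictive["var2_posterior"].values.mean(axis=(0, 1, 3))

x = np.arange(2)
plt.bar(x - 0.2, learned, width=0.4, label="learned probability")
plt.bar(x + 0.2, sampled, width=0.4, label="sampled proportion of 1s")
plt.xticks(x, ["NodeA_1", "NodeA_2"])
plt.legend()
plt.show()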


I would really appreciate any suggestions, even if it is just ideas for where to start debugging! Many thanks!