Difference between shape = (2,1) & shape = 2

Hi,

Recently, I ran into something very strange. I am working through some toy examples for code verification, and I believed the two models below should return the same results. However, only the second model gives me correct results (the first model gives me posterior distributions centered around 0)… The only difference between the two models is whether I specified shape as a tuple or as an int. I really want to know why this happens. Could someone please explain what is going on?

    import pymc as pm
    import arviz as az

    # df_x: 2-D design matrix, df_y: observed targets (defined elsewhere)
    with pm.Model() as model1:
        n_model_1 = df_x.shape[1]

        sigma_ard = pm.Gamma("sigma_ard", alpha=2, beta=0.01, shape=n_model_1)
        beta = pm.Normal("beta", mu=0, sigma=sigma_ard, shape=n_model_1)
        sigma_ard0 = pm.Gamma("sigma_ard0", alpha=2, beta=0.01)
        beta0 = pm.Normal("beta0", mu=0, sigma=sigma_ard0)
        sigma = pm.Gamma("sigma", alpha=2, beta=0.01)

        y = pm.Normal("y", mu=pm.math.dot(df_x, beta) + beta0, sigma=sigma, observed=df_y)

        sample = pm.sample()

    az.plot_posterior(sample, var_names="beta");


    with pm.Model() as model2:
        n_model_1 = df_x.shape[1]

        sigma_ard = pm.Gamma("sigma_ard", alpha=2, beta=0.01, shape=(n_model_1, 1))
        beta = pm.Normal("beta", mu=0, sigma=sigma_ard, shape=(n_model_1, 1))
        sigma_ard0 = pm.Gamma("sigma_ard0", alpha=2, beta=0.01)
        beta0 = pm.Normal("beta0", mu=0, sigma=sigma_ard0)
        sigma = pm.Gamma("sigma", alpha=2, beta=0.01)

        y = pm.Normal("y", mu=pm.math.dot(df_x, beta) + beta0, sigma=sigma, observed=df_y)

        sample = pm.sample()

    az.plot_posterior(sample, var_names="beta");

Hi, that’s a really good question. It comes down to the broadcasting rules built into systems like numpy and pytensor. Here’s a weird thing about numpy:

np.array([3,4]) + np.array([[7],[6]])

yields:

array([[10, 11],
       [ 9, 10]])

That’s because it adds 7 to [3,4] for the top row and 6 to [3,4] for the bottom row. In regular linear algebra, that’s not a valid operation: you can only add vectors of the same shape. At some point the numpy/MATLAB projects decided to implicitly add dimensions to make operations like this valid.
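
If it helps, you can check what shape broadcasting will produce without building any arrays; np.broadcast_shapes (available in NumPy 1.20+) reports the result:

    import numpy as np

    # A (2,) vector broadcast against a (2, 1) column blows up into a (2, 2) matrix
    print(np.broadcast_shapes((2,), (2, 1)))  # (2, 2)

    # Two vectors of the same shape stay a vector, as ordinary linear algebra expects
    print(np.broadcast_shapes((2,), (2,)))    # (2,)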

The same broadcasting rules give rise to your problem. Suppose my data matrix is 5x2 and my coefficient vector is 2x1:

df_x = np.random.normal(size=(5,2))
beta = np.array([[2],[3]])

Then my output vector is 5x1:

df_y = df_x @ beta

It is important that I preserve these shape requirements when I code the PyMC model. When I use shape = 2, I lose those shape constraints: mu turns out to be a (5,) vector, but df_y is (5,1).
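
A quick numpy check makes the mismatch concrete (reusing df_x, beta, and df_y from above; beta.ravel() mimics what shape = 2 gives you):

    beta_flat = beta.ravel()         # shape (2,), the analogue of shape = 2
    print((df_x @ beta_flat).shape)  # (5,)  -- the column dimension is gone
    print(df_y.shape)                # (5, 1)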

    with pm.Model() as model1:
        beta = pm.Normal("beta", mu=0, sigma=1, shape=2)

        mu = pm.math.dot(df_x, beta)

        y = pm.Normal("y", mu=mu, observed=df_y)

If you try to draw from y, it will throw an error saying that a (5,5) array is not compatible with your observations:

pm.draw(y)

Where did the (5,5) come from? The same funny broadcasting rules: if you take the (5,) mu vector and add a (5,1) column of random noise, you get a (5,5) matrix. Yet we told PyMC that the output of y should look like df_y, a (5,1). So it throws the error.
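
Here is the blow-up in plain numpy (the values are zeros because only the shapes matter):

    import numpy as np

    mu = np.zeros(5)           # shape (5,): what dot(df_x, beta) gives with shape = 2
    noise = np.zeros((5, 1))   # shape (5, 1): one noise draw per observation, as a column

    print((mu + noise).shape)  # (5, 5) -- broadcasting turns the sum into a matrix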

If you flatten df_y into a (5,) vector, the problem goes away. So you’ll just want to stay consistent: either work with column vectors everywhere or with flattened vectors everywhere.
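
For instance, here is a minimal sketch of the column-vector version, assuming df_x is (5,2) and df_y is (5,1) as above (flattening both beta and df_y would work equally well):

    with pm.Model() as model_consistent:
        # beta is an explicit (2, 1) column, so mu keeps its column dimension
        beta = pm.Normal("beta", mu=0, sigma=1, shape=(2, 1))

        mu = pm.math.dot(df_x, beta)  # (5, 2) @ (2, 1) -> (5, 1), matching df_y

        y = pm.Normal("y", mu=mu, observed=df_y)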


That cleared up my problem completely. Thanks for your detailed explanation!

Best,

Jinyoung
