PyTensor errors from using a sparse precision matrix for Gaussian data

I have multivariate normal data which is correlated according to a DAG structure. I've been struggling to debug a specific error: `ValueError: MulSD.grad returned a term with 2 dimensions, but 0 are required.`

Here’s my model:

# Model for y ~ N(μ, Λ^-1) where Λ = (I - γW)^T Ω (I - γW)
# and W is a sparse directed graph, γ is an off-diagonal precision parameter
# Number of rows is `n` and number of covariates is `p`

with pm.Model() as model:
  ω = pm.HalfNormal('ω', shape=n)
  β = pm.Normal("β", shape=p)
  γ = pm.HalfNormal("γ")
  μ = X @ β
  δ = pm.math.constant(y) - μ
  I = sparse.square_diagonal(pt.ones(n))
  W = sparse.as_sparse_or_tensor_variable(W_scipy) * γ

  IminusW = sparse.sub(I, W)

  Ω = sparse.square_diagonal(ω)
  Λ = sparse.basic.mul(sparse.basic.mul(IminusW, Ω), IminusW)
  Λδ = sparse.dot(Λ, δ[:, None])

  quaddist = pt.dot(δ[None, :], Λδ).squeeze()  # works with quaddist = pt.dot(δ, δ)

  logdet = pt.sum(pt.log(ω))

  norm = -0.5 * pm.math.constant(n) * np.log(2 * np.pi)  # -(n/2) * log(2π)
  logp = norm - 0.5 * quaddist + 0.5 * logdet
  pm.Potential('likelihood', logp)

  trace = pm.sample(chains=1, cores=1, tune=5, draws=5)
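
As a side note, the log-density algebra itself can be sanity-checked outside PyMC with plain NumPy/SciPy. Here's a small sketch with a made-up 3-node strictly triangular W (so `det(I - γW) = 1` and `log|Λ| = Σ log ω`); all values are illustrative:

```python
import numpy as np
from scipy.stats import multivariate_normal

rng = np.random.default_rng(0)
n = 3
# Strictly upper-triangular W (a DAG), so det(I - γW) = 1
W = np.array([[0.0, 1.0, 0.0],
              [0.0, 0.0, 1.0],
              [0.0, 0.0, 0.0]])
gamma = 0.5
omega = np.array([1.0, 2.0, 0.5])  # diagonal of Ω
mu = np.zeros(n)
y = rng.normal(size=n)

IminusW = np.eye(n) - gamma * W
Lam = IminusW.T @ np.diag(omega) @ IminusW  # precision matrix Λ
delta = y - mu

quaddist = delta @ Lam @ delta
logdet = np.log(omega).sum()  # log|Λ|, since det(I - γW) = 1
logp = -0.5 * n * np.log(2 * np.pi) - 0.5 * quaddist + 0.5 * logdet

# Reference: evaluate the same density via the covariance Σ = Λ^-1
ref = multivariate_normal(mean=mu, cov=np.linalg.inv(Lam)).logpdf(y)
```

If `logp` and `ref` agree, the hand-written quadratic form and log-determinant are at least internally consistent.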

Any advice is greatly appreciated!

Here’s a notebook with a runnable example showing what I’m trying to do:

Looks like a bug in the gradient of `MulSD` (sparse * dense). `pt.grad(W.sum(), γ)` is sufficient to trigger the error. My guess is that this is related to broadcasting: PyTensor knows the output of this should be the shape of W, but then it sees that γ is a scalar.

Simplifying even further, here's the `MulSD` graph:

W_pt = sparse.as_sparse_or_tensor_variable(W_scipy)
x = pt.dscalar('x')
(W_pt * x).dprint()

MulSD [id A]
 ├─ SparseConstant{csr,float64,shape=(3, 3),nnz=2} [id B]
 └─ x [id C]

And here's a dense multiplication of the same type:

W_pt2 = pt.dmatrix('W_pt')
(W_pt2 * x).dprint()

Mul [id A]
 ├─ W_pt [id B]
 └─ ExpandDims{axes=[0, 1]} [id C]
    └─ x [id D]

Note the `ExpandDims` that makes `x.ndim == 2`. I suspect we should also be doing this dimension broadcasting in the sparse case, at least in the gradients.
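
For comparison, this is the promotion NumPy does implicitly whenever you multiply a matrix by a scalar (a minimal NumPy illustration, not PyTensor):

```python
import numpy as np

W = np.array([[0.0, 1.0],
              [0.0, 0.0]])
x = 2.0

# NumPy broadcasts the 0-d scalar against the 2-d matrix; the
# ExpandDims{axes=[0, 1]} in the dense PyTensor graph makes the same
# promotion to ndim == 2 explicit before the elementwise Mul.
explicit = W * np.reshape(np.asarray(x), (1, 1))
implicit = W * x
```

Both paths produce the same (2, 2) result; the sparse `MulSD` path appears to skip this promotion.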

If you do the `expand_dims` yourself, it bypasses the error. But it would still be nice if you opened an issue :slight_smile:

Just kidding: it makes the symbolic gradients work, but leads to a new error when trying to evaluate the graph.

  W = sparse.as_sparse_or_tensor_variable(W_scipy) * pt.expand_dims(γ, [0, 1])

Thanks a ton, @jessegrabowski!

I didn't realize the problem was the scalar multiplication. The following makes it work now and is functionally equivalent to the matrix-scalar broadcasting:

 Γ = sparse.square_diagonal(pt.ones(n) * γ)
 W = sparse.basic.mul(Γ, sparse.as_sparse_or_tensor_variable(W_scipy))
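
For reference, the matrix-product identity being relied on here is that left-multiplying by Γ = diag(γ, ..., γ) scales every entry, so ΓW = γW. A quick check of that identity with `scipy.sparse` (illustrative values, and worth double-checking that the PyTensor op performs the intended product rather than an elementwise multiply):

```python
import numpy as np
import scipy.sparse as sp

gamma = 0.5
W_scipy = sp.csr_matrix(np.array([[0.0, 1.0, 0.0],
                                  [0.0, 0.0, 1.0],
                                  [0.0, 0.0, 0.0]]))
n = W_scipy.shape[0]

# Γ = diag(γ, ..., γ): left-multiplying scales every row of W by γ,
# so the sparse-sparse product Γ @ W equals the sparse-scalar product γ * W.
Gamma = sp.diags(np.full(n, gamma))
scaled = Gamma @ W_scipy
```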