PyTensor errors from using a sparse precision matrix for Gaussian data

I have multivariate normal data which is correlated according to a DAG structure. I've been struggling to debug a specific error: `ValueError: MulSD.grad returned a term with 2 dimensions, but 0 are required.`

Here’s my model:

# Model for y ~ N(μ, Λ^-1) where Λ = (I - γW)^T Ω (I - γW)
# and W is a sparse directed graph, γ is an off-diagonal precision parameter
# Number of rows is `n` and number of covariates is `p`

with pm.Model() as model:
  ω = pm.HalfNormal('ω', shape=n)
  β = pm.Normal("β", shape=p)
  γ = pm.HalfNormal("γ")
  μ = X @ β
  δ = pm.math.constant(y) - μ
  I = sparse.square_diagonal(pt.ones(n))
  W = sparse.as_sparse_or_tensor_variable(W_scipy) * γ

  IminusW = sparse.sub(I, W)

  Ω = sparse.square_diagonal(ω)
  Λ = sparse.basic.mul(sparse.basic.mul(IminusW, Ω), IminusW)
  Λδ = sparse.dot(Λ, δ[:, None])

  quaddist = pt.dot(δ[None, :], Λδ).squeeze()  # works with quaddist = pt.dot(δ, δ)

  logdet = pt.sum(pt.log(ω))

  norm = -0.5 * pm.math.constant(n) * np.log(2 * np.pi)  # -(n/2) * log(2π)
  logp = norm - 0.5 * quaddist + 0.5 * logdet
  pm.Potential('likelihood', logp)

  trace = pm.sample(chains=1, cores=1, tune=5, draws=5)
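
As a side note, the log-density algebra itself can be sanity-checked outside PyMC with plain NumPy/SciPy. Here's a small sketch with a made-up 3-node strictly triangular W (so `det(I - γW) = 1` and `log|Λ| = Σ log ω`); all values are illustrative:

```python
import numpy as np
from scipy.stats import multivariate_normal

rng = np.random.default_rng(0)
n = 3
# Strictly upper-triangular W (a DAG), so det(I - γW) = 1
W = np.array([[0.0, 1.0, 0.0],
              [0.0, 0.0, 1.0],
              [0.0, 0.0, 0.0]])
gamma = 0.5
omega = np.array([1.0, 2.0, 0.5])  # diagonal of Ω
mu = np.zeros(n)
y = rng.normal(size=n)

IminusW = np.eye(n) - gamma * W
Lam = IminusW.T @ np.diag(omega) @ IminusW  # precision matrix Λ
delta = y - mu

quaddist = delta @ Lam @ delta
logdet = np.log(omega).sum()  # log|Λ|, since det(I - γW) = 1
logp = -0.5 * n * np.log(2 * np.pi) - 0.5 * quaddist + 0.5 * logdet

# Reference: evaluate the same density via the covariance Σ = Λ^-1
ref = multivariate_normal(mean=mu, cov=np.linalg.inv(Lam)).logpdf(y)
```

If `logp` and `ref` agree, the hand-written quadratic form and log-determinant are at least internally consistent.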

Any advice is greatly appreciated!

Here’s a notebook with a runnable example showing what I’m trying to do:

Looks like a bug in the gradient of `MulSD` (sparse * dense). `pt.grad(W.sum(), γ)` is sufficient to trigger the error. My guess is that this is related to broadcasting: PyTensor knows the output of this should be the shape of W, but then it sees that γ is a scalar.

Simplifying even further, here's the `MulSD` graph:

W_pt = sparse.as_sparse_or_tensor_variable(W_scipy)
x = pt.dscalar('x')
(W_pt * x).dprint()

MulSD [id A]
 ├─ SparseConstant{csr,float64,shape=(3, 3),nnz=2} [id B]
 └─ x [id C]

And here's a dense multiplication of the same type:

W_pt2 = pt.dmatrix('W_pt')
(W_pt2 * x).dprint()

Mul [id A]
 ├─ W_pt [id B]
 └─ ExpandDims{axes=[0, 1]} [id C]
    └─ x [id D]

Note the `ExpandDims` that makes `x.ndim == 2`. I suspect we should also be doing this dimension broadcasting in the sparse case, at least in the gradients.
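
For comparison, this is the promotion NumPy does implicitly whenever you multiply a matrix by a scalar (a minimal NumPy illustration, not PyTensor):

```python
import numpy as np

W = np.array([[0.0, 1.0],
              [0.0, 0.0]])
x = 2.0

# NumPy broadcasts the 0-d scalar against the 2-d matrix; the
# ExpandDims{axes=[0, 1]} in the dense PyTensor graph makes the same
# promotion to ndim == 2 explicit before the elementwise Mul.
explicit = W * np.reshape(np.asarray(x), (1, 1))
implicit = W * x
```

Both paths produce the same (2, 2) result; the sparse `MulSD` path appears to skip this promotion.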

If you do the `expand_dims` yourself, it bypasses the error. But it would still be nice if you opened an issue :slight_smile:

Just kidding: it makes the symbolic gradients work, but leads to a new error when trying to evaluate the graph.

  W = sparse.as_sparse_or_tensor_variable(W_scipy) * pt.expand_dims(γ, [0, 1])

Thanks a ton, @jessegrabowski!

I didn't realize the problem was the scalar multiplication. The following makes it work now and is functionally equivalent to the matrix-scalar broadcasting:

 Γ = sparse.square_diagonal(pt.ones(n) * γ)
 W = sparse.basic.mul(Γ, sparse.as_sparse_or_tensor_variable(W_scipy))
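
For reference, the matrix-product identity being relied on here is that left-multiplying by Γ = diag(γ, ..., γ) scales every entry, so ΓW = γW. A quick check of that identity with `scipy.sparse` (illustrative values, and worth double-checking that the PyTensor op performs the intended product rather than an elementwise multiply):

```python
import numpy as np
import scipy.sparse as sp

gamma = 0.5
W_scipy = sp.csr_matrix(np.array([[0.0, 1.0, 0.0],
                                  [0.0, 0.0, 1.0],
                                  [0.0, 0.0, 0.0]]))
n = W_scipy.shape[0]

# Γ = diag(γ, ..., γ): left-multiplying scales every row of W by γ,
# so the sparse-sparse product Γ @ W equals the sparse-scalar product γ * W.
Gamma = sp.diags(np.full(n, gamma))
scaled = Gamma @ W_scipy
```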