Why would scipy/numpy wrapped in @as_op cause much faster sampling than using pytensor operations?

Deterministic just stores the result of the computation in the trace; the name is perhaps unfortunate. More background on why it can slow things down here: Parallelizing chains with custom likelihood on multiple cores - #18 by aseyboldt