Oh cool, I didn't know that some samplers are designed to make use of shared variables like that. Here's a quick test based on a very simple model. However, after seeing the numbers, I suspect a recent coding sprint may have broken my ability to benefit from the shared-variable optimization. I recently reworked my code to expose only one likelihood Op to the pymc model, rather than an individual Op for each likelihood node (this made it easier to debug the calls to each node, implement the caching feature, and investigate why each sampling step was taking so long). If this diagnosis holds, I'll have to do some work to re-expose each likelihood node as its own Op.
Testing models such as the following:
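(A simplified sketch of what I'm testing; a Gaussian stands in for my real drift-diffusion likelihood, and the variable names are mine.)

```python
import numpy as np
import pymc as pm
import pytensor.tensor as pt

M = 4  # number of drift rate parameters (= number of likelihood nodes)

# Hypothetical stand-in data, one dataset per likelihood node.
rng = np.random.default_rng(0)
data = [rng.normal(loc=i, scale=1.0, size=100) for i in range(M)]

with pm.Model() as model:
    # One scalar drift rate per node, so each gets its own step method.
    v = [pm.Normal(f"v{i}", mu=0.0, sigma=2.0) for i in range(M)]

    # In my reworked code this whole sum lives inside a single custom Op,
    # so pymc sees just one likelihood node; here a Gaussian logp stands
    # in for the real per-node likelihood.
    logps = [pm.logp(pm.Normal.dist(mu=v[i], sigma=1.0), data[i]).sum()
             for i in range(M)]
    pm.Potential("likelihood", pt.sum(pt.stack(logps)))

    # tune=0 and chains=1 so the counters reflect exactly 100 sampling steps.
    trace_slice = pm.sample(draws=100, tune=0, chains=1, step=pm.Slice())
    trace_metro = pm.sample(draws=100, tune=0, chains=1, step=pm.Metropolis())
```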

In the above model, M=4 because we're sampling four drift rate parameters. As mentioned, there are formally four likelihood nodes, but pymc just sees one.
I can count how many times the aforementioned logp cache is hit for each likelihood node. Every cache hit means that the likelihood node was passed a parameter set identical to one it had already seen (a sketch of the counting follows the numbers below). For a few values of M (here, both the model dimensionality and the number of likelihood nodes), these are the per-node cache counts after 100 sampling steps, with Slice and Metropolis run separately:
M=1:
Slice: [100]
Metropolis: [100]
M=2:
Slice: [765, 841]
Metropolis: [300, 300]
M=4:
Slice: [2136, 2229, 2131, 2223]
Metropolis: [700, 700, 700, 700]
M=8:
Slice: [5216, 5204, 5138, 5153, 5153, 5133, 5145, 5118]
Metropolis: [1500, 1500, 1500, 1500, 1500, 1500, 1500, 1500]
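For reference, the counting itself is nothing fancy: per node, I just memoize previously seen parameter sets, roughly like this (a simplified sketch of my caching code; names are mine):

```python
import numpy as np

class LogpCache:
    """Per-node memo of previously seen parameter sets (simplified sketch)."""

    def __init__(self, n_nodes):
        self.hits = [0] * n_nodes          # the counts reported above
        self._seen = [dict() for _ in range(n_nodes)]

    def logp(self, node, params, compute):
        key = tuple(np.atleast_1d(params))
        if key in self._seen[node]:
            self.hits[node] += 1           # identical parameter set as before
            return self._seen[node][key]
        value = compute(params)            # the expensive likelihood call
        self._seen[node][key] = value
        return value
```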
Big differences between the two samplers, but Metropolis still racks up a lot of cache hits. Per node, the Metropolis counts are exactly 100 * (2M - 1), which looks consistent with each of the M per-parameter steps evaluating the full joint logp at both the current and the proposed point, touching every node even though only one parameter changed. So I figure Metropolis isn't making use of the shared optimization here; the gap between the two samplers just reflects Slice making many more logp calls per sampling step. This is still evidence for the redundant-computations point, but it doesn't yet show whether the Metropolis implementation addresses it.
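For context, here's how I understand the shared-variable trick, sketched with plain pytensor (this illustrates the idea only, not pymc's actual internals):

```python
import numpy as np
import pytensor
import pytensor.tensor as pt

# Two parameters contribute to one joint logp.
a = pytensor.shared(np.array(0.0), name="a")  # held fixed by this stepper
b = pt.dscalar("b")                           # the parameter being stepped

logp = -0.5 * a**2 - 0.5 * b**2               # toy joint logp

# Only b is an explicit input; a enters the compiled function as a shared
# variable, so another stepper can update it in place with a.set_value(...)
# and every stepper keeps reusing the same compiled function.
f = pytensor.function([b], logp)

print(f(1.0))     # uses the current shared value of a
a.set_value(2.0)  # another stepper moves a; no recompilation needed
print(f(1.0))
```

With everything folded into a single black-box Op, there's no per-node structure for this kind of optimization to exploit, which would explain the counts above.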