Minibatching might not work well here: because the observed data is mostly sparse, some batches could contain very little data. Moreover, the formulation sums over all pairs of pixels, whereas a minibatch would sum over only a few pixels (i.e. the pixels in that batch), so wouldn't you get a different model?
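To make the pairwise-sum concern concrete, here is a minimal NumPy sketch. It is not the model from this thread: the pair term (a squared difference), the data, and the function names `pairwise_sum` and `minibatch_estimate` are all placeholders I made up for illustration. It compares the full sum over all pixel pairs with the sum over pairs inside a random batch, both raw and rescaled by the ratio of pair counts.

```python
import numpy as np

def pairwise_sum(x):
    """Sum of (x_i - x_j)^2 over all unordered pairs i < j.

    The squared difference is only a stand-in for whatever pair term
    the real model uses.
    """
    diff = x[:, None] - x[None, :]
    # The full matrix counts every pair twice (the diagonal is zero), hence 0.5.
    return 0.5 * np.sum(diff ** 2)

def minibatch_estimate(x, batch_size, rng):
    """Return (raw, rescaled) pairwise sums computed on one random batch."""
    n, b = len(x), batch_size
    idx = rng.choice(n, size=b, replace=False)
    raw = pairwise_sum(x[idx])
    # A given pair lands in the batch with probability b(b-1) / (n(n-1)),
    # so multiplying by the inverse makes the estimate unbiased.
    scale = (n * (n - 1)) / (b * (b - 1))
    return raw, scale * raw

rng = np.random.default_rng(0)
x = rng.normal(size=200)  # hypothetical "pixel" values

full = pairwise_sum(x)
estimates = np.array([minibatch_estimate(x, 20, rng) for _ in range(2000)])
raw_mean, scaled_mean = estimates.mean(axis=0)

print(f"full sum:              {full:.1f}")
print(f"mean raw batch sum:    {raw_mean:.1f}")
print(f"mean rescaled batch:   {scaled_mean:.1f}")
```

In this toy setting the raw batch sum is far smaller than the full sum, while the rescaled version matches it on average. Any single batch can still be far off, and with sparse observations that variance could be large, so the rescaling addresses only the bias part of the question.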
The formulation reminds me a lot of mean-field theory and the Ising model. Would a similar decomposition work here as well?