Theano.scan and pm.Model() are strangely incompatible

With 4 million data points, I would say you definitely need to either subsample your data or find some kind of approximation.
FYI, here is a post discussing how to implement an HMM by marginalizing out the hidden states: How to marginalized Markov Chain with categorical?
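
To give a rough idea of what marginalizing the hidden states involves, here is a minimal sketch of the forward algorithm in plain NumPy. It is not taken from the linked post; the function names (`logsumexp`, `hmm_marginal_loglik`) and the toy numbers are my own assumptions for illustration. Inside a `pm.Model()` the same recursion would have to be written with Theano ops rather than NumPy.

```python
import numpy as np


def logsumexp(a, axis):
    """Numerically stable log(sum(exp(a))) along one axis."""
    m = np.max(a, axis=axis, keepdims=True)
    return np.squeeze(m, axis=axis) + np.log(np.sum(np.exp(a - m), axis=axis))


def hmm_marginal_loglik(log_init, log_trans, log_emit):
    """Log-likelihood of an observed sequence with hidden states summed out.

    log_init  : (K,)   log P(z_0 = k)
    log_trans : (K, K) log P(z_t = j | z_{t-1} = i) at index [i, j]
    log_emit  : (T, K) log p(y_t | z_t = k) for the observed y_t
    """
    # log_alpha[k] = log p(y_0..y_t, z_t = k)
    log_alpha = log_init + log_emit[0]
    for t in range(1, log_emit.shape[0]):
        # Sum over the previous state i, then account for the new observation.
        log_alpha = logsumexp(log_alpha[:, None] + log_trans, axis=0) + log_emit[t]
    return logsumexp(log_alpha, axis=0)


# Toy example (hypothetical numbers): 2 hidden states, categorical observations.
init = np.array([0.6, 0.4])
trans = np.array([[0.7, 0.3],
                  [0.2, 0.8]])
emit_probs = np.array([[0.9, 0.1],   # state 0: P(y=0), P(y=1)
                       [0.3, 0.7]])  # state 1: P(y=0), P(y=1)
y = np.array([0, 1, 1, 0])

# Pick out p(y_t | z_t = k) for each time step, giving shape (T, K).
log_emit = np.log(emit_probs[:, y].T)
print(hmm_marginal_loglik(np.log(init), np.log(trans), log_emit))
```

The cost is O(T * K^2) and no discrete latent variable has to be sampled, which is the main reason the marginalized form tends to be preferred for long sequences. It is still one sequential pass over the data, so with millions of points subsampling or an approximation may be needed anyway.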