Understand root cause of high memory utilization

Yes, I think it's the number of parameters. I can easily imagine the computational graph (for the logp and its gradient) taking up whatever is left of your RAM.
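To see why the parameter count matters, here is a rough back-of-envelope sketch. It assumes that every intermediate array the logp/gradient graph keeps alive holds one float64 value per parameter. The function name and the `n_intermediates` count are hypothetical, chosen for illustration only; they are not measured from any real model or library. Under that assumption, memory grows linearly with the number of parameters, so a large enough model can exhaust RAM on its own.

```python
# Back-of-envelope estimate of memory held by a logp/gradient graph.
# ASSUMPTION: each intermediate array in the graph stores one float64 per
# parameter; `n_intermediates` is a hypothetical count, not a measured value.
BYTES_PER_FLOAT64 = 8


def estimate_graph_memory_gib(n_params: int, n_intermediates: int) -> float:
    """Return an approximate memory footprint in GiB (linear in n_params)."""
    total_bytes = n_params * n_intermediates * BYTES_PER_FLOAT64
    return total_bytes / 1024**3


if __name__ == "__main__":
    # Compare a few model sizes, using a hypothetical 20 intermediates.
    for n_params in (10_000, 1_000_000, 100_000_000):
        gib = estimate_graph_memory_gib(n_params, n_intermediates=20)
        print(f"{n_params:>11,} params -> ~{gib:.2f} GiB")
```

The real number of intermediates depends on the model and on how the graph is optimized, so treat this only as a way to reason about scaling, not as a prediction of actual usage.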