Neural networks are nothing more than stacked affine transforms with nonlinearities in between.
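To make that concrete, here's a minimal NumPy sketch (weights, shapes, and the two-layer setup are all just illustrative choices, not anything from the talk): each layer is an affine transform `x @ W + b` followed by a nonlinearity, and a sigmoid output layer is literally logistic regression on the previous layer's features.

```python
import numpy as np

def layer(x, W, b, nonlinearity=np.tanh):
    # One layer: affine transform (x @ W + b), then a nonlinearity.
    return nonlinearity(x @ W + b)

rng = np.random.default_rng(0)
x = rng.normal(size=(5, 3))  # 5 samples, 3 features

# Hidden layer: affine map from 3 to 4 dimensions, then tanh.
h = layer(x, rng.normal(size=(3, 4)), np.zeros(4))

# Output layer: affine map from 4 to 1 dimension, then sigmoid --
# i.e. logistic regression applied to the hidden features.
y = layer(h, rng.normal(size=(4, 1)), np.zeros(1),
          nonlinearity=lambda z: 1 / (1 + np.exp(-z)))

print(y.shape)  # (5, 1)
```

Drop the hidden layer and the nonlinearity and you're back to plain linear regression; keep just the sigmoid output and you have logistic regression.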
I gave a talk explaining this at PyData NYC, “An attempt at demystifying Bayesian deep learning.”
(On my phone right now, so I can’t conveniently get the link, but you’ll definitely be able to find it via Google.)
If you know how to write linear and logistic regression in PyMC3, you’ll be able to see the parallels to deep learning in that talk.