Using sampling to decide network architecture

I don't think you can do this in PyMC3, since it currently does not accept a tensor as the shape argument when defining random variables. As a workaround, you can fix a maximum number of layers and neurons, use those maxima as the shapes of the random variables, and set some of the coefficients/weights to zero according to some distribution.
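To make the workaround concrete, here is a rough sketch of the masking idea in plain Python rather than PyMC3. The weight matrix always has the maximum shape, and a randomly drawn 0/1 mask switches off the unused neurons. The names `MAX_UNITS`, `sample_mask` and `masked_hidden_layer`, the uniform draw for the active width, and the tanh activation are all assumptions for illustration. In an actual model the mask would presumably come from a discrete random variable instead of `random`.

```python
import math
import random

MAX_UNITS = 8  # assumed upper bound on the hidden-layer width


def sample_mask(rng, max_units=MAX_UNITS):
    """Draw an active width k uniformly from 1..max_units and return a 0/1 mask."""
    k = rng.randint(1, max_units)
    return [1.0 if i < k else 0.0 for i in range(max_units)]


def masked_hidden_layer(x, weights, mask):
    """Fixed-shape tanh layer; units with mask 0 contribute exactly zero."""
    out = []
    for j, m in enumerate(mask):
        pre = sum(x_i * weights[i][j] for i, x_i in enumerate(x))
        out.append(m * math.tanh(pre))
    return out


rng = random.Random(0)
x = [0.5, -1.2, 0.3]  # one made-up input with three features

# The weights always have the maximum shape (n_features x MAX_UNITS).
weights = [[rng.gauss(0.0, 1.0) for _ in range(MAX_UNITS)] for _ in x]

mask = sample_mask(rng)
hidden = masked_hidden_layer(x, weights, mask)

print("active units:", int(sum(mask)), "of", MAX_UNITS)
print("hidden activations:", hidden)
```

The shapes never change between draws, only the mask does, so the effective architecture can vary while every random variable keeps a fixed shape. The same trick extends to layers by masking a whole layer's output and passing its input through unchanged.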