I suspect the model is not converging because it is under-identified: for any given value of a, you have only a single observation. As a result, the relationship between a and ax is only loosely constrained, and a can take on a wide range of values even conditional on the observed data. It looks like you are trying to compensate for the lack of observed data (per value of a) by cranking down the sigma parameter of y_samp, but that seems less than ideal.
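To make the point concrete, here is a rough sketch using a much simpler model than yours. I am assuming a conjugate setup in which each a has a Normal(0, prior_sd) prior and each observation is Normal(a, sigma) with sigma known. The function posterior_sd and the numbers below are purely illustrative and are not taken from your code. It shows that with one observation the posterior on a stays wide unless sigma is made very small, whereas more observations per value of a tighten it without touching sigma.

```python
import math


def posterior_sd(prior_sd, sigma, n_obs):
    """Posterior sd of a normal mean under a Normal(0, prior_sd) prior,
    given n_obs observations with known noise sd `sigma`.
    (Hypothetical helper for illustration; standard conjugate-normal result.)
    """
    # Precisions (inverse variances) add: prior precision + data precision.
    precision = 1.0 / prior_sd**2 + n_obs / sigma**2
    return math.sqrt(1.0 / precision)


prior_sd = 10.0  # assumed prior width on a

# (sigma, number of observations per value of a)
scenarios = [
    (5.0, 1),   # one observation, realistic noise: a is barely pinned down
    (5.0, 25),  # many observations, same noise: data constrain a
    (0.1, 1),   # one observation, sigma cranked down: tight only by assumption
]

for sigma, n_obs in scenarios:
    sd = posterior_sd(prior_sd, sigma, n_obs)
    print(f"sigma={sigma:>4}, n_obs={n_obs:>2} -> posterior sd of a = {sd:.3f}")
```

In the third scenario the posterior is narrow only because sigma was assumed to be tiny, not because the data are informative, which is why shrinking sigma is a poor substitute for more observations per value of a.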
I would encourage you to take a look at the various examples of hierarchical models in the PyMC documentation (e.g., this, this, this, this, this, as well as this) to get an idea of the types of data sets that are typically used with hierarchical models, as well as some of the modeling choices one has to make when designing one.