I tried using a Poisson likelihood instead of the negative binomial one but with the same result.
I am not sure rescaling the data is a valid option. I am essentially trying to implement the DESeq2 model in pymc but I must admit my theoretical background is a bit shaky. As common in RNASeq analysis we have many low counts, and the idea is to make a robust regression on the counts which avoids the infinite difference between data points with zero and non-zero counts. But maybe it is still possible to do using a gamma distribution (of which I have zero experience with).
I will try to generate small set of data that replicates my problem as I am unfortunately not allowed to share the real data.