Correctly defining model

I am new to PyMC3 and am trying to implement a rather simple model, but cannot seem to get it working.
I have observations \textbf{B} with shape (m\times1). Model parameters \textbf{m} of shape (n\times1). And a data kernel \textbf{G} with shape (m\times n).
Forward problem, \textbf{B}=\textbf{Gm}.
In this specific example m=270000 and n=3.

Below my intent was to define each element in \textbf{m} to have a normal prior distribution with mean zero and a standard deviation determined from another place. And that the residuals between should have a standard normal distribution.
When running NUTS with the settings below the problem does not converge. And I do not understand why. Can anyone tell me if the above model indeed defines the prior and likelihood I intended?
And is the matrix vector multiplication correct? I would normally use G @ m in python.
Some might ask if the problem might be with \textbf{B} or \textbf{G}.
The reason I do not think that’s the problem is because I have used the exact same observations and data kernel in STAN’s NUTS implementation and samples it successfully within a minute.

Thanks in advance.

basic_model = pm.Model()

with basic_model:

    # Prior distribution
    m = pm.Normal('m', mu=0, sd=limits/2, shape=(3))

    # Predictions
    pred =,m)

    # Likelihood
    Y_obs = pm.Normal('Y_obs', mu=pred, sd=1, observed=B)

with basic_model:
step = pm.NUTS()
chains = 4
draws = 1000

trace = pm.sample(draws=draws, step=step, chains=chains, njobs=njobs, tune=2000)

I had a similar issue and followed the trick in this post to reparameterize the Normal for your Y_obs, which worked beautifully.

Thanks you for the suggestion. But unfortunately this did not solve the problem.
Below I have appended the reparameterized version.
I also define the prediction as deterministic.
I normally start with a sample rate of around 10 draws/s with within the first 10% of the draws is reduced to 10 s/draw. Eventually a chain fails.

basic_model = pm.Model()

with basic_model:

# Priors for unknown model parameters
m_offset = pm.Normal('m_offset', mu=0, sd=1, shape=(3))
m = pm.Deterministic('m', m_offset*(limits/2))

# Expected value of outcome
pred = pm.Deterministic('pred',,m))

# Likelihood (sampling distribution) of observations
Y_obs = pm.Normal('Y_obs', mu=pred, sd=1, observed=B)

Looks like this code only has the the reparameterization for the m draw, not the Y_obs draw too. It would require drawing some Y_obs_offset as a standard normal and then forming some Deterministic equal to,m) + sd * Y_obs_offset. What I’m not sure about at this point and haven’t tried before is if Deterministic variables can be observed?

P.s. I remember reading in some deep documentation that I can’t find anymore that NUTS starts of with a tree_depth of 8, and then switches to whatever max_treedepth is passed in, which defaults to 10 I believe. So that’s why you observe the dropoff in speed (and in my case I observed the same drastic dropoff and it was fixed just by reparameterizing).

P.p.s. How many times am I going to hit tab to indent code (which just my mouse focus to the reply button) and then space or enter and accidentally submit my post??