How to model a multistep logic

I’m observing a continuous variable. The variable has some probability to be non-zero. The non-zero values are
expected to be normally distributed. Here’s an example:

np.random.seed(42)
true_p_non_zero = 0.3
true_mu = 1234.5
true_sd = 1.2
data = pd.DataFrame({'non_zero': np.random.rand(100) < true_p_non_zero})
y = np.random.randn(len(data)) * true_sd + true_mu
y[~data.non_zero] = 0
data['y']=y

My goal is to estimate the probability of being non-zero and the parameters of the normal distribution

The first part is easy:

with pm.Model() as model:
    p_non_zero = pm.Uniform('p_non_zero', lower=0, upper=1)
    non_zero = pm.Bernoulli('non_zero', p=p_non_zero, observed=data.non_zero)
    trace = pm.sample(100, tune=50)

The problem is that I can’t figure out how to use the next step in this logic.

I tried the following code but it produces wrong estimates:

with pm.Model() as model:
  p_non_zero = pm.Beta('p_non_zero', alpha=1, beta=1)
  non_zero = pm.Bernoulli('non_zero', p=p_non_zero)
  mu = pm.Uniform('mu', lower=1000, upper=2000)
  sd = pm.Uniform('sd', lower=0, upper=5)
  _mu = pm.math.switch(non_zero, 0, mu)
  _sd = pm.math.switch(non_zero, 0, sd)
  y = pm.Normal('y', mu=_mu, sd=_sd, observed=data['y'])
  trace = pm.sample(500, tune=150)

The following approach works but it looks very un-general to me:

with pm.Model() as model:
    sel_non_zero = data.y != 0
    p_non_zero = pm.Uniform('p_non_zero', lower=0, upper=1)
    non_zero = pm.Bernoulli('non_zero', p=p_non_zero, observed=sel_non_zero)
    mu = pm.Uniform('mu', lower=1000, upper=2000)
    sd = pm.Uniform('sd', lower=0, upper=5)
    y = pm.Normal('y', mu=mu, sd=sd, observed=data.loc[~sel_non_zero]['y'])
    trace = pm.sample(500, tune=150)

Again, this approach sort of works, but I suspect that there are better, or more elegant solutions.