# How to model historical data in the right distribution

I have observations to analyse for which I also have actual historical data. I think it would make a lot of sense to use this data as a prior, since it literally is prior information.

However, I’m having some trouble modelling this and generating the right prior distributions. For a beta distribution I can simply use SciPy’s `beta.fit` function on the data to get estimates for alpha and beta. But for a zero-inflated distribution I’m not sure how to get the right estimates of the parameters I need.
For example, with the ZeroInflatedPoisson I think I need theta and psi, but I’m not sure how to derive them from the actual data.

Any suggestions?

I had this issue as well. Check out this thread: *What is the best way to estimate theta and psi for ZeroInflatedPoisson?* (Questions / version agnostic, PyMC Discourse)

Thanks, I actually found that one, but I’m a bit confused about what is meant by logits there and how to determine theta.
Furthermore, I’m not sure how to come up with a psi value based on my data. If I know what proportion of the data are 0’s, do I just use that as the value? Or 1 minus that? (It was said in that thread that choosing 0 for psi will give you all zeros.)
As you can tell, I’m still learning here and feel a bit in over my head when it comes to the statistical details of all this.
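For what it’s worth, theta and psi can be backed out from the data directly with a method-of-moments calculation: for a zero-inflated Poisson (in the parameterisation where psi is the probability of the Poisson component, so psi = 0 means all zeros), the mean is psi·theta and the variance/mean ratio pins down theta. A minimal sketch, assuming your observations are in a NumPy array (the simulated data below is just for illustration):

```python
import numpy as np

def zip_moment_estimates(x):
    """Method-of-moments estimates for a zero-inflated Poisson.

    Returns (theta_hat, psi_hat) where psi is the probability of the
    Poisson component: psi = 0 gives all zeros, psi = 1 a plain Poisson.
    """
    x = np.asarray(x, dtype=float)
    m, s2 = x.mean(), x.var()
    theta_hat = m + s2 / m - 1.0   # Poisson rate of the non-zero-inflated part
    psi_hat = m / theta_hat        # probability of the Poisson component
    return theta_hat, psi_hat

# Simulated example with known truth: psi = 0.7, theta = 5
rng = np.random.default_rng(42)
n = 100_000
is_poisson = rng.random(n) < 0.7
data = np.where(is_poisson, rng.poisson(5.0, size=n), 0)

theta_hat, psi_hat = zip_moment_estimates(data)
```

These point estimates can then serve as starting values or to centre weakly informative priors; the fully Bayesian route is to estimate theta and psi inside the model, as discussed below.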

If you have historical data and a model you want to fit (e.g., a zero-inflated Poisson), you don’t want a prior, you want a posterior! You should estimate these parameters.

Not sure if I fully understand your remark. I kinda have a posterior that I want to use as a prior for the test. So basically:
1: have a prior for the historical data
2: calculate the posterior using the historical data
3: use this posterior as prior for my actual test
4: calculate the posterior for the test.

I thought to combine steps 1,2,3 by estimating/fitting a distribution to the historical data.
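For a conjugate model, steps 1–3 really do collapse into simple arithmetic, which may help build intuition. A sketch using a Beta-Binomial model with made-up counts (all numbers here are hypothetical):

```python
from scipy import stats

# Hypothetical counts: successes out of trials, historical then test period
hist_success, hist_n = 120, 1000
test_success, test_n = 75, 500

# 1: a weak prior for the historical data
a0, b0 = 1.0, 1.0

# 2: posterior given the historical data (Beta-Binomial conjugate update)
a_hist = a0 + hist_success
b_hist = b0 + hist_n - hist_success

# 3 & 4: that posterior becomes the prior for the test; update again
a_post = a_hist + test_success
b_post = b_hist + test_n - test_success

posterior = stats.beta(a_post, b_post)
```

For non-conjugate models like the zero-inflated Poisson this shortcut isn’t available, which is where the two-likelihood approach described further down comes in.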

Bit more context:
I’m analysing the effect of a treatment on a group (g2) by comparing it to a control group (g1). I have historical data (without the treatment) for both groups.
The delta (the effect of treatment) is then simply the posterior of g2 minus the posterior of g1.
So my reasoning is that it would be best to create priors for both groups separately, so I include the possibility that there is a pre-existing difference for any reason. (In theory they should be the same, but in reality there will always be some difference.)
Therefore I want to estimate the parameters of the distribution that best fits/reflects the historical data.

Or can I somehow follow the steps above and use the posterior of the historical data as the prior for my test? (how would I do this?)

EDIT:
Example:
Let’s say we have 10,000 users for which we have data about the number of steps they take per day.
I want to do an intervention and see what its effect is. I randomise these people into two groups of 5,000. Group 2 gets a daily adjusted step goal; group 1 still gets the fixed goal like before.
We continue measuring the daily step count for all users.
Now I’m interested in the effect of my intervention/treatment.

I could ignore the historical data and just look at the difference in step count between group 1 and group 2 and call it a day.
However, I think we can and should do better in this case, since we have the historical data. There could (and will) be a ‘pre-existing difference’ in the step count between the two groups. This can be expected to be quite small, but we can’t rule out the possibility that it will skew our results.
Let’s say the difference is -1% before the test. We then run the risk of concluding there is no effect when the true effect is +1%, or even that there is a negative effect when there is none.
Since we have the historical data, we can take this difference into account by using it as a prior for our analysis. This way we incorporate the pre-existing difference and get closer to the ‘real effect’.
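A quick simulation of why this matters, with made-up step-count numbers (everything here is hypothetical): a -1% pre-existing difference masks a +1% true effect in the naive after-only comparison, while comparing each group’s change from its own baseline recovers it.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5_000

# Hypothetical daily step counts; group 2 starts 1% lower for no good reason
g1_before = rng.normal(8_000, 2_000, n)
g2_before = rng.normal(7_920, 2_000, n)            # pre-existing -1%

true_effect = 0.01                                 # intervention lifts steps by 1%
g1_after = g1_before + rng.normal(0, 500, n)       # control: no effect, just noise
g2_after = g2_before * (1 + true_effect) + rng.normal(0, 500, n)

# Naive after-only comparison: pre-existing gap cancels the true effect
naive = g2_after.mean() / g1_after.mean() - 1

# Difference-in-differences: each group's change from its own baseline
did = (g2_after.mean() - g2_before.mean()) - (g1_after.mean() - g1_before.mean())
```

Here `naive` is close to zero while `did` is close to the true +1% lift (about 79 steps on a 7,920-step baseline).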

Hope this makes sense. If I’m faulty in my reasoning, I’m more than happy to learn.

You have longitudinal data, i.e. the same users tracked over time, before and after the intervention? If so, a hierarchical prior over the individual constants would account for time-invariant individual idiosyncrasies (i.e., unobserved personality traits that would affect their responsiveness to the treatment). You would probably want to include an interaction term between these effects and the treatment.

At the group level, if the groups are randomly assigned, there’s no reason to model an interaction between the individual characteristics and the group assignment, because you know by construction cov(group, individual_characteristics) = 0.

In broad strokes, though, you seem to be describing something like a difference-in-differences experimental design (see chapter 5.2 of *Mostly Harmless Econometrics* for the canonical treatment; they estimate the model using frequentist methods, but nothing stops you from implementing it in PyMC). There’s a ton of literature on this you could consult for guidance on how to estimate effects in your setup, and I’m sure there are more dedicated Bayesian treatments of experiment designs like this. Sadly, however, I can’t point you to them.

I’m doing DID analysis whenever we are not able to do a randomized experiment.
In this case the main focus should be the difference between the two (randomized) groups.

However, we have historical data for most (!) of the users in the experiment. So our priors could (should) be more informative I think.
To be concrete, the thing I want to prevent is something I’ve seen happen in my simulations:
concluding that the effect observed during the test period between group 1 and group 2 is due to the intervention. I’ve seen cases where the pre-existing difference was around -4% and the difference during the experiment was -2%. If we do not take the pre-existing difference into account, we conclude that the effect of our intervention was negative (-2% < 0), while in fact it was positive given the pre-existing difference between the groups (-2% > -4%).

So that’s why I’m looking into how I should incorporate this into my prior.

I guess my question was a bit convoluted; basically I have two questions:
1: how to incorporate the historical data in the best way.
2: how to model the right distribution (in the case of ratios / zero-inflated data).

I think I now understand that I should model the parameters of my prior based on the historical data. I’m just wondering/struggling with how to check this and how to plot the distribution generated by PyMC.
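One way to check fitted parameters without any PyMC machinery is to simulate from them and compare the statistics you care about against the data. A sketch with made-up theta/psi values (inside PyMC itself, `pm.sample_prior_predictive` plus ArviZ plotting functions serves the same purpose):

```python
import numpy as np

def sample_zip(theta, psi, size, rng):
    """Draw from a zero-inflated Poisson (psi = probability of the Poisson part)."""
    is_poisson = rng.random(size) < psi
    return np.where(is_poisson, rng.poisson(theta, size=size), 0)

rng = np.random.default_rng(1)

# Stand-in for the real historical data
data = sample_zip(theta=5.0, psi=0.7, size=50_000, rng=rng)
# Fresh draws from the fitted parameters (identical here, so the check should pass)
sim = sample_zip(theta=5.0, psi=0.7, size=50_000, rng=rng)

# A reasonable fit should reproduce at least the mean and the zero fraction
checks = {
    "mean": (data.mean(), sim.mean()),
    "zero fraction": ((data == 0).mean(), (sim == 0).mean()),
}
```

Plotting the two histograms on top of each other (e.g. with matplotlib) gives the visual version of the same check.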

As for the best approach to using the knowledge of the pre-existing difference:
I wonder whether it would make more sense (or a difference?) to not care about the actual values so much, but rather about the difference, i.e., model the expected difference based on the pre-existing difference and compare this with the observed difference.
Or should I just stick to creating both posteriors and only then compare?

Going back to your original question: if all you want to do is estimate the parameters of some distribution, then use those parameters as priors in the estimation of another distribution, I believe there is nothing stopping you from having two likelihood functions in your model. Put hyper-priors over the parameters of whatever distribution you are interested in, then estimate that distribution using your historical data as observed.

Then, in the same model block, take those estimated parameters and use them as inputs to another layer of parameters, which inform your post-treatment model. In pseudo-code:

```python
with pm.Model() as model:
    prior_mu = pm.Normal("prior_mu", ...)
    prior_sigma = pm.HalfCauchy("prior_sigma", ...)

    historical_loglike = pm.Normal(
        "historical", mu=prior_mu, sigma=prior_sigma, observed=historical_data
    )

    post_intervention_mu = pm.Normal("post_intervention_mu", mu=prior_mu, ...)
    post_intervention_sigma = pm.HalfNormal("post_intervention_sigma", ...)
    intervention_loglike = pm.Normal(
        "intervention",
        mu=post_intervention_mu,
        sigma=post_intervention_sigma,
        observed=post_intervention_data,
    )
```


But I really, really strongly suspect all of this is equivalent to something like \mu_{i,t} = \alpha_i + \beta \cdot \text{intervention}_{i,t} + \alpha_i \beta \cdot \text{intervention}_{i,t}, where intervention = 1 if t >= T else 0, T is the date of the intervention, and i indexes the individuals in the study.

The other place where I’m struggling with what you want is that the groups are randomized. By construction, a priori, the expected group difference is zero. Everything related to deviation from that expectation is captured by the spread in your posterior estimates of the intervention effect.

In the example you give, there is no interpretation where we would conclude the effect of the intervention is negative, because the spread between the groups decreased. The linear relationship I wrote above would capture this change, as would a DID regression.