Slower Multiprocess Sampling vs Sequential PyMC (v5.10.4)

Here is what my model looks like:

with pm.Model() as model:
	# Priors for the mean of treatment and baseline groups
	muBaseline = pm.Normal('muBaseline', mu=np.mean(dataBaseline), sigma=10)
	muTreatment = pm.Normal('muTreatment', mu=np.mean(dataBaseline), sigma=10)

	# Priors for the standard deviation of treatment and baseline groups
	sigmaTreatment = pm.HalfNormal('sigmaTreatment', sigma=10)
	sigmaBaseline = pm.HalfNormal('sigmaBaseline', sigma=10)

	Treatment_likelihood = pm.Normal('Treatment_likelihood', mu=muTreatment, sigma=sigmaTreatment, observed=dataTreatment)
	Baseline_likelihood = pm.Normal('Baseline_likelihood', mu=muBaseline, sigma=sigmaBaseline, observed=dataBaseline)
	
	diff_means = pm.Deterministic('diff_means', muTreatment - muBaseline)
	
	trace = pm.sample(500, chains = 4, cores = nCores) 
posterior = trace.posterior.stack(sample=['chain', 'draw'])
vals = posterior['diff_means']

I have one dataframe that I split into 2 based on the group (baseline/treatment) and use those as the inputs. I sample and then take the difference of means as my output.