Thank you Christian! I read through the posts you suggested and through the documentation for xarray. It was particularly clear from your post where you compare the “old” and “new” method, latter being with xarray.
If I can summarize what I understood, hoping this is correct, it is that shape and dims/coords are identical to the MCMC (or any other) sampler. That is, the parameter space will be treated as a collection of variable that are independent. It is just that dims/coords and the use of xarray makes it much more intuitive, because names / attributes can be assigned to the data.
So, these two parameter designations should give identical results when running inference, with the exception that the latter is more intuitive. Below I tried running a test but I was not able to compare the MCMC runs. I thought of doing it through the acceptance rate (two identical runs with the same seed should have exactly the same acceptance rates, right?) but I was not able to get this confirmed. I think I am not properly initiating the inference so that the results are exactly reproducible. But the approaches should be identical, right?
Thanks again for the help!
Edit: I include the random_seed argument when running the samplers but results are still not exactly reproducible, based on subtracting acceptance rates
# import needed packages
import pymc as pm
import numpy as np
import arviz as az
# set dimensions and create random state for reproducibility
dims = 50
rng = np.random.default_rng(5082022)
# create data
dip_data = rng.standard_normal(dims)
azimuth_data = rng.standard_normal(dims)
# run the inferene using dimensions and coordinates
dimension_names = ['Dim ' + str(d) for d in range(dims)]
with pm.Model(coords={"dip": dimension_names,
"azimuth": dimension_names}) as model:
# initialize parameter vectors
dips = pm.Normal("dip vector", mu=0, sigma=1, dims='dip')
azimuths = pm.Normal("azimuth vector", mu=0, sigma=1, dims='azimuth')
# observed data
observed_dip = pm.Normal(
'dip angle',
mu=0,
observed=dip_data,
dims='dip')
observed_azimuth = pm.Normal(
'azimuth angle',
mu=np.zeros(dims),
observed=azimuth_data,
dims='azimuth')
compute_trace_dims = pm.sample(1000, tune=1000, cores=1, random_seed=5082022)
#run the inference using shape to define dimensions
with pm.Model() as model2:
# parameters
dips2 = pm.Normal('dips', mu=0, sigma=3.0, shape=dims)
azimuths2 = pm.Normal('azimuths', mu=0, sigma=3.0, shape=dims)
# observed data
observed_dip2 = pm.Normal(
'dip angle',
mu=0,
observed=dip_data
)
observed_azimuth2 = pm.Normal(
'azimuth angle',
mu=0,
observed=azimuth_data
)
compute_trace_shape = pm.sample(1000,
tune=1000,
cores=1,
random_seed=5082022)
# az.plot_trace(compute_trace_dims)
# az.plot_trace(compute_trace_shape)
compute_trace_dims.sample_stats.acceptance_rate - compute_trace_shape.sample_stats.acceptance_rate