Non-deterministic results between different machines

Hi all,

I've been using v4.0.0b6 to fit a simple multivariate regression model to some data. To make the output deterministic, there is a call to np.random.seed(1234). I've posted the relevant parts of the code below.

With this approach, everything initially seemed fine: running pm.sample() repeatedly on the same machine appeared to converge to the same solution (checked by inspecting trace.model_logp and looking at the model parameters. Aside: is this a valid way to check?).
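
For reference, the comparison between runs is roughly along these lines (trace_a and trace_b are placeholder names for the results of two separate pm.sample() calls):

import numpy as np

# Compare the stored log-probabilities and sampled parameters of two runs;
# if sampling is deterministic these should agree to within floating-point tolerance
print(np.allclose(trace_a.model_logp, trace_b.model_logp))
print(np.allclose(trace_a.beta, trace_b.beta))
print(np.allclose(trace_a.intercept, trace_b.intercept))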

This was checked on the same machine, running (a) in a poetry environment and (b) inside a docker container; both environments were set up with the same poetry commands, and in both cases the fitted model appeared identical under the inspection approach described above.

The same docker container was then used in CI to run the same model-fitting process, and there it did not converge to the same solution. However, running it multiple times on the same CI machine gives the same converged solution each time, just a different one from the original machine.

A colleague then ran it on a third machine and saw the same behaviour, i.e. results different from machines 1 and 2, but consistent convergence on that machine.

Does anyone have any idea what might be causing this or any thoughts on where to look?

One potentially relevant warning we are seeing is "aesaraf.py:1005: UserWarning: The parameter ‘updates’ of aesara.function() expects an OrderedDict, got <class ‘dict’>. Using a standard dictionary here results in non-deterministic behavior. You should use an OrderedDict if you are using Python 2.7 (collections.OrderedDict for older python), or use a list of (shared, update) pairs. Do not just convert your dictionary to this type before the call as the conversion will still be non-deterministic."

However, a bit of searching suggests that we can ignore it in this case (?)

Many thanks for any suggestions.

Code snippets below. If more is needed please say.

# Model (x1, x2, x3, noise are random arrays, c0, c1... consts)
y = c0 + (c1 * x1) + (c2 * x2) + (c3 * x3) + (c4 * noise)

# dataframe constructed from the following dict:
data = {'x1': x1, 'x2': x2, 'x3': x3, 'y': y}
df = pd.DataFrame(data)

# Fit model
np.random.seed(self.seed)
...
with pm.Model() as model:
    # Priors
    sigma = pm.HalfNormal('sigma', sigma=2)
    intercept = pm.Normal('intercept', mu=20, sigma=200)

    beta = pm.Normal('beta', mu=15, sigma=20, shape=3)

    mu = intercept + pm.math.dot(df[['x1', 'x2', 'x3']], beta)
    pm.Normal('likelihood', mu=mu, sigma=sigma, observed=df['y'])

    # draw posterior samples
    trace = pm.sample(draws=1000, tune=1000, chains=4, random_seed=self.seed,
                      return_inferencedata=False,
                      compute_convergence_checks=True)

# inspection, e.g.:
trace.model_logp[0:10]
trace.beta[0:5]

I know the seeding behavior was somewhat unpredictable at some point. @ricardoV94 what’s the current status on that?

It's a bit of a wild guess, but a quick Google search suggests that some MKL floating-point operations are not necessarily deterministic, and I see you have a dot operation there: numpy - Same Python code, same data, different results on different machines - Stack Overflow
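
As a quick illustration (not PyMC-specific), you can check which BLAS build NumPy is linked against and see that the summation order affects the result:

import math
import numpy as np

# Show which BLAS/LAPACK build NumPy is linked against (MKL, OpenBLAS, ...)
np.show_config()

# Floating-point addition is not associative, so different reduction orders
# (different BLAS builds, thread counts, or instruction sets) can give
# slightly different dot products for identical inputs
rng = np.random.default_rng(0)
x = rng.normal(size=100_000)
y = rng.normal(size=100_000)
print(np.dot(x, y))      # BLAS kernel, implementation-dependent ordering
print(np.sum(x * y))     # NumPy's pairwise summation
print(math.fsum(x * y))  # exactly rounded reference sum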

Can you try a model with just a simple unobserved normal?

with pm.Model() as m:
  x = pm.Normal("x")
  pm.sample(..., random_seed=self.seed)

Also, do the final results look very different across the different machines?

Thanks @ricardoV94 - those were useful tips.

I tried naively applying various MKL flags and didn't have much luck with that, but your tip about the dot operator paid off: replacing dot with a solution that manually sums the products has, for the moment, resolved the issue. It has introduced a new issue, however, which I will describe in a subsequent reply.
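
Roughly, the replacement looks like this (a sketch; the exact code isn't shown here):

# Workaround sketch: replace pm.math.dot with an explicit, hard-coded sum of
# products so that no BLAS matrix-vector kernel is involved
with pm.Model() as model:
    sigma = pm.HalfNormal('sigma', sigma=2)
    intercept = pm.Normal('intercept', mu=20, sigma=200)
    beta = pm.Normal('beta', mu=15, sigma=20, shape=3)

    mu = (intercept
          + beta[0] * df['x1'].values
          + beta[1] * df['x2'].values
          + beta[2] * df['x3'].values)
    pm.Normal('likelihood', mu=mu, sigma=sigma, observed=df['y'])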

One other interesting observation I made with the original model (with the dot operator) was that using just a subset of the observation data could result in agreement between the two machines. I'm struggling to explain this one, but I guess it could fit under the "vague indeterminate floating-point operations" umbrella (?)

The differences were significant enough for the model to converge successfully on one machine and not on the other. We noticed it because we run a convergence check in CI, and it was that model which was failing to converge; otherwise we might not have spotted the issue.
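
For context, the CI check is roughly along these lines (a sketch only, reusing the model object from the snippet in the first post; the real check isn't shown here):

import arviz as az

# Sketch of a convergence gate for CI: sample with return_inferencedata=True
# and fail the build if any r_hat exceeds a threshold
with model:
    idata = pm.sample(draws=1000, tune=1000, chains=4, random_seed=1234,
                      return_inferencedata=True)

rhat = az.rhat(idata)
assert all(float(rhat[name].max()) <= 1.01 for name in rhat.data_vars), \
    "model failed to converge"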

Thanks for your help on this.

It seems your model might be particularly unstable, which may be a case where those "non-deterministic" operations play a bigger role (still just a hypothesis).

The issue mentioned above relates to why I was previously using the dot operator: to allow the dimensions of the model to scale automatically with the number of columns of observations supplied.

Previously I had something like:

inputs  # dataframe with N rows and M input columns (x1, x2, x3 in the original example)

coords = {"observations": inputs.index, "input_variables": inputs.columns}

with pm.Model(coords=coords) as model:
  ...
  # beta size set to M
  beta = pm.Normal('beta', mu=15, sigma=20, dims="input_variables")

  # observed size set to N
  data_observed = pm.MutableData("data_observed", output, dims="observations")
  # input data size set to N x M
  data_input = pm.MutableData("data_input", inputs, dims=("observations", "input_variables"))

  # Intention is for mu to be size N, each entry equal to c0 + (c1 * x1) + (c2 * x2) + (c3 * x3) 
  mu = intercept + pm.math.dot(data_input, beta)
  pm.Normal('observed', mu=mu, sigma=sigma, observed=data_observed)
  ...

If a column is added to inputs, then the size of beta etc. updates automatically.

Is there a clean way to achieve the same goal without using the dot operator?
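
For example, would something like an element-wise product summed over the columns be a reasonable substitute? (Sketch below, reusing the names from the snippet above.)

# Possible substitute for pm.math.dot(data_input, beta): broadcast the
# element-wise product and sum over the input_variables axis, so the model
# still scales with inputs.columns automatically
mu = intercept + (data_input * beta).sum(axis=-1)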

I don't think you want to get rid of the dot operator (it should be much faster), but instead investigate the underlying model instability that surfaced because of it.

That sounds sensible. Is there any general guidance anywhere on approaches to take when investigating instabilities? I'm fairly new to Bayesian approaches. The real input data includes some categoricals, one-hot encoded, which I can imagine might be more problematic than numeric inputs.

I suggest you open a new topic for that specific question; you'll probably have to share some details about the data as well.
