Inferring Transformation applied to mixture of known multivariate normal distributions

Hi,

I do not know if PyMC3 is the correct framework for my use case. My data is generated by sampling from a mixture of multivariate normal distributions afterwards an affine transformation is applied. I do know the means and covariances as well as the frequencies of each multivariate normal distribution. This means I do know all parameters of this underlying statistical model and I do not want to have them changed. The parameters of the affine transformation therefore are unknown and shall be derived. Since the affine transformation corresponds to a physical measurement it is plausible to find a distribution of affine transformations instead of ONE exact affine transformation. That´s why PyMC3 might be the correct framework. I do have three questions:

  1. How to model a mixture of multivariate normal distributions?
  2. How to fix the parameters of these multivariate normal distributions, so that they are not changed by the MCMC?
  3. How to compare observed data to a Deterministic variable?

I created a minimal working example with artificial data and only one multivariate distribution, so no mixture model:

import numpy as np
import matplotlib.pyplot as plt
import pymc3 as pm
import scipy

### virtual data creation
mean = np.array((1, 1))
cov = np.array(((1, 0.5), (0.5, 1)))
sample_size = 1000
samples = np.random.multivariate_normal(mean, cov, size=sample_size)
print(samples.shape)

affine_A_mean = np.array((1, 10))
affine_A_std = np.array((0.5, 0.1))
affine_b_mean = np.array((-10, 8))
affine_b_std = np.array((2, 1))

affine_A = np.random.normal(affine_A_mean, affine_A_std, size=(sample_size, 2))
affine_b = np.random.normal(affine_b_mean, affine_b_std, size=(sample_size, 2))
print(affine_A.mean(axis=0), affine_A.std(axis=0))
print(affine_b.mean(axis=0), affine_b.std(axis=0))

measurement = affine_A * samples + affine_b

fig, axes = plt.subplots(1, 1, figsize=[8, 8])
axes.scatter(samples[:, 0], samples[:, 1], alpha=0.5, label="samples")
axes.scatter(measurement[:, 0], measurement[:, 1], alpha=0.5, label="measurements")
axes.legend()
plt.show()

### PyMC3 inference
with pm.Model() as model:
    model_A = pm.Normal(
        "A",
        mu=affine_A_mean + 1,
        sd=affine_A_std + 0.6,
        shape=(sample_size, 2)
    )
    model_b = pm.Normal(
        "b",
        mu=affine_b_mean - 2,
        sd=affine_b_std,
        shape=(sample_size, 2)
    )
    
    # This does not work, since we can not assign an observation to
    # pm.Deterministic
    # model_samples = pm.MvNormal("samples", mean, cov, shape=(sample_size, 2))
    # model_measurement = pm.Deterministic(
    #     "measurement",
    #     model_A * model_samples + model_b,
    #     observed=measurement
    # )
    
    # instead we use model_A and model_b as inverses of the affine transformation.
    # The inverse transforms the observed samples back so we can pass these
    # pm.Deterministic to the pm.MvNormal as observed.
    
    inverse_samples = pm.Deterministic(
        "inverse_samples",
        model_A * measurement + model_b,
    )
    model_samples = pm.MvNormal(
        "samples", mean, cov, shape=(sample_size, 2),
        observed=inverse_samples
    )
    
    step = pm.Metropolis()
    trace = pm.sample(100000, tune=5000, step=step)

### Plotting the inference
burned_trace = trace[20000:]
A_samples = burned_trace['A']
b_samples = burned_trace['b']
print(A_samples.shape)

inverse_As = 1 / affine_A
inverse_bs = - affine_b / affine_A

print(inverse_As.mean(axis=0), inverse_As.std(axis=0))
print(inverse_bs.mean(axis=0), inverse_bs.std(axis=0))

fig, axes = plt.subplots(2, 1, figsize=[8, 16])

prop_cycle = plt.rcParams['axes.prop_cycle']
colors = prop_cycle.by_key()['color']

for j, (vals, inverses) in enumerate(zip(
    [A_samples, b_samples],
    [inverse_As, inverse_bs],
)):
    for i in range(2):
        _, bins, _ = axes[j].hist(
            vals[..., i].flatten(),
            histtype="stepfilled", density=True,
            alpha=0.85, bins=30,
            color=colors[i],
            label="$A_{}$".format(i+1)
        )
        x = np.linspace(bins[0], bins[-1], 100)
        #kernel = scipy.stats.gaussian_kde(inverses[i])
        kernel = scipy.stats.norm(inverses[i].mean(), inverses[i].std()).pdf
        axes[j].plot(
            x, kernel(x), color=colors[i]
        )
    axes[j].legend()

From this output

Multiprocess sampling (4 chains in 4 jobs)
CompoundStep

Metropolis: [b]
Metropolis: [A]
Sampling 4 chains, 0 divergences: 100%|██████████| 420000/420000 [03:22<00:00, 2072.75draws/s]
The rhat statistic is larger than 1.4 for some parameters. The sampler did not converge.
The estimated number of effective samples is smaller than 200 for some parameters.

I get, that the means and covs of the MvNormal are fixed and not changed, which makes sense in hindsight. This answers the second question. The third question is not that important since I implemented a workaround to assign the observed keyword not to a deterministic but a random variable. So the most important question is how to implement a mixture distribution? In Chapter 3 of http://camdavidsonpilon.github.io/Probabilistic-Programming-and-Bayesian-Methods-for-Hackers/ a mixture model is introduced using the pm.Categorical Random variable. But I cannot use it, since I cannot create batched MvNormal RVs. Therefore I was thinking of creating the different MvNormals into an numpy array of dtype object as in Chapter 2 of the above mentioned book. I have code like this, that is not working:

frequencies = np.array([0.25, 0.75])
means = np.array([[1, 1], [2, 2]])
covs = np.array([
    [[1, 0.1], [0.1, 1]],
    [[2, 0.2], [0.2, 2]],
])
model_samples_i = np.empty(len(frequencies), dtype=object)
with pm.Model() as model:
    model_A = pm.Normal(
        "A",
        mu=affine_A_mean + 1,
        sd=affine_A_std + 0.6,
        shape=(sample_size, 2)
    )
    model_b = pm.Normal(
        "b",
        mu=affine_b_mean - 2,
        sd=affine_b_std,
        shape=(sample_size, 2)
    )

    inverse_samples = pm.Deterministic(
        "inverse_samples",
        model_A * measurement + model_b,
    )
    
    assignment = pm.Categorical(
        "assignment", frequencies,
        shape=sample_size,
        testval=np.random.randint(0, len(frequencies), sample_size)
    )
    
    for i in range(len(frequencies)):
        model_samples_i[i] = pm.MvNormal(
            "samples_{}".format(i), means[i], covs[i], shape=(sample_size, 2),
        )
        
    model_samples = pm.Deterministic(
        "samples",
        model_samples_i[assignment],
        observed=inverse_samples
    )
    
    step = pm.Metropolis()
    trace = pm.sample(100000, tune=5000, step=step)

I would really appreciate some help on this problem. If I am missusing MCMC here I am glad to know. I was also thinking of creating an artifical data set from the know mixture distribution and using Point Set Registration between the artificial data and the real data to find a good affine transformation.

Best,
Niklas