I am using PyMC3 to fit the spectrum (light histogram) of galaxies. Basically, you have a linear system:
A = \sum_{i} w_{i} \cdot B_{i}
where A is your observed histogram, the B_i are the rows of a user-declared matrix of stellar histograms, and the w_i are the weights of each component: the variables you want to solve for. Graphically, in the plot below you have the purple line and you want to measure how much the other curves have contributed to it.
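As a quick sanity check outside the Bayesian fit, the weights of such a linear mixture can be recovered with a non-negative least-squares solve. This is only a sketch on made-up synthetic curves (not the stellar data), using SciPy's nnls:

```python
import numpy as np
from scipy.optimize import nnls

# Synthetic example: 3 made-up component curves sampled on 100 points.
rng = np.random.RandomState(0)
B = np.abs(rng.randn(3, 100))          # component curves (rows of B)
w_true = np.array([0.8, 1.2, 0.4])     # true non-negative weights
A = w_true.dot(B)                      # noise-free observed mixture

# nnls solves min ||B^T w - A|| subject to w >= 0.
w_fit, residual = nnls(B.T, A)
print(w_fit)  # close to w_true for noise-free data
```

For noise-free data the recovery is essentially exact; with noise it gives a useful starting point to compare the MCMC posterior against.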
@jrmullaney posted a similar problem whose issues I share: the sampling becomes much slower over time and with the number of components, although I still get very good results.
The answers there explain possible causes for this behaviour, and I am trying to find the culprit.
I would like to reparameterise the model, but I am not sure of the right direction. Since I have very good estimates for the prior mean values, I could centre them at zero, but this causes trouble because physically the weights cannot be negative. Would the scale of the input data affect performance? Maybe the fact that some of the components are similar is slowing the sampler down, but in that case I do not understand why increasing the likelihood standard deviation makes it faster.
Any advice is most welcome.
I have included a working code sample here:
from os import environ
environ["MKL_THREADING_LAYER"] = "GNU"
import theano
import theano.tensor as tt
import pymc3 as pm
import numpy as np
import matplotlib.pyplot as plt
# Define model data
x_array, y_CurvesMatrix = np.arange(4700, 6200), np.loadtxt('stellar_spectra.txt')
weightsTrue = np.array([0.75, 1.1, 0.5, 0.2]) # Declare each curve weight
spectrumTrue = weightsTrue.dot(y_CurvesMatrix) # Generate synthetic observation
# Add some noise
error_Pixel = 0.02
obsNoise = np.random.normal(0, error_Pixel, size=x_array.size)
# Observed spectrum curve
obsSpectrum = spectrumTrue + obsNoise
# Data for the priors
weights_priors = np.array([0.735, 1.15, 0.525, 0.192]) # Give distribution centers some offset
weights_std = weightsTrue * 0.10
n_weights = weightsTrue.size
# Std for the likelihood
curveUncertainty_array = error_Pixel * np.ones(x_array.size)
# Declare theano variables
curvesMatrix_tt = theano.shared(y_CurvesMatrix)
# Pymc3 model
with pm.Model() as model:
    # Weights prior
    w_i = pm.Normal('w_i', mu=weights_priors, sd=weights_std, shape=n_weights)
    # Synthetic emission of the model
    spectrum = w_i.dot(curvesMatrix_tt)
    # Likelihood
    Y = pm.Normal('Y', mu=spectrum, sd=curveUncertainty_array, observed=obsSpectrum)
    # Launch model
    model.check_test_point()
    trace_nuts = pm.sample(1000, tune=500, step=pm.NUTS())
# Plot model output data
fig, ax = plt.subplots(1, 1, figsize=(10, 5))
# Posterior mean of the fitted weights
weightsFit_nuts = np.mean(trace_nuts['w_i'], axis=0)
for i in np.arange(y_CurvesMatrix.shape[0]):
    ax.plot(x_array, y_CurvesMatrix[i,:] * weightsTrue[i], label=r'$Curve_{{\,{}}}\cdot w_{{{}}}$'.format(i,i), linestyle=':')
ax.plot(x_array, obsSpectrum, label='Observed spectrum')
ax.plot(x_array, weightsFit_nuts.dot(y_CurvesMatrix), label='Fit spectrum NUTS', linestyle='-')
ax.update({'xlabel': 'Wavelength (nm)', 'ylabel': 'Flux (normalised)'})
ax.legend()
plt.show()
Here is the data stellar_spectra.txt (146.5 KB)