# Getting the current sample variable value so it can be used to compute the observed inputs in a secondary model likelihood

Hello everyone! I am trying to update my code from pymc2 to pymc3 and I wonder if anyone had the same type of issues (or actually, if it can be done). I have this observational data (a histogram with the light from a galaxy, the stellar component is a flat continuum while the gas produces narrow lines): As annotated in the plot I want to apply two operations: First, fit the continuum with a physical model and then fit the area of the narrow line with another physical model.

To do the second step in a pymc3 model I need to remove the fitted continuum before computed the observed area of the line. My strategy was using the “.tag.test_value” attribute to do this. I am not getting errors with this dummy example:

``````from os import environ
import theano.tensor as tt
import pymc3 as pm
import numpy as np
from scipy.integrate import simps
import matplotlib.pyplot as plt

def planck_radiation(wav, T, h, c, k):
a = 2.0*h*c**2
b = h*c/(wav*k*T)
intensity = a / ((np.power(wav,5)) * (np.exp(b) - 1.0))
return intensity

def planck_radiation_tt(wav, T, h, c, k):
a = 2.0*h*c**2
b = h*c/(wav*k*T)
intensity = a / ((tt.power(wav,5)) * (tt.exp(b) - 1.0))
return intensity

def gaussian(wave, A, mu, sig):
return A * np.exp(-np.power(wave - mu, 2.) / (2 * np.power(sig, 2.)))

# Physical constants
h = 6.626e-34
c = 3.0e+8
k = 1.38e-23

# Observed wavelength range
obs_wave = np.arange(400,500)

# Compute stellar continuum
T_true = 10000.0
continuum_true = planck_radiation(obs_wave * 1e-9, T_true, h, c, k)

# Calculate line true shape and area
A_true, mu_true, sig_true = 1e15, 460, 1
sqrt2pi = np.sqrt((2 * np.pi))
emline_true = gaussian(obs_wave, A_true, mu_true, sig_true)
emLineArea_true = A_true*sig_true*sqrt2pi
emLineArea_err = emLineArea_true * 0.05

# Mask for the line region
idcs_mask = (obs_wave > mu_true-5) & (obs_wave < mu_true+5)

# Observed flux with noise
err_continuum = np.mean(continuum_true) * 0.05
obs_flux = continuum_true + emline_true + np.random.normal(0, err_continuum, obs_wave.size)

# Pymc3 model
with pm.Model() as model:

temp = pm.Normal('temp', mu=5000.0, sd=1000.0)
A_norm  = pm.HalfNormal('A_norm', sd=5)
sig = pm.HalfNormal('sigma', sd=5)

# Model continuum

# Likelihood model continuum (masking the line)
Y_continuum = pm.Normal('Y_continuum', mu=continuum_flux, sd=err_continuum, observed=obs_flux*mask)

# Remove from the observation the proposed continuum
emission_obs = obs_flux - continuum_flux.tag.test_value

# Integrate the line area

# Model line area
line_area = A_norm*sig*sqrt2pi

# Global multivariable likelihood for all lines
Y_line = pm.Normal('Y_line', mu=line_area, sd=emLineArea_err, observed=line_area_obs)

for RV in model.basic_RVs:
print(RV.name, RV.logp(model.test_point))

# Launch model
trace = pm.sample(8000, tune=2000)

# Output trace data
print pm.summary(trace)
pm.traceplot(trace)
plt.show()
``````

But while the continuum fitting gives a good result line area priors are not being sampled:

I wonder if anyone would share with me their approach in those cases where the observed input data needs to be treated/corrected with the model variables.

Dont subtract the test_value - it is in most case a constant for shape/value checking. I think in you case it should work by doing: `emission_obs = obs_flux - continuum_flux`.

1 Like

I have not been able to make that work because I need a function which can integrate under a theano vector.

However, in this post you shared there are people doing the same :

That approach works: I can can integrate the specially if I apply an interpolation on the propopsed

This seems like and old post: Maybe you know if someone has shared a working pymc3 code with a integration op for a y = f(x) function?

You can find more information in this discussion: Custom theano Op to do numerical integration

1 Like

Thanks again! That reply from @aseyboldt is very complete (I am saving it). But I made a mistake, I did not mean a function but a theano vector (continuum_true), very sorry for that…

It should be then something like a Simpson’s rule: To calculate the green area in the image. Then, I add it to the model computation of the red area. This sum I can compare with the observational total line area without breaking the Bayesian statistics Do I need to declare a theano ops to do that?

Tensor describe some mapping, so you can integrate it just like a function.

I am still a bit puzzled by your problem here. To me, if you are trying to integrate the red part and use as an observed, you can compute the area as:

``````    # Remove from the observation the proposed continuum
emission_obs = obs_flux - continuum_flux

# Integrate the line area
``````

With a bit of scaling, it seems to work:

``````mask_loc = np.where(idcs_mask)
with pm.Model() as model:
# parameters
temp = pm.Normal('temp', mu=5000.0, sd=1000.0)
A_norm = pm.HalfNormal('A_norm', sd=5)
sig = pm.HalfNormal('sigma', sd=5)

# Model continuum
continuum_flux = planck_radiation_tt(obs_wave * 1e-9, temp, h, c, k)

Y_continuum = pm.Normal('Y_continuum', mu=continuum_flux *

# Remove from the observation the proposed continuum
emission_obs = obs_flux - continuum_flux

# Integrate the line area
line_area_obs = pm.Deterministic('linearea',

# Model line area
line_area = A_norm * sig * sqrt2pi

# Global multivariable likelihood for all lines
Y_line = pm.Normal('Y_line', mu=line_area,
sd=emLineArea_err / 1e15, observed=line_area_obs)
``````

However, I see this approach a bit odd, first `A_norm` and `sig` is unidentifiable as `line_area` is the muliplication of the two. From the data generation process you can directly model the observed `obs_flux` as the sum of the latent function `continuum_true` and `emline_true` plus some noise:

``````def gaussian_tt(wave, A, mu, sig):
return A * tt.exp(-tt.power(wave - mu, 2.) / (2 * tt.power(sig, 2.)))

with pm.Model() as model:
temp = pm.Normal('temp', mu=5000.0, sd=1000.0)
A_norm = pm.HalfFlat('A_norm')
sig = pm.HalfNormal('sigma', sd=5)
mu = pm.Uniform('mu', 400, 500)
err = pm.HalfNormal('err', 1.)
# Model continuum
continuum_latent = planck_radiation_tt(obs_wave * 1e-9, temp, h, c, k)
emline_latent = gaussian_tt(obs_wave, A_norm, mu, sig)

# Likelihood
obs = pm.Normal('obs',
mu=(continuum_latent + emline_latent) / 1e15,
sd=err,
observed=obs_flux / 1e15)

# Launch model
trace = pm.sample(1000, tune=2000)
``````

Which all parameter could be fitted nicely.

Then if you would like to calculate the integral of something you can plug in the posterior sample and computed them after inference.

1 Like

That is perfect!! (Too obvious for me to see!) Thanks a lot!!!.

Sorry for the simple example, I have made it more complicated than necessarily: The second approach is much nicer but I cannot apply it: In reality, we do not have an equation to model the continuum in galaxies, we just fit a linear system with the light histogram of each star times how many stars of that type there are. So I have to integrate the area (which is not linear either but has a drop in flux due to the atoms absorbing light at that wavelength). And yes the lines lines area is computed different as it is done here :) Thanks a lot!!!

I see. In that case, just be careful of the scaling in the integral, as some parameters seem to be quite sensitive to that.

1 Like

What do you mean by scaling?

``````    # Integrate the line area
line_area_obs = pm.Deterministic('linearea',