A small change to the sample code of MAP, however, breaks it

This is my first step into PyMC3. I changed a few lines in the sample code from the PyMC3 docs. The original code estimates [alpha, beta, sigma] by MAP. It is as follows:

#########BEGIN############
import numpy as np
import pymc3 as pm

# True parameter values (needed below to simulate the data)
alpha, sigma = 1, 1
beta = [1, 2.5]

# Generating data
size = 100
# Predictor variables
X1 = np.random.randn(size)
X2 = np.random.randn(size) * 0.2
# Simulate outcome variable
Y = alpha + beta[0]*X1 + beta[1]*X2 + np.random.randn(size)*sigma

# Begin MAP
basic_model = pm.Model()
with basic_model:
    # Priors for unknown model parameters
    alpha = pm.Normal('alpha', mu=0, sd=10)
    beta = pm.Normal('beta', mu=0, sd=10, shape=2)
    sigma = pm.HalfNormal('sigma', sd=1)
    # Expected value of outcome
    mu = alpha + beta[0]*X1 + beta[1]*X2
    # Likelihood (sampling distribution) of observations
    Y_obs = pm.Normal('Y_obs', mu=mu, sd=sigma, observed=Y)
map_estimate = pm.find_MAP(model=basic_model)
#########END###########

I wonder whether it still works when I change it as follows:

  1. sigma = abs(alpha + beta[0]*X1 + beta[1]*X2)
  2. mu = pm.Normal('mu', mu=0, sd=10)

The code changed as follows:

###########BEGIN###########
# Generating data
size = 100
# Predictor variables
X1 = np.random.randn(size)
X2 = np.random.randn(size) * 0.2
# Simulate outcome variable (uses the true alpha and beta defined earlier)
Y = np.random.randn(size) * abs(alpha + beta[0]*X1 + beta[1]*X2)

# Begin MAP
basic_model = pm.Model()
with basic_model:
    # Priors for unknown model parameters
    alpha = pm.Normal('alpha', mu=0, sd=10)
    beta = pm.Normal('beta', mu=0, sd=10, shape=2)
    mu = pm.Normal('mu', mu=0, sd=10)
    sigma = abs(alpha + beta[0]*X1 + beta[1]*X2)
    # Likelihood (sampling distribution) of observations
    Y_obs = pm.Normal('Y_obs', mu=mu, sd=sigma, observed=Y)
map_estimate = pm.find_MAP(model=basic_model)
print("map_estimate:\n", map_estimate)
###########END###########

However, it does NOT work: the estimates come back as all zeros. Likewise, it cannot reach a result with the NUTS sampler.
This really frustrates me; I don't know why, or how to fix it.

Kingdom

Most likely the starting value for sigma is 0, which makes the model logp 0 or inf. You can check the default test value of sigma with:

sigma.tag.test_value

Since sigma as defined here can take the value 0, it causes problems when you plug it into a Normal likelihood. I am not sure what your intention is, but you should either change the prior for sigma or add a small epsilon to avoid this problem.
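
For example, a minimal sketch of the epsilon fix (the value 1e-3 is only illustrative), replacing the sigma line inside the model:

epsilon = 1e-3
# keep sigma strictly positive at the starting point
sigma = abs(alpha + beta[0]*X1 + beta[1]*X2) + epsilon
print(sigma.tag.test_value)            # should no longer be all zero
print(basic_model.check_test_point())  # per-variable logp at the start point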

Thank you for the reply.
As you said, sigma.tag.test_value is ALL ZERO.
Following your suggestion, a small epsilon was added to sigma:

epsilon = 1e-3
sigma = pm.HalfNormal('sigma', sd=1)

However, the outcome is barely improved. Below is the output for epsilon = 1e-3 and 1.5 respectively. Different epsilon values give different mu and sigma; the optimizer seems to run into local maxima.

You did not specify the same model as the data-generating process; you should do:
Y_obs = pm.Normal('Y_obs', mu=0, sd=sigma, observed=Y)

Sorry for the error in my previous post. Here is the latest code, along with the outcome.

#--------Begin-----------

# True parameter values
alpha, sigma = 1, 1
beta = [1, 2.5]

# Size of dataset and epsilon for sigma
size = 100
epsilon = 1e-3

# Predictor variables
X1 = np.random.randn(size)
X2 = np.random.randn(size) * 0.2

# Simulate outcome variable
Y = np.random.randn(size)*(abs(alpha + beta[0]*X1 + beta[1]*X2) + epsilon)

basic_model = pm.Model()
with basic_model:

    # Priors for unknown model parameters
    alpha = pm.Normal('alpha', mu=0, sd=10)
    beta = pm.Normal('beta', mu=0, sd=10, shape=2)
    mu = pm.Normal('mu', mu=0, sd=1)

    # Standard deviation of outcome
    sigma = abs(alpha + beta[0]*X1 + beta[1]*X2) + epsilon

    # Likelihood (sampling distribution) of observations
    Y_obs = pm.Normal('Y_obs', mu=mu, sd=sigma, observed=Y)

map_estimate = pm.find_MAP(model=basic_model)
print("map_estimate:\n", map_estimate)

#---------End-----------

I can't figure out what causes the problem.

Most likely there is not enough information in your data to infer the parameters, and find_MAP has likely reached a local optimum. You can play around with increasing the amount of data and the true parameter values and see if you can recover the parameters.

The parameters were changed as follows:

# True parameter values
alpha, sigma = 10, 18
beta = [7, 12.5]
epsilon = 1e-3
# Size of dataset
size = 100000

However, the outcome is still the same.

Outcome

logp = -7.7452e+12, ||grad|| = 0.63739: 100%|██████████| 24/24 [00:00<00:00, 120.24it/s]
map_estimate:
{'alpha': array(0.0), 'beta': array([ 0., 0.]), 'sigma_log__': array(-0.22579135278478346), 'mu': array(0.006863589788560047), 'sigma': array(0.7978845606911168)}

Your model may be unidentifiable; I can see a couple of problems that make inference difficult:
1. abs creates multiple modes, since alpha and beta can take both positive and negative values; using a HalfNormal as prior might help (see the sketch after this list).
2. The gradient is too small for alpha and beta: they control the sigma of Y_obs, but the information from X1 and X2 is too small (you can think of it as too much random noise in the data-generating process). One way to show this is to increase the scale of X1 and X2:

# True parameter values
alpha, sigma = 1, 1
beta = [1, 2.5]

# Size of dataset and epsilon for sigma
size = 1000
epsilon = 1e-5

# Predictor variables
X1 = np.random.randn(size)
X2 = np.random.randn(size)*10

3. Try a different starting value for find_MAP(), for example:

# this is the default starting point
point = basic_model.test_point
point
# {'alpha': array(0.), 'beta': array([0., 0.]), 'mu': array(0.)}

# change the starting point
point['alpha'] = np.array(1.)
point['beta'] = np.asarray([1, 2.0])

map_est = pm.find_MAP(start=point, model=basic_model)
map_est
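
Regarding point 1, here is a minimal sketch of what HalfNormal priors could look like (note this changes the model by constraining alpha and beta to be positive, so treat it as one possible reparameterization rather than the definitive fix):

with pm.Model() as halfnormal_model:
    # positive-only priors remove the sign-flip modes created by abs()
    alpha = pm.HalfNormal('alpha', sd=10)
    beta = pm.HalfNormal('beta', sd=10, shape=2)
    mu = pm.Normal('mu', mu=0, sd=1)
    sigma = abs(alpha + beta[0]*X1 + beta[1]*X2) + epsilon
    Y_obs = pm.Normal('Y_obs', mu=mu, sd=sigma, observed=Y)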

Thank you for your time.
Yes, changing the starting values made alpha and beta depart from zero, although they are not close to the true values.
The log-likelihood of a Gaussian, as a function of sigma, approximately takes the form
L(sigma) = -log(sigma) - 1/sigma**2
which has 2 maxima globally.
If the statement above is true, then why do the estimates run into so many local maxima?
Regarding 'using HalfNormal as prior': how do I express sigma = alpha + beta[0]*X1 + beta[1]*X2 with a HalfNormal?

In addition, would you mind if we talk in Chinese? Many thanks.

Kingdom

You are right that the marginal for sigma is not multimodal, but the problem is that you have a regression model sigma = abs(alpha + beta[0]*X1 + beta[1]*X2). The abs makes the regression non-monotone, which means you can move one predictor's coefficient up and another one down and get roughly the same predictions; this makes the predictors multimodal and non-identifiable.
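
For instance, a quick numeric check of the simplest such mode, the global sign flip: negating alpha and beta leaves sigma, and therefore the likelihood, unchanged:

import numpy as np
X1, X2 = np.random.randn(5), np.random.randn(5)
a, b = 1.0, [1.0, 2.5]
s_pos = np.abs(a + b[0]*X1 + b[1]*X2)
s_neg = np.abs(-a + (-b[0])*X1 + (-b[1])*X2)
print(np.allclose(s_pos, s_neg))  # True: two parameter settings, same sigma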

As for the problem of the starting value: the default starting point (alpha and beta being zero) sits in a flat region of the geometry, and with a small gradient the optimizer has difficulty moving towards the MAP.
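
One way to probe this numerically (a sketch; the perturbation size is arbitrary) is to evaluate the model logp at the default start and at a slightly shifted point:

point0 = basic_model.test_point
point1 = dict(point0, alpha=np.array(0.01))
# compare the two logp values to gauge the geometry near the default start
print(basic_model.logp(point0), basic_model.logp(point1))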

Thank you for all the replies. Maybe it will become clear as I keep learning PyMC3 :slightly_smiling_face:
