Dealing with multivariate x, errors in both x and y, and posterior predictive for y

payeldas · December 14, 2017, 10:44am

I am fitting an artificial neural network to a set of of observed x and y, using the following linear regression model as my guide:

with pm.Model() as model:
    intercept = pm.Normal('intercept', 0, sd=20)
    gradient = pm.Normal('gradient', 0, sd=20)
    true_x = pm.Normal('true_x', mu=0, sd=20, shape=len(x)) 
    likelihood_x = pm.Normal('x', mu=true_x, sd=x_err, observed=x)
    true_y = pm.Deterministic('true_y', intercept + gradient * true_x)
    likelihood_y = pm.Normal('y', mu=true_y, sd=y_err, observed=y)

I still have some questions:

How would you deal with multiple x variables?
If I don’t know much about true_x, what is the best prior to use?
If I would like to calculate the posterior predictive distribution given new observed x, there is now no new observed y, does this mean I only update likelihood_x but not likelihood_y? Is this a problem if new observed x has a different sample size?

Thanks!

junpenglao · December 14, 2017, 11:29am

You can represent multiple x variables (e.g., repeated measurement) as a matrix, and do tt.dot(gradient * true_x) for matrix multiplication. Of course, you need to be careful of the shape in gradient and true_x
It is hard to say, in general use a normal distribution with large standard deviation (e.g., 10x std of x) is a good starting point
You can have a look at this blogpost and related discussion in OOS predictions with missing input values for some inspiration.

payeldas · December 15, 2017, 10:45am

Thanks so much! In particular for the link to the missing input values. I think my problem was letting go of the idea of ‘inputs’ and ‘outputs’, and just think of observables.

payeldas · December 18, 2017, 11:46am

Hi,
I used http://junpenglao.xyz/Blogs/posts/2017-10-23-OOS_missing.html to implement my model but am still having some issues! I think it’s to do with how I define the shape of some of the random variables. To explain my model briefly, I have 12 input random variables, and one output random variable that I estimate through a neural network. This results in 13 observed variables each with measured noise. My training set has ~1000 (nobs) sets of measurements of these 13 observed variables. I would like to fit the training set, and then examine the performance on a further test set, before using the Bayesian neural network to calculate posterior predictive distributions for new observations for which the 13th observable(i.e. the output) is missing. From your example, it seemed that I don’t need to give the information on the number of measurements, but in that case I don’t get the right distribution of weights. When I do give the nobs information, I get posteriors that look right but cannot then apply the model to new datasets with different numbers of measurements. Here’s my code:

# Number of input variables (fixed)
ninputs = inputsTrainScale.shape[1]
    
# Number of neurons on each hidden layer (fixed)
neurons_per_hiddenlayer = 5

# Set observed random variables to be shared (initially set to training set)
inputsShared     = theano.shared(inputsTrainScale)
errInputsShared  = theano.shared(errInputsTrainScale)
targetShared     = theano.shared(targetTrain)
errTargetShared  = theano.shared(errTargetTrain)

# Initialize random weights on neurons
init_hid = np.random.randn(inputsTrainScale.shape[1],neurons_per_hiddenlayer)
init_out = np.random.randn(neurons_per_hiddenlayer)

with pm.Model() as neural_network:
    
    nobs = len(targetShared.eval())
    
    ## PARAMETERS OF NEURAL NETWORK (UNOBSERVED RANDOM VARIABLES)
    
    # Weights from input to hidden layer
    wts_in_hid = pm.Normal('wts_in_hid',mu=0,sd=1,shape=(ninputs,neurons_per_hiddenlayer),testval=init_hid)
        
    # Weights from hidden layer to output
    wts_hid_out = pm.Normal('wts_hid_out',mu=0,sd=1,shape=(neurons_per_hiddenlayer,),testval=init_out)
    
    ## OBSERVED RANDOM VARIABLES (TRUE VALUES)
    
    # Deterministic spectral parameters (mean zeros assuming they have been scaled, and std 1 means similar to measured values)
    true_x  = pm.Normal('true_x', mu=0,sd=1, shape=(ninputs)) 
    
    # Deterministic mass from neural network (assuming sigmoid and linear activation functions for the hidden and outer layers)
    act_hid   = TT.nnet.sigmoid(TT.dot(true_x,wts_in_hid))
    act_out   = TT.dot(act_hid,wts_hid_out)
    true_mass = pm.Deterministic('true_mass',act_out)
    
    ## OBSERVED RANDOM VARIABLES (LIKELIHOODS USING OBSERVED VALUES)

    # Define likelihoods using observations
    likelihood_x    = pm.Normal('likelihood_x',mu=true_x,sd=errInputsShared,observed=inputsShared, shape=(ninputs))
    likelihood_mass = pm.Normal('likelihood_mass',mu=true_mass,sd=errTargetShared,observed=targetShared, shape=(nobs))

junpenglao · December 18, 2017, 12:45pm

You should avoid specifying the ninputs in the shape, which is unnecessary as the computation should be broadcasted on that dimension (same idea as the batch size of using minibatch). Say that in your neural net you have the inputX shape=(1000, 12) and outputY shape = (1000, 1), you should specify the weight with the shapes such as (12, a), (a, b), (b, c) … (k, 1), and after a chain of matrix multiplication and non-linear transformation, you arrive at a Y-hat with shape = (1000, 1). Notice that in the model specification the shape of the weight (i.e., the RVs in pm.Model) does not contain 1000, the number of observations.

In my blog post, instead of inputing inputX I use a latent variable inputX_obs = pm.Normal('X', mu=Xmu, sd=1., observed=inputX). The matrix multiplication between the input and the weights at each layer of the NN is then operate on inputX_obs instead of inputX. Here the prior Xmu will have a shape of (12, 1) in your case. Notices that if there is no missing value in inputX, doing so it is equivalent as inputX_obs = inputX.

Back to your code, instead of
act_hid = TT.nnet.sigmoid(TT.dot(true_x,wts_in_hid))
you should do
act_hid = TT.nnet.sigmoid(TT.dot(likelihood_x,wts_in_hid)),
and add additional Random variable to the model for posterior prediction.

payeldas · December 18, 2017, 10:06pm

Ok I completely missed that I should be inputting inputX rather than inputX_obs. It now works as expected for predicting posterior distributions for the seen observables, but only when inputsTrainScale and errInputsTrainScale are not theano.shared. I’m not sure what sharing them with theano changes to stop it from working. The other thing I’m not sure about is what to set the sd parameter in likelihood_mass for new measurements of inputX. In the original model I assign it to the measured noise. Obviously for new inputX I don’t have a measurement of the noise in Y. Should I treat it like a missing variable?
Thanks.

junpenglao · December 19, 2017, 8:34am

Hmm, if you dont have any missing value (ie., the input is not a numpy mask array), setting it to theano.shared should still works. But if you are adding a new node for posterior prediction you dont need theano.shared the input as you are not going to reset the value.

I would use the measurement noise of the original Y.

payeldas · December 19, 2017, 12:44pm

Ok I think that all works now - thank you!

payeldas · March 14, 2018, 10:51am

Hi,
I’ve come back to this after a few months. I realized there is still one thing that confuses me. What is the difference between posterior predictive checks and prediction (https://docs.pymc.io/notebooks/posterior_predictive.html)? It seems that the difference lies in the specification of the size parameter. In my estimates of the posterior distributions for new data predicted by the neural network, I would like to carry through the parameter errors, and the errors in new data. Doesn’t this mean I should specify size too? When I do, I get a shape error.
Thanks.

junpenglao · March 14, 2018, 11:05am

Posterior prediction check usually is used in the context that after you fitted the model, you generate prediction from the model to and see how the observed overlapped with the prediction from the fit. One typical way to do is plot the (smoothed) histogram and plot the real data on top. For example, see figure 6 in Visualization in Bayesian workflow.

Notice that the observed here could be the training set or the test set.

As for the shape error when specifying the size, you can try creating a new node with a specific shape and do sample_ppc(...,vars=[new_node],...).

Topic		Replies	Views
How do I predict on new, unseen real data using pm.sample_posterior_predictive? Questions	13	8351	January 7, 2021
Posterior and data mismatch in linear model with observed predictors Questions	0	471	August 13, 2020
Odd results in model prediction using pymc.sample_posterior_predictive v5 linear_model , modeling	9	1205	September 24, 2022
Multivariate Linear Regression with correlated input variables Questions	7	2474	June 9, 2021
Prediction by Bayesian linear regression Questions	7	1787	April 12, 2021

Dealing with multivariate x, errors in both x and y, and posterior predictive for y

Related topics