Say I’m creating a prior:
alpha = Normal('alpha', mu=0, sigma=1, shape=2)
but I actually want alpha[0] to have mu of -1 and alpha[1] to have mu of 1. Is there a way to do that?
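In other words, I'm hoping for something along these lines (assuming mu can even take an array, which I don't know):
alpha = Normal('alpha', mu=np.array([-1, 1]), sigma=1, shape=2)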
Motivation (feel free to skip this part if you can answer the question above): I'm working on a script that updates the posteriors after every new data point. I'm trying to learn from this example, which does the updating by using pm.Interpolated to forward-propagate the old posteriors into new priors. I tried to modify that script so the model is vectorized, with two values (instead of one) for each of alpha, beta0, and beta1, but I don't know how to modify the from_posterior() function from that example to handle vectorized distributions.
My code (it errors out when calling from_posterior(), because I'm passing posterior samples of shape (4000, 2) instead of the expected (4000,)):
import matplotlib.pyplot as plt
import matplotlib as mpl
import pymc3 as pm
from pymc3 import Model, Normal
from pymc3 import sample
from pymc3 import traceplot
from pymc3.distributions import Interpolated
import theano.tensor as tt
import numpy as np
from scipy import stats
from random import choice
import pandas as pd
if __name__ == '__main__':
    plt.style.use('seaborn-darkgrid')
    print('Running on PyMC3 v{}'.format(pm.__version__))

    # Initialize random number generator
    np.random.seed(93457)

    # True parameter values: two settings per parameter
    alpha_true = [5, -5]
    beta0_true = [7, 17]
    beta1_true = [13, 23]

    # Size of dataset
    size = 100

    # Predictor variables
    X1 = np.random.randn(size)
    X2 = np.random.randn(size) * 0.2

    # For each data point, one of the two known settings of each parameter
    # is applied, so there are two values each for alpha, beta0, and beta1
    Data = pd.DataFrame(columns=["Y", "alpha", "beta0", "beta1"], index=range(size))
    for point in range(size):
        # randomly choose which alpha and betas apply to this data point
        alpha_choice = choice([0, 1])
        beta0_choice = choice([0, 1])
        beta1_choice = choice([0, 1])
        this_alpha = alpha_true[alpha_choice]
        this_beta0 = beta0_true[beta0_choice]
        this_beta1 = beta1_true[beta1_choice]
        # Simulate outcome variable
        this_Y = this_alpha + this_beta0 * X1[point] + this_beta1 * X2[point] + np.random.randn()
        Data.loc[point] = [this_Y, alpha_choice, beta0_choice, beta1_choice]
    basic_model = Model()
    with basic_model:
        # Priors for unknown model parameters (one component per setting)
        alpha = Normal('alpha', mu=0, sigma=1, shape=2)
        beta0 = Normal('beta0', mu=12, sigma=1, shape=2)
        beta1 = Normal('beta1', mu=18, sigma=1, shape=2)

        # Expected value of outcome: index each vectorized prior by the
        # component that was applied to each data point
        mu = (alpha[Data["alpha"].values.astype(int)]
              + beta0[Data["beta0"].values.astype(int)] * X1
              + beta1[Data["beta1"].values.astype(int)] * X2)

        # Likelihood (sampling distribution) of observations
        Y_obs = Normal('Y_obs', mu=mu, sigma=1, observed=Data["Y"].values.astype(float))

        # draw 1000 posterior samples
        trace = sample(1000)

    traceplot(trace)
    def from_posterior(param, samples):
        # note: this expects a 1-D array of samples; both gaussian_kde
        # (as used here) and Interpolated are univariate
        smin, smax = np.min(samples), np.max(samples)
        width = smax - smin
        x = np.linspace(smin, smax, 100)
        y = stats.gaussian_kde(samples)(x)
        # what was never sampled should have a small probability but not 0,
        # so extend the domain and use a linear approximation of the density there
        x = np.concatenate([[x[0] - 3 * width], x, [x[-1] + 3 * width]])
        y = np.concatenate([[0], y, [0]])
        return Interpolated(param, x, y)
    traces = [trace]
    for _ in range(10):
        # generate more data
        X1 = np.random.randn(size)
        X2 = np.random.randn(size) * 0.2
        for point in range(size):
            alpha_choice = choice([0, 1])
            beta0_choice = choice([0, 1])
            beta1_choice = choice([0, 1])
            this_alpha = alpha_true[alpha_choice]
            this_beta0 = beta0_true[beta0_choice]
            this_beta1 = beta1_true[beta1_choice]
            this_Y = this_alpha + this_beta0 * X1[point] + this_beta1 * X2[point] + np.random.randn()
            Data.loc[point] = [this_Y, alpha_choice, beta0_choice, beta1_choice]
        model = Model()
        with model:
            # Priors are posteriors from the previous iteration
            # (errors out here: trace['alpha'] has shape (4000, 2), but
            # from_posterior() expects shape (4000,))
            alpha = from_posterior('alpha', trace['alpha'])
            beta0 = from_posterior('beta0', trace['beta0'])
            beta1 = from_posterior('beta1', trace['beta1'])

            # Expected value of outcome
            mu = (alpha[Data["alpha"].values.astype(int)]
                  + beta0[Data["beta0"].values.astype(int)] * X1
                  + beta1[Data["beta1"].values.astype(int)] * X2)

            # Likelihood (sampling distribution) of observations
            Y_obs = Normal('Y_obs', mu=mu, sigma=1, observed=Data["Y"].values.astype(float))

            # draw 1000 posterior samples
            trace = sample(1000)
        traces.append(trace)
    print('Posterior distributions after ' + str(len(traces)) + ' iterations.')
    cmap = mpl.cm.autumn
    for param in ['alpha', 'beta0', 'beta1']:
        plt.figure(figsize=(8, 2))
        for update_i, trace in enumerate(traces):
            samples = trace[param]
            smin, smax = np.min(samples), np.max(samples)
            x = np.linspace(smin, smax, 100)
            # samples has shape (n_draws, 2), so run the KDE per component
            for comp in range(samples.shape[1]):
                y = stats.gaussian_kde(samples[:, comp])(x)
                plt.plot(x, y, color=cmap(1 - update_i / len(traces)))
        # each parameter has two true values, so mark both
        for true_val in {'alpha': alpha_true, 'beta0': beta0_true, 'beta1': beta1_true}[param]:
            plt.axvline(true_val, c='k')
        plt.ylabel('Frequency')
        plt.title(param)
    plt.show()
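For reference, the only workaround I could think of is to loop over the columns of the posterior samples and build one univariate Interpolated prior per component, stacking them back into a vector. This is an untested sketch, not something from the example; the name from_posterior_vec and the per-component naming scheme ('alpha_0', 'alpha_1', etc.) are my own invention:

def from_posterior_vec(param, samples):
    # samples has shape (n_draws, n_components); build one univariate
    # Interpolated prior per column, then stack them into a vector
    components = []
    for i in range(samples.shape[1]):
        s = samples[:, i]
        smin, smax = np.min(s), np.max(s)
        width = smax - smin
        x = np.linspace(smin, smax, 100)
        y = stats.gaussian_kde(s)(x)
        # same tail extension as the original from_posterior()
        x = np.concatenate([[x[0] - 3 * width], x, [x[-1] + 3 * width]])
        y = np.concatenate([[0], y, [0]])
        # hypothetical naming scheme: 'alpha_0', 'alpha_1', ...
        components.append(Interpolated('{}_{}'.format(param, i), x, y))
    return tt.stack(components)

But that changes the variable names in the trace (alpha_0 and alpha_1 instead of alpha), which would break trace['alpha'] in the next update iteration, so I'm not sure this is the right approach.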