Log-scale transform

Hi everyone!

I am implementing a custom likelihood function and doing the computations in log space.
Therefore posterior = prior * likelihood gets converted to log posterior = log prior + log likelihood.

How do I achieve the log transformation of the priors in PyMC? Do I even need to do it explicitly, or does PyMC do it internally?

Thank you !

Have you checked out the 2 blackbox likelihood notebooks (the one and the other)?

You don’t have to worry about the logp of other variables. PyMC will retrieve the logp of each variable (free or observed) and combine it as needed depending on what it is doing.
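For example, here is a minimal sketch (with a hypothetical, hand-written Normal logp) of how the priors' logps are picked up automatically, so only the custom likelihood needs to be supplied:

import numpy as np
import pymc as pm

data = np.random.normal(size=10)

def custom_logp(value, mu):
    # hand-written elementwise Normal(mu, 1) log-density
    return -0.5 * np.log(2 * np.pi) - 0.5 * (value - mu) ** 2

with pm.Model() as m:
    mu = pm.Normal("mu", 0, 1)  # prior; its logp is added automatically
    pm.CustomDist("obs", mu, logp=custom_logp, observed=data)

# model logp = log prior + summed log likelihood, no explicit transform needed
print(m.compile_logp()({"mu": 0.3}))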


Thanks for the references, I'll take a look!

Got it! Thank you!

Hi @ricardoV94,
Could you also tell me what the logp function is supposed to return?

The log likelihood computed and summed over the whole dataset (this would be one single floating point value, i.e. a scalar), or the likelihood computed for each data point without summing (this would be a tensor of shape N x 1, where N is the number of records in the dataset)?

The above calculation is in log space.

@ricardoV94 it seems like it may also depend on whether pm.Potential or pm.CustomDist is being used? I thought pm.Potential required a scalar logp. But can pm.CustomDist handle a vector/matrix of logps?

For potentials it doesn’t ultimately matter because all potentials are summed up in the end, see here
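As a small sketch (toy model), a non-scalar Potential is fine precisely because every potential term gets summed into the model logp:

import numpy as np
import pymc as pm

with pm.Model() as m:
    x = pm.Normal("x", 0, 1, shape=3)
    pm.Potential("penalty", -0.5 * x**2)  # shape (3,), summed internally

print(m.compile_logp()({"x": np.zeros(3)}))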

The logp for distributions evaluates at a single point, but can include batch dimensions. I think it's most clear when you look at the logp function for the multivariate normal. The dimensionality of the distribution, k, used in the normalizing constant, is taken from the last dimension of the value. So you give it an (n, k) matrix of values and get back (n,) logp numbers. I don't know of a distribution that doesn't accept batch dimensions – maybe GPs count?
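A small sketch of that behaviour, assuming a 3-dimensional MvNormal:

import numpy as np
import pymc as pm

k = 3
x = pm.MvNormal.dist(mu=np.zeros(k), cov=np.eye(k))
values = np.random.normal(size=(5, k))  # 5 points of dimension k

print(pm.logp(x, values).eval().shape)  # (5,) -> one logp per row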

So does that count as logp accepting vector/matrix inputs, or does it just count as broadcasting of a (scalar/vector/matrix, depending on the dimensionality of the support) function over batch dims? I think about it as the latter, but maybe that’s wrong.


What @jessegrabowski said. Logp should have one entry per independent draw (aka batch dimensions). For univariate RVs, logp.shape == rv.shape; for vector RVs, logp.shape == rv.shape[:-1]; for matrix RVs, logp.shape == rv.shape[:-2]; and so on…

If my model output is multivariate, then the logp output should be multivariate as well?

One log probability per index of the random variable?

Also, what does one draw mean? One draw while sampling? Should logp spit out one value per draw? Does the shape of the logp output depend on the random variable under study?

Think about the core case. What’s the smallest RV draw you can make? You should have one logp scalar for such a draw.

For a univariate normal you can make a scalar draw with shape (), for a multivariate normal a vector draw with shape (n,). Both draws are associated with a scalar logp with shape () because you can’t break the draw “apart”. Anything beyond those are “batch dimensions” and you will have corresponding batch dimensions in the logp.

If you have a univariate normal with batch shape (3, 4), and thus shape (3, 4) as well, then logp will have shape (3, 4).

A multivariate normal with batch shape (3, 4) will have shape (3, 4, n) and a logp with shape (3, 4). You basically have one entry per batch dimension.
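A quick sketch of those shape rules (hypothetical shapes):

import numpy as np
import pymc as pm

uni = pm.Normal.dist(mu=np.zeros((3, 4)))  # univariate, batch shape (3, 4)
print(pm.logp(uni, np.zeros((3, 4))).eval().shape)  # (3, 4)

n = 2
mv = pm.MvNormal.dist(mu=np.zeros((3, 4, n)), cov=np.eye(n))  # batch (3, 4), core dim n
print(pm.logp(mv, np.zeros((3, 4, n))).eval().shape)  # (3, 4)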

Batch dimensions are explained in the pymc dimensionality notebook: Distribution Dimensionality — PyMC 5.6.1 documentation


Thank you @ricardoV94 for being patient.

I have a univariate random variable in my case. According to what you have suggested, the draw would have shape (). Now what I cannot understand is what logp should return. Let's say I have N data points: should logp return a tensor of shape N x 1, where I return the likelihood calculation for every data point and PyMC takes care of summing it up while sampling, OR should logp return just one single scalar of shape ()?

Right now my logp function generates a tensor variable with the shape of the observed data (let's say that's N x 1). Currently, the logp values are calculated independently for each observation and are not summed before being returned from the logp function. Is that the right way to do it? Or do I need to sum them up in logp before returning?

Almost, it should be shape (N,), not N x 1.

PyMC will sum whenever needed, yes.


Thanks a lot for the clarification!

@ricardoV94 I was looking at the example below

Why have they summed the result before returning it from the logp function?

That example is unfortunately very outdated.

In general it may also be fine to sum the logp. But some use cases will not be possible if you do this (such as model comparison via elementwise log-likelihood or use in Mixtures, to name a few). So better to do it correctly and not sum unnecessarily.
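As a concrete illustration (a toy Normal likelihood, not the model from this thread): if the logp returns one entry per observation, the elementwise log-likelihood can still be extracted afterwards for model comparison:

import arviz as az
import numpy as np
import pymc as pm

y = np.random.normal(loc=1.0, size=50)

def logp(value, mu, sigma):
    # shape (N,): one term per observation, NOT summed
    return -pm.math.log(sigma) - 0.5 * np.log(2 * np.pi) - 0.5 * ((value - mu) / sigma) ** 2

with pm.Model():
    mu = pm.Normal("mu", 0, 10)
    sigma = pm.Exponential("sigma", 1)
    pm.CustomDist("y", mu, sigma, logp=logp, observed=y)
    idata = pm.sample()
    pm.compute_log_likelihood(idata)  # fills idata.log_likelihood, one entry per observation

print(az.loo(idata))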

from pymc.math import where, log, exp
import pandas as pd
import pymc as pm
from pytensor import tensor as pt
import numpy as np
import nutpie

from numpy import pi
from sklearn.preprocessing import StandardScaler
import pymc.sampling_jax
import numpyro
import jax

numpyro.set_host_device_count(4)

def standardize(x, mu, sigma):
    # z-score: (x - mu) / sigma
    return (x - mu) / sigma


def standardNormCdf(x):
    # standard normal CDF via the error function
    return 0.5 + 0.5 * pm.math.erf(x / pm.math.sqrt(2))


def getContributionFromInterval(interval, mu, sigma):
    # log survival function of a lognormal(mu, sigma) evaluated at `interval`
    logT = log(interval)
    a = standardize(logT, mu, sigma)
    return log(1 - standardNormCdf(a))


def computeFailureRate(sigma, mu, t, theta, data):
    # log hazard of a lognormal(mu, sigma) at time t, plus the linear predictor theta·x
    logT = log(t)
    a1 = standardize(logT, mu, sigma)
    a2 = 0.5 * log(2 * pi * pow(sigma, 2))
    B = 0.5 * pow(a1, 2)
    C = log(1 - standardNormCdf(a1))
    D = (theta * data).sum(axis=-1).reshape((-1, 1))
    failureRate = ((-1) * logT) - a2 - B - C + D
    return failureRate


def getSurvival(time_to_event, mu, sigma, theta, features):
    # cumulative hazard: -log S_0(t), scaled by exp(theta·x) (proportional hazards)
    contributionFromFinalInterval = (-1) * getContributionFromInterval(time_to_event, mu, sigma)
    theta_features_vector = (theta * features).sum(axis=-1)
    theta_features_vector = theta_features_vector.reshape((-1, 1))
    survival = (contributionFromFinalInterval * exp(theta_features_vector))
    # survival = survival.sum(axis=1).reshape((-1, 1))
    return survival


def runModel():
    bcphm_model = pm.Model()
    with bcphm_model:
        data = pd.read_csv('data/precovid_SF.csv')

        data.sort_values(by=['event'], inplace=True)
        data.loc[(data.FirstTimeHomeBuyer.isna()) & data.LoanPurpose.isin(
            ['Refinance', 'CashOutRefi']), 'FirstTimeHomeBuyer'] = 'Not Applicable'
        data['PMI'].fillna(0, inplace=True)
        data = data[data.time_to_event != 0]

        data.dropna(inplace=True)
        data['ClosingDt'] = pd.to_datetime(data['ClosingDt']).dt.year

        data = data.groupby('event', group_keys=False).apply(lambda x: x.sample(frac=0.2))

        events = np.where(data.event.values == "default", 0, np.where(data.event.values == "prepayment", 1, 2)).reshape(
            (-1, 1))

        time_to_event = data.time_to_event.values

        time_to_event_shape = time_to_event.shape
        time_to_event = time_to_event.reshape(time_to_event_shape[0], 1)
        data['isSingleBorrower'] = data['isSingleBorrower'].map({0: 'No', 1: 'Yes'})
        features = data.drop(
            columns=['LoanNumber', 'time_to_default', 'time_to_prepayment', 'State', 'ClosingDt', 'event',
                     'time_to_event'], axis=1)
        categorical = [col for col in features.columns if features[col].dtype == "O"]
        quantitative = set(features.columns) - set(categorical)
        # features = np.repeat(features[:, np.newaxis, :], lifetime.shape[1], axis=1)

        categorical_dummies = pd.get_dummies(features.loc[:, list(categorical)], columns=categorical, drop_first=True)

        sc = StandardScaler()
        standardized_quantitative = pd.DataFrame(sc.fit_transform(features.loc[:, list(quantitative)]),
                                                 columns=list(quantitative))
        model_input = pd.concat(
            [categorical_dummies.reset_index(drop=True), standardized_quantitative.reset_index(drop=True)],
            axis=1)

        model_input.replace({False: 0, True: 1}, inplace=True)

        features_shape = model_input.shape

        features_num = model_input.values
        features_num = pm.MutableData('features_num', features_num)
        events = pm.MutableData('events', events)

        theta_D = pm.Normal('theta_D', mu=0, sigma=100, shape=features_shape[1])
        theta_P = pm.Normal('theta_P', mu=0, sigma=100, shape=features_shape[1])
        mu_D = pm.Normal('mu_D', mu=0, sigma=10)
        mu_P = pm.Normal('mu_P', mu=0, sigma=10)
        sigma_D = pm.Exponential('sigma_D', .01)
        sigma_P = pm.Exponential('sigma_P', .01)

        def logp(time_to_event, mu_P, mu_D, sigma_P,
                 sigma_D, theta_D, theta_P, event,
                 features):
            failureRate = where(
                pt.eq(event, 0),
                computeFailureRate(sigma_D, mu_D, time_to_event, theta_D, features),
                where(pt.eq(event, 1),
                      computeFailureRate(sigma_P, mu_P, time_to_event, theta_P, features),
                      0)
            )
            defaultSurvival = getSurvival(time_to_event, mu_D, sigma_D,
                                          theta_D, features)
            prepaymentSurvival = getSurvival(time_to_event, mu_P,
                                             sigma_P, theta_P, features)
            # one logp entry per observation; PyMC sums when needed
            return (failureRate - defaultSurvival - prepaymentSurvival).flatten()

       
        likelihood = pm.CustomDist('LL', mu_P, mu_D,
                                   sigma_P, sigma_D, theta_D, theta_P,
                                   events, features_num,
                                   logp=logp,
                                   observed=time_to_event)

    compiled_model = nutpie.compile_pymc_model(bcphm_model)
    trace_pymc = nutpie.sample(compiled_model)
    return trace_pymc

@ricardoV94 I am having trouble getting this sampling to converge. Could you please take a look and see if there are any obvious issues? Surprisingly, it works on my local machine with the vanilla sampler and a smaller fragment of the dataset.

The same code with NumPyro on AWS SageMaker gives me samples where the variables are stuck and every proposal is rejected.