Log-scale transform

Hi everyone!

I am implementing a custom likelihood function and doing the computations in log space.
Therefore posterior = prior * likelihood gets converted to log posterior = log prior + log likelihood.

How do I achieve the log transformation of the priors in PyMC? Do I even need to do it explicitly, or does PyMC do it internally?

Thank you !

Have you checked out the 2 blackbox likelihood notebooks (the one and the other)?

You don’t have to worry about the logp of other variables. PyMC will retrieve the logp of each variable (free or observed) and combine it as needed depending on what it is doing.
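For example, here is a minimal sketch (with a hypothetical, hand-written Normal logp) of how the priors' logps are picked up automatically, so only the custom likelihood needs to be supplied:

import numpy as np
import pymc as pm

data = np.random.normal(size=10)

def custom_logp(value, mu):
    # hand-written elementwise Normal(mu, 1) log-density
    return -0.5 * np.log(2 * np.pi) - 0.5 * (value - mu) ** 2

with pm.Model() as m:
    mu = pm.Normal("mu", 0, 1)  # prior; its logp is added automatically
    pm.CustomDist("obs", mu, logp=custom_logp, observed=data)

# model logp = log prior + summed log likelihood, no explicit transform needed
print(m.compile_logp()({"mu": 0.3}))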


Thanks for the references, I'll take a look!

Got it! Thank you!

Hi @ricardoV94,
Could you also tell me what the logp function is supposed to return?

The log likelihood computed and summed over the whole dataset (this would be one single floating point value, i.e. a scalar), or the likelihood computed for each data point without summing (this would be a tensor of shape N x 1, where N is the number of records in the dataset)?

The above calculation is in log space.

@ricardoV94 it seems like it may also depend on whether pm.Potential or pm.CustomDist is being used? I thought pm.Potential required a scalar logp. But can pm.CustomDist handle a vector/matrix of logps?

For potentials it doesn’t ultimately matter because all potentials are summed up in the end, see here
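As a small sketch (toy model), a non-scalar Potential is fine precisely because every potential term gets summed into the model logp:

import numpy as np
import pymc as pm

with pm.Model() as m:
    x = pm.Normal("x", 0, 1, shape=3)
    pm.Potential("penalty", -0.5 * x**2)  # shape (3,), summed internally

print(m.compile_logp()({"x": np.zeros(3)}))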

The logp for distributions evaluates at a single point, but can include batch dimensions. I think it's most clear when you look at the logp function for the multivariate normal. The dimensionality of the distribution, k, used in the normalizing constant, is taken from the last dimension of the value. So you give it an (n, k) matrix of values and get back (n,) logp numbers. I don't know of a distribution that doesn't accept batch dimensions – maybe GPs count?
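A small sketch of that behaviour, assuming a 3-dimensional MvNormal:

import numpy as np
import pymc as pm

k = 3
x = pm.MvNormal.dist(mu=np.zeros(k), cov=np.eye(k))
values = np.random.normal(size=(5, k))  # 5 points of dimension k

print(pm.logp(x, values).eval().shape)  # (5,) -> one logp per row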

So does that count as logp accepting vector/matrix inputs, or does it just count as broadcasting of a (scalar/vector/matrix, depending on the dimensionality of the support) function over batch dims? I think about it as the latter, but maybe that’s wrong.


What @jessegrabowski said. Logp should have one entry per independent draw (aka batch dimensions). For univariate RVs, logp.shape == rv.shape; for vector RVs, logp.shape == rv.shape[:-1]; for matrix RVs, logp.shape == rv.shape[:-2]; and so on…

If my model output is multivariate, then the logp output should be multivariate as well?

One log probability per index of the random variable?

Also, what does one draw mean? One draw while sampling? Should logp spit out one value per draw? Does the shape of the logp output depend on the random variable under study?

Think about the core case. What’s the smallest RV draw you can make? You should have one logp scalar for such a draw.

For a univariate normal you can make a scalar draw with shape (), for a multivariate normal a vector draw with shape (n,). Both draws are associated with a scalar logp with shape () because you can’t break the draw “apart”. Anything beyond those are “batch dimensions” and you will have corresponding batch dimensions in the logp.

If you have a univariate normal with batch shape (3, 4), and thus shape (3, 4) as well, then logp will have shape (3, 4).

A multivariate normal with batch shape (3, 4) will have shape (3, 4, n) and a logp with shape (3, 4). You basically have one entry per batch dimension.
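A quick sketch of those shape rules (hypothetical shapes):

import numpy as np
import pymc as pm

uni = pm.Normal.dist(mu=np.zeros((3, 4)))  # univariate, batch shape (3, 4)
print(pm.logp(uni, np.zeros((3, 4))).eval().shape)  # (3, 4)

n = 2
mv = pm.MvNormal.dist(mu=np.zeros((3, 4, n)), cov=np.eye(n))  # batch (3, 4), core dim n
print(pm.logp(mv, np.zeros((3, 4, n))).eval().shape)  # (3, 4)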

Batch dimensions are explained in the pymc dimensionality notebook: Distribution Dimensionality — PyMC 5.6.1 documentation


Thank you @ricardoV94 for being patient.

I have a univariate random variable in my case. According to what you have suggested, the draw would have shape (). Now what I cannot understand is what logp should return. Let's say I have N data points: should logp return a tensor of shape N x 1, where I return the likelihood calculation for every data point and PyMC takes care of summing it up while sampling, OR should logp return just one single scalar of shape ()?

Right now my logp function generates a tensor variable with the shape of the observed data (let's say that's N x 1). Currently, the logp values are calculated independently for each observation and are not summed before being returned from the logp function. Is that the right way to do it? Or do I need to sum them up in logp before returning?

Almost, it should be shape (N,), not N x 1.

PyMC will sum whenever needed, yes.


Thanks a lot for the clarification!

@ricardoV94 I was looking at the example below

Why have they summed the result before returning it from the logp function?

That example is unfortunately very outdated.

In general it may also be fine to sum the logp. But some use cases will not be possible if you do this (such as model comparison via elementwise log-likelihood or use in Mixtures, to name a few). So better to do it correctly and not sum unnecessarily.
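As a concrete illustration (a toy Normal likelihood, not the model from this thread): if the logp returns one entry per observation, the elementwise log-likelihood can still be extracted afterwards for model comparison:

import arviz as az
import numpy as np
import pymc as pm

y = np.random.normal(loc=1.0, size=50)

def logp(value, mu, sigma):
    # shape (N,): one term per observation, NOT summed
    return -pm.math.log(sigma) - 0.5 * np.log(2 * np.pi) - 0.5 * ((value - mu) / sigma) ** 2

with pm.Model():
    mu = pm.Normal("mu", 0, 10)
    sigma = pm.Exponential("sigma", 1)
    pm.CustomDist("y", mu, sigma, logp=logp, observed=y)
    idata = pm.sample()
    pm.compute_log_likelihood(idata)  # fills idata.log_likelihood, one entry per observation

print(az.loo(idata))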

from pymc.math import where, log, exp
import pandas as pd
import pymc as pm
from pytensor import tensor as pt
import numpy as np
import nutpie

from numpy import pi
from sklearn.preprocessing import StandardScaler
import pymc.sampling_jax
import numpyro
import jax

numpyro.set_host_device_count(4)

def standardize(x, mu, sigma):
    # z-score: (x - mu) / sigma
    return (x - mu) / sigma


def standardNormCdf(x):
    # standard normal CDF via the error function
    return 0.5 + 0.5 * pm.math.erf(x / pm.math.sqrt(2))


def getContributionFromInterval(interval, mu, sigma):
    # log survival function of a lognormal(mu, sigma) evaluated at `interval`
    logT = log(interval)
    a = standardize(logT, mu, sigma)
    return log(1 - standardNormCdf(a))


def computeFailureRate(sigma, mu, t, theta, data):
    # log hazard of a lognormal(mu, sigma) at time t, plus the linear predictor theta·x
    logT = log(t)
    a1 = standardize(logT, mu, sigma)
    a2 = 0.5 * log(2 * pi * pow(sigma, 2))
    B = 0.5 * pow(a1, 2)
    C = log(1 - standardNormCdf(a1))
    D = (theta * data).sum(axis=-1).reshape((-1, 1))
    failureRate = ((-1) * logT) - a2 - B - C + D
    return failureRate


def getSurvival(time_to_event, mu, sigma, theta, features):
    # cumulative hazard: -log S_0(t), scaled by exp(theta·x) (proportional hazards)
    contributionFromFinalInterval = (-1) * getContributionFromInterval(time_to_event, mu, sigma)
    theta_features_vector = (theta * features).sum(axis=-1)
    theta_features_vector = theta_features_vector.reshape((-1, 1))
    survival = (contributionFromFinalInterval * exp(theta_features_vector))
    # survival = survival.sum(axis=1).reshape((-1, 1))
    return survival


def runModel():
    bcphm_model = pm.Model()
    with bcphm_model:
        data = pd.read_csv('data/precovid_SF.csv')

        data.sort_values(by=['event'], inplace=True)
        data.loc[(data.FirstTimeHomeBuyer.isna()) & data.LoanPurpose.isin(
            ['Refinance', 'CashOutRefi']), 'FirstTimeHomeBuyer'] = 'Not Applicable'
        data['PMI'].fillna(0, inplace=True)
        data = data[data.time_to_event != 0]

        data.dropna(inplace=True)
        data['ClosingDt'] = pd.to_datetime(data['ClosingDt']).dt.year

        data = data.groupby('event', group_keys=False).apply(lambda x: x.sample(frac=0.2))

        events = np.where(data.event.values == "default", 0, np.where(data.event.values == "prepayment", 1, 2)).reshape(
            (-1, 1))

        time_to_event = data.time_to_event.values

        time_to_event_shape = time_to_event.shape
        time_to_event = time_to_event.reshape(time_to_event_shape[0], 1)
        data['isSingleBorrower'] = data['isSingleBorrower'].map({0: 'No', 1: 'Yes'})
        features = data.drop(
            columns=['LoanNumber', 'time_to_default', 'time_to_prepayment', 'State', 'ClosingDt', 'event',
                     'time_to_event'], axis=1)
        categorical = [col for col in features.columns if features[col].dtype == "O"]
        quantitative = set(features.columns) - set(categorical)
        # features = np.repeat(features[:, np.newaxis, :], lifetime.shape[1], axis=1)

        categorical_dummies = pd.get_dummies(features.loc[:, list(categorical)], columns=categorical, drop_first=True)

        sc = StandardScaler()
        standardized_quantitative = pd.DataFrame(sc.fit_transform(features.loc[:, list(quantitative)]),
                                                 columns=list(quantitative))
        model_input = pd.concat(
            [categorical_dummies.reset_index(drop=True), standardized_quantitative.reset_index(drop=True)],
            axis=1)

        model_input.replace({False: 0, True: 1}, inplace=True)

        features_shape = model_input.shape

        features_num = model_input.values
        features_num = pm.MutableData('features_num', features_num)
        events = pm.MutableData('events', events)

        theta_D = pm.Normal('theta_D', mu=0, sigma=100, shape=features_shape[1])
        theta_P = pm.Normal('theta_P', mu=0, sigma=100, shape=features_shape[1])
        mu_D = pm.Normal('mu_D', mu=0, sigma=10)
        mu_P = pm.Normal('mu_P', mu=0, sigma=10)
        sigma_D = pm.Exponential('sigma_D', .01)
        sigma_P = pm.Exponential('sigma_P', .01)

        def logp(time_to_event, mu_P, mu_D, sigma_P,
                 sigma_D, theta_D, theta_P, event,
                 features):
            failureRate = where(
                pt.eq(event, 0),
                computeFailureRate(sigma_D, mu_D, time_to_event, theta_D, features),
                where(pt.eq(event, 1),
                      computeFailureRate(sigma_P, mu_P, time_to_event, theta_P, features),
                      0)
            )
            defaultSurvival = getSurvival(time_to_event, mu_D, sigma_D,
                                          theta_D, features)
            prepaymentSurvival = getSurvival(time_to_event, mu_P,
                                             sigma_P, theta_P, features)
            # one logp entry per observation; PyMC sums when needed
            return (failureRate - defaultSurvival - prepaymentSurvival).flatten()

       
        likelihood = pm.CustomDist('LL', mu_P, mu_D,
                                   sigma_P, sigma_D, theta_D, theta_P,
                                   events, features_num,
                                   logp=logp,
                                   observed=time_to_event)

    compiled_model = nutpie.compile_pymc_model(bcphm_model)
    trace_pymc = nutpie.sample(compiled_model)
    return trace_pymc

@ricardoV94 I am having trouble getting this sampling to converge. Could you please take a look and see if there are any obvious issues? Surprisingly, it works on my local machine with the vanilla sampler and a smaller fragment of the dataset.

The same code with NumPyro on AWS SageMaker gives me samples where the variables are stuck and every proposal is rejected.