Iterating over Tensors - pytensor.scan

Hi Community,

I am trying to implement a custom likelihood function and need to iterate over the individual values of the observed output variable, using each value as the end argument of an arange call. I am trying to use pytensor.scan to achieve this inside the logp function.

I’m a newbie to PyMC; can somebody help me with this?

Input to the logp function:

time_to_event.shape

TensorConstant{(1,) of 50}

[24. 24. 24. 24. 24. 24. 24. 24. 24. 24. 24. 24. 24. 24. 24. 24. 24. 24. 24. 24. 24. 24. 24. 24. 24. 24. 24. 24. 24. 24. 24. 24. 24. 24. 24. 24. 24. 24. 24. 17. 2. 6. 14. 23. 16. 7. 20. 20. 12. 10.]

def standardize(x, mu, sigma):
    return (x - mu) / sigma


def standardNormCdf(x):
    return 0.5 + 0.5 * pm.math.erf(x / pm.math.sqrt(2))


def exponetiateDotProduct(x, y):
    return exp(x * y)


def getContributionFromInterval(interval, mu, sigma):
    print('getContributionFromInterval')
    logT = log(interval)
    a = standardize(logT, mu, sigma)
    return log(1 - standardNormCdf(a))

def getSurvival(time_to_event,mu,sigma,theta):
    survival = 0
    for interval in pt.arange(1,time_to_event):
        logT = log(interval)
        a = standardize(logT)
        features = data[getListOfFeatures()]
        A = (-1) * getContributionFromInterval(interval, mu, sigma)
        B = getContributionFromInterval(interval - 1, mu, sigma)
        survival += exponetiateDotProduct(theta, features) * (A + B)
    return survival

def runModel():
    bcphm_model = pm.Model()
    with bcphm_model:
        data = pd.read_csv('data/prevcovid_SF.csv')
        data = data.head(50)
        data.to_csv('first50.csv', index=False)
        test_columns = ['time_to_default', 'time_to_prepayment', 'DTI']
        data = data[test_columns]
        data['event'] = data.apply(
            lambda x: 'default' if x.time_to_default > 0 else ('prepayment' if x.time_to_prepayment > 0 else 'active'),
            axis=1)
        data['time_to_event'] = data.apply(
            lambda x: x.time_to_default if x.time_to_default > 0 else (
                x.time_to_prepayment if x.time_to_prepayment > 0 else 24), axis=1)

        data.sort_values(by=['event'], inplace=True)
        events = np.where(data.event.values == "default", 0, np.where(data.event.values == "prepayment", 1, 2))
        time_to_event = data.time_to_event.values
        features = data['DTI'].values
        # Define the priors for 𝜭D and 𝜭P
        # Zero mean normal prior with standard deviation 100
        theta_D = pm.Normal('theta_D', mu=0, sigma=100)
        theta_P = pm.Normal('theta_P', mu=0, sigma=100)

        # Define the priors for μD and μP
        # Zero mean normal prior with standard deviation 10
        mu_D = pm.Normal('mu_D', mu=0, sigma=10)
        mu_P = pm.Normal('mu_P', mu=0, sigma=10)

        # Define the priors for σD and σP
        # Exponential with mean = 100
        sigma_D = pm.Exponential('sigma_D', 100)
        sigma_P = pm.Exponential('sigma_P', 100)

        def logp(time_to_event, mu_P, mu_D, sigma_P, sigma_D, theta_D, theta_P, event, features):
            failureRate = where(
                pt.eq(event, 0),
                computeFailureRate(sigma_D, mu_D, time_to_event, theta_D, features),
                where(pt.eq(event, 1),
                      computeFailureRate(sigma_P, mu_P, time_to_event, theta_P, features),
                      1)
            )

            a, b = pytensor.scan(getSurvival, sequences=[time_to_event], outputs_info=None)
            print(a.eval())

            return time_to_event

        likelihood = pm.CustomDist('LL', mu_P, mu_D, sigma_P, sigma_D, theta_D, theta_P, events, features,
                                   logp=logp,
                                   observed=time_to_event)

        step = pm.Metropolis()
        trace = pm.sample(5, step=step)


if __name__ == '__main__':
    runModel()

Hi Varun,

I’m trying to go through your code but there’s quite a lot of it, and it can’t be run without data/prevcovid_SF.csv. If you updated the post with some code that can be run end-to-end (including generation of artificial data, for example), I’d be happy to give detailed help.

When you’re struggling with scan, it’s also helpful to see some kind of reference implementation of the recursive function you’re working with, either an algorithm from a paper or a pure numpy function. A lot of times you end up not needing a scan at all, but it’s hard to know without more context.

Also, what errors are you getting, if any?

As it stands, though: right away something looks off with your getSurvival function. I don’t ever expect to see a Python loop inside an inner scan function. Also, if I understand correctly, time_to_event is an integer, so interval is itself just an integer? You can definitely just replace the loop with broadcast elementwise operations in that case.
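Something like this is what I have in mind, as a minimal sketch with made-up scalar inputs (t, mu, and sigma here are placeholders, not your actual variables), just to show the loop collapsing into elementwise operations on a vector:

import pytensor.tensor as pt

# placeholder scalar inputs: an integer time-to-event and scalar mu, sigma
t = pt.iscalar("t")
mu = pt.dscalar("mu")
sigma = pt.dscalar("sigma")

# one symbolic vector of all the intervals 1, ..., t-1 instead of a Python loop
intervals = pt.arange(1, t)

# every operation broadcasts elementwise over the whole vector
z = (pt.log(intervals) - mu) / sigma                      # standardize
log_surv = pt.log(1 - 0.5 * (1 + pt.erf(z / pt.sqrt(2.0))))

# summing the per-interval terms replaces the loop's running accumulation
total = log_surv.sum()
print(total.eval({t: 10, mu: 1.0, sigma: 0.5}))  # eval just for checking, outside any model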

Also, there’s no dot product in exponentiateDotProduct D:

Hi Jesse !

Thanks for getting back to me. I have made a few changes and gotten rid of scan; I found a workaround, though I feel it isn’t the right way to go about it.
I am trying to implement survival analysis in a competing risks setting.
[screenshot: the likelihood function]
This is what the likelihood function looks like.

The part within the integral is the survival contribution to the likelihood term; it has been solved below:

[screenshot: the solved survival integral]

This requires me to sum over intervals up to the time_to_event. Intervals are one month long. The terms s_j, s_{j-1}, s_{m^\prime}, t_D refer to the end points of the intervals. j ranges from 1 to m^\prime, where m^\prime is the penultimate interval right before failure: m^\prime = \max\{ j \mid \tau_j < t_D \}. The \tau_j are the midpoints of these intervals, at which the covariates X are recorded; the covariates vary piecewise-constantly across these midpoints. Also, \Phi is the standard normal CDF.
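For instance (toy numbers of my own, assuming the midpoints of the one-month intervals are \tau_j = j - 0.5), m^\prime is just the count of midpoints that fall strictly before the time to event:

import numpy as np

t_D = np.array([10.0, 2.0, 24.0])   # toy times to event, one per record
tau = np.arange(24) + 0.5           # assumed midpoints of 24 one-month intervals

# m' = max{ j : tau_j < t_D } = number of midpoints strictly before t_D
m_prime = (tau[None, :] < t_D[:, None]).sum(axis=1)
print(m_prime)                      # [10  2 24]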

Here is a dummy dataset.

first50.csv (1.2 KB)

Problem at hand:

Let’s say the time to event is 10 months. Every month up to the 10th contributes to the survival component; wouldn’t this need some kind of iteration within the getSurvival function?

def standardize(x, mu, sigma):
    return (x - mu) / sigma


def standardNormCdf(x):
    return 0.5 + 0.5 * pm.math.erf(x / pm.math.sqrt(2))


def exponetiateDotProduct(x, y):
    return exp(x * y)


def getContributionFromInterval(interval, mu, sigma):
    logT = log(interval)
    a = standardize(logT, mu, sigma)
    return log(1 - standardNormCdf(a))


def computeFailureRate(sigma, mu, t, theta, data):
    a1 = standardize(log(t), mu, sigma)
    a2 = pow(2 * pi * pow(sigma, 2), -0.5) * pow(t, -1)
    B = pt.exp(-0.5 * pow(a1, 2))
    C = 1 - standardNormCdf(a1)
    D = exponetiateDotProduct(theta, data)
    return (a2 * B * D) / C


def getSurvival(time_to_event,mu,sigma,theta,features):
    survival = pt.zeros(time_to_event.eval().shape[0],dtype=float)
    for i in time_to_event.eval():
        subSurvial=pt.zeros(int(i),dtype=float)
        for interval in pt.arange(1,i).eval():
            A = (-1) * getContributionFromInterval(interval, mu, sigma)
            B = getContributionFromInterval(interval - 1, mu, sigma)
            survival = exponetiateDotProduct(theta, features) * (A + B)
            pt.set_subtensor(subSurvial[int(interval)],survival[0])
        pt.set_subtensor(survival[int(i)], ptm.sum(subSurvial))
    return survival


def getListOfFeatures():
    listOfFeatures = ['DTI']
    return listOfFeatures


def runModel():
    bcphm_model = pm.Model()
    with bcphm_model:
        data = pd.read_csv('first50.csv')
        time_to_event = data.time_to_event.values
        features = data['DTI'].values
        # Define the priors for 𝜭D and 𝜭P
        # Zero mean normal prior with standard deviation 100
        theta_D = pm.Normal('theta_D', mu=0, sigma=100)
        theta_P = pm.Normal('theta_P', mu=0, sigma=100)

        # Define the priors for μD and μP
        # Zero mean normal prior with standard deviation 10
        mu_D = pm.Normal('mu_D', mu=0, sigma=10)
        mu_P = pm.Normal('mu_P', mu=0, sigma=10)

        # Define the priors for σD and σP
        # Exponential with mean = 100
        sigma_D = pm.Exponential('sigma_D', 100)
        sigma_P = pm.Exponential('sigma_P', 100)

        def logp(time_to_event, mu_P, mu_D, sigma_P, sigma_D, theta_D, theta_P, event, features):
            failureRate = where(
                pt.eq(event, 0),
                computeFailureRate(sigma_D, mu_D, time_to_event, theta_D, features),
                where(pt.eq(event, 1),
                      computeFailureRate(sigma_P, mu_P, time_to_event, theta_P, features),
                      1)
            )

            defaultSurvival=getSurvival(time_to_event,mu_D,sigma_D,theta_D,features)
            prepaymentSurival=getSurvival(time_to_event,mu_P,sigma_P,theta_P,features)
            return failureRate * exp(defaultSurvival+prepaymentSurival)

        likelihood = pm.CustomDist('LL', mu_P, mu_D, sigma_P, sigma_D, theta_D, theta_P, events, features,
                                   logp=logp,
                                   observed=time_to_event)

        step = pm.Metropolis()
        trace = pm.sample(5, step=step)


if __name__ == '__main__':
    runModel()

You’re right to suspect something is off. You should never use loops on a pytensor graph, unless it’s just as a “macroprocessor” to add groups of variables on the PyMC side. And if you are ever calling .eval() inside your graph, you should have alarm bells blazing in your head.

I think you can probably avoid a scan, because you aren’t actually doing any recursive calculation here. There is an s_{j-1} term in the integral, but you can just make a “lagged s” variable to handle that. The only other wrinkle is that I’m guessing m^\prime isn’t necessarily the same for every observation, so your data is a ragged list of lists of varying sizes? I would try to tap-dance around that by:

  1. Pad the s vectors out to m^\star = \max(m^\prime_i) and stack them into an N \times m^\star matrix S. The pad values are not relevant, except that they have to be a valid input to the log function (so not zero).
  2. Make the lagged version of S, S_lag = S[:, :-1], and drop the first column of S so that they line up: S = S[:, 1:]. You can replace this with s_0 if you know what it is, but I didn’t see a definition.
  3. Make a boolean matrix, also of size N \times (m^\star - 1), that records whether the corresponding value in S is real or padded; call it S_mask. I think it should correspond to S rather than S_lag, because anything that is padded in S_lag is definitely padded in S, but the reverse is not true, so a mask built from S covers both.
  4. Vectorize the entire integral computation into something like this:
A = getContributionFromInterval(S, mu_D, sigma_D)
B = getContributionFromInterval(S_lag, mu_D, sigma_D)
C = getContributionFromInterval(t_D, mu_D, sigma_D)
D = getContributionFromInterval(S[:, last_valid_idx], mu_D, sigma_D)

integral = (pt.exp(theta_D @ X_D[:, tau_j]) * (A + B) + pt.exp(theta_D @ X_D[:, tau_m_prime]) * (C + D))

Integral will have shape N \times (m^\star - 1), and will contain all the components of the sum in equation A.4. You now just need to zero out the padded elements and take the sum over the last dimension: final_val = (integral * S_mask).sum(axis=1)
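To make the shape bookkeeping concrete, here is a rough pure-numpy sketch of the pad-shift-mask-sum idea with toy numbers. The names (s_list, S_cur, lengths) are mine, I’ve left out the exp(theta * X) factor, and the combination of terms just mirrors the A and B terms from your earlier getSurvival, so substitute whatever the equation actually calls for:

import numpy as np
from scipy.stats import norm

# toy ragged data: each unit i has its own m'_i interval endpoints
s_list = [np.array([1.0, 2.0, 3.0, 4.0]),   # m' = 4
          np.array([1.0, 2.0]),             # m' = 2
          np.array([1.0, 2.0, 3.0])]        # m' = 3
lengths = np.array([len(s) for s in s_list])
m_star = lengths.max()

# 1. pad every row out to m_star with a harmless positive value (log needs > 0)
S = np.stack([np.pad(s, (0, m_star - len(s)), constant_values=1.0) for s in s_list])

# 2. lagged and shifted versions, so column j of S_cur lines up with s_{j-1} in S_lag
S_lag = S[:, :-1]
S_cur = S[:, 1:]

# 3. mask of the entries that are real (not padding) in the shifted matrix
S_mask = np.arange(1, m_star)[None, :] < lengths[:, None]

# 4. per-interval log-survival contributions, computed on the whole matrix at once
def contribution(s, mu=0.0, sigma=1.0):
    # stand-in for getContributionFromInterval: log(1 - Phi(standardized log s))
    return np.log(1 - norm.cdf((np.log(s) - mu) / sigma))

# A = -contribution(s_j), B = contribution(s_{j-1}) in the earlier getSurvival
integral_terms = contribution(S_lag) - contribution(S_cur)

# zero out the padded entries and sum over intervals: one value per unit
final_val = (integral_terms * S_mask).sum(axis=1)
print(final_val.shape)   # (3,)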


Hi Jesse !

Thanks a ton for pointing that out. S_{j-1} can definitely be created out of S_j just as you’ve described.
Yes, the data is a ragged list of lists of varying sizes; the exposures to failure are of different lengths because failures happen at different times. Your approach sounds just about right: create two matrices for S_j and S_{j-1}, pad them with a non-zero value so the mathematical calculation works, and add a helper matrix indicating whether each entry is a legitimate value or a padded one.

The only thing I wanted to point out: I don’t think we need C and D separately. All the intervals are one month long, and the covariates within every interval are piecewise constant (they don’t vary). In short, I don’t see why we are handling the m^\prime to t_D interval separately.

I will implement this and get back to this platform once it’s done. It’s a little tough to get around in PyMC, but it gets the job done in a sophisticated way.

Probably because I didn’t understand some subtleties of the problem definition. I took X_D(\tau_j) and X_D(\tau_{m^\prime}) to be a feature matrix X_D indexed at different times. You will of course understand better than me exactly what these objects are, so just take my suggestion as a general outline. I definitely got some details wrong. The important thing is to pad-shift-mask-sum.

Oh no, no! You’re 100% correct there, and the implementation correctly aligns with what’s in the equation. It’s my own understanding that we might not need separate handling for the last interval; we can probably just do one integral that goes from j = 1 to m^\prime and not break it into two, because the inner contents of the integrals are the same and the exponential on the outside is the same too. Nevertheless, that’s my job to figure out :wink:

But thanks a ton for your prompt replies :slight_smile: Really appreciate the help :slight_smile:


Hi Jesse

The integral below is calculated for all the records (N) in my dataset. The records are studied for a period of 24 months. For exp(Theta * X(tau_j)) I am expecting a 3D matrix multiplication, with both tensors of shape [N x 24 x number of regressors (X) / parameters (theta)]. How do I do a 3D matrix multiplication and get a result of shape [N x 24] (basically multiplying and summing along the last axis)? I need to do this because the part of the equation inside the big braces would, I believe, be of shape [N x 24].

I hope I’m understanding the process correctly.

[screenshot: the solved survival integral, as above]

Thank you !
Varun

The short answer is that you need to use broadcast multiplication and summation. Calling the number of regressors k, you want to do np.einsum('abk, abk->ab', Theta, X(tau_j)). This can be achieved “by hand” as (Theta * X_tau_j).sum(axis=-1). Note that this isn’t really a generalized dot product, because all dimensions of your tensors match; more on that in the longer answer below.
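A quick plain-numpy check of the short answer, with made-up sizes:

import numpy as np

rng = np.random.default_rng(0)
N, m, k = 3, 24, 2                 # made-up sizes: records, months, regressors
Theta = rng.normal(size=(N, m, k))
X_tau_j = rng.normal(size=(N, m, k))

out_einsum = np.einsum('abk,abk->ab', Theta, X_tau_j)   # shape (N, m)
out_manual = (Theta * X_tau_j).sum(axis=-1)             # the same thing "by hand"

print(out_einsum.shape)                                  # (3, 24)
print(np.allclose(out_einsum, out_manual))               # True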

A longer answer:

It’s a bit odd that both the parameter object and the data object have the same shape – this would suggest every datapoint has its own parameter.

I want to press you on shapes and concepts a bit more so I don’t give wretched advice. My understanding from the above was that X_D has shape N \times m \times k, where N is the number of “units”, m is the length of the interval, and k is the number of covariates. What I’m imagining is longitudinal data: for example, we are studying determinants of sovereign debt defaults, so a cell in X would be a triple (country, year, indicator), like GDP in Senegal in 1999, and we want to know how many more years until a default (at which time Senegal is removed from the dataset, hence the survival angle). Finally, \tau_j is a vector of dates which are not the same for every country, but they conceptually go together. At any rate, we want to compare (country, indicator) pairs with each other at heterogeneous years \tau_j.

So, to continue with this example, we want to slice into X at time \tau_j, so the object X_D(\tau_j) now has shape N \times k. It contains all the economic indicators for all countries that have not defaulted at time \tau_j. My expectation, then, is that \theta_D has shape k – it’s just a vector of regression coefficients. Or perhaps every country has its own coefficient, in which case it has shape N \times k.

Case 1: We want X_D(\tau_j) \theta_D – expand \theta_D to have shape 1 \times k, multiply and sum: (X_D[:, tau_j, :] * theta_D[None, :]).sum(axis=-1). The resulting shape is N. This operation is equivalent to X_D[:, tau_j, :] @ theta_D, or np.einsum('nk, k->n', X_D[:, tau_j, :], theta_D).

Case 2: Everything actually matches up for us already, we just multiply and sum without any expanding: (X_D[:, tau_j, :] * theta_D).sum(axis=-1). The resulting shape is N.
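A toy version of both cases, with made-up sizes (N = 4 units, m = 10 time points, k = 3 covariates) and a single illustrative slice index, just to pin down the shapes:

import numpy as np

rng = np.random.default_rng(42)
N, m, k = 4, 10, 3                     # made-up sizes: units, time points, covariates
X_D = rng.normal(size=(N, m, k))       # longitudinal covariates
tau_j = 5                              # a single time slice, for illustration only
X_slice = X_D[:, tau_j, :]             # shape (N, k)

# Case 1: one shared coefficient vector theta_D of shape (k,)
theta_shared = rng.normal(size=(k,))
case1 = (X_slice * theta_shared[None, :]).sum(axis=-1)   # shape (N,)
assert np.allclose(case1, X_slice @ theta_shared)

# Case 2: a coefficient for every (unit, covariate) pair, theta_D of shape (N, k)
theta_per_unit = rng.normal(size=(N, k))
case2 = (X_slice * theta_per_unit).sum(axis=-1)          # shape (N,)

print(case1.shape, case2.shape)        # (4,) (4,)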

Let me know if this exposition is correct. Maybe I’m missing another dimension on X_D, and that’s why it’s still 3d after you slice in with \tau_j? If so, how to proceed might change depending on what the dimension represents and how the parameters are arranged.