Having problems defining random function for custom distribution with mixture of observed and random variables

anurag.sethi · November 16, 2018, 6:59pm

I am a pymc newbee and I have been looking for examples of this but havent found any. I have no problem creating a custom function for survival analysis with the Gompertz function:

def survival(event, logalpha, beta, time):
    log_h = logalpha + beta * time
    H = (np.exp(logalpha)/beta) * (np.exp(beta * time) - 1)
    return ((event * log_h) - H).sum()

with pm.Model() as model:
        param, mu, y, sigma = {}, {}, {}, {}
        BoundedNormal = pm.Bound(pm.Normal, lower=-10, upper=0)
        BoundedNormal2 = pm.Bound(pm.Normal, lower=0, upper=0.5)
        BoundedNormal3 = pm.Bound(pm.Normal, lower=-2, upper=2)
        BoundedHalfNormal = pm.Bound(pm.HalfNormal, lower=0, upper=2)
        for curr_feature in bn.nodes():
            if curr_feature == 'S':
                logalpha = BoundedNormal('logalpha', sd=10)
                beta = BoundedNormal2('beta', sd=10)
                S = pm.DensityDist('S', survival,
                                   observed={'event': data['status'],
                                             'logalpha': logalpha,
                                             'beta': beta,
                                             'time': data['Survival Time']/365
                                             }
                                   )
with model:
        trace = pm.sample(2000, tune=1000)
        pm.traceplot(trace)

This runs perfectly. However, I would also like to do some posterior predictions for new data. I am having problems with defining the random function for the Gompertz function. I am assuming I need to draw logalpha and beta and take event and time from the test data. How do I do this? How should I pass event and time to this function? The following gives me errors:

def survival_random(point=None, size=None):
                    # draw a numerical value for the parameters
                    logalpha_, beta_ = draw_values([loagalpha, beta], point=point)
                    size = 1 if size is None else size
                    return generate_samples(survival, logalpha=logalpha_, beta=beta_, size=size, broadcast_shape=(size,))

This gives the following error:
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
in
----> 1 model = run_model(bn, data_shared)
2 with model:
3 trace = pm.sample(10000, tune=2000, cores=4)

<ipython-input-17-e9ddcacb62db> in run_model(bn, data, columns)
     52                 S = pm.DensityDist('S', survival(logalpha, beta, data[:, i], data[:, j]/365),
---> 53                                    random=random_survival
     54                                    )
     55             elif curr_feature == 'logalpha':

~/pymc3/lib/python3.7/site-packages/pymc3/distributions/distribution.py in __new__(cls, name, *args, **kwargs)
     40             total_size = kwargs.pop('total_size', None)
     41             dist = cls.dist(*args, **kwargs)
---> 42             return model.Var(name, dist, data, total_size)
     43         else:
     44             raise TypeError("Name needs to be a string but got: {}".format(name))

~/pymc3/lib/python3.7/site-packages/pymc3/model.py in Var(self, name, dist, data, total_size)
    806                 with self:
    807                     var = FreeRV(name=name, distribution=dist,
--> 808                                  total_size=total_size, model=self)
    809                 self.free_RVs.append(var)
    810             else:

~/pymc3/lib/python3.7/site-packages/pymc3/model.py in __init__(self, type, owner, index, name, distribution, total_size, model)
   1206             self.tag.test_value = np.ones(
   1207                 distribution.shape, distribution.dtype) * distribution.default()
-> 1208             self.logp_elemwiset = distribution.logp(self)
   1209             # The logp might need scaling in minibatches.
   1210             # This is done in `Factor`.

TypeError: 'TensorVariable' object is not callable

Thanks for helping me out with this in advance.

junpenglao · November 18, 2018, 9:29am

Hmmm, maybe an easier way to do is to extract the logalpha and beta from the posterior (i.e., trace), and build the posterior prediction logalpha + beta * time, and logistic transformed it and generate Bernoulli random?

anurag.sethi · November 19, 2018, 6:00am

Thanks for your answer @junpenglao. The problem I am trying to solve is to predict alpha and beta given a set of covariates. Also, it is essential to predict alpha and beta separately as one is proportional to fitness of individual at t=0 and beta is related to “aging” rate of the individual (i.e., the rate at which fitness reduces with time). Ideally, I would like to have both alpha and beta individually rather than just alpha + beta*time.

Is there no way to do this with pymc? In general, it would be great to have a few examples of how random function is written for custom functions within pymc (or maybe I just haven’t found them). For example, could I make logalpha and beta observed but make all of it missing values?

Again, thanks in advance.

junpenglao · November 19, 2018, 6:30am

You already have alpha and beta individual from the posterior trace. The question is to generate random sample of S according to the Density function.
If a specific random generation algorithm is not known, I guess you can always implement a reject sampler or an important sampler. There is no example for it in pymc3, so if you got something working please share with us

anurag.sethi · November 20, 2018, 3:37pm

@junpenglao I think I understand what you are saying now. In an ideal world, we should treat this as a Poisson process with time-dependent probabilities. Can you please help me figure out how we can implement what you have said in the random function given that time is part of the input data and logalpha and beta are latent parameters? I am struggling with writing the right random function for this particular situation.

junpenglao · November 20, 2018, 6:23pm

Since your logp of the observed is:

def survival(event, logalpha, beta, time):
    log_h = logalpha + beta * time
    H = (np.exp(logalpha)/beta) * (np.exp(beta * time) - 1)
    return ((event * log_h) - H).sum()

what I would do (not sure if it is correct or not!) is get a temporal function of the latent alpha and beta:

def survival2_random(logalpha, beta, time):
    log_h = logalpha + beta * time
    H = (np.exp(logalpha)/beta) * (np.exp(beta * time) - 1)
    logp_temporal = np.stack([(0. * log_h) - H, (1. * log_h) - H]) <== observed being 0 or 1
    p_temporal = np.exp(logp_temporal)
    p_norm = p_temporal/p_temporal.sum(axis=1, keepdim=True)
    return np.random.bernoulli(p_norm)

# and call it like:
for point in trace:
    randomsample = survival2_random(point['logalpha'], point['beta'], time)

I did not run the code so there might be mistake

Topic		Replies	Views
Defining random for a CustomDist time series v5 time_series , modeling	6	663	January 26, 2023
Custom Distribution with pm.CustomDist v5 modeling	4	1704	June 27, 2023
Examples of random() in custom distribution? Questions	19	4060	April 7, 2023
Observed variable using custom function Questions	1	659	December 14, 2017
Using custom python function within PyMC3 v5	2	963	August 2, 2022

Having problems defining random function for custom distribution with mixture of observed and random variables

Related topics