Has anyone developed a custom tweedie distribution for sampling? How to implement once developed?

jordan.howell2 · January 17, 2019, 5:06pm

Hello,

I’d like to model insurance pure premium which is usually known to have a tweedie distribution. Has any of you pymc3-ers ever developed a custom tweedie distribution to sample from?

Update: I found a comparable function for a tweedie distribution (see below). How would I actually use this function as a sampling method like pm.Normal/pm.StudentT/etc.?

def tweedie(n,p,mu,phi):

np.random.seed(seed = 32 )

#checking the value of variance power between 1-2

if (p = 2 ):

print ( 'p must be between (1,2)' )

pass

else :

rt = np.full(n,np.nan)

# calculating mean of poisson distribution

lambdaa = mu * * ( 2 - p) / (phi * ( 2 - p))

# shape parameter of gamma distribution

alpha = ( 2 - p) / ( 1 - p)

# scale parameter of gamma distribution

gam = phi * (p - 1 ) * (mu * * (p - 1 ))

# Generating Poisson random sample

N = np.random.poisson(lambdaa,n)

for i in range (n):

# Generate single data point of gamma distribution using poisson random variable

rt[i] = np.random.gamma(N[i] * np. abs (alpha),gam, 1 )

return (rt)

lucianopaz · January 17, 2019, 8:40pm

I suggest that you read this introduction to pymc3 distributions. It introduces the two alternatives that you have: define a class that inherits from Distribution and write down both the logp and random methods, or wrap a single logp definition with a DensityDist.

lucianopaz · January 17, 2019, 8:42pm

The most important thing needed by pymc3 to do inference is the logp function. If you can write that down you’ll be good to go.

raviteja_k · July 3, 2023, 3:10pm

import scipy.special as sp

def tweedie_logp(value, mu, p, phi):
    # Implement the log probability function of the Tweedie distribution
    # based on the provided parameters: mu, p, and phi
    # Return the logarithm of the probability density/mass function of the distribution
    # given the observed value
    
    # Your implementation of the Tweedie log probability function goes here
    est_lambda = mu ** (2 - p) / (2 - p)
    est_alpha = (2 - p) / (p - 1)
    est_beta = mu ** (1 - p) / (p - 1)
    gam = phi * (p - 1 ) * (mu ** (p - 1 ))

    #Calculate the log probability using the provided parameters and observed value
    logp_value = (
        (1 - p) * np.log(value.eval() / mu) - (value.eval() / (mu ** p * phi)) - np.log(phi)
        + np.log((1 - p) * mu ** (1 - p)))


    return logp_value


with pm.Model() as model:
    # Prior distributions for the unknown parameters

    alpha = pm.Normal('alpha', mu=0, sigma=10)
    # Priors for the binary variables
    offline_campaign_prior = pm.Bernoulli('MISX_campaign_prior', p=0.5)
    discounts_prior = pm.Bernoulli('PA_prior', p=0.5)
    promotion_events_prior = pm.Bernoulli('promotion_events_prior', p=0.5)

    # Compute the expected sales based on the variables
    expected_sales = pm.math.exp(alpha+storedata['day_of_week'].values + storedata['month'].values + storedata['year'].values + storedata['MISX'].values * offline_campaign_prior +
                      storedata['Status'].values * discounts_prior +
                      storedata['event'].values * promotion_events_prior)

    # Define the likelihood using the custom Tweedie distribution
    sales = pm.DensityDist('sales_observed', tweedie_logp, observed={'value': storedata['Rslr_Sales_Amount'].values, 'mu': expected_sales, 'p': 1.5, 'phi': 1})

    # Define the model
    trace = pm.sample(1000, tune=1000)

TypeError Traceback (most recent call last) Cell In[100], line 39 34 expected_sales = pm.math.exp(alpha+storedata[‘day_of_week’].values + storedata[‘month’].values + storedata[‘year’].values + storedata[‘MISX’].values * offline_campaign_prior + 35 storedata[‘Status’].values * discounts_prior + 36 storedata[‘event’].values * promotion_events_prior) 38 # Define the likelihood using the custom Tweedie distribution —> 39 sales = pm.DensityDist(‘sales_observed’, tweedie_logp, observed={‘value’: storedata[‘Rslr_Sales_Amount’].values, ‘mu’: expected_sales, ‘p’: 1.5, ‘phi’: 1}) 41 # Define the model 42 trace = pm.sample(1000, tune=1000) File ~\AppData\Roaming\Python\Python38\site-packages\pymc\distributions\distribution.py:944, in CustomDist.new(cls, name, dist, random, logp, logcdf, moment, ndim_supp, ndims_params, dtype, *dist_params, **kwargs) 929 def new( 930 cls, 931 name, (…) 941 **kwargs, 942 ): 943 if isinstance(kwargs.get(“observed”, None), dict): → 944 raise TypeError( 945 “Since v4.0.0 the observed parameter should be of type” 946 " pd.Series, np.array, or pm.Data." 947 " Previous versions allowed passing distribution parameters as" 948 " a dictionary in observed, in the current version these " 949 “parameters are positional arguments.” 950 ) 951 dist_params = cls.parse_dist_params(dist_params) 952 cls.check_valid_dist_random(dist, random, dist_params) TypeError: Since v4.0.0 the observed parameter should be of type pd.Series, np.array, or pm.Data. Previous versions allowed passing distribution parameters as a dictionary in observed, in the current version these parameters are positional arguments.

raviteja_k · July 3, 2023, 3:20pm

Hey Lucianopaz i have tried with custom tweedie distribution but i got the following error: below the given inputs are in numpy array. Please help me to solve this.

import scipy.special as sp

# Define the log probability function for the Tweedie distribution
def tweedie_logp(value, mu, p, phi):
    # Ensure positive values for mu, p, and phi
    mu = np.maximum(mu, 1e-10)
    p = np.maximum(p, 1e-10)
    phi = np.maximum(phi, 1e-10)

    # Calculate the log probability using the provided parameters and observed value
    logp_value = (
        (1 - p) * np.log(value / mu) - (value / (mu ** p * phi)) - np.log(phi)
        + np.log((1 - p) * mu ** (1 - p))
    )



    return logp_value.sum()


with pm.Model() as model:
    # Prior distributions for the unknown parameters

    # Prior distributions for the unknown parameters
    mu = pm.Normal('mu', mu=0, sigma=10)
    p = pm.Normal('p', mu=1.5, sigma=1)
    phi = pm.Gamma('phi', alpha=1, beta=1)

    # Priors for the binary variables
    offline_campaign_prior = pm.Bernoulli('MISX_campaign_prior', p=0.5)
    discounts_prior = pm.Bernoulli('PA_prior', p=0.5)
    promotion_events_prior = pm.Bernoulli('promotion_events_prior', p=0.5)

    # Compute the expected sales based on the variables
    expected_sales = (storedata['day_of_week'].values + storedata['month'].values + storedata['year'].values + storedata['MISX'].values * offline_campaign_prior +
                      storedata['Status'].values * discounts_prior +
                      storedata['event'].values * promotion_events_prior)
    

    # Define the observed sales as a data variable
    sales_observed = storedata['Rslr_Sales_Amount'].values


    # Define the likelihood using the custom Tweedie distribution
    sales = pm.DensityDist('sales_observed', logp=tweedie_logp, observed={'value': sales_observed, 'mu': expected_sales, 'p': p, 'phi': phi})
    

    # Define the model
    trace = pm.sample(1000, tune=1000)



error: TypeError                                 Traceback (most recent call last)
Cell In[123], line 45
     41 sales_observed = storedata['Rslr_Sales_Amount'].values
     44 # Define the likelihood using the custom Tweedie distribution
---> 45 sales = pm.DensityDist('sales_observed', logp=tweedie_logp, observed={'value': sales_observed, 'mu': expected_sales, 'p': p, 'phi': phi})
     48 # Define the model
     49 trace = pm.sample(1000, tune=1000)

File ~\AppData\Roaming\Python\Python38\site-packages\pymc\distributions\distribution.py:944, in CustomDist.__new__(cls, name, dist, random, logp, logcdf, moment, ndim_supp, ndims_params, dtype, *dist_params, **kwargs)
    929 def __new__(
    930     cls,
    931     name,
   (...)
    941     **kwargs,
    942 ):
    943     if isinstance(kwargs.get("observed", None), dict):
--> 944         raise TypeError(
    945             "Since ``v4.0.0`` the ``observed`` parameter should be of type"
    946             " ``pd.Series``, ``np.array``, or ``pm.Data``."
    947             " Previous versions allowed passing distribution parameters as"
    948             " a dictionary in ``observed``, in the current version these "
    949             "parameters are positional arguments."
    950         )
    951     dist_params = cls.parse_dist_params(dist_params)
    952     cls.check_valid_dist_random(dist, random, dist_params)

TypeError: Since ``v4.0.0`` the ``observed`` parameter should be of type ``pd.Series``, ``np.array``, or ``pm.Data``. Previous versions allowed passing distribution parameters as a dictionary in ``observed``, in the current version these parameters are positional arguments.

Simon · July 5, 2023, 9:08am

The problem is that you’re trying to pass a dictionary as observed data. I could suggest two things. Firstly, not using numpy on your likelihood function, but using pytensor instead. Secondly, use a pm.Potential instead of pm.DensityDist. Here’s the idea:

import pymc as pm
import pytensor.tensor as pt

# Define the log probability function for the Tweedie distribution
def tweedie_logp(name, value, mu, p, phi):
    # Ensure positive values for mu, p, and phi
    mu = pt.maximum(mu, 1e-10)
    p = pt.maximum(p, 1e-10)
    phi = pt.maximum(phi, 1e-10)

    # Calculate the log probability using the provided parameters and observed value
    logp_value = (
        (1 - p) * pt.log(value / mu) - (value / (mu ** p * phi)) - pt.log(phi)
        + pt.log((1 - p) * mu ** (1 - p))
    )

    return logp_value.sum()


with pm.Model() as model:
    # Prior distributions for the unknown parameters

    # Prior distributions for the unknown parameters
    mu = pm.Normal('mu', mu=0, sigma=10)
    p = pm.Normal('p', mu=1.5, sigma=1)
    phi = pm.Gamma('phi', alpha=1, beta=1)

    # Priors for the binary variables
    offline_campaign_prior = pm.Bernoulli('MISX_campaign_prior', p=0.5)
    discounts_prior = pm.Bernoulli('PA_prior', p=0.5)
    promotion_events_prior = pm.Bernoulli('promotion_events_prior', p=0.5)

    # Compute the expected sales based on the variables
    expected_sales = (storedata['day_of_week'].values + storedata['month'].values + storedata['year'].values + storedata['MISX'].values * offline_campaign_prior +
                      storedata['Status'].values * discounts_prior +
                      storedata['event'].values * promotion_events_prior)
    

    # Define the observed sales as a data variable
    sales_observed = storedata['Rslr_Sales_Amount'].values


    # Define the likelihood using the custom Tweedie distribution
    sales = pm.Potential("sales", lp("sales_lp", sales_observed, expected_sales, p, phi)) 
    

    # Define the model
    trace = pm.sample(1000, tune=1000)

I haven’t tested it, as I don’t have the data or appropriate simulations for it. But hopefully helps.

jessegrabowski · July 5, 2023, 10:21am

Another option (aside from pm.Potential, which will work great) is to use pm.CustomDist. It’s the successor to pm.DensityDist, and would be written:

sales = pm.CustomDist('sales', mu, p, phi, logp=tweedie_logp, observed=sales_observed)

Topic		Replies	Views
Random Sampling from Custom Distribution Questions	1	1151	July 1, 2020
Drawing values from custom distribution Questions	8	977	September 14, 2022
Posterior Predictive Check for Density Dist Questions	4	360	November 17, 2021
Examples of random() in custom distribution? Questions	19	4060	April 7, 2023
Creating a LogStudentT distribution v5	10	587	May 6, 2023

Has anyone developed a custom tweedie distribution for sampling? How to implement once developed?

Related topics