Has anyone developed a custom tweedie distribution for sampling? How to implement once developed?

Hello,

I’d like to model insurance pure premium which is usually known to have a tweedie distribution. Has any of you pymc3-ers ever developed a custom tweedie distribution to sample from?

Update: I found a comparable function for a tweedie distribution (see below). How would I actually use this function as a sampling method like pm.Normal/pm.StudentT/etc.?

def tweedie(n,p,mu,phi):

np.random.seed(seed = 32 )

#checking the value of variance power between 1-2

if (p = 2 ):

print ( 'p must be between (1,2)' )

pass

else :

rt = np.full(n,np.nan)

# calculating mean of poisson distribution

lambdaa = mu * * ( 2 - p) / (phi * ( 2 - p))

# shape parameter of gamma distribution

alpha = ( 2 - p) / ( 1 - p)

# scale parameter of gamma distribution

gam = phi * (p - 1 ) * (mu * * (p - 1 ))

# Generating Poisson random sample

N = np.random.poisson(lambdaa,n)

for i in range (n):

# Generate single data point of gamma distribution using poisson random variable

rt[i] = np.random.gamma(N[i] * np. abs (alpha),gam, 1 )

return (rt)

I suggest that you read this introduction to pymc3 distributions. It introduces the two alternatives that you have: define a class that inherits from Distribution and write down both the logp and random methods, or wrap a single logp definition with a DensityDist.

The most important thing needed by pymc3 to do inference is the logp function. If you can write that down you’ll be good to go.

import scipy.special as sp

def tweedie_logp(value, mu, p, phi):
    # Implement the log probability function of the Tweedie distribution
    # based on the provided parameters: mu, p, and phi
    # Return the logarithm of the probability density/mass function of the distribution
    # given the observed value
    
    # Your implementation of the Tweedie log probability function goes here
    est_lambda = mu ** (2 - p) / (2 - p)
    est_alpha = (2 - p) / (p - 1)
    est_beta = mu ** (1 - p) / (p - 1)
    gam = phi * (p - 1 ) * (mu ** (p - 1 ))

    #Calculate the log probability using the provided parameters and observed value
    logp_value = (
        (1 - p) * np.log(value.eval() / mu) - (value.eval() / (mu ** p * phi)) - np.log(phi)
        + np.log((1 - p) * mu ** (1 - p)))


    return logp_value


with pm.Model() as model:
    # Prior distributions for the unknown parameters

    alpha = pm.Normal('alpha', mu=0, sigma=10)
    # Priors for the binary variables
    offline_campaign_prior = pm.Bernoulli('MISX_campaign_prior', p=0.5)
    discounts_prior = pm.Bernoulli('PA_prior', p=0.5)
    promotion_events_prior = pm.Bernoulli('promotion_events_prior', p=0.5)

    # Compute the expected sales based on the variables
    expected_sales = pm.math.exp(alpha+storedata['day_of_week'].values + storedata['month'].values + storedata['year'].values + storedata['MISX'].values * offline_campaign_prior +
                      storedata['Status'].values * discounts_prior +
                      storedata['event'].values * promotion_events_prior)

    # Define the likelihood using the custom Tweedie distribution
    sales = pm.DensityDist('sales_observed', tweedie_logp, observed={'value': storedata['Rslr_Sales_Amount'].values, 'mu': expected_sales, 'p': 1.5, 'phi': 1})

    # Define the model
    trace = pm.sample(1000, tune=1000)

TypeError Traceback (most recent call last) Cell In[100], line 39 34 expected_sales = pm.math.exp(alpha+storedata[‘day_of_week’].values + storedata[‘month’].values + storedata[‘year’].values + storedata[‘MISX’].values * offline_campaign_prior + 35 storedata[‘Status’].values * discounts_prior + 36 storedata[‘event’].values * promotion_events_prior) 38 # Define the likelihood using the custom Tweedie distribution —> 39 sales = pm.DensityDist(‘sales_observed’, tweedie_logp, observed={‘value’: storedata[‘Rslr_Sales_Amount’].values, ‘mu’: expected_sales, ‘p’: 1.5, ‘phi’: 1}) 41 # Define the model 42 trace = pm.sample(1000, tune=1000) File ~\AppData\Roaming\Python\Python38\site-packages\pymc\distributions\distribution.py:944, in CustomDist.new(cls, name, dist, random, logp, logcdf, moment, ndim_supp, ndims_params, dtype, *dist_params, **kwargs) 929 def new( 930 cls, 931 name, (…) 941 **kwargs, 942 ): 943 if isinstance(kwargs.get(“observed”, None), dict): → 944 raise TypeError( 945 “Since v4.0.0 the observed parameter should be of type” 946 " pd.Series, np.array, or pm.Data." 947 " Previous versions allowed passing distribution parameters as" 948 " a dictionary in observed, in the current version these " 949 “parameters are positional arguments.” 950 ) 951 dist_params = cls.parse_dist_params(dist_params) 952 cls.check_valid_dist_random(dist, random, dist_params) TypeError: Since v4.0.0 the observed parameter should be of type pd.Series, np.array, or pm.Data. Previous versions allowed passing distribution parameters as a dictionary in observed, in the current version these parameters are positional arguments.

Hey Lucianopaz i have tried with custom tweedie distribution but i got the following error: below the given inputs are in numpy array. Please help me to solve this.

import scipy.special as sp

# Define the log probability function for the Tweedie distribution
def tweedie_logp(value, mu, p, phi):
    # Ensure positive values for mu, p, and phi
    mu = np.maximum(mu, 1e-10)
    p = np.maximum(p, 1e-10)
    phi = np.maximum(phi, 1e-10)

    # Calculate the log probability using the provided parameters and observed value
    logp_value = (
        (1 - p) * np.log(value / mu) - (value / (mu ** p * phi)) - np.log(phi)
        + np.log((1 - p) * mu ** (1 - p))
    )



    return logp_value.sum()


with pm.Model() as model:
    # Prior distributions for the unknown parameters

    # Prior distributions for the unknown parameters
    mu = pm.Normal('mu', mu=0, sigma=10)
    p = pm.Normal('p', mu=1.5, sigma=1)
    phi = pm.Gamma('phi', alpha=1, beta=1)

    # Priors for the binary variables
    offline_campaign_prior = pm.Bernoulli('MISX_campaign_prior', p=0.5)
    discounts_prior = pm.Bernoulli('PA_prior', p=0.5)
    promotion_events_prior = pm.Bernoulli('promotion_events_prior', p=0.5)

    # Compute the expected sales based on the variables
    expected_sales = (storedata['day_of_week'].values + storedata['month'].values + storedata['year'].values + storedata['MISX'].values * offline_campaign_prior +
                      storedata['Status'].values * discounts_prior +
                      storedata['event'].values * promotion_events_prior)
    

    # Define the observed sales as a data variable
    sales_observed = storedata['Rslr_Sales_Amount'].values


    # Define the likelihood using the custom Tweedie distribution
    sales = pm.DensityDist('sales_observed', logp=tweedie_logp, observed={'value': sales_observed, 'mu': expected_sales, 'p': p, 'phi': phi})
    

    # Define the model
    trace = pm.sample(1000, tune=1000)



error: TypeError                                 Traceback (most recent call last)
Cell In[123], line 45
     41 sales_observed = storedata['Rslr_Sales_Amount'].values
     44 # Define the likelihood using the custom Tweedie distribution
---> 45 sales = pm.DensityDist('sales_observed', logp=tweedie_logp, observed={'value': sales_observed, 'mu': expected_sales, 'p': p, 'phi': phi})
     48 # Define the model
     49 trace = pm.sample(1000, tune=1000)

File ~\AppData\Roaming\Python\Python38\site-packages\pymc\distributions\distribution.py:944, in CustomDist.__new__(cls, name, dist, random, logp, logcdf, moment, ndim_supp, ndims_params, dtype, *dist_params, **kwargs)
    929 def __new__(
    930     cls,
    931     name,
   (...)
    941     **kwargs,
    942 ):
    943     if isinstance(kwargs.get("observed", None), dict):
--> 944         raise TypeError(
    945             "Since ``v4.0.0`` the ``observed`` parameter should be of type"
    946             " ``pd.Series``, ``np.array``, or ``pm.Data``."
    947             " Previous versions allowed passing distribution parameters as"
    948             " a dictionary in ``observed``, in the current version these "
    949             "parameters are positional arguments."
    950         )
    951     dist_params = cls.parse_dist_params(dist_params)
    952     cls.check_valid_dist_random(dist, random, dist_params)

TypeError: Since ``v4.0.0`` the ``observed`` parameter should be of type ``pd.Series``, ``np.array``, or ``pm.Data``. Previous versions allowed passing distribution parameters as a dictionary in ``observed``, in the current version these parameters are positional arguments.

The problem is that you’re trying to pass a dictionary as observed data. I could suggest two things. Firstly, not using numpy on your likelihood function, but using pytensor instead. Secondly, use a pm.Potential instead of pm.DensityDist. Here’s the idea:

import pymc as pm
import pytensor.tensor as pt

# Define the log probability function for the Tweedie distribution
def tweedie_logp(name, value, mu, p, phi):
    # Ensure positive values for mu, p, and phi
    mu = pt.maximum(mu, 1e-10)
    p = pt.maximum(p, 1e-10)
    phi = pt.maximum(phi, 1e-10)

    # Calculate the log probability using the provided parameters and observed value
    logp_value = (
        (1 - p) * pt.log(value / mu) - (value / (mu ** p * phi)) - pt.log(phi)
        + pt.log((1 - p) * mu ** (1 - p))
    )

    return logp_value.sum()


with pm.Model() as model:
    # Prior distributions for the unknown parameters

    # Prior distributions for the unknown parameters
    mu = pm.Normal('mu', mu=0, sigma=10)
    p = pm.Normal('p', mu=1.5, sigma=1)
    phi = pm.Gamma('phi', alpha=1, beta=1)

    # Priors for the binary variables
    offline_campaign_prior = pm.Bernoulli('MISX_campaign_prior', p=0.5)
    discounts_prior = pm.Bernoulli('PA_prior', p=0.5)
    promotion_events_prior = pm.Bernoulli('promotion_events_prior', p=0.5)

    # Compute the expected sales based on the variables
    expected_sales = (storedata['day_of_week'].values + storedata['month'].values + storedata['year'].values + storedata['MISX'].values * offline_campaign_prior +
                      storedata['Status'].values * discounts_prior +
                      storedata['event'].values * promotion_events_prior)
    

    # Define the observed sales as a data variable
    sales_observed = storedata['Rslr_Sales_Amount'].values


    # Define the likelihood using the custom Tweedie distribution
    sales = pm.Potential("sales", lp("sales_lp", sales_observed, expected_sales, p, phi)) 
    

    # Define the model
    trace = pm.sample(1000, tune=1000)

I haven’t tested it, as I don’t have the data or appropriate simulations for it. But hopefully helps.

1 Like

Another option (aside from pm.Potential, which will work great) is to use pm.CustomDist. It’s the successor to pm.DensityDist, and would be written:

sales = pm.CustomDist('sales', mu, p, phi, logp=tweedie_logp, observed=sales_observed)

1 Like