Sampling from power law likelihood

Hello Everyone,
I am new to PyMC and am getting used to the modeling framework. As an example problem, I want to infer the parameters of a power law distribution by sampling from its likelihood function. In particular, I am following this paper. Fitting scale-free distributions is a tricky thing in itself; however, here I intend to have a naive implementation of the sampling methods in PyMC. The code I wrote is below:

# you will need the powerlaw package to generate data
!pip install powerlaw

import pymc as pm
import pytensor as pt
import arviz as az
import powerlaw as pw
import matplotlib.pyplot as plt
# %matplotlib inline  ## if using a notebook

simulated_dist = pw.Power_Law(xmin=0.5,parameters=[2.5])
test_data = simulated_dist.generate_random(1000)

with pm.Model() as powerlaw_model:
    
    def custom_func(p1,gamma,xmin):
        ## this is equation (B4) from the paper shared above

        return (pt.tensor.log(gamma-1)-pt.tensor.log(xmin)-gamma*pt.tensor.log(p1/xmin)).sum()

    gamma = pm.Uniform('gamma',1,4)
    xmin = pm.Uniform('xmin',0.1,1)
    
    power_law = pm.DensityDist('power_law',gamma,xmin,logp=custom_func,observed=test_data)

with powerlaw_model:
    
    step = pm.Metropolis()
    samples = pm.sample(1000,step=step,chains=3) 
    

This gives me the following output:

As you can see, this is not what I expect: we expect x_{min} = 0.5 and \gamma = 2.5. Can someone point out what I am doing wrong? I am sure it’s something really stupid. Thanks in advance :slight_smile:

It looks like there’s a typo in the log-likelihood formula. In B4, there is a summation only over the final term.
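If I read it correctly (assuming (B4) is the log-likelihood of the standard continuous power-law density p(x) = \frac{\gamma-1}{x_{min}} (x/x_{min})^{-\gamma} for x \ge x_{min}), it should be

\ln L = n \ln(\gamma - 1) - n \ln(x_{min}) - \gamma \sum_{i=1}^{n} \ln(x_i / x_{min})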

Also why use Metropolis? I’m pretty sure NUTS should be able to fit this model.


Hi… Thanks for pointing out the error. Indeed, the sum should be only over the last term, and the first two terms should be multiplied by the sample size n. I thought that by summing the entire expression this would be taken care of. In any case, I made the changes and switched to NUTS, as below.

simulated_dist = pw.Power_Law(xmin=0.5,parameters=[2.5])
test_data = simulated_dist.generate_random(1000)

with pm.Model() as powerlaw_model:
    
    def custom_func(p1,gamma,xmin):
        ## expanded form of (B4): the first two terms are multiplied by the sample size n
        n = p1.shape[0]
        return (n*pt.tensor.log(gamma-1) - n*pt.tensor.log(xmin) - pt.tensor.sum(gamma*pt.tensor.log(p1/xmin)))

    gamma = pm.Uniform('gamma',1,4)
    xmin = pm.Uniform('xmin',0.1,1)
    
    power_law = pm.DensityDist('power_law',gamma,xmin,logp=custom_func,observed=test_data)

with powerlaw_model:
    
    step = pm.NUTS()
    samples = pm.sample(1000,step=step,chains=3) 

I am getting this result now… still not able to fit.

I am wondering if I am doing something conceptually wrong here…

Probably the results would be the same whether you use the contracted or the expanded form of the likelihood. (I thought you simply used the contracted form in your first attempt, so I’ll stick to that.) My impression, as a first thought and after a very quick look at the paper you cite, is that xmin is calculated as the minimum of x, rather than estimated as a parameter (though they mention that is an option). Their approach seems to be focused on estimating only alpha. If you change the model accordingly, you get a good result for alpha:

import pymc as pm
import pytensor.tensor as at
import arviz as az
import powerlaw as pw

simulated_dist = pw.Power_Law(xmin=0.5,parameters=[2.5])
test_data = simulated_dist.generate_random(1000)

with pm.Model() as powerlaw_model:
    
    def custom_func(x, alpha):
        # xmin is taken as the sample minimum rather than estimated as a parameter
        plog = at.sum(at.log(alpha-1) - at.log(x.min()) - alpha*at.log(x/x.min()))
        return plog

    alpha = pm.Uniform('alpha', 1, 10)

    y = pm.DensityDist('y', alpha, logp=custom_func, observed=test_data)

with powerlaw_model:
    idata = pm.sample(1000)


pos = idata.stack(sample = ['chain', 'draw']).posterior

print("alpha mean: "+str(pos.alpha.values.mean().round(2)))


az.plot_trace(idata, var_names=["alpha"])

Auto-assigning NUTS sampler...
Initializing NUTS using jitter+adapt_diag...
Multiprocess sampling (4 chains in 4 jobs)
NUTS: [alpha]
 |████████████| 100.00% [8000/8000 00:16<00:00 Sampling 4 chains, 0 divergences]
Sampling 4 chains for 1_000 tune and 1_000 draw iterations (4_000 + 4_000 draws total) took 47 seconds.

alpha mean: 2.51

I hope this helps. I haven’t figured out a way to estimate both xmin and alpha; maybe the uniform prior for xmin won’t work, and a different prior could help.
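One untested direction (just a sketch, and only valid if every observation is assumed to lie above xmin) would be to cap the prior for xmin at the sample minimum, so that the density is defined for all observed x:

with pm.Model() as powerlaw_model_xmin:

    def custom_func(x, alpha, xmin):
        # same power-law log-likelihood, now with xmin as a free parameter
        return at.sum(at.log(alpha-1) - at.log(xmin) - alpha*at.log(x/xmin))

    alpha = pm.Uniform('alpha', 1, 10)
    # xmin cannot exceed the smallest observation, otherwise the density is invalid there
    xmin = pm.Uniform('xmin', 0.1, test_data.min())

    y = pm.DensityDist('y', alpha, xmin, logp=custom_func, observed=test_data)
    idata_xmin = pm.sample(1000)

Note that this likelihood increases monotonically in xmin, so the posterior will pile up near the sample minimum; for data that really is a pure power law that is close to the true xmin, but it may not be a satisfying estimate in general.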


Wow… it indeed gives a good approximation. I will see if I can incorporate x_{min} as well and share. Thanks :slight_smile:. Also, you are right about the form of the likelihood… both work.

def custom_func(p1,gamma):
    # expanded form, with xmin taken as the sample minimum
    n = p1.shape[0]
    return (n*pt.tensor.log(gamma-1) - n*pt.tensor.log(p1.min()) - pt.tensor.sum(gamma*pt.tensor.log(p1/p1.min())))

Using expanded form.
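
Just a quick numerical sanity check that the contracted and expanded forms agree (plain NumPy, reusing test_data from above):

import numpy as np

x = np.asarray(test_data)
alpha, xmin = 2.5, x.min()
n = x.size

contracted = np.sum(np.log(alpha - 1) - np.log(xmin) - alpha * np.log(x / xmin))
expanded = n * np.log(alpha - 1) - n * np.log(xmin) - alpha * np.sum(np.log(x / xmin))

print(np.isclose(contracted, expanded))  # True: the two forms are algebraically identical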
