Geometric Adstock Paramter Estimation in PyMC3 and Theano (using theano scan)

theano

#1

Hi, I am trying to estimate the parameters of a geometric adstock function (parameter name is alpha) but somehow the PyMC3 converges to the totally wrong results.

I am not sure whether this depends on my implementation of the adstock function or other factors.

The formula for the adstock function can be found in this paper (formula number 1 and 2)

and it is defined by the following code in python:

   def geometric_adstock_function(xx, alpha, normalize=True):
        result = []
        for i, x in enumerate(xx):
            cumsum = 0
            wcumsum = 0
            for l in range(i + 1):
                w = alpha ** l

                cumsum += w * xx[i - l]
                wcumsum += w
            if not normalize:
                result.append(cumsum)
            else:
                result.append(cumsum / wcumsum)

        return np.array(result)

To be able to do paramter estimation in PyMC3 I define a deterministic theano function using theano scan as

def geometric_adstock_theano(xx, alpha):
    """
    This very complex function implement a normalized, geometric adstock using scan in theano

    """

    def fn(x, prior_result, prior_norm, carry, prior_numerator, alpha):

        carry = carry * alpha
        new_norm = prior_norm + carry
        new_result = (x + alpha * prior_numerator) / prior_norm

        new_numerator = (x + alpha * prior_numerator)

        return new_result, new_norm, carry, new_numerator

    channel_adstock, updates = theano.scan(fn=fn,
                                           sequences=[xx],
                                           outputs_info=[tt.zeros_like(alpha), tt.ones_like(alpha),
                                                         tt.ones_like(alpha), tt.zeros_like(alpha)],
                                           non_sequences=alpha)

    return channel_adstock, updates

The two functions return the same result when running them using theano:

import theano
from theano import tensor as tt


alpha = tt.scalar("alpha",dtype='float64')
xx = tt.vector("X",dtype='float64')

channel_adstock,updates = geometric_adstock_theano(xx,alpha)

test_adstock = theano.function(inputs=[xx,alpha], outputs=channel_adstock, updates=updates)

#the two functions return the same result for the same value of alpha here 0.8
print(test_adstock([1.,0,0.,0, 0 , 0],.8)[0])
print(geometric_adstock_function([1.,0.,0.,0, 0, 0],.8))

However, when using PyMC3 to estimate the value of alpha for a known signal I obtain a totally ill estimate.

Is there anything wrong in my implementation of the geometric adstock in or in my PyMC3 code?

I would expect the estimate to be pretty good since the model is fairly simplistic

# defines groud truth for the original signal and the transformed one
true_alpha = 0.8
original = np.sin(np.linspace(0,np.pi*2*4,50))
transformed = geometric_adstock_function(original, alpha=true_alpha)

# tries to estimate alpha using pymc3
with pm.Model() as model:
    alpha = pm.Beta('alpha', .5, .5)
    channel_adstock, _ = geometric_adstock_theano(original, alpha)

    likelyhood =pm.Normal('y',mu=channel_adstock,observed=transformed,sd=.001)


#map estimate for alpha also very off
map_estimate = pm.find_MAP(model=model, method='L-BFGS-B',start=dict(alpha=.8))
print(map_estimate)

#using the sampling strategy
with model:
    trace = pm.sample(500, cores=2, tune=500)# use_mmap=True, nuts_kwargs=dict(target_accept=.99))  # draw 3000 posterior samples using NUTS sampling

result = pm.summary(trace)
print(result)
estimated_alpha = result.loc['alpha']['mean']
ppc = pm.sample_ppc(trace, samples=500, model=model)
mean_prediction = ppc['y'].mean(axis=0)[0]

# plotting some results 
plt.figure(figsize = (12,20))
plt.subplot(1,2,1)
plt.plot(geometric_adstock_function([1,0,0,0,0,0],estimated_alpha),'-')
plt.plot(geometric_adstock_function([1,0,0,0,0,0],true_alpha),'-')
plt.legend(['kernel estimated','kernel original'])
plt.subplot(1,2,2)
plt.plot(geometric_adstock_function(original,estimated_alpha),'-')
plt.plot(geometric_adstock_function(original,true_alpha),'-')
plt.plot(mean_prediction)

Thanks


#2

Hmmm if the theano function outputs the same as the numpy version it should be correct. Possible explanation is that the model is more difficult than you think. Did you see any warning from sampler?


#3

It does actually even if I increase the tuning for the sampler …

Sampling 2 chains: 100%|███████████████████████████████████████████████████████████████████████████████| 12000/12000 [01:33<00:00, 128.12draws/s]
There was 1 divergence after tuning. Increase `target_accept` or reparameterize.
The acceptance probability does not match the target. It is 0.6795954450888448, but should be close to 0.8. Try to increase the number of tuning steps.

This is my estimate

           mean        sd      mc_error   hpd_2.5  hpd_97.5        n_eff     Rhat
 alpha  0.000026  0.000017  5.235124e-07  0.000001   0.00006  1019.017598  1.00044

While I would like to have the mean close to 0.8.

What puzzles me is that funnily enough also the map estimate is close to zero, even when giving the starting point the true value for alpha

map_estimate = pm.find_MAP(model=model, method='L-BFGS-B',start=dict(alpha=.8))
print(map_estimate)

Returning

{'alpha_logodds__': array(-11.27276948), 'alpha': array(1.2714313e-05)}

Any suggestion how to reparametrize myabe ?
Why is the MAP estimate starting from the true minimum returning such a funny value ?


#4

It might not be the true minimal - usually you dont really see where the problem your model has until you try to visualize the whole parameter space. Some example here: Inferencing trigonometric time series model


#5

Ah found the problem :slight_smile: I knew it must have been a bug ! so basically, my implementation of the geometric adstock using theano.sca returns a list of theano variables. Only the first one is relevant and should be passed to the downstream Normal distribution. So basically the model should be constructed

with pm.Model() as model:
    alpha = pm.Beta('alpha', .5, .5)
    channel_adstock, _ = geometric_adstock_theano(original, alpha)

    likelyhood =pm.Normal('y',mu=channel_adstock[0],observed=transformed,sd=.001)

then the example works beautifully both in terms of MAP inference as well as Baysian sampling.

Thank you anyway!