Using Hierarchical Models in Instrumental Variable Analysis for Advertising Effectiveness by Ruben Mak

Talk Abstract

Due to unobserved confounders, users are often exposed to too many repetitive ads. We will show how we use instrumental variable analysis to prove this is ineffective for advertisers. The focus of the talk will be choosing the model assumptions and how to implement them in pymc3. Finally, we show how hierarchical modelling can be used to combine these models.

Ruben Mak GitHub rubenmak


Ruben Mak

Back in 2012, Ruben introduced data science at Greenhouse, a digital advertising agency in the Netherlands. He is currently principal data scientist and cluster lead. He’s given several talks at PyData conferences and is one of the founders of PyData Eindhoven.

This is a PyMCon 2020 talk

Learn more about PyMCon!

PyMCon is an asynchronous-first virtual conference for the Bayesian community.

We have posted all the talks here in Discourse on October 24th, one week before the live PyMCon session for everyone to see and discuss at their own pace.

If you are available on October 31st, you can register for the live session here! If you are not, don’t worry: all the talks are already available here on Discourse (keynotes will be posted after the conference), and you can network here on Discourse and on our Zulip.

We value the participation of each member of the PyMC community and want all attendees to have an enjoyable and fulfilling experience. Accordingly, all attendees are expected to show respect and courtesy to other attendees throughout the conference and at all conference events. Everyone taking part in PyMCon activities must abide by the PyMCon Code of Conduct. You can report any incident through this form.

If you want to support PyMCon and the PyMC community but you can’t attend the live session, consider donating to PyMC

Do you have suggestions to improve PyMCon? We have an anonymous suggestion box waiting for you

Have you enjoyed PyMCon? Please fill in our PyMCon attendee survey. It is open to both async PyMCon attendees and people taking part in the live session.


Thanks for the interesting talk. I am curious to know more about how the Theano random stream doesn’t work for you.

Thanks for the question!

To be honest, it was in a different project which unfortunately I can’t share publicly. Also, I tried several things and gave up at some point, so it isn’t really well documented. However, here’s a screenshot of one of my last attempts in that project:

Which gives:

In previous tries I had ‘chain x failed’ errors. Any suggestions would be welcome, as would pointers to example implementations where something similar is done (i.e. incorporating estimation error / variance from an earlier estimation stage). Also, in this case we need to sample from the beta distribution, which isn’t among the default distributions in Theano random streams, if I’m not mistaken. So I would probably need to do a transformation from the uniform distribution. I haven’t really thought much about how to do that exactly, so any suggestion would be welcome!
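One possible route for the uniform-to-beta transformation, sketched here in plain NumPy rather than Theano: for positive integer shape parameters a and b, the a-th smallest of a + b - 1 independent Uniform(0, 1) draws follows a Beta(a, b) distribution. The helper name `beta_from_uniform` and the parameter values are made up for illustration, the trick does not cover non-integer parameters, and whether it behaves well inside a Theano graph (in particular with gradients) is untested here.

```python
import numpy as np


def beta_from_uniform(a, b, size, rng):
    """Draw Beta(a, b) samples for positive integers a, b using only uniforms.

    Hypothetical helper for illustration; not part of Theano or PyMC3.
    """
    # The a-th smallest of (a + b - 1) i.i.d. Uniform(0, 1) draws is Beta(a, b).
    u = rng.uniform(size=(size, a + b - 1))
    return np.sort(u, axis=1)[:, a - 1]


rng = np.random.default_rng(0)
samples = beta_from_uniform(2, 5, size=100_000, rng=rng)

# The sample mean should be close to the Beta(2, 5) mean a / (a + b) = 2 / 7.
print(samples.mean(), 2 / 7)
```

Comparing the sample mean and variance against the known Beta(a, b) moments is a quick sanity check before porting the idea to a symbolic graph.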

I tried putting a randomstream in a model ages ago, and it seemed to work fine:

Just to confirm, have you tried putting a tt.print in the model block so it prints the RNG value? I guess a risk here is that the gradient is broken when an RNG is added to the model.

Many thanks for your reply! It definitely has something to do with the specific model specification, and it indeed seems to be breaking some gradients. By the way, for this specific application my ‘bootstrapping approach’ shows we don’t need to worry that much about estimation error from the first stage. But it would certainly be interesting to look into from a more technical perspective some time!

Hey @Rubenmak I just went through the talk and I found it incredibly helpful in my understanding of hierarchical models. Thank you for putting it together.

I did have one question about the model.

In each of the models you had something along the lines of:

p = sum(beta[i] * x_hat[:, i] for i in range(max_cap))

I’m wondering why you would sum these values. I’m trying to reason about this logic, as x_hat is just the proportion of users who saw a given number of impressions for all the cap levels. So how would the sum of those values be helpful for the analysis? Would we not want to just pass in the raw proportion values, rather than the sum of their proportions?
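To make the question concrete, here is how I read that expression, as a small NumPy sketch with made-up numbers (the values of beta and x_hat below are hypothetical, not taken from the talk). The sum runs over the columns of x_hat, so it yields one value of p per row, which is the same as a matrix-vector product of x_hat with beta:

```python
import numpy as np

max_cap = 3

# Hypothetical coefficients, one per impression count (not from the talk).
beta = np.array([0.10, 0.05, 0.02])

# Hypothetical proportions: each row is one observation, each column the
# share of users who saw a given number of impressions.
x_hat = np.array([
    [0.5, 0.3, 0.2],
    [1.0, 0.0, 0.0],
])

# The expression from the model: a weighted sum over the columns.
p_loop = sum(beta[i] * x_hat[:, i] for i in range(max_cap))

# The same quantity written as a matrix-vector product.
p_dot = x_hat @ beta

assert np.allclose(p_loop, p_dot)
print(p_loop.shape)  # one value of p per row of x_hat
```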

Thank you again for your work!