Priors of Great Potential - How You Can Add Fairness Constraints to Models Using Priors by Vincent D. Warmerdam & Matthijs Brouns

Talk

Vincent D. Warmerdam

Vincent likes to spend his days debunking hype in ML. He started a few open source packages (whatlies, scikit-lego, clumper and evol) and is also known as co-founding chair of PyData Amsterdam. He currently works at Rasa as a Research Advocate where he tries to make NLP algorithms more accessible.

Matthijs Brouns

Matthijs is a data scientist, active in Amsterdam, The Netherlands. His current work involves training junior data scientists at Xccelerated.io. This means he divides his time between building new training materials and exercises, giving live trainings, and acting as a sparring partner for the Xccelerators at his partner firms, as well as doing some consulting work on the side.

Matthijs has spent a fair amount of time contributing to the open scientific computing ecosystem through various means. He maintains open source packages (scikit-lego, seers), co-chairs the PyData Amsterdam conference and meetup, and vice-chairs the PyData Global conference.

In his spare time he likes to go mountain biking, go bouldering, do some woodworking, or go scuba diving.


This is a PyMCon 2020 talk

Learn more about PyMCon!

PyMCon is an asynchronous-first virtual conference for the Bayesian community.

We have posted all the talks here on Discourse on October 24th, one week before the live PyMCon session, for everyone to see and discuss at their own pace.

If you are available on October 31st you can register for the live session here! If you are not, don't worry: all the talks are already available here on Discourse (keynotes will be posted after the conference), and you can network here on Discourse and on our Zulip.

We value the participation of each member of the PyMC community and want all attendees to have an enjoyable and fulfilling experience. Accordingly, all attendees are expected to show respect and courtesy to other attendees throughout the conference and at all conference events. Everyone taking part in PyMCon activities must abide by the PyMCon Code of Conduct. You can report any incident through this form.

If you want to support PyMCon and the PyMC community but you can't attend the live session, consider donating to PyMC.

Do you have suggestions to improve PyMCon? We have an anonymous suggestion box waiting for you.

Have you enjoyed PyMCon? Please fill in our PyMCon attendee survey. It is open to both async PyMCon attendees and people taking part in the live session.


Feel free to AskUsAnything[tm] here.

Just a few links that are relevant.


This is an excellent use case of pm.Potential, and I love the Fairness use case. Great work!
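For anyone curious what that looks like in practice, here is a minimal sketch (using PyMC3 and made-up data; the penalty form and its strength are my own illustrative assumptions, not the talk's exact construction) of how pm.Potential can add a soft demographic-parity penalty to a logistic regression. The extra term lowers the joint log-probability whenever the groups' mean predicted probabilities drift apart.

```python
import numpy as np
import pymc3 as pm

# Toy data: X features, y binary outcomes, s a binary sensitive attribute.
rng = np.random.default_rng(42)
X = rng.normal(size=(200, 3))
s = rng.integers(0, 2, size=200)
y = rng.integers(0, 2, size=200)
idx0, idx1 = np.where(s == 0)[0], np.where(s == 1)[0]

with pm.Model():
    beta = pm.Normal("beta", mu=0.0, sigma=1.0, shape=X.shape[1])
    intercept = pm.Normal("intercept", mu=0.0, sigma=1.0)
    p = pm.math.sigmoid(intercept + pm.math.dot(X, beta))

    # Soft fairness constraint: pm.Potential adds this term to the joint
    # log-probability, so large gaps between the groups' mean predicted
    # probabilities get penalized during sampling.
    gap = p[idx0].mean() - p[idx1].mean()
    pm.Potential("fairness_penalty", -100.0 * gap ** 2)

    pm.Bernoulli("obs", p=p, observed=y)
    trace = pm.sample(1000, tune=1000)
```

The penalty strength of 100 is arbitrary here; in practice you would tune how hard the constraint should bind.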


Awesome talk, so many gems, sketch over notebooks! The topic is fascinating. I am not a stats person but I had always pictured estimation as always trying to capture reality as best as possible. This, however, is using priors to encode how we want our model to be/behave (very pertinent at the moment), but I think this use of priors is only for when we have an algorithm that is making decisions for users and not so much for informing users to make decisions? I can imagine circumstances where constraining for fairness in estimation could hide it in the data? We do need these 'knobs' and other tools for quantitative reasoning about what we want and what a good decision is, though. Would it be possible to get your model to include an interesting predictor, like investment in different fairness initiatives, highlight, rank, etc. the areas of most unfairness and then give us advice on the best bang-for-buck interventions against all those, for example? Or is there another good use case/mode I'm ignoring?

I'll try to respond to some of the many topics here.

I am not a stats person but I had always pictured estimation as always trying to capture reality as best as possible.

Models, by definition, are different from reality. I can come up with many time series models that don't reflect reality at all. You can approximate the seasonal effect of ice-cream sales with a Taylor series despite the reality of ice cream having nothing to do with it. A model doesn't have to reflect reality in order to be useful.
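To make that concrete with an invented example (the numbers and code are mine, not from the talk): a plain polynomial knows nothing about seasons, yet it can approximate a seasonal sales curve well enough to be useful.

```python
import numpy as np

# Fake monthly ice-cream sales with a purely seasonal (sinusoidal) pattern.
months = np.arange(12)
sales = 100 + 30 * np.sin(2 * np.pi * months / 12)

# A polynomial has nothing to do with seasons, yet its fit tracks the
# seasonal shape closely over the observed year.
coefs = np.polyfit(months, sales, deg=7)
approx = np.polyval(coefs, months)
print(np.abs(sales - approx).max())  # small compared to the amplitude of 30
```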

but I think this use of priors is only for when we have an algorithm that is making decisions for users and not so much for informing users to make decisions?

It depends on "who" the user is. If the user is more of an analyst who wants to learn from the coefficients of the model, then this is different from when the user is on the receiving end of the model's predictions. In the case of the talk we're indeed more concerned about the latter.

I can imagine circumstances where constraining for fairness in estimation could hide it in the data?

Could you give an example? Before you can apply these debiasing tricks you need to have data on sensitive attributes. That suggests that the act of finding constraints is also the act of making bias less hidden. Also, I'm actually more concerned with bias hidden in the model. If the bias remains in the data, that's to some extent "fine" as long as we can guarantee it's not in the model.

Would it be possible to get your model to include an interesting predictor, like investment in different fairness initiatives, highlight, rank, etc. the areas of most unfairness and then give us advice on the best bang-for-buck interventions against all those, for example? Or is there another good use case/mode I'm ignoring?

What you're suggesting here sounds like a proper comparative study. I understand that projects like fairlearn intend to investigate this too, but most of the studies I hear about seem to be academic. I'm personally more interested in use cases adopted by industry.

As far as mitigation techniques go, both Matthijs and I have spoken about this before if you're interested in more background material. There's a pretty wide array of techniques we've open sourced in scikit-lego that work differently from what we propose here.
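For anyone who wants to try one of those, here is a minimal sketch of one such scikit-lego estimator, DemographicParityClassifier, on made-up loan data; the column names and the covariance_threshold value are illustrative assumptions, not recommendations.

```python
import pandas as pd
from sklego.linear_model import DemographicParityClassifier
from sklego.metrics import p_percent_score

# Toy loan data; "gender" is the sensitive attribute (all values are made up).
df = pd.DataFrame({
    "income":   [20, 35, 50, 65, 80, 95, 25, 70],
    "gender":   [0,  0,  1,  1,  0,  1,  1,  0],
    "approved": [0,  0,  1,  1,  1,  1,  0,  1],
})
X, y = df[["income", "gender"]], df["approved"]

# A logistic regression whose predictions are constrained to have low covariance
# with the sensitive column, pushing the model towards demographic parity.
fair_model = DemographicParityClassifier(
    sensitive_cols=["gender"],
    covariance_threshold=0.1,  # illustrative value; tune for your use case
).fit(X, y)

# p_percent_score compares the positive-prediction rates of the two groups
# (1.0 means perfect demographic parity).
print(p_percent_score(sensitive_column="gender")(fair_model, X, y))
```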


Thanks for taking the time to respond to all that. It seems I have a whole new type of model to think and read about!

Thank you for a great talk.

I've tried to reproduce the code from Matthijs' part on my data. He imports the fairness metric dem_par_score from metrics, but this library does not have such a metric. Have I missed something?

He says

we have a module called metrics here

so I presume it's something defined locally?

That's correct.
The method dem_par_score is defined here.
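For anyone reading along without the talk's repository handy, a demographic parity score is typically something along these lines; this is an illustrative guess, not necessarily the exact definition from that metrics module.

```python
import numpy as np

def dem_par_score(y_pred, sensitive):
    """Ratio of positive-prediction rates between two groups; 1.0 means parity.
    Illustrative sketch only, not the talk's actual implementation."""
    y_pred, sensitive = np.asarray(y_pred), np.asarray(sensitive)
    rate_a = y_pred[sensitive == 0].mean()
    rate_b = y_pred[sensitive == 1].mean()
    return min(rate_a, rate_b) / max(rate_a, rate_b)

# Group 0 gets 50% positive predictions, group 1 gets 100%, so the score is 0.5.
print(dem_par_score([1, 0, 1, 1], [0, 0, 1, 1]))
```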
Thank you!