[PyMCon Web Series 01] The Power of Bayes in Industry: Your Business Model is Your Data Generating Process (Feb 9, 2023) (Dante Gates)

Speaker: Dante Gates

Event type: Live webinar
Date: Feb 9th, 2023 (subscribe here for email updates)
Time: 21:00 UTC (4pm EST)
Register for the event: Meetup event or Zoom
Notebook: On Colab

NOTE: The event will be recorded. Subscribe to the PyMC YouTube channel for notifications.

Content

Video: Interview with Dante Gates (8 minutes)

Video: The Power of Bayes in Industry


Welcome to the first event of the PyMCon Web Series! As part of this series, most events will have an async component and a live talk.

In this case, as part of the async component, Dante prepared a Colab notebook for the community to engage with before the talk. Run it and answer the questions Dante left for discussion:

  • What is your favorite example of a Data Generating Process (DGP) / first principles model?
  • Have you applied the ideas in this post in industry?
  • What are some of the benefits we missed?

Abstract of the talk

This talk will attempt to answer the question “what is a Data Generating Process and why does it matter?” While we will begin our discussion with a bit of theory, don’t worry about this being too technical or inaccessible if you’re new to Bayesian Statistics. Our primary goal is to focus on the second half of the question and give you tools to use for real-world applications.

With the core concepts and background covered, we’ll demonstrate how incorporating this understanding into our modeling decisions allows us to embed elements of a business function directly into our statistical models. This can provide immense value in industry settings, especially where traditional machine learning techniques fail. The benefits include:

  • The ability to tackle critical problems when data is lacking, like launching a new product

  • Powerful predictive models that are difficult to overfit

  • Built-in explainability that is already expressed in the terms of your business

Best of all, the design techniques we propose here are such that when you get one of the benefits above, the rest usually come for free.

All of this and more will be illustrated through concrete examples found in both publicly available data as well as proprietary data we use here at Perpay.
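As a minimal sketch of what "your model is your data generating process" can look like in code, the example below runs a first-principles story forward for the well-known golf putting problem: a putt drops if the golfer's angular error is small enough for the ball to stay on line. The hole and ball dimensions are the standard ones; the angular-error standard deviation `SIGMA_RAD` and all function names are illustrative assumptions, not values or code taken from the talk or the notebook.

```python
import math
import random

# Standard golf dimensions, converted from inches to feet.
HOLE_RADIUS_FT = (4.25 / 2) / 12
BALL_RADIUS_FT = (1.68 / 2) / 12
SIGMA_RAD = 0.027  # assumed sd of the golfer's angular error (illustrative)


def max_angle(distance_ft):
    """Largest angular error (radians) for which the ball still drops."""
    return math.asin((HOLE_RADIUS_FT - BALL_RADIUS_FT) / distance_ft)


def prob_success(distance_ft, sigma_rad=SIGMA_RAD):
    """P(|angle error| < max_angle) when the error is Normal(0, sigma_rad)."""
    z = max_angle(distance_ft) / sigma_rad
    normal_cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    return 2.0 * normal_cdf - 1.0


def simulate_putts(distance_ft, n_putts, rng, sigma_rad=SIGMA_RAD):
    """Run the generative story forward: draw an angle, check if it is on line."""
    limit = max_angle(distance_ft)
    return sum(abs(rng.gauss(0.0, sigma_rad)) < limit for _ in range(n_putts))


rng = random.Random(0)
for distance in (2, 5, 10, 20):
    made = simulate_putts(distance, 10_000, rng)
    print(f"{distance:>2} ft: simulated {made / 10_000:.3f}, "
          f"analytic {prob_success(distance):.3f}")
```

The only free parameter, the angular error, has a direct physical meaning, which is the sense in which explainability comes built in. In a Bayesian fit that parameter would get a prior and be estimated from data rather than fixed as it is here.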

6 Likes

Sounds great! How can I register for the event?

2 Likes

Thanks for the excellent Colab notebook, it definitely cast the putting example in a new light.

I hope this is the right place to share my answers to the 3 questions:

  • Reasoning about how demand for a particular product/commodity evolves over time as a function of physical events that lead to conversion
  • Reasoning about the behavior of a distributed computing system (e.g. Kafka), which displays a mix of deterministic behavior (software behaves deterministically most of the time) and probabilistic behavior (under heavy loads or spikes, a distributed compute environment behaves stochastically, even though the software itself is deterministic at the level of an individual node)
  • Several times, in principle, and in several domains, but without formalizing a DGP in code. I have been looking for the right software toolkit to do this in a repeatable and reproducible manner, and have been getting tripped up by a lack of understanding of how to model complicated multi-time-step PGMs. I also find it difficult to express hybrid DGPs, which mix deterministic and probabilistic links, using existing tooling.
  • Using a DGP to run realistic simulations of a business model can inform experimental design for estimating specific sensitivities of business metrics to policy variables
  • This can also plug into programmatic recommendations for business strategy pivots

2 Likes

Hi Gireesh!

We will send the link to the mailing list the week before the event. I edited the main post to make this clearer! We are not sharing it publicly to avoid zoombombing. You can also subscribe to the Meetup event.

1 Like

Very intrigued already, thanks Dante. Given that we are modelling portfolio defaults, what is your view on how these methods guard against the pitfalls faced by the models that informed default levels on subprime CDO portfolios during the GFC?

1 Like

In the APAC region, I see a lot of players in the same area as Perpay investing their DS budget in building user-centric recsys that perform “credit recommendation”. Does the framework you have described sit on top of a pipeline that operates on a “per user” basis?

1 Like

Also very curious about what kind of infrastructure/automation you utilise to run this model in a repeatable manner.

1 Like

Is there a way to quantify whether the model that we designed is correctly specified, i.e., whether it represents the true data-generating process?

3 Likes

K-S (Kolmogorov-Smirnov) statistics, Hellinger distance, etc.?
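For readers unfamiliar with the two measures named above, here is a rough sketch of how each compares observed data with data simulated from a candidate model. Everything in it is an assumption made for illustration: the function names, the 20-bin histogram used for the Hellinger distance, and the toy Normal "observed" and "simulated" samples. A small distance does not prove the model is the true data-generating process; it only says the simulated data resemble the observed data on this one summary.

```python
import bisect
import math
import random


def ks_statistic(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov statistic: largest gap between empirical CDFs."""
    a, b = sorted(sample_a), sorted(sample_b)
    gap = 0.0
    for x in a + b:  # the largest gap is always attained at a sample point
        cdf_a = bisect.bisect_right(a, x) / len(a)
        cdf_b = bisect.bisect_right(b, x) / len(b)
        gap = max(gap, abs(cdf_a - cdf_b))
    return gap


def hellinger_distance(sample_a, sample_b, n_bins=20):
    """Hellinger distance between histograms built on a shared set of bins."""
    lo = min(min(sample_a), min(sample_b))
    hi = max(max(sample_a), max(sample_b))
    width = (hi - lo) / n_bins or 1.0  # guard against identical values

    def histogram(sample):
        counts = [0] * n_bins
        for x in sample:
            counts[min(int((x - lo) / width), n_bins - 1)] += 1
        return [c / len(sample) for c in counts]

    p, q = histogram(sample_a), histogram(sample_b)
    overlap = sum(math.sqrt(pi * qi) for pi, qi in zip(p, q))
    return math.sqrt(max(0.0, 1.0 - overlap))


rng = random.Random(0)
observed = [rng.gauss(0.0, 1.0) for _ in range(2000)]
well_specified = [rng.gauss(0.0, 1.0) for _ in range(2000)]  # same process
misspecified = [rng.gauss(1.0, 1.0) for _ in range(2000)]    # shifted mean

for name, simulated in [("well specified", well_specified),
                        ("misspecified", misspecified)]:
    print(f"{name}: KS = {ks_statistic(observed, simulated):.3f}, "
          f"Hellinger = {hellinger_distance(observed, simulated):.3f}")
```

In a Bayesian workflow the simulated sample would typically be draws from the posterior predictive distribution rather than from fixed parameters as in this toy.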

1 Like

Anonymous question at webinar:

How have you navigated divergences when building PyMC models? I have regularly run into divergences when building models of defined and well-understood data-generating processes.

I have found Bayesian methods very helpful in situations with small data where layering in business knowledge can be advantageous, but have struggled to get full convergence (no divergences).

1 Like

I liked your example on defaults and appreciate the high-level insights, but how could you use it to understand whether any individual loan may default?

2 Likes

Does the process to be modelled have to be episodic? By that I mean: it starts in the same initial condition and then ends after a finite time t. Or can you still fit a model to an indefinitely long dynamic process?

1 Like

Thanks.

1 Like

Hi Team,
I attended the talk by Mr. Dante Gates today. Very insightful talk!
I missed some portions of the talk and would like to fill in the missing information.

  1. The sketching tool that Mr. Dante used to draw the diagrams that were converted into HTML code.
    I need the correct name of this tool.
  2. I missed the contents of the “Chat” and “Q&A” boxes, where some URLs were listed. I was not able to copy them before the meeting ended. Could you list the contents of the “Chat” / “Q&A” window in your reply?
    Thanks
    GL Srinivasan (GL)

I know Dante said he made the slides in Quarto. Not sure about the doodling, but you can ask him (I will get @cucho to ping him so he checks in here).

PyMCon Web Series information:

PyMCon
Link straight to PyMC’s rolling Call for Proposals
PyMC Meetup Group

PyMC Information:

PyMC Documentation
PyMC Discourse (this site!)

Various channels for PyMC info:

Twitter
LinkedIn

1 Like

Thanks Mr. Cluhmann! I got it
GL

There are reams of literature about Bayesian workflow and model checking. The ArviZ package automates some of the more popular model checking procedures (LOO, WAIC) on the basis of the model’s posterior predictive distribution.

Gelman, A., Vehtari, A., Simpson, D., Margossian, C. C., Carpenter, B., Yao, Y., … & Modrák, M. (2020). Bayesian workflow. arXiv preprint arXiv:2011.01808.
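To give a feel for what such a procedure computes, here is a rough sketch of the WAIC estimate of expected log pointwise predictive density, written from the textbook formula. It is not ArviZ's implementation, and the function names, the toy Normal data, and the flat-prior posterior used to generate draws are all assumptions made for illustration.

```python
import math
import random
import statistics


def log_mean_exp(values):
    """Numerically stable log of the mean of exp(values)."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values) / len(values))


def elpd_waic(log_lik):
    """log_lik[s][i] = log-likelihood of observation i under posterior draw s.

    Returns (elpd_waic, p_waic): log pointwise predictive density minus the
    effective-parameter penalty, and the penalty itself.
    """
    lppd, p_waic = 0.0, 0.0
    for i in range(len(log_lik[0])):
        column = [draw[i] for draw in log_lik]
        lppd += log_mean_exp(column)
        p_waic += statistics.variance(column)  # variance across draws
    return lppd - p_waic, p_waic


def normal_logpdf(y, mu, sigma):
    return -0.5 * math.log(2.0 * math.pi * sigma**2) - (y - mu) ** 2 / (2.0 * sigma**2)


rng = random.Random(0)
data = [rng.gauss(0.0, 1.0) for _ in range(50)]
y_bar = sum(data) / len(data)


def toy_log_lik(sigma, n_draws=500):
    # Flat prior on mu with known sigma: mu | data ~ Normal(y_bar, sigma / sqrt(n)).
    draws = [rng.gauss(y_bar, sigma / math.sqrt(len(data))) for _ in range(n_draws)]
    return [[normal_logpdf(y, mu, sigma) for y in data] for mu in draws]


elpd_good, p_good = elpd_waic(toy_log_lik(sigma=1.0))  # matches the true noise
elpd_bad, p_bad = elpd_waic(toy_log_lik(sigma=5.0))    # noise scale far too wide
print(f"sigma=1: elpd_waic = {elpd_good:.1f}, p_waic = {p_good:.2f}")
print(f"sigma=5: elpd_waic = {elpd_bad:.1f}, p_waic = {p_bad:.2f}")
```

A higher `elpd_waic` indicates better estimated out-of-sample predictive fit. Note that this ranks candidate models against each other; it does not certify that any of them is the true data-generating process.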

1 Like