Welcome to the first event of the PyMCon Web Series! As part of this series, most events will have an async component and a live talk.
In this case, as part of the async component, Dante prepared a Colab notebook for the community to engage with before the talk. Run it and answer the questions Dante left for discussion:
1. What is your favorite example of a Data Generating Process (DGP) / first-principles model?
2. Have you applied the ideas in this post in industry?
3. What are some of the benefits we missed?
Abstract of the talk
This talk will attempt to answer the question “what is a Data Generating Process and why does it matter?” While we will begin our discussion with a bit of theory, don’t worry about this being too technical or inaccessible if you’re new to Bayesian Statistics. Our primary goal is to focus on the second half of the question and give you tools to use for real-world applications.
With the core concepts and background covered, we'll demonstrate how incorporating this understanding into our modeling decisions lets us embed elements of a business function directly into our statistical models, and how this can provide immense value in industry settings, especially where traditional machine learning techniques fail, such as:

- Tackling critical problems when data is lacking, like launching a new product
- Building powerful, predictive models that are difficult to overfit
- Getting explainability built in, already expressed in the terms of your business
Best of all, the design techniques we propose are such that when you get one of the benefits above, the rest usually come for free.
All of this and more will be illustrated through concrete examples drawn from both publicly available data and proprietary data we use here at Perpay.
Thanks for the excellent Colab notebook; it definitely cast the putting example in a new light.
I hope this is the right place to share my answers to the 3 questions:
Reasoning about how demand for a particular product/commodity evolves over time as a function of physical events that lead to conversion
Reasoning about the behavior of a distributed computing system (e.g., Kafka), which displays a mix of deterministic behavior (software behaves deterministically, most of the time) and stochastic behavior (under heavy loads/spikes a distributed computing environment will behave stochastically, even though the behavior of the software itself is deterministic at the individual node level)
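The demand example above can be made concrete with a small forward simulation. This is just a sketch with hypothetical funnel stages and rates (impressions, visit rate, conversion rate are all invented here), not any model from the talk:

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical first-principles DGP for daily product demand:
# physical upstream events (ad impressions) -> site visits -> conversions.
n_days = 90
impressions = rng.poisson(lam=5000, size=n_days)  # upstream physical events
visit_rate = rng.beta(2, 50)                      # latent click-through rate
visits = rng.binomial(impressions, visit_rate)    # visits driven by impressions
conversion_rate = rng.beta(3, 40)                 # latent purchase probability
sales = rng.binomial(visits, conversion_rate)     # observed daily demand
```

Writing the funnel generatively like this is what lets each latent rate be given a prior and inferred later, e.g. in PyMC.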
Several times, in principle, and in several domains, but without formalizing a DGP in code. I've been looking for the right software toolkit to do this in a repeatable and reproducible manner, and keep getting tripped up by a lack of understanding of how to model complicated multi-time-step PGMs. I also find it difficult to express hybrid DGPs, which mix deterministic and probabilistic links, using existing tooling.
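One way to think about the hybrid deterministic/stochastic case is that only the inputs are random, while the links themselves are deterministic functions. Here is a minimal sketch of the queueing example: per-message service time and queue draining are deterministic (all constants below are invented for illustration), but stochastic load spikes make system-level delay a random variable:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hybrid DGP sketch: a node processes messages deterministically
# (fixed service time, fixed drain capacity), but arrivals spike
# stochastically, so backlog delay becomes a random variable.
SERVICE_MS = 2.0        # deterministic per-message service time (assumed)
CAPACITY = 500          # messages a node can clear per tick (assumed)
n_ticks = 1000
base_rate = 400         # mean arrivals per tick under normal load
spike = rng.random(n_ticks) < 0.05                       # rare load spikes
arrivals = rng.poisson(np.where(spike, 5 * base_rate, base_rate))

queue = 0.0
delays = np.empty(n_ticks)
for t in range(n_ticks):
    queue = max(queue + arrivals[t] - CAPACITY, 0)       # deterministic drain
    delays[t] = queue * SERVICE_MS                       # backlog-induced delay
```

In PyMC-style tooling, the deterministic links would typically be expressed as `pm.Deterministic` nodes over random parents, which is one way to encode exactly this mix.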
Using a DGP to run realistic simulations of a business model can inform experimental design for estimating specific sensitivities of business metrics to policy variables
This can also plug into programmatic recommendations for business strategy pivots
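The policy-sensitivity idea can be sketched by sweeping a policy variable through a simulated DGP and observing the metric's response. The demand response function and all numbers below are hypothetical placeholders:

```python
import numpy as np

rng = np.random.default_rng(7)

# Hypothetical DGP: conversion probability responds to a discount policy;
# revenue trades off the price cut against the extra conversions.
def simulate_revenue(discount, n_customers=10_000, base_price=100.0):
    base_conv = 0.05
    conv_prob = min(base_conv * (1 + 4 * discount), 1.0)  # assumed response
    buyers = rng.binomial(n_customers, conv_prob)
    return buyers * base_price * (1 - discount)

# Sweep the policy variable to estimate the metric's sensitivity.
discounts = [0.0, 0.1, 0.2, 0.3]
revenues = [simulate_revenue(d) for d in discounts]
```

Comparing simulated revenue across discount levels (and across many seeds) is one way to decide which policy regions are worth a real experiment.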