I created a PyMC GPT on OpenAI!

ulfaslak · January 10, 2024, 2:45pm

As most do these days, I run most simple problems by ChatGPT to code faster. It’s great for boilerplate stuff, but it produces dangerous hallucinations when you start asking questions that require knowledge of framework APIs.

So I created a custom GPT, which is instructed to only answer questions about PyMC. I’ve provided the GPT with ~200,000 lines of documentation (I just wrote an ugly little depth-first recursive scraper and ran it on the example gallery, the “Learn” pages and the API docs).
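For anyone who wants to try a similar crawl, here is a minimal sketch of a depth-first recursive scraper. It is not the scraper used for this GPT; the names `LinkParser`, `fetch` and `crawl` are made up for illustration, and the `prefix` argument is an assumption about how you would keep the crawl inside one docs site.

```python
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin
from urllib.request import urlopen


class LinkParser(HTMLParser):
    """Collect the href of every <a> tag on a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)


def fetch(url):
    """Download one page and return its HTML as text."""
    with urlopen(url, timeout=10) as resp:
        return resp.read().decode("utf-8", errors="replace")


def crawl(url, prefix, fetch_page=fetch, seen=None):
    """Depth-first crawl; returns {url: html} for pages under `prefix`."""
    seen = {} if seen is None else seen
    url, _fragment = urldefrag(url)  # treat page#a and page#b as one page
    if url in seen or not url.startswith(prefix):
        return seen
    seen[url] = fetch_page(url)
    parser = LinkParser()
    parser.feed(seen[url])
    for href in parser.links:
        # Recurse into each link before moving on to the next one.
        crawl(urljoin(url, href), prefix, fetch_page, seen)
    return seen
```

On a large site the recursion can run into Python's recursion limit, so an explicit stack may be a safer choice there. A polite crawler would also add a delay between requests.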

The PyMC GPT is here. Happy to get feedback. Been using it myself a little bit and it’s not perfect, but it definitely does a better job at answering PyMC questions than the default GPT4 model.

jessegrabowski · January 10, 2024, 5:44pm

Cool! Have you considered adding the discourse to the training corpus? It’s quite easy to scrape, you can just add .json to the urls and you get everything you need back. I think the back-and-forth nature of the dialogue would be a good fit for training an instruct model.

jaharvey8 · January 10, 2024, 6:39pm

Is the intent here that you’ll definitely need to pay for ChatGPT Plus to use this?

ulfaslak · January 11, 2024, 7:42am

Thanks! Didn’t know about the .json thing, will definitely do this!

ulfaslak · January 11, 2024, 7:45am

Yes, I should have added that as a disclaimer. It’s based on GPT4, and I don’t think you can use that without a subscription.

ulfaslak · January 11, 2024, 9:26am

OK spent 10 minutes on it and I can’t find a simple way to get a list of all topics. You know how to do that?

jessegrabowski · January 11, 2024, 9:33am

They’re just sequential numbers. Also the title name isn’t required. For example you can get to this thread by using https://discourse.pymc.io/t/13612. A loop over the numbers will never 404, so you can just check the errors field of the returned JSON and skip if it says “The requested URL or resource could not be found.” For example there’s no https://discourse.pymc.io/t/1.
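A minimal sketch of that loop, using only the standard library. The function names, the one-second delay and the handling of HTTP error statuses are assumptions added for illustration; only the URL pattern and the `errors` field come from the description above.

```python
import json
import time
import urllib.error
import urllib.request
from typing import Iterator, Optional, Tuple

BASE_URL = "https://discourse.pymc.io/t"


def topic_url(topic_id: int) -> str:
    """Build the JSON URL for a topic; the title slug is not needed."""
    return f"{BASE_URL}/{topic_id}.json"


def is_missing(payload: dict) -> bool:
    """True if the response carries a non-empty `errors` field."""
    return bool(payload.get("errors"))


def fetch_topic(topic_id: int) -> Optional[dict]:
    """Return the topic JSON, or None if there is no such topic."""
    try:
        with urllib.request.urlopen(topic_url(topic_id), timeout=10) as resp:
            payload = json.load(resp)
    except urllib.error.HTTPError as err:
        # Assumption: if the server does answer with an error status,
        # try to read a JSON body anyway and give up if there is none.
        try:
            payload = json.load(err)
        except ValueError:
            return None
    return None if is_missing(payload) else payload


def scrape(first_id: int, last_id: int, delay: float = 1.0) -> Iterator[Tuple[int, dict]]:
    """Yield (topic_id, topic_json) for every existing topic in the range."""
    for topic_id in range(first_id, last_id + 1):
        topic = fetch_topic(topic_id)
        if topic is not None:
            yield topic_id, topic
        time.sleep(delay)  # be gentle with the server
```

Something like `for topic_id, topic in scrape(1, 100): ...` would then walk the first hundred IDs and silently skip the gaps.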

ulfaslak · January 11, 2024, 9:57am

Of course. Adding ArviZ docs now and this will be up soon too!

Geoff_Nordling · January 17, 2024, 1:23pm

Very cool. This is trivial, but it’s funny to test: Regular ChatGPT 4 and your new GPT give very different answers when asked “what is bambi?”

leventov · February 26, 2024, 7:55pm

@ulfaslak thanks a lot for creating this project! I think such an LLM-based assistant is very important to grow the adoption of PyMC and reduce the initial barrier. I think the project should be promoted on the PyMC website and in the docs.

I’ve had a conversation with PyMC GPT about hierarchical modelling in the battery cell manufacturing domain: https://chat.openai.com/share/c9444b2c-0923-4b4e-b29e-4b4e4a3ced3a. It looks decent to me, although the answers were often overly vague. The only blunder that I noticed is the suggestion to use Theano for vectorisation instead of PyTensor. But I’m unfamiliar with PyMC, so I’m sure there are more.

ulfaslak · February 27, 2024, 1:54pm

I’m glad you like it.

It’s not perfect though. I’m keeping the docs I have scraped for ingestion in another AI later and will of course update in this thread when I do that.

The main problem with GPT4 is that it doesn’t actually consume the provided knowledge to update its state. Instead, it relies on RAG (retrieval-augmented generation) to produce answers. Those retrievals happen when the UI displays a “Searching Knowledge” icon, and are not really different from e.g. Bing searches.
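To make the RAG idea concrete, here is a toy sketch: score each stored chunk against the question, keep the best few, and paste them into the prompt. This is not how OpenAI's retrieval works internally (that isn't public); the bag-of-words cosine score, the function names and the example chunks are all assumptions chosen to keep the example short.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase and split text into simple word tokens."""
    return re.findall(r"[a-z0-9_]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two token-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(question, chunks, k=2):
    """Return the k chunks most similar to the question."""
    q = Counter(tokenize(question))
    ranked = sorted(chunks, key=lambda c: cosine(q, Counter(tokenize(c))), reverse=True)
    return ranked[:k]


def build_prompt(question, chunks, k=2):
    """Prepend the retrieved chunks to the question."""
    context = "\n\n".join(retrieve(question, chunks, k))
    return f"Answer using only this context:\n\n{context}\n\nQuestion: {question}"


# Hypothetical documentation chunks, for illustration only.
docs = [
    "PyMC uses PyTensor as its computational backend.",
    "ArviZ provides plotting and diagnostics for posterior samples.",
    "Bambi builds mixed-effects models on top of PyMC.",
]
print(build_prompt("what is bambi?", docs, k=1))
```

The model only ever sees the chunks that the retrieval step picks, so anything the scoring misses is invisible to it. That matches the point above about the knowledge not being absorbed into the model itself.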

Google Gemini has a 10M-token context window, which could actually fit all of these docs, but I think OpenAI will probably beef up their custom models quite soon, so stay tuned for updates.
