Sanity check: How fast should sampling be?

I’m trying to understand what’s normal. This model hovers around 2 seconds per draw, which gives an expected ETA of 1-3 hours on the progress bar.

How long does sampling usually take for simple models on datasets of this size (1 million observations)?

import numpy as np
import pandas as pd
import pymc3


def standardize(x):
    """Center to zero mean and scale to unit standard deviation."""
    return (x - x.mean()) / x.std()


if __name__ == "__main__":

    # Load the diamonds dataset and resample it with replacement
    # up to 1 million rows.
    diamonds = pd.read_csv(
        "https://raw.githubusercontent.com/mwaskom/seaborn-data/master/diamonds.csv"
    )
    diamonds = diamonds.sample(n=1_000_000, replace=True)
    diamonds = diamonds.assign(price_std=lambda x: standardize(x["price"]))
    print(diamonds)

    # Linear regression on standardized price, with standard normal priors
    # on the intercept and on every regression coefficient ("Regressor"
    # is the catch-all key for all coefficients in pymc3's GLM).
    model = pymc3.glm.GLM.from_formula(
        "price_std ~ C(cut) + C(color) + C(clarity) + carat + depth + table + x + z",
        priors={"Intercept": pymc3.Normal.dist(), "Regressor": pymc3.Normal.dist()},
        data=diamonds,
    )
    fit = pymc3.sample(init="adapt_diag", model=model)

I think this is a reasonable sampling speed. I ran your script on my own machine and also got a projected runtime of about 2 hours once the tuning (burn-in) phase finished.

The arithmetic roughly works out, too: at ~2 seconds per draw, the default 1,000 tuning steps plus 1,000 draws per chain (on recent PyMC3 versions) come to about 4,000 seconds, or a bit over an hour per chain, so a 1-3 hour ETA is in line with what I'd expect. After 25% of the run, my progress bar also projects about 1.5 hours in total.
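
If you'd rather not wait on the full progress bar to sanity-check the per-draw speed, you can time a short run first. A minimal sketch, assuming the model object from your script; the draw counts here are arbitrary and only meant for a rough estimate:

import time

# Rough benchmark: time one short chain to estimate seconds per draw.
# 50 tuning steps + 50 draws are placeholders, not recommended settings.
start = time.time()
with model:
    pymc3.sample(draws=50, tune=50, chains=1, init="adapt_diag", progressbar=False)
elapsed = time.time() - start
print(f"~{elapsed / 100:.2f} s per draw, tuning included")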

A few points:

  • Within the formula you can use standardize(price) directly instead of computing price_std by hand (see the sketch after this list)
  • You could also try modelling the log of price by putting np.log(price) in the formula
  • I don't know what x, y, and z represent for diamonds. You've included x and z without y; is that on purpose?
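
On the first two points: patsy, which parses the formula, resolves function calls like standardize and np.log from the surrounding namespace, so the transformation can live inside the formula string. A minimal sketch, assuming the same diamonds DataFrame and the standardize function from your script:

model = pymc3.glm.GLM.from_formula(
    # standardize() is looked up from the calling namespace, so no
    # precomputed price_std column is needed; swap in np.log(price)
    # (or standardize(np.log(price))) to model the log of price instead.
    "standardize(price) ~ C(cut) + C(color) + C(clarity) + carat + depth + table + x + z",
    priors={"Intercept": pymc3.Normal.dist(), "Regressor": pymc3.Normal.dist()},
    data=diamonds,
)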