Hello, I need help modeling something simple.
I have a small, simple generative model that creates users with some dialogs (0-3 each); the duration of each dialog is drawn from an exponential distribution.
I can't figure out how to handle this dynamic shape (0-3): for example, I have 1000 users and each can create 0-3 dialogs, so the dataset size is dynamic.
import numpy as np
import numpy.random as rnd
import pandas as pd
rnd.seed(42)
users_count = 1_000
users_rct_prob = np.array([.1, .9]) # control and test group
dialog_prob = np.array([.4, .4, .15, .05]) # probability of writing 0-3 dialogs (0 = wrote none)
dialog_auto = np.array([.33, .33, .34]) # probability of each factor [automate dialog, reduce duration, do nothing]
dialog_auto_mult = np.array([0, .33, 1]) # factor applied to the dialog duration
dialog_duration_mean = 4 # param for duration
def generate_data():
    dialogs_users_rct = rnd.choice(np.arange(len(users_rct_prob)), p=users_rct_prob, size=users_count)
    dialogs_users_count = rnd.choice(np.arange(len(dialog_prob)), p=dialog_prob, size=users_count)
    dialogs_rct = np.repeat(dialogs_users_rct, dialogs_users_count)
    dialogs_count = np.repeat(dialogs_users_count, dialogs_users_count)
    dialogs_user_id = np.repeat(np.arange(users_count), dialogs_users_count)
    dialogs_auto = rnd.choice(dialog_auto_mult, p=dialog_auto, size=sum(dialogs_users_count))
    dialog_duration = rnd.exponential(scale=dialog_duration_mean, size=sum(dialogs_users_count))
    dialog_duration_observed = dialog_duration - dialog_duration * dialogs_auto * dialogs_rct
    df = pd.DataFrame({
        'user': dialogs_user_id,
        'duration_true': dialog_duration,
        'duration_obs': dialog_duration_observed,
        'auto': dialogs_auto,
        'group': dialogs_rct,
        'count': dialogs_count,
    })
    return df
Here I used choice for the number of dialogs, but in a real example it could be a Poisson random variable.
I am thinking about creating an exponential random variable with shape (1000, 4) and selecting only the entries I need, but I don't know how to do it.
So can you help me create a PyMC model for this generative model?
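To make the padding idea concrete, here is a minimal NumPy-only sketch of what I mean, not a PyMC model. It draws a fixed-shape block of durations and keeps only the first `count` entries of each row with a boolean mask. The names `max_dialogs`, `padded`, `mask`, and `user_id` are made up for this illustration, and it uses `np.random.default_rng` instead of the seeded global generator above. I assume three columns are enough, since a user has at most 3 dialogs and a fourth column would always be masked out.

```python
import numpy as np

rng = np.random.default_rng(42)

users_count = 1_000
dialog_prob = np.array([.4, .4, .15, .05])  # probability of 0-3 dialogs per user
dialog_duration_mean = 4
max_dialogs = len(dialog_prob) - 1  # assumption: at most 3 dialogs per user

# Number of dialogs per user, shape (users_count,)
counts = rng.choice(len(dialog_prob), p=dialog_prob, size=users_count)

# Fixed-shape ("padded") block: one candidate duration per possible dialog slot
padded = rng.exponential(scale=dialog_duration_mean, size=(users_count, max_dialogs))

# mask[i, j] is True when slot j is a real dialog of user i (j < counts[i])
mask = np.arange(max_dialogs)[None, :] < counts[:, None]

# Boolean indexing flattens in row-major order, so the result is grouped by user
durations = padded[mask]

# Matching user index for each kept duration, same construction as np.repeat above
user_id = np.repeat(np.arange(users_count), counts)
```

The result has the same long format as `generate_data()`: one row per dialog, with `user_id` telling which user it belongs to. My guess is that the same `mask` or `user_id` indexing could be used inside a PyMC model, but that part is exactly what I am unsure about.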