Hi,
I am new to PyMC and am trying to define my first model. The problem I would like to solve is as follows:
Buyers view products, but the purchase of a viewed product may be postponed by a few days (at most 20). Moreover, the amount they pay may differ from the product's price at the time of viewing (because they later decide to buy several products at once, buy the same product on sale, etc.). I’d like to model the seller’s total revenue generated by today’s views.
My training data includes past transaction and revenue data, and I would like to predict the total revenue for today.
I defined the simple model below:
# Imports assumed from the aliases used below (PyMC3 with Theano)
import numpy as np
import pymc3 as pm
import theano.tensor as tt

# Vector of 0s and 1s - 1s when view ends with purchase (transaction), 0 otherwise
t = train['was_transaction']
# Vector of differences between revenue and price (defined only for views with purchase);
# the cast to bool makes the 0/1 vector act as a mask rather than as index labels
residues = (train['Revenue'] - train['Price'])[train['was_transaction'].astype(bool)]
coords = {"view_index": np.arange(len(t)), "transaction_index": np.arange(len(residues))}

with pm.Model(coords=coords) as pooled_model:
    observed_transactions = pm.Data("observed_transactions", t, dims="view_index")
    observed_revenues = pm.Data("observed_revenue", residues, dims="transaction_index")

    # Probability that a view ends with a purchase
    transaction_probability = pm.Normal("transaction_probability", mu=0.01, sigma=0.1)
    was_transaction = pm.Bernoulli(
        "was_transaction",
        p=transaction_probability,
        dims="view_index",
        observed=observed_transactions
    )

    # Mean difference between revenue and viewed price
    mu = pm.Normal("mu", mu=70, sigma=30)
    revenue_residue = pm.Normal(
        "revenue_residue",
        mu=mu,
        sigma=100,
        dims="transaction_index",
        observed=observed_revenues
    )
After sampling, I have estimates for transaction_probability and mu. But now I would like to predict the overall revenue for today. I did this by defining a few additional variables:
with pooled_model:
    viewed_product_prices = pm.Data("prices", test['Price'])
    was_transaction_predict = pm.Bernoulli(
        "was_transaction_predict",
        p=transaction_probability,
        shape=len(test)
    )
    revenue_residue_predict = pm.Normal(
        "revenue_residue_predict",
        mu=mu,
        sigma=100,
        shape=len(test)
    )
    overall_revenue = pm.Deterministic(
        "overall_revenue",
        tt.sum(was_transaction_predict * (revenue_residue_predict + viewed_product_prices))
    )
    posterior_predictive = pm.sample_posterior_predictive(
        traces, var_names=["overall_revenue"], random_seed=999
    )
This way I have a posterior for overall_revenue.
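For reference, the quantity I am after can also be written in plain NumPy, outside the model. The following is only a rough sketch under my own assumptions: `simulate_overall_revenue` is a hypothetical helper name, the arrays `p_draws` and `mu_draws` are placeholders for posterior draws (which I assume could be pulled out with something like `traces["transaction_probability"]` and `traces["mu"]`), and the residual standard deviation is fixed at 100 as in the model.

```python
import numpy as np

def simulate_overall_revenue(p_draws, mu_draws, prices, sigma=100.0, seed=999):
    """Return one simulated overall revenue per posterior draw of (p, mu)."""
    rng = np.random.default_rng(seed)
    # The Normal prior does not keep p inside [0, 1], so clip before using it
    p_draws = np.clip(np.asarray(p_draws, dtype=float), 0.0, 1.0)
    mu_draws = np.asarray(mu_draws, dtype=float)
    prices = np.asarray(prices, dtype=float)
    n_draws, n_views = p_draws.shape[0], prices.shape[0]
    # One Bernoulli purchase indicator per (posterior draw, view)
    bought = rng.random((n_draws, n_views)) < p_draws[:, None]
    # One revenue-minus-price residual per (posterior draw, view)
    residue = rng.normal(mu_draws[:, None], sigma, size=(n_draws, n_views))
    # Sum revenue over the views that ended in a purchase
    return (bought * (residue + prices)).sum(axis=1)

# Placeholder values standing in for real posterior draws and today's prices
p_draws = np.full(1000, 0.01)
mu_draws = np.full(1000, 70.0)
prices = np.array([20.0, 35.0, 50.0])
overall_revenue = simulate_overall_revenue(p_draws, mu_draws, prices)
```

This gives one overall-revenue value per posterior draw, which I believe matches what the extra model variables compute, but I am not sure it is the recommended approach.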
I don’t think this is the best way to make predictions. Is there a simpler way, without defining the additional variables “was_transaction_predict” and “revenue_residue_predict”, which duplicate “was_transaction” and “revenue_residue”? Or should I formulate the model differently?
Every remark is valuable to me.
Thanks in advance!