Hello,

Hope you are having a nice day!

I am a PyMC beginner, and I am trying to wrap my head around using PyMC to compare two groups of data in the context of A/B testing.

Set-up:

I have two groups of data: group A and group B. Group A has 400 datapoints and group B has 390. Each datapoint is a single float representing the revenue from that datapoint. The data is stored in pandas DataFrames called `data_A` and `data_B`, each with a `revenue` column.
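To make the rest of the post reproducible without the real data, here is a minimal sketch that builds hypothetical stand-in DataFrames with the same sizes and a `revenue` column. The normal distributions, their parameters, and the seed are arbitrary assumptions for illustration, not the actual revenue data.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in data: the real data_A / data_B come from the experiment.
# Distribution choice, parameters, and seed are arbitrary placeholders.
rng = np.random.default_rng(42)
data_A = pd.DataFrame({"revenue": rng.normal(loc=5000, scale=1000, size=400)})
data_B = pd.DataFrame({"revenue": rng.normal(loc=4800, scale=1000, size=390)})

print(data_A.shape, data_B.shape)
```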

I want to build a simple model that would allow me to compare the posteriors of these two groups and make claims like “Group A has higher revenue than Group B X% of the time, and the distribution of the difference between the two groups looks like this”.

In order to do that, I define a simple model:

```
import pymc as pm

with pm.Model() as revenue_model:
    # Priors on each group's spread and mean revenue
    sigma_A = pm.HalfNormal("sigma_A", sigma=1000)
    sigma_B = pm.HalfNormal("sigma_B", sigma=1000)  # every variable needs a unique name
    mean_A = pm.Normal("mean_A", mu=5000, sigma=1000)
    mean_B = pm.Normal("mean_B", mu=5000, sigma=1000)
    # Likelihoods: one Normal per group, conditioned on the observed revenue
    revenue_A = pm.Normal("revenue_A", mu=mean_A, sigma=sigma_A, observed=data_A["revenue"])
    revenue_B = pm.Normal("revenue_B", mu=mean_B, sigma=sigma_B, observed=data_B["revenue"])
    trace = pm.sample()
```

I can then plot the trace to see the posterior sigma and mean parameters for A and B. All four have shape (chains, draws), which with the default settings is (4, 1000) in my case.
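Because the four parameter posteriors share the (chains, draws) shape, the two group means can already be compared draw by draw. The sketch below uses synthetic NumPy arrays as placeholders; with a real trace, the arrays would come from `trace.posterior["mean_A"].values` and `trace.posterior["mean_B"].values`. The placeholder distributions are assumptions, so the printed probability says nothing about the real data.

```python
import numpy as np

# Placeholders for trace.posterior["mean_A"].values / ["mean_B"].values,
# which have shape (chains, draws) = (4, 1000) with default sampler settings.
# These synthetic draws are NOT real sampler output.
rng = np.random.default_rng(0)
mean_A = rng.normal(5000, 50, size=(4, 1000))
mean_B = rng.normal(4900, 50, size=(4, 1000))

# Same shape, so they subtract elementwise: each entry pairs the two group
# means taken from the same (chain, draw) of the joint posterior.
diff = mean_A - mean_B
prob_A_higher = (diff > 0).mean()

print(diff.shape)
print(f"P(mean_A > mean_B) ~ {prob_A_higher:.3f}")
```

The histogram of `diff` is then the posterior distribution of the difference in mean revenue.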

But when I draw samples from the posterior predictive, revenue_A has shape (4, 1000, 400) and revenue_B has shape (4, 1000, 390). I understand that PyMC draws a posterior predictive sample for each observed datapoint, which is why the posterior predictives have that shape.
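The shapes follow from how posterior predictive sampling works: for every (chain, draw) pair, a whole replicated dataset is generated, with one value per observed datapoint. A minimal NumPy imitation of that process, using placeholder parameter draws rather than real sampler output, reproduces the shapes:

```python
import numpy as np

rng = np.random.default_rng(1)
chains, draws = 4, 1000
n_A, n_B = 400, 390

# Placeholder posterior draws for each group's parameters, shape (chains, draws).
mean_A = rng.normal(5000, 50, size=(chains, draws))
mean_B = rng.normal(4900, 50, size=(chains, draws))
sigma_A = np.abs(rng.normal(1000, 30, size=(chains, draws)))
sigma_B = np.abs(rng.normal(1000, 30, size=(chains, draws)))

# One replicated dataset per posterior draw: a trailing axis of length 1 lets
# each draw's parameters broadcast across n_A (or n_B) datapoints.
pp_A = rng.normal(mean_A[..., None], sigma_A[..., None], size=(chains, draws, n_A))
pp_B = rng.normal(mean_B[..., None], sigma_B[..., None], size=(chains, draws, n_B))

print(pp_A.shape, pp_B.shape)
```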

However, I can't compare these (e.g. subtract one from the other to get the distribution of the difference) because they have different shapes.
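One way to make the shapes compatible, sketched here with placeholder arrays of the shapes described above, is to summarise each replicated dataset by its mean revenue, which collapses the observation axis. This compares average revenue per group; it is an illustrative choice, not the only valid summary.

```python
import numpy as np

rng = np.random.default_rng(2)
# Placeholder posterior predictive arrays, shape (chains, draws, observations),
# with a different number of observations per group. Not real sampler output.
pp_A = rng.normal(5000, 1000, size=(4, 1000, 400))
pp_B = rng.normal(4900, 1000, size=(4, 1000, 390))

# pp_A - pp_B would raise ValueError: the last axes (400 vs 390) cannot broadcast.
# Averaging over the observation axis leaves one value per (chain, draw).
avg_A = pp_A.mean(axis=-1)
avg_B = pp_B.mean(axis=-1)
diff = avg_A - avg_B  # shape (4, 1000)

print(diff.shape)
print(f"P(avg revenue A > avg revenue B) ~ {(diff > 0).mean():.3f}")
```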

Two questions:

- Is this the right approach to answer the question I am asking?
- If so, how could I go about comparing the posterior distributions of revenue?

Thank you very much in advance for your advice and guidance!

**edit** Any suggestions about “best practices” and structuring my code differently are most welcome too!