Sorry for replying to this slowly. I took a crack at it a couple times, but I never really thought I hit the mark. I will try again, and I hope others can chime in if I do a poor job.

When you study variation between units with a linear model, you are holding certain aspects of the units fixed and comparing the variation in the remaining attributes. For example, when you run a regression like \text{Height}_i = \beta_0 + \beta_1 \cdot \text{Female}_i + \beta_2 \cdot \text{Weight}_i + \epsilon_i, you are asking, “what variation in height exists between females of the same weight, given the sample average height?” The fact that variation exists in this dimension allows the parameters \beta_0, \beta_1, \beta_2 to be identified.

So the key idea behind all these models, whether it’s boring frequentist OLS with linear algebra or exciting hierarchical models with MCMC, is “what is the remaining variation in the data given a list of factors held constant”.

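This should now answer your question about GDP and country. If you have 3 countries numbered 1, 2, 3, and write a model with one 0/1 dummy term per country, \text{GDP}_i = \alpha_0 + \alpha_1 \cdot \left [ \text{Country}_i = 1 \right ] + \alpha_2 \cdot \left [ \text{Country}_i = 2 \right ] + \alpha_3 \cdot \left [ \text{Country}_i = 3 \right ] + \epsilon_i, I hope you can see that the country terms reproduce GDP exactly, so there is no remaining variation in the data from which to measure \epsilon_i, and the model is not identified.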
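Here is a minimal NumPy sketch of that point. The GDP numbers are made up for illustration, and I drop \alpha_0 so the dummy columns are not collinear with a constant; neither choice comes from your data.

```python
import numpy as np

# Hypothetical GDP values for three countries (made-up numbers).
gdp = np.array([1.0, 2.5, 4.0])
country = np.array([0, 1, 2])

# One 0/1 indicator column per country (no separate alpha_0 here).
D = np.eye(3)[country]

# Least-squares fit of GDP on the country indicators.
alpha, *_ = np.linalg.lstsq(D, gdp, rcond=None)
residuals = gdp - D @ alpha

# Three parameters fit three observations, so nothing is left over.
print("alpha:", alpha)
print("residuals:", residuals)
```

The residuals should all be zero up to floating point, which is the "no remaining variation" problem in numerical form.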

So let’s connect this to your actual problem. From each of these three countries, you have 10 customers, each of whom either purchased or didn’t. You include the “country fixed effects”, or hierarchical intercepts in the Bayesian jargon, and estimate a model:

\text{Purchased}_i = \alpha_0 + \sum_{j\in\mathcal{J}}\alpha_j \cdot \left [ \text{Country}_i = \text{Country}_j \right ] + \beta X_i + \epsilon_i

Here \mathcal J is the set of all countries, and \left [ x = y \right ] is an indicator function that evaluates to 1 if true and 0 otherwise. The key insight is that these variable intercepts in the country dimension capture **everything** that varies between countries, **everything** that could be written the way I wrote that GDP equation above. Inside these terms is the GDP, but also language, history, culture, institutions – everything that makes a country a country is bundled up and given to you in the \alpha_j term.

The core theoretical idea underpinning all this, which may (or may not) be helpful to know, is the Frisch–Waugh–Lovell theorem, which states that multivariate linear regressions do **conditional** variance decomposition. The \beta vector from the formula above is **not** the effect of X_i on \text{Purchased}_i; it is the effect of X_i on \text{Purchased}_i *given the* \alpha_j *terms*! In essence, it is the same as first running the GDP-style regression I presented above on every variable in X_i, keeping the residuals, and then running the Purchased regression on those residuals while omitting the \alpha_j terms (this is the point about the M_x matrix in the wiki article, which I linked despite it being impenetrably dense in typical wiki fashion; my apologies).
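If it helps, the theorem can be checked numerically. This is only a sketch on simulated data: the sample sizes, the single covariate `x`, the country effects, and the way `purchased` is generated are all assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated setup: 3 countries, 10 customers each (illustrative only).
country = np.repeat(np.arange(3), 10)
n = country.size
D = np.eye(3)[country]                      # country indicator columns
effects = np.array([0.5, -0.5, 1.0])        # made-up country effects

x = rng.normal(size=n) + country            # covariate correlated with country
latent = 0.8 * x + effects[country] + rng.normal(size=n)
purchased = (latent > 1.0).astype(float)    # binary outcome, linear probability model

def residualize(v, D):
    """Return the part of v that the country indicators cannot explain."""
    coef, *_ = np.linalg.lstsq(D, v, rcond=None)
    return v - D @ coef

# (1) Full regression: purchased on country indicators and x together.
full_coef, *_ = np.linalg.lstsq(np.column_stack([D, x]), purchased, rcond=None)
beta_full = full_coef[-1]

# (2) FWL: partial the indicators out of x (and y), then regress residuals.
x_res = residualize(x, D)
y_res = residualize(purchased, D)
beta_fwl = (x_res @ y_res) / (x_res @ x_res)

# (3) Residualizing only x gives the same slope, since x_res is orthogonal to D.
beta_x_only = (x_res @ purchased) / (x_res @ x_res)

print(beta_full, beta_fwl, beta_x_only)
```

All three slopes should agree up to floating point, which is what "the effect of x given the country intercepts" means in practice.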

So the short answer to your question of how to use GDP in this setup is “you can’t”, but the longer answer is “you don’t need to, because the country-specific intercepts already do it”. If you insist on it, for example because you want to know the causal effect of economic growth on consumer activity, then you either have to turn to some clever pseudo-experimental design like instrumental variable regression, or add a time dimension so you can exploit inter-temporal variation in GDP.
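To see the “you can’t” part concretely, here is a small sketch that builds a design matrix with country indicators plus a GDP column and checks its rank. The GDP values and the 3-by-10 layout are made up for illustration.

```python
import numpy as np

# 3 countries, 10 customers each; GDP is constant within a country.
country = np.repeat(np.arange(3), 10)
D = np.eye(3)[country]
gdp_levels = np.array([1.0, 2.5, 4.0])      # hypothetical GDP per country
gdp = gdp_levels[country]

# Design matrix: three country indicators plus GDP (four columns).
X = np.column_stack([D, gdp])
rank = np.linalg.matrix_rank(X)

# GDP after the country indicators are partialled out.
coef, *_ = np.linalg.lstsq(D, gdp, rcond=None)
gdp_residual = gdp - D @ coef

print("columns:", X.shape[1], "rank:", rank)
print("largest |GDP residual|:", np.abs(gdp_residual).max())
```

The matrix has four columns but rank three, and the GDP residual is zero everywhere, so OLS has no variation left with which to estimate a separate GDP coefficient.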

The final approach is to use extremely informative priors. All the above conversation is mostly grounded in frequentist statistics, which is at the mercy of the cruel mistress of matrix inversion. In Bayes, we have a bit more flexibility with identification. At its core the principles remain the same - you are trying to exploit variation in the residuals to identify fixed sub-spaces in the data - but everything is more “fuzzy” because you get to inject some additional variation via the priors. If you write the regression:

\text{Purchased}_i = \alpha_0 + \sum_{j\in\mathcal{J}}(\alpha_j + \gamma \cdot \text{GDP}_j) \cdot \left [ \text{Country}_i = \text{Country}_j \right ] + \beta X_i + \epsilon_i

Setting a strong prior on \gamma might let you achieve identification and get posterior samples, despite the lack of variation in the data.
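As a rough illustration, here is a conjugate Gaussian sketch rather than a full MCMC model. Everything in it is an assumption made for the example: a continuous stand-in for the outcome, a known noise scale `sigma`, no \alpha_0 or X_i terms, made-up GDP values, and the specific prior means and scales.

```python
import numpy as np

rng = np.random.default_rng(1)

country = np.repeat(np.arange(3), 10)
D = np.eye(3)[country]
gdp_levels = np.array([1.0, 2.5, 4.0])       # hypothetical GDP per country
X = np.column_stack([D, gdp_levels[country]])  # columns: alpha_1..alpha_3, gamma

# Simulated continuous outcome (stand-in for the purchase variable).
sigma = 0.5                                   # noise scale, assumed known
y = np.array([0.2, 0.5, 0.8])[country] + rng.normal(scale=sigma, size=country.size)

# Wide priors on the country intercepts, a tight prior on gamma.
prior_mean = np.array([0.0, 0.0, 0.0, 0.1])
prior_sd = np.array([10.0, 10.0, 10.0, 0.05])
prior_prec = np.diag(1.0 / prior_sd**2)

# Gaussian posterior: the prior precision makes the matrix invertible
# even though X'X alone is singular.
post_prec = prior_prec + X.T @ X / sigma**2
post_cov = np.linalg.inv(post_prec)
post_cov = (post_cov + post_cov.T) / 2        # remove tiny asymmetry from inversion
post_mean = post_cov @ (prior_prec @ prior_mean + X.T @ y / sigma**2)
post_sd = np.sqrt(np.diag(post_cov))

# Posterior draws are available despite the collinearity.
draws = rng.multivariate_normal(post_mean, post_cov, size=1000)

print("gamma prior:    ", prior_mean[3], prior_sd[3])
print("gamma posterior:", post_mean[3], post_sd[3])
```

In this setup the posterior for \gamma should stay very close to its prior, while the combination \alpha_j + \gamma \cdot \text{GDP}_j tracks the country means. In other words, you do get samples, but what you learn about \gamma is mostly what you put into the prior.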