# Dealing with random missing values in a GLM model

hey everyone,
I was digging through the documentation ([Bayesian Missing Data Imputation — PyMC example gallery]) trying to understand how to model my problem, but I couldn't work out how to handle the missing data.

The model is a linear regression with 5 variables and 5 beta coefficients. Some sparse experimental data gives me good estimates of the beta values, but only for a very small number of rows. I tried to model the problem with the following snippet:

``````python
import pymc as pm
import arviz as az
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

np.random.seed(0)
n = 100
k = 5
X = np.random.normal(size=(n, k))
beta_true = np.random.lognormal(size=k)
Y = np.dot(X, beta_true) + np.random.normal(size=n)

df = pd.DataFrame(X, columns=[f'x{i}' for i in range(1, k+1)])
df['y'] = Y

# Assume experimental_beta is a DataFrame of the same size as X that contains the observed beta values
experimental_beta = pd.DataFrame(np.random.normal(size=(n, k)), columns=[f'beta{i}' for i in range(1, k+1)])
experimental_beta[np.random.rand(*experimental_beta.shape) < 0.8] = np.nan  # drop ~80% of values

with pm.Model() as model:
    beta = [pm.Normal(f'beta{i}', mu=0, sigma=10, observed=experimental_beta[f'beta{i}']) for i in range(1, k+1)]
    sigma = pm.HalfNormal('sigma', sigma=1)
    mu = pm.math.dot(df[df.columns[:-1]], beta)
    mu_sum = pm.math.sum(mu, axis=1)  # Sum over columns
    Y_obs = pm.Normal('Y_obs', mu=mu_sum, sigma=sigma, observed=df['y'])
    trace = pm.sample(2000, cores=8)
``````

However, PyMC automatically starts imputing, generating one separate distribution for each missing value. In my model, what I want is for all of the missing values (and the non-missing ones, of course) to come from the same distribution. How can I model that?
thanks !!!