In the proposed methodology, reda counts (RNA sequencing) are analyzed using a negative binomial distribution model, as the observed variance significantly exceeds the mean. To evaluate the model’s performance in predicting read count parameters, a single-group scenario is considered. Noise reduction in read count prediction is the primary objective, and to achieve this, a synthetic dataset is created using the following code:
xreal = np.random.randint(1, 50, 10)
By incorporating plausible values for the parameters alpha_noise
, beta_noise
, gamma_noise
, and the dispersion rj
, the predicted read count is obtained:
alpha_noise = 0.7
beta_noise = 0.3
gamma_noise = 0.25
# dispersion
rj = 5
xpred = xreal * alpha_noise + xreal * beta_noise + xreal * gamma_noise
xExpirm = pm.NegativeBinomial.dist(mu=xpred, alpha=rj).eval()
To account for the absence of real counts prior to noise addition, xi parameters are incorporated into the model. Furthermore, a regularized coefficient is included to address potential differences in noise between parameters. Prior information for the noise parameters is as follows: alpha_noise
ranges from 0 to 1, beta_noise
varies between 0.5 and 0.25, and gamma_noise
has no specific knowledge, falling within the range of 0 to 1. Additionally, the dispersion parameter lacks substantial prior information.
After running the model, the predictions obtained are unsatisfactory. Could anyone provide guidance or suggestions on potential errors or improvements that could be made to enhance the model’s performance?
with pm.Model() as model:
alpha_noise = pm.Beta("alpha_noise", 3, 2)
beta_noise = pm.Beta("beta_noise", 3, 2)
gamma_noise = pm.Beta("gamma_noise", 3, 2)
# Prior for the unknown xi values
xi = pm.Uniform("xi", lower=0, upper=np.max(xExpirm), shape=len(xExpirm))
# Regularized coefficients
reg_param = 0.1 # You can adjust this parameter based on your knowledge about the problem
reg_alpha = pm.Laplace("reg_alpha", mu=alpha_noise, b=1 / reg_param)
reg_beta = pm.Laplace("reg_beta", mu=beta_noise, b=1 / reg_param)
reg_gamma = pm.Laplace("reg_gamma", mu=gamma_noise, b=1 / reg_param)
count_data_est = reg_alpha * xi + reg_beta * xi + reg_gamma * xi
r = pm.Exponential("r", 0.5)
obs_data = pm.NegativeBinomial("obs_data", mu=count_data_est, alpha=1/r, observed=xpred)
trace = pm.sample(15_000, tune=20_000,target_accept = 0.9)