In the proposed methodology, reda counts (RNA sequencing) are analyzed using a negative binomial distribution model, as the observed variance significantly exceeds the mean. To evaluate the model’s performance in predicting read count parameters, a single-group scenario is considered. Noise reduction in read count prediction is the primary objective, and to achieve this, a synthetic dataset is created using the following code:
xreal = np.random.randint(1, 50, 10)
By incorporating plausible values for the parameters
gamma_noise, and the dispersion
rj, the predicted read count is obtained:
alpha_noise = 0.7 beta_noise = 0.3 gamma_noise = 0.25 # dispersion rj = 5 xpred = xreal * alpha_noise + xreal * beta_noise + xreal * gamma_noise xExpirm = pm.NegativeBinomial.dist(mu=xpred, alpha=rj).eval()
To account for the absence of real counts prior to noise addition, xi parameters are incorporated into the model. Furthermore, a regularized coefficient is included to address potential differences in noise between parameters. Prior information for the noise parameters is as follows:
alpha_noise ranges from 0 to 1,
beta_noise varies between 0.5 and 0.25, and
gamma_noise has no specific knowledge, falling within the range of 0 to 1. Additionally, the dispersion parameter lacks substantial prior information.
After running the model, the predictions obtained are unsatisfactory. Could anyone provide guidance or suggestions on potential errors or improvements that could be made to enhance the model’s performance?
with pm.Model() as model: alpha_noise = pm.Beta("alpha_noise", 3, 2) beta_noise = pm.Beta("beta_noise", 3, 2) gamma_noise = pm.Beta("gamma_noise", 3, 2) # Prior for the unknown xi values xi = pm.Uniform("xi", lower=0, upper=np.max(xExpirm), shape=len(xExpirm)) # Regularized coefficients reg_param = 0.1 # You can adjust this parameter based on your knowledge about the problem reg_alpha = pm.Laplace("reg_alpha", mu=alpha_noise, b=1 / reg_param) reg_beta = pm.Laplace("reg_beta", mu=beta_noise, b=1 / reg_param) reg_gamma = pm.Laplace("reg_gamma", mu=gamma_noise, b=1 / reg_param) count_data_est = reg_alpha * xi + reg_beta * xi + reg_gamma * xi r = pm.Exponential("r", 0.5) obs_data = pm.NegativeBinomial("obs_data", mu=count_data_est, alpha=1/r, observed=xpred) trace = pm.sample(15_000, tune=20_000,target_accept = 0.9)