So I implemented the lognormal model and it kind of works, but the results are disappointing.
Data generation:

```python
import numpy as np
import pymc3 as pm
import scipy.special as ss
import matplotlib.pyplot as plt
import seaborn as sns
import scipy as sp
import scipy.stats  # needed so that sp.stats is actually available
import theano.tensor as tt

corr = .8
_sigma = np.array([[1, corr], [corr, 1]])  # correlation matrix of the Gaussian copula
_mus = np.array([-1, 2])
_sigmas = np.array([.5, .5])
N = 10000

u = np.random.multivariate_normal((0, 0), _sigma, size=N)
c = sp.stats.norm.cdf(u)  # map the Gaussian draws to uniforms
generated = sp.stats.lognorm.ppf(c, _sigmas, scale=np.exp(_mus))  # apply lognormal margins
```
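As a sanity check on the data generation (a minimal numpy-only sketch, not part of the model): since `c = norm.cdf(u)`, the lognormal ppf collapses to `exp(mu + sigma * u)`, so the latent correlation should be recovered exactly on the log scale of the generated sample.

```python
import numpy as np

rng = np.random.default_rng(0)
corr = .8
cov = np.array([[1, corr], [corr, 1]])
mus = np.array([-1., 2.])
sigmas = np.array([.5, .5])
N = 100_000

u = rng.multivariate_normal((0, 0), cov, size=N)
# lognorm.ppf(norm.cdf(u), s, scale=exp(mu)) == exp(mu + s * u),
# so this copula construction reduces to a plain bivariate lognormal
generated = np.exp(mus + sigmas * u)

# the latent Gaussian correlation is the Pearson correlation of log(x)
emp_corr = np.corrcoef(np.log(generated).T)[0, 1]
print(round(emp_corr, 2))  # should be close to 0.8
```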

Model:

```python
def lognormal_cdf(values, mu, sigma):
    return pm.math.invprobit((tt.log(values) - mu) / sigma)

with pm.Model() as model:
    # hackish way to create a Cholesky matrix that yields a covariance matrix
    # with 1s on the diagonal and corr everywhere else
    corr = pm.LKJCorr('corr', n=2, eta=3.5)[0]
    chol = tt.set_subtensor(tt.eye(2, 2)[([1, 1], [0, 1])],
                            [corr, tt.sqrt(1 - tt.pow(corr, 2))])
    # r = pm.Uniform('r', lower=-1, upper=1)
    # cov = pm.Deterministic('cov',
    #                        tt.stacklists([[1., r],
    #                                       [r, 1.]]))
    mus = pm.Normal('mus', 0, 1, shape=2)
    sigmas = pm.HalfNormal('sigmas', 1, shape=2)
    ps = lognormal_cdf(generated, mus, sigmas)  # marginal CDF values
    cs = pm.math.probit(ps)                     # back to the Gaussian scale
    us_dist = pm.MvNormal('copula', mu=np.zeros(2), chol=chol, observed=cs)
    obs = pm.Lognormal('obs', mu=mus, sd=sigmas, observed=generated)
    trace = pm.sample(1000)
```
Trace:
As you can see, the estimated correlation is higher than the true value (~0.92 vs 0.8), and sigma is also higher (~0.85 vs 0.5).
If I simulate the posterior bivariate lognormals I get:

```python
sample = pm.sample_ppc(trace, model=model, vars=[us_dist, mus, sigmas], samples=10000)
simulated_c = sp.stats.norm.cdf(sample['copula'])
simulated = sp.stats.lognorm.ppf(simulated_c, sample['sigmas'], scale=np.exp(sample['mus']))
```
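When comparing the simulated sample to the generated data, the Pearson correlation is distorted by the lognormal margins; a rank (Spearman) correlation is invariant under monotone marginal transforms and has a closed form for a Gaussian copula, so it makes a cleaner diagnostic. A numpy-only sketch, assuming the same generative setup as above:

```python
import numpy as np

rng = np.random.default_rng(1)
r = .8
u = rng.multivariate_normal((0, 0), [[1, r], [r, 1]], size=200_000)
x = np.exp(np.array([-1., 2.]) + .5 * u)  # lognormal margins, as in the data generation

def ranks(a):
    # rank transform; ties are essentially impossible for continuous draws
    return np.argsort(np.argsort(a))

rho_s = np.corrcoef(ranks(x[:, 0]), ranks(x[:, 1]))[0, 1]
rho_theory = 6 / np.pi * np.arcsin(r / 2)  # Spearman's rho of a Gaussian copula
print(round(rho_s, 3), round(rho_theory, 3))  # both close to 0.786
```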

I’ve tried using LKJCorr with a high eta instead of a uniform prior for the correlation, but the results were similar.
Any idea on how to improve the results?
@junpenglao: Do you remember why (in your notebook) you chose to feed the observed values into the model twice? (i.e. once as the transformed values observed by the multivariate normal, and once as the untransformed values observed by the marginals.)
It makes sense and helps the model a lot (I tried without this trick and the model is even worse), but I guess the likelihood then becomes a “mixture” between a proper copula likelihood and a model with independent marginals. Am I missing something?
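For reference, the Gaussian-copula joint density divides out the standard-normal marginals of the probit-transformed values once, which is exactly the term the double-observation trick leaves in. A numpy-only sketch (my notation, not PyMC code) checking that the correct copula log-likelihood matches the direct bivariate-lognormal density for one observation:

```python
import numpy as np

def norm_logpdf(z):
    return -0.5 * z**2 - 0.5 * np.log(2 * np.pi)

def mvn2_logpdf(z, r):
    # bivariate standard normal with correlation r
    q = (z[0]**2 - 2 * r * z[0] * z[1] + z[1]**2) / (1 - r**2)
    return -0.5 * q - np.log(2 * np.pi) - 0.5 * np.log(1 - r**2)

def copula_loglike(x, mus, sigmas, r):
    z = (np.log(x) - mus) / sigmas  # probit of the lognormal CDF
    marg = norm_logpdf(z) - np.log(sigmas) - np.log(x)  # lognormal logpdfs
    # log phi2(z; r) - sum log phi(z_i) + sum log f_i(x_i)
    return mvn2_logpdf(z, r) - norm_logpdf(z).sum() + marg.sum()

def direct_loglike(x, mus, sigmas, r):
    # bivariate lognormal: MVN density of log x plus the 1/(x1*x2) Jacobian
    z = (np.log(x) - mus) / sigmas
    return mvn2_logpdf(z, r) - np.log(sigmas).sum() - np.log(x).sum()

x = np.array([0.4, 7.0])
a = copula_loglike(x, np.array([-1., 2.]), np.array([.5, .5]), .8)
b = direct_loglike(x, np.array([-1., 2.]), np.array([.5, .5]), .8)
print(np.isclose(a, b))  # True
```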
