Random Intercepts and Slopes (Correlated) Convergence Issues

I did find one mistake in the code, but even with that correction the estimates (especially the correlations) are off: the first rho should be around -0.3, not +0.4. I can get "correct" estimates using brms or rstanarm, so I must be doing something wrong in the PyMC3 code?
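
For reference, the wages DataFrame is assumed to be set up roughly along these lines (the file path below is just a placeholder, not the actual source; the Singer/Willett wages data have columns id, lnw, exper, and hgc.9):

import pandas as pd

wages = pd.read_csv('wages.csv')                # placeholder path, assumed format
wages['new_id'] = pd.factorize(wages['id'])[0]  # 0-based index per person, used below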

import numpy as np
import pymc3 as pm
import theano.tensor as tt

id_idx = wages.new_id.values    # array of group IDs, one entry per observation
N_ids = len(wages.id.unique())  # number of unique group IDs

dim_corr = 3  # dimension of the correlation matrix (intercept, slope for exper, slope for hgc.9)



with pm.Model() as m_wages:
    
    # prior for the standard deviations; LKJCholeskyCov draws as many as it needs
    # (.dist() creates an unnamed distribution to use as a prior inside another one)
    sd_dist = pm.HalfCauchy.dist(beta=2)
    
    # eta controls how informative the prior on the correlations is
    # (see figure 13.3 of Statistical Rethinking); n is the dimension of the
    # correlation matrix (here 3 x 3)
    packed_chol = pm.LKJCholeskyCov('chol_cov', eta=1, n=dim_corr, sd_dist=sd_dist)
    
    
    # n: number of rows of the triangular matrix (matches dim_corr above)
    # lower=True: the packed values are taken to be the lower triangle
    chol = pm.expand_packed_triangular(n=dim_corr, packed=packed_chol, lower=True)

    # the Cholesky factor L satisfies S = L L.T, so this rebuilds the full covariance matrix
    cov = tt.dot(chol, chol.T)
    
    
    # Extract the standard deviations and rhos
    
    # the diagonal of the covariance matrix holds the variances,
    # so the square roots are the standard deviations
    sigma_id = pm.Deterministic('sigma_id', tt.sqrt(tt.diag(cov)))
    # variance components (intercept and the 2 slopes)
    variance_components = pm.Deterministic('variance_components', tt.power(sigma_id, 2))
    
    # convert the covariance matrix to a correlation matrix
    corr = tt.diag(sigma_id**-1).dot(cov.dot(tt.diag(sigma_id**-1)))
    # np.triu_indices(dim_corr, k=1) picks out the above-diagonal entries,
    # i.e. the dim_corr*(dim_corr-1)/2 unique correlations
    r = pm.Deterministic('Rho', corr[np.triu_indices(dim_corr, k=1)])
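
    # NOTE (assumption: PyMC3 >= 3.9): the same quantities can be obtained
    # directly, skipping the manual expand/convert steps above:
    #   chol, corr, stds = pm.LKJCholeskyCov('chol_cov', eta=1, n=dim_corr,
    #                                        sd_dist=sd_dist, compute_corr=True)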
    
   
    
    # per the docstring, MvNormal accepts either chol or the covariance matrix directly
    # this is the joint distribution of the random effects: an intercept and
    # two slopes for each of the N_ids groups, hence shape (N_ids, dim_corr)
    ab_id = pm.MvNormal('ab_id', mu=np.zeros(dim_corr), chol=chol, shape=(N_ids, dim_corr))
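
    # If sampling struggles (divergences, low effective sample size), a
    # non-centered parameterization is a common alternative (a sketch, not
    # part of the original model):
    #   z = pm.Normal('z', 0., 1., shape=(dim_corr, N_ids))
    #   ab_id = pm.Deterministic('ab_id', tt.dot(chol, z).T)  # (N_ids, dim_corr)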
    
    # the grand intercept and the 2 other fixed effects,
    # assumed independent of the random effects
    fixedEffects = pm.Normal('fixedEffects', 0., 10., shape=dim_corr)
    
    
    # linear model
    # the order here dictates the order of the covariance matrix:
    #
    #             intercept  exper  hgc.9
    #   intercept
    #   exper
    #   hgc.9
    A  = fixedEffects[0] + ab_id[id_idx, 0]  # pi_0i in Singer/Willett (fixed intercept + random intercepts)
    BE = fixedEffects[1] + ab_id[id_idx, 1]  # pi_1i (exper slope)
    BH = fixedEffects[2] + ab_id[id_idx, 2]  # pi_2i (hgc.9 slope)
    
    
    mu = A + BE * wages['exper'].values + BH * wages['hgc.9'].values  # linear predictor
    sd = pm.HalfCauchy('sigma', beta=2)  # within-id residual sd (sigma_epsilon in Singer/Willett)
    resid_var = pm.Deterministic('resid_var', tt.power(sd, 2))

    lnw = pm.Normal('lnw', mu=mu, sd=sd, observed=wages['lnw'].values)  # likelihood
    trace_wages = pm.sample(10000, tune=10000)
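
To compare against the brms/rstanarm output, something like this prints the posterior summaries for the relevant pieces (assuming a recent PyMC3 where pm.summary forwards to ArviZ and accepts var_names):

pm.summary(trace_wages, var_names=['fixedEffects', 'sigma_id', 'Rho', 'sigma'])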