I have been working on seismological inversion recently, and PyMC has helped me a lot. Thank you for providing such a good computational tool. I have run into some problems in practical application, and as a beginner I would like to ask a few questions:

My sampling target is the two-dimensional velocity-depth space. Using Thiessen (Voronoi) polygons, three points in this space define a three-layer velocity model: the horizontal coordinate of each point gives the velocity of a layer, and the average of the adjacent vertical coordinates gives the layer depth. Each model has a unique identifier. Yesterday I asked how to use PyMC to generate a monotonically increasing depth sequence. Although that is possible, the MCMC result is not ideal. I wonder if there is a way to sample three points directly from the velocity-depth space (according to the prior information) and compute the posterior probability under the resulting velocity model. Through repeated sampling, this would build up the final two-dimensional velocity-depth probability distribution (as shown in the figure).

In addition, I would like to ask whether the number of model layers can be included in the sampling, for example over a range of 2-4. Since the final result lives in the two-dimensional velocity-depth space, the overall dimension of the model would not change.

Hi!

I'm not 100% sure what you mean. You can have any number of dimensions you like in the observed data. For example, here is an example of using the broadcasting rules to fit a stack of independent regressions all at the same time. Or here is a general example showing how to fit data with a multivariate normal.

It would be useful if you could post a little bit more about your model. Do you have a generative model (in math, I mean) that you could share, or a minimal pymc or numpy code snippet showing how you would generate artificial data? Either of these would help clarify your question.


So what doesn't this model do that you want it to do? Everything seems to be in order to me.

I don't understand whether `his model` here refers to the code above or to rffw.rffw?

My goal is to obtain the dimensions and probability distributions of vs and depth from the observed rf_matrix and amp data. My current way of building velocity and depth models makes convergence very slow. I hope to obtain the posterior distribution of velocity and depth by sampling directly from the velocity-depth space. Do you have any methods or suggestions? Can pymc.Mixture set the dimension (for example, the number of layers) of vs and depth for sampling via the broadcasting rules? Thanks so much.

It should have been "this model", i.e. your PyMC model.

You are sampling both velocity and depth in the model. Do you want to do that jointly? Or do you want a dynamic number of variables? The second isn't possible: you need specialized MCMC samplers to handle random variables with random dimensions. What you could do is specify a reasonable maximum number of layers and then disable the contributions from layers beyond a certain depth; see here for an example.

I understand. Thanks for your answer.

I still have two questions:

- How can I sample jointly? Is it possible to combine both approaches (sampling jointly and specifying a reasonable maximum) in the same model?
- Can I include the depth and velocity dimensions as regularization priors in the coefficient matrix? If so, how can I implement this in PyMC?

@bob-carpenter Hi, Do you have any suggestions on this topic?

Reversible jump MCMC over a variable number of parameters, as used for Dirichlet process (DP) modeling of the number of clusters, is unlikely to work in practice. Clustering is already hard enough as a combinatorial problem. One thing you can do is take a large number of clusters relative to the data, which approximates the DP model. It will still be a challenge to fit, especially if the data are high-dimensional and heterogeneous.

It is indeed difficult to get good trans-dimensional convergence. By the way, PyMC does not seem to support reversible jump MCMC at the moment.

Hi, is the joint sampling you mentioned the same as in this example? @jessegrabowski

You could use a copula, but they are quite difficult to work with. I would suggest just starting with a multivariate normal.

For regularization you can use a Laplace prior (which peaks strongly at zero), or some other flavor of regularization prior. I like the continuous spike-and-slab proposed here, which seems to work better with NUTS than traditional choices like the horseshoe. Example here (check the last two sections).

Thanks for the reply.

Is a copula significantly better than a multivariate normal in terms of performance?

Copulas are just schemes for producing correlated draws from arbitrary distributions. There's no "better", just different. It's like asking whether a Beta prior is more performant than a Normal prior.
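The mechanics can be shown in a few lines with a Gaussian copula (marginals here chosen purely for illustration): push correlated normals through the normal CDF to get dependent uniforms, then through arbitrary inverse CDFs.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
rho = 0.8
cov = [[1.0, rho], [rho, 1.0]]

# Correlated standard normals -> dependent Uniform(0, 1) pairs.
z = rng.multivariate_normal([0.0, 0.0], cov, size=10_000)
u = stats.norm.cdf(z)

# Arbitrary marginals via inverse CDFs; the dependence carries through.
x = stats.gamma(a=2.0).ppf(u[:, 0])        # Gamma marginal
y = stats.beta(a=2.0, b=5.0).ppf(u[:, 1])  # Beta marginal
r = np.corrcoef(x, y)[0, 1]                # clearly positive
```

So the copula only supplies the dependence structure; each variable keeps whatever marginal distribution you assign it, which is exactly why it is neither "better" nor "worse" than a multivariate normal, just more flexible and more awkward to fit.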

I see, thank you. That is indeed the case; a multivariate normal will be much more convenient.