GSoC 2022 Project Discussion

Hi all,
I’d like to contribute to the projects below:

  1. Increase support for batched multivariate distributions.
  2. Multi-output Gaussian Processes.

I’m familiar with statistics and with general-purpose Python libraries such as pandas, NumPy, and scikit-learn, as well as Git. I’d love to work on this.

Could you point me in the correct direction?




@purna135 the title of the link doesn’t match the project the link points to. Can you clarify which of the two you are interested in?


My apologies : (
By mistake, I used the incorrect title.

Please have a look at the modified link and title.


I was looking through the gsoc-2022 organization list and couldn’t find PyMC as a participating organization.

Is PyMC taking part in GSoC this year?

PyMC is a sub-org under the NumFOCUS umbrella. Since NumFOCUS has been selected as one of this year’s organizations, PyMC will also be taking part in GSoC.

The main step here would be writing a proposal for your project of interest. The mentors can review it and iterate with you to help you improve it. Refer to the Contributing guide set up by NumFOCUS for how to write a good proposal.



Thank you, @Sayam753 : )

Hi @OriolAbril, @ricardoV94, @Sayam753, could you perhaps advise me on the projects listed below or any prerequisites I should follow?

You should start by reading the references and links in the idea description and requirements. If any of that is unclear, ask here, either tagging the potential mentors or asking us to tag them. I am not a potential mentor on any project, so I can’t help with anything project-specific for now, and I probably won’t be able to help until you get to documenting the new features.


Hello, @ricardoV94, @Sayam753, and @lucianopaz!

I’m following the instructions you provided, as described in issue 5383, and I’m hoping to finish this project over the GSoC term.
However, this is my first time applying to GSoC, so I’m a little lost.

Could you kindly advise me on how to prepare a proposal or any prerequisites I should be aware of?

A little about myself: I am a final-year student pursuing a Master’s in Computer Applications. I took part in a PyMC sprint organized by Data Umbrella. Open source is something that I am very interested in. Some of my most recent contributions to PyMC are #5583, #5505, and #5459, and I am now working on #5076 and #5005.


There are some resources about writing proposals in the NumFOCUS GSoC repo (there is even a template proposal): gsoc/ at master · numfocus/gsoc · GitHub (shared previously by Sayam too), and on the Summer of Code website: Writing a proposal | Google Summer of Code Guides


Hello, Community!

Could you kindly share some resources for learning more about PyMC and the project below?
Increase support for batched multivariate distributions
I’d like to work on this project during GSoC 2022, but I’m not very good at Bayesian statistics.

To learn more about PyMC, there are tons of resources mentioned in the README file. To get started with the project, please take a look at this GitHub comment.


I have drafted a proposal for GSoC. Could you please take a look and let me know how I can make it better?

I’m not sure how to describe the “Schedule of Deliverables.” Kindly advise.

Link to the proposal


Hi Purna

I’ve checked your proposal and left some comments in the linked Google doc. To work on the “Schedule of Deliverables” section, I suggest exploring the current implementations of Ops and multivariate distributions. Doing so can help you estimate the deliverables by decomposing the work on Ops and distributions into manageable chunks.

Feel free to ask any further questions!
Thanks, Sayam


Thanks a lot, @Sayam753!
I’ll work on the feedback.

Predicting the time it will take to finish a task is very difficult, even for people who have worked on those topics for years. However, writing a timeline is still extremely useful, even if the time estimates in it turn out to be unreliable.

Like Sayam said, writing a detailed timeline forces you to divide the work into chunks and to show you are aware of all the tasks that need to be completed to finish the project. Timelines also show you understand the dependencies between those tasks (some timeline representations, like Gantt charts, take that to the extreme), i.e. you can’t start a task before the expected end time of all the tasks it depends on.
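
As a rough illustration of that dependency rule, here is a minimal sketch that computes the earliest start week of each task from its prerequisites. The task names and durations are made up for the example and are not taken from any actual proposal; it only assumes Python 3.9+ for the standard-library `graphlib` module.

```python
from graphlib import TopologicalSorter

# Hypothetical tasks: name -> (duration in weeks, list of prerequisite tasks).
tasks = {
    "read_existing_ops": (2, []),
    "batch_op_a": (3, ["read_existing_ops"]),
    "batch_op_b": (2, ["read_existing_ops"]),
    "update_distributions": (3, ["batch_op_a", "batch_op_b"]),
    "docs_and_tests": (2, ["update_distributions"]),
}


def schedule(tasks):
    """Return {task: (start_week, end_week)} using the earliest possible starts."""
    # TopologicalSorter expects a mapping from each node to its predecessors.
    graph = {name: set(deps) for name, (_, deps) in tasks.items()}
    plan = {}
    for name in TopologicalSorter(graph).static_order():
        duration, deps = tasks[name]
        # A task starts only once every task it depends on has ended.
        start = max((plan[dep][1] for dep in deps), default=0)
        plan[name] = (start, start + duration)
    return plan


plan = schedule(tasks)
for name, (start, end) in plan.items():
    print(f"{name}: weeks {start}-{end}")
```

The two independent tasks can overlap, while anything that depends on both has to wait for the slower one, which is the kind of structure a Gantt chart makes visible.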


Sure @OriolAbril, Thank you very much for all of your help.

In Increase support for batched multivariate distributions · Issue #5383 · pymc-devs/pymc · GitHub,
the OrderedMultinomial and WishartBartlett distributions aren’t mentioned. Do they also support batched dimensions?

The OrderedMultinomial is just a handy parametrization over the Multinomial, so it should already be supported.

The WishartBartlett uses the Wishart under the hood. We will probably remove those or move them elsewhere, so don’t worry about them.
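
To make the "parametrization over the Multinomial" point concrete, here is a minimal NumPy sketch, not PyMC's actual code. It assumes the usual cumulative-logit construction, where ordered cutpoints and a predictor `eta` are turned into category probabilities that are then passed to a multinomial; the helper name `ordered_probs` is made up for this example. Because the probabilities carry the batch dimension, the multinomial draw is batched without anything extra.

```python
import numpy as np


def ordered_probs(eta, cutpoints):
    """Category probabilities from a predictor and ordered cutpoints.

    eta:       array of shape (...,), any batch shape
    cutpoints: array of shape (K - 1,), increasing
    returns:   array of shape (..., K) whose last axis sums to 1
    """
    eta = np.asarray(eta, dtype=float)[..., None]
    cutpoints = np.asarray(cutpoints, dtype=float)
    # Cumulative probabilities P(y <= k) under a logistic link (an assumption).
    cdf = 1.0 / (1.0 + np.exp(-(cutpoints - eta)))
    zeros = np.zeros(cdf.shape[:-1] + (1,))
    ones = np.ones(cdf.shape[:-1] + (1,))
    padded = np.concatenate([zeros, cdf, ones], axis=-1)
    # Differences of consecutive cumulative probabilities give per-category probabilities.
    return np.diff(padded, axis=-1)


rng = np.random.default_rng(0)
eta = np.array([-1.0, 0.0, 2.0])        # a batch of 3 predictors
cutpoints = np.array([-0.5, 0.5, 1.5])  # 3 cutpoints -> 4 categories
p = ordered_probs(eta, cutpoints)       # shape (3, 4)
counts = rng.multinomial(10, p)         # one draw of 10 trials per batch row
print(p.shape, counts.shape)
```

`Generator.multinomial` broadcasts over the leading axes of `pvals`, so each row of `p` produces its own vector of counts.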