GSoC Project - 2023

Hi all,
I am Shreyas Singh, a former software developer at Accenture Japan and an Engineering Physics graduate from the Indian Institute of Technology (IIT) Roorkee. I have been drawn to statistics and probabilistic programming since my undergraduate years, when I pursued a minor in Mathematics. I have been involved with PyMC and PyTensor for a few weeks, and I've made four contributions so far, all of which have been merged.

I wish to work on the project ‘Support automatic derivation of arbitrary censoring logp’ for this GSoC. I went through a few examples of how pm.Censored currently works, such as censored regression, and I'm in the process of understanding how the log-probability graph is generated for censored and uncensored variables in PyMC. Next, I intend to go through the parsing of graphs in PyTensor, although I've already gotten a preliminary idea of it from the docs.
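For reference, the piecewise logp that pm.Censored assigns to a Normal clipped to [lower, upper] can be sketched in plain Python, without PyMC: the ordinary log-density strictly inside the bounds, the log-CDF at the lower bound, the log-survival function at the upper bound, and -inf outside. The helper names below are hypothetical; this only illustrates the math, not PyMC's implementation.

```python
import math


def normal_logpdf(x, mu, sigma):
    """Log-density of Normal(mu, sigma) at x."""
    z = (x - mu) / sigma
    return -0.5 * z * z - math.log(sigma) - 0.5 * math.log(2 * math.pi)


def normal_cdf(x, mu, sigma):
    """CDF of Normal(mu, sigma), computed with the error function."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def censored_normal_logp(x, mu, sigma, lower, upper):
    """Hypothetical helper: logp of clip(Normal(mu, sigma), lower, upper) at x."""
    if x < lower or x > upper:
        return -math.inf  # impossible after clipping
    if x == lower:
        # All latent mass below `lower` is piled onto the bound.
        return math.log(normal_cdf(lower, mu, sigma))
    if x == upper:
        # All latent mass above `upper` is piled onto the bound.
        return math.log(1.0 - normal_cdf(upper, mu, sigma))
    return normal_logpdf(x, mu, sigma)  # uncensored region: ordinary density
```

For example, censored_normal_logp(0.0, mu=0.0, sigma=1.0, lower=0.0, upper=2.0) returns log(0.5), because half of the latent Normal's mass is clipped onto the lower bound.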

I am a bit late in drafting a proposal but I will definitely try my best to formulate an approach to this project in the next few days. I would gladly welcome any suggestions from the mentors as well.
@ricardoV94 @larryshamalama

Thanks!

Hi Shreyas! Thanks for your interest in participating in GSoC with us. We look forward to seeing your proposal, and please feel free to contact Ricardo or Larry (or me) if you have questions in the meantime.

Hi @shreyas3156, just a heads-up that another participant is writing an application for the same project: Gsoc 2023 proposal feedback

This is not a problem, but at most one participant can be picked for a project. If there is another project you feel equally interested in, feel free to target that instead. If this project speaks to you personally, by all means go ahead with a proposal. The quicker you can share something, the earlier we can provide feedback.

Best of luck!

@ricardoV94 Thank you for your suggestions! I understand, and I will share a draft as soon as possible. What appealed to me in this project was the prospect of using my knowledge of graph theory for the canonicalization of censored distribution graphs. If my proposal does not turn out to be a good fit for this project, I would gladly shift my focus to one of the other projects, ‘Better tools to interpret complex regression models’.

Looking forward to the proposal :slight_smile:

Hi @larryshamalama @ricardoV94, I had a few questions about the project and I was hoping you could shed some light on them.

  1. For the pdf of an interval-censored distribution, I just wanted to confirm, based on the pymc.Censored and binning examples:

The pdf within the interval should be 0, and the pdf at each cutoff point should be the difference between the CDF at that point and the CDF at the previous cutoff?

  2. Are there ways other than pm.math.switch() to perform interval censoring? (I came across IfElse too, but it does not execute both branches of the clause, so I don't think it can be used.)

  3. Is it acceptable to define a new ScalarOp in PyTensor, or is it something we generally want to avoid?

  4. Since we use MeasurableVariables in the IR graphs and preserve the RV mappings, is automatic differentiation carried out on the IR graph (through its logp) or on the original graph? I ask because I am considering defining a new MeasurableElemwise to replace the Elemwise{Switch} Op in the IR graph, but I'm unsure whether it would break posterior sampling.

@shreyas3156 Sorry for the delay in answering. All good questions, let me see if I can reply:

Yes, that’s my understanding
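As a quick check of that answer, here is a plain-Python sketch (no PyMC) of an interval-censored Normal, assuming each latent value is recorded as the upper cutoff of its interval (c_{k-1}, c_k]. A value that is not a cutoff gets logp -inf, and the masses at the cutoffs are CDF differences, so they should sum to 1. The function names and the convention that the last cutoff is math.inf are assumptions made for illustration.

```python
import math


def normal_cdf(x, mu, sigma):
    """CDF of Normal(mu, sigma); math.erf also handles x = +/-inf."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def interval_censored_logp(x, mu, sigma, cutoffs):
    """Hypothetical helper. `cutoffs` must be sorted and end with math.inf
    so that every latent value maps to some cutoff."""
    prev = -math.inf
    for c in cutoffs:
        if x == c:
            # The probability mass of the whole interval (prev, c] collapses onto c.
            mass = normal_cdf(c, mu, sigma) - normal_cdf(prev, mu, sigma)
            return math.log(mass) if mass > 0 else -math.inf
        prev = c
    # Points strictly inside an interval carry no probability.
    return -math.inf


cutoffs = [-1.0, 0.0, 1.0, math.inf]
total = sum(math.exp(interval_censored_logp(c, 0.0, 1.0, cutoffs)) for c in cutoffs)
print(total)  # approximately 1.0, since the CDF differences telescope
print(interval_censored_logp(0.5, 0.0, 1.0, cutoffs))  # -inf
```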

I think switch is the most natural option, with the last condition being the “didn’t match anything → logp=-np.inf” case.
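A sketch of that pattern in plain Python, with a hypothetical list-based `switch` standing in for pm.math.switch: one switch per interval, nested so that the innermost fallback is -inf. As with an elementwise tensor switch, both branch values are computed before the selection happens. The cutoff convention (sorted, ending in math.inf, each value recorded as the upper cutoff of its interval) is an assumption for illustration.

```python
import math


def switch(cond, if_true, if_false):
    """Hypothetical elementwise select over equal-length lists. Like a tensor
    switch, both branch lists are already fully computed when it is called."""
    return [t if c else f for c, t, f in zip(cond, if_true, if_false)]


def normal_cdf(x, mu, sigma):
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def interval_logp_switch(values, mu, sigma, cutoffs):
    """Interval-censored Normal logp built as a chain of switches."""
    n = len(values)
    logp = [-math.inf] * n  # fallback: "didn't match anything"
    bounds = [-math.inf] + list(cutoffs)
    # Wrap one switch per interval (lo, hi]; the last one wrapped is outermost.
    for lo, hi in reversed(list(zip(bounds[:-1], bounds[1:]))):
        mass = normal_cdf(hi, mu, sigma) - normal_cdf(lo, mu, sigma)
        branch = [math.log(mass)] * n
        logp = switch([v == hi for v in values], branch, logp)
    return logp


result = interval_logp_switch([-1.0, 0.3, math.inf], 0.0, 1.0, [-1.0, 0.0, 1.0, math.inf])
print(result)
```

Here 0.3 is not a cutoff, so it falls through every condition and ends up with -inf, while -1.0 and math.inf pick up the log-mass of their intervals.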

We generally want to avoid it but can be considered if it seems like a clear winner. Are you thinking of a Switch with multiple branches?
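For context on the multi-branch question: a multi-branch switch with first-match-wins semantics (the same rule numpy.select uses) can be expressed as nested binary switches, so a new ScalarOp may not be strictly necessary to get that behavior. The sketch below uses hypothetical `multi_switch` and `nested_switch` helpers in plain Python and checks that the two agree, including where conditions overlap.

```python
def switch(cond, if_true, if_false):
    """Binary elementwise select over equal-length lists."""
    return [t if c else f for c, t, f in zip(cond, if_true, if_false)]


def multi_switch(conds, choices, default):
    """Hypothetical multi-branch switch: at each position, take the choice of
    the first true condition, else the default (numpy.select's rule)."""
    out = []
    for i in range(len(default)):
        for cond, choice in zip(conds, choices):
            if cond[i]:
                out.append(choice[i])
                break
        else:
            out.append(default[i])
    return out


def nested_switch(conds, choices, default):
    """The same selection built only from binary switches; the first
    condition ends up as the outermost switch, so it takes precedence."""
    out = list(default)
    for cond, choice in reversed(list(zip(conds, choices))):
        out = switch(cond, choice, out)
    return out


x = [0, 1, 2, 3, 4]
conds = [[v < 2 for v in x], [v < 4 for v in x]]  # both true at 0 and 1
choices = [["a"] * 5, ["b"] * 5]
default = ["z"] * 5
print(multi_switch(conds, choices, default))   # ['a', 'a', 'b', 'b', 'z']
print(nested_switch(conds, choices, default))  # ['a', 'a', 'b', 'b', 'z']
```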

Differentiation is carried out in the logp graph, so the IR doesn’t matter for that.
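To illustrate why the IR does not matter here: the gradient of the logp with respect to a parameter is just the derivative of whichever branch each observation falls into. The plain-Python sketch below (no PyTensor; central finite differences stand in for automatic differentiation) checks an analytic derivative of a censored Normal logp with respect to mu on each branch. All names are hypothetical.

```python
import math


def normal_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def normal_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def censored_logp(x, mu, sigma, lower, upper):
    """Piecewise logp of Normal(mu, sigma) clipped to [lower, upper]."""
    z = (x - mu) / sigma
    if x == lower:
        return math.log(normal_cdf(z))
    if x == upper:
        return math.log(1.0 - normal_cdf(z))
    if lower < x < upper:
        return math.log(normal_pdf(z)) - math.log(sigma)
    return -math.inf


def dlogp_dmu(x, mu, sigma, lower, upper):
    """Analytic derivative of the selected branch with respect to mu."""
    z = (x - mu) / sigma
    if x == lower:
        return -normal_pdf(z) / (sigma * normal_cdf(z))
    if x == upper:
        return normal_pdf(z) / (sigma * (1.0 - normal_cdf(z)))
    if lower < x < upper:
        return (x - mu) / sigma**2
    return math.nan  # logp is -inf here, so there is no gradient


def finite_diff(x, mu, sigma, lower, upper, h=1e-6):
    """Central finite difference of censored_logp in mu."""
    return (censored_logp(x, mu + h, sigma, lower, upper)
            - censored_logp(x, mu - h, sigma, lower, upper)) / (2 * h)


for x in (-1.0, 0.3, 1.0):  # lower bound, interior point, upper bound
    print(x, dlogp_dmu(x, 0.2, 1.3, -1.0, 1.0), finite_diff(x, 0.2, 1.3, -1.0, 1.0))
```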