How to model parameters of a time series?

Hi! I have a toy dataset of posts and comments from a site. I want to model (and predict) the number of posts.

About the data:

  • The number of posts has very strong weekly seasonality (weekend volume is about half the weekday volume), month-of-year seasonality (with a low season during summer), and a trend.
  • A post can be deleted (like spam) or not deleted. It can have a score (number from -100 to +100) and it can have some comments.

From the data I see that:

  • If a post is deleted, the author is less likely to create another post.
  • If a post is negatively scored, the author is less likely to post again.
  • If a post has comments, the author is more likely to post again.

I want to model the number of posts and predict it for the next day / month / year. I used Prophet and it worked very well. Now I am investigating whether and how to build a model for what-if scenarios. For example, what the number of posts will look like in the long run if users of the site become more active in comments, or if a group of users starts downvoting / upvoting all posts.

In the case of the deleted / not deleted state, I can represent the total number of posts as the sum of two models, one for deleted posts and one for not deleted posts, and use Prophet for each independently. But it gets tricky when I try to model deleted / not deleted together with score and comments.

I would appreciate it if you could suggest a way to model this use case and / or point me to where I can read more about possible approaches.

It depends on how much of the dynamics you want to model. If you are restricting yourself to Prophet (i.e., a regression-like model), you need to fit the model with a corresponding predictor so that you can make posterior predictions conditioned on different parameter settings. More specifically, map each question you have to a predictor (i.e., a column of values with one entry per time step):

“what the number of posts will look like in the long run if users of the site become more active in comments”
→ a predictor for current comment activity. You can build another time series of the number of comments over time (the number of comments per post is probably better). Since a delayed effect is likely, you can use the average over the last week or last month.
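
As a rough sketch of such a lagged predictor: the function below assumes a daily pandas DataFrame with hypothetical columns `posts` and `comments` (these names are not from your data, adjust as needed). It pools comments per post over a trailing window and shifts the result by one day, so the value for a given day only uses earlier days.

```python
import pandas as pd


def lagged_comments_per_post(daily: pd.DataFrame, window: int = 7) -> pd.Series:
    """Comments per post over the previous `window` days.

    `daily` is assumed to be indexed by date, one row per day, with
    hypothetical columns 'posts' and 'comments'.
    """
    posts = daily["posts"].rolling(window, min_periods=1).sum()
    comments = daily["comments"].rolling(window, min_periods=1).sum()
    # Avoid division by zero: windows without posts become NaN.
    ratio = comments / posts.where(posts > 0)
    # Shift by one day so the predictor for day t uses data up to day t-1.
    return ratio.shift(1)
```

The first rows (and any window without posts) come out as NaN, so they need to be dropped or filled before the column is handed to a model. Using `window=30` would give the "last month" variant.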

Once you have created this predictor, add it to Prophet as an extra regressor in your time series model. After fitting, you can gauge the impact of this predictor from its coefficient (a single scalar), and you can increase or decrease the predictor's values in the forecast period to investigate what-if scenarios.
You can do the same for score. For “a group of users who start downvoting / upvoting all posts”, you could use something like “percentage of downvotes relative to post views, per user”.
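
A minimal sketch of that workflow, assuming a history frame with Prophet's usual `ds` and `y` columns plus one regressor column, and a future frame that already holds a baseline value for the regressor on each forecast date. The helper names `scenario_frames` and `what_if_forecasts` and the multipliers are made up for illustration; `add_regressor`, `fit`, and `predict` are Prophet's own methods.

```python
import pandas as pd


def scenario_frames(future: pd.DataFrame, regressor: str, multipliers):
    """Return one copy of `future` per multiplier, with the regressor scaled."""
    frames = {}
    for multiplier in multipliers:
        scenario = future.copy()
        scenario[regressor] = scenario[regressor] * multiplier
        frames[multiplier] = scenario
    return frames


def what_if_forecasts(history, future, regressor, multipliers=(0.5, 1.0, 1.5)):
    """Fit Prophet once, then forecast under each scaled-regressor scenario.

    `history` needs columns 'ds', 'y' and `regressor`;
    `future` needs columns 'ds' and `regressor` (no missing values).
    """
    # Imported here so the scenario helper above works without Prophet installed.
    from prophet import Prophet

    model = Prophet()
    model.add_regressor(regressor)
    model.fit(history)
    return {
        multiplier: model.predict(frame)[["ds", "yhat"]]
        for multiplier, frame in scenario_frames(future, regressor, multipliers).items()
    }
```

Comparing the `yhat` columns across multipliers shows how sensitive the forecast is to the predictor. For the fitted coefficient itself, Prophet provides a `regressor_coefficients` helper in `prophet.utilities`. Keep in mind that this only captures the linear association learned from the history, not a causal effect.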

Outside of regression-like models, you can do fancier things such as agent-based simulation (modeling each user). You could look at some of the COVID simulations and the SIR model for ideas.
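
To give a flavor of the agent-based idea, here is a toy sketch in plain Python. Every number in it (the base posting probability, the feedback probabilities, the multipliers, the clamping bounds) is an invented placeholder, not something estimated from your data. Each user has a daily posting probability that shrinks after a deleted or downvoted post and grows after a post that receives comments.

```python
import random


def simulate_posts(days, n_users=500, p_comment=0.3, p_downvote=0.1,
                   p_delete=0.05, seed=0):
    """Simulate daily post counts with per-user feedback (all rates made up)."""
    rng = random.Random(seed)
    post_prob = [0.05] * n_users  # each user's daily probability of posting
    daily_posts = []
    for _ in range(days):
        count = 0
        for user in range(n_users):
            if rng.random() >= post_prob[user]:
                continue  # this user does not post today
            count += 1
            factor = 1.0
            if rng.random() < p_delete:
                factor *= 0.5  # deleted post: author less likely to post again
            if rng.random() < p_downvote:
                factor *= 0.8  # negative score: author less likely to post again
            if rng.random() < p_comment:
                factor *= 1.2  # comments: author more likely to post again
            # Keep the probability within arbitrary bounds.
            post_prob[user] = min(0.5, max(0.001, post_prob[user] * factor))
        daily_posts.append(count)
    return daily_posts
```

A what-if scenario is then just a different parameter setting, for example comparing `simulate_posts(365, p_comment=0.3)` with `simulate_posts(365, p_comment=0.6)`. A more realistic version would need seasonality, user arrival and churn, and rates fitted to the observed data.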
