Tracking inbox state

Today I worked on the logic for tracking the state of posts in a user’s inbox. I’ve held off on jumping into the implementation phase too quickly, to avoid painting myself into a corner and having to unwind data architecture decisions later.

It’s worth mentioning a few articles that are helping guide my thinking:

→ Lee Byron’s post on architecting a Facebook-like activity feed
→ The definition of event sourcing and various related articles floating around the internet

Some questions I’ve been pondering:

→ How should prioritization work?
→ Should posts ever automatically be dismissed when read?
→ How should I represent the “reason” a post is in the inbox?
→ Should I use fanout on read or write to assemble the inbox?
→ How will Level prevent inbox bloat for the user?

I’ll admit, I’ve felt a bit paralyzed by the scope of all these decisions over the last few weeks. Rather than let the paralysis continue, I decided to start nibbling away at the problem over the weekend and into today.

So far, I have:

→ Created a log table to track changes in the relationship between a post and a user (marked as unread, marked as read, dismissed from the inbox, subscribed, unsubscribed, etc.)
→ Created a table to track views of posts
→ Added a field to the post-user join table called inbox_state to hold the current state: read, unread, dismissed, or “excluded” (similar to “dismissed,” but means the post was never in the inbox)
→ Refactored the “create post” and “create reply” operations and added a bunch of post-create tasks: subscribe any @-mentioned users, mark the post as “unread” for any subscribed users, insert log entries, and propagate pubsub events for the front-end to intercept
→ Added filtering by inbox state on the posts GraphQL collection
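
To make the shape of that list concrete, here is a minimal in-memory sketch of the two pieces of state: a current inbox_state per post-user pair, plus an append-only log of changes. It is illustrative only and is not Level’s actual code. The four state values come from the list above; the class name, method names, and event strings are hypothetical.

```python
from dataclasses import dataclass, field
from enum import Enum


class InboxState(Enum):
    UNREAD = "unread"
    READ = "read"
    DISMISSED = "dismissed"
    EXCLUDED = "excluded"  # like dismissed, but the post was never in the inbox


@dataclass
class PostUserStore:
    # (post_id, user_id) -> current state; stands in for the post-user join table
    states: dict = field(default_factory=dict)
    # append-only (post_id, user_id, event) tuples; stands in for the log table
    log: list = field(default_factory=list)

    def set_state(self, post_id, user_id, state, event):
        """Update the current state and record the change in the log."""
        self.states[(post_id, user_id)] = state
        self.log.append((post_id, user_id, event))

    def posts_for(self, user_id, state):
        """Filter a user's posts by inbox state (hypothetical query helper)."""
        return sorted(
            post_id
            for (post_id, uid), current in self.states.items()
            if uid == user_id and current == state
        )


store = PostUserStore()
store.set_state(1, "alice", InboxState.UNREAD, "marked_as_unread")
store.set_state(2, "alice", InboxState.UNREAD, "marked_as_unread")
store.set_state(1, "alice", InboxState.READ, "marked_as_read")
unread_for_alice = store.posts_for("alice", InboxState.UNREAD)
```

The join table answers “what is the state right now?” cheaply, while the log keeps the history needed to reconstruct how a post got there.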

My primary concern is the sheer amount of work that happens during the create operation. For example, with my current implementation, if a post has hundreds of subscribers, then every single reply triggers hundreds of database inserts and updates.
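
A rough sketch of that write amplification, assuming one state row per subscriber. The function name and the dictionary standing in for the post-user table are hypothetical:

```python
def fan_out_on_write(inbox_state, subscribers, post_id):
    """Mark the post unread for every subscriber; return the number of rows written."""
    writes = 0
    for user_id in subscribers:
        # One insert/update per subscriber, every time a reply is created.
        inbox_state[(post_id, user_id)] = "unread"
        writes += 1
    return writes


inbox_state = {}
subscribers = [f"user{i}" for i in range(300)]
writes = fan_out_on_write(inbox_state, subscribers, post_id=42)
```

In this sketch a single reply to a post with 300 subscribers costs 300 writes, before counting the log entries and pubsub events that go along with them.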

This is a challenge of scale that I don’t expect to manifest in the near term (at least not while I’m testing with early users and validating my product assumptions). For now, updating state at write time makes fetching the correct posts at read time very performant, without having to join a bazillion tables together. Down the line, some or all of this fan-out operation may need to shift to read time or to periodic background refresh jobs.
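
For contrast, here is a sketch of what the read-time alternative could look like: nothing is written per subscriber when a reply is created, and unread state is derived on demand by comparing each subscribed post’s latest activity against the user’s last view. The function name, the data shapes, and the integer timestamps are all assumptions made for illustration.

```python
def unread_posts_at_read_time(user_id, subscriptions, last_activity, last_viewed):
    """Derive a user's unread posts on demand instead of storing them per user."""
    unread = []
    for post_id in subscriptions.get(user_id, ()):
        # A post never viewed by this user counts as viewed at time 0.
        viewed_at = last_viewed.get((post_id, user_id), 0)
        if last_activity[post_id] > viewed_at:
            unread.append(post_id)
    return sorted(unread)


subscriptions = {"alice": {1, 2, 3}}             # user -> subscribed post ids
last_activity = {1: 10, 2: 20, 3: 30}            # post -> time of latest reply
last_viewed = {(1, "alice"): 10, (2, "alice"): 5}  # (post, user) -> time of last view
result = unread_posts_at_read_time("alice", subscriptions, last_activity, last_viewed)
```

The trade-off flips: a reply becomes a single write, but every inbox fetch has to combine subscriptions, activity, and views, which is the kind of multi-table join the write-time approach avoids.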

There’s much more on this topic to come in future posts!

 