Chatter: static generation for online comments

Recently there seems to have been a swell of interest in personal websites and blogs, and along with that I've noticed more discussion about embeddable comment engines and open source alternatives. A project I spent a few hours on early last year approaches this task in a different way, so I thought I'd tell you a little about it.

Comment boxes are commonly found all over the web. Disqus is one of the more popular solutions for implementing one on your own website, but it brings along ads and all the tracking that comes with ads on the web. There are open source comment engines as well, but all the ones I'm familiar with require a long-running web service, which can be more effort than it's worth for smaller sites or sites that are statically generated. Ideally what I want is something I can set and forget, that requires no maintenance and doesn't cost me anything if it doesn't get used much.

So now there's chatter, a statically generated comment engine. OK, so "engine" might be overselling it - it's more of a comment "box". Regardless, static generation means we can serve our comments directly from S3, which pairs nicely with AWS Lambda to allow us to regenerate our comments on demand without the need to maintain a server. The pricing model of these two services scales with usage, which in many cases means our maintenance and running costs round down to 0. As always, these benefits come with tradeoffs, the biggest of which is probably S3's eventual consistency.
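To make that a little more concrete, here's a rough sketch of what the write path could look like. This isn't chatter's actual code - the bucket name, key layout, and field names are all made up - but it shows the general shape: each comment is stored as its own object in S3, and the index is generated separately.

```python
import json
import time
import uuid

import boto3

s3 = boto3.client("s3")
BUCKET = "my-chatter-bucket"  # hypothetical bucket name


def save_comment(page, author, body):
    """Store a single comment as its own object in the bucket.

    Each comment gets a unique key, so concurrent submissions never
    overwrite one another; the index is rebuilt separately.
    """
    comment = {
        "page": page,
        "author": author,
        "body": body,
        "posted_at": int(time.time()),
    }
    key = "comments/{}/{}-{}.json".format(page, comment["posted_at"], uuid.uuid4())
    s3.put_object(
        Bucket=BUCKET,
        Key=key,
        Body=json.dumps(comment),
        ContentType="application/json",
    )
    return key
```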

Dealing with eventual consistency

We're using S3 as our datastore, which does not guarantee strong consistency. Instead, S3 is an "eventually consistent" system - the changes that we make may not be visible to us for a short amount of time, until the change propagates throughout the rest of the system. (I think DNS is probably the most widely known example of an eventually consistent system?) In practical terms, the issue this poses for us is that we might miss recent comments when we're generating our comment index. To better illustrate the issue and the tradeoffs of our solution, the way chatter works with S3 goes roughly like this:

  1. A comment is submitted and written to the S3 bucket as its own file.
  2. An indexing function lists the files in the bucket and regenerates the comment index.
  3. The regenerated index is served to readers directly from S3.

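As a rough illustration of the second step (again with made-up names rather than chatter's actual code), regenerating the index is essentially a list-and-rewrite:

```python
import json

import boto3

s3 = boto3.client("s3")
BUCKET = "my-chatter-bucket"  # hypothetical bucket name


def regenerate_index(page):
    """List every comment object for a page and rebuild its index file."""
    comments = []
    paginator = s3.get_paginator("list_objects_v2")
    for result in paginator.paginate(Bucket=BUCKET, Prefix="comments/{}/".format(page)):
        for obj in result.get("Contents", []):
            body = s3.get_object(Bucket=BUCKET, Key=obj["Key"])["Body"].read()
            comments.append(json.loads(body))

    # Sort oldest-first and write the whole index out in one go.
    comments.sort(key=lambda c: c["posted_at"])
    s3.put_object(
        Bucket=BUCKET,
        Key="index/{}.json".format(page),
        Body=json.dumps(comments),
        ContentType="application/json",
    )
```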
The issue rears its head around the second step. If our indexing function runs before the comment has fully propagated, the list operation it executes on our bucket may not contain the latest files, which means the generated index will also not contain the latest comments. We could try to grab the latest version of the index file and append our comment to it, but that would introduce a bigger issue: the risk of losing comments entirely. If two comments were submitted in quick succession, the second lambda function would likely get a stale copy of the index file and overwrite the changes from the first. Instead, chatter does the simple thing and delays the generation of the index file, which has tradeoffs of its own:

  1. If S3 is especially slow to propagate changes, our comment will be missing from the index anyway. It will eventually be included in subsequent regenerations, so it is not permanently lost.
  2. Experience-wise, somebody could think their comment hadn't actually been posted if they were to refresh the page. My workaround for this is to push a little more complexity to the client-side code, where we display the comment from local storage until it appears in the index.
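To give a feel for what "delaying" regeneration could mean in practice, here's one hypothetical way to do it - not necessarily how chatter does it - using an SQS queue with a delivery delay to trigger the indexing function a little while after each comment is accepted:

```python
import json

import boto3

sqs = boto3.client("sqs")
# Hypothetical queue; a Lambda subscribed to it would call regenerate_index(page).
QUEUE_URL = "https://sqs.us-east-1.amazonaws.com/123456789012/chatter-reindex"


def schedule_regeneration(page, delay_seconds=60):
    """Ask for a page's index to be rebuilt after a short delay.

    The delay gives S3 some time to propagate the newly written comment
    before the indexing function lists the bucket.
    """
    sqs.send_message(
        QueueUrl=QUEUE_URL,
        MessageBody=json.dumps({"page": page}),
        DelaySeconds=delay_seconds,
    )
```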

We could avoid this issue entirely if we used a datastore with stronger consistency guarantees. I'm not sure what the landscape looks like for pay-by-the-minute style datastores, but I imagine most would fail the price point requirement.

How do I use it?

Getting chatter running is fairly simple thanks to zappa - install the few dependencies, tell it your S3 bucket, and you should only need to run zappa deploy prod to be up and running. The problem right now is that chatter doesn't come with a frontend part, so integrating it into your site is a custom job. That is something that would be good to change, but for now all that exists is a small example on the gh-pages branch. The one benefit of this is the ability to customise it entirely - you could run it entirely without relying on JavaScript (if that's your thing), by having chatter generate an HTML index instead of a JSON one.
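For reference, a zappa deployment is driven by a zappa_settings.json file. The values below (entry point, region, bucket) are placeholders rather than chatter's real configuration - and note that zappa's s3_bucket is the bucket it uploads the deployment package to, which isn't necessarily the bucket the comments live in - but a minimal config looks something like this:

```json
{
    "prod": {
        "app_function": "chatter.app",
        "project_name": "chatter",
        "runtime": "python3.6",
        "aws_region": "us-east-1",
        "s3_bucket": "my-chatter-deploy-bucket"
    }
}
```

With that in place, zappa deploy prod packages the app and wires it up to Lambda and API Gateway for you.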

You can see it working below, or find it on GitHub.
