Setting up a Bullet-proof Blog - my notes

These are my notes, made as I did the work. It's more of a journal than a how-to. In particular, I didn't record all the details, but instead noted things that I found non-intuitive or that I just messed up. The good news is that I found Amazon's web interface to be extremely straightforward and well documented. There was only one place where a text box demanded XML input, and for that there was excellent documentation and a blob to just cut and paste. Better yet, the CloudFront web interface ended up filling it in for me.

I also had a secondary goal: complete the work as fast as possible, with minimal distraction. The whole process took me seven hours, which included three hours monkeying about with different Markdown tools. This jury of one is still out on writing with Markdown.

A New Blog

I need a new blog. A place to write about whatever shiny thing has my attention. Although it's exceedingly unlikely, I'd like my blog to be able to handle the load if I ever post something interesting enough to make it to the front page of Hacker News. I've seen too many posts on Hacker News where the first comment is something along the lines of "Author here: our website went down under the load. Here is a copy <link to some quickly made copy of just that one page>". I appreciate that this site may not appear to be any better than these hastily erected facades, but it has one important difference: it's the original. There isn't some other version of it waiting to show blank pages for the first 30 minutes while I figure it out. I'm figuring this shit out first.

(For non-HNers, let me just say that Hacker News is a site run by venture capitalist Paul Graham and has nothing to do with breaking into other people's computers.)

So. Is there a way to run a blog, at low cost, that can survive a hit from HN?

Task for Today

  1. Build a basic site, laid out like a blog.
  2. Host it on Amazon S3
  3. Configure Amazon CloudFront
  4. New: Configure Route 53

Arguably, step 3 isn't required. S3 will host a site just fine, even if 1,000,000 visitors show up in an hour. But there's something to be said for how fast a page loads. It just feels more professional if it's fast.

Also, I don’t often get complete freedom to build a site from scratch - I’m usually working with existing sites and I’m involved because they don’t scale.

Build a basic site

Ok, here it is. Two bits of Markdown, converted to HTML using multimarkdown and a makefile (seriously), with a tiny bit of CSS.
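
If you're curious what that build step amounts to, here's a rough Python sketch of it (my real version is a makefile; this assumes multimarkdown is on your PATH):

```python
# Rough sketch of the build step: run multimarkdown over every .md file
# and write the HTML next to it. The real thing is a makefile.
import subprocess
from pathlib import Path

for md in Path(".").rglob("*.md"):
    html = md.with_suffix(".html")
    with open(html, "w") as out:
        subprocess.run(["multimarkdown", str(md)], stdout=out, check=True)
```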

Host it on Amazon S3

  1. Sign up for Amazon AWS
  2. Create bucket
  3. Set permissions
  4. Enable static website hosting

The second example on the S3 documentation page is for enabling anyone to view the content.

At this point in the setup process, my S3 bucket is set up to allow anyone to view it. This helps me test that everything is working. It would be viewable at http://jamiebriant.com.s3-website-us-east-1.amazonaws.com/setup.html.
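
I did all of this through the web console, but for reference, here's roughly the same setup scripted with boto3. The policy is just the "everyone can read" example from the docs; the bucket name is mine, so swap in your own:

```python
import json
import boto3

s3 = boto3.client("s3")
bucket = "jamiebriant.com"  # my bucket; use your own domain name

# The "everyone can read" policy, lifted from the S3 docs example.
policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Sid": "PublicReadGetObject",
        "Effect": "Allow",
        "Principal": "*",
        "Action": "s3:GetObject",
        "Resource": f"arn:aws:s3:::{bucket}/*",
    }],
}
s3.put_bucket_policy(Bucket=bucket, Policy=json.dumps(policy))

# Turn on static website hosting with index.html as the default document.
s3.put_bucket_website(
    Bucket=bucket,
    WebsiteConfiguration={"IndexDocument": {"Suffix": "index.html"}},
)
```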

The use of jamiebriant.com turned out to be a problem, though easily rectified.

Configure Amazon CloudFront

CloudFront is Amazon's CDN.

First, a discussion about caching. Here's what the CloudFront docs say about it:

You can control how long your objects stay in a CloudFront cache before CloudFront forwards another request to your origin. Reducing the duration allows you to serve dynamic content. Increasing the duration means your customers get better performance because your objects are more likely to be served directly from the edge cache. A longer duration also reduces the load on your origin.

“Load” here is what usually brings down a website and takes it offline. I'm not exactly worried about load on my origin, since the origin is S3; I think the S3 servers can handle it. However, “load” equates to GETs, and GETs are what I am billed for. So I'm motivated to reduce load.

In “internet-speed” terms, a blog post is a tremendously slow-changing object. What's the right balance? Answer: I don't really know yet. I could tell CloudFront to cache my data for 24 hours, but there's no guarantee that an edge server will actually bother to keep the content for that long (anyone know?).

What if I need to edit the page? With a high TTL, edits to the page won't be seen by the cache. Well, not quite true. If an edge server has already cached the page, it won't update the page when I make changes. Its decision to update the page is entirely based on time: when did it first fetch the page, and how long has it been since then? It doesn't care that I've made some super critical updates that the world must see!

There is a method to handle this situation: cache invalidation requests. It's possible to tell CloudFront to invalidate a page, for a price. It costs $5.00 per 1,000 requests, and the first 1,000 are free. So it seems entirely reasonable for this little blog to tell CloudFront to cache forever and then send invalidation requests if a page ever changes.
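
For reference, sending an invalidation is tiny if you script it. A boto3 sketch, with a made-up distribution ID:

```python
import time
import boto3

cloudfront = boto3.client("cloudfront")

# Invalidate a single changed page. The distribution ID is a placeholder.
cloudfront.create_invalidation(
    DistributionId="EXXXXXXXXXXXXX",
    InvalidationBatch={
        "Paths": {"Quantity": 1, "Items": ["/blog/setup.html"]},
        "CallerReference": str(time.time()),  # must be unique per request
    },
)
```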

Experience suggests that after posting, I'll be unable to resist tweaking it, and there's always the possibility that I'll need to respond to comments or acknowledge insights. So I've toyed with the idea of setting the TTL to a “low” value like 5 minutes, and then increasing it over time. Until I've got it working, it's just pure speculation about which is easier. So let's make it work!

Configure CloudFront

Ok, that was pretty straightforward. The help offered on the page was enough for me. The only thing I ran into was the zone apex issue. I want my blog to be at the zone apex: http://jamiebriant.com. However, S3 and CloudFront use CNAMEs, and CNAMEs don't work at the zone apex. There is a workaround, but it requires using AWS's DNS service, Route 53.

For now, http://d36rc9nrovayl6.cloudfront.net is my CloudFront URL.
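
I clicked through the console for this, so treat the following purely as a sketch: roughly what the same distribution looks like created with boto3, pointing CloudFront at the S3 website endpoint from earlier. Website endpoints are plain HTTP, so CloudFront treats them as a custom origin rather than an S3 origin:

```python
import time
import boto3

cloudfront = boto3.client("cloudfront")

# The S3 website endpoint from earlier is the origin.
origin_domain = "jamiebriant.com.s3-website-us-east-1.amazonaws.com"

dist = cloudfront.create_distribution(
    DistributionConfig={
        "CallerReference": str(time.time()),  # must be unique per request
        "Comment": "blog",
        "Enabled": True,
        "DefaultRootObject": "index.html",
        # The CNAME I'll eventually point at this distribution.
        "Aliases": {"Quantity": 1, "Items": ["www.jamiebriant.com"]},
        "Origins": {
            "Quantity": 1,
            "Items": [{
                "Id": "s3-website",
                "DomainName": origin_domain,
                # Website endpoints are HTTP-only custom origins.
                "CustomOriginConfig": {
                    "HTTPPort": 80,
                    "HTTPSPort": 443,
                    "OriginProtocolPolicy": "http-only",
                },
            }],
        },
        "DefaultCacheBehavior": {
            "TargetOriginId": "s3-website",
            "ViewerProtocolPolicy": "allow-all",
            "MinTTL": 0,
            "ForwardedValues": {
                "QueryString": False,
                "Cookies": {"Forward": "none"},
            },
            "TrustedSigners": {"Enabled": False, "Quantity": 0},
        },
    }
)
print(dist["Distribution"]["DomainName"])  # the dXXXX.cloudfront.net name
```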

Configure Route 53

I can use my existing DNS provider to point www.jamiebriant.com at d36rc9nrovayl6.cloudfront.net. But I actually want to point jamiebriant.com (without the www) at d36rc9nrovayl6.cloudfront.net, and to do that I need Route 53. Bonus: Route 53 is highly configurable, even with the browser-based control panel (when I first used it, it was API and command-line tools only). It's also “basically free”. It's not actually free, but it's pennies.

  1. Navigate to Route 53 Console
  2. Whack the big “Create Hosted Zone” button in the middle of the screen.
  3. Give it my domain name “jamiebriant.com”
  4. The page now shows me a list of name servers to update my DNS records with.
  5. Update registrar
  6. Whoa, hold your horses. I have a bunch of MX records and things set up already!
  7. Ok, copy my existing domain records to Route 53.
  8. Now update my domain’s DNS records.

Wow. Step 8 was a frustrating process. It turns out my registrar silently fails if you give it a domain with a period at the end - the format DNS records use. It doesn't display an error. It just goes back to the main page. It took me a couple of goes to figure out what the problem was. I will be switching this domain to namecheap.com too.
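
Scripted, steps 2-4 would look something like this boto3 sketch (again, I used the console; the trailing-period stripping is there purely because of the registrar silliness above):

```python
import time
import boto3

route53 = boto3.client("route53")

# Create the hosted zone and print the name servers to hand to the registrar.
zone = route53.create_hosted_zone(
    Name="jamiebriant.com",
    CallerReference=str(time.time()),  # must be unique per request
)
for ns in zone["DelegationSet"]["NameServers"]:
    # Strip any trailing period; my registrar silently rejects names with one.
    print(ns.rstrip("."))
```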

Zone Apex Hackery

Route 53 can create aliases for records, but the types of the records must match. It can also create aliases to Elastic Load Balancers or to S3 buckets. Unfortunately, it cannot point them at CloudFront, which seems like a glaring omission.

It can create an alias for the A record of jamiebriant.com, but only to another A record. If www.jamiebriant.com had an A record, I'd be done. However, for CloudFront to work, www.jamiebriant.com must be a CNAME. If CloudFront supported A records, I wouldn't be using www at all. What to do?

Well an S3 bucket can be configured to redirect all requests to a different address. So I created another bucket, completely empty, that redirects all requests to www.jamiebriant.com. Then I use Route 53 to point jamiebriant.com at this new bucket, and that, in turn, sends redirects to the browser.

Here's my problem: I created my bucket as “jamiebriant.com”. I tried to create a new bucket, “redirect.jamiebriant.com”, and have it redirect to “www.jamiebriant.com”. That worked. What didn't work was the Route 53 Alias. An Alias handles the IP address, not the domain name. The Alias for “jamiebriant.com” gives the browser the IP address of S3 itself. The browser then sends a request for “jamiebriant.com” to that IP address, which retrieves my S3 bucket “jamiebriant.com” and not “redirect.jamiebriant.com”. If I were just using S3, that would be perfect! I'd have my website running on my zone apex. Unfortunately, it bypasses CloudFront entirely.

Fixing it was fairly straightforward. I moved my content to a new bucket, “www.jamiebriant.com”, and changed the S3 bucket “jamiebriant.com” to redirect everything to the new bucket. Now Route 53 returns the S3 IP, the browser hits the S3 bucket “jamiebriant.com”, it returns HTTP 301 “Moved Permanently” to “www.jamiebriant.com”, and finally our browser is talking to CloudFront!
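
For reference, roughly what that final arrangement looks like scripted with boto3. The AliasTarget needs the fixed hosted zone ID that AWS publishes for S3 website endpoints in your region (the value below is the us-east-1 one; check the endpoints table for yours), and "ZONE_ID" is a placeholder for my own Route 53 zone:

```python
import boto3

s3 = boto3.client("s3")
route53 = boto3.client("route53")

# The zone-apex bucket is now empty and just redirects everything to www.
s3.put_bucket_website(
    Bucket="jamiebriant.com",
    WebsiteConfiguration={
        "RedirectAllRequestsTo": {"HostName": "www.jamiebriant.com"},
    },
)

# Alias the apex A record at the S3 website endpoint for the region.
route53.change_resource_record_sets(
    HostedZoneId="ZONE_ID",  # placeholder for my Route 53 hosted zone ID
    ChangeBatch={
        "Changes": [{
            "Action": "CREATE",
            "ResourceRecordSet": {
                "Name": "jamiebriant.com.",
                "Type": "A",
                "AliasTarget": {
                    "HostedZoneId": "Z3AQBSTGFYJSTF",  # S3 website, us-east-1
                    "DNSName": "s3-website-us-east-1.amazonaws.com.",
                    "EvaluateTargetHealth": False,
                },
            },
        }],
    },
)
```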

Gotchas

When I first updated CloudFront to point at the new www bucket, index.html worked but blog/setup.html came back with "access denied", even though accessing that path through S3 directly worked fine.

Updating Content

The two options mentioned earlier were a) setting TTLs on documents in S3 or b) manually telling CloudFront to invalidate. The CloudFront web interface made option b) entirely trivial for my current blog of two pages.

What about user interaction?

Some thoughts for later…

For this blog I'm entirely happy if discussion happens “offsite”. In fact, I'd rather this page had an active discussion on HN or Slashdot, because that will drive traffic to my site.

Here are two reasons a blog might want to include a comments section:

  1. Controlling the communication, e.g. moderating posts.
  2. Capturing contact information for future use.

I question whether this requires special blogging software. I think this can be done with Facebook and Twitter software in the browser. That'll be a post for another time.