aws

EC2 Disk Overview

Amazon’s EC2 service is really neat, but its disk subsystem has some peculiarities that are not obvious at first. Until very recently, root file systems (’/’) on EC2 were limited to 10 GB, the maximum size of an Amazon Machine Image (AMI), which is essentially a template for an EC2 instance. For more disk space, Amazon provides ephemeral disks that one can format and mount anywhere on the file system. For persistent storage, however, one has to use network-attached EBS volumes, which are effectively limitless in capacity but bounded in I/O. How an EC2 instance’s disk subsystem is configured has clear performance implications, so I recently benchmarked various ephemeral and EBS RAID configurations.

Ephemeral disks

Pros:

  • Free (included in cost of EC2 instance)
  • Stable, predictable performance on par with a standard physical hard disk
  • Abundant storage (up to 1.7TB on a c1.xlarge)

Cons:

  • Ephemeral - if the instance shuts down, all data is lost
  • Average random seek performance (6-7ms seek times per spindle)
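
To put the 6-7ms seek figure in perspective, here is a rough back-of-the-envelope sketch in Python. It assumes random I/O is dominated by seek time alone (rotational latency and transfer time are ignored), and that striping scales linearly with the number of spindles. Both are simplifying assumptions for illustration, not measured results, and the function names are made up for this example.

```python
def iops_from_seek_ms(seek_ms: float) -> float:
    """Upper-bound random IOPS for one spindle, counting seek time only."""
    if seek_ms <= 0:
        raise ValueError("seek_ms must be positive")
    # One second holds 1000 ms, so each seek "costs" seek_ms of that budget.
    return 1000.0 / seek_ms


def striped_iops(seek_ms: float, spindles: int) -> float:
    """Idealized RAID 0 estimate: assumes IOPS scale linearly with spindles."""
    if spindles < 1:
        raise ValueError("spindles must be at least 1")
    return iops_from_seek_ms(seek_ms) * spindles


if __name__ == "__main__":
    for seek_ms in (6.0, 7.0):
        print(f"{seek_ms} ms seek: ~{iops_from_seek_ms(seek_ms):.0f} IOPS per spindle")
```

Under these assumptions a single ephemeral spindle lands somewhere around 143 to 167 random operations per second, which is why striping several disks together is worth benchmarking.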

EBS Volumes

Pros:

  • "Highly available" - AWS claims to provide redundancy and a lower failure rate than physical disks
  • Portable - an EBS volume can be connected to any instance in a single availability zone
  • Backups - can easily create snapshots

Months ago, Amazon announced S3, which promised unlimited, fast, and inexpensive storage of any kind as a web service. At $0.15/GB/month for storage and $0.20/GB for bandwidth, it instantly gives anyone with some programming knowledge the ability to use an enterprise-class storage network with zero up-front cost.

Anyway, today I stumbled upon JungleDisk and Elephantdrive. JungleDisk seems more like a project than a commercial venture, since you download one of their clients and plug in your own S3 account. You pay nothing to JungleDisk (for now) and pay Amazon only for what you use at S3. Elephantdrive is definitely a commercial venture and completely hides its affiliation with S3, but it does extend Amazon’s SLA to the end user. I signed up at Elephantdrive, but for now they only have a Windows client, so I’m forced to wait until their cross-platform client comes out. JungleDisk’s Linux client (written in C#, I believe, and run under Mono) seems to work, in that it lets me upload and download files using my personal S3 account. Something is wrong with it, however: it pegs my CPU at 100%. I verified this on two machines and posted on their forums to see if there’s a known reason.

Amazon is truly proving itself a technology company and not just a glorified online bookstore. With BitTorrent support, we’re bound to see some more really cool stuff in the near future. Check out this Flickr to S3 backup script.
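
To make the pricing quoted above concrete, here is a small Python sketch that estimates a monthly S3 bill. It uses only the two rates mentioned in this post and assumes the bandwidth rate applies per gigabyte transferred. Request fees, tiering, and any later price changes are not modeled, and the function name is made up for this example.

```python
# Rates as quoted in the post (USD); treat them as historical, not current.
STORAGE_USD_PER_GB_MONTH = 0.15
TRANSFER_USD_PER_GB = 0.20


def monthly_s3_cost(stored_gb: float, transferred_gb: float) -> float:
    """Estimate one month's bill from stored and transferred gigabytes."""
    if stored_gb < 0 or transferred_gb < 0:
        raise ValueError("sizes must be non-negative")
    storage = stored_gb * STORAGE_USD_PER_GB_MONTH
    transfer = transferred_gb * TRANSFER_USD_PER_GB
    return storage + transfer


if __name__ == "__main__":
    # Example: a 100 GB backup set with 10 GB of transfer in the month.
    print(f"${monthly_s3_cost(100, 10):.2f}")
```

For a personal backup of that size, the estimate comes to about $17 for the month, which is the kind of number that makes clients like JungleDisk attractive.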