









Amazon's EC2 service is really neat, but its disk subsystem has some peculiarities that are not initially obvious. Until very recently, root directories ('/') on EC2 were limited to 10GB, a limit set by the maximum size of an Amazon Machine Image (AMI), which is essentially a template of an EC2 instance. To use more disk space, Amazon provides ephemeral disks that one can format and mount anywhere on the file system. For persistent storage, however, one has to use network-attached EBS volumes, a wonder of Amazon's architecture that is practically limitless in capacity but bound in I/O. How an EC2 instance's disk subsystem is configured has clear performance implications, so I recently benchmarked various ephemeral and EBS RAID configurations.
For these tests, I used c1.xlarge instances because of their high CPU performance, memory capacity, "I/O Performance: High" rating (according to Amazon), and 4 available 450GB ephemeral disks.
I created 5 c1.xlarge instances with 5 configurations: a 4xEphemeral RAID0 local disk, a single EBS volume, 2xEBS RAID0, 4xEBS RAID0, and 8xEBS RAID0. All instances were created in the us-east-1b Availability Zone, and all attached EBS volumes were newly created specifically for this test. Testing was done using bonnie++ in fast mode (the -f flag, which skips the per-char tests).
mdraid was used to create RAID0 arrays with a chunk size of 256k, for example:
mdadm --create --verbose /dev/md0 --level=0 -c256 --raid-devices=2 /dev/sdi1 /dev/sdi2
blockdev is used to set the read-ahead buffer to 65536 512-byte sectors (32MB):
blockdev --setra 65536 /dev/md0
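blockdev's --setra argument is a count of 512-byte sectors (per the blockdev man page), not bytes, so it is easy to misjudge the resulting buffer size. A small sketch of the conversion; the helper names are hypothetical and only illustrate the arithmetic:

```python
# Hypothetical helpers: blockdev's --setra/--getra values are counts of
# 512-byte sectors, so convert before comparing against byte sizes.
SECTOR_BYTES = 512


def readahead_bytes(sectors: int) -> int:
    """Bytes of read-ahead configured by `blockdev --setra <sectors>`."""
    return sectors * SECTOR_BYTES


def sectors_for(nbytes: int) -> int:
    """Smallest --setra value that gives at least `nbytes` of read-ahead."""
    return -(-nbytes // SECTOR_BYTES)  # ceiling division


if __name__ == "__main__":
    print(readahead_bytes(65536) // (1024 * 1024), "MiB")  # the setting used above
    print(sectors_for(64 * 1024), "sectors for 64 KiB")
```

In other words, a 64 KiB read-ahead would be `--setra 128`, while `--setra 65536` asks for 32 MiB.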
XFS is used as the filesystem:
mkfs.xfs -f /dev/md0
Finally, the RAID array is mounted with noatime at /mnt/md0:
mkdir -p /mnt/md0 && mount -o noatime /dev/md0 /mnt/md0
I logged the results of Sequential Writes, Sequential Reads, and Random Seeks. bonnie++ was run 6 times on each instance.
[Figure: bonnie++ averages, Sequential Throughput]
Four ephemeral disks in a RAID0 configuration have extremely high throughput and acceptable random seek performance. The ephemeral array's results are almost 4x those of the same test on my desktop's 7200RPM drive, which is what one would expect from a RAID0 array of four physical hard disks.
The EBS results are a little less predictable. A single EBS volume does not have the throughput of a single ephemeral drive. The 2xEBS RAID0 shows almost twice the throughput of the single EBS volume, while the 4xEBS RAID0 and 8xEBS RAID0 instances do not scale much beyond the 2xEBS RAID0 instance. Since EBS volumes are accessed over the network, this suggests that aggregate EBS throughput is capped by the instance's gigabit network interface rather than by the volumes themselves.
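A back-of-the-envelope check of that gigabit ceiling, assuming bonnie++'s K/sec figures mean 1024 bytes per second and ignoring Ethernet and protocol overhead (so the practical ceiling is somewhat lower than computed). The sample figure is the sequential write result from the 2xEBS-volumes-on-4xEBS-instance test reported later in this post:

```python
# Rough line-rate check; bonnie++ block throughput is reported in K/sec,
# assumed here to mean KiB/s. Protocol overhead is ignored.
GIGABIT_BITS_PER_S = 1_000_000_000


def line_rate_kib_per_s(bits_per_s: int = GIGABIT_BITS_PER_S) -> float:
    """Raw line rate of a link in KiB/s."""
    return bits_per_s / 8 / 1024


def fraction_of_line_rate(observed_kib_per_s: float) -> float:
    """Observed throughput as a fraction of the raw gigabit line rate."""
    return observed_kib_per_s / line_rate_kib_per_s()


if __name__ == "__main__":
    print(f"gigabit ceiling: {line_rate_kib_per_s():.0f} KiB/s")
    print(f"110146 KiB/s is {fraction_of_line_rate(110146):.0%} of line rate")
```

Roughly 90% of the raw line rate is about as much as a single gigabit link can deliver once framing and protocol overhead are accounted for, which is consistent with the plateau seen above.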
[Figure: Random Seek Times (bonnie++ random seeks)]
The ephemeral array does about 165 random seeks per second, which is comparable to a desktop hard disk.
EBS random seek performance, however, is not easily predictable. The volumes that make up the 4xEBS RAID0 instance clearly perform better than those of the other instances. Is EBS performance more a property of the EBS volumes or of the instance?
Another interesting result I noticed (but didn't include in these graphs) is how much performance varied from one run to the next. The standard deviation across runs was much smaller for the ephemeral drives than for the EBS volumes.
I attached the two EBS volumes from the poorly performing 2xEBS RAID0 instance to the fast 4xEBS RAID0 instance and re-ran the tests. If the two volumes perform better when attached to the 4xEBS RAID0 instance, we can attribute the difference to the instances; if their performance stays the same, we can blame the EBS volumes themselves.
Results:
| Configuration | Seq. Write (K/sec) | Seq. Read (K/sec) | Random Seeks (/sec) | Seeks/sec per EBS volume |
|---|---|---|---|---|
| 2xEBS Volumes on 4xEBS Instance | 110146 | 91555 | 795.6 | 397.8 |
The I/O channel is more or less saturated, but we still see the same poor random seek performance that the 2xEBS RAID0 instance exhibited with these same two EBS volumes. This leads me to believe that seek performance is inherent to the individual EBS volumes themselves.
To confirm, I attached the high-performing volumes from the 4xEBS RAID0 instance and the poorly performing volumes from the 2xEBS RAID0 instance to the 8xEBS RAID0 instance. I wanted to test whether we could "export" the high performance of the 4xEBS RAID0 volumes to the 8xEBS RAID0 instance. I then repeated the bonnie++ tests.
Results:
| Configuration | Seq. Write (K/sec) | Seq. Read (K/sec) | Random Seeks (/sec) | Seeks/sec per EBS volume |
|---|---|---|---|---|
| 8xEBS RAID0 (benchmark for instance) | 39238 | 90403 | 1629 | 204 |
| 2xEBS Volumes on 8xEBS Instance | 108108 | 94189 | 735.3 | 368 |
| 4xEBS Volumes on 8xEBS Instance | 125459 | 93972 | 9285 | 2321 |
Once again, the 2xEBS volumes still perform poorly and the previously fast 4xEBS volumes are still fast (even faster than before). At this point, the evidence is fairly clear that EBS performance is inherent to the volume itself, since each volume exhibits the same level of performance regardless of which EC2 instance mounts it.
I wanted to test whether EBS performance varies over time, so I created a new c1.xlarge instance in another EC2 availability zone. This instance had 4 new EBS volumes, configured as a 2xEBS RAID0 array (two EBS volumes unused) and as a 4xEBS RAID0 array (all four used). I ran bonnie++ over a two-week period.
Results:
| Configuration | Week | Seq. Write (K/sec) | Seq. Read (K/sec) | Random Seeks (/sec) | Seeks/sec per EBS volume |
|---|---|---|---|---|---|
| 2xEBS RAID0 | 1 | 107513 | 92681 | 2642 | 1321 |
| 4xEBS RAID0 | 1 | 112326 | 94844 | 7829 | 1957 |
| 2xEBS RAID0 | 2 | 35799 | 68619 | 215 | 108 |
| 4xEBS RAID0 | 2 | 88012 | 92863 | 623 | 156 |
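The per-volume seek figure is simply total random seeks divided by the number of volumes in the array. A short sketch using the numbers from the table above makes the week-over-week drop explicit (the dictionary layout is just for illustration):

```python
# (number of volumes, total random seeks/sec) from the two-week table.
results = {
    "2xEBS RAID0": {"week 1": (2, 2642), "week 2": (2, 215)},
    "4xEBS RAID0": {"week 1": (4, 7829), "week 2": (4, 623)},
}


def seeks_per_volume(volumes: int, seeks: float) -> float:
    """Average random seeks/sec contributed by each EBS volume in the array."""
    return seeks / volumes


for config, weeks in results.items():
    w1 = seeks_per_volume(*weeks["week 1"])
    w2 = seeks_per_volume(*weeks["week 2"])
    print(f"{config}: {w1:.0f} -> {w2:.0f} seeks/sec per volume "
          f"({w2 / w1:.0%} of week 1)")
```

Both arrays fall to under a tenth of their week-one seek rate on the same hardware.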
The same instance, using the same 4 EBS volumes, shows a huge discrepancy in performance from week to week. During the first week, the 4 EBS volumes perform admirably. During the second week, however, performance drops dramatically, and the 4xEBS RAID0 array no longer even seems to saturate the gigabit channel. This doesn't bode well for the predictability of EBS performance.
During the second week, I ran 'iostat -x -m 240' (extended statistics in MB, sampled every 240 seconds) alongside bonnie++ to try to identify the source of the poor performance.
2xEBS RAID0:
avg-cpu: %user %nice %system %iowait %steal %idle
0.02 0.00 0.47 11.99 0.02 87.51
Device: rrqm/s wrqm/s r/s w/s rMB/s wMB/s avgrq-sz avgqu-sz await svctm %util
md0 0.00 0.00 0.00 615.20 0.00 25.42 84.63 0.00 0.00 0.00 0.00
sdi1 0.00 0.06 0.00 307.41 0.00 12.71 84.67 148.55 483.20 3.25 100.02
sdi2 0.00 0.04 0.00 307.60 0.00 12.71 84.61 10.49 34.10 1.19 36.67
sdi3 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
sdi4 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
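Spotting a lagging array member in output like this can be automated. A minimal sketch, assuming the device lines are pasted in as-is; the 5x threshold and the function names are arbitrary, hypothetical choices, not part of iostat:

```python
# Flag RAID members whose average wait (await, ms) far exceeds their peers'.
IOSTAT_SAMPLE = """\
Device: rrqm/s wrqm/s r/s w/s rMB/s wMB/s avgrq-sz avgqu-sz await svctm %util
md0 0.00 0.00 0.00 615.20 0.00 25.42 84.63 0.00 0.00 0.00 0.00
sdi1 0.00 0.06 0.00 307.41 0.00 12.71 84.67 148.55 483.20 3.25 100.02
sdi2 0.00 0.04 0.00 307.60 0.00 12.71 84.61 10.49 34.10 1.19 36.67
"""


def parse_iostat(text: str) -> dict[str, dict[str, float]]:
    """Map each device name to its column values, keyed by the header names."""
    rows = [line.split() for line in text.strip().splitlines() if line.strip()]
    header = rows[0][1:]  # drop the "Device:" label
    return {row[0]: dict(zip(header, map(float, row[1:]))) for row in rows[1:]}


def slow_members(stats, members, factor=5.0):
    """Members whose await exceeds `factor` times the fastest member's await."""
    fastest = min(stats[m]["await"] for m in members)
    return [m for m in members if stats[m]["await"] > factor * fastest]


stats = parse_iostat(IOSTAT_SAMPLE)
print(slow_members(stats, ["sdi1", "sdi2"]))
```

Only the md0 members are passed in, since the unused sdi3/sdi4 and the md0 device itself (which reports no await) would skew the comparison.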
Notice the particularly poor 'await' time of /dev/sdi1, one of the two members of the /dev/md0 RAID0 array: 483ms versus 34ms for /dev/sdi2, with /dev/sdi1 pinned at 100% utilization. Because /dev/sdi1 was performing so poorly, the entire /dev/md0 array performed poorly. When the file system sends a request to /dev/md0, the kernel's md driver maps it onto the array members in 256k chunks, splitting any request that spans a chunk boundary between /dev/sdi1 and /dev/sdi2. Since the data is striped across both members, a poorly performing member, as in this situation, becomes a bottleneck for the entire array. (As an aside, a few hours after seeing these poor numbers, I re-ran bonnie++ on these 4 EBS volumes, and they were fast again.)
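To make the striping concrete, here is a simplified model of how a RAID0 array with a 256k chunk size maps a byte range onto its members. It assumes equal-sized members and ignores the md superblock/data offset, so it is an illustration of RAID0 striping rather than the md driver's actual code; `raid0_split` is a hypothetical name:

```python
CHUNK = 256 * 1024  # bytes; matches the 256k chunk (-c256) used when creating md0


def raid0_split(offset: int, length: int, members: list[str], chunk: int = CHUNK):
    """Split a byte range on a RAID0 array into (member, member_offset, length) pieces.

    Simplified model: equal-sized members, no md superblock/data offset.
    """
    pieces = []
    end = offset + length
    while offset < end:
        chunk_index = offset // chunk                  # which chunk of the array
        within = offset % chunk                        # position inside that chunk
        size = min(chunk - within, end - offset)
        member = members[chunk_index % len(members)]   # chunks rotate across members
        stripe = chunk_index // len(members)           # chunk row on that member
        pieces.append((member, stripe * chunk + within, size))
        offset += size
    return pieces


if __name__ == "__main__":
    # A 128 KiB write starting 200 KiB into the array straddles chunks 0 and 1.
    for piece in raid0_split(200 * 1024, 128 * 1024, ["sdi1", "sdi2"]):
        print(piece)
```

A request split this way completes only when every piece has completed, so a member with a 483ms await sets the pace for the whole array.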
These tests show tremendous variability in EBS performance, not just from one EBS volume to another but also within a single EBS volume from one point in time to another. As with CPU time on a shared server, an EBS volume's performance depends on how busy the entire EC2 ecosystem is. However, while AWS guarantees a certain number of CPU cycles and amount of RAM for an EC2 instance, it's not clear that AWS provides any similar guarantee for EBS performance.
There are a few key takeaways:
You should also really read Joe Stump's excellent writeup and Heroku's "Getting Good IO from Amazon's EBS".
Memory is really strange. On the one hand, I'm amazed at how fast the last three years of my life have gone by. I remember walking into my hotel room on the day I landed in Stuttgart, a full month before Rebecca would arrive, putting down my bags, and really wondering what I had gotten myself into. I had decided to take a job in a city I had visited only during my interviews, in a country where I had spent only a few days as a tourist, and there I was, sitting in my hotel room, in need of a shower and exhausted from jet lag, only then grasping that I had committed Rebecca and myself to living in Stuttgart for at least three years. At that moment I could only hope that we had made the right choice. It turned out to be one of the best decisions of our lives.
When I remember specific events, though, time doesn't seem to move so quickly. I think about my first weekend in Germany, when a colleague invited me to an Onion Festival in the medieval town of Esslingen, and it seems appropriately placed about three years ago. Then I remember when our friend Laurel visited (our first visitor, I think) and how we went out to a besenwirtschaft (a uniquely south-west German gem, in which vineyard-owning families sell their own wine out of their living rooms) and got extremely intoxicated with a super friendly German couple. We ended up getting invited to their home for a few more bottles of wine, and Rebecca got sick in their bathroom just as our taxi pulled up. I remember all of the festivals: the Hamburg Fischmarkt, Karneval, the Weindorf, and of course the Bierfests (Germans love to find a reason, any reason, to have a festival). I remember all our visitors, our families and lots of friends from home, who took advantage of us living in Stuttgart and let us share our newfound love of Germany with them. I think fondly of all the trips we took: the Turin Winter Olympics, Sardinia, the Lake District, Poland, and so many more. The more memories I conjure up and place on a mental timeline, the more it seems like it really has been three full years since I stepped into my room at the Millennium Hotel, and I'm at once both happy for the experience and sad that I can no longer call Stuttgart home, even if it means I no longer have to walk up 6 flights of stairs to get home.
We hope to go 2 for 2 on picking cities at random and moving without any prior connection, and so far Austin has really been a great place. Many great things about German culture are embraced in Austin, love of the outdoors and of festivals being the two most obvious. There are even biergartens, and the town of Fredericksburg, located in the center of Texas wine country (another huge similarity to Stuttgart!), was founded by Germans, and I think the German influence on the local culture shows. There's even a local waterpark called the Schlitterbahn.
I think we're off to a good start.
I went to Brussels last weekend for FOSDEM 2008, which was held at the ULB Campus Solbosch. The free event was a good way to check in with the overall Open Source community and to see all of the interesting things people outside my usual circles are working on.
Friday Night Beer Event
Things got off to a memorable start on Friday night. I timed my arrival so that I could attend the Friday night "Pink Elephant" beer event held at the Delirium Cafe. I met up with a colleague, and we had a few good beers while chatting with other FOSDEM attendees. Lots of people had their gadgets out for others to play with. I got to play with an Eee PC and a Nokia N810 while my iPhone was passed around. I even picked up the presence of an OLPC XO-1 over wifi, but was never actually able to find it.
After a few hours of drinking beer and talking about software, we met up with a few more friends for dinner at an underwhelming yet woefully overpriced restaurant in the middle of the tourist trap. I had another beer or two over dinner, so when we left the restaurant, I was a little toasted.
For some reason (playing with my phone?) I was straggling behind as we walked out when two guys sidled up to me and started dancing, singing, yelling, and doing some weird line-dance kick between my legs. In my drunken state, I was a bit confused but thought they were just drunk too, so I danced along. After a few moments of this silliness, they walked off. Luckily, I had a moment of clarity and thought it best to check my pockets. Wait, my wallet is missing. Yup, it's really missing. The two guys hadn't gone more than 20 steps down the street, so I ran up to the nearer one, forcefully grabbed his shoulder, and demanded, "Give me back my wallet." He looked a bit surprised and immediately pointed to his accomplice. I turned to him, and without a word, he reached into his coat pocket and handed over my wallet. I took it from his hands, and strangely enough, we just parted ways. The entire episode lasted maybe 30 seconds, and my friends, who were only a few steps ahead, missed it all.
Talks
The next morning I was a bit slow getting up and got to FOSDEM about an hour late, missing the opening keynote (it didn't help that I stayed up a few more hours playing poker with the hotel staffer and his friends, but that's another blog post). I spent most of Saturday in the Janson auditorium listening to the big talks: "How a large scale opensource project works" with Robert Watson, "Perl 6" with Patrick Michaud, and "Unicoding with PHP 6" with Andrei Zmievski. I also squeezed in some quick 15-minute "lightning" talks about smaller open source projects like Alfresco, OpenAFS, and Squeak.
I was even slower getting up on Sunday morning* and missed the opening Drupal talks by Dries. I did catch Kris Buytaert's "Drupal and MySQL High Availability", which was quite good. I also took the opportunity to see talks on CakePHP and Mozilla's upcoming Prism.
Thoughts
My colleagues in attendance weren't too enthusiastic about this year's FOSDEM. Their main complaint was that it has become a little too commercialized, with seemingly marketing-oriented talks rather than more in-depth code talks. While I can understand this sentiment, I think the problem lies mainly in their expectations of FOSDEM. FOSDEM should be a venue for projects to open up to people outside their core community. A code-driven, detailed talk about the intricacies of the Form API in Drupal 6, for example, would only be digestible by experienced members of the Drupal community, most of whom would already be familiar with the FAPI. Higher-level talks allow small projects, such as Squeak and CakePHP, to attract people like me who have a passing interest and may even be pulled in enough to try the stuff out.
Some of the speakers were certainly better than others. FOSDEM (and Open Source in general) is a pretty international affair, and because the conference was conducted in English, the speakers' English public-speaking abilities varied. Overall, however, I thought the speakers were quite good and addressed their subjects well. My only complaint is that FOSDEM seems to be outgrowing its britches. Attendance was high, and at times it was a little difficult to get through the crowds to the talks on time. That probably speaks to the growing popularity of OSS, which is always a good thing.
*I discovered the Grand Casino Brussels on Saturday night and was there until almost 4 in the morning waiting for a seat at the Hold 'em table. Casinos in Europe are generally quite stuck-up about dress code and appearances (to the point of making you rent an evening jacket), but I found the Brussels casino to be very welcoming. You still won't find flip-flops and t-shirts like you would at some places in Vegas, but at least you can walk in reasonably dressed. Anyway, at 11PM I was #3 in line for a seat and had only gotten to #1 by 3:30am, when I'd had enough and just left. They had two tables of €5/€10 NL Texas Hold'em, and apparently they sometimes run €10/€20 limit as well.
Pretty scary stuff, even if you trust all of your users:
victor@mercury ~ $ ./exploit
-----------------------------------
 Linux vmsplice Local Root Exploit
 By qaaz
-----------------------------------
[+] mmap: 0x100000000000 .. 0x100000001000
[+] page: 0x100000000000
[+] page: 0x100000000038
[+] mmap: 0x4000 .. 0x5000
[+] page: 0x4000
[+] page: 0x4038
[+] mmap: 0x1000 .. 0x2000
[+] page: 0x1000
[+] mmap: 0x2ac3dee3c000 .. 0x2ac3dee6e000
[+] root
mercury ~ # whoami
root
What's really amazing is that news of this vulnerability didn't really hit the mainstream web until today, yet a kernel patch was already available on Friday. There's even an in-memory hotfix you can apply (I tried that too, and it works) if you'd rather wait until an official kernel makes it downstream. Open source is amazing.
Had this been proprietary software, no one would have known about it except all the people exploiting it. Servers all over the world would get owned, and the software company wouldn't even discover the hole for a few more weeks. Or worse, they would know about it but hope to keep it hush-hush until the next Patch Tuesday.
For the last few years I've been using Gmail exclusively and have been forwarding emails sent to @victortrac.com to my Gmail account. Google's spam filters are the best I've ever seen, the interface is elegant and fast, and with loads of storage and IMAP access, Gmail is nearly the perfect email application. The XMPP integration is just icing on the cake.
Because of these features, I voluntarily gave up having a customized email address on my personal domain to take advantage of Google's infrastructure and technology. The decision was fairly easy: I was deluged in spam, and Gmail's web client was better than any other thin or thick client available. By forwarding my domain's email to my Gmail account, I let Google's wonderful anti-spam technology work its magic. This allowed me to retain some use of my previous email address, but as I started to use XMPP (aka Jabber, or as Google calls it, Google Talk), I became more and more dependent on my Gmail identity. Sure, I had other Jabber IDs, but it was just too convenient having a unified email address and Jabber ID provided by Gmail.
However, let's say that in five years Google shuts down or, more likely, another company comes along with a better service or product. By then your Gmail identity has evolved into a unified presence, communications, and identification address where anyone can reach you at any time, and it is also your OpenID login for most sites on the internet. If you've spent years building this identity around a Gmail address, you're not in a good position to transition easily. By using Google Apps on a domain that you own and control, you've at least separated the address from the services and can move around as you wish. It's like being able to live all over the world, moving wherever the grass is greener, yet always keeping the same mailing address.
So today I registered and migrated victortrac.com to Google Apps, allowing me to use all of Google's great software on my personalized address. The registration process is really quick and simple, and the actual migration is just a handful of DNS changes, depending on which services you want to switch over to Google. For me it is just email and chat, and Google's documentation made it clear which MX servers I needed to point my domain to.
For XMPP, however, the documentation isn't very complete. According to this page, you need to add the following SRV records to your DNS server (replace gmail.com with your own domain):
_xmpp-server._tcp.gmail.com. IN SRV 5 0 5269 xmpp-server.l.google.com.
_xmpp-server._tcp.gmail.com. IN SRV 20 0 5269 xmpp-server1.l.google.com.
_xmpp-server._tcp.gmail.com. IN SRV 20 0 5269 xmpp-server2.l.google.com.
_xmpp-server._tcp.gmail.com. IN SRV 20 0 5269 xmpp-server3.l.google.com.
_xmpp-server._tcp.gmail.com. IN SRV 20 0 5269 xmpp-server4.l.google.com.
_jabber._tcp.gmail.com. IN SRV 5 0 5269 xmpp-server.l.google.com.
_jabber._tcp.gmail.com. IN SRV 20 0 5269 xmpp-server1.l.google.com.
_jabber._tcp.gmail.com. IN SRV 20 0 5269 xmpp-server2.l.google.com.
_jabber._tcp.gmail.com. IN SRV 20 0 5269 xmpp-server3.l.google.com.
_jabber._tcp.gmail.com. IN SRV 20 0 5269 xmpp-server4.l.google.com.
The _xmpp-server._tcp and _jabber._tcp SRV records tell a remote XMPP server that wants to talk to your domain (server-to-server, or s2s) to connect to Google's XMPP servers on port 5269. There are two minor problems here:
This means that Google's example only really adds s2s functionality for the thin client built into Gmail or Google's GTalk thick client, which contradicts this help page on configuring Pidgin to work with your Google Apps domain (there's a whole thread on Google Groups about people following Google's directions exactly but still not being able to connect with Pidgin).
To get a third-party client to connect to Google's XMPP servers, you'll have to manually configure its "Connect to server" setting to point directly to talk.google.com. The better solution, however, is to add another set of SRV records (again, replace gmail.com with your own domain):
_xmpp-client._tcp.gmail.com. IN SRV 5 0 5222 xmpp-server.l.google.com.
_xmpp-client._tcp.gmail.com. IN SRV 20 0 5222 xmpp-server1.l.google.com.
_xmpp-client._tcp.gmail.com. IN SRV 20 0 5222 xmpp-server2.l.google.com.
_xmpp-client._tcp.gmail.com. IN SRV 20 0 5222 xmpp-server3.l.google.com.
_xmpp-client._tcp.gmail.com. IN SRV 20 0 5222 xmpp-server4.l.google.com.
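With these additional records, when an XMPP client tries to log into yourdomain.com, it looks up _xmpp-client._tcp.yourdomain.com, and your DNS server returns this list, telling the client to connect on port 5222 to one of Google's servers: xmpp-server.l.google.com (priority 5) first, then the priority-20 hosts as fallbacks.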
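Clients that honor SRV records follow the ordering rules in RFC 2782: try targets by ascending priority value and, within one priority, choose by weighted random selection, with weight-0 records placed first. The following minimal sketch shows that ordering. It is not any particular XMPP client's code, and `SrvRecord` and `order_targets` are hypothetical names:

```python
import random
from dataclasses import dataclass


@dataclass
class SrvRecord:
    priority: int  # lower value = tried first
    weight: int    # relative share among records with the same priority
    port: int
    target: str


def order_targets(records, rng=random):
    """Return SRV records in the order a client should try them (RFC 2782 rules)."""
    ordered = []
    for prio in sorted({r.priority for r in records}):
        # Weight-0 records go first within a priority group, as RFC 2782 specifies.
        group = sorted((r for r in records if r.priority == prio),
                       key=lambda r: r.weight != 0)
        while group:
            total = sum(r.weight for r in group)
            pick = rng.randint(0, total)  # inclusive on both ends
            running = 0
            for i, r in enumerate(group):
                running += r.weight
                if running >= pick:
                    ordered.append(group.pop(i))
                    break
    return ordered


# The _xmpp-client._tcp records from above (weights are all 0).
CLIENT_SRV = [SrvRecord(5, 0, 5222, "xmpp-server.l.google.com.")] + [
    SrvRecord(20, 0, 5222, f"xmpp-server{i}.l.google.com.") for i in range(1, 5)
]

if __name__ == "__main__":
    for rec in order_targets(CLIENT_SRV):
        print(rec.priority, rec.target, rec.port)
```

Because every weight in these records is 0, the priority-20 hosts are simply tried in the order the resolver returns them. Nonzero weights would spread clients across the fallbacks proportionally.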
I'm only a few hours into my migration over to Google Apps, but I think it'll be a good fit for me. Now if only Google would roll out OpenID.... :)