First off, I owe my deepest apologies to David C. in the Sun/Nauticus group, who was kind enough to send us a pair of N1400Vs for demo purposes. They've been here longer than I care to own up to, waiting on me to demo them. It's definitely not been for lack of interest…but rather because of an all-consuming Sun Cluster project that's about 8 weeks over schedule! In any event, I FINALLY was able to put the SC project on hold (more to come on this soon!) in order to get the SunFish swimming (the N1400V was codenamed SunFish, which is infinitely cooler ;-) ).

A little background:

DigiTar uses load balancers like a Frenchman uses perfume…everywhere and often. Given our predilection for them, we need the most powerful web switches we can get our hands on. They're the sinew that holds us together. Up until now we've been deploying pairs of Nortel Alteon 184s everywhere we needed load balancing, which has included our MySQL fail-over strategy (which the SC project is supposed to replace…ugh!). Our Alteons have been rock solid, and frankly, are my favorite pieces of gear. They work…all the time…every time. We've truly abused them, and they keep humpin' it up the mountain… If you need a load balancer, you can't go wrong with an Alteon and I can't say enough nice things about these bad boys.

Unfortunately, going back to “needing the most powerful ones”, the 184s are starting to get tapped out. Also, we'd like to consolidate down into fewer of them. So just about the time the SunFish arrived, we were getting ready to replace our 184s with a smaller number of Alteon 2424-SSLs. (If you've never seen an Alteon 2424 do its magic…you're missing out. It's a beast! Tapping it out is a challenge.) Alas, our local Sun evangelists asked if we'd looked at the Sun load-balancers…and thus, like so many of our odysseys of late, began another Sun journey of discovery…(thanks Jamison & Elizabeth!)

It's only a phone call…

Like every one of our technology disruptions, this one began with a tiny little con call…what could it hurt? Right? On the other end of that call was David C. David put up with every single pushy (and somewhat Alteon-bigoted) question we had. At the end of the phone call we were a bit intrigued, but it was the presentation numbers after the call that sold us on ripping out our beloved Alteons. The fact that we've been told by our “sources” that a lot of the original Alteon engineers went to Nauticus (pre-Sun) didn't hurt. ;-)

So what sold us? Well, the L4-L7 load balancing throughput was more than 3x what we were expecting from the 2424-SSLs, and the SSL acceleration throughput was so much higher I can't even mention it without embarrassing Nortel. If the 2424-SSLs are beasts, then the SunFish are 8000lb silverbacks on a steroid regimen that would make Barry Bonds permanently sterile. And the REALLY ridiculous part…the SunFish (N1400) are the babies of the line. There is an N2120 with twice the performance.

Performance aside, what really convinced us to attempt a heart transplant was the ability of a SunFish to slice itself into 10 virtual SLB switches. One thing we had tried to do early on was consolidate multiple SLB groups into a single Alteon switch. The problem was security. Because the Alteon (like the SunFish) is first and foremost a switch (the source of its power), it is almost impossible to segment SLB groups on secure subnets from SLB groups on insecure subnets. Even using VLANs we've occasionally seen packet leakage in testing. So here was what was buzzing around in our brains:

1.) We want to consolidate down to fewer web switches (load balancers).
2.) Secure subnets have to be absolutely segregated from insecure subnets.
3.) One SunFish pair could easily replace 5 of our Alteon pairs.
4.) SunFish can slice themselves into completely separated vSwitches.

Hmmm…I wonder…could we collapse down to a single pair of SunFish per facility?

As with many things in DigiTar's history, Providence has introduced what we didn't know we needed at precisely the right time…

Enough already…where are those first impressions?

I'm running out of time tonight…so here's a quick run-down (there'll be more, I promise):

  • SunFish are truly…completely…utterly different animals from any other web switch. The concept of vSwitches makes that necessary, but boy is it worth it.
  • Any schlocker can configure the whole thing from an incredibly slick Flash WebUI…and the CLI ain't half bad. It isn't the Alteon CLI but, hey, nothing 'cept JUNOS has a better CLI than Alteon's WebOS.
  • It's a little harder than it should be to find the documentation. The SunFish OS 3.0 documentation is spotty as all get out. If you can configure this puppy from 'em, there's $20 waitin' for ya. The trick is to grab the System Configuration Guide for the N2000 OS 2.0. The N1000 and N2000 series run the same OS, and thankfully the syntax has stayed the same from 2.0 to 3.0.
  • The System Configuration Guide for the 2.0 OS is a gem. It's crystal clear and logical. If you have a firm grasp of SLB fundamentals, this is all you need. To be honest I can't say enough about this guide. Compared to the steaming piles of cow-manure that are the WebOS guides, the N2000 guide really blows the doors off. Nothing stinky about it…even the English is crisp…smells of starch. ;-)
  • Alright…I still can't shut up about the guides. Thanks Sun for not making us go to gold-plated classes to configure this thing. Nortel…well you kind of suck about that…but that was your intention, no?
  • Inside of 12 hours I had the guide read, my development servers configured for the test, and the web switch configured and running properly. That's pretty incredible to me. It took weeks of pain and suffering to get the first Alteons up over 2 years ago. Considering that this was a much more complicated setup than that one two years ago, it's pretty amazing. It's a testament to the guides. Keep in mind that in both cases (the Alteons 2 years ago, and the SunFish today), there was zero help or training involved outside of the printed word.
    • Test Config:
      • 1 SunFish load balancer.
      • 1 SunFire X4100 sliced into 4 Solaris Zones.
      • 2 of the Zones are tagged to VLAN 4001 on e1000g2 (Intel NIC 2).
      • 2 of the Zones are tagged to VLAN 4002 on e1000g3 (Intel NIC 3).
      • 2 vSwitches…one for the web servers on VLAN 4001, and one for the mail servers on VLAN 4002.
      • 2 VIPs on the shared vRouter which connects to the “Internet” and the two backend vSwitches.

  • Its kind of strange and cool to design an entire web switching infrastructure virtually. If you've ever set up a complicated virtual net in VMware, it's similar…but oh so much more seductive.
  • The two vSwitches are abso-positively separated. Despite having their VIPs on the same shared vRouter, neither the mail vSwitch nor the web vSwitch (or the real servers on them) can talk to each other. The wall between the vSwitches is steel-belted and brass riveted. That alone will keep my brain running with possibilities tonight!
  • The SunFish WebUI, while it blows the doors off Alteon's WebUI and Java client (AlteonEMS) for configuring the web switch, needs some work to be as good as the AlteonEMS for day-to-day monitoring and ops. One feature I'd really like to see grafted into the SunFish CLI and WebUI is the Alteon's conception of “Apply”. Whenever any absent-minded bloke (like myself) changes the configuration on an Alteon, those changes don't become active until you type apply. One of the cool side-effects of this paradigm is that you can darn near reconfigure an entire Alteon while it's running, and have that baby instantly acquire all the facets of the new config when you type apply. Without apply you'd have to take the web switch completely out of battery to do a similarly drastic config change. It also means that in a mission-critical environment, you don't have to sweat bullets about a fat-fingered command taking down an SLB group or the whole switch.
  • Also, the AlteonEMS has far superior heads-up displays of the health and status of all your load-balancing groups. You can find the displays in the SunFish WebUI, but they're scattered throughout the UI. There is no one place where you can see all of your groups, real servers and virtual servers on a single pane of glass like you can on the AlteonEMS. The AlteonEMS's single pane of glass also color-codes the individual elements (red, yellow or green) so you can instantly see what's dead and dying. If I'm wrong about this, PLEASE tell me. It's a feature we're going to miss.
  • Unfortunately, the SunFish also inherits the Alteon's method of doing HA. We were really hoping to abandon VRRP and an SLB-hacked version of it on our web switches. For whatever reason, the SunFish does things the same way. While I'm sure there's some esoteric config out there pairing a web switch with a normal router using VRRP…is it really worth the hassle? I'd like to see a truly custom HA protocol where the two web switches merge as one…with stateful failover, and auto-sync of configuration between the units. Many firewalls now have this feature…let's bring it on over, eh? VRRP is great for routers. End of story.
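
For anyone who hasn't lived with the “Apply” idea from a couple of bullets back, here's a minimal Python sketch of the concept. To be clear, this is a toy model and not how WebOS or the SunFish OS actually implement anything; the class name `StagedConfig`, its methods, and the config keys are all made up for illustration. The point is just that edits pile up off to the side and only touch the running config when you explicitly apply them.

```python
class StagedConfig:
    """Toy 'edit now, activate on apply' config store (hypothetical, not a vendor API)."""

    def __init__(self, running=None):
        # The config the imaginary switch is actually forwarding with.
        self.running = dict(running or {})
        # Edits that have been typed but not yet applied.
        self.pending = {}

    def set(self, key, value):
        # Stage a change; the running config is untouched.
        self.pending[key] = value

    def diff(self):
        # What apply() would change, as key -> (old value, new value).
        return {k: (self.running.get(k), v)
                for k, v in self.pending.items()
                if self.running.get(k) != v}

    def revert(self):
        # Throw away staged edits, e.g. after spotting a typo.
        self.pending.clear()

    def apply(self):
        # Build the new config off to the side, then swap it in with one assignment.
        candidate = dict(self.running)
        candidate.update(self.pending)
        self.running = candidate
        self.pending = {}


# Made-up keys and values, purely for the demo.
cfg = StagedConfig({"group.web.metric": "roundrobin"})
cfg.set("group.web.metric", "leastcons")   # fat-fingered value, only staged
running_before = cfg.running["group.web.metric"]  # still "roundrobin"
cfg.revert()                               # typo never reached the running config
cfg.set("group.web.metric", "leastconns")
cfg.apply()                                # now the change goes live
```

The design choice that matters is that `apply()` swaps in a complete candidate config at once, so a half-finished edit session is never what the box is running.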

That's all the blabbering there is for the moment. Thus far, the SunFish is an incredible piece of engineering. I'm not quite ready to call it the JSF meets C17 of load-balancers, but I can hear the afterburners warming up…

Tomorrow is dedicated to redundancy and SSL-offload! I'm stoked!
