“Do we need another storage protocol?”
Those were the words spoken by the CTO of a large storage firm to a prospective Coraid customer. The CTO had taken the prospect out to dinner to convince him not to buy Coraid.
It didn’t work.
We know about the exchange because the customer told us as he placed his rather large order. What the CTO was questioning was the proven and driving technology behind our Coraid EtherDrive SAN System: the ATA-over-Ethernet storage protocol.
ATA-over-Ethernet has a number of fans. But why did I invent a new storage protocol? Let me explain.
The need
In late 1999, I started thinking about what to work on next, and I realized that the machine room had changed. Instead of a few large machines, we now had a lot of Intel x86-based servers. Instead of operating systems with names like UNIX, VMS, OS/400, AOS, Primeos, and the very, very rare ITOS/MSOS on the CDC Cyber 18/20, these tiny one- and two-rack-unit boxes all ran Windows or Linux.
Each of these boxes had a couple of disk drives bolted into them. Putting the disk drive in the box with the memory and processor was an anomaly, a historical consequence. Mainframes and minicomputers all had disk drives separate from the processor. One could add as many disk controllers and drives as one wanted, up to some limit.
But these Windows and Linux boxes were different. The disks were captive inside. Simply put, these boxes were just IBM PC/AT clones stuck into a chassis that you could then put into a rack. They’re still like that today. The most common motherboard format is called “ATX.” That stands for Advanced Technology eXtended, and it was used in desktops back when there were desktops. The ATX form factors are size variations based on the original IBM PC/AT.
More compute power when it’s needed,
more disks when they’re needed.
So, I thought it was time to liberate the disk drive from its sheet metal prison. I wanted to be able to add CPU servers when I needed more compute, and to add more disk space, usable by any of my servers, when I needed more storage. There is no one right ratio of compute to data storage. It all depends on what you’re doing and where you are in the life cycle of your applications.
In short, I wanted to scale out, not just up. And the network connecting each server with its disks would be Ethernet.
Ethernet is an amazing bit of technology, and using it for storage is a no-brainer. Over my 40-year career I’ve seen it change from a 10 Mb/s single copper coaxial cable with a cross section about the size of a nickel, to the thin light blue fiber carrying 10 Gb/s to my Plan 9 terminal. Yet, from a protocol point of view, it’s still pretty much the same as it always was. The packet format is exactly the same.
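To make that concrete, here is roughly what that unchanged packet format looks like written down as a C struct. This is just an illustrative sketch; the struct and field names are mine, not from any driver.

    /* The classic Ethernet header: 14 bytes, the same on today's fiber
     * as it was on 10 Mb/s coax.  Names here are illustrative only. */
    #include <stdint.h>

    struct etherhdr {
        uint8_t  dst[6];    /* destination MAC address */
        uint8_t  src[6];    /* source MAC address */
        uint16_t type;      /* EtherType: identifies the payload protocol */
    };
    /* the payload (46 to 1500 bytes) and a 4-byte CRC follow on the wire */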
But first, I checked out the alternatives. When I thought about how to get the disk drives out of the boxes, it was only natural to look at the existing technologies and see if I could just use one of those.
The other storage technologies (Fibre Channel, SCSI, and SAS) had either volume limitations or architectural limitations. SCSI allowed a maximum of sixteen devices on a single cable, which was limited to about six meters in length. I wanted essentially unlimited devices and cable lengths.
The cable for the serial version of SCSI, called SAS, could be only slightly longer, ten meters, but it did let you put more devices on a network. SATA was even worse, with only a one meter cable, and only a kludge to let you talk to more than a single disk; SATA systems are all point-to-point. And the switching functions for SAS, implemented in expanders, were complex, requiring connections to be set up and then torn down for each set of operations. These protocols also required reliable delivery at the lowest level, which works against their use in an expanded configuration. It made things very complex.
Fibre Channel (FC), with its point-to-point, switched, or token-ring-style arbitrated loop topologies, certainly expanded the options, but at a very high expense. At the time, putting a $3,000 board in a $1,000 server seemed like something to avoid. The technology was really more appropriate to mainframe-like systems, such as the large Sun Microsystems boxes. FC was like HIPPI SCSI meets IBM’s ESCON fibre channels. Even the name evokes the bus-and-tag IBM channels.
Nor was the complexity of FC attractive. In large enterprise machine rooms, there were highly paid experts who spent their time doing little else but tweaking all the settings on the FC equipment, settings like “queue depth,” which is how many outstanding messages can be sent from the host to the storage array. It’s kind of like having to manually set the round-trip time on a TCP connection. There are multi-million-dollar installations of FC equipment where the customer doesn’t even have a key to the equipment; the vendor does everything for them. I wanted something simpler. And cheaper.
The costs were not in line with what I wanted to do. The nice thing about Ethernet at the time, and it is still true today, is that because it sells in higher volume, its bit rates grow at a faster rate than FC’s, and its cost drops faster as well.
Time has proven me right, with Ethernet now at 400 Gb/s. The specification for 200 and 400 GbE was released in December of 2017. Coraid has shipped somewhere between 10,000 and 15,000 SAN systems based on our ATA-over-Ethernet block storage protocol. We’re still shipping it, now that my new company has our original software and trademarks.
What about iSCSI? I’ll talk more about that in a later blog post, but when I first started working on ATA-over-Ethernet, there wasn’t any iSCSI. When I first heard of the draft spec, I thought “Great! I’ll just use that.” But when I looked at it, I saw that it used TCP/IP. It was an Internet Engineering Task Force (IETF) specification because it was meant to run SCSI over long distances, not for use in the same machine room.
That’s not how iSCSI is used today. No one uses block storage across the internet.
The other problem was that iSCSI is very heavyweight. The original draft justified this by saying that all storage arrays were very large systems that could handle a complex stack. I wasn’t looking for that. I was looking for a small shelf of disks with multiple connections to a network, something lightweight. One could imagine disks having an RJ45 connector right on the controller board instead of the current SATA/SAS connectors.
That meant I didn’t want to use TCP, or even IP, with their addresses and extra overhead. So, in the end, I saw we needed something simpler. We needed the Unix of Ethernet block storage SANs, compared to the mainframe that was iSCSI.
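As a concrete illustration of that idea, here is a minimal sketch of sending a single frame straight over Ethernet on Linux, with no IP and no TCP anywhere in the path. This is not Coraid code: the EtherType 0x88A2 is the one registered for ATA-over-Ethernet, but the interface name “eth0” and the empty payload are placeholders, not a real AoE request.

    /* Minimal sketch: one broadcast Ethernet frame, no IP, no TCP.
     * Must run as root (or with CAP_NET_RAW). */
    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>
    #include <arpa/inet.h>          /* htons */
    #include <linux/if_packet.h>    /* AF_PACKET, struct sockaddr_ll */
    #include <net/if.h>             /* if_nametoindex */
    #include <sys/socket.h>

    int
    main(void)
    {
        unsigned char bcast[6] = { 0xff, 0xff, 0xff, 0xff, 0xff, 0xff };
        unsigned char payload[46] = { 0 };      /* placeholder request body */
        struct sockaddr_ll sa;
        int fd;

        /* SOCK_DGRAM means the kernel builds the 14-byte Ethernet header
         * for us; we only name the interface and the destination MAC. */
        fd = socket(AF_PACKET, SOCK_DGRAM, htons(0x88A2));
        if (fd < 0) {
            perror("socket");
            return 1;
        }

        memset(&sa, 0, sizeof sa);
        sa.sll_family = AF_PACKET;
        sa.sll_protocol = htons(0x88A2);            /* AoE EtherType */
        sa.sll_ifindex = if_nametoindex("eth0");    /* assumed interface */
        sa.sll_halen = 6;
        memcpy(sa.sll_addr, bcast, 6);              /* broadcast destination */

        if (sendto(fd, payload, sizeof payload, 0,
            (struct sockaddr *)&sa, sizeof sa) < 0)
            perror("sendto");

        close(fd);
        return 0;
    }

There are no addresses to assign, no connections to set up, and nothing to tear down; a frame goes out the wire and a frame comes back.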
Next time I’ll explain more about the thinking that went into the protocol and how someone using it should think about it. I’ll talk more about the protocol itself. At the time, many IT folks had no idea one could use Ethernet without using TCP.
Despite all the horsepower of the CTO of a very large storage company, one that certainly had plenty of iSCSI, FC, and NFS products, the customer could still see the need for simpler and cheaper SAN systems.
And we do too.