Audrey was lucky when she recruited Roger Cheung to be vice-president of engineering. He has that rare gift of being able to lead in both a large company and a small startup. This is a feat because each situation requires a completely different skill set.
Large companies are all about the processes needed to keep parallel activities from creating total chaos. Small startups, on the other hand, have to be frugal with resources, both in what they're attempting to provide and in how they go about providing it. Many startups founded by engineers from large companies have fallen right on their faces when they tried to do things the big-company way.
Roger was that rare manager who worked in the biggest of companies, AT&T, yet was still an engineer at heart. He knew how to get things done with a small number of highly creative people. And in the case of Adaptive, that’s certainly what he had.
"So? What’re you going to do?" Roger asked. I’d just stormed into his cubicle and complained that using the OS Mach would be a huge mistake. I knew that the learning curve for everyone to come up to speed on how to use it was going to kill the product.
“I don’t know yet,” I said.
“Well, you’d better be figuring it out.”
As I left, I was both comforted and terrified. Roger trusted me enough to leave the choice up to me, but I had the responsibility of coming up with a plan that wouldn’t sink the company.
Roger had the view that if you gave people the responsibility to get something done, you also had to give them the authority to make decisions and to implement them as they saw fit. In my two-year tenure at Adaptive, that philosophy would come up again.
I left the large room full of cubicles, walked past a fridge filled with small bottles of flavored soda water and beyond the piano room, through the executive suites (a slightly smaller room full of cubicles) and into my 1989 Honda Civic.
As I drove the few miles down 101 from Redwood City toward the temporary housing Betsy and I had in Mountain View, a non-stop stream of P-3 Orion submarine patrol planes was flying the pattern around Moffett Field. At first, I thought they were flying practice drills, but I'd later learn that I'd witnessed the end of a 12-hour patrol shift. The sound of four-engine turboprops became an enjoyable part of my routine trips home for lunch.
I’ve spent nearly 30 years navigating Silicon Valley and I still haven’t figured out the traffic on the Bay Shore Freeway, also known as California 101. Some days it flowed freely. Other days, it was jammed. Atlanta was simple. One too many cars on the wrong road and 80 mph traffic became a parking lot. All I had to do was avoid those roads in the morning rush hour and from about 4:00 to 7:00 p.m. when folks headed home.
Silicon Valley is different. People get to work and go home at unpredictable times, which means the traffic jams come at odd times too. Well, I say it's unpredictable, but that was the 1990s. They've fixed it since then; the number of cars on the roads is far past capacity, so Atlanta-like traffic jams are guaranteed daily.
Back then, in the days when north San Jose was still covered in fruit trees and you might catch a glimpse of Bob Noyce out and about, the traffic was light but unpredictable.
In the springtime the temperature was mild and the sun was always out, and you almost didn’t mind getting caught in a jam. You could roll your windows down and enjoy the sunshine. Back in Athens, Georgia, we had seasons. Air conditioning was a must and although winters were mild, they were still winters. After the move, Betsy and I realized we didn’t need A/C at all. It even got into my head that I didn’t need a jacket in California. True enough, I never wore a coat while I lived there.
After my contemplative lunch, I pitched the idea of doing a Unix port and using TCP to a few of my co-workers, the folks who were going to be writing the control software for the Adaptive DS3/DS1 cross-connect system.
The team knew how software was developed in Silicon Valley in 1990. It entailed writing a book on how the system would be built, then defining modules and their interfaces, followed by coding those modules and finally integrating them. Schedules and PERT charts were all the rage, and the development group was busy at work on this first magnum opus.
A few of my peers had written the original Network Equipment Technologies (NET) product out of which Adaptive spun. They were using a lot of what they already knew and designing it into the 3/1. They designed multiple threads they called managers, which they assumed would pass messages among themselves. I was pitching TCP/IP as the way to do that.
Today TCP is used to pass messages everywhere, but in 1990 this wasn't true. Although its RFC is dated 1981, it would be another year before Tim Berners-Lee put up the first website and TCP started carrying all the traffic we'd thought it was too inefficient for. Starting a TCP connection requires three packets traded back and forth to set up and acknowledge the initial sequence numbers.
HTTP and HTML use new TCP connections like people use paper towels. New request? New connection. An image on the page? New connection. End of the page? Close the connection. TCP connections are treated as free.
To offset this high connection-startup cost, the early web browsers would open eight TCP connections to a website on the assumption that all eight would be needed and that it was faster to open them in parallel. The overall effect was to force all the other engineering, the servers, routers, and IP stacks, to figure out how to make things faster. And the result is that today, people think nothing of processes on the same box talking to each other.
But not in 1990 Redwood City. The developers were amazed when they discovered that messages between two TCP sockets on the same box would still go all the way down the stack, below the socket layer, below the TCP and IP layers, to the loopback interface at the data link level, and then make it all the way back up again. To them it all just seemed way too expensive in machine cycles.
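This was the path they were worried about. The sketch below is not Adaptive's code; it's a minimal modern-POSIX illustration, with a made-up port number and message, of two processes on one machine exchanging a message over TCP. Every byte still travels down the socket, TCP, and IP layers to the loopback interface and back up.

    /* Minimal sketch (not Adaptive's code): two processes on one machine
     * exchanging a message over TCP.  Even though both ends live on the
     * same box, the bytes go through the full stack to the loopback
     * interface and back up again. */
    #include <arpa/inet.h>
    #include <netinet/in.h>
    #include <stdio.h>
    #include <string.h>
    #include <sys/socket.h>
    #include <sys/wait.h>
    #include <unistd.h>

    int main(void)
    {
        struct sockaddr_in addr;
        memset(&addr, 0, sizeof(addr));
        addr.sin_family = AF_INET;
        addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);  /* 127.0.0.1 */
        addr.sin_port = htons(5300);                    /* arbitrary port */

        int listener = socket(AF_INET, SOCK_STREAM, 0);
        bind(listener, (struct sockaddr *)&addr, sizeof(addr));
        listen(listener, 1);

        if (fork() == 0) {                  /* child: one "manager" */
            int c = socket(AF_INET, SOCK_STREAM, 0);
            connect(c, (struct sockaddr *)&addr, sizeof(addr)); /* 3-way handshake */
            const char *msg = "hello from another process";
            write(c, msg, strlen(msg));
            close(c);
            _exit(0);
        }

        int s = accept(listener, NULL, NULL);   /* parent: the other "manager" */
        char buf[128];
        ssize_t n = read(s, buf, sizeof(buf) - 1);
        if (n > 0) {
            buf[n] = '\0';
            printf("received: %s\n", buf);
        }
        close(s);
        close(listener);
        wait(NULL);
        return 0;
    }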
So, what to do?
Milan, the due-diligence fellow who had visited my previous employer and asked all the really good questions, had a cubicle right across the aisle. He, like several others, had been on the team that developed the bread-and-butter product for NET. Their work had helped fund this startup. He was Yugoslavian and spoke with a British accent. He had made quite a lot from his early NET stock and really enjoyed buying and fixing up houses in San Francisco. He drove down from the city every day.
He was most famous in the company for a bit of hardware engineering. The older NET product used an operating system designed by someone who tried to make a go of it as a startup selling operating systems for embedded systems. Several of these companies got started before a handful became dominant. Years later, when NET was going public, they discovered during their pre-IPO due diligence that they didn’t have the license to the OS. They found the OS’s creator, who had long since come to his senses and was teaching at a university, and nervously asked him for a license. I think he signed over the rights for free. He was not greedy.
His operating system expected the hardware to have something called an MMU, or Memory Management Unit. MMU hardware translates the memory addresses a program sees into the actual addresses the memory system sees. This does two things. First, it protects processes from each other: a process can only access memory it's allowed to. Second, it gives each process the illusion that it's the only process on the system, so it can be built to run at a fixed memory location.
Early mainframes, with one exception I'll mention in a moment, had no MMU. At first they ran only one program at a time. But the machines were worth millions, and the tape drives, card readers, and line printers were so slow that a program spent most of its time waiting on its peripherals. People realized they could load more than one program into the machine at a time, so that one program (or job) could be running while the others were waiting on the line printer or card reader.
When they wanted to run a program, they found a place in memory large enough to hold it and called a part of the operating system known as the relocating loader. The software development tools created object files called load modules, and it was these that were loaded into memory. A load module was ready to run except for the very last step: relocating the code to the particular place in memory where it would run. The closest approximation we have today is the "dot-oh" file (.o) created by modern compilers.
So when you wanted to run a program, the job entry subsystem found your load module, called the relocating loader, and fed the binary to it. The loader read the program, relocated it, stored it into its spot, and then attached it as a task in the operating system. Early IBM mainframe operating systems, like MVT and MFT, therefore ran all the programs in the same memory, and each was free to write over the others.
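To make the mechanics concrete, here is a toy sketch of what a relocating loader does. The "load module" format is invented for illustration; real mainframe load modules were far richer, but the essential step is the same: copy the image into the spot the operating system found, then patch every word that holds an absolute address.

    /* Toy sketch of a relocating loader -- not any real mainframe format.
     * A "load module" here is the program image plus a list of offsets
     * whose contents must have the load address added to them. */
    #include <stdio.h>
    #include <stdint.h>
    #include <string.h>

    struct load_module {
        uint32_t image[8];   /* program as built, assuming it loads at 0 */
        size_t   reloc[2];   /* image slots that hold absolute addresses */
        size_t   nreloc;
    };

    /* Copy the image to its spot in "memory" and patch every address word. */
    static void relocating_load(const struct load_module *m,
                                uint32_t *memory, uint32_t load_addr)
    {
        memcpy(&memory[load_addr], m->image, sizeof(m->image));
        for (size_t i = 0; i < m->nreloc; i++)
            memory[load_addr + m->reloc[i]] += load_addr;
    }

    int main(void)
    {
        uint32_t memory[64] = {0};
        /* word 3 holds a pointer to word 5, valid only if loaded at 0 */
        struct load_module m = { .image = {1, 2, 3, 5, 0, 42, 0, 0},
                                 .reloc = {3}, .nreloc = 1 };
        relocating_load(&m, memory, 20);   /* the loader picked address 20 */
        printf("patched pointer now reads %u\n", memory[23]);  /* prints 25 */
        return 0;
    }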
I said most early machines worked this way. There was one exception.
At the same time IBM was evolving what would become MVS, Seymour Cray and James Thornton designed into their CDC 6600 hardware ways to relocate and protect each job. They designed two registers into the CPU, a relocation address (RA), and a field length register (FL).
When a program was to be executed, the operating system found enough space in physical memory for it, and loaded the program there. While the program was executing, the base address of the location would be loaded into the RA register, and the size of the program would be loaded into FL.
Every program was built as if it ran starting at memory location zero. While it ran, the hardware added the contents of RA to every address the program used to fetch instructions or data, or to store data back into memory. If RA held 200 and the program said “put this at location 42,” the machine put it at location 242. The field length register caused any address greater than its contents to stop execution and tell the operating system that this program had done something illegal.
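A few lines of C capture the idea as I've described it; the register widths and the exact fault behavior here are illustrative, not a precise description of the 6600.

    /* Sketch of the RA/FL scheme: every address a program uses is offset
     * by RA, and anything at or beyond FL is a fault.  Values and fault
     * handling are illustrative only. */
    #include <stdio.h>
    #include <stdint.h>
    #include <stdlib.h>

    static uint32_t RA = 200;   /* where the OS loaded this job        */
    static uint32_t FL = 100;   /* how much memory the job is allowed  */

    static uint32_t translate(uint32_t program_addr)
    {
        if (program_addr >= FL) {            /* outside the job's field */
            fprintf(stderr, "address %u out of range: job killed\n",
                    program_addr);
            exit(1);
        }
        return program_addr + RA;            /* relocate to physical    */
    }

    int main(void)
    {
        printf("program address 42 -> physical %u\n", translate(42)); /* 242 */
        translate(150);   /* trips the field-length check */
        return 0;
    }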
So, in 1964, Cray and Thornton had an early MMU.
When IBM finally got around to implementing an MMU on some of their mainframes in 1967, they broke a lot of programs that had been inadvertently spilling outside their allotted space. But this was a good thing, in that it made it possible to spot bugs that had been elusive before. Today, we make sure location zero can never be mapped, to catch bugs in just the same way. It's easy to use a nil pointer and not have the problem manifest itself until much later in the program.
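Here is a tiny, made-up example of that modern equivalent: because address zero stays unmapped, a nil pointer faults the moment it's used, which may be a long way from the mistake that produced it.

    /* Illustration only: the bug is deliberate.  The nil pointer is
     * created in one place and only blows up much later, when it's used. */
    #include <stdio.h>

    struct record { int value; };

    static struct record *lookup(int key)
    {
        (void)key;
        return NULL;            /* the bug: "not found" slips out as nil */
    }

    int main(void)
    {
        struct record *r = lookup(42);
        /* ... lots of unrelated work can happen here ... */
        printf("%d\n", r->value);   /* the crash happens far from the bug */
        return 0;
    }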
IBM's Dynamic Address Translation, a phrase that I really like and have used almost as-is in another context, made a major contribution to the art in that it broke the translation into two steps. Instead of just a base and limit register, it had two tiers of translation: the segment table and the page table. The top-most address bits indexed an entry in a table of pointers to page tables. Each page table had a number of entries holding the base addresses of fixed-size chunks of memory, 4,096 bytes in the case of the 360/67.
The virtual address the processor saw was converted into a physical address in stages. Some of the high bits of the virtual address were used to look up, in the segment table, the address of one of the page tables. Some middle bits were used as an index to find the page table entry holding the physical address of a page. The 12 least significant bits specified a byte within that page.
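Here's that lookup in sketch form. The particular bit widths, a 4-bit segment index, an 8-bit page index, and a 12-bit byte offset in a 24-bit address, are my assumption for illustration rather than a claim about the exact 360/67 layout.

    /* Sketch of a two-level address translation: segment table, then
     * page table, then byte offset within a 4,096-byte page. */
    #include <stdint.h>
    #include <stdio.h>

    #define PAGE_SHIFT    12        /* 4,096-byte pages */
    #define PAGES_PER_SEG 256
    #define NUM_SEGMENTS  16

    /* Each segment table entry points at a page table; each page table
     * entry holds the physical base address of one page. */
    static uint32_t page_tables[NUM_SEGMENTS][PAGES_PER_SEG];
    static uint32_t *segment_table[NUM_SEGMENTS];

    static uint32_t translate(uint32_t vaddr)
    {
        uint32_t seg    = (vaddr >> 20) & 0xF;            /* top 4 bits   */
        uint32_t page   = (vaddr >> PAGE_SHIFT) & 0xFF;   /* next 8 bits  */
        uint32_t offset = vaddr & 0xFFF;                  /* byte in page */
        uint32_t *pt    = segment_table[seg];
        return pt[page] + offset;
    }

    int main(void)
    {
        segment_table[0] = page_tables[0];
        page_tables[0][3] = 0x5000;  /* virtual page 3 lives at physical 0x5000 */
        printf("virtual 0x%x -> physical 0x%x\n", 0x3123, translate(0x3123));
        return 0;
    }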
This had many advantages. First, one didn’t have to have a contiguous chunk of memory to load the program. All you needed was enough pages. They could be anywhere in memory. Second, a program could grow without having to copy it. For a program to grow in the RA/FL scheme, you had to find a space to hold the entire new program size, move the whole program, free the old space, and update the RA and FL. With the IBM scheme, all you had to do was add more pages to the page table.
Today, almost all MMUs work this way.
The cycle of invention in the microcomputer age worked the same way it had in the mainframe era. In the mainframe days, each feature was implemented by a collection of modules wired together, and each module was built from discrete components: individual transistors, resistors, capacitors, and diodes, all taking space and power. As a result, the pressure to keep it all small and simple was tremendous.
In the microcomputer, it was the die size that put pressure on designs. All silicon economics are based on yield from the wafers that devices are cut from. The capital costs and operational costs dominate a silicon foundry’s economics. The percentage of good devices from the wafer determines the cost of a single device.
There will be a number of imperfections on each wafer: places where the cooking-in of boron or phosphorus was a bit too strong or a bit too shallow, places where the etching that forms wires out of metal deposited over the whole wafer did its job a bit too well and removed some of the wire, or didn't work well enough and left shorts. A single flaw ruins the entire device, which leads to the issue of final device sizes.
So, if you had a single device covering the whole wafer, you were guaranteed not to get a single working device; there are always flaws. If you had a gazillion very tiny devices, you would get very high yields, because each flaw would ruin only the tiny device it landed on. So there's a limit to the die size before yields, and profits, collapse.
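A common first-order way to see why, which I'm adding here rather than quoting anyone at Adaptive, is to assume defects land randomly across the wafer: the chance that a die escapes them all falls off exponentially with its area. The defect density below is an invented number for illustration.

    /* First-order yield model: yield ~ exp(-defect_density * die_area).
     * The numbers are illustrative.  Compile with -lm. */
    #include <math.h>
    #include <stdio.h>

    int main(void)
    {
        double defects_per_cm2 = 0.5;                 /* illustrative */
        double die_areas[] = {0.25, 1.0, 4.0, 16.0};  /* cm^2         */
        for (int i = 0; i < 4; i++) {
            double yield = exp(-defects_per_cm2 * die_areas[i]);
            printf("die of %5.2f cm^2 -> expected yield %5.1f%%\n",
                   die_areas[i], yield * 100.0);
        }
        return 0;
    }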
Because of that, most early 32-bit microprocessors lacked an MMU; there wasn't room for one in the transistor budget of a given part. In the case that matters here, there wasn't one on the Motorola 68000. You were expected either to use it in an embedded system running a single binary image or to add your own MMU with TTL chips on the board.
The hardware they made for NET’s success had a 68000, and the operating system they chose required an MMU. What would the MMU be like? How would it be designed? Who would design it?
Milan decided to tackle the problem. I never learned the exact design of the MMU, but I seem to recall it was a clever arrangement of fast static RAM chips that added only a clock or two to any memory cycle. It was called the Milan Momirov Marvelous Memory Management Unit, or MMMMMU! It was similar in spirit to the IBM scheme.
I explain all this because what comes next has to do with MMUs as much as with networking. At a loss for a message-passing scheme, I had an idea.
“So, Milan,” I asked in a flash of insight. “What message passing technique did the software in the NET system use?”