Silicon Valley 90’s Style: MMU Message Magic

It was cold in San Francisco, especially on top of one of the downtown skyscrapers in the financial district. Audrey and an engineer were there troubleshooting a communication failure for a client. Why the top of the building? Networks used a lot of line-of-sight radio in 1982, and high buildings were perfect places to avoid putting up a tower. Usually you saw two 48-inch dishes, one over the other, because as the sun rises and sets it can bend the radio signal. The lower dish catches the signal.

There’s other equipment up there too, usually things like muxes, to pull some of the signal apart and put it back together.

On this windy, cold morning in 1982, with all of San Francisco in view, the engineer let out a loud profanity.

“What is it?” Audrey asked. “Did you find it?”

“Yeah, the piece of junk!” he erupted as he wiggled the boards out, cleaning contacts and looking them over for faults.

“Not their best work.”

Audrey had been thinking a lot about the telecommunication business over the past few days. She knew there would be big changes and with them big opportunities. Only a few days prior, it was announced that the monopoly that was the Bell System, AT&T, was going to be broken into smaller companies, and that, for the first time in U.S. history, other makers of equipment would be able to compete with Western Electric, AT&T’s manufacturing arm.

Audrey saw that large companies would be looking for better service, more flexibility, and, just maybe, better boxes like the one that was giving her fits on top of a San Francisco skyscraper.

“Can you make a better one?” she asked.

“With one arm tied behind my back in my sleep!” was the answer.

The result was a new company, Network Equipment Technologies.

Or that’s the way I remember hearing the story in 1990, as I helped Audrey and Charlie Giancarlo, the two people I met with at DTS back in Georgia, do it again with Adaptive.

I was the kernel guy, the mountain man, as Audrey called me, because when she met me I was living in rural Georgia in a two-room cabin with a tin roof. She had been somewhat worried about me finding a place to live in Silicon Valley. We landed in an apartment on University Ave, Palo Alto. We’re flexible.

I was helping a team made up of a few of the original NET developers, who had experience doing this. Others on the team were young new hires.

I had talked them out of using Mach from Mt Xinu. I now was entrusted with getting the developers a kernel running on the target platform on which they could run their embedded system. They wanted a Unix system of some sort. That’s why they hired me.

But I had a problem. They needed an interprocess communication mechanism. Programs running in Unix are fooled by the operating system into thinking they are the only ones running on the machine. This is the essence of timesharing. The developers would write different programs, each to manage a different aspect of the function of the piece of equipment. Each process had its own memory that was separate from all the other processes running.

The challenge was, how to share information but also go really, really fast. Some of the messages between processes had to go very quickly. Some had to do with equipment failing and switching to the standby boards, or switching to a different circuit.

The appliance was called the Adaptive SONET Transmission Manager, or STM/18. The customer fed 1.544 Mbps T1s into the STM/18 and connected it to a number of 45 Mbps T3s, and a large company could save a lot on the cost of links; a few T3s are much cheaper than a bunch of T1s.

The cool thing about the STM/18 was that it routed the T1s through the network created by the STM/18 nodes. There were routers. There were circuit switches. Connect this T1 in San Fran to that T1 in Whippany over a low latency path, and the STM/18s would all work together to figure out a path, all done automatically.

So, not only did the IPC mechanism that I had to invent need to pass messages efficiently between processes on the same box, it also needed to pass messages to processes on different boxes.

As I said earlier, I pitched just using TCP, but I had to think of something else.

In the meantime, I had a Unix port to do. We obtained a license for the Unix source code, which, at the time, was a mere $28,000 for the version I wanted. For the latest version, the newly minted System Vr1, the price was $45,000. I didn’t want it. I had my own version of the Unix system that I always started from. At each new company I would take out the original source code and do a new port.

The $28,000 got me a version of 32V, which was the same version that Berkeley, only a few miles away, across the bay, used to create the now famous BSD version.

For Adaptive I used a version of the Portable C Compiler, written by Steve Johnson at Bell Labs and modified by MIT to generate Motorola 68000 instructions. With MIT’s compiler, assembler and loader, I would take the Unix kernel code and re-port it to each new embedded system.

Adaptive had chosen the Motorola MVME147 VME bus card. The equipment had a six-rack-unit cage that the various cards slipped into, and it was connected to the switching boards in the shelf above.

The MVME147 was great. It had 4 MB of RAM, which had been the entire memory configuration of most DEC VAXes only 10 years earlier, an AMD LANCE 10 Mbps Ethernet chip, and a Western Digital 33C93 SCSI controller. I had an entire Unix system on one board.

I bought a four-slot VME bucket, and modified a PC/AT power supply to wire to the bucket. The ’147 card went into the bucket (card cage), and it was wired to an old VT100 terminal I found lying around. I also talked the hardware guys out of an EPROM programmer, a device to burn erasable programmable read-only memories, and a UV eraser.

EPROMs were the ancestors of flash memory. They had little windows on the top of their ceramic packages. You would put one into a socket on the programmer and download software over a serial port from my Sun 3/60 workstation, which the programmer would burn into the EPROM. You would then pop the EPROM out of the programmer socket, insert it on the ’147 card, and boot it.

When the software failed, you took it back out of the socket, put it into the UV eraser and 20 minutes later all the bits were back to being 1’s. The programmer just turned some of them to zeros.
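The erase-then-burn cycle can be modeled in a few lines of C. This is only an illustrative sketch, not the programmer software from the story: the function names are made up for the example, and the model assumes one byte per cell with no timing or voltage details.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* An erased EPROM cell reads as all ones. */
#define EPROM_ERASED 0xFFu

/* UV erase: every bit in the part goes back to 1. (Hypothetical helper.) */
void eprom_erase(uint8_t *cells, size_t n)
{
    assert(cells != NULL);
    for (size_t i = 0; i < n; i++)
        cells[i] = EPROM_ERASED;
}

/* Burning can only turn 1s into 0s, so the result is old AND new. */
void eprom_program(uint8_t *cells, const uint8_t *image, size_t n)
{
    assert(cells != NULL && image != NULL);
    for (size_t i = 0; i < n; i++)
        cells[i] &= image[i];
}

/* Returns 1 if the image can be burned without erasing first, i.e. the
 * image never needs a 1 where the chip already holds a 0. */
int eprom_can_program(const uint8_t *cells, const uint8_t *image, size_t n)
{
    assert(cells != NULL && image != NULL);
    for (size_t i = 0; i < n; i++)
        if ((cells[i] & image[i]) != image[i])
            return 0;
    return 1;
}
```

In this model a freshly erased chip accepts any image, while a chip that already holds code usually fails the `eprom_can_program()` check, which is why a failed build meant another trip to the UV eraser.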

I had a handful of these chips, so I kept a batch in the eraser all the time.

The first task was to get the serial chip to output something. I had, if I remember right, a single LED that I could turn on or off. I used it to debug the software to write characters out the serial chip. When that worked, I used the serial chip to debug things using printf().

The software was compiled in C on the Sun using the compiler tools from MIT that I had modified to run under SunOS. I wrote a program to drive the programmer over one of the serial ports on the Sun.

It was about that time that one of the Stanford PhDs popped into my cube to tell me something, took one look at the assembler I was using to get things started and started backing out.

“Is that assembler!?” he gasped.

“Yep. Can’t do Unix ports without it,” I said.

He forgot what he came in to say. I wondered what was so shocking to him. I still don’t know.

It was also about that time I wheeled my chair out of the cube and over to Milan’s cube and asked my best question of the half of the decade. “Milan, what did the NET product use for IPC?”

What I got was that there had been numbered ports. Each part of the software was assigned a well-known number, and you could send a message to a port. These messages were small data structures that contained part of the work of creating virtual circuits through the switch. Simple. Clean. My mind started swirling.

I scooted back into my cube and just thought. I don’t think I did anything the rest of that day.

Sometimes people ask me how I code. I learned from reading the code from the Bell Labs Unix folks, and from listening to them describe things. First, I think about the problem, what I’m trying to do. Then I just let my mind play over various ideas. I might, in the rare circumstance, draw something out on paper. You know, bits of rectangles, arrows and things. I try to break the problem into layers. Mostly I only want two layers: one layer of fundamental functions, and the layer above that uses them to do the thing I’m trying to do.

Then I start writing. I usually start with some data structure and read something into it. I use printf() to debug that part, and then grow the software until it does everything.

I once had lunch with an engineer who had worked for Seymour Cray. He said Cray would come in wearing his Bermuda shorts, with some circuit sketched on a piece of paper, and say “build this.” All that I expected. Then the fellow said something that surprised me. He said that several times during the making of the machine, Seymour would come in and say to forget everything they were doing. He was going to change everything. In those early days I still had a fundamental misunderstanding of how real innovation happens. You can’t always know where things will lead. You will get into a corner and have to back out and redo everything. For me, software is perfect. You can redo things and people will never know. I do it constantly.

It’s how black swans are made.

Now I was thinking about how to pass messages around. I was doing this while I was working on the Unix port. One key part of the design of a port is the design of the memory system. In 1990, paging was all the rage, and the MMU on the 68030, the processor on the ’147 card, was designed to allow paging. I wasn’t going to do it that way. It didn’t make any sense to page in an embedded system. Either all the application fits into memory or we get more memory, or make the application smaller. Appliances and general time sharing systems have different constraints.

In fact, in those days I liked to lay out virtual memory contiguously in physical memory. I knew how large the text segment should be, and once the program stabilized, the data segment was pretty much fixed. So I laid the text (the instructions) in one segment, the data in another, and the stack in yet another. Each of these was contiguous in physical memory.

The 68030 had a great feature called “early exit.” Instead of putting the address of a page table in a segment table entry, you could put a physical address there and set a bit, and the MMU would create the page table entry on the fly. I didn’t even need page tables.

In the Adaptive design, I used 256 segments, each 16 MB long. The MMU allowed for different page sizes. I chose 512 bytes, a convenient size. The segment entry specified a limit to the number of pages in the segment. The text and data segments were aligned to multiples of the page size. The stack segment grew downward, specified by a bit in that segment entry.
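The arithmetic behind that layout can be sketched in C. With 256 segments of 16 MB each, the top 8 bits of a 32-bit address pick the segment and the low 24 bits are the offset; 512-byte pages give 32,768 pages per segment. The code below is a simplified software model under those assumptions, not the 68030 descriptor format and not the original kernel code. The `seg_entry` structure, its field names, and the way the downward-growing case is mapped are all hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SEG_SHIFT     24u                               /* 16 MB segments */
#define SEG_COUNT     256u
#define PAGE_SHIFT    9u                                /* 512-byte pages */
#define PAGES_PER_SEG (1u << (SEG_SHIFT - PAGE_SHIFT))  /* 32768 pages    */
#define SEG_MASK      ((1u << SEG_SHIFT) - 1u)

/* Hypothetical segment entry: one contiguous physical run, no page table. */
struct seg_entry {
    uint32_t phys_base;   /* physical address of the first valid page       */
    uint32_t limit;       /* number of valid pages in the segment           */
    uint8_t  valid;       /* entry is in use                                */
    uint8_t  grows_down;  /* stack: valid pages are the top `limit` pages   */
};

/* Translate a virtual address. Returns 0 and stores the physical address
 * in *paddr on success, or -1 on a fault (invalid segment or past limit). */
int seg_translate(const struct seg_entry table[SEG_COUNT],
                  uint32_t vaddr, uint32_t *paddr)
{
    assert(table != NULL && paddr != NULL);

    const struct seg_entry *e = &table[vaddr >> SEG_SHIFT];
    uint32_t offset = vaddr & SEG_MASK;
    uint32_t page   = offset >> PAGE_SHIFT;

    if (!e->valid || e->limit > PAGES_PER_SEG)
        return -1;

    if (e->grows_down) {
        /* Valid pages sit at the top of the segment. */
        uint32_t first = PAGES_PER_SEG - e->limit;
        if (page < first)
            return -1;
        *paddr = e->phys_base + (offset - (first << PAGE_SHIFT));
    } else {
        /* Text and data: valid pages start at the bottom. */
        if (page >= e->limit)
            return -1;
        *paddr = e->phys_base + offset;
    }
    return 0;
}
```

The point of the sketch is that translation reduces to a bounds check and an add, which is what makes a table of 256 entries enough for the whole address space.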

The kernel lived halfway down in memory, at 0x80000000. That has been a standard since at least the VAX. The MIPS processor has it hardwired into the architecture.

But the instructions, data, and stack only used three of the 128 segments I could use for user memory. That’s when it hit me.

I’d use a new segment for sending messages around. I’d create an inventory of message buffers that could be up to 64 KB long. I added a new system call to allocate a buffer into the program’s address space. The return from the system call was a pointer into this new segment, and only that process had the message mapped in.

The application would allocate a message by calling smsgalloc() with the size of the message needed. The address in the message segment would be returned. The application would fill out the message and call msgsend(), passing it the pointer to the message and an address header containing the port number to send the message to.

At that point the message would be queued on a file descriptor that could be used with the message receive system call. The msgrecv() call took a file descriptor and a pointer to a message address header, which told the receiver where the message came from, and mapped the message into the receiving process’s address space. No data was copied.

The file descriptors would be allocated by calling msgbind(), which would return a file descriptor used to receive messages. The Unix select() and streams functions worked with message files.
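The flow of allocate, bind, send, and receive can be imitated in ordinary user-space C. What follows is a toy model, not the Adaptive kernel code: the `sim_` function names, their signatures, the `msg_addr` layout, and the table sizes are assumptions made for the example. Where the real design remapped a buffer with the MMU, the model just changes an `owner` field, which shows the same property: the receiver gets the very buffer the sender filled in. Freeing buffers is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MSG_MAX  65536u   /* buffers can be up to 64 KB                  */
#define NBUF     4        /* size of the buffer inventory (arbitrary)    */
#define NPORT    16       /* number of well-known ports (arbitrary)      */
#define QLEN     4        /* per-port queue depth (arbitrary)            */
#define UNMAPPED (-1)     /* in flight: mapped into no process           */

struct msg_addr { uint16_t dst_port, src_port; };  /* hypothetical header */

struct msg_buf {
    int in_use;                    /* taken from the inventory?           */
    int owner;                     /* pid that has it "mapped" right now  */
    struct msg_addr addr;          /* header recorded at send time        */
    unsigned char data[MSG_MAX];
};

struct port { int bound, pid, q[QLEN], head, count; };

static struct msg_buf bufs[NBUF];
static struct port ports[NPORT];

/* Allocate a buffer and "map" it into pid; returns NULL if none fits. */
void *sim_msgalloc(int pid, size_t size)
{
    if (size == 0 || size > MSG_MAX)
        return NULL;
    for (int i = 0; i < NBUF; i++) {
        if (!bufs[i].in_use) {
            bufs[i].in_use = 1;
            bufs[i].owner = pid;
            return bufs[i].data;
        }
    }
    return NULL;
}

/* Bind a port to pid; the returned descriptor is used for receiving. */
int sim_msgbind(int pid, uint16_t port)
{
    if (port >= NPORT || ports[port].bound)
        return -1;
    ports[port].bound = 1;
    ports[port].pid = pid;
    ports[port].head = ports[port].count = 0;
    return port;
}

/* Queue msg on the destination port and unmap it from the sender. */
int sim_msgsend(int pid, void *msg, const struct msg_addr *to)
{
    assert(to != NULL);
    int i = -1;
    for (int k = 0; k < NBUF; k++)
        if (bufs[k].in_use && (void *)bufs[k].data == msg)
            i = k;
    if (i < 0 || bufs[i].owner != pid || to->dst_port >= NPORT)
        return -1;
    struct port *p = &ports[to->dst_port];
    if (!p->bound || p->count == QLEN)
        return -1;
    bufs[i].owner = UNMAPPED;
    bufs[i].addr = *to;
    p->q[(p->head + p->count) % QLEN] = i;
    p->count++;
    return 0;
}

/* Dequeue the next message and "map" it into the receiver. No copy. */
void *sim_msgrecv(int pid, int fd, struct msg_addr *from)
{
    if (fd < 0 || fd >= NPORT)
        return NULL;
    struct port *p = &ports[fd];
    if (!p->bound || p->pid != pid || p->count == 0)
        return NULL;
    int i = p->q[p->head];
    p->head = (p->head + 1) % QLEN;
    p->count--;
    bufs[i].owner = pid;
    if (from != NULL)
        *from = bufs[i].addr;
    return bufs[i].data;
}
```

In this model, a process that binds port 7 and calls `sim_msgrecv()` gets back the same pointer the sender obtained from `sim_msgalloc()`, and the sender can no longer send that buffer because it no longer owns it. The model polls instead of blocking, so it says nothing about how select() fit in.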

After explaining all that at the white board for the developers, and with a bit more detail, I turned to the group sitting at the conference table.

There was silence. Slowly they looked at each other. Then kind of smiled and nodded. It felt to them just like the system they had used on the NET equipment. They knew exactly how to use it. I knew how to build it. It was blazingly fast, even for the 20 MHz 68030, because it never copied memory, just mapped it in and out of the process’s address space. The code was only 589 lines.

My cubicle was a pool of light in a room that was after-hours dark. The sound of the small cooling fan I had bolted to the top of the VME cage, and of the Sun 3/60, created a pool of white noise to go with the light. Coding kernel stuff is a long process and leads to long and lonely nights. Suddenly I realized I wasn’t alone.

“I heard you came up with something,” said Charlie G who was just finishing his day.

“Yep. I finally figured out how to ask the right question.”

“Oh?” he asked.

“Yes. I asked how they did it on the NET box. Then I just gave them a version of that. My Dad has a saying: ‘You gotta give ’em what they want.’”

“Good saying. Night.”

And he was gone.

And I turned back to the SCSI device driver.
