How to Think About Virtual Machines Part 5

IDEAS ARE AMAZING things. One minute, they don’t exist in any real sense of the word. The next, they are permanent monuments of thought, never to be unlearned, never unseen. Before an idea’s arrival in someone’s mind, there is no possibility of even a hint of its existence, and after someone has thought of it, how could it not be the most obvious thing in the world? Intuitively obvious to the most casual observer.

The arrival of the virtual machine (VM) is very much like that. And the birth of the idea of the VM begins in an unlikely place.

Trouble at the Top of IBM

There was turmoil in the walnut-lined offices of the corporate building of IBM in Manhattan. Vin Learson was upset. Bob Evans was upset. Even Tom Watson, the old man’s son, who had taken over as CEO only a few years before, was upset.

It was 1964, and earlier in the year IBM had announced its newest product line of machines, the System/360. The April announcement had included six models: the 30, 40, 50, 60, 62, and 70. The machines were still being designed and debugged, and the software was just beginning to turn into the world’s first really large software disaster. While the general response from customers of existing machines was positive, the response from one large and very influential educational institution had not been encouraging.

The customer was a prickly bunch of researchers with pretty deep pockets, the Massachusetts Institute of Technology. On this hot late summer day, they had just announced that for their big new project, Multics, they were going with a machine from General Electric, not a machine from IBM. Interestingly, both OS/360 and Multics would be parallel versions of a disaster we now call the second-system effect. But at the time each was supposed to be the next great leap forward toward a computing utility, OS/360 for batch and Multics for timesharing.

With timesharing, MIT had come up with a very different way of using the computer. Almost all of IBM’s customers were operating their machines in a way that was merely a slightly refined version of their punch card operations of prior years: punching data onto cards, reading and transferring the data to tape, processing it on large mainframes, and writing the results onto more tapes or sending them to a printer. An MIT researcher named John McCarthy had thought up a different way to use the giant brains. In his vision the computer and new software gave a gaggle of users, all sitting at the keyboards of printing teletypes, the illusion of having the machine all to themselves. The users used the terminals to create, compile, and run software in a very intimate, personal way. And a way foreign to IBM.

The idea had been brought into reality at MIT as the Compatible Time-Sharing System, CTSS, by Fernando "Corby" Corbato and his merry band of systems programmers. The system was simple in concept. The new batch operating systems would run multiple jobs at the same time, switching from one to another as each was delayed waiting for the much slower tape drives and other I/O devices. A timesharing system went a step further: it would use the tick of the computer’s clock to interrupt a running program and switch to a previously paused program. Switching the single processor between the different programs was so fast that, to the users, it looked like a dozen machines, each user getting his own.

The positive effect on the development of software was enormous. It was immediately habit forming. The biggest single thing it did was to reduce the cycle time from idea to testing of code. With timesharing, it was possible to think, edit, compile, and test your program dozens of times in a single hour. Since, as we would later learn, software is iteratively grown, not written or engineered, this rapid iteration was a huge boon to productivity, far superior to the coding forms and keypunches used in batch systems.

IBM had been MIT’s machine of choice for CTSS. A model 7094, a 36-bit number cruncher, had been modified to have a second bank of 32K words of core memory, so that the operating system and the user programs lived in separate banks. The B memory was where CTSS ran user programs, while the control program was safe in the A memory, inaccessible from the user program running in the B memory.

The mechanism CTSS used for timesharing was simple. The 7094 had an interval timer installed which would fire an interrupt once every quantum, which was something like 1/60th of a second. A user program running in the B memory would be interrupted and the machine would switch its mode of operation to the control program in the A memory. The control program would decide if there was another program waiting with a higher priority than the interrupted one, and if there was, it would switch from the current program to the higher-priority one.
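The timer-driven switching described above can be sketched as a small simulation. This is only an illustration, not CTSS's actual scheduler: it swaps the priority decision for plain round-robin, and the function name, job names, and tick counts are all made up for the example.

```python
from collections import deque

def run_timeshared(jobs, quantum):
    """Simulate timer-driven switching on a single processor.

    jobs: dict mapping a (hypothetical) job name to the ticks of CPU time it needs.
    quantum: ticks a job may run before the timer interrupt fires.
    Returns the order in which jobs were given the processor.
    """
    ready = deque(jobs.items())
    trace = []
    while ready:
        name, remaining = ready.popleft()
        trace.append(name)
        # Run until the timer fires or the job finishes, whichever comes first.
        remaining -= min(quantum, remaining)
        if remaining > 0:
            # Interrupted by the timer: go to the back of the queue.
            # (Round-robin here; the text describes a priority-based choice.)
            ready.append((name, remaining))
    return trace

trace = run_timeshared({"edit": 3, "compile": 5, "test": 2}, quantum=2)
print(trace)
```

Each job gets the processor for at most one quantum at a time, so short jobs such as "test" finish quickly even while a longer "compile" is still in progress.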

There was a performance problem with CTSS. Two user programs could not share the B memory; only one could be in the memory at a time. The old program had to be rolled out to disk and the new program rolled in every time you switched from one to another. This was because there was no way to protect the programs in user memory from one another. So the old process would be moved out, and the new process read in and restarted where it had been interrupted earlier.

Obviously this was less efficient than having multiple user programs in the B memory at the same time and merely switching between them. It took time to write the old program to disk and read in the next. This was clearly an area where a new computer architecture could speed things up a great deal. If, like the batch jobs, each process could be protected from the others, then we could fit as many as possible into core at one time and just switch between them, rolling them out and in only as a secondary level of scheduling.

In 1963, MIT had been awarded a $3M grant to build a bigger version of CTSS. The much more ambitious project aimed to be nothing less than a campus-wide information utility. It was called the Multiplexed Information and Computing Service, or Multics, and it would lead directly to the Unix operating system. (But that’s another story.) For Multics, MIT wanted a new machine.

Enter the IBM System/360

MIT had heard about the System/360 before most anyone else did. IBM had visited MIT, describing the new architecture, itself a new word for how the computer looked to software, in great detail. But the Multics folks saw a performance problem. In a batch system, jobs were waiting to run in the form of what in OS/360 speak is called a load module, a relocatable binary image of the program. When OS/360 started a new job, memory would be allocated for it and the new job relocated to that location. All the job’s addresses were adjusted to use the new memory locations. Ten different jobs would be loaded at ten different memory locations, all in the same memory space. To protect the jobs from one another, the memory system had the concept of a key, a small number that distinguished each job from the others. The processor would hold a current key value while running a given job. If the job tried to access memory outside its own set of blocks, whose keys did not match, a system trap would occur and the job would come to an abnormal end.
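The key mechanism can be sketched in a few lines. This is a toy model under stated assumptions, not the System/360 hardware: the 2,048-byte block size, the class and method names, and the use of key 0 to mean "unassigned" are choices made for the example.

```python
BLOCK_SIZE = 2048  # bytes per protected block (assumed for this sketch)

class ProtectionTrap(Exception):
    """Raised when a job touches a block whose key is not its own."""

class KeyedMemory:
    def __init__(self, num_blocks):
        # One key per block; 0 means "unassigned" in this sketch.
        self.block_keys = [0] * num_blocks
        self.data = bytearray(num_blocks * BLOCK_SIZE)

    def assign(self, first_block, count, key):
        """Give `count` blocks starting at `first_block` to the job with `key`."""
        for block in range(first_block, first_block + count):
            self.block_keys[block] = key

    def store(self, current_key, address, value):
        """Store one byte on behalf of the job whose key is `current_key`."""
        if self.block_keys[address // BLOCK_SIZE] != current_key:
            raise ProtectionTrap(
                f"key {current_key} may not write address {address}")
        self.data[address] = value

mem = KeyedMemory(num_blocks=4)
mem.assign(0, 2, key=1)   # job 1 owns blocks 0-1
mem.assign(2, 2, key=2)   # job 2 owns blocks 2-3
mem.store(1, 100, 7)      # allowed: address 100 is in block 0
try:
    mem.store(1, 5000, 7)  # address 5000 is in block 2, owned by job 2
except ProtectionTrap as trap:
    print("abnormal end:", trap)
```

Note that the keys keep jobs out of each other's memory, but every job still sees real memory addresses. That is the part the timesharing designers found limiting.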

But this arrangement was a problem for the timesharing system designers. The timesharing system rolls programs out to disk and back again. The exact location they will be rolled back to isn’t known beforehand. Each program needs to be fooled into thinking it is running from the same location, no matter where it really is in memory. This implies some sort of Dynamic Address Translation (DAT), the translating of a virtual or logical address the program thinks it’s running at to some different and arbitrary physical address where the instructions are really located.

It needed something like the paging system used on the Manchester Atlas machine with its virtual store, in use for a couple of years at that point.
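Address translation through a page table can be sketched as follows. This is a minimal illustration and not the mechanism of any particular machine: the 4,096-byte page size, the `translate` function, and the frame numbers are assumptions made for the example.

```python
PAGE_SIZE = 4096  # assumed page size for this sketch

def translate(page_table, virtual_address):
    """Map a virtual address to a physical one through a per-program page table.

    page_table: dict from virtual page number to physical frame number.
    Raises KeyError if the page is not currently in memory.
    """
    page, offset = divmod(virtual_address, PAGE_SIZE)
    frame = page_table[page]
    return frame * PAGE_SIZE + offset

# The same program, rolled in at two different places in physical memory.
first_load = {0: 5, 1: 9}
second_load = {0: 2, 1: 3}

print(translate(first_load, 4100))   # page 1, offset 4, frame 9 -> 36868
print(translate(second_load, 4100))  # page 1, offset 4, frame 3 -> 12292
```

The program uses virtual address 4100 both times. Only the table changes when it is rolled back in somewhere else, so no addresses inside the program need to be adjusted.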

The IBM representatives were reluctant to commit to an architectural change to the ramping-up System/360 work, or even to lead the MIT folks into thinking IBM might make one. Such a change would mean a great deal of turmoil in a development process that was implementing the architecture in six completely different systems, hiding the real hardware differences with microcode.

Plus, IBM’s customers were punch card folks. Only a dozen years before, they had been users of card sorters, multipliers, and tabulators, all very much batch operations. Timesharing was very new and very strange. No one in those walnut-lined offices had ever used a computer. Picture a large clunky teletype, a device clerks used to send Telex messages around the company, gracing the modern office decor and clattering away at 10 characters a second, when the executive didn’t even know how to type. That picture made it very hard for them to see any value in timesharing. Like DEC’s Ken Olsen a decade later not understanding why anyone would want a computer instead of a terminal on their desk, it would have been a miracle if any of the executives had seen the value of timesharing.

So they said “no” to MIT’s request to add DAT to the 360. But now they had lost business to one of their competitors. Until then, they had wanted to place their machines in places like MIT so that when the students left, they would take with them the idea of using IBM machines, subtly installed in their minds.

Hearing the news of MIT’s choice of General Electric as the computer vendor for Multics, Watson looked at Learson and said, “Maybe we should look into this timesharing.”

Next week: it turns out there was a group already in Boston that also had an interest in timesharing. And they were part of IBM.
