Salvation from Intel: RISC

One good thing to come out of the disaster of Intel’s security vulnerabilities, Spectre and Meltdown, is renewed interest in more modern RISC processors. MIPS, PowerPC, Berkeley’s RISC-V, and ARM processors without speculative execution are all getting new attention as an alternative to the massively complex Intel processors that drive most of today’s cloud.

But what does RISC actually mean? RISC stands for Reduced Instruction Set Computer; CISC stands for Complex Instruction Set Computer. Can the average information technology person understand the difference between them? Yes, you can. It’s really that simple to understand.

The engineering that goes into our Coraid EtherDrive SAN System gives us deep insight into these issues. In this post I’ll explain a central feature of RISC, hopefully in a way that anyone who can install Linux or Windows can understand.

The central feature of RISC architecture is its use of registers. The first processor built around that idea was the Control Data 6600, introduced in 1964 and designed by the father of supercomputers, Seymour Cray. Before the 6600, machines had one or more accumulators: registers in the processor that held data while calculations were performed on it.

In all von Neumann computers, which is to say essentially all modern computers, programs and data live in a “large store”, which today we call “memory.” You know it as DIMMs. Early memories were organized into “words” of, for example, 36, 18, or 16 bits.

The data and the program are both in memory. The processor reads one instruction and decodes it to decide what to do. Usually the instruction needs data, which it also gets from memory. Subsequent instructions may perform more calculations. When finished, the result is stored back into memory.

In our modern machines, this cycle is repeated over and over again, billions of times a second.

In the earliest computers, there was a single register in the CPU called the “accumulator.” It was called that because it mostly accumulated a value over a sequence of instructions. In fact, programmers didn’t think about “loading” the accumulator; they thought of clearing it and adding a value. You will find the mnemonic CLA (Clear and Add) in the instruction set of the 1958 IBM 709, for example.
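To make the stored-program cycle and the accumulator concrete, here is a toy machine in Python. It is a sketch under our own assumptions: the (opcode, address) encoding, the run function, and the ADD, STO, and HLT opcodes are hypothetical, modeled only loosely on the CLA mnemonic of the IBM 709. Program and data share one memory, and every arithmetic instruction reaches into memory for its operand.

```python
# A toy stored-program accumulator machine (hypothetical encoding, not the
# real IBM 709 format). Program and data share one memory list; each
# instruction is an (opcode, address) pair.

def run(memory):
    acc = 0   # the single accumulator register
    pc = 0    # program counter: address of the next instruction
    while True:
        op, addr = memory[pc]   # fetch and decode
        pc += 1
        if op == "CLA":         # clear the accumulator, then add a memory word
            acc = memory[addr]
        elif op == "ADD":       # add a memory word to the accumulator
            acc += memory[addr]
        elif op == "STO":       # store the accumulator back into memory
            memory[addr] = acc
        elif op == "HLT":       # stop and hand back the memory
            return memory
        else:
            raise ValueError(f"unknown opcode {op!r} at {pc - 1}")

# Words 0-4 hold the program; words 5-8 hold the data.
memory = [
    ("CLA", 5),   # acc = memory[5]
    ("ADD", 6),   # acc += memory[6]
    ("ADD", 7),   # acc += memory[7]
    ("STO", 8),   # memory[8] = acc
    ("HLT", 0),
    10, 20, 12,   # data
    0,            # the result goes here
]
run(memory)
print(memory[8])  # prints 42
```

Every operation here makes a trip to memory, which is exactly the “operate from memory” style that later designs abandoned.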

Some computers also had what were called “B” registers. Instructions could use these registers to specify the memory locations of their operands. For example, an instruction could specify that a B register held the address of a word in memory that was to be multiplied by the accumulator. For C programmers, think pointer registers.

As computer construction moved from vacuum tubes to less expensive discrete transistors, instructions grew more complex and gained more addressing modes. One could specify, for example, a constant offset to be added to the address in a B register to create what was called the “effective address.” The effective address specified which word the instruction used.
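Here is a minimal sketch of the effective address calculation, assuming a simple model in which the B register’s contents are added to the instruction’s constant offset. The function name effective_address, the bounds check, and the register and memory values are illustrative assumptions; real machines differed in the details.

```python
def effective_address(b_registers, b_index, offset, memory_size):
    """Return the B-register contents plus the instruction's constant offset."""
    ea = b_registers[b_index] + offset
    if not 0 <= ea < memory_size:
        raise IndexError(f"effective address {ea} outside memory")
    return ea

memory = [0] * 16
memory[8:11] = [3, 7, 100]   # a three-word record starting at address 8
b = [0, 8]                   # hypothetical B registers; B1 points at the record

ea = effective_address(b, 1, 2, len(memory))   # third word of the record
print(ea, memory[ea])                          # prints: 10 100

# Multiply the accumulator by the word that B1 + 1 points at.
acc = 5
acc *= memory[effective_address(b, 1, 1, len(memory))]
print(acc)                                     # prints: 35
```

Changing only B1 lets the same instructions work on a different record, which is why these registers felt like pointers.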

As computers of the late 1950s and early 1960s grew ever more complex, the number of B registers grew. The Control Data Corporation 1604 had six B registers. Machines also gained more accumulators, which were still separate from the B registers.

In fact, the accumulators and B registers were different sizes. Most accumulators of those early machines were 36 bits long, and the most common B register size was 18 bits. The CDC 1604 mentioned above had 48-bit accumulators (two of them, in fact, to hold the results of multiplication) and 15-bit B registers.

Imagine having 48-bit numbers but only 15-bit addresses. The width of the B registers reflects the size of memory: 15 bits can address 2^15 = 32,768 words, which was the most memory a CDC 1604 could have. That’s right, only 32,768 48-bit words of memory.
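The relationship between address width and memory size is just a power of two. Here is a quick check of the CDC 1604 figures, assuming nothing beyond the 15-bit B registers and 48-bit words given above. The byte figure is only for comparison with today’s sizes; the 1604 itself addressed words, not bytes.

```python
def addressable_words(address_bits):
    # Each extra address bit doubles the number of distinct words reachable.
    return 2 ** address_bits

words = addressable_words(15)   # CDC 1604: 15-bit B registers
bits = words * 48               # 48-bit words
print(words, bits, bits // 8)   # prints: 32768 1572864 196608

# For comparison, the common 18-bit B register of other machines:
print(addressable_words(18))    # prints: 262144
```

In other words, a fully loaded 1604 held about 192 KiB worth of bits.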

Enter that genius midwesterner, Seymour Cray.

Born in Chippewa Falls, Wisconsin, in 1925 and educated at the University of Minnesota, Cray joined Engineering Research Associates (ERA) in 1951, one of the few companies designing computers at the time. Another was Univac.

After working first on vacuum tube computers and then on transistor computers, he left ERA in 1957 to cofound Control Data Corporation (CDC). His first machine at CDC was the 1604, which was, as he later described it, a conventional machine of its day.

CDC’s customers needed ever faster machines. They needed as many computations per second as they could get. Seymour gave a lot of thought to how to make a much faster machine.

The result, the CDC 6600, was groundbreaking. It was like nothing before it. It had four large chassis arranged in an X pattern. Instead of a single arithmetic logic unit, it had 10 functional units! It could execute more than one instruction at a time.

And it didn’t have any instructions that operated from memory.

That’s right. No instructions adding to the accumulator, or saving the results. Not only that, there were no memory instructions at all. No load. No store. When I first saw the instruction set I had to ask, how could I get my data into the CPU?

What the 6600 had were eight accumulators, X0 through X7, along with eight B registers, B0 through B7, and eight A registers, also known as address registers. But these B registers were not the B registers of the computers that came before it. They were just places where a program could do simple math, for things like loop counters.

The real magic happened with the eight A registers, A0 through A7. These were truly revolutionary. How they were used was so new that it took me a while to realize what was going on.

To load a value from memory into, say, X1, I had to put the address of that value into the A1 register. Once I did that, the word at the location contained in A1 was fetched and placed in X1. Move a value into A2 and the word at that memory address would appear in accumulator X2. Registers A1 through A5 all worked this way. Magic!

Storing data once the calculations were done worked in a similar manner. Putting a memory address into A6 or A7 stored the value in X6 or X7 at that address.
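The A-to-X pairing can be modeled in a few lines of Python. This is a hypothetical sketch, not a simulator: the class and method names are ours, word widths are ignored, and setting A0 simply records the address with no memory side effect. Note that the arithmetic step touches only X registers; memory traffic happens solely as a side effect of setting an A register.

```python
class CDC6600Registers:
    """Toy model of the 6600's A/X register pairing (hypothetical API).

    Setting A1-A5 loads X1-X5 from memory; setting A6-A7 stores X6-X7.
    Setting A0 has no memory side effect here. Word widths are not modeled.
    """

    def __init__(self, memory):
        self.memory = memory
        self.a = [0] * 8   # address registers A0-A7
        self.x = [0] * 8   # operand ("accumulator") registers X0-X7

    def set_a(self, i, address):
        self.a[i] = address
        if 1 <= i <= 5:              # A1-A5: fetch the word into X_i
            self.x[i] = self.memory[address]
        elif i in (6, 7):            # A6-A7: write X_i to that address
            self.memory[address] = self.x[i]

memory = [0] * 8
memory[2], memory[3] = 30, 12
cpu = CDC6600Registers(memory)
cpu.set_a(1, 2)                  # X1 <- memory[2]
cpu.set_a(2, 3)                  # X2 <- memory[3]
cpu.x[6] = cpu.x[1] + cpu.x[2]   # arithmetic uses registers only
cpu.set_a(6, 5)                  # memory[5] <- X6
print(memory[5])                 # prints 42
```

There is no load or store instruction anywhere in this sketch, yet data still moves between memory and the X registers.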

This was quite a radical design. Why did he do it? Why get rid of explicit load and store instructions and let the address registers do that work instead?

The answer was that in order to go fast, one had to separate the memory operations from the arithmetic and logic operations. You needed to get values into registers, operate on them there, and store them back when you were finished.

Cray realized that he could overlap fetching instructions, fetching data into the X registers, doing the computation, and storing the results, running them all in parallel, only if he broke with the “operate from memory” philosophy that had been used in all other machines.
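Why overlap pays off can be shown with idealized cycle counts. This sketch assumes a four-step instruction (fetch instruction, fetch data, compute, store result) where each step takes one cycle and nothing ever stalls. That is a textbook pipeline model, not how the 6600 actually scheduled work (it used multiple functional units and a scoreboard), so treat the numbers as illustration only.

```python
def cycles_sequential(n_instructions, n_stages):
    # Each instruction runs every stage before the next one may start.
    return n_instructions * n_stages

def cycles_overlapped(n_instructions, n_stages):
    # Ideal pipeline: after the first instruction fills all stages,
    # one more instruction finishes every cycle.
    if n_instructions == 0:
        return 0
    return n_stages + n_instructions - 1

STAGES = 4  # fetch instruction, fetch data, compute, store result
for n in (1, 10, 1000):
    print(n, cycles_sequential(n, STAGES), cycles_overlapped(n, STAGES))
# prints:
# 1 4 4
# 10 40 13
# 1000 4000 1003
```

The overlapped count only holds when the stages are independent, which is why the memory steps had to be pulled apart from the arithmetic.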

Twenty years later, this is exactly how the creators of the RISC processors viewed things.

When Moore’s Law made it affordable to put a capable processor on a single silicon die, researchers at Stanford and Berkeley designed the first RISC chips. (Before that, John Cocke at IBM had built an experimental minicomputer, the 801, on the same principles.)

Modern RISC architects readily acknowledge their debt to Cray. Their load and store instructions add a constant to the value in a register to form an effective address. Once the data is in registers, all the other instructions do the “mathy” things. Keeping memory access separate from computation is what lets them keep the pipeline full.
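For contrast with accumulator-style machines, here is a toy load/store machine. The mnemonics lw, sw, add, and addi follow MIPS and RISC-V naming, and register r0 always reads as zero as it does on those architectures, but the tuple encoding and the run_risc function are hypothetical. Only lw and sw touch memory, each forming its effective address as base register plus constant; add and addi work purely on registers.

```python
def run_risc(program, memory, nregs=8):
    regs = [0] * nregs  # general-purpose registers; r0 stays zero
    for op, *args in program:
        if op == "lw":      # lw rd, offset(rs): rd <- memory[rs + offset]
            rd, offset, rs = args
            regs[rd] = memory[regs[rs] + offset]
        elif op == "sw":    # sw rt, offset(rs): memory[rs + offset] <- rt
            rt, offset, rs = args
            memory[regs[rs] + offset] = regs[rt]
        elif op == "add":   # add rd, rs, rt: registers only
            rd, rs, rt = args
            regs[rd] = regs[rs] + regs[rt]
        elif op == "addi":  # addi rd, rs, imm: register plus constant
            rd, rs, imm = args
            regs[rd] = regs[rs] + imm
        else:
            raise ValueError(f"unknown opcode {op!r}")
        regs[0] = 0         # writes to r0 are discarded
    return regs

memory = [0] * 8
memory[4:8] = [10, 20, 12, 0]   # data block at address 4; result slot at 7
program = [
    ("addi", 4, 0, 4),   # r4 <- 4, the base address of the data block
    ("lw", 1, 0, 4),     # r1 <- memory[r4 + 0]
    ("lw", 2, 1, 4),     # r2 <- memory[r4 + 1]
    ("lw", 3, 2, 4),     # r3 <- memory[r4 + 2]
    ("add", 1, 1, 2),    # r1 <- r1 + r2
    ("add", 1, 1, 3),    # r1 <- r1 + r3
    ("sw", 1, 3, 4),     # memory[r4 + 3] <- r1
]
run_risc(program, memory)
print(memory[7])  # prints 42
```

The three loads are independent of one another and of the adds until their results are needed, which is what gives a pipelined implementation room to overlap them.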

James Thornton, one of the people who helped Cray on the CDC 6600, wrote a great book on how the machine worked, titled Design of a Computer: The Control Data 6600. The main photo in this post is the cover of his book.

To go fast, Cray had to keep the design as simple as possible. Simplicity was not an optional, nice-to-have feature. It was essential to separating memory access from computation and getting the performance improvement he needed.

Likewise, today we need to get back to simple in order to go fast. Go fast, safely, that is.
