Intel Flubs Again

Posted by Brantley Coile

The Wall Street Journal reports that Intel is telling motherboard vendors not to install its latest fixes to mitigate the Spectre and Meltdown exploits. (See my previous article about why Coraid EtherDrive Storage products are impervious to these exploits.)

I know a lot about Intel processors. I have used them since 1994, when I developed the PIX Firewall, the first Network Address Translation appliance, and again when I invented the LocalDirector web balancer. I was still using Intel processors when I created Coraid's EtherDrive Ethernet block storage SRX and VSX appliances. We still use Intel today.

Intel is great at the CMOS part of chip production, but they don't do computer architecture very well.

I have never liked the Intel instruction set. To be fair, neither does Intel, from what Intelians have told me in the past. They have tried to switch to another instruction set several times over the years. The i960 RISC in the 1980s and, more recently, the Itanium are just two of those efforts. But they are stuck having to support these antique instructions, which have been crusted over with additional complexity over the 40-year history of the architecture. (Coincidentally, the same number of years I have been doing technology.)

This bad instruction set is the root cause of the current Spectre and Meltdown disasters.

There are two kinds of computer instruction sets: complex instruction sets (CISC) and reduced instruction sets (RISC). The early computers were programmed in assembler, that is, in machine code, directly using the instruction set of the computer. Computer designers naturally evolved instruction sets to make the job of programming in assembler easier.

A CISC instruction set has complicated addressing modes that require as much execution as the instructions that use them, or more. You can, for example, address a byte using two registers, adding a constant and shifting one of the registers left by one, two, or three bits (multiplying it by 2, 4, or 8), as in

ADD RAX, $1269(R8*8, R9).
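To make the hidden work concrete, here is a minimal Python sketch of the address arithmetic an operand like the one above implies before the ADD itself can even start. The function name and the register values are made up for illustration; only the base + index*scale + constant shape comes from the example.

```python
def effective_address(base, index, scale, displacement):
    """Return base + index*scale + displacement, wrapped to 64 bits."""
    if scale not in (1, 2, 4, 8):
        raise ValueError("scale must be 1, 2, 4, or 8")
    # Scaling by 1, 2, 4, or 8 is a left shift by 0, 1, 2, or 3 bits.
    shift = scale.bit_length() - 1
    return (base + (index << shift) + displacement) & 0xFFFFFFFFFFFFFFFF


# Hypothetical register contents, chosen only for the example.
r9 = 0x1000   # base register
r8 = 3        # index register
addr = effective_address(r9, r8, 8, 1269)
print(addr)
```

That is a shift and two additions just to find the operand, on top of the memory read and the add that the instruction is nominally about.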

All this created instruction sets with built-in speed bumps. Or, as we say of the old Cherokee PA-28 airplane, it has a built-in head wind.

The Intel instruction set is terrible even by CISC standards. The instruction set started back in 1978 on a 16-bit processor. Then in 1982 it got protected memory. A few years later the 16 bits were doubled to 32, and the CPU got paging memory hardware. This never-ending cycle of redesigns secreted layers of complexity: MMX, SSE through SSE4, AVX, changing addressing modes. Then the 64-bit extension added more complexity, even splitting the bits for the register designations between different bytes in the instruction format.

The result is like Wilde's The Picture of Dorian Gray. The instruction set is a picture of Intel's sins.

By the way, for great CISC instruction sets, look at IBM's original System/360 instruction set and the DEC VAX, which was a bit too opulent for my taste, but was still much better than IA-32 and Intel 64 from Intel.

But all these are slower than RISC for a lot of reasons.

The main reason is that CISC doesn't pipeline well. The first person to realize the computer would go faster if the instructions were simpler was Seymour Cray, inventor of the supercomputer. From the mid-1960s until his death in 1996, Cray designed a string of computers that could be called RISC. There were no complicated addressing modes. He loaded values into registers, and then operated on the contents of these registers, saving the values back into memory only when finished with them.

His first supercomputer, the CDC 6600, was also the first superscalar machine, with ten functional units executing instructions in parallel. One could also easily program the machine in octal!

The name John Cocke has to be mentioned along with Cray in regard to inventing RISC. At IBM, Cocke realized that because almost all programs were no longer written in assembler but in higher level languages, the instruction set could be made simpler, allowing it to pipeline better. The compiler didn't care that two instructions were used where one had been used previously. Cocke built an experimental computer, the 801, in 1975, wrote a compiler for it, and demonstrated the benefit of RISC. (John was a fellow Southerner as well.)

In the early 1980s, VLSI techniques opened the possibility of putting up to 50,000 gates on a single die, and two RISC projects at Stanford and Berkeley developed a pair of chips. One, the MIPS, spun out and became a successful company. The architecture is still a favorite of mine.

The ARM folks, reading of the RISC project at Berkeley, learned how to design their own RISC processor to avoid the complexity of the Motorola 68000 processor. The result is the second most popular instruction set today.

But poor old Intel was stuck with their crusty complex instructions. Efforts to switch were all unsuccessful. So they came up with a brilliant idea: add circuits that turn the CISC instructions into RISC instructions, and do it all inside the processor.
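As a rough illustration of that translation step, the Python sketch below splits one register-memory instruction into a handful of simple load/store-style operations. It is a toy, not Intel's actual micro-operation format: the `crack` function, the `MemOperand` type, and the temporary names `t0` and `t1` are all invented for the example.

```python
from typing import NamedTuple


class MemOperand(NamedTuple):
    """A memory operand of the form disp(base, index*scale). Hypothetical type."""
    base: str
    index: str
    scale: int
    disp: int


def crack(op, dest, mem):
    """Split one register-memory instruction into simple micro-operations.

    Each tuple is (operation, destination, source1, source2). Every step
    either touches memory or does arithmetic on registers, never both.
    """
    shift = mem.scale.bit_length() - 1
    return [
        ("shl", "t0", mem.index, shift),    # t0 = index << shift
        ("add", "t0", "t0", mem.base),      # t0 = t0 + base
        ("load", "t1", "t0", mem.disp),     # t1 = memory[t0 + disp]
        (op.lower(), dest, dest, "t1"),     # dest = dest op t1
    ]


micro_ops = crack("ADD", "RAX", MemOperand("R9", "R8", 8, 1269))
for uop in micro_ops:
    print(uop)
```

One instruction as the programmer wrote it becomes four dependent steps inside the machine, and each of them needs its own trip through the pipeline.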

But by doing this they had to add a lot more pipeline stages, and adding a lot of stages to their pipeline meant that even the RISC micro-operations could stall. Instructions have to know, for example, if a memory fetch is allowed, or which branch needs to be taken. These all depend on information from other parts of the machine being available.
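A back-of-the-envelope model shows why depth makes stalls hurt more. The Python sketch below assumes every instruction takes one cycle unless it stalls, and that a stall costs roughly as many cycles as there are stages waiting on the missing information. The instruction counts and penalties are invented numbers for illustration, not measurements of any real processor.

```python
def total_cycles(instructions, stalls, stall_penalty):
    """Cycles to run a program: one per instruction plus a penalty per stall."""
    return instructions + stalls * stall_penalty


# Same hypothetical program on two hypothetical pipelines:
# 1000 instructions, 50 of which have to wait for information.
short_pipeline = total_cycles(1000, 50, 4)    # shallow pipeline, small penalty
deep_pipeline = total_cycles(1000, 50, 14)    # deep pipeline, large penalty
print(short_pipeline, deep_pipeline)
```

Under these assumptions the same fifty stalls cost 200 cycles on the short pipeline and 700 on the deep one, which is the pressure that makes guessing ahead look attractive.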

To avoid such slowdowns, they used the technique of "speculative execution." The processor goes ahead and executes the instructions anyway, without the needed information, and throws away the results if it finds the instructions should not have been executed, say, when it discovers that a user program is fetching a byte from protected kernel memory.

This, it turns out, is too late. The side effect of loading the byte, and using its value for more fetches, is enough to leak the byte's value.
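The mechanism can be shown with a toy model. The Python sketch below is a simulation only, not a working exploit: it has no real cache and no timing, and the `ToyCPU` class and its method names are hypothetical. It models just the one property that matters here, that the rollback restores the registers but not the cache.

```python
class ToyCPU:
    """Toy processor: registers are rolled back after a fault, the cache is not."""

    def __init__(self, kernel_memory):
        self.kernel_memory = kernel_memory
        self.registers = {}
        self.cached_lines = set()   # which probe lines are currently in the cache

    def speculative_read(self, kernel_address):
        """User code touches kernel memory before the permission check finishes."""
        saved = dict(self.registers)
        secret = self.kernel_memory[kernel_address]   # speculative load
        self.registers["r1"] = secret
        self.cached_lines.add(secret)                 # dependent fetch of probe line 'secret'
        # The permission check now fails: discard the architectural results...
        self.registers = saved
        # ...but nothing undoes the cache fill above.

    def probe_is_fast(self, line):
        """Stand-in for timing a read: cached lines come back 'fast'."""
        return line in self.cached_lines


def recover_byte(cpu, kernel_address):
    """Flush the toy cache, trigger the speculative read, then probe all 256 lines."""
    cpu.cached_lines.clear()
    cpu.speculative_read(kernel_address)
    hits = [value for value in range(256) if cpu.probe_is_fast(value)]
    return hits[0]


cpu = ToyCPU(kernel_memory=b"secret")
leaked = bytes(recover_byte(cpu, i) for i in range(6))
print(leaked, cpu.registers)
```

The register file ends up empty, exactly as if the forbidden load had never happened, yet the attacker has read every byte by asking which cache line got warm.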

As an aside, long pipelines have gotten to be a fad even for ARM micro-architectures. The current ARM Cortex-A15 has 15 stages and does out-of-order speculative execution. ARM instructions are more streamlined than Intel's. The real benefits of a large number of pipeline stages for a RISC processor, it seems to me, would be small. Maybe there was "pipeline stage envy" going around the chip design business for a while. Maybe they just wanted to have speculative execution and that added more stages to the pipeline. So now they are subject to the Spectre exploit too.

Whatever the case, it turns out speculative execution is a bad idea, as the Spectre and Meltdown exploits have shown. Maybe it's time to discard all the complexity and humbly return to simplicity. Simpler processors can be smaller, having fewer gates. Fewer gates means cheaper processing. One could put more of them on a die the same size as a Xeon and handle a lot of web requests in parallel.

One last comment. The winner of the most beautiful architecture has to be Niklaus Wirth's RISC-5 (not to be confused with Berkeley's open effort). Wirth designed the processor to be implemented on an inexpensive FPGA and run his equally elegant Oberon operating system. A description sufficient to implement his processor is all of two pages! Three pages if you count the description of the peripherals. See Wirth's RISC Architecture

So, crusty old CISC caused longer pipelines. Longer pipelines begat speculative execution. Speculative execution caused one of the worst computer disasters in memory. Pun unintended.

Intel Warns Its Patches for Chip Flaws Are Buggy
